# Creating a RapidBridge Sync Job

> Source: https://truto.one/docs/guides/rapid-bridge/creating-rapid-bridge/

A RapidBridge Sync Job first needs a webhook endpoint to send data to.

## Creating a webhook

Use the following request to create a webhook endpoint,

```bash
curl --location 'https://api.truto.one/webhook' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "target_url": "https://webhook.site/6a7cc86e-9286-4be8-ba79-bcd8dfdbeee1",
    "is_active": true,
    "event_types": [
        "sync_job_run:created",
        "sync_job_run:updated",
        "sync_job_run:started",
        "sync_job_run:completed",
        "sync_job_run:failed",
        "sync_job_run:deleted",
        "sync_job_run:record",
        "sync_job_run:record_error",
        "sync_job_run:rate_limited",
        "integrated_account:created",
        "integrated_account:active",
        "integrated_account:post_connect_form_submitted"
    ]
}'
```

You can also follow our more detailed [guide into creating webhooks](/docs/guides/webhooks/creating-a-webhook-through-ui).

## Creating a Sync Job

In this guide, we'll be syncing users, contacts, tickets and comments for each ticket from Zendesk using the [Unified API for Ticketing](/docs/api-reference/unified-ticketing-api).

Use the following request to create a Sync Job,

```bash
curl --location 'https://api.truto.one/sync-job' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "integration_name": "zendesk",
    "resources": [
        {
            "resource": "ticketing/users",
            "method": "list"
        },
        {
            "resource": "ticketing/contacts",
            "method": "list"
        },
        {
            "resource": "ticketing/tickets",
            "method": "list"
        },
        {
            "resource": "ticketing/comments",
            "method": "list",
            "depends_on": "ticketing/tickets",
            "query": {
                "ticket_id": "{{resources.ticketing.tickets.id}}"
            }
        }
    ]
}'
```

In the request above,

- `integration_name` is the identifier of the integration that the sync job is for. In this case it is Zendesk, so its value is set to `zendesk`.
- `resources` is the list of resources to fetch from Zendesk. Each item in the list has the following schema,
  - `resource` (required) is the name of the Unified API resource or the Proxy API resource. For Unified APIs, use the format `unified_api_name/resource_name` and for the Proxy APIs, just use `resource_name`.
  - `method` (required) can be `list` or `get` for Unified APIs. For Proxy APIs, it can be `list`, `get` or any other read-like custom method.
  - `depends_on` (optional) creates a dependency between this resource and another resource in the `resources` list. In the example above, the `ticketing/comments` resource needs a `ticket_id` query parameter to know which ticket to fetch comments for, so we first fetch the list of `tickets` and then fetch the `comments` for each ticket.
  - `query` (optional) holds the query parameters to be passed to each request. Placeholders can be used to populate the query parameters dynamically. In the example above, `ticketing/comments` uses the <span v-pre>`{{resources.ticketing.tickets.id}}`</span> placeholder to refer to the `id` property of a Unified Ticket Resource and set it as the `ticket_id`. Refer to the [placeholder reference](#placeholder-reference) section for more details on the placeholders that can be used.
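
The schema described above can be checked locally before sending the request. The following is a minimal sketch in Python, assuming a hypothetical helper named `validate_resources`. It is not part of any Truto SDK and only encodes the rules stated in this guide: `resource` and `method` are required, and `depends_on` must point at another entry in the list.

```python
from typing import Any


def validate_resources(resources: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in a Sync Job `resources` list.

    Hypothetical helper: it only checks the rules described in this guide.
    """
    problems: list[str] = []

    # A dependency may point at another entry's `resource` or its `name`.
    known: set[str] = set()
    for item in resources:
        for key in ("resource", "name"):
            if key in item:
                known.add(item[key])

    for index, item in enumerate(resources):
        # Entries with a `type` (transform or spool nodes, covered later in
        # this guide) do not carry `resource` and `method`, so skip that check.
        if "type" not in item:
            for required in ("resource", "method"):
                if required not in item:
                    problems.append(f"resources[{index}] is missing `{required}`")
        parent = item.get("depends_on")
        if parent is not None and parent not in known:
            problems.append(f"resources[{index}] depends on unknown `{parent}`")
    return problems
```

An empty list means the payload satisfies these basic rules. The API may enforce additional constraints that this sketch does not know about.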

## Running a Sync Job

Now that we have created a Webhook and a Sync Job, we can execute the Sync Job by creating a Sync Job Run.

Make sure you have a Zendesk Integrated Account already created. Check out our guide to connect an account.

To create a Sync Job Run, execute the following request,

```bash
curl --location 'https://api.truto.one/sync-job-run' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "sync_job_id": "7279a917-b447-4629-9e46-a1eeb791ad6b",
    "integrated_account_id": "7ae7b0ab-c6a7-4f29-aec1-1f123517af5d",
    "webhook_id": "a5b21886-3b4d-4fd0-9956-ffc0714d701c"
}'
```

This should start executing the Sync Job and the Webhook endpoint should start receiving the events.

In the request above,

- `sync_job_id` is the `id` of the Sync Job we created in the previous step
- `integrated_account_id` is the `id` of the Integrated Account connected to a Zendesk account
- `webhook_id` is the `id` of the Webhook created in the first step.

## Error handling

Truto by default ignores any errors that occur during a Sync Job Run and continues with the next resource, sending you `sync_job_run:record_error` webhook events for each error encountered. This can be changed by setting the `error_handling` attribute in the Sync Job Run request to `fail_fast`. This will cause the Sync Job Run to fail as soon as an error occurs. The default value of `error_handling` is `ignore`.

```bash
curl --location 'https://api.truto.one/sync-job-run' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "sync_job_id": "7279a917-b447-4629-9e46-a1eeb791ad6b",
    "integrated_account_id": "7ae7b0ab-c6a7-4f29-aec1-1f123517af5d",
    "webhook_id": "a5b21886-3b4d-4fd0-9956-ffc0714d701c",
    "error_handling": "fail_fast"
}'
```

## Incremental syncing of data

By default, the Sync Job above will fetch all the objects for a resource in every Sync Job Run, i.e. all the tickets will be synced on every Sync Job Run. In most cases, an incremental sync is preferable, where only the tickets that have changed since the previous Sync Job Run are fetched.

To do this, the `updated_at` query parameter of the `ticketing/tickets` Unified API resource can be bound to `previous_run_date`. The binding can be created like so,

```json
{
    "resource": "ticketing/tickets",
    "method": "list",
    "query": {
      "updated_at": {
        "gt": "{{previous_run_date}}"
      }
    }
}
```

`previous_run_date` is a special attribute tracked by Truto that holds the last date on which a Sync Job Run completed successfully for the Sync Job and a particular Integrated Account. It's set to `'1970-01-01T00:00:00.000Z'` on the very first Sync Job Run.

The previous Sync Job can be updated with the new resource binding,

```bash
curl --location 'https://api.truto.one/sync-job' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "integration_name": "zendesk",
    "resources": [
        {
            "resource": "ticketing/users",
            "method": "list"
        },
        {
            "resource": "ticketing/contacts",
            "method": "list"
        },
        {
            "resource": "ticketing/tickets",
            "method": "list",
            "query": {
              "updated_at": {
                "gt": "{{previous_run_date}}"
              }
            }
        },
        {
            "resource": "ticketing/comments",
            "method": "list",
            "depends_on": "ticketing/tickets",
            "query": {
                "ticket_id": "{{resources.ticketing.tickets.id}}"
            }
        }
    ]
}'
```

Now, every time the Sync Job executes, it will fetch only the tickets that have changed since the last Sync Job Run.

### Doing a full sync on demand

Sometimes you want to fully sync the data and ignore the `previous_run_date`. You can do this by setting the `ignore_previous_run` attribute to `true` in the Sync Job Run request.

```bash
curl --location 'https://api.truto.one/sync-job-run' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "sync_job_id": "7279a917-b447-4629-9e46-a1eeb791ad6b",
    "integrated_account_id": "7ae7b0ab-c6a7-4f29-aec1-1f123517af5d",
    "webhook_id": "a5b21886-3b4d-4fd0-9956-ffc0714d701c",
    "ignore_previous_run": true
}'
```

## Running a Sync Job on schedule

:::callout{type="tip"}
The cron expression is in UTC timezone.
:::

To run a Sync Job on a recurring schedule, a Sync Job Cron Trigger can be created.

Use the following request to create one,

```bash
curl --location 'https://api.truto.one/sync-job-cron-trigger' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "sync_job_id": "d7fd45d6-136a-4244-aeb9-b6439bfa8b71",
    "integrated_account_id": "6680c7ff-9f0e-45be-9915-a7334dc37f23",
    "webhook_id": "a5b21886-3b4d-4fd0-9956-ffc0714d701c",
    "cron_expression": "0 */6 * * *"
}'
```

The request schema is similar to Sync Job Run with an additional attribute - `cron_expression`. The Cron expression above will run the Sync Job every 6 hours.
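
To see which hours the expression above selects, here is a small illustrative Python sketch that expands only the hour field of a five-field cron expression. The function name `expand_hour_field` is hypothetical, and it handles just `*`, `*/n`, and comma-separated values. It is not a full cron parser and says nothing about how Truto evaluates schedules internally.

```python
def expand_hour_field(cron_expression: str) -> list[int]:
    """Expand the hour field (the second field) of a five-field cron expression."""
    fields = cron_expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated cron fields")
    hour_field = fields[1]
    if hour_field == "*":
        return list(range(24))
    if hour_field.startswith("*/"):
        step = int(hour_field[2:])
        if step <= 0:
            raise ValueError("step must be positive")
        return list(range(0, 24, step))
    return sorted(int(part) for part in hour_field.split(","))


# The job runs at minute 0 of each of these UTC hours.
print(expand_hour_field("0 */6 * * *"))  # [0, 6, 12, 18]
```

Since the expression is evaluated in UTC, convert these hours to your local timezone when deciding when a run should land.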

## Passing arguments to a Sync Job

Arguments can be passed to a Sync Job Run to fetch data dynamically. Imagine if in the example above, we needed to fetch tickets incrementally based on the previous Sync Job Run date, but also based on a date of our choosing for the initial sync.

To achieve this, the schema of the arguments to be passed in the Sync Job Run first needs to be added to the Sync Job. It's specified using the `args_schema` attribute in the request body.

```json
{
  "args_schema": {
    "ticket_sync_start_date": {
      "type": "string",
      "format": "date-time"
    }
  }
}
```

Next, we use this argument in the `ticketing/tickets` resource. Instead of the normal variable binding, a [JSONata](https://docs.jsonata.org/overview) expression is used for `query`,

```jsonata
{
    'updated_at': {
        'gt': args.ticket_sync_start_date ? args.ticket_sync_start_date : previous_run_date
    }
}
```

The expression above uses `ticket_sync_start_date` from the arguments if it's passed or falls back to `previous_run_date`.
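
For readers less familiar with JSONata, the same fallback can be written in Python. This is only an illustrative equivalent, assuming a hypothetical function named `build_tickets_query`. The actual evaluation happens inside Truto using the JSONata expression.

```python
from typing import Any


def build_tickets_query(args: dict[str, Any], previous_run_date: str) -> dict[str, Any]:
    """Use `ticket_sync_start_date` when it is passed, else `previous_run_date`."""
    # A missing or empty argument is falsy, so `or` falls back to the
    # previous run date, mirroring the ternary in the JSONata expression.
    start = args.get("ticket_sync_start_date") or previous_run_date
    return {"updated_at": {"gt": start}}


first_run = "1970-01-01T00:00:00.000Z"
print(build_tickets_query({}, first_run))
print(build_tickets_query({"ticket_sync_start_date": "2023-07-23T18:10:56.072Z"}, first_run))
```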

:::callout{type="info"}
Update: We recently introduced conditional placeholders that can achieve the same result without the need for JSONata expressions. You'd still need JSONata expressions for more complex scenarios.

```json
{
  "updated_at": "{{args.ticket_sync_start_date|previous_run_date}}"
}
```

:::

The updated resource definition looks like so,

```json
{
    "resource": "ticketing/tickets",
    "method": "list",
    "query": "{ 'updated_at': { 'gt': args.ticket_sync_start_date ? args.ticket_sync_start_date : previous_run_date } }"
}
```

The final request to create a Sync Job with arguments,

```bash
curl --location 'https://api.truto.one/sync-job' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "integration_name": "zendesk",
    "args_schema": {
        "ticket_sync_start_date": {
            "type": "string",
            "format": "date-time"
        }
    },
    "resources": [
        {
            "resource": "ticketing/users",
            "method": "list"
        },
        {
            "resource": "ticketing/contacts",
            "method": "list"
        },
        {
            "resource": "ticketing/tickets",
            "method": "list",
            "query": "{ '\''updated_at'\'': { '\''gt'\'': args.ticket_sync_start_date ? args.ticket_sync_start_date : previous_run_date } }"
        },
        {
            "resource": "ticketing/comments",
            "method": "list",
            "depends_on": "ticketing/tickets",
            "query": {
                "ticket_id": "{{resources.ticketing.tickets.id}}"
            }
        }
    ]
}'
```

To run a Sync Job with arguments, the `args` attribute needs to be added to the Sync Job Run request body,

```json
{
  "args": {
    "ticket_sync_start_date": "2023-07-23T18:10:56.072Z"
  }
}
```

The request would be,

```bash
curl --location 'https://api.truto.one/sync-job-run' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "args": {
        "ticket_sync_start_date": "2023-07-23T18:10:56.072Z"
    },
    "sync_job_id": "7279a917-b447-4629-9e46-a1eeb791ad6b",
    "integrated_account_id": "7ae7b0ab-c6a7-4f29-aec1-1f123517af5d",
    "webhook_id": "a5b21886-3b4d-4fd0-9956-ffc0714d701c"
}'
```

## Looping a request over an array

Taking the arguments example forward, imagine a hypothetical scenario where we need to fetch a specific set of tickets based on a list of ticket ids. This can be achieved using the `loop_on` attribute on a resource in the Sync Job, with the list itself passed as an argument to the Sync Job Run.

To create such a Sync Job, we first need to define the `args_schema` like so,

```json
{
  "args_schema": {
    "ticket_ids": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  }
}
```

Then define the `ticketing/tickets` resource like so,

```json
{
    "resource": "ticketing/tickets",
    "method": "get",
    "loop_on": "args.ticket_ids",
    "id": "{{args.ticket_ids}}"
}
```

Here, the `loop_on` attribute specifies that for each element in the `ticket_ids` array, a request should be made. The `id` attribute specifies the placeholder to be used for the `id` of the ticket.
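
Conceptually, `loop_on` fans out into one request per array element. The Python sketch below simulates that expansion locally, assuming the `id` placeholder resolves to the current element on each iteration, as the description above suggests. `expand_loop` is a hypothetical name and this is not Truto's implementation.

```python
from typing import Any


def expand_loop(resource: dict[str, Any], args: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one request description per element of the looped-over argument."""
    # `loop_on` looks like "args.ticket_ids"; strip the "args." prefix.
    prefix = "args."
    loop_on = resource["loop_on"]
    if not loop_on.startswith(prefix):
        raise ValueError("this sketch only supports looping over args")
    values = args[loop_on[len(prefix):]]
    return [
        {"resource": resource["resource"], "method": resource["method"], "id": value}
        for value in values
    ]


tickets = {
    "resource": "ticketing/tickets",
    "method": "get",
    "loop_on": "args.ticket_ids",
    "id": "{{args.ticket_ids}}",
}
for request in expand_loop(tickets, {"ticket_ids": ["101", "102"]}):
    print(request)
```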

Complete request to create a Sync Job with looping,

```bash
curl --location 'https://api.truto.one/sync-job' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "integration_name": "zendesk",
    "args_schema": {
        "ticket_ids": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
    },
    "resources": [
        {
            "resource": "ticketing/tickets",
            "method": "get",
            "loop_on": "args.ticket_ids",
            "id": "{{args.ticket_ids}}"
        }
    ]
}'
```

## Recursively fetching data for the same resource

There are cases where you might need to fetch data for the same resource, where the resources might have a parent-child relationship. For example, in drive-items for the Unified File Storage API, there could be drive items of the type `folder` which might have child drive items within them. To fetch such resources, you can use the `recurse` attribute in the Sync Job resource,

```json
{
  "resource": "file-storage/drive-items",
  "method": "list",
  "recurse": {
    "if": "{{resources.file-storage.drive-items.has_children:bool}}",
    "config": {
      "query": {
        "parent": {
          "id": "{{resources.file-storage.drive-items.id}}"
        }
      }
    }
  }
}
```

In the above example, we set the recurse condition in the `if` attribute. Most of the resources which follow this parent-child relation have a `has_children` attribute which is a boolean (you'll need to check the documentation to make sure). If the `has_children` attribute is true, then the `config` attribute will be used to fetch the child resources. The `query` attribute in the `config` specifies the query parameters to be passed to the request. The `parent.id` is set to the `id` of the parent resource. You can use placeholders to refer to the values of other fields.
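
The traversal that `recurse` describes can be pictured as a depth-first walk. The following Python sketch runs against an in-memory tree instead of a real API. The names `walk`, `TREE`, and `fake_list` are hypothetical, and the visiting order Truto actually uses is not specified in this guide.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Item = Dict[str, Any]


def walk(
    list_items: Callable[[Optional[str]], List[Item]],
    parent_id: Optional[str] = None,
) -> Iterator[Item]:
    """Yield every item, descending into those whose `has_children` is true."""
    for item in list_items(parent_id):
        yield item
        if item.get("has_children"):
            # Equivalent to re-issuing the list request with parent.id set.
            yield from walk(list_items, item["id"])


# In-memory stand-in for the list endpoint, keyed by parent id.
TREE: Dict[Optional[str], List[Item]] = {
    None: [
        {"id": "root-folder", "has_children": True},
        {"id": "readme", "has_children": False},
    ],
    "root-folder": [{"id": "report", "has_children": False}],
}


def fake_list(parent_id: Optional[str]) -> List[Item]:
    return TREE.get(parent_id, [])


print([item["id"] for item in walk(fake_list)])  # ['root-folder', 'report', 'readme']
```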

## Transforming the data fetched by the resources

Sometimes you might need to transform the data fetched by the resources before sending it to the webhook. This can be achieved using `transform` nodes in the Sync Job.

Transform nodes accept a JSONata expression which should return the final result to be sent to the webhook. These nodes are always dependent on a resource node (using `depends_on`), so you can't have an independent transform node. Also, both the resource node that a transform node depends on and the transform node itself should have a `name` attribute. You can also have transform nodes dependent on another transform node, and resource nodes dependent on transform nodes.

Some use cases where transform nodes are especially helpful:

1. Filtering the data fetched by the resource: when the underlying API doesn't support the filters you need, like `updated_at`, tags, etc., you can use transform nodes to filter the data.
2. Modifying the output of the resource without modifying the Unified mappings: you might need to modify the output of the resource before sending it to the webhook, for example by adding a new field or modifying existing fields.

This is the context object available for the JSONata expressions in the transform nodes,

- `args` - The arguments passed to the Sync Job Run.
- `resources.<resource_name>` - Contains the data fetched by the resource. The `resource_name` is the name of the resource.
- `previous_run_date` - Refers to the last date on which a Sync Job ran and completed successfully for the Sync Job and an Integrated Account. It's set to `'1970-01-01T00:00:00.000Z'` on the very first Sync Job Run.
- `resource` - Refers to the parent resource's attributes defined in the Sync Job with placeholders resolved.
- `<all_context_variables>` - Refers to the context variables set in the Integrated Account.

Taking the example from Zendesk, this Sync Job ignores the contacts that have NOT been updated since the last time the Sync Job ran,

```json
{
    "integration_name": "zendesk",
    "args_schema": {},
    "resources": [
        {
            // needs a name
            "name": "all-contacts",
            "resource": "ticketing/contacts",
            "method": "list",
            "persist": false
        },
        {
            // needs a name
            "name": "filtered-contacts",
            "type": "transform",
            "config": {
              "expression": "resources.ticketing.contacts[updated_at >= %.%.%.previous_run_date]"
            },
            // refer to the name
            "depends_on": "all-contacts",
            "persist": true
        }
    ]
}
```

To make sure that you only get the filtered contacts on the webhook in the example above, `persist` has been set to `true` for the transform node and `false` for all the contacts.

Transform nodes need to have the `persist` attribute set to `true` if you need their output to be sent to the webhook. By default, it's set to `false`.
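
To make the filtering step concrete, here is a rough Python equivalent of what the transform expression above computes. It is a sketch for local reasoning only: `filter_updated` and `parse_timestamp` are hypothetical names, and the sketch compares parsed timestamps while the JSONata expression compares the values as they appear in the data.

```python
from datetime import datetime
from typing import Any


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2023-07-23T18:10:56.072Z'."""
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_updated(
    contacts: list[dict[str, Any]], previous_run_date: str
) -> list[dict[str, Any]]:
    """Keep only the contacts updated at or after `previous_run_date`."""
    cutoff = parse_timestamp(previous_run_date)
    return [c for c in contacts if parse_timestamp(c["updated_at"]) >= cutoff]


contacts = [
    {"id": "1", "updated_at": "2023-07-01T00:00:00.000Z"},
    {"id": "2", "updated_at": "2023-08-15T09:30:00.000Z"},
]
print(filter_updated(contacts, "2023-08-01T00:00:00.000Z"))
```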

:::callout{type="tip"}
If the resource name contains special characters apart from underscore `_`, like `-`, then in JSONata you need to refer to them using backticks. For example, if the resource name is `knowledge-base/page-content`, then you need to refer to it as 
```
resources.`knowledge-base`.`page-content` 
```
in JSONata.
:::

## Spooling data into a single webhook event

Spool nodes allow you to paginate and fetch the complete resource and then send it in a single webhook event. One of the places where this is useful is in the Knowledge Base APIs where the page content might be split into multiple blocks and is provided through a paginated API. You can use the `spool` nodes to fetch all the blocks and then send them in a single webhook event. You can also have `transform` nodes dependent on `spool` nodes, which can transform the data fetched by the `spool` nodes.

As with `transform` nodes, `spool` nodes can't be independent and should be dependent on a resource node. The resource node which the `spool` nodes depend on should have a `name` attribute and the `spool` node itself should have a `name` attribute. You can't have a `spool` node dependent on another `spool` node.

Taking the example of a Notion integration, where we need the content of a Notion page as Markdown,

```json
{
    "integration_name": "notion",
    "args_schema": {
      "page_id": {
        "type": "string",
        "required": true
      }
    },
    "resources": [
      {
          "name": "page-content",
          "resource": "knowledge-base/page-content",
          "method": "list",
          "query": {
              "page": {
                  "id": "{{args.page_id}}"
              },
              "truto_ignore_remote_data": true
          },
          "recurse": {
              "if": "{{resources.knowledge-base.page-content.has_children:bool}}",
              "config": {
                  "query": {
                      "page_content_id": "{{resources.knowledge-base.page-content.id}}"
                  }
              }
          },
          "persist": false
      },
      {
          "name": "remove-remote-data",
          "type": "transform",
          "config": {
              "expression": "[resources.`knowledge-base`.`page-content`.$sift(function($v, $k) {$k != 'remote_data'})]"
          },
          "depends_on": "page-content"
      },
      {
          "name": "all-page-content",
          "type": "spool",
          "depends_on": "remove-remote-data"
      },
      {
          "name": "combine-page-content",
          "type": "transform",
          "config": {
              "expression": "$blob($reduce($sortNodes(resources.`knowledge-base`.`page-content`, 'id', 'parent.id'), function($acc, $v) { $acc & $v.body.content }, ''), { \"type\": \"text/markdown\" })"
          },
          "depends_on": "all-page-content",
          "persist": true
      }
    ]
}
```

In the example above,

1. We first fetch the page-content blocks for the page provided in the argument.
2. It then goes through the `remove-remote-data` transform node to remove the `remote_data` attribute from the fetched blocks.
3. The `all-page-content` spool node fetches all the blocks of the page content.
4. Then the `recurse` logic kicks in and fetches the child blocks for each block if any.
5. Finally, the `combine-page-content` transform node combines all the blocks fetched by the `all-page-content` spool node and sends it as a single webhook event. The data is converted into a Blob with the type `text/markdown`.

In `spool` nodes, the data is stored temporarily on Truto's servers. Once the sync job run completes/fails, the data is deleted.

### Limitations

The amount of data you can store in spool nodes is limited to 128KB, so make sure that the data you are fetching in each page falls within that limit. That is the reason why in the example above, we have a `remove-remote-data` transform node to remove the `remote_data` attribute from the fetched blocks, as it might contain a lot of data.
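
If you want a rough local estimate of whether a page of records would fit, a sketch like the one below can help. It rests on assumptions not stated in this guide: that 128KB means 128 * 1024 bytes, and that size is measured on the UTF-8 encoded JSON. The helper names are hypothetical.

```python
import json
from typing import Any

# Assumption: 128KB is taken to mean 128 * 1024 bytes.
SPOOL_LIMIT_BYTES = 128 * 1024


def strip_remote_data(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop the `remote_data` attribute from every record."""
    return [
        {key: value for key, value in record.items() if key != "remote_data"}
        for record in records
    ]


def serialized_size(records: list[dict[str, Any]]) -> int:
    """Approximate size of the records as UTF-8 encoded JSON, in bytes."""
    return len(json.dumps(records).encode("utf-8"))


page = [
    {
        "id": "block-1",
        "body": {"content": "Hello"},
        "remote_data": {"raw": "x" * 200_000},
    }
]
print(serialized_size(page) > SPOOL_LIMIT_BYTES)                      # True
print(serialized_size(strip_remote_data(page)) <= SPOOL_LIMIT_BYTES)  # True
```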

## Running Sync Job after an Integrated Account is connected

To run a Sync Job after an Integrated Account is connected, listen to the `integrated_account:active` event. This event is sent when an Integrated Account is created and is ready to be used. If you have a RapidForm configured for the integration, then you'll need to listen to the `integrated_account:post_connect_form_submitted` event.

An example webhook event for both is shown below,

```json
{
  "id": "bed2145c-46d7-41fa-ad69-3e70cb5bb74d",
  "event": "integrated_account:active",
  "payload": {
    "id": "937fed13-712d-4647-dfe3-7613948b0348",
    "tenant_id": "acme-1",
    "environment_integration_id": "f7d82f6d-20f0-4bed-231c-d9c31e023710",
    // more data
  },
  "environment_id": "ac15abdc-b38e-47d0-97a2-6f494017c177",
  "created_at": "2023-08-31T18:08:27.879Z",
  "webhook_id": "a5b21886-3b4d-4fd0-9956-ffc0714d701c"
}
```

The Sync Job Run can be scheduled or executed immediately after receiving the webhook event by using the `payload.id` attribute as the `integrated_account_id` in the Sync Job Run request.

```bash
curl --location 'https://api.truto.one/sync-job-run' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api_token>' \
--data '{
    "sync_job_id": "7279a917-b447-4629-9e46-a1eeb791ad6b",
    "integrated_account_id": "937fed13-712d-4647-dfe3-7613948b0348",
    "webhook_id": "a5b21886-3b4d-4fd0-9956-ffc0714d701c"
}'
```
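
On the receiving side, the mapping from webhook event to Sync Job Run request can be sketched as a small pure function. This is a minimal Python sketch, assuming a hypothetical function named `sync_job_run_request`. Reusing the event's `webhook_id` for the run is also an assumption; pass your own if the events should go to a different webhook.

```python
from typing import Any, Optional


def sync_job_run_request(
    event: dict[str, Any],
    sync_job_id: str,
    trigger_event: str = "integrated_account:active",
) -> Optional[dict[str, str]]:
    """Build a Sync Job Run request body from a webhook event.

    Returns None for events that should not start a run. Pass
    "integrated_account:post_connect_form_submitted" as `trigger_event`
    when a RapidForm is configured for the integration.
    """
    if event.get("event") != trigger_event:
        return None
    return {
        "sync_job_id": sync_job_id,
        # `payload.id` of the event is the Integrated Account id.
        "integrated_account_id": event["payload"]["id"],
        # Assumption: deliver run events to the same webhook.
        "webhook_id": event["webhook_id"],
    }
```

The returned dictionary is the JSON body for the `sync-job-run` request shown above. Sending it, and verifying that the webhook call is authentic, are left out of this sketch.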

## Placeholder Reference

Placeholders can be used in the Sync Job `resources` to refer to the values of other fields in the Sync Job. The placeholders are enclosed in double curly braces <code v-pre>{{}}</code>.

The placeholders available are,

- <code v-pre>{{args.&lt;arg_name&gt;}}</code> - Refers to the value of the argument passed to the Sync Job Run.
- <code v-pre>{{resources.&lt;resource_name&gt;.&lt;field_name&gt;}}</code> - Refers to the value of a field in a resource fetched in the Sync Job. The `field_name` can be any field in the resource. This is only available when the resource is dependent on another resource using the `depends_on` attribute.
- <code v-pre>{{previous_run_date}}</code> - Refers to the last date on which a Sync Job ran and completed successfully for the Sync Job and an Integrated Account. It's set to `'1970-01-01T00:00:00.000Z'` on the very first Sync Job Run.
- <code v-pre>{{truto_parent_resource.&lt;attribute&gt;}}</code> - If using `depends_on`, then the parent resource's attributes can be accessed using this placeholder. For example, you could use the queries used in the parent resource by using <code v-pre>{{truto_parent_resource.query.&lt;query_name&gt;}}</code>. This can be useful for [recurse](#recursively-fetching-data-for-the-same-resource) use cases.
  - <code v-pre>{{truto_parent_resource.query.&lt;parameter_name&gt;}}</code>
  - <code v-pre>{{truto_parent_resource.method}}</code>
  - <code v-pre>{{truto_parent_resource.resource}}</code>
  - <code v-pre>{{truto_parent_resource.id}}</code>
  - <code v-pre>{{truto_parent_resource.body.&lt;attribute_name&gt;}}</code>
- <code v-pre>{{&lt;integrated_account_context_variable&gt;}}</code> - Refers to the value of a [context variable](/docs/guides/integrated-accounts/adding-editing-context-variables) set in the Integrated Account. For example, if you have a variable with name `foo` in the integrated account, you can refer to it using <code v-pre>{{foo}}</code>.
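
To build intuition for how dotted placeholders map onto a context object, here is a small Python sketch of a resolver. It is purely illustrative: `resolve` is a hypothetical function, the nested-dictionary context shape is an assumption, and modifiers such as `:bool` or the conditional `|` form are deliberately left unresolved.

```python
import re
from typing import Any

# Matches plain dotted paths inside double curly braces; paths that use
# modifiers (":" or "|") do not match and are left as-is.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}|:\s]+)\s*\}\}")


def resolve(template: str, context: dict[str, Any]) -> str:
    """Replace each plain dotted placeholder with its value from `context`."""

    def lookup(match: "re.Match") -> str:
        value: Any = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


context = {
    "args": {"page_id": "abc123"},
    "previous_run_date": "1970-01-01T00:00:00.000Z",
    "resources": {"ticketing": {"tickets": {"id": "42"}}},
}
print(resolve("{{resources.ticketing.tickets.id}}", context))  # 42
print(resolve("{{args.page_id}}", context))                    # abc123
```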

## Sync Job API Reference

Refer to the [Sync Job API Reference](/docs/api-reference/admin/sync-jobs/list) for more details about the requests.
