Executing workflows via API

A workflow can be executed via an API call. This makes it possible to embed workflows in larger processes, run analytics pipelines from external applications, and build other integrations.

For Databricks connections, executing workflows via API is not yet available. Support is in progress; please get in touch with [email protected] if you have feedback about this missing functionality.

Introduction

CARTO Workflows translates the analytical pipelines designed in the canvas to cloud-native SQL queries that are executed directly on the data warehouse. This generated SQL code takes two forms:

The code displayed in the SQL tab of the result panel contains all the control structures and creation of intermediate tables for each node in the workflow.
A stored procedure that is created in workflows_temp and executed when a workflow execution is triggered via an API call. This stored procedure may take input parameters, which are created from the variables marked as 'Parameters' and used later in the code.

Because workflows generate SQL code that can be executed via an API call, complex analytical processes can be integrated with external systems. Workflows can run in response to external triggers, which makes it easier to automate and scale data analysis projects.

Enabling API access for a workflow

To enable API access for an existing workflow, click on the three dots in the upper-right corner and find 'API'. Click on 'Enable API access' to open a dialog that shows the API call for the workflow.

The endpoint displayed in this dialog depends on the configuration of the API Output component.

When using Sync, the /query endpoint will be used. The response will contain the data.
When using Async (default), the /job endpoint will be used. The response will contain job metadata that can be used to poll the status and find the table that stores the result.

This is an example of an API call that would trigger the execution of a workflow:

Synchronous GET example:

https://gcp-us-east1.api.carto.com/v3/sql/your_connection/query?q=CALL `workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279`(@number_of_clusters,@buffer_radius,@store_type)&queryParameters={"number_of_clusters":2,"buffer_radius":500,"store_type":"Supermarket"}&access_token=eyJhbGc(...)GUH-Lw
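The URL above is shown unencoded for readability. As a minimal sketch, assuming the client does not encode the query string on its own, the same synchronous request URL could be assembled in Python with the standard library. The helper name build_sync_url is hypothetical, and the token is the same truncated placeholder used in the example.

```python
import json
from urllib.parse import quote, urlencode


def build_sync_url(api_base, connection, procedure, params, access_token):
    """Build a percent-encoded GET URL for the /query endpoint (hypothetical helper)."""
    # One @placeholder per parameter, in the order the dict lists them.
    # The order is assumed to match the stored procedure's signature.
    placeholders = ",".join(f"@{name}" for name in params)
    query = f"CALL `{procedure}`({placeholders})"
    query_string = urlencode(
        {
            "q": query,
            "queryParameters": json.dumps(params),
            "access_token": access_token,
        },
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{api_base}/v3/sql/{connection}/query?{query_string}"


url = build_sync_url(
    "https://gcp-us-east1.api.carto.com",
    "your_connection",
    "workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279",
    {"number_of_clusters": 2, "buffer_radius": 500, "store_type": "Supermarket"},
    "eyJhbGc(...)GUH-Lw",  # placeholder token, replace with your own
)
print(url)
```

Backticks, commas, the @ signs and the JSON in queryParameters all end up percent-encoded, so the URL can be passed to any HTTP client as is.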

Asynchronous POST example:

curl --location 'https://gcp-us-east1.api.carto.com/v3/sql/your_connection/job' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer eyJhbGc(...)GUH-Lw' \
    --data '{
        "query": "CALL `workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279`(@number_of_clusters,@buffer_radius,@store_type)",
        "queryParameters": {"number_of_clusters":2,"buffer_radius":500,"store_type":"Supermarket"}
    }'

There are a few things that are worth taking into consideration:

It is a GET or POST request to CARTO's SQL API, which can be used to get the result of a query directly in the response, or to create asynchronous jobs that are executed in the data warehouse.
The query is a CALL statement for a stored procedure. Workflows will generate a stored procedure in your workflows_temp schema/dataset. The generated SQL is the same as the one you obtain by using the 'Export' functionality.
The queryParameters object in the payload contains values for some variables that are passed as inputs to the stored procedure. These variables are the ones marked as 'Parameter' in the 'Variables' menu.
The provided API call is authorized with an API Access Token that has the specific grants necessary to run that query through the same connection that was used to create the workflow.

Running a workflow via Async API call

The provided example uses curl to illustrate the POST request that triggers the execution of the workflow. The call can be adapted to other HTTP clients and languages, such as the requests library in Python.

The response to that call will look like the following:

{
  "externalId": "job_h8UWzpdzX0s2XAAXQd3rdWCdPUT9",
  "accountId": "ac_jfakef5m",
  "userId": "auth0|61164b7xxx77c006a259f53",
  "connectionId": "5a32a0ea-555a-48dd-aeb1-8768aae8ef1c",
  "metadata": {},
  "createdAt": "2023-10-04T16:10:35.732Z",
  "query": "CALL `workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279`(@number_of_clusters,@buffer_radius,@store_type)",
  "jobMetadata": {
    "location": "US",
    "workflowOutputTableName": "workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279_out_33afd785675f081d"
  },
  "token": "eyJhbGc(...)GUH-Lw"
}
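As an illustration, the asynchronous POST request and the two response fields that matter most afterwards (externalId and workflowOutputTableName) could be handled in Python roughly as follows. This is a sketch that uses only the standard library rather than requests. The function names build_job_request and create_job are hypothetical, and the API base URL, connection name and token are the placeholders from the examples above.

```python
import json
import urllib.request

# Assumption: replace with the API URL shown in your own 'API' dialog.
API_BASE = "https://gcp-us-east1.api.carto.com"


def build_job_request(connection, query, query_parameters, token):
    """Prepare the POST request for the /job endpoint without sending it."""
    body = json.dumps({"query": query, "queryParameters": query_parameters})
    return urllib.request.Request(
        f"{API_BASE}/v3/sql/{connection}/job",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def create_job(request):
    """Send the request and return (externalId, output table name)."""
    with urllib.request.urlopen(request) as response:
        job = json.load(response)
    return job["externalId"], job["jobMetadata"]["workflowOutputTableName"]


request = build_job_request(
    "your_connection",
    "CALL `workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279`"
    "(@number_of_clusters,@buffer_radius,@store_type)",
    {"number_of_clusters": 2, "buffer_radius": 500, "store_type": "Supermarket"},
    "eyJhbGc(...)GUH-Lw",  # placeholder token, replace with your own
)
# Calling create_job(request) would perform the actual network call.
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before anything is executed in the data warehouse.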

Checking status of a workflow execution

You can use that externalId to make a request to check the status of the execution, like:

curl --location 'https://gcp-us-east1.api.carto.com/v3/sql/your_connection/job/job_h8UWzpdzX0s2XAAXQd3rdWCdPUT9' \
--header 'Authorization: Bearer eyJhbGc(...)GUH-Lw'

The above is just an example that uses a specific API URL (gcp-us-east1.api.carto.com) and connection name. Make sure that you use the same API endpoint that you obtained from the UI and used to create the job in the first place.

The response to that API call is a JSON object that contains metadata about the query execution, including a status object with information about the execution status.

The following is an example of a failed execution due to incorrect parameter types:

    "status": {
      "errorResult": {
        "reason": "invalidQuery",
        "location": "query",
        "message": "Query error: Cannot coerce expression @number_of_clusters to type INT64 at [1:158]"
      },
      "state": "DONE"
    }

While this would be an example of a successful execution:

    "status": {
      "state": "DONE"
    }
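The two status examples above suggest a simple rule for client code: the job is finished once state is "DONE", and it failed if an errorResult is present. A minimal polling sketch in Python under that reading follows. It assumes that any state other than "DONE" means the job is still in progress, which is not spelled out here, and the names job_outcome, make_fetcher and wait_for_job are hypothetical.

```python
import json
import time
import urllib.request


def job_outcome(job):
    """Classify a job response as 'running', 'failed' or 'succeeded'."""
    status = job.get("status", {})
    if status.get("state") != "DONE":
        return "running"  # assumption: every non-DONE state means unfinished
    return "failed" if "errorResult" in status else "succeeded"


def make_fetcher(status_url, token):
    """Return a function that requests the job status URL and parses the JSON."""
    def fetch_job():
        request = urllib.request.Request(
            status_url, headers={"Authorization": f"Bearer {token}"}
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch_job


def wait_for_job(fetch_job, interval_seconds=5.0, max_attempts=60):
    """Call fetch_job repeatedly until the job finishes or attempts run out."""
    for _ in range(max_attempts):
        job = fetch_job()
        outcome = job_outcome(job)
        if outcome != "running":
            return outcome, job
        time.sleep(interval_seconds)
    raise TimeoutError("The job did not finish within the allowed attempts")
```

A call such as wait_for_job(make_fetcher(status_url, token)) would then block until the execution ends, where status_url is the job URL used in the curl example above. On failure, the message inside status.errorResult explains the cause.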

Get more information about using the job endpoint in the CARTO API documentation.

Output of a Workflow executed via API

To define which node of your workflow will be used as the output of the API call, you need to use the Output component. This component ensures that the content of the node connected to it is stored in a temporary table, whose location is returned in the "workflowOutputTableName" field in the response of the API call.

The location of the temporary table is returned when the job is created, but the content of the table won't be complete until the execution is complete.
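Once the execution has finished, the output table can be read like any other table. As a small sketch, a client could derive a SELECT statement from the job response and send it through the /query endpoint. The helper name output_table_query is hypothetical, and the backtick quoting follows the BigQuery-style example above; other data warehouses may need different identifier quoting.

```python
def output_table_query(job, limit=100):
    """Build a SELECT statement for the output table named in a job response."""
    table = job["jobMetadata"]["workflowOutputTableName"]
    # int() guards against a non-numeric limit ending up in the SQL text.
    return f"SELECT * FROM `{table}` LIMIT {int(limit)}"


# Job response trimmed to the relevant field from the example above.
job = {
    "jobMetadata": {
        "location": "US",
        "workflowOutputTableName": (
            "workflows-api-demo.workflows_temp."
            "wfproc_f2f8df5df4ddf279_out_33afd785675f081d"
        ),
    }
}
print(output_table_query(job, limit=10))
```

Running this query before the job reports completion may return partial or no data, so it is best issued only after the status check succeeds.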

API executions of the same workflow with different queryParameters will produce different output tables. On the other hand, API executions with the same queryParameters will yield the same result, as the same output table will be reused.

This behavior can be controlled with the cache settings in your Workflows UI. Turning the cache off will always force re-execution of the workflow, even for consecutive API calls with the same queryParameters.

Updating your workflow

Whenever you want to propagate changes in your workflow to the corresponding stored procedure that is executed with an API call, you need to update it.

To do so, click on the chip in the top header that says 'API enabled', which will open the API endpoint modal. Wait a couple of seconds while CARTO checks for changes in the workflow.

Click on 'Update' to sync the workflow that is executed via API with the current state of the workflow in the UI.

Disabling API access

If you need to prevent API access for a workflow, click on 'Disable API access' in the API endpoint modal and confirm.
