# Scheduling workflows

Once you have designed a workflow, you might want to execute it periodically. Scheduling keeps the result of a workflow automatically up to date when its input data sources change between one execution and the next. For this purpose, CARTO offers a UI that helps you create a scheduled query in the different data warehouses.

Keep reading this page to learn more about scheduling the execution of a workflow.

### Creating a schedule

Once a workflow has been executed successfully, you can click on the button next to 'Run':

<figure><img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-45cb9fdd13aec6b752abf9a5597f07d98dc460f2%2FScreenshot%202023-07-31%20at%2010.29.39.png?alt=media" alt="" width="175"><figcaption></figcaption></figure>

{% hint style="info" %}
Check the provider-specific information to learn more about the permissions and prerequisites that are needed to enable this button.
{% endhint %}

Clicking on that button will open a new modal window where you can select the frequency and other settings for the schedule:

<figure><img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-65636247663a5b719dbab02f0e171e04b7f6e1d5%2FScreenshot%202023-07-31%20at%2010.52.21.png?alt=media" alt=""><figcaption></figcaption></figure>

* **Hours:** The workflow will be executed every X hours, on the hour.
* **Days:** The workflow will be executed every day at a specific time.
* **Weeks:** The workflow will be executed weekly, on a specific day, at a specific time.
* **Months:** The workflow will be executed monthly, on a specific day, at a specific time.
* **Custom:** Use a custom expression to define the schedule. See below for more information.

{% hint style="info" %}
**Custom expressions**

Please note that CARTO Workflows supports [BigQuery syntax](https://cloud.google.com/appengine/docs/flexible/scheduling-jobs-with-cron-yaml#formatting_the_schedule) for BigQuery and CARTO Data Warehouse connections. For Snowflake, PostgreSQL, and Oracle, [Cron syntax](https://crontab.guru/) is supported. For Databricks connections, [Quartz cron syntax](https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html) is supported.
{% endhint %}
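As an illustration, standard cron expressions like the following could be used for Snowflake or PostgreSQL connections (the schedules themselves are hypothetical examples, in `minute hour day-of-month month day-of-week` order):

```
0 * * * *     every hour, on the hour
30 2 * * 1    every Monday at 02:30
0 0 1 * *     on the first day of each month, at midnight
```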

### Updating, modifying and deleting a schedule

Once a workflow has been scheduled, there are different actions that can be performed:

* If your workflow has been modified, after a successful execution you will find this small notification.

<figure><img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-22413b52b2aeaa8ec7666a804958ad76a2ad9d88%2FScreenshot%202023-07-31%20at%2013.43.51.png?alt=media" alt="" width="134"><figcaption></figcaption></figure>

* Click on the schedule icon to sync the latest changes to the scheduled version of your workflow.

<figure><img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-31c01d075463fa9c0591c918044523f7b459f948%2FScreenshot%202023-07-31%20at%2013.44.39.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

* By clicking on the schedule icon at any time, you will be able to modify the frequency at which the workflow will be executed. Select a different setting and click on '*Save changes*'.
* Likewise, by clicking on the schedule icon at any time, you will find a '*Delete schedule*' button that permanently disables the schedule.

## Permissions and prerequisites <a href="#marked-bigquery" id="marked-bigquery"></a>

### BigQuery <a href="#marked-bigquery" id="marked-bigquery"></a>

In BigQuery, scheduled tasks are stored in the project, which allows creating tasks in different regions.

{% hint style="warning" %}
Users of [Workload Identity Federation](https://docs.carto.com/connections/bigquery#using-workload-identity-federation) **must set up a service account** to run scheduled workflows, as federated identities cannot run them directly. If you use WIF, you must configure service account impersonation and ensure that the service account has the appropriate BigQuery permissions.
{% endhint %}

#### Permissions <a href="#marked-permissions" id="marked-permissions"></a>

The connection requires the following roles:

* BigQuery User
* BigQuery Data Editor

The project used in the connection will need to have this API enabled:

* BigQuery Data Transfer API: <https://console.cloud.google.com/apis/library/bigquerydatatransfer.googleapis.com>

{% hint style="info" %}
In the [Connection settings](https://docs.carto.com/carto-user-manual/connections/creating-a-connection#connection-to-bigquery), if the selected billing project is different from the Service Account project, the user must run this operation to enable Cross-project Service Account Authorization:

<https://cloud.google.com/bigquery/docs/enable-transfer-service#cross-project_service_account_authorization>
{% endhint %}

#### Data Transfer Version Info <a href="#data-transfer-version-info" id="data-transfer-version-info"></a>

For connections created using 'Sign in with Google', a Data Transfer Version Info code needs to be provided.

{% hint style="info" %}
In order to create a Data Transfer Version Info code, you need to [**visit this page**](https://www.gstatic.com/bigquerydatatransfer/oauthz/auth?client_id=433065040935-hav5fqnc9p9cht3rqneus9115ias2kn1.apps.googleusercontent.com\&scope=https://www.googleapis.com/auth/bigquery%20https://www.googleapis.com/auth/drive\&redirect_uri=urn:ietf:wg:oauth:2.0:oob\&response_type=version_info).
{% endhint %}

Once there, select your Google account and Allow access:

<figure><img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-bbe076a20993260ca4e5874ec5968bc6c480cfce%2FScreenshot%202024-03-21%20at%2012.22.07.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

After allowing, you will be provided a code. Copy it and paste it into the Data Transfer Version Info setting in the connection's [Advanced Options](https://docs.carto.com/carto-user-manual/connections/bigquery#advanced-options).

#### Monitoring <a href="#marked-monitoring" id="marked-monitoring"></a>

The scheduled queries can be monitored from the BigQuery console, in the "Scheduled queries" section. Notice that the name of the scheduled query will be `"carto_scheduler.workflows.<id>"`. The id of the workflow is a UUID that can be found in the URL.

### Snowflake <a href="#marked-snowflake" id="marked-snowflake"></a>

In Snowflake, scheduled tasks are created in a schema; in this case, CARTO uses the `workflows_temp` schema. If the schema is removed, the task will be removed too.

Snowflake makes use of [**Tasks**](https://docs.snowflake.com/en/user-guide/tasks-intro) to allow scheduled execution of SQL code.\
Tasks can be created via SQL statements using the [CREATE TASK](https://docs.snowflake.com/en/sql-reference/sql/create-task) syntax.
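As a rough sketch of what such a task looks like (all names and the schedule below are hypothetical; the actual task is created and managed by CARTO):

```sql
-- Hypothetical example: run a query every day at 02:00 UTC
CREATE TASK my_schema.my_task
  WAREHOUSE = my_warehouse
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  INSERT INTO my_schema.results SELECT * FROM my_schema.source;

-- Tasks are created suspended; they must be resumed to start running
ALTER TASK my_schema.my_task RESUME;
```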

#### Grant permissions <a href="#marked-permissions-1" id="marked-permissions-1"></a>

The connection requires access to the `workflows_temp` schema, as well as the global `EXECUTE TASK` privilege to be able to start a task.

```sql
GRANT ALL PRIVILEGES ON SCHEMA <schema> TO ROLE <role>;
GRANT EXECUTE TASK ON ACCOUNT TO ROLE <role>;
```

#### **Revoke permissions**

```sql
REVOKE ALL PRIVILEGES ON SCHEMA <schema> FROM ROLE <role>;
REVOKE EXECUTE TASK ON ACCOUNT FROM ROLE <role>;
```

#### Monitoring <a href="#marked-monitoring-1" id="marked-monitoring-1"></a>

The created tasks can be monitored from "Data > Databases > MYDATABASE > MYSCHEMA > Tasks". Notice that the name of the scheduled task will be `"carto_scheduler.workflows.<id>"`. The id of the workflow is a UUID that can be found in the URL.
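The same information can also be inspected with SQL (assuming the connection's database is `my_database`; the schema name matches the one used by the connection):

```sql
-- List the tasks created in the connection's schema
SHOW TASKS IN SCHEMA my_database.workflows_temp;

-- Review recent task executions
SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
ORDER BY SCHEDULED_TIME DESC;
```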

### Databricks

When scheduling a CARTO Workflow on a Databricks connection, a [native Workflow will be created in Databricks](https://docs.databricks.com/en/jobs/index.html) with a specific run schedule. In order to do so, your user will need the "Can Create" and "Can Manage" permissions for Jobs.

#### **"Can Create" Permissions**

This permission allows a user to **create new jobs**. Specifically, users with this permission can:

* Define new jobs/workflows with tasks and configurations.
* Attach tasks to clusters they have access to.
* Set parameters like schedules, libraries, and task dependencies during job creation.

#### **"Can Manage" Permissions**

This permission provides **full control** over a job and is the highest level of access. Users with **"Can Manage"** permissions can:

* Create, edit, delete, and clone jobs.
* Modify schedules, task configurations, and parameters.
* Manage job runs (e.g., start, stop, or restart runs).
* Assign or modify permissions for the job, delegating "Can Edit" or "Can View" permissions to others.

### Oracle <a href="#marked-oracle" id="marked-oracle"></a>

For Oracle connections, scheduled tasks are created using the [DBMS\_SCHEDULER](https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_SCHEDULER.html) package. When scheduling a workflow, a job is created in Oracle's scheduler that periodically executes the workflow's SQL.
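Conceptually, the job created resembles the following `DBMS_SCHEDULER` call (the job name, action, and schedule below are hypothetical; the actual job is created and managed by CARTO):

```sql
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'MY_WORKFLOW_JOB',          -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN my_workflow_proc; END;',
    repeat_interval => 'FREQ=DAILY; BYHOUR=2',     -- every day at 02:00
    enabled         => TRUE
  );
END;
/
```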

#### Grant permissions <a href="#marked-permissions-oracle" id="marked-permissions-oracle"></a>

The `DBMS_SCHEDULER` package must be available in the database, and the user configured in the connection requires:

* The `CREATE JOB` privilege to create and manage scheduled jobs.
* Access to the `user_scheduler_jobs` dictionary view to list and monitor existing schedules.

```sql
GRANT CREATE JOB TO <user>;
```

{% hint style="info" %}
The `user_scheduler_jobs` view is accessible by default to any user that owns scheduler jobs. In locked-down environments where access to dictionary views has been restricted, you may need to grant `SELECT` on it explicitly.
{% endhint %}

#### **Revoke permissions**

```sql
REVOKE CREATE JOB FROM <user>;
```

#### Monitoring <a href="#marked-monitoring-oracle" id="marked-monitoring-oracle"></a>

The created jobs can be monitored by querying Oracle's scheduler views:

```sql
-- Existing jobs
SELECT * FROM user_scheduler_jobs;

-- Job execution history
SELECT * FROM user_scheduler_job_run_details;
```

Notice that the name of the scheduled query will be `"carto_scheduler.workflows.<id>"`. The id of the workflow is a UUID that can be found in the URL.

### PostgreSQL <a href="#marked-postgres" id="marked-postgres"></a>

For PostgreSQL, the [`pg_cron`](https://github.com/citusdata/pg_cron) extension provides the means to schedule the execution of a SQL query. Here are some resources to learn how to leverage `pg_cron` on different managed services based on PostgreSQL:

* Amazon RDS. Usage guide available [here](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_pg_cron.html).
* Amazon Aurora PostgreSQL supports `pg_cron` since [version 12](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraPostgreSQLReleaseNotes/AuroraPostgreSQL.Extensions.html#AuroraPostgreSQL.Extensions.12).
* Google Cloud SQL. Extension setup instructions [here](https://cloud.google.com/sql/docs/postgres/extensions#pg_cron).
* Azure Databases for PostgreSQL supports `pg_cron` since [version 11](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-extensions#postgres-11-extensions) and provides [some usage examples](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-extensions#pg_cron).
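For reference, this is roughly how `pg_cron` schedules and unschedules a query (the job name and the scheduled SQL below are illustrative):

```sql
-- Schedule a query to run every night at 02:00
SELECT cron.schedule('nightly-workflow', '0 2 * * *',
                     $$SELECT my_workflow_function()$$);

-- Remove the job when it is no longer needed
SELECT cron.unschedule('nightly-workflow');
```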

#### Grant permissions <a href="#marked-permissions-2" id="marked-permissions-2"></a>

It requires the `pg_cron` extension to be installed and available, as well as access to the `cron` schema.

```sql
CREATE EXTENSION pg_cron;
GRANT USAGE ON SCHEMA cron TO postgres;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA cron TO postgres;
```

#### **Revoke permissions**

```sql
REVOKE USAGE ON SCHEMA cron FROM postgres;
REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cron FROM postgres;
DROP EXTENSION pg_cron;
```

#### Monitoring <a href="#marked-monitoring-2" id="marked-monitoring-2"></a>

The created jobs can be monitored by running SQL queries in the database:

```sql
-- Existing jobs
SELECT * FROM cron.job;

-- Executed jobs
SELECT * FROM cron.job_run_details;
```

Notice that the name of the scheduled query will be `"carto_scheduler.workflows.<id>"`. The id of the workflow is a UUID that can be found in the URL.

## Schedule a Workflow in Redshift

Redshift doesn't offer an API to manage scheduled queries, which prevents CARTO from scheduling the execution of a workflow directly from the UI. However, you can schedule workflows on Redshift by following these simple steps:

1. Go to the "SQL" tab of the results panel of your workflow.
2. Click on the "Copy" button at the top right of the SQL tab to copy the SQL code corresponding to your entire workflow.
3. Redshift allows creating a schedule to run SQL statements using the query editor.\
   There are more details about how to use this feature [**here**](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-schedule-query.html).
