Scheduling workflows
Once you have designed a workflow, you might want it to be executed periodically. This keeps the result of a workflow automatically updated when its input data sources have been modified between one execution and the next. For this purpose, CARTO offers a UI that helps you create a scheduled query in the different Data Warehouses.
Keep reading this page to learn more about scheduling the execution of a workflow:
Once a workflow has been executed successfully, you can click on the button next to 'Run'.
Check the provider-specific information below to learn more about the required permissions and prerequisites needed to enable this button.
Clicking on that button will open a new modal window where you can select the frequency and other settings for the schedule:
Hours: The workflow will be executed every X hours, on the hour.
Days: The workflow will be executed every day at a specific time.
Weeks: The workflow will be executed weekly, on a specific day, at a specific time.
Months: The workflow will be executed monthly, on a specific day, at a specific time.
Custom: Use a custom expression to define the schedule. See below for more information.
Please note that CARTO Workflows supports BigQuery syntax for BigQuery and CARTO Data Warehouse connections. For Snowflake and PostgreSQL, Cron syntax is supported.
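For example (illustrative values only, not taken from the product UI), a custom schedule that runs every Monday at 09:00 could be written as:

    every mon 09:00    # BigQuery schedule syntax (BigQuery and CARTO Data Warehouse connections)
    0 9 * * 1          # Cron syntax (Snowflake and PostgreSQL connections)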
Once a workflow has been scheduled, there are different actions that can be performed:
If your workflow has been modified, you will find a small notification after a successful execution.
Click on the schedule icon to sync the latest changes to the scheduled version of your workflow.
By clicking on the schedule icon at any time, you can modify the frequency at which the workflow is executed. Select a different setting and click on 'Save changes'.
Similarly, by clicking on the schedule icon at any time, you will find a 'Delete schedule' button that permanently disables the schedule.
In BigQuery, scheduled tasks are stored in the project, which allows the creation of tasks in different regions.
The connection requires the following roles:
BigQuery User
BigQuery Data Editor
The project used in the connection will need to have this API enabled:
BigQuery Data Transfer API: https://console.cloud.google.com/apis/library/bigquerydatatransfer.googleapis.com
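If you prefer the command line, the API can also be enabled with gcloud. This is a sketch; my-billing-project is a placeholder for the project used in the connection:

    gcloud services enable bigquerydatatransfer.googleapis.com --project=my-billing-project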
In the Connection settings, if the selected billing project is different from the Service Account project, the user must run this operation to enable Cross-project Service Account Authorization:
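The exact operation depends on your setup; as an illustration, Cross-project Service Account Authorization typically involves lifting the iam.disableCrossProjectServiceAccountUsage organization policy constraint on the Service Account project (my-sa-project is a placeholder):

    gcloud resource-manager org-policies disable-enforce \
        iam.disableCrossProjectServiceAccountUsage \
        --project=my-sa-project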
For connections created using 'Sign in with Google', a Data Transfer Version Info code needs to be provided.
In order to create a Data Transfer Version Info code, you need to visit this page.
Once there, select your Google account and Allow access:
After allowing, you will be provided with a code. Copy it and paste it into the Data Transfer Version Info setting in the connection's Advanced Options.
The scheduled queries can be monitored from the BigQuery console, in the "Scheduled queries" section. Notice that the name of the scheduled query will be "carto_scheduler.workflows.<id>". The id of the workflow is a UUID that can be found in the URL.
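Besides the console, the transfer configurations can also be listed with the bq command-line tool. A minimal sketch, assuming your data lives in the us location:

    bq ls --transfer_config --transfer_location=us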
In Snowflake, scheduled tasks are created in a schema. In this case, it uses the workflows_temp schema. If the schema is removed, the task will be removed too.
Snowflake makes use of Tasks to allow scheduled execution of SQL code. Tasks can be created via SQL statements using the CREATE TASK syntax.
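As a rough sketch of what such a task looks like (the task name, warehouse and query below are placeholders, not the code that Workflows actually generates):

    CREATE TASK my_scheduled_workflow
      WAREHOUSE = my_warehouse
      SCHEDULE = 'USING CRON 0 9 * * 1 UTC'
    AS
      SELECT 1; -- the SQL generated for the workflow goes here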
It requires access to the workflows_temp schema, as well as the global "EXECUTE TASK" grant to be able to start a task.
The created tasks can be monitored from "Data > Databases", under "MYDATABASE > MYSCHEMA > Tasks". Notice that the name of the scheduled query will be "carto_scheduler.workflows.<id>". The id of the workflow is a UUID that can be found in the URL.
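Tasks can also be inspected with SQL. A minimal sketch, assuming the task lives in MYDATABASE.MYSCHEMA:

    SHOW TASKS IN SCHEMA MYDATABASE.MYSCHEMA;
    -- Recent executions of tasks in this database
    SELECT name, state, scheduled_time, completed_time
    FROM TABLE(MYDATABASE.INFORMATION_SCHEMA.TASK_HISTORY())
    ORDER BY scheduled_time DESC;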
For PostgreSQL, the pg_cron extension provides the means to schedule the execution of a SQL query. Here are some resources to learn how to leverage pg_cron on different managed services based on PostgreSQL:
Amazon RDS. Usage guide available here.
Amazon Aurora PostgreSQL supports pg_cron since version 12.
Google Cloud SQL. Extension setup instructions here.
Azure Database for PostgreSQL supports pg_cron since version 11 and provides some usage examples.
It requires the pg_cron extension to be installed and available, as well as access to the cron schema.
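For reference, this is roughly what scheduling a query with pg_cron looks like (the job name and query are placeholders, not the code generated by Workflows):

    -- Run a query every Monday at 09:00
    SELECT cron.schedule('my-scheduled-workflow', '0 9 * * 1', $$SELECT 1$$);
    -- Remove the job when it is no longer needed
    SELECT cron.unschedule('my-scheduled-workflow');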
The created jobs can be monitored by running SQL queries in the database. Notice that the name of the scheduled query will be "carto_scheduler.workflows.<id>". The id of the workflow is a UUID that can be found in the URL.
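A minimal sketch of such queries, assuming the standard pg_cron metadata tables cron.job and cron.job_run_details:

    -- List the registered jobs
    SELECT jobid, jobname, schedule, command FROM cron.job;
    -- Inspect recent executions
    SELECT jobid, status, return_message, start_time, end_time
    FROM cron.job_run_details
    ORDER BY start_time DESC
    LIMIT 20;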
Redshift doesn't offer an API to handle scheduled queries, which prevents CARTO from being able to schedule the execution of a workflow directly from the UI. However, users can schedule workflows on Redshift by following these simple steps:
Go to the "SQL" tab of the results panel of your workflow.
Click on the "Copy" button at the top right of the SQL tab to copy the SQL code corresponding to your entire workflow.
Redshift allows creating a schedule to run SQL statements using the query editor. There are more details about how to use this feature here.