Temporary data in Workflows

To improve performance and to allow inspecting the results of intermediate steps, Workflows makes use of temporary data objects that are stored, by default, under a so-called workflows_temp schema/dataset in your data warehouse.

Note that at the time of creating the connection to your data warehouse, you can also provide a custom location for storing the temporary data generated by Workflows. Check the Connections documentation for each provider for more details.

This system of intermediate tables functions as a cache: when you re-run a workflow, steps that have not been modified are not executed again; only the steps whose settings or inputs have changed are recomputed.

Specific permissions required by Workflows in your data warehouse

For CARTO to be able to execute workflows in your data warehouse and temporarily store the data generated in intermediate steps of your analysis, a specific set of permissions is required on the connection selected when creating your workflow.

Since CARTO Workflows can be used with different providers (Google BigQuery, Snowflake, Amazon Redshift, and PostgreSQL), the exact terminology differs for each data warehouse. For a BigQuery connection, the recommended setup is:

The service account or user account used for creating the connection must have the following required roles:

  • BigQuery Data Editor

  • BigQuery User
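
These correspond to the predefined IAM roles roles/bigquery.dataEditor and roles/bigquery.user. As an illustration only (not a CARTO-provided script), here is a minimal Python sketch that grants both roles to the connection's service account using the Google Cloud Resource Manager client; the project ID and service account email are placeholders:

```python
from google.cloud import resourcemanager_v3
from google.iam.v1 import iam_policy_pb2, policy_pb2

# Placeholders: replace with your project and the connection's service account.
PROJECT_ID = "my-project"
SERVICE_ACCOUNT = "carto-connection@my-project.iam.gserviceaccount.com"

client = resourcemanager_v3.ProjectsClient()
resource = f"projects/{PROJECT_ID}"

# Fetch the project's current IAM policy.
policy = client.get_iam_policy(
    request=iam_policy_pb2.GetIamPolicyRequest(resource=resource)
)

# Append a binding for each role required by Workflows.
for role in ("roles/bigquery.dataEditor", "roles/bigquery.user"):
    policy.bindings.append(
        policy_pb2.Binding(role=role, members=[f"serviceAccount:{SERVICE_ACCOUNT}"])
    )

# Write the updated policy back to the project.
client.set_iam_policy(
    request=iam_policy_pb2.SetIamPolicyRequest(resource=resource, policy=policy)
)
```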


If the workflows_temp dataset does not exist, it will be automatically created in the region of the tenant (US, EU, AS, AU). To use a custom region for workflows_temp, manually create the dataset in the desired region and set its name (if different from workflows_temp) in the BigQuery connection > Advanced options.
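
For example, a minimal sketch of that manual step using the google-cloud-bigquery Python client (the project ID and region below are placeholders):

```python
from google.cloud import bigquery

# Placeholders: replace with your project and desired region.
client = bigquery.Client(project="my-project")
dataset = bigquery.Dataset("my-project.workflows_temp")
dataset.location = "EU"  # any supported BigQuery location

# Create the dataset only if it does not exist yet.
client.create_dataset(dataset, exists_ok=True)
```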

To learn more about data warehouse connection permissions, please check here.

How to manage the temporary data generated by Workflows

When your workflow runs, it stores the output results in intermediate tables. Depending on the data warehouse platform, a different strategy is applied to clean up the intermediate tables generated by CARTO Workflows. The default configuration removes the intermediate tables after 30 days.

Below, you can find code and instructions for automating the process of cleaning up the intermediate tables in each data warehouse:

When your workflow runs in BigQuery or CARTO Data Warehouse, these tables are temporary and will be automatically removed after 30 days.

You can find more information about temporary tables in the BigQuery documentation.
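
If you prefer to clean up (or inspect) intermediate tables yourself before they expire, a minimal sketch using the google-cloud-bigquery Python client could look like the following; the project ID, dataset name, and 30-day threshold are assumptions to adapt to your setup:

```python
from datetime import datetime, timedelta, timezone
from google.cloud import bigquery

# Placeholders: replace with your project and temporary dataset name.
client = bigquery.Client(project="my-project")
dataset_id = "my-project.workflows_temp"
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

# Delete intermediate tables older than the cutoff.
for item in client.list_tables(dataset_id):
    table = client.get_table(item.reference)  # fetch full metadata (creation time)
    if table.created < cutoff:
        client.delete_table(table.reference, not_found_ok=True)
        print(f"Deleted {table.full_table_id}")
```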

Cache options

Workflows uses intermediate tables as a cache system to avoid re-executing unchanged steps. You can control caching behavior independently for each execution method: Editor UI, API, and MCP Tools.


Cache settings are configured separately for each execution method, allowing you to enable caching for API calls while disabling it for Editor runs, for example.

Editor UI

When running workflows from the Editor, you can choose between two execution modes using the dropdown next to the 'Run' button:

  • Run: Recompute the entire workflow from scratch.

  • Run (reuse): Skip unchanged nodes using cached results from previous executions.

API

When enabling API access for a workflow, you can configure cache behavior in the API Endpoint dialog under the "Execution settings" section:

  • Reuse output for repeated calls (toggle): When enabled, API calls with the same parameters will return the existing output table instead of recomputing the workflow.
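
As an illustration of that behavior only, a hypothetical Python sketch is shown below; the real endpoint URL, HTTP method, parameter names, and authentication details are the ones provided in the API Endpoint dialog, and the values here are placeholders:

```python
import requests

# Hypothetical placeholders: copy the real endpoint URL and token
# from the API Endpoint dialog of your workflow.
WORKFLOW_ENDPOINT = "https://<your-workflows-api-endpoint>"
API_TOKEN = "<your-api-access-token>"
params = {"city": "Madrid"}  # example workflow parameters

# With "Reuse output for repeated calls" enabled, issuing the same request
# twice with identical parameters returns the existing output table instead
# of re-running the whole workflow.
response = requests.get(
    WORKFLOW_ENDPOINT,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params=params,
    timeout=60,
)
print(response.json())
```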

MCP Tools

When enabling a workflow as an MCP Tool, you can configure cache behavior in the MCP Tool dialog under the "Execution settings" section:

  • Reuse output for repeated calls (toggle): When enabled, tool calls with the same parameters will return the existing output table instead of recomputing the workflow.

Provider-specific behavior

Cache behavior varies depending on your data warehouse:

  • CARTO Data Warehouse, BigQuery, Snowflake, Databricks, and Oracle: Full caching support. All execution methods (Editor UI, API, MCP Tools) support cache configuration.

  • Redshift and PostgreSQL:

    • Editor UI: Cache options are not available. The Run button does not show the split dropdown, and all nodes are always computed completely.

    • API: The "Reuse output for repeated calls" setting is available and works as expected for API calls with identical parameters.

    • MCP Tools: Not available for these providers.
