# Temporary data in Workflows

To improve performance and let you inspect the results of intermediate steps, Workflows stores temporary data objects, by default, in a so-called **`carto_temp`** schema/dataset in your data warehouse.

When creating the connection to your data warehouse, you can also provide a custom location for storing the temporary data generated by Workflows. Check the [Connections](https://docs.carto.com/carto-user-manual/connections) documentation for each provider for more details.

This system of intermediate tables acts as a cache: when you re-run a workflow, steps that have not been modified (and whose inputs have not changed) are not re-executed; only the steps that changed are run again.

### Specific permissions required by Workflows in your data warehouse

For CARTO to execute workflows in your data warehouse and temporarily store the data generated in the intermediate steps of your analysis, a specific set of permissions is required on the connection selected when creating your workflow.

Since CARTO Workflows supports different providers (Google BigQuery, Snowflake, Amazon Redshift, Databricks, Oracle, and PostgreSQL), each with its own terminology, the recommended setup for each data warehouse is:

{% tabs %}
{% tab title="Google BigQuery" %}
The [service account](https://support.google.com/a/answer/7378726?hl=en) or user account used to create the connection must have the following roles:

* BigQuery Data Editor
* BigQuery User

{% hint style="info" %}
If the `workflows_temp` dataset does not exist, it will be created automatically in the region of the tenant (US, EU, AS, AU). To use a custom region, manually create the dataset in the desired region and set its name (if different from `workflows_temp`) in the BigQuery connection > Advanced options.
{% endhint %}
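
As an illustration, the dataset could be created manually in a specific region before connecting. The project name and location below are placeholders, not values from your account:

```sql
-- Hypothetical project name; set the location to your desired region
CREATE SCHEMA `my-project.workflows_temp`
OPTIONS (location = 'EU');
```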
{% endtab %}

{% tab title="Snowflake" %}

```sql
-- If the workflows_temp schema already exists:
-- Grant permissions to manage the schema
GRANT ALL PRIVILEGES ON SCHEMA DATABASE_NAME.WORKFLOWS_TEMP TO ROLE ROLE_NAME;

-- If the workflows_temp schema does not exist yet:
-- Grant permissions to create the schema
GRANT USAGE ON DATABASE DATABASE_NAME TO ROLE ROLE_NAME;
GRANT CREATE SCHEMA ON DATABASE DATABASE_NAME TO ROLE ROLE_NAME;
```

{% endtab %}

{% tab title="Databricks" %}
Your Databricks user needs the `USE CATALOG` and `CREATE SCHEMA` permissions:

```sql
GRANT USE CATALOG ON CATALOG my_catalog TO `user@example.com`;
GRANT CREATE SCHEMA ON CATALOG my_catalog TO `user@example.com`;
```

{% endtab %}

{% tab title="Amazon Redshift" %}

```sql
-- If the workflows_temp schema already exists:
-- Grant permissions to manage the schema
GRANT ALL ON SCHEMA workflows_temp TO user_name;

-- If the workflows_temp schema does not exist yet:
-- Grant permissions to create the schema
GRANT CREATE ON DATABASE database_name TO user_name;
```

{% endtab %}

{% tab title="PostgreSQL" %}

```sql
-- If the workflows_temp schema already exists:
-- Grant permissions to manage the schema
GRANT ALL ON SCHEMA workflows_temp TO user_name;

-- If the workflows_temp schema does not exist yet:
-- Grant permissions to create the schema
GRANT CREATE ON DATABASE database_name TO user_name;
```

{% endtab %}
{% endtabs %}

For more details about data warehouse connection permissions, check the [required permissions](https://docs.carto.com/carto-user-manual/connections/required-permissions) documentation.

### How to manage the temporary data generated by Workflows

When your workflow runs, it stores the output results in intermediate tables. Depending on the data warehouse platform, a different strategy is applied to clean up the intermediate tables generated by CARTO Workflows. The default configuration **removes the intermediate tables after 30 days**.

Below, you can find code and instructions for automating the process of cleaning up the intermediate tables in each data warehouse:

{% tabs %}
{% tab title="Google BigQuery" %}
When your workflow runs in BigQuery or CARTO Data Warehouse, these tables are temporary and will be automatically removed after 30 days.

You can find more information about temporary tables in the [BigQuery documentation](https://cloud.google.com/bigquery/docs/multi-statement-queries#temporary_tables).
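
If you want to verify the expiration applied to the dataset, one option (assuming the default `workflows_temp` name and a dataset in the US multi-region; adjust both to your setup) is to query `INFORMATION_SCHEMA`:

```sql
-- Shows dataset-level options such as default_table_expiration_days
SELECT option_name, option_value
FROM `region-us`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
WHERE schema_name = 'workflows_temp';
```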
{% endtab %}

{% tab title="Snowflake" %}
When your workflow runs in Snowflake, a [Snowflake task](https://docs.snowflake.com/en/user-guide/tasks-intro) is automatically created in the `CARTO_TEMP` schema to remove intermediate tables after 30 days. The task runs every day at 00:00 UTC.

These are the required permissions to create the task:

```sql
-- Grant permissions to manage the schema
GRANT ALL PRIVILEGES ON SCHEMA DATABASE_NAME.CARTO_TEMP TO ROLE ROLE_NAME;

-- Grant permissions to execute the task
GRANT EXECUTE TASK ON ACCOUNT TO ROLE ROLE_NAME;
```

Note that the task affects only the tables in that schema. If the name of the schema is changed in the connection, for example to `CARTO_TEMP_CUSTOM`, a new task will be created for the new schema.
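
To check that the task was created, you can list the tasks in the schema (assuming the default `CARTO_TEMP` name; `DATABASE_NAME` is a placeholder for your database):

```sql
SHOW TASKS IN SCHEMA DATABASE_NAME.CARTO_TEMP;
```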

{% hint style="info" %}
The task is created once per schema, and will be removed when the schema is removed. It can also be removed manually:

```sql
DROP TASK IF EXISTS DATABASE_NAME.CARTO_TEMP."carto_scheduler.workflow_clear_cache.workflows_temp";
```

{% endhint %}
{% endtab %}

{% tab title="Databricks" %}
In Databricks, a user with sufficient privileges needs to set up a job or notebook that runs periodically to clean up the contents of the `carto_temp` schema created in the selected catalog.
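
One possible sketch of such a notebook is shown below. The catalog and schema names (`my_catalog`, `carto_temp`) are placeholders, and the `spark` calls are shown as comments because they only exist inside a Databricks runtime; the age-filtering logic is a plain function so it can be tested anywhere:

```python
from datetime import datetime, timedelta, timezone

def tables_to_drop(tables, now, max_age_days=30):
    """Given (name, created_at) pairs, return the names of tables
    created more than max_age_days before `now`."""
    cutoff = now - timedelta(days=max_age_days)
    return [name for name, created_at in tables if created_at < cutoff]

# Inside a Databricks notebook, you could collect creation times from
# information_schema and drop the stale tables, for example:
#
#   rows = spark.sql("""
#       SELECT table_name, created
#       FROM my_catalog.information_schema.tables
#       WHERE table_schema = 'carto_temp'
#   """).collect()
#   stale = tables_to_drop([(r.table_name, r.created) for r in rows],
#                          datetime.now(timezone.utc))
#   for name in stale:
#       spark.sql(f"DROP TABLE IF EXISTS my_catalog.carto_temp.`{name}`")
```

The notebook can then be scheduled with a Databricks job so the cleanup runs once per day.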
{% endtab %}

{% tab title="Amazon Redshift" %}
When your workflow runs in Redshift, you should create a [scheduled query in Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-schedule-query.html) to remove intermediate tables older than 30 days.

Scheduled queries are only available for provisioned Redshift clusters. If you are using [Redshift Serverless](https://aws.amazon.com/redshift/redshift-serverless/), you'll have to run the cleanup manually (for example, by calling the procedure defined below) or find other external means of executing it periodically.

The following procedure should be created in your Redshift cluster to delete tables older than 30 days (replace all occurrences of the `$database$` placeholder with the name of your database):

```sql
CREATE OR REPLACE PROCEDURE $database$.carto_temp._clear_cache_fn() as
$$
DECLARE
  statement RECORD;
  query VARCHAR(MAX);
BEGIN
  query := 'SELECT
      \'DROP TABLE $database$.carto_temp.\' || relname || \';\' as statement
    FROM
      pg_class_info
    LEFT JOIN
      pg_namespace
    ON
      pg_class_info.relnamespace = pg_namespace.oid
    WHERE
      reltype != 0
      AND TRIM(nspname) = \'carto_temp\'
      AND datediff(day, relcreationtime, current_date) > 30
  ';
  FOR statement IN EXECUTE query
  LOOP
     EXECUTE statement.statement;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```

After that, you should define a [Redshift scheduled query](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-schedule-query.html) with the following `CALL` to execute the previous procedure once per day:

```sql
CALL $database$.carto_temp._clear_cache_fn();
```

{% endtab %}

{% tab title="PostgreSQL" %}
When your workflow runs in PostgreSQL, a [pg\_cron task](https://github.com/citusdata/pg_cron) is automatically created in the `carto_temp` schema to remove intermediate tables after 30 days. The task runs every day at 00:00 UTC.

Here are some resources to learn how to use pg\_cron on different managed services based on PostgreSQL:

* Amazon RDS: usage guide available [here](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_pg_cron.html).
* Amazon Aurora PostgreSQL: supports pg\_cron since [version 12](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraPostgreSQLReleaseNotes/AuroraPostgreSQL.Extensions.html#AuroraPostgreSQL.Extensions.12).
* Google Cloud SQL: setup instructions [here](https://cloud.google.com/sql/docs/postgres/extensions#pg_cron).
* Azure Database for PostgreSQL: supports pg\_cron since [version 11](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-extensions#postgres-11-extensions) and provides [some usage examples](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-extensions#pg_cron).

This is the command to install the pg\_cron extension:

```sql
CREATE EXTENSION pg_cron;
```

These are the required permissions to create the pg\_cron task:

```sql
-- Grant permissions to manage the schema
GRANT ALL ON SCHEMA carto_temp TO user_name;

-- Grant permissions to manage the task
GRANT USAGE ON SCHEMA cron TO user_name;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA cron TO user_name;
```
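
Once the permissions are in place, you can check that the cleanup task was scheduled by querying pg\_cron's job table:

```sql
-- List scheduled pg_cron jobs to verify the cleanup task exists
SELECT jobid, schedule, command FROM cron.job;
```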

Note that the task affects only the tables in that schema. If the name of the schema is changed in the connection, for example to `carto_temp_custom`, a new task will be created for the new schema.

{% hint style="info" %}
The task is created once per schema, and will be removed when the pg\_cron extension is removed. It can also be removed manually:

```sql
SELECT cron.unschedule('carto_scheduler.workflow_clear_cache.carto_temp');
```

{% endhint %}
{% endtab %}
{% endtabs %}

### Cache options

Workflows uses intermediate tables as a cache system to avoid re-executing unchanged steps. You can control caching behavior independently for each execution method: Editor UI, API, and MCP Tools.

{% hint style="info" %}
Cache settings are configured separately for each execution method, allowing you to enable caching for API calls while disabling it for Editor runs, for example.
{% endhint %}

#### Editor UI

When running workflows from the Editor, you can choose between two execution modes using the dropdown next to the 'Run' button:

* **Run**: Recompute the entire workflow from scratch.
* **Run (reuse)**: Skip unchanged nodes using cached results from previous executions.

#### API

When enabling API access for a workflow, you can configure cache behavior in the API Endpoint dialog under the "Execution settings" section:

* **Reuse output for repeated calls** (toggle): When enabled, API calls with the same parameters will return the existing output table instead of recomputing the workflow.

#### MCP Tools

When enabling a workflow as an MCP Tool, you can configure cache behavior in the MCP Tool dialog under the "Execution settings" section:

* **Reuse output for repeated calls** (toggle): When enabled, tool calls with the same parameters will return the existing output table instead of recomputing the workflow.

#### Provider-specific behavior

Cache behavior varies depending on your data warehouse:

* **CARTO Data Warehouse**, **BigQuery**, **Snowflake**, **Databricks** and **Oracle**: Full caching support. All execution methods (Editor UI, API, MCP Tools) support cache configuration.
* **Redshift** and **PostgreSQL**:
  * **Editor UI**: Cache options are not available. The Run button does not show the split dropdown, and all nodes are always recomputed.
  * **API**: The "Reuse output for repeated calls" setting is available and works as expected for API calls with identical parameters.
  * **MCP Tools**: Not available for these providers.
