Temporary data in Workflows

To improve performance and to allow inspecting the results of intermediate steps, Workflows makes use of temporary data objects that are stored, by default, under a so-called workflows_temp schema/dataset in your data warehouse.

Note that at the time of creating the connection to your data warehouse, you can also provide a custom location for storing the temporary data generated by Workflows.

This system of intermediate tables functions as a cache: when you re-run a workflow, steps that have not been modified are not executed again; only those whose settings or inputs have changed are recomputed.

Specific permissions required by Workflows in your data warehouse

For CARTO to execute workflows in your data warehouse and temporarily store the data generated in the intermediate steps of your analysis, a specific set of permissions is required on the connection selected when creating your workflow.

Since CARTO Workflows can be used with different providers (Google BigQuery, Snowflake, Databricks, Amazon Redshift, and PostgreSQL), and each data warehouse uses its own terminology, the recommended setup for each provider is described below.

In Google BigQuery, the service account or user account used for creating the connection must have the following roles:

  • BigQuery Data Editor

  • BigQuery User

If the workflows_temp dataset does not exist, it will be automatically created in the region of the tenant (US, EU, AS, AU). To use a custom region for workflows_temp, manually create the dataset in the desired region and set its name (if different from workflows_temp) in the BigQuery connection > Advanced options.
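
For example, the dataset can be created in a specific region with a DDL statement before setting up the connection (a minimal sketch; my-project and the EU location are placeholders):

-- Create the temporary dataset for Workflows in a specific BigQuery region
CREATE SCHEMA `my-project.workflows_temp`
OPTIONS (location = 'EU');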

In Snowflake, grant the following privileges to the role used by the connection:

-- If the workflows_temp schema already exists:
-- Grant permissions to manage the schema
GRANT ALL PRIVILEGES ON SCHEMA DATABASE_NAME.WORKFLOWS_TEMP TO ROLE ROLE_NAME;

-- If the workflows_temp schema does not exist yet:
-- Grant permissions to create the schema
GRANT USAGE ON DATABASE DATABASE_NAME TO ROLE ROLE_NAME;
GRANT CREATE SCHEMA ON DATABASE DATABASE_NAME TO ROLE ROLE_NAME;
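
To double-check what the role has been granted, you can list its privileges with a standard Snowflake command (ROLE_NAME is a placeholder, as above):

SHOW GRANTS TO ROLE ROLE_NAME;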

In Databricks, your user will need to have the USE CATALOG and CREATE SCHEMA permissions on the selected catalog:

GRANT USE CATALOG ON CATALOG my_catalog TO `user@example.com`;
GRANT CREATE SCHEMA ON CATALOG my_catalog TO `user@example.com`;

In Redshift and PostgreSQL, the user account used for creating the connection needs the following grants (the statements are the same for both):

-- If the workflows_temp schema already exists:
-- Grant permissions to manage the schema
GRANT ALL ON SCHEMA workflows_temp TO user_name;

-- If the workflows_temp schema does not exist yet:
-- Grant permissions to create the schema
GRANT CREATE ON DATABASE database_name TO user_name;

To learn more about data warehouse connection permissions, please check the Required permissions section of the documentation.

How to manage the temporary data generated by Workflows

When your workflow runs, it stores the output results in intermediate tables. Depending on the data warehouse platform, a different strategy is applied to clean up the intermediate tables generated by CARTO Workflows. The default configuration removes the intermediate tables after 30 days.

Below, you can find code and instructions for automating the process of cleaning up the intermediate tables in each data warehouse:

When your workflow runs in BigQuery or CARTO Data Warehouse, these tables are temporary and will be automatically removed after 30 days.

You can find more information about temporary tables in the BigQuery documentation.
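
If you want to verify this on your side, you can check whether an expiration is set on the intermediate tables (a quick, optional check; my-project is a placeholder project):

-- List intermediate tables and their expiration timestamps
SELECT table_name, option_value AS expiration_timestamp
FROM `my-project.workflows_temp`.INFORMATION_SCHEMA.TABLE_OPTIONS
WHERE option_name = 'expiration_timestamp';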

When your workflow runs in Snowflake, a Snowflake task is automatically created in the WORKFLOWS_TEMP schema to remove the intermediate tables after 30 days. This task runs every day at 0:00 UTC.

These are the required permissions to create the task:

-- Grant permissions to manage the schema
GRANT ALL PRIVILEGES ON SCHEMA DATABASE_NAME.WORKFLOWS_TEMP TO ROLE ROLE_NAME;

-- Grant permissions to execute the task
GRANT EXECUTE TASK ON ACCOUNT TO ROLE ROLE_NAME;

Note that the task affects only the tables in that schema. If the name of the schema is changed in the connection, for example to WORKFLOWS_TEMP_CUSTOM, a new task will be created for the new schema.

The task is created once per schema, and will be removed when the schema is removed. It can also be removed manually:

DROP TASK IF EXISTS DATABASE_NAME.WORKFLOWS_TEMP."carto_scheduler.workflow_clear_cache.workflows_temp";
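
To check whether the task exists in your schema before removing it, you can list the tasks with a standard Snowflake command (DATABASE_NAME is a placeholder, as above):

SHOW TASKS IN SCHEMA DATABASE_NAME.WORKFLOWS_TEMP;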

In Databricks, a user with enough privileges will need to set up a workflow or notebook that runs periodically to clean up the content of the workflows_temp schema created in the selected catalog.
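
A minimal sketch of such a cleanup, assuming Unity Catalog and a placeholder catalog named my_catalog: the query below generates the DROP statements for intermediate tables older than 30 days, which a scheduled notebook or job can then execute one by one.

-- Generate DROP statements for workflows_temp tables older than 30 days (placeholder: my_catalog)
SELECT concat('DROP TABLE IF EXISTS my_catalog.workflows_temp.`', table_name, '`') AS drop_statement
FROM my_catalog.information_schema.tables
WHERE table_schema = 'workflows_temp'
  AND created < current_timestamp() - INTERVAL 30 DAYS;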

When your workflow runs in Redshift, in order to remove the intermediate tables you should create a scheduled query in Redshift that removes tables older than 30 days.

Scheduled queries are only available for provisioned Redshift clusters. If you are using Redshift Serverless, you will have to run the cleanup manually (e.g. by calling the procedure defined below) or find other external means of executing it periodically.

The following procedure should be created in your Redshift cluster to delete tables older than 30 days (you will have to replace all the occurrences of the $database$ placeholder with the name of your database):

CREATE OR REPLACE PROCEDURE $database$.workflows_temp._clear_cache_fn() as
$$
DECLARE
  statement RECORD;
  query VARCHAR(MAX);
BEGIN
  query := 'SELECT
      \'DROP TABLE $database$.workflows_temp.\' || relname || \';\' as statement
    FROM
      pg_class_info
    LEFT JOIN
      pg_namespace
    ON
      pg_class_info.relnamespace = pg_namespace.oid
    WHERE
      reltype != 0
      AND TRIM(nspname) = \'workflows_temp\'
      AND datediff(day, relcreationtime, current_date) > 30
  ';
  FOR statement IN EXECUTE query
  LOOP
     EXECUTE statement.statement;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

After that, you should define a Redshift scheduled query with the following CALL to execute the previous procedure once per day:

CALL $database$.workflows_temp._clear_cache_fn();

When your workflow runs in PostgreSQL, a pg_cron task is automatically created in the workflows_temp schema to remove the intermediate tables after 30 days. This task runs every day at 0:00 UTC.

Here are some resources to learn how to use pg_cron on different managed services based on PostgreSQL:

  • Amazon RDS. Usage guide available here.

  • Amazon Aurora PostgreSQL supports pg_cron since version 12.

  • Google Cloud SQL. Setup instructions here.

  • Azure Database for PostgreSQL supports pg_cron since version 11 and provides some usage examples.

This is the command to install the pg_cron extension:

CREATE EXTENSION pg_cron;

These are the required permissions to create the pg_cron task:

-- Grant permissions to manage the schema
GRANT ALL ON SCHEMA workflows_temp TO user_name;

-- Grant permissions to manage the task
GRANT USAGE ON SCHEMA cron TO user_name;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA cron TO user_name;

Note that the task affects only the tables in that schema. If the name of the schema is changed in the connection, for example to workflows_temp_custom, a new task will be created for the new schema.

The task is created once per schema, and will be removed when the pg_cron extension is removed. It can also be removed manually:

SELECT cron.unschedule('carto_scheduler.workflow_clear_cache.workflows_temp');
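
For reference, this is how a daily job is registered with pg_cron in general (an illustrative sketch only: CARTO creates and manages its own task automatically, and my_schema.my_cleanup_procedure is a hypothetical name):

-- Illustrative only: schedule a hypothetical cleanup procedure every day at 0:00 UTC
SELECT cron.schedule('my_custom_cleanup', '0 0 * * *', $$CALL my_schema.my_cleanup_procedure()$$);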

Cache options

When executing workflows from the UI, intermediate tables are created for each node in the workflow. Depending on the connection used for your workflow, these intermediate tables will be reused, avoiding recreation when the workflow structure or settings have not changed.

To control this behavior, you will find the Cache Settings menu next to the 'Run' button.

When enabled (the default option), Workflows will try to reuse previous successful executions of the workflow and its intermediate steps, with some differences depending on the data warehouse:

  • CARTO Data Warehouse, BigQuery and Snowflake: Intermediate and result tables will be reused as long as the workflow structure has not changed.

    • This setting applies to executions from the UI, triggered by clicking on the 'Run' button.

    • This setting also applies to scheduled workflows (beta).

    • For workflows executed via API call, the Output table will be reused between API calls that have the exact same parameter values. If parameters are changed, the output table will be recomputed.

  • Redshift and PostgreSQL: Intermediate tables are never reused. This means that all nodes in a workflow are always computed completely.

    • This setting doesn't have an effect on executions from the UI, triggered by clicking on the 'Run' button.

    • This setting also doesn't have an effect on Scheduled executions (only available for PostgreSQL connections).

    • For workflows executed via API call, the Output table will be reused between API calls that have the exact same parameter values. If parameters are changed, the output table will be recomputed.

When disabled, all intermediate and result tables will always be recomputed in all execution modes (UI, Schedule and API call), regardless of updates to source tables or parameter values.
