Workflows

CARTO Workflows provides a visual language to design and execute multi-step spatial analytics procedures. With Workflows, our mission is to bring spatial analytics to a broader audience of data analysts and business users, and to democratize access to advanced Location Intelligence.

This tool, like the rest of the platform, is fully cloud native, running Spatial SQL in your own data warehouse and leveraging CARTO's Analytics Toolbox and the other components of our technology stack.

In order to improve performance and be able to inspect the results of intermediate steps, Workflows makes use of temporary data that, by default, is stored in a workflows_temp schema/dataset in your data warehouse (learn more about temporary tables here).

CARTO Workflows reduces the complexity and the high dependence on specialist knowledge. Users can leverage the scalability of cloud data warehouses through the use of spatial SQL without needing to write SQL code themselves. It opens up analytical modelling to all roles and skill levels, through a simple, familiar user interface.

Workflows run as a directed graph: execution proceeds from left to right, step by step, but the output of a node can be used in the immediate next step or in any later step of the workflow. For example, if you have 5 nodes, the output of node 2 can connect to node 3, node 4, node 5, or all of them if needed.

Workflow templates

The CARTO team has designed this collection of Workflows examples with a hands-on approach to empower users and ease the Workflows learning curve.

These examples showcase a wide range of scenarios and applications: from simple building blocks for your geospatial analysis to more complex, industry-specific workflows tailored to facilitate running specific geospatial use-cases.

Making use of these examples is very easy. Just click on "New Workflow" and "From template" in your CARTO Workspace to access the collection of templates. Once the workflow is re-created, you will be able to modify it as with any other workflow, replacing the data sources and re-configuring the different nodes to adapt it to your specific use case.

Workflows UI

Before we jump into Workflows, let’s take a quick tour of the Workflows UI so you know your way around before getting started.

Canvas

First is the Canvas, where you will design your Workflow. This is a free-form canvas, meaning you can drag nodes onto any part of it. You can zoom in and out to see different parts of your workflow and see its layout in the mini viewer in the lower right corner. As you add nodes to the canvas, they will snap to a grid to align.

Sources & Components

On the left side, you will find a menu where you can add data sources from the connection you used to create the Workflow. You can add any data source that exists in your connection. You also have all the components, or nodes, that you can add to the canvas. We will go over the various components later. You can search for components or scroll to find the component you want.

Results

The bottom panel is the results space where you will see four different tabs:

  • Messages: Messages about the status of your Workflow including success and error messages.

  • Data: After clicking on a node, you can see the tabular data outputs of that specific workflow step.

  • Map: After clicking on a node, if that step returns a valid geometry, it will show up in the map. If there is more than one geometry you will have an option to pick which one to show.

  • SQL: The compiled SQL of the workflow. This includes the different steps and procedural language.

Results panel

The Results panel in Workflows provides information about each node in your workflow. Click on a node to open the panel and see its different sections.

The header of the panel contains general information about the workflow, like the time it was last executed or the connection being used. It also allows expanding to full screen or hiding the panel to leave more space for the canvas.

Messages

This tab contains information about the execution of the workflow. Error messages and confirmation for successful runs will appear here.

Data

This tab contains a table visualization of the result of the node's execution.

Column stats

Next to each column name, you will find an icon. Click on it to show some column statistics on the right side of the panel:

For numeric columns, max, min, average and sum are displayed.

For string and date or timestamp columns, the frequency of the top 20 categories is calculated and shown.

Pagination and copy

The Data tab also allows you to explore the result in pages of configurable length, which can be set in the bottom right corner of the panel:

The copy button in the top right corner will copy the content of the current page to the clipboard, using tabs as delimiter, which makes it easy to paste directly into a spreadsheet.

Export node's data

The export button will start the export process of the data on the current node. The export will happen asynchronously, and the status will be reported in the activity panel in the top bar. Once finished, the resulting link(s) will be available.

Bucket considerations

For SaaS, CARTO automatically manages buckets and data exports. Be aware that for exports from Snowflake and Redshift sources, the AWS cluster needs to match the organization's CARTO SaaS region. Check your CARTO SaaS region in Settings.

For Self-Hosted deployments, a bucket owned by the customer needs to be configured. Please refer to this documentation for more information.

Map

The Map tab shows a preview of the result on a map. It also allows you to create a map in Builder using the selected node as data source. When there is more than one column that could be rendered (several geometry/geography columns, spatial indexes, etc.), you can use a selector to make sure the correct column is used in the Builder map.

SQL

This tab contains the SQL code that is executed in the data warehouse when the workflow runs from the UI, by clicking on the Run button.

Part of this code handles the creation of temporary tables and other control strategies. This portion of the code is collapsed and hidden by default, but it can be expanded by clicking on the small two dots inline with the code.

The code displayed in the SQL tab corresponds exactly with what is executed in your data warehouse by clicking the 'Run' button in the Workflows UI.

However, this code is different from the one exported or executed via API. In those cases, all the control code is omitted and CTEs are used instead of temporary tables.

Workflow canvas

This is the area of the interface where you will be adding the different components and data sources that are required to run your analysis, and connecting them in order to define each step of the workflow in order of execution.

In this central panel you will always find the main diagram of your workflow, with all its components and data sources connected according to what you have established.

Annotations

In the workflow canvas you have the possibility to add visual assets and annotations to provide explanations on your workflow.

You can do it by clicking on the corresponding icon in the toolbar and dragging on the canvas. This will help you make very complex workflows more understandable.

The notes support Markdown syntax, allowing you to add different levels of headers, text formatting, code blocks, inline monospace code, bulleted and ordered lists, links, images, etc.

Import a file to your workflow

This feature has been deprecated in favor of the Import from URL component. Take a look at the announcement in our What's new section.

On the toolbar on top of the canvas you will find the Import file into workflow button.

Clicking this button will let you select a location in the canvas to place a datasource. After that, select a local file from your computer or specify a URL that will be imported as a data node in your workflow.

Automatic description of a workflow

This feature is in public beta, available on demand. Please contact us if you wish to have it enabled in your account.

On the toolbar on top of the canvas you will find the Generate workflow description button.

Clicking this button will let you select a location in the canvas that will contain an automatic description of the workflow.

This description is obtained using Generative AI on the workflow's SQL code.

Rename a node

Double click on a node's name to add a custom name to it. By using more descriptive names, your workflow will be much easier to understand.


Geospatial Foundation Models

An extension for accessing open source Geospatial Foundation Models: large-scale AI models trained on diverse geospatial data like satellite imagery, topographical maps, and other location-specific datasets, that encapsulate spatial information and relationships into dense representations in a multi-dimensional space (embeddings).

Google PDFM County Embeddings

Description


This component loads Google's Population Dynamics Foundation Model (PDFM) Embeddings at the county level. Google's (PDFM) Embeddings are condensed vector representations designed to encapsulate the complex, multidimensional interactions among human behaviors, environmental factors, and local contexts at specific locations.

These embeddings capture patterns in aggregated data such as search trends, busyness trends, and environmental conditions (maps, air quality, temperature), providing a rich, location-specific snapshot of how populations engage with their surroundings.

Output

  • Output table: This component generates an output table with the PDFM embeddings at the county level.

Google PDFM ZCTA Embeddings

Description

This component loads Google's Population Dynamics Foundation Model (PDFM) Embeddings at the ZIP Code Tabulation Area (ZCTA) level. Google's PDFM Embeddings are condensed vector representations designed to encapsulate the complex, multidimensional interactions among human behaviors, environmental factors, and local contexts at specific locations.

These embeddings capture patterns in aggregated data such as search trends, busyness trends, and environmental conditions (maps, air quality, temperature), providing a rich, location-specific snapshot of how populations engage with their surroundings.

Output

  • Output table: This component generates an output table with the PDFM embeddings at the ZCTA level.

Google Earth Engine Satellite Embeddings

This component loads summarized Google Earth Engine Satellite Embeddings for input geometries.

The Satellite Embeddings dataset provides global, analysis-ready 64-dimensional vectors for each 10-meter land pixel, capturing annual temporal patterns of surface conditions using data from multiple Earth observation sources. Unlike traditional spectral bands, these vectors summarize complex, multi-source relationships in a more abstract but powerful format.

Input

  • Input table: This component expects an input table with a geography column

Settings

  • Geography column: A column that contains the geography data.

  • From: The starting year of data to extract.

  • To: The end year of data to extract.

  • Reducer function: Function to summarize the raster data over each polygon (e.g., mean, min, max).

  • Scale in meters: The scale of the source data in meters.

Output

  • Output table: This component generates an output table with the Satellite Embeddings.



Using variables in workflows

Creating variables

In order to start using variables, click on the three dots in the upper-right corner and find 'Variables'.

  • Order: The order of the variables is relevant when setting them as parameters, as they will be used as inputs for the generated stored procedure in that order, defining the procedure's signature.

  • Name: Give a name to the variable. This name will be referenced as @name within an expression.

Using a variable name that conflicts with a column name can produce errors on the Workflow's SQL code. Please make sure that your variable's name is not the same as any of the column names in the data sources involved in the Workflow.

  • Type: Number and String variables are supported.

  • Default value: The default value for the variable. When executing a workflow from the UI, this is the value that will be given to the variable.

  • Parameter: Mark a variable as a parameter if you need to allow external access to it. This is helpful when running a workflow via API call.

    {% hint style="info" %} For Databricks connections, executing Workflows via API is currently a work in progress. For this reason, the use of parameters is not supported on this type of execution. The functionality will be available soon; please get in touch with [email protected] if you have feedback about this missing functionality. {% endhint %}

Learn more about executing parameterized workflows in this section.

Click on the '+' button to confirm the creation of the variable. A new entry will appear where you can keep adding variables.

Using variables and expressions in components

Once a variable has been defined, you can use it to control the value of a component's settings. As global variables, they can be used once or many times within the same workflow, and they will apply the same value to every setting where they're used.

To use a variable in a setting, just start typing {{@variable_name}} and the available matching variables will appear. Make sure that you close the variable name with double curly braces, as shown in the screenshot below.

Not only variables, but expressions including SQL operations and column names are also allowed, so something like this could be used as a setting on a node: {{@buffer_radius / distance * 10}} which combines a parameter (buffer_radius) with a column (distance) in a mathematical operation.

When using Custom SQL components, include only @variable_name without curly braces.
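
For example, assuming a hypothetical variable named min_population and an input table with a population column, a Custom SQL Select node could filter rows like this:

SELECT * FROM `$a` WHERE population >= @min_population

In any other component setting, the same variable would be written with curly braces, for example {{@min_population}}, or as part of an expression like {{@min_population * 2}}.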

Raster Operations

Components to work with raster data sources

Get Values from Raster

Description

This component allows obtaining values from a raster table. The values included in the output can be determined by a vector data source input.

Inputs

  • Raster table [Table]: raster data source that will be used as source for the values.

  • Vector table [Table]: vector data source that will be used to determine which pixel values are going to be obtained

  • Geo column: Column in the Vector table that contains a geography to determine which pixel values are going to be obtained.

  • Raster properties : List of band values that are going to be obtained from the raster, in the areas where it intersects with the vector table features.

  • Custom expression: (optional) Add a custom SQL expression containing an operation that uses values from the different bands in the raster, for example band_1+band_2 as sum

Advanced options

  • Include pixel [boolean]: Default false. If true, the output table will include a pixel column containing the quadbin id of the corresponding pixel in the raster.

Outputs

  • Result table [Table]: A table that contains a row for each pixel in the raster source that intersects with the vector source. It will contain an optional pixel column and a column for each selected property, plus any additional column generated by a custom expression.

Intersect and Aggregate Raster

Description

This component intersects a raster table with a table containing polygons and aggregates the values extracted from the intersection areas.

Inputs

  • Raster table [Table]: raster data source that will be used as source for the aggregated values.

  • Vector table [Table]: vector data source that contains the geographies to be enriched with aggregated data from the raster pixel values.

  • Geo column: Geography column in the Vector table that will be intersected with the raster.

  • Raster aggregated properties: List of aggregated band values that are going to be obtained from the raster, in the areas where it intersects with each of the vector table rows.

  • Custom expression: (optional) Add a custom SQL expression containing an operation that aggregates values from the different bands in the raster, for example AVG(band_1+band_2) as avg_band_sum

Outputs

  • Result table [Table]: A table that contains the same rows as the input Vector table, with an additional column for each raster aggregated property and any additional property defined as a custom expression. The values for those columns are the computed aggregation from all pixels in the raster that intersect with the vector feature.

Generative AI

Components to leverage native Generative AI capabilities on Data Warehouses.

ML Generate Text

Description

This component calls the ML.GENERATE_TEXT function in BigQuery for each row of the input table.

In order to use this component, a BigQuery ML model needs to be provided by the user.

Also, having the BigQuery Connection User (roles/bigquery.connectionUser) role granted in your BigQuery project is required.

Inputs

  • Model [FQN]: The path for the model to be used in the format project_id.dataset.model

  • Prompt column: The column in the input source used to generate the prompt.

  • Max Output Tokens: an INT64 value in the range [1,1024] that sets the maximum number of tokens that the model outputs. Specify a lower value for shorter responses and a higher value for longer responses. The default is 50.

  • Temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during response generation, which occurs when top_k and top_p are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest-probability response is always selected. The default is 1.0.

  • Top K: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.

  • Top P: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 1.0.

    Tokens are selected from the most probable (based on the top_k value) to the least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

Outputs

  • Result table [Table]
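
For reference, the query issued by this component is similar in shape to the following sketch, which assumes a hypothetical model at my_project.my_dataset.my_model and an input column named review used as the prompt:

SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `my_project.my_dataset.my_model`,
  (SELECT review AS prompt FROM `my_project.my_dataset.reviews`),
  STRUCT(50 AS max_output_tokens, 1.0 AS temperature, 40 AS top_k, 1.0 AS top_p)
);

The exact SQL generated by the component may differ; this sketch only illustrates how the settings above map to the ML.GENERATE_TEXT arguments.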

External links

BigQuery reference

Managing workflows

You will be able to manage workflows from the Workflows section of the workspace. From there, you will be able to:

  • Edit workflow names

  • Create and apply tags to workflows

  • Share workflows

  • Duplicate workflows

  • Delete workflows

These options may be restricted depending on the user's role and permissions.

Tagging workflows

You can tag workflows from the Workspace by editing their properties:

  1. Click on the three-dotted menu in the workflow card.

  2. Select "Edit workflow properties".

  3. Create, apply or remove existing tags in the tag input.

Additionally, you will be able to filter workflows by tags using the tag filter in the workspace. Once a tag filter is applied, you will be able to copy the URL and share it with other users within your organization.

Tags are automatically deleted when they are no longer applied to any map or workflow.

Duplicating workflows

To duplicate an existing workflow from the Workspace:

  1. Click on the three-dotted menu in the workflow card.

  2. Select "Duplicate Workflow".

When duplicating a workflow, you will need access to the connection used in the original workflow. Otherwise, you will be prompted to select a new connection.

Deleting workflows

To delete a workflow from the Workspace:

  1. Click on the three-dotted menu in the workflow card.

  2. Select "Delete workflow".

  3. Confirm that you want to delete the selected workflow.

Sharing and collaboration

Share with other users in your organization

This guide describes how you can share a workflow with the other users within your CARTO organization.

When you create a workflow, it’s always set to private by default. To share a workflow, you just need to make sure it has a name, and then click on the Share button in the top-right corner to open the sharing options.

A new dialog screen will open, allowing you to select who you want to share your workflow with:

Extension Packages

Private Preview

Introduction

Workflows' functionality can be extended and tailored to specific use cases, by creating custom components, which can be grouped and distributed together as Extension Packages.

Extension Packages can be created for workflows that use BigQuery and Snowflake connections.

Control

Components to control the execution flow of the workflow.

Conditional Split

Description

The Conditional Split component evaluates an aggregated measure or custom expression from the input data against a logical condition, directing execution into one of two branches: If and Else. This allows workflows to follow different processing paths depending on whether the condition evaluates to true or false.

Inputs


  • Source table [Table]

    Settings

    The condition can be defined in two ways:

    1. Simple condition – built through the UI:

      • Column: The column from the input table to aggregate.

      • Aggregation method: Function applied to the column (e.g., MIN, MAX, AVG, SUM).

      • Operator: Comparison operator (e.g., Is equal to, Is greater than, Is less than).

      • Value: The value to compare the aggregated result against.

    2. Custom expression – a manually written SQL expression that can include multiple columns, functions, and logical operators (e.g., AVG(population) > 3000 AND (category LIKE 'capital_city' OR capital IS TRUE)).

    Outputs

    • If path [Table]: Passes all rows from the input table when the condition evaluates to TRUE.

    • Else path [Table]: Passes all rows from the input table when the condition evaluates to FALSE.
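
    Conceptually, a simple condition such as "AVG of population is greater than 3000" compiles into an aggregated boolean check similar to this sketch (population is a hypothetical column of the input table):

    SELECT AVG(population) > 3000 AS condition_met FROM `$a`

    When the resulting value is TRUE, rows are routed to the If output; otherwise they are routed to the Else output.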

    Success/Error Split

    Description

    The Success/Error Split component determines the outcome of the previous component’s execution and routes the workflow accordingly. If the upstream component completes successfully, the workflow continues through the Success branch; if it fails, execution follows the Error branch. This enables workflows to handle errors gracefully, trigger alternative actions, or send notifications when issues occur.

    Inputs

    • Source table [Table] (only available if the upstream component executed successfully)

    Settings

    • None

    Outputs

    • Success path [Table]: Passes the table produced by the previous component if no errors occurred.

    • Error path [No table]: Produces no table output (since the previous step failed) but can connect to components such as HTTP Request, Send by Email, or logging for error handling.


    Currently, we support the following two main sharing options:
    • Private: Only you can view and edit the workflow.

    • Organization: The workflow will be visible to all users in your organization. Editor users will be able to duplicate the workflow so they can create a copy and start working on it.

      • If you have Single Sign-On (SSO) and Groups enabled at your organization, you will be able to share workflows only with specific groups, instead of the entire organization. Read more about Managing user groups.

      • You can also share only with specific users in your organization.

    Editor collaboration

    Enable the "Allow editing this workflow" option to allow collaboration from other editor users with access to the map.

    After you enable this setting, all editors with access to this map will also be able to edit it. A few things to take into consideration:

    • To enable editor collaboration, the connection used to create the workflow also needs to be shared.

    • When using the CARTO Data Warehouse, or a connection with Viewer Credentials, each user will use their own temporary storage for Workflows, meaning that the result of each node and their execution status will not be shared.

    Collaboration is asynchronous. If two users try to open the map at the same time, a pop-up will appear to decide who will take control of the map at that moment. If the current editor is away from the keyboard, the new editor can take control of the map after 30 seconds.

    The current editor can also choose to deny the take over, remaining in control of the map until the map is closed.

    Export a workflow as a SQL file

    Click on the three dots in the upper-left corner and find 'Export'

    This action will download a file that contains the SQL code needed to create a stored procedure based on the contents of your workflow. It takes existing variables and parameters into account; parameters are exposed as input arguments of the stored procedure.
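
    For instance, if the exported file creates a stored procedure and the workflow defines two parameters (a string and a number), the resulting procedure could be invoked like this hypothetical example (the procedure name and values are placeholders):

    CALL `my_project.my_dataset.my_workflow_proc`('retail', 5000);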

    Import a workflow from a SQL file

    A .sql file that was obtained by exporting a workflow can be imported to reproduce the same workflow. There are two different ways to import a workflow:

    • Drag and drop a compatible .sql file into your Workflows page in Workspace.

    • Click the import button next to + New workflow and select a file from your computer.

    Explore the complete reference of the extension packages provided by CARTO

    Managing Extension Packages

    To explore the available extension packages provided and maintained by CARTO, install new packages from a ZIP file, or manage already installed ones, simply click on Manage Extension Packages at the bottom of the component's panel in Workflows.

    You will find three different tabs:

    • Explore: Here you will be able to see the catalog of available extensions. These extensions have been published by CARTO. By clicking on an extension's card you will access the details page with a deeper description of the extension and its components and settings.

    • Installed: This tab shows a list of installed extensions in the current connection. When a new version of the extension is available, you will see an 'Outdated' tag. To update, simply click on the three dots and select "Update extension package".

    • Upload: This tab allows uploading a ZIP file for a custom extension. See the Creating an Extension Package section to find more information. After the upload, a details page will appear. Click on "Install extension" to install it. After that, the extension will be available in the Installed tab.

    Access to Extension Packages

    After installation, the custom components included in the extension can be found in the components panel, under the category defined for each by the extension's creator.

    Each component included in an extension is installed as a stored procedure in the Workflows working location for each connection. This implies that they're only accessible by workflows that are created with the same connection, or with connections that share the same working location.

    Disabling Extension Packages

    Admins can prevent users in their organization from installing extensions from the Governance settings.

    Creating an Extension Package

    Creating an extension package for Workflows involves some development work, as each custom component in the package needs to be programmed as a SQL Stored Procedure.

    We have created a GitHub template that can be used as a starting point, as it contains the basic folder structure, configuration files, examples and documentation to create your first package.

    Take a look at it in https://github.com/CartoDB/workflows-extension-template.

    The repository also includes some utilities to test your custom components, and deploy directly to your data warehouse to ease development.

    Additionally, the repository includes a script to package your extension, resulting in a ZIP file that can be used in Workflows to install the package.

    Custom

    Components that allow you to create custom user-defined procedures and SQL queries in your workflows.

    Call Procedure

    Description

    This component executes an arbitrary CALL SQL statement.

    Use $a and $b as placeholders for the input table names (you can use one, both, or none of them), and $output as placeholder for the output table name.

    Aggregation

    Components to aggregate or disaggregate properties in your data in different ways.

    Count

    Description

    This component creates a new table with a single row and a 'count' column containing the row count of the input table
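
    As a rough illustration, the SQL compiled for this component is equivalent to the following sketch:

    SELECT COUNT(*) AS count FROM `$a`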

    Inputs

    Telco Signal Propagation Models

    Extension Package provided by CARTO

    This extension provides tools for telecommunications analysis, including path profile determination, signal propagation modelling, and obstacle identification.

    It can utilize raster data for terrain and clutter profiles, enabling accurate assessment of transmitter-receiver paths and signal characteristics in various environments.

    The components in this extension package leverage the functionality provided by the Telco module in the CARTO Analytics Toolbox for BigQuery. Check the module's reference for more detailed information about the analysis.

    Path Profile

    Description

    This component identifies the Path Profile between the transmitter (Tx) and receiver (Rx), including any obstacles along the propagation path. It utilizes tables containing information about buildings, vegetation, water (for clutter data), and the terrain elevation profile.

    Spatial Accessors

    Components used to obtain properties about a geography data type or a set of geographies.

    ST Geometry Type

    Description

    This component adds a column containing the geography type of the geography in each row of the input table.

    Inputs

    Source table [Table]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    Group by

    Description

    This component creates a new table with aggregated values.

    It is required to convert spatial types to an allowed type by using 'ST_ASBINARY' before grouping by them.

    Inputs

    • Source table [Table]

    • Columns and methods for aggregation [String]

    • Columns to group by [String]

    Outputs

    • Result table [Table]
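
    As an illustration of the note above, a hypothetical grouping on a geography column geom with a summed population column would first cast the geography with ST_ASBINARY, roughly like this sketch:

    SELECT ST_ASBINARY(geom) AS geom_bin, SUM(population) AS population_sum
    FROM `$a`
    GROUP BY geom_bin

    The column names here (geom, population) are placeholders for illustration only.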

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    Point Stats in Polygons

    Description

    This component adds new columns with statistics computed from the points of a second table that fall within each polygon.

    Inputs

    • Points table [Table]

    • Polygons table [Table]

    • Geo column in points table [Column]

    • Geo column in polygons table [Column]

    • Column to compute stats for in points table [Column]

    • Column with identifier in polygons table [Column]

    Outputs

    • Result table [Table]
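
    The computation is roughly equivalent to the following sketch, where points_table and polygons_table stand for the two inputs, and geom, value and polygon_id are hypothetical columns:

    SELECT p.polygon_id, AVG(pt.value) AS value_avg
    FROM polygons_table AS p
    JOIN points_table AS pt
      ON ST_CONTAINS(p.geom, pt.geom)
    GROUP BY p.polygon_id

    The component then joins these statistics back to the polygons table as new columns.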

    ST Dump

    Description

    This component explodes multi-part geographies into multiple single-part ones.

    The output table has the same structure as the input one, but the geo column only contains single-part geographies. Rows with multi-part geographies are replaced by several rows with single-part geographies and identical values for the rest of the columns.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Summarize

    This component creates a new table with aggregated values for the input whole table.

    Inputs

    • Source table [Table]

    • Aggregation: Use the UI of this component to select a list of aggregated properties that will be added to the result table.

    Outputs

    • Result table [Table]

    Unnest

    Description

    This component unnests the arrays contained in a column.

    Inputs

    • Source table [Table]

    • Column [Column]

    Outputs

    • Result table [Table]
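
    For example, assuming a hypothetical array column named tags, the unnesting is roughly equivalent to:

    SELECT t.* EXCEPT (tags), tag
    FROM `$a` AS t, UNNEST(t.tags) AS tag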

    External links

    BigQuery reference

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    PostgreSQL reference

    ST NumGeometries

    Description

    This component creates a new table with an additional column with the number of parts in each input geography.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Redshift reference

    PostgreSQL reference

    ST NumPoints

    Description

    This component creates a new table with an additional column with the number of points in each input geography.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST Point N

    Description

    This component adds a new column with the point at the n-th position in the original input geography.

    The geography must be of type linestring.

    It will set the column value to null if the original geography is not a linestring, if it is empty, or if there is no vertex at the given index.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Point index [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST X / ST Y

    Description

    This component creates a new table with additional columns containing X and Y coordinates of the geographies in the input table. This only applies to the case of point geographies.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]
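
    The added columns are roughly equivalent to this sketch, assuming a point column named geom:

    SELECT *, ST_X(geom) AS x, ST_Y(geom) AS y FROM `$a`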

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    Inputs

    1. Tx locations: A table that contains points with the location of the transmitters.

    2. Rx locations: A table that contains points with the location of the receivers.

    3. Terrain profile: A table that contains geographies with the terrain profile.

    Settings

    • Buildings table FQN: (Optional) The fully qualified name of the table that contains buildings to be considered in the propagation path calculation.

    • Vegetation table FQN: (Optional) The fully qualified name of the table that contains vegetation to be considered in the propagation path calculation.

    • Water table FQN: (Optional) The fully qualified name of the table that contains water bodies to be considered in the propagation path calculation.

    • Frequency: The value of the operating frequency in GHz.

    • Intersect Fresnel zone: A flag indicating whether to consider the line or the Fresnel zone for the calculation.

    • Terrain point sampling: A flag indicating whether to calculate terrain points along the path.

    • Resolution of sampling for the terrain points: Resolution of the sampling for the terrain points to be considered in the path profile.

    • Street Width calculation: A flag to include street width in the path profile calculation.

    Outputs

    1. Output table: A table containing the path profiles.

    2. Output table details: A table with the details of the obstacles of the path profiles.

    Path Profile Raster

    Description

    This component determines the Path Profile between the transmitter (Tx) and receiver (Rx), identifying obstacles along the propagation path by using raster tables for clutter and terrain profiles.

    Inputs

    1. Tx locations: A table that contains points with the location of the transmitters.

    2. Rx locations: A table that contains points with the location of the receivers.

    3. Clutter: A raster table that contains data about the clutter.

    4. Terrain profile: A raster table with terrain profile data.

    Settings

    • Frequency: The value of the operating frequency in GHz.

    • Intersect Fresnel zone: A flag indicating whether to consider the line or the Fresnel zone for the calculation.

    • Adjust the Rx height: A flag indicating whether to adjust the Rx height based on the clutter.

    • Intersect center: A flag indicating whether to extract the pixel values from the Clutter raster by intersecting the pixel center or the pixel boundary.

    • Clutter raster band for height: (Optional) The band to be extracted from the raster as the height of the clutter.

    • Clutter raster band for type: (Optional) The band to be extracted from the raster as the type of the clutter. If empty then the type constant is used.

    • Terrain profile raster band for height: (Optional) The band to be extracted from the raster as the height of the terrain.

    Outputs

    1. Output table: A table containing the path profiles.

    2. Output table details: A table with the details of the obstacles of the path profiles.

    Close In

    Description

    This component implements the Close In empirical propagation model.

    Inputs

    • Path profiles: This input should be a table that contains path profiles obtained with either Path Profile or Path Profile Raster components.

    Settings

    • Frequency: The operating frequency in GHz.

    • Scenario: The environment scenario.

    • Number of data points: The number of data points used to compute the shadow fading term Xσ.

    • Path Loss Exponent for LOS: The path loss exponent for the LOS (line-of-sight) case.

    • Path Loss Exponent for NLOS: The path loss exponent for the NLOS (non-line-of-sight) case.

    • Sigma LOS: The standard deviation of the zero-mean Gaussian random variable (shadow fading) for the LOS case.

    • Sigma NLOS: The standard deviation of the zero-mean Gaussian random variable (shadow fading) for the NLOS case.

    Outputs

    • Output table: A table with the same schema as the input table plus a column that contains the result of the analysis.
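
    For reference, the standard close-in (CI) free-space reference distance model that these settings parameterize is usually written as follows (a sketch only; the exact formulation used by the Analytics Toolbox procedure may differ):

    PL(f, d) = FSPL(f, 1 m) + 10 · n · log10(d / 1 m) + Xσ

    where FSPL(f, 1 m) is the free-space path loss at a 1 m reference distance for frequency f, n is the path loss exponent (LOS or NLOS), d is the Tx–Rx distance, and Xσ is the zero-mean Gaussian shadow fading term with standard deviation σ.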

    Extended HATA

    Description

    This component implements the Extended HATA empirical propagation model.

    Inputs

    1. Tx Locations: A table that contains points with the location of the transmitters.

    2. Rx Locations: A table that contains points with the location of the receivers.

    3. Path profiles: This input should be a table that contains path profiles obtained with either Path Profile or Path Profile Raster components.

    Settings

    • Frequency: The operating frequency in GHz.

    • Scenario: The environment scenario

    Outputs

    • Output table: A table with the path losses (dB).

    The procedure that you are calling is responsible for creating the output table in the path stored in the $output variable.

    The following is a valid example of code to use in this component:

    CALL MY_PROCEDURE($a, 10, `$output`);

    Please take into account that $a, $b and $output will be parsed directly as they are. In some cases, the table names that work as input/output need to be quoted differently depending on the data warehouse. Users need to ensure the correct quoting, like `$output` or "$a".

    Due to Redshift limitations, calling stored procedures that have INOUT string parameters will cause an exception when running the workflow. Stored procedures with such parameters are not supported.

    Additional placeholders

    • $isStoredProcedure

    When using the Workflows API or running a workflow as an MCP Tool, your workflow is executed as a stored procedure within your data warehouse. This changes how tables are handled.

    To manage this behavior, we provide the $isStoredProcedure placeholder. This will be replaced with true when the workflow is run as a stored procedure. This happens when running workflows via API.

    In stored procedures:

    • Table names are not fully qualified

    • Tables are expected to be temporary and session-scoped

    Your SQL code should account for this logic to avoid errors when running workflows in API mode.

    Here is sample code for that (notice that no CALL is made in this case, since the component can run arbitrary SQL code):

    Example:

    IF ($isStoredProcedure) THEN
      -- $output will be a simple table name (i.e. mytable)
      CREATE TEMPORARY TABLE $output AS SELECT * FROM $a LIMIT 10;
    ELSE
      -- $output will be a fully qualified name (i.e. database.schema.mytable)
      CREATE TABLE $output AS SELECT * FROM $a LIMIT 10;
    END IF;

    • $dryrun

    Workflows use a dry run mode to infer and propagate table schemas across components. When executing queries in this mode, you may want to differentiate logic from a standard run.

    To handle this, use the $dryrun placeholder, which is replaced by true when the workflow is being dry-run (e.g., when clicking "Validate schema").

    Example:

    IF ($dryrun) THEN
      -- During dry run: propagate schema so other components can recognize the output
      CREATE TABLE `$output` AS SELECT * FROM `$a`;
    ELSE
      -- During normal run: actually create the output table
      CREATE TABLE `$output` AS SELECT * FROM `$a`;

      -- During normal run: also write to a specific target table for post-processing
      CREATE OR REPLACE TABLE `project.schema.specific_table` AS SELECT * FROM `$a`;
    END IF;

    Inputs

    • Source table [Table]

    • Source table [Table]

    • SQL CALL statement [String]

    Outputs

    • Result table [Table]

    Custom Python Code

    Description

    This component runs arbitrary Python code.

    Inputs

    1. Source table: Use df_a in your code to reference a dataframe that contains the data in the input. Use $a to reference the FQN of the input table in your code.

    2. Source table: Use df_b in your code to reference a dataframe that contains the data in the input. Use $b to reference the FQN of the input table in your code.

    3. Source table: Use df_c in your code to reference a dataframe that contains the data in the input. Use $c to reference the FQN of the input table in your code.

    Settings

    • Python code: Define your custom Python code that will be executed by the component.

    Outputs

    • Output table: Use df_output in your code to reference a dataframe that contains the data in the output. Use $output to reference the FQN of the output table in your code.

    Custom SQL Select

    Description

    This component executes an arbitrary SELECT SQL statement.

    A new table will be created from the result of the entered statement

    Use $a, $b and $c as placeholders for the input table names (you can use one, all, or none of them).

    The following is a valid example of code to use in this component, which will generate a new table with an added area column (based on a column named 'geom'), with only the rows corresponding to the five largest geometries:

    SELECT *, ST_AREA(geom) AS area FROM `$a` ORDER BY area LIMIT 5

    Please take into account that $a, $b and $c will be parsed directly as they are. In some cases, they will need to be quoted differently depending on the data warehouse. Users need to ensure the correct quoting, like '$a' or "$a".

    When using Workflows' variables in your Custom SQL code, remember that they don't need to be wrapped into curly brackets like {{ @var_name}}. Variables can be used directly as @var_name inside your Custom SQL code.

    Please avoid a trailing semicolon (;) in your Custom SQL code, as it might cause unexpected behavior.
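
    As another sketch, two inputs can be combined in a single statement; the id and name columns below are hypothetical:

    SELECT a.*, b.name
    FROM `$a` AS a
    JOIN `$b` AS b
      ON a.id = b.id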

    Inputs

    • Source table [Table]

    • Source table [Table]

    • SQL SELECT statement [String]

    Outputs

    • Result table [Table]


    Google Earth Engine

    Extension Package provided by CARTO

    This extension provides powerful Google Earth Engine integration for spatial analysis.

    It includes components for summarizing raster data over polygons and efficiently processing large-scale vector data with Earth Engine imagery.

    These tools enable users to perform advanced geospatial analysis by combining BigQuery's data processing capabilities with Earth Engine's vast satellite imagery and geospatial datasets.

    These components make use of cloud functions deployed and maintained by CARTO on a GCP project.

    Usage of these functions is limited to 1000 rows/day per connection.

    These functions are available for the US BigQuery region.

    Please get in touch with us at [email protected] if you need to deploy your own version of the functions.

    Summary Region

    Description

    Summarizes Google Earth Engine raster data over a polygon. Users specify an image, band, and reducer function for efficient spatial analysis.

    Inputs

    • Source table: This component is limited to an input of 10 rows with polygons. It's typically used along with a Draw Custom Features input to define the regions that will be summarized.

    Settings

    • Geometry column: Select a column from the input table that contains polygons.

    • GEE Image ID: (Optional) Provide a Google Earth Engine image ID. Provide either this or a GEE Image expression.

    • GEE Image expression: (Optional) Provide a Google Earth Engine expression that returns a single image. Provide either this or a GEE Image ID.

    • Band: Specify which band from the image will be used for the reduce operation.

    • Reducer: Select a reducer function.

    • Scale in meters: Specify a size in meters to define the scale of your operation.

    Outputs

    • Output table: This component returns a table with the same columns as the Source table, with an additional one named ee_summary_<reducer> that contains the value of the reduce operation.

    Summary Table

    Description

    Summarizes Google Earth Engine raster data over polygons on a BigQuery table. Users specify an image, band, and reducer function for efficient spatial analysis.

    Inputs

    • Source table: A table that contains polygons.

    Settings

    • Geometry column: Select a column from the input table that contains polygons.

    • GEE Image ID: (Optional) Provide a Google Earth Engine image ID. Provide either this or a GEE Image expression.

    • GEE Image expression: (Optional) Provide a Google Earth Engine expression that returns a single image. Provide either this or a GEE Image ID.

    • Band: Specify which band from the image will be used for the reduce operation.

    • Reducer: Select a reducer function.

    • Scale in meters: Specify a size in meters to define the scale of your operation.

    Outputs

    • Output table: This component returns a table with the same columns as the Source table, with an additional one named ee_<reducer> that contains the value of the reduce operation.

    Get Elevation

    Description

    Retrieves and summarizes elevation data from Google Earth Engine for specified polygons.

    Inputs

    • Source table: This component is limited to an input of 10 rows with polygons. It's typically used along with a Draw Custom Features input to define the regions that will be summarized.

    Settings

    • Geometry column: Select a column from the input table that contains polygons.

    • Reducer: Select a reducer function.

    Outputs

    • Output table: This component returns a table with the same columns as the Source table, with an additional one named ee_<reducer>_elevation that contains the value of the reduce operation.

    Get NDVI

    Description

    Retrieves and summarizes the Normalized Difference Vegetation Index (NDVI), obtained with the NIR (B8) and RED (B4) bands from Google Earth Engine data, for specified polygons and date ranges.

    Inputs

    • Source table: This component is limited to an input of 10 rows with polygons. It's typically used along with a Draw Custom Features input to define the regions that will be summarized.

    Settings

    • Geometry column: Select a column from the input table that contains polygons.

    • Start date: Type a YYYY-MM-DD that defines the start date to consider for your analysis.

    • End date: Type a YYYY-MM-DD that defines the end date to consider for your analysis.

    Outputs

    • Output table: This component returns a table with the same columns as the Source table, with an additional one named ee_ndvi that contains the mean NDVI value of the polygon.

    Get Precipitation

    Description

    Retrieves and summarizes precipitation data from Google Earth Engine for specified polygons and date ranges.

    Inputs

    • Source table: This component is limited to an input of 10 rows with polygons. It's typically used along with a Draw Custom Features input to define the regions that will be summarized.

    Settings

    • Geometry column: Select a column from the input table that contains polygons.

    • Reducer: Select a reducer function.

    • Start date: Type a YYYY-MM-DD that defines the start date to consider for your analysis.

    • End date: Type a YYYY-MM-DD that defines the end date to consider for your analysis.

    Outputs

    • Output table: This component returns a table with the same columns as the Source table, with an additional one named ee_<reducer>_precipitation that contains the value of the reduce operation.

    Spatial Analysis

    Components to perform more advanced geospatial analytics leveraging the spatial properties and patterns in your data.

    K-Nearest Neighbors

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component performs a K-nearest neighbors analysis.

    Inputs

    • Points table [Table]

    • Id column [Column]

    • Geo column [Column]

    • Number of nearest neighbors [Number]

    • Maximum distance: Defines the maximum distance where neighbors will be considered.

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    ST Cluster DBSCAN

    Description

    This component performs a DBSCAN clustering on the geographical coordinates. Given an input table and a geo column, it creates a new table with the same content as the input one and an additional column named 'cluster' containing the cluster id corresponding to each geography.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Search radius (m) [Number]

    • Minimum number of geographies [Number]

    Outputs

    • Result table [Table]
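
    The clustering is roughly equivalent to this sketch in BigQuery, assuming a geography column named geom, a 500 m search radius and a minimum of 1 geography per cluster:

    SELECT *, ST_CLUSTERDBSCAN(geom, 500, 1) OVER () AS cluster
    FROM `$a`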

    External links

    BigQuery reference

    ST Cluster K-Means

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component performs a K-means clustering on the geographical coordinates. Given an input table and a geo column, it creates a new table with the same content as the input one and an additional column named 'cluster' containing the cluster id corresponding to each geography.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Number of clusters [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    ST Count Points in Polygons

    Description

    This component takes a points table and a polygons table. It creates a new table with the same content as the polygons table, with an additional column containing the number of points from the points table that fall within each polygon.

    Inputs

    • Points table [Table]

    • Polygons table [Table]

    • Geo column in points table [Column]

    • Geo column in polygons table [Column]

    • Column with identifier in polygons table [Column]

    Outputs

    • Result table [Table]
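
    The result is roughly equivalent to this sketch, where polygons_table and points_table stand for the two inputs, and polygon_id and geom are hypothetical columns:

    SELECT poly.polygon_id, COUNT(pt.geom) AS points_count
    FROM polygons_table AS poly
    LEFT JOIN points_table AS pt
      ON ST_CONTAINS(poly.geom, pt.geom)
    GROUP BY poly.polygon_id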

    ST Delaunay Polygons

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component computes a Delaunay triangulation based on an input table with points.

    It creates a new table with a 'geom' column containing the triangulation geographies (lines or polygons)

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Return lines [Boolean]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    ST Voronoi

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component computes a Voronoi tessellation based on an input table with points.

    It creates a new table with a 'geom' column containing the tessellation geographies (lines or polygons).

    The maximum number of points used to compute Voronoi diagrams is 300,000. This limit ensures efficient computation while maintaining accuracy in delineating regions based on proximity to specified points.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Return lines [Boolean]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Google Environment APIs

    Extension Package provided by CARTO

    This extension provides powerful Google Environment APIs integration for spatial analysis.

    It includes components for integrating usage of Google Solar API, Air Quality API and Pollen API into geospatial analytics pipelines created with Workflows.

    These components make use of cloud functions deployed and maintained by CARTO on a GCP project.

    Usage of these functions is limited to 1000 rows/day per connection.

    These functions are available for the US BigQuery region.

    Please get in touch with us if you need to deploy your own version of the functions.



    Solar API Building Insights

    Description

    This component retrieves data from the Building Insights endpoint in the Google Solar API for a set of points in a table. It will return data from buildings that intersect with the points in the input table. Find more information about the Solar API here.

    Inputs

    • Source table: A table that contains points that will be used to get data from Solar API

    Settings

    • Geography column: A column in the input table that contains the points to be used to get data from the Solar API.

    • Google Maps Platform API Key: A Google Maps Platform API Key with access to Solar API

    • Required quality: The minimum quality level allowed in the results. No result with lower quality than this will be returned. Not specifying this is equivalent to restricting to HIGH quality only.

    • Fields from response: A list of fields to be included in the JSON response. Each field should follow the JSON path of the API response, and the result will preserve the same structure. Find a complete reference of available values in the Solar API documentation. Example: imageryDate, solarPotential.panelCapacityWatts, solarPotential.buildingStats.areaMeters2

    Output

    • Result table: A table with the same columns as the source table plus a column that contains a JSON object with the requested properties.

    Air Quality API Current Conditions

    Description

    This component retrieves data from the Current Conditions endpoint in Google Air Quality API for a set of points in a table. Find more information about Air Quality API here.

    Inputs

    • Source table: A table that contains points that will be used to get data from Air Quality API

    Settings

    • Geography column: A column in the input table that contains the points to be used to get data from the Air Quality API.

    • Google Maps Platform API Key: A Google Maps Platform API Key with access to Air Quality API

    • Universal Air Quality Index: Optional. Check this option to include the Universal Air Quality Index in the response.

    • Local Air Quality Index: Optional. Check this option to include a Local Air Quality Index in the response, based on each point location. Checking this option automatically adds the LOCAL_AQI extra computation.

    • Additional computations: Optional. Add a list of extra computations to be added to the result. Example: HEALTH_RECOMMENDATIONS, DOMINANT_POLLUTANT_CONCENTRATION

• Language Code: Optional. Type an IETF standard language code to be used in the response. Example: es for Spanish.

    • Universal AQI color palette: Optional. Select a color palette for the Universal Air Quality Index. Only applies if Universal AQI is requested.

• Custom Local AQI: Optional. Type a JSON object that defines a custom local AQI. Only applies if Local AQI is requested. Find a complete reference of custom local AQIs here. Example: {"regionCode": "us", "aqi": "usa_epa_nowcast"}

    Output

• Result table: A table with the same columns as the source table plus a column that contains a JSON object with the response. Find a reference of the expected response here.

    Air Quality API Forecast

    Description

    This component retrieves data from the Forecast endpoint in Google Air Quality API for a set of points in a table. Find more information about Air Quality API here.

    Inputs

    • Source table: A table that contains points that will be used to get data from Air Quality API

    Settings

• Geography column: A column in the input table that contains the points to be used to get data from Air Quality API.

    • Google Maps Platform API Key: A Google Maps Platform API Key with access to Air Quality API.

    • Range Type: Select 'Period' or 'Date time' to define the type of forecast to be obtained from the API.

    • Start Time: Defines a start time. Only applies if 'Period' is selected.

    • End Time: Defines an end time. Only applies if 'Period' is selected.

• Date Time: Defines a specific timestamp to be used for the forecast. Only applies if 'Date time' is selected.

    • Universal Air Quality Index: Optional. Check this option to include the Universal Air Quality Index in the response.

    • Local Air Quality Index: Optional. Check this option to include a Local Air Quality Index in the response, based on each point location. Checking this option automatically adds the LOCAL_AQI extra computation.

• Additional computations: Optional. Add a list of additional computations to be added to the result. Example: HEALTH_RECOMMENDATIONS, DOMINANT_POLLUTANT_CONCENTRATION

• Language Code: Optional. Type an IETF standard language code to be used in the response. Example: es for Spanish.

    • Universal AQI color palette: Optional. Select a color palette for the Universal Air Quality Index. Only applies if Universal AQI is requested.

• Custom Local AQI: Optional. Type a JSON object that defines a custom local AQI. Only applies if Local AQI is requested. Find a complete reference of custom local AQIs here. Example: {"regionCode": "us", "aqi": "usa_epa_nowcast"}

    Output

• Result table: A table with the same columns as the source table plus a column that contains a JSON object with the response. Find a reference of the expected response here.

    Air Quality API History

    Description

    This component retrieves data from the History endpoint in Google Air Quality API for a set of points in a table. Find more information about Air Quality API here.

    Inputs

    • Source table: A table that contains points that will be used to get data from Air Quality API

    Settings

• Geography column: A column in the input table that contains the points to be used to get data from Air Quality API.

    • Google Maps Platform API Key: A Google Maps Platform API Key with access to Air Quality API

• Range Type: Select 'Period', 'Date time', or 'Hours' to define the time range of the data to be obtained from the API.

    • Start Time: Defines a start time. Only applies if 'Period' is selected.

    • End Time: Defines an end time. Only applies if 'Period' is selected.

• Date Time: Defines a specific timestamp to be used for the request. Only applies if 'Date time' is selected.

• Hours: Defines a number of hours, counted back from the current time, for which data will be retrieved. Only applies if 'Hours' is selected.

    • Universal Air Quality Index: Optional. Check this option to include the Universal Air Quality Index in the response.

    • Local Air Quality Index: Optional. Check this option to include a Local Air Quality Index in the response, based on each point location. Checking this option automatically adds the LOCAL_AQI extra computation.

• Additional computations: Optional. Add a list of additional computations to be added to the result. Example: HEALTH_RECOMMENDATIONS, DOMINANT_POLLUTANT_CONCENTRATION

• Language Code: Optional. Type an IETF standard language code to be used in the response. Example: es for Spanish.

    • Universal AQI color palette: Optional. Select a color palette for the Universal Air Quality Index. Only applies if Universal AQI is requested.

• Custom Local AQI: Optional. Type a JSON object that defines a custom local AQI. Only applies if Local AQI is requested. Find a complete reference of custom local AQIs here. Example: {"regionCode": "us", "aqi": "usa_epa_nowcast"}

    Output

    • Result table: A table with the same columns as the source table plus a column that contains a JSON object with the response. Find a reference of the expected response here.

    Pollen API Forecast

    Description

    This component retrieves data from the Forecast endpoint in Google Pollen API for a set of points in a table.

    Inputs

    • Source table: A table that contains points that will be used to get data from Pollen API

    Settings

• Geography column: A column in the input table that contains the points to be used to get data from Pollen API.

    • Google Maps Platform API Key: A Google Maps Platform API Key with access to Pollen API

    • Days: Set a number of days (up to 4) to get a Pollen forecast.

• Language Code: Optional. Type an IETF standard language code to be used in the response. Example: es for Spanish.

• Include Plants Description: Check this option to include a plant's description in the response.

    Output

• Result table: A table with the same columns as the source table plus a column that contains a JSON object with the response. Find a reference of the expected response here.


    Parsers

    Components to process data from an external format, such as WKT or GeoJSON, into a table format including the processed geographies/geometries in a column when applicable.

    Draw Custom Features

    Description

    This component allows drawing new custom geospatial features.

    By clicking on "Draw features" you will be able to use a point and click UI to define custom features (points, lines or polygons) that will be used to generate a GeoJSON string.

    Inputs

    • GeoJSON string [String]

    Outputs

    • Result table [Table]

    ST as GeoJSON

    Description

    This component generates a new column with the GeoJSON representation of geographies in another column.

    Inputs

    • Source table [Table]

    • Column with geographies definition [Column]

    Outputs

    • Result table [Table]

    External links

    ST as Text

    Description

    This component generates a new column with the WKT representation of geographies in another column.

    Inputs

    • Source table [Table]

    • Column with geographies definition [Column]

    Outputs

    • Result table [Table]

    External links

    ST Geog from Text

    Description

This component creates a new column with geometries created from the values in another column. The source column can contain geometries defined in either WKT or GeoJSON format.
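Conceptually, this is similar to the following BigQuery-flavored SQL, shown only as a hedged sketch (table and column names are hypothetical, and the actual SQL generated by the component may differ):

    -- Sketch: build a geography column from a text column with WKT definitions.
    SELECT
      *,
      ST_GEOGFROMTEXT(wkt_column) AS geom   -- for GeoJSON strings, ST_GEOGFROMGEOJSON would be used instead
    FROM my_dataset.raw_locations;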

    Inputs

    • Source table [Table]

    • Text column with geography definition [Column]

    Outputs

    • Result table [Table]

    External links

    ST GeogPoint

    Description

    This component creates a new table with an additional column with point geographies. Geographies are created based on columns containing latitude and longitude values.

    The new column containing the points will be named geom. If there is already a column with that name in the input table, the component will return an error. Please rename any columns named geom from the input before connecting this component.
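As a rough illustration, in BigQuery-flavored SQL the component behaves similarly to the sketch below (names are hypothetical; the exact SQL generated by the component may differ):

    -- Sketch: build point geographies from latitude/longitude columns.
    SELECT
      *,
      ST_GEOGPOINT(longitude, latitude) AS geom   -- note: longitude first, then latitude
    FROM my_dataset.customer_addresses;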

    Inputs

    • Source table [Table]

    • Latitude column [Column]

    • Longitude column [Column]

    Outputs

    • Result table [Table]

    External links

    Table from GeoJSON

    Description

This component creates a new table from a GeoJSON string.

    By clicking on "Draw features" you will be able to use a point and click UI to define custom features (points, lines or polygons) that will be used to generate the GeoJSON string.

    Inputs

    • GeoJSON string [String]

    Outputs

    • Result table [Table]

    WKB from GeoJSON

    Description

    This component creates a new column with geometries in WKB format created from the values in another column.

    Inputs

    • Source table

    Settings

• Column with geometries definition: Expects a String column that contains the GeoJSON definition of geometries.

    Outputs

• Output table: This component returns a table with the same columns as the input, plus an additional binary column that contains WKB geometries.

    WKB from Text

    Description

    This component creates a new column with geometries in WKB format created from the values in another column.

    Inputs

    • Source table

    Settings

• Column with geometries definition: Expects a String column that contains the WKT definition of geometries.

    Outputs

• Output table: This component returns a table with the same columns as the input, plus an additional binary column that contains WKB geometries.

    Joins

    Components to join two or more streams of data.

    Cross Join

    Description

    This component performs a cross join.

    Inputs

    • Main table [Table]

    • Secondary table [Table]

    Outputs

    • Result table [Table]

    External links

    Join

    Description

    This component combines rows from two separate tables based on a related column between them.

    Inputs

    • Main table [Table]

    • Secondary table [Table]

• Column on main table [Column]

• Column on secondary table [Column]

• Join type [Selection]

    Outputs

    • Result table [Table]

    External links

    Join (extended)

    Description

This component combines two input tables based on common fields between the two tables. You can also join two tables based on record position.

    Inputs

    • Left table [Table]

    • Right table [Table]

• Join By [Selection]

• Column in Left Table [Column]

• Column in Right Table [Column]

    Outputs

    • Table with unjoined rows from left table [Table]

    • Table with unjoined rows from right table [Table]

    • Joined table [Table]

    Spatial Join

    Description

    This component performs a spatial join between two tables, based on a spatial predicate.

    Inputs

    • Main table [Table]

    • Secondary table [Table]

• Geo column on main table [Column]

• Geo column on secondary table [Column]

• Spatial predicate [Selection]

    Outputs

    • Result table [Table]
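For reference, a spatial join with an 'intersects' predicate is conceptually similar to the following BigQuery-flavored SQL (a sketch only, with hypothetical table and column names; the component may generate different SQL depending on the selected predicate):

    -- Sketch: keep every store together with the name of the neighborhood it intersects.
    SELECT
      a.*,
      b.neighborhood_name
    FROM my_dataset.stores AS a
    JOIN my_dataset.neighborhoods AS b
      ON ST_INTERSECTS(a.geom, b.geom);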

    Spatial Match

    Description

    This component establishes the spatial relationship (contains, intersects, touches, etc.) between two sets of spatial objects. The component accepts a set of spatial objects from the Target Input and a set of spatial objects from the Universe Input. At least one input stream should contain Polygon type spatial objects.

    Inputs

    • Targets [Table]

    • Universe [Table]

• Target Spatial Object Field [Column]

• Universe Spatial Object Field [Column]

• Type of Spatial Match [Selection]

• Output Intersection Object (Intersects Only) [Boolean]

    Outputs

    • Result table [Table]

    ST Difference

    Description

    This component creates a new table containing the difference between the geographies of the input table and the combined boundary of the geographies in another table. That is, the part of the original geography that doesn't fall within the boundary of the ancillary table.

    Geographies in the secondary table must be of type polygon.

    Inputs

    • Source table [Table]

    • Polygons table [Table]

• Geo column in source table [Column]

• Geo column in polygons table [Column]

    Outputs

    • Result table [Table]

    External links

    ST Intersection

    Description

    This component creates a new table containing the intersection between the geographies of the input table and the geographies in another table.

Columns from both tables are copied to the output table, and a new column with the intersection is added.

    Inputs

    • Source table [Table]

    • Polygons table [Table]

• Geo column in source table [Column]

• Geo column in secondary table [Column]

    Outputs

    • Result table [Table]

    External links

    Union All

    Description

This component generates a new table that is the union of several input tables.

    The input tables must have the same schema.

    Inputs

    • Tables [Table]

    Outputs

    • Result table [Table]

    External links

    Workflows best practices

As you will soon see, Workflows can range from simple, with only a few steps, to large and complex, with many steps, sections, and outputs. So what are the best ways to manage your workflows and ensure that they are useful, readable, and manageable? Below are our tips for making Workflows work for you.

    Explore your data

Before you get started, and at each step of the process, make sure you explore your data to understand what is in the dataset, its columns, and its data types. This is important because:

• You may have some data that is in the wrong data type, such as a string that should be a number

• You may be missing a column

• You might have too many columns that you don't need

• You might have data nested in arrays or JSON

• You might have null or missing data that could cause issues later

There are certainly other instances that you might encounter in your data, but understanding the structure and limits of your data early on will help prevent issues later in your workflow.

    Snowflake ML

    Extension Package provided by CARTO

    The Snowflake ML extension package for CARTO Workflows includes a variety of components that enable users to integrate machine learning workflows with geospatial data. These components allow for creating, evaluating, explaining, forecasting, and managing ML models directly within CARTO Workflows, utilizing Snowflake ML’s capabilities.

    Get Model by Name

    Description

    This component imports a pre-trained model into the current workflow. If the name provided is not fully qualified, it will default to the connection's default database and the PUBLIC schema. The component assumes that the provided FQN points to an existing Snowflake ML model.

Settings

• Model's FQN: Fully qualified name for the model to be imported.

    Outputs

    • Output table: This component generates a single-row table with the FQN of the imported model.

    Create Classification Model

    Description

    This component trains a classification model on the provided input data. If the name provided is not fully qualified, it will default to the connection's default database and the PUBLIC schema.

    For more details, please refer to the SNOWFLAKE.ML.CLASSIFICATION official documentation in Snowflake.
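As an orientation, the statement issued by this component is roughly of the following form (a sketch based on the Snowflake documentation; model, table and column names are hypothetical, and the component may add further options):

    -- Sketch: train a Snowflake ML classification model on a table of labeled samples.
    CREATE OR REPLACE SNOWFLAKE.ML.CLASSIFICATION my_db.public.churn_model(
        INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'my_db.public.training_data'),
        TARGET_COLNAME => 'churned'
    );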

    Inputs

    • Input table: A data table that is used as input for the model creation.

    Settings

    • Model's FQN: Fully qualified name for the model to be saved as.

    • ID Column: Column containing a unique identifier per sample.

    • Target Column: Column to be used as label in the training data.

• Data Split: Whether or not to perform a train/evaluation split on the data. Choosing SPLIT is required to use the Evaluate component with the resulting model.

    • Test Fraction: Fraction of the data to reserve for evaluation. A 0.2 will reserve 20% of the data for evaluation. Only applies if the Data Split is SPLIT.

    Outputs

• Output table: This component generates a single-row table with the FQN of the created model.

    Create Forecasting Model

    Description

    This component trains a forecasting model on the provided input data. If the name provided is not fully qualified, it will default to the connection's default database and the PUBLIC schema.

    For more details, please refer to the SNOWFLAKE.ML.FORECAST official documentation in Snowflake.
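For orientation, the underlying statement is roughly of the following form (a sketch based on the Snowflake documentation; names are hypothetical, and the component may add further options such as the sampling frequency or aggregation settings):

    -- Sketch: train a Snowflake ML forecasting model on a table of historical series.
    CREATE OR REPLACE SNOWFLAKE.ML.FORECAST my_db.public.sales_forecast(
        INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'my_db.public.sales_history'),
        SERIES_COLNAME => 'store_id',
        TIMESTAMP_COLNAME => 'sale_date',
        TARGET_COLNAME => 'total_sales'
    );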

    Inputs

    • Input table: A data table that is used as input for the model creation.

    Settings

    • Model's FQN: Fully qualified name for the model to be saved as.

    • Time Series ID Column: Column containing a unique identifier per time series.

    • Timestamp Column: Column containing the series' timestamp in DATE or DATETIME format.

    • Target Column: Column to be used as target in the training data.

    • Consider exogenous variables: whether to consider exogenous variables for forecasting. If checked, the future values for the variables must be provided when forecasting. All variables in the input will be considered except the specified time series ID, timestamp and target column.

    • Method: which method to use when fitting the model. It can be best or fast.

    • Sample Frequency: the frequency of the time series. It can be auto or manual.

    • Period: number of units to define the sampling frequency. Only applies when the Sample Frequency has been set to manual.

    • Time Unit: time unit used to define the frequency. It can be seconds, minutes, hours, days, weeks, months, quarters, or years. Only applies when Sampling Frequency has been set to manual.

    • Aggregation (categorical): aggregation function used for categorical columns if needed due to the sampling frequency. It can be mode, first, or last.

    • Aggregation (numeric): aggregation function used for numeric columns if needed due to the sampling frequency. It can be mean, median, mode, min, max, sum, first, or last.

    • Aggregation (target): aggregation function used for the target column if needed due to the sampling frequency. It can be mean, median, mode, min, max, sum, first, or last.

    Outputs

• Output table: This component generates a single-row table with the FQN of the created model.

    Predict

    Description

    This component uses a pre-trained classification model (using Get Model by Name or Create Classification Model components) to perform predictions on some given input data.

    For more details, please refer to the !PREDICT function official documentation in Snowflake.
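For orientation, the prediction call is roughly equivalent to the following Snowflake SQL (a sketch; the model and table names are hypothetical, and the component also handles the 'Keep input columns' and 'ID Column' settings on top of this):

    -- Sketch: score every row of an input table with a trained classification model.
    SELECT
        *,
        my_db.public.churn_model!PREDICT(INPUT_DATA => OBJECT_CONSTRUCT(*)) AS prediction
    FROM my_db.public.new_customers;   -- columns are assumed to match the training features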

    Inputs

    • Model table: the pre-trained classification model.

    • Input table: A data table that is used as input for inference.

    Settings

    • Keep input columns: Whether to keep all the input columns in the output table.

    • ID Column: Column containing a unique identifier per sample. Only applies when Keep input columns is set to false.

    Outputs

    • Output table: The model's predictions.

    Forecast

    Description

    This component uses a pre-trained forecast model (using Get Model by Name or Create Forecasting Model components) to perform predictions on some given input data.

    For more details, please refer to the !FORECAST function official documentation in Snowflake.
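For orientation, a forecast without exogenous variables is roughly equivalent to the following Snowflake SQL (a sketch; the model name and number of periods are hypothetical):

    -- Sketch: produce a 14-period forecast from a trained forecasting model.
    CALL my_db.public.sales_forecast!FORECAST(FORECASTING_PERIODS => 14);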

    Inputs

• Model table: the pre-trained forecasting model.

    • Input table: A data table that is used as input for inference. Only needed if the model has been trained using exogenous variables.

    Settings

    • Consider exogenous variables: whether the model was trained to use exogenous variables or not.

    • Number of periods: number of periods to forecast per time series. Only applies if Consider exogenous variables is false.

    • Time Series ID Column: Column containing a unique identifier per time series. Only applies if Consider exogenous variables is true.

    • Timestamp Column: Column containing the series' timestamp in DATE or DATETIME format. Only applies if Consider exogenous variables is true.

    • Prediction Interval: Expected confidence of the prediction interval.

    • Keep input columns: Whether to keep all the input columns in the output table.

    Outputs

    • Output table: The model's predictions.

    Evaluate Classification

    Description

    This component returns some evaluation metrics for a pre-trained classification model using Get Model by Name or Create Classification Model components.

    For more details, please refer to the !SHOW_EVALUATION_METRICS and !SHOW_GLOBAL_EVALUATION_METRICS functions official documentation in Snowflake.
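For orientation, these metrics come from calls of roughly this form (a sketch; the model name is hypothetical):

    -- Sketch: per-class metrics vs. overall metrics for a trained classification model.
    CALL my_db.public.churn_model!SHOW_EVALUATION_METRICS();
    CALL my_db.public.churn_model!SHOW_GLOBAL_EVALUATION_METRICS();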

    Inputs

    • Model table: the pre-trained classification model.

    Settings

    • Class level metrics: Whether to obtain the per-class evaluation metrics or the overall evaluation metrics.

    Outputs

    • Output table: The model's evaluation metrics.

    Evaluate Forecast

    Description

    This component returns some evaluation metrics for a pre-trained forecast model using Get Model by Name or Create Forecasting Model components.

    For more details, please refer to the !SHOW_EVALUATION_METRICS function official documentation in Snowflake.

    Inputs

• Model table: the pre-trained forecasting model.

    • Input table (optional): additional out-of-sample data to compute the metrics on.

    Settings

    • Compute metrics on additional out-of-sample data: When checked, the component will compute cross-validation metrics on additional out-of-sample data. Otherwise, the component will return the metrics generated at training time.

    • Time Series ID Column: Column containing a unique identifier per time series. Only applies when the metrics are being computed on additional out-of-sample data.

    • Timestamp Column: Column containing the series' timestamp in DATE or DATETIME format. Only applies when the metrics are being computed on additional out-of-sample data.

    • Target Column: Column to use as label in the input data. Only applies when the metrics are being computed on additional out-of-sample data.

    • Prediction Interval: Expected confidence of the prediction interval. Only applies when the metrics are being computed on additional out-of-sample data.

    Outputs

    • Output table: The model's evaluation metrics.

    Feature Importance (Classification)

    This component displays the feature importances per variable of a pre-trained classification model.

For more details, please refer to Snowflake's !SHOW_FEATURE_IMPORTANCE function documentation.

    Inputs

    • Model table: the pre-trained classification model.

    Outputs

    • Output table: a table with the feature importance per variable.

    Feature Importance (Forecast)

    This component displays the feature importances per variable of a pre-trained forecast model.

For more details, please refer to Snowflake's !SHOW_FEATURE_IMPORTANCE function documentation.

    Inputs

    • Model table: the pre-trained forecast model.

    Outputs

    • Output table: a table with the feature importance per variable.


    Prune columns

Similar to the above, we highly suggest removing any columns you do not need in your final analysis, and doing the same throughout your workflow. Most components will add new columns, because we never want to modify the original data from the previous step; we cannot presume which columns you do or do not want to keep. As the Workflow progresses, more and more columns will be added. Because of this, we recommend using the Select component at various stages to pick the specific columns you need, keeping your data clean and easy to read.

    Casting to other data types

    Many times you may need to cast your data to other data types. For example a postal code could be stored as a numeric value rather than a text value. Make sure these are appropriately cast into:

    • The data format you want to keep

    • The data format required by the components you need to use

You can use the Cast component in Workflows to do this. Otherwise, you can cast using the Select component, selecting all columns using '*' and then using the SQL CAST syntax as follows: *, CAST(zipcode AS TEXT)
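For example, the expression typed into the Select component could look like the sketch below (the zipcode column is taken from the example above; the exact type name depends on your data warehouse, e.g. STRING in BigQuery, VARCHAR in Snowflake, TEXT in PostgreSQL or Redshift):

    *, CAST(zipcode AS STRING)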

    Preparing dates and timestamps

Similar to the point above, it is important to prepare dates or timestamps in the source data warehouse prior to running Workflows. Some functions only accept dates and others only accept timestamps, so ensuring you have the right data type will help make things smoother.

    Proper labeling

    Using the labeling tools in Workflows can help you keep your Workflow organized and help others understand what is happening in each step. There are two different approaches to labeling your Workflows:

    • Label small sections individually across the Workflow canvas

    • Label larger parts of the Workflow and then add smaller sections as necessary

    Doing this while you are constructing your Workflow will help when you revisit your work or share work with another user.

    Multiple Workflows for large operations

    As the size of your workflow increases there may come a point when you need to break your Workflow into smaller sections. While it is not impossible to make large Workflows, depending on the complexity it may in fact be easier to have Workflows broken into different steps. This helps in cases such as:

    • If the source data changes and is inconsistent this may break a part of your Workflow. Having smaller Workflows can help you identify the issue faster.

    • Smaller Workflows are likely to run faster

    Optimize result tables

Once you have designed a workflow, it is common to want to optimize the result table by creating a cluster over one or more columns.

Cluster and index optimization physically and/or logically co-locates records with similar values (the cluster column) to reduce the number of fetches or the access time through prefetching techniques.

These advanced options depend on the destination data warehouse and are available for any component that generates one or more output tables.

    Users can optimize their data by following these simple steps:

1. Create a model with one or more components generating an output table, and select that component (e.g. as in the model below).

2. Fill in all the component parameters. This is necessary to allow the final output schema to be generated, which populates the dropdown menu of the optimization column.

3. Select the column to be used for the optimization. The mechanism will depend on the data warehouse:

      • For BigQuery, Cluster by

      • For Snowflake, two settings: Search optimization column and Cluster by

      • For Redshift, Cluster by

      • For PostgreSQL, Index column and Cluster by index

BigQuery allows selecting a Cluster by column of the following types:

    • Geography

    • Geometry

    • Boolean

    • Number

    • String

    • Date

    • Datetime

    • Time

    • Timestamp

More info about BigQuery clustering here.
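In DDL terms, the optimization that Workflows applies to an output table is roughly equivalent to the following BigQuery sketch (names are hypothetical; Workflows creates the table for you, so you normally don't write this yourself):

    -- Sketch: create the output table clustered by its geography column.
    CREATE TABLE my_dataset.workflow_output
    CLUSTER BY geom
    AS SELECT * FROM my_dataset.intermediate_result;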

Snowflake allows selecting a Search optimization column of the following types:

    • Geography

    • Geometry

This restriction is because ST_GEOHASH-based ordering is used. More info about Snowflake ST_GEOHASH here and about the applied general search optimization here.

Whilst the Cluster by column can be selected among those with the following types:

• Boolean

• Number

• String

More info about Snowflake Cluster optimization here.

Redshift allows selecting a Cluster by column (used as SORTKEY) of the following types:

    • Boolean

    • Number

    • String

More info about Redshift SORTKEY optimization here.

PostgreSQL allows selecting an Index column of the following types:

    • Geography

    • Geometry

• Boolean

• Number

• String

A GIST index will be created on the selected column. More about indexes here.

Whilst the Cluster by index column can be selected among any type. More info about Postgres Cluster optimization here.

    Data Enrichment

    Components to enrich your data with variables from other data sources. These components work for simple features and spatial indexes grids.

    Enrich H3 Grid

    Description

    This component enriches a target table with data from a source. Enriching here means adding columns with aggregated data from the source that matches the target geographies.

    • The target, (which is the upper input connection of this component), must have a column that contains H3 indices, which will be used to join with the source.

    • The source (lower input connection) can be either a CARTO Data Observatory subscription or table (or result from other component) with a geography column.

    For the enrichment operation the CARTO Analytics Toolbox is required, and one of the following procedures will be called:

    • DATAOBS_ENRICH_GRID if the source is a Data Observatory subscription

    • ENRICH_GRID otherwise
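As a purely conceptual illustration (not the Analytics Toolbox procedure itself), enriching an H3 grid from a table of points resembles the following BigQuery-flavored sketch, where the table names, column names and resolution value are hypothetical:

    -- Sketch: aggregate source values onto the target H3 cells.
    SELECT
      t.h3,
      SUM(s.population) AS population_sum
    FROM my_dataset.target_grid AS t
    LEFT JOIN my_dataset.source_points AS s
      ON t.h3 = carto.H3_FROMGEOGPOINT(s.geom, 10)   -- index the source points at the target resolution
    GROUP BY t.h3;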

    Inputs

    • Target geo column: it's the column of the target that will be used to join with the source and select the rows that will be aggregated for each target row.

    • Source geo column: (only necessary for non-DO sources) is the column of the source that will join with the target.

• Variables: this allows selecting the data from the source that will be aggregated and added to the target.

  • For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their variable slug; for other sources they are the columns in the source table.

  • Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods. At the moment only numeric variables are supported.

    For spatially smoothed enrichments that take into account the surrounding cells, use the following input parameters:

    • Kring size: size of the k-ring where the decay function will be applied. This value can be 0, in which case no k-ring will be computed and the decay function won't be applied.

    • Decay function: decay function to aggregate and smooth the data. Supported values are uniform, inverse, inverse_square and exponential.

    Outputs

    • Result table [Table]

    Enrich Points

    Description

    This component enriches a target table with data from a source. Enriching here means adding columns with aggregated data from the source that matches (intersects) the target geographies.

    • The target, (which is the upper input connection of this component), must have a geo column, which will be used to intersect with the source.

    • The source (lower input connection) can be either a CARTO Data Observatory subscription or table (or result from other component) with a geo column.

    For the enrichment operation the CARTO Analytics Toolbox is required, and one of the following procedures will be called:

    • DATAOBS_ENRICH_POINTS if the source is a Data Observatory subscription

    • ENRICH_POINTS otherwise

    Inputs

    • Target geo column: it's the column of the target that will be used to intersect with the source and select the rows that will be aggregated for each target row.

    • Source geo column: (only necessary for non-DO sources) is the column of the source that will intersect with the target.

• Variables: this allows selecting the data from the source that will be aggregated and added to the target.

  • For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their variable slug; for other sources they are the columns in the source table.

  • Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods. At the moment only numeric variables are supported.

    Outputs

    • Result table [Table]

    Enrich Polygons

    Description

    This component enriches a target table with data from a source. Enriching here means adding columns with aggregated data from the source that matches (intersects) the target geographies.

    • The target, (which is the upper input connection of this component), must have a geo column, which will be used to intersect with the source.

    • The source (lower input connection) can be either a CARTO Data Observatory subscription or table (or result from other component) with a geo column.

    For the enrichment operation the CARTO Analytics Toolbox is required, and one of the following procedures will be called:

    • DATAOBS_ENRICH_POLYGONS if the source is a Data Observatory subscription

    • ENRICH_POLYGONS otherwise

    Inputs

    • Target geo column: it's the column of the target that will be used to intersect with the source and select the rows that will be aggregated for each target row.

    • Source geo column: (only necessary for non-DO sources) is the column of the source that will intersect with the target.

• Variables: this allows selecting the data from the source that will be aggregated and added to the target.

  • For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their variable slug; for other sources they are the columns in the source table.

  • Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods. At the moment only numeric variables are supported.

    Outputs

    • Result table [Table]

    Enrich Polygons with Weights

    Description

    This component uses a data source (either a table or a DO subscription) to enrich another target table using weights to control the enrichment.

    Inputs

    • Target table to be enriched

• Source table with data for the enrichment

    • Weights table with data to weight the enrichment

    Settings

    • Target polygons geo column: Select the column from the target table that contains a valid geography.

    • Source table geo column: Select the column from the source table that contains a valid geography.

• Variables: Select a list of variables and an aggregation method from the source table to be used to enrich the target table. Valid aggregation methods are:

  • SUM: It assumes the aggregated variable is an extensive property (e.g. population). Accordingly, the value corresponding to the intersected feature is weighted by the fraction of the intersected weight variable.

  • MIN: It assumes the aggregated variable is an intensive property (e.g. temperature, population density). Thus, the value is not altered by the weight variable.

  • MAX: It assumes the aggregated variable is an intensive property (e.g. temperature, population density). Thus, the value is not altered by the weight variable.

  • AVG: It assumes the aggregated variable is an intensive property (e.g. temperature, population density). A weighted average is computed, using the value of the intersected weight variable as weights.

  • COUNT: It computes the number of features that contain the enrichment variable and are intersected by the input geography.

    💡 The component will return an error if all variables selected are aggregated as MIN or MAX, since the result wouldn't actually be weighted.

    • Weights geo column: Select the column from the weights table that contains a valid geography.

    • Weights variable: Select one variable and aggregation operation to be used as weight for the enrichment.

    If your weight variables are included in the same table as the source variables, you can connect the same node to both inputs in this component.

    When the source for the enrichment is a standard table, the weights source can't be a DO subscription.

    The same limitation applies when the source for the enrichment is a DO subscription; the weights source can't be a standard table.

    Outputs

    • Output table with the following schema

      • All columns from Target

      • A column from each variable in 'Variables', named like 'name_sum', 'name_avg', 'name_max' depending on the original column name and the aggregation method.

    Enrich Quadbin Grid

    Description

    This component enriches a target table with data from a source. Enriching here means adding columns with aggregated data from the source that matches the target geographies.

    • The target, (which is the upper input connection of this component), must have a column that contains Quadbin indices, which will be used to join with the source.

    • The source (lower input connection) can be either a CARTO Data Observatory subscription or table (or result from other component) with a geography column.

    For the enrichment operation the CARTO Analytics Toolbox is required, and one of the following procedures will be called:

    • DATAOBS_ENRICH_GRID if the source is a Data Observatory subscription

    • ENRICH_GRID otherwise

    Inputs

    • Target geo column: it's the column of the target that will be used to join with the source and select the rows that will be aggregated for each target row.

    • Source geo column: (only necessary for non-DO sources) is the column of the source that will join with the target.

• Variables: this allows selecting the data from the source that will be aggregated and added to the target.

  • For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their variable slug; for other sources they are the columns in the source table.

  • Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods. At the moment only numeric variables are supported.

    For spatially smoothed enrichments that take into account the surrounding cells, use the following input parameters:

    • Kring size: size of the k-ring where the decay function will be applied. This value can be 0, in which case no k-ring will be computed and the decay function won't be applied.

    • Decay function: decay function to aggregate and smooth the data. Supported values are uniform, inverse, inverse_square and exponential.

    Outputs

    • Result table [Table]

    Tileset Creation

    Components to create tilesets out of vector and spatial index data.

    These components require the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

Valid table names for the tileset's destination:

Provided FQNs need to follow the naming conventions of each data warehouse. Invalid names (according to each data warehouse's constraints) will make the component fail. Check the documentation for each: BigQuery, Snowflake, Databricks, Redshift and PostgreSQL.

    Create H3 Aggregation Tileset

    Description

    This component creates an aggregation tileset from an input containing H3 cell indexes. On each level of the tileset, cells will be aggregated into their parents, along with their associated properties.

    Inputs

    • Input table: This component expects a table that has at least one column containing H3 indexes.

    Settings

    • Tileset destination: select a destination that is available on your connection to store the resulting tileset table.

• H3 column: The column in the input table that contains the H3 index.

    • H3 input data resolution: It is required to manually enter the actual resolution of the indexes contained in your input.

• Minimum H3 tileset resolution: Your tileset will be generated at a range of different H3 resolutions. This setting specifies the minimum resolution in that range. Take into account that H3 resolutions don't match exactly the zoom levels in a web map.

• Maximum H3 tileset resolution: Your tileset will be generated at a range of different H3 resolutions. This setting specifies the maximum resolution in that range. Take into account that H3 resolutions don't match exactly the zoom levels in a web map.

• Aggregation resolution: On each resolution level in the range set by the minimum and maximum tileset resolution, cells will be aggregated into their parent cells, along with their properties. This setting controls the granularity of that aggregation. For example, let's take a dataset that contains H3 cells of resolution 14. When generating the lower resolution levels of the tileset (bigger cells), the original cells are smaller than a pixel on your screen and can't be visualized. In this case, assuming that 'Aggregation resolution' is set to 5:

  • at resolution 4, the original cells are going to be aggregated into their level 9 parents (4+5);

  • at resolution 5, original cells will be aggregated into their level 10 parents (5+5);

  • at resolution 6, original cells will be aggregated into their level 11 parents (6+5).

  • In general, cells will be aggregated into a resolution level that is the result of: current resolution level + aggregation resolution.

    Taking the above into consideration, bear this formula in mind:

    Maximum H3 tileset resolution + Aggregation resolution <= H3 input data resolution

    If your settings don't follow this rule, you will get an error when generating the tileset.

    • Properties: Select a property from the input columns and an aggregation method to add aggregated properties to the tileset. The aggregation will be computed for every cell on each resolution level, considering the child cells in the input's data.

• Add custom expression: Check this option to define additional custom properties. The custom properties are specified on a JSON object that needs to follow this format: { "new_prop_name": { "formula": "COUNT(*)", "type": "Number" }, "most_common_category": { "formula": "APPROX_TOP_COUNT(category, 1)[OFFSET(0)].value", "type": "String" }, ... } The expression in formula needs to follow your data warehouse's SQL syntax.

    Advanced options

• Metadata: Use this setting (as a JSON input) to define "name", "description" and "legend" metadata objects to be included in the TileJSON. Other custom objects with arbitrary names can also be added, resulting in an "extra_metadata" object in the TileJSON.

    Output

    This component doesn't have an output that can be connected to other nodes downstream. The generated tileset will be stored on the specified destination table.

    Create Point Aggregation Tileset

    Description

    This component creates a tileset from an input containing Points. It will aggregate points close to each other and their properties.

    Inputs

• Input table: This component expects a table that has at least one geospatial column containing points, of type GEOMETRY or GEOGRAPHY depending on the connection provider.

    Settings

    • Tileset destination: select a destination that is available on your connection to store the resulting tileset table.

    • Geo column: A column in the input table that contains the Points.

    • Minimum zoom level: Your tileset will be generated at a range of different zoom levels. This setting specifies the lower limit in that range.

• Maximum zoom level: Your tileset will be generated at a range of different zoom levels. This setting specifies the higher limit in that range.

• Aggregation resolution: On each zoom level in the range set with the options above, points will be aggregated into a grid, along with their properties. This setting controls the granularity of that aggregation. In other words, it controls the size of the grid cells used to compute the aggregation. For example, assuming the Aggregation resolution is set to 8:

  • at zoom level 6, the points will be aggregated into a grid where cells are equivalent to tiles of zoom level 14 (6+8);

  • at zoom level 7, the points will be aggregated into a grid where cells are equivalent to tiles of zoom level 15 (7+8);

  • at zoom level 8, the points will be aggregated into a grid where cells are equivalent to tiles of zoom level 16 (8+8).

  • In general, the size of the grid's cells is going to be determined by: current zoom level + aggregation resolution.

• Aggregation placement: Since points are aggregated into a grid, there are different options for the resulting placement:

  • cell-centroid: The resulting feature in the tileset is a point at the cell's centroid.

  • cell: The resulting feature in the tileset is the cell's boundary.

  • features-any: The resulting feature in the tileset is a random point from the ones that are being aggregated on that cell. This helps avoid a gridded effect on the visualization.

  • features-centroid: The resulting feature in the tileset is a point that represents the centroid of the points aggregated on that cell. This helps avoid a gridded effect, while keeping a sense of the distribution of points within the cell.

• Properties: Select a property from the input columns and an aggregation method to add aggregated properties to the tileset. The aggregation will be computed for every cell on each resolution level, considering the child cells in the input's data.

• Add custom expression: Check this option to define additional custom properties. The custom properties are specified on a JSON object that needs to follow this format: { "new_prop_name": { "formula": "COUNT(*)", "type": "Number" }, "most_common_category": { "formula": "APPROX_TOP_COUNT(category, 1)[OFFSET(0)].value", "type": "String" }, ... } The expression in formula needs to follow your data warehouse's SQL syntax.

    Advanced options

    • Minimum zoom column: A column that controls the minimum zoom level where that specific point should be considered to be included in the tile.

    • Maximum zoom column: A column that controls the maximum zoom level where that specific point should be considered to be included in the tile.

• Maximum features per tile: The maximum number of features that can be included in the tileset. This setting is applied after the aggregation is computed. When the limit is reached, points are going to be dropped from the tile. This setting is not available in all data warehouses.

• Maximum kilobytes per tile: The maximum resulting size in kilobytes. This limit is based on an estimation that only considers the size of the geometries and not the included properties. When the limit is reached, points are going to be dropped from the tile. This setting is not available in all data warehouses.

• Tile feature order: When any of the limits above is reached, points are dropped from the tile. This setting expects a SQL expression to define the order of priority on the resulting points. The higher in the ordering, the less likely to be dropped.

• Metadata: Use this setting (as a JSON input) to define "name", "description" and "legend" metadata objects to be included in the TileJSON. Other custom objects with arbitrary names can also be added, resulting in an "extra_metadata" object in the TileJSON.

    Output

    This component doesn't have an output that can be connected to other nodes downstream. The generated tileset will be stored on the specified destination table.

    Create Quadbin Aggregation Tileset

    Description

    This component creates an aggregation tileset from an input containing Quadbin cell indexes. On each level of the tileset, cells will be aggregated into their parents, along with their associated properties.

    Inputs

• Input table: This component expects a table that has at least one column containing Quadbin indexes.

    Settings

    • Tileset destination: select a destination that is available on your connection to store the resulting tileset table.

• Quadbin column: The column in the input table that contains the Quadbin index.

    • Quadbin input data resolution: It is required to manually enter the actual resolution of the indexes contained in your input.

• Minimum Quadbin tileset resolution: Your tileset will be generated at a range of different Quadbin resolutions. This setting specifies the minimum resolution in that range. Take into account that Quadbin resolution levels are equivalent to zoom levels on a CARTO map.

• Maximum Quadbin tileset resolution: Your tileset will be generated at a range of different Quadbin resolutions. This setting specifies the maximum resolution in that range.

• Aggregation resolution: On each resolution level in the range set by the minimum and maximum tileset resolution, cells will be aggregated into their parent cells, along with their properties. This setting controls the granularity of that aggregation. For example, let's take a dataset that contains Quadbin cells of resolution 14. When generating the lower resolution levels of the tileset (bigger cells), the original cells are smaller than a pixel on your screen and can't be visualized. In this case, assuming that 'Aggregation resolution' is set to 5:

  • at resolution 4, the original cells are going to be aggregated into their level 9 parents (4+5);

  • at resolution 5, original cells will be aggregated into their level 10 parents (5+5);

  • at resolution 6, original cells will be aggregated into their level 11 parents (6+5).

  • In general, cells will be aggregated into a resolution level that is the result of: current resolution level + aggregation resolution.

    Taking the above into consideration, bear this formula in mind:

    Maximum Quadbin tileset resolution + Aggregation resolution <= Quadbin input data resolution

    If your settings don't follow this rule, you will get an error when generating the tileset.

    • Properties: Select a property from the input columns and an aggregation method to add aggregated properties to the tileset. The aggregation will be computed for every cell on each resolution level, considering the child cells in the input's data.

• Add custom expression: Check this option to define additional custom properties. The custom properties are specified on a JSON object that needs to follow this format: { "new_prop_name": { "formula": "COUNT(*)", "type": "Number" }, "most_common_category": { "formula": "APPROX_TOP_COUNT(category, 1)[OFFSET(0)].value", "type": "String" }, ... } The expression in formula needs to follow your data warehouse's SQL syntax.

    Advanced options

• Metadata: Use this setting (as a JSON input) to define "name", "description" and "legend" metadata objects to be included in the TileJSON. Other custom objects with arbitrary names can also be added, resulting in an "extra_metadata" object in the TileJSON.

    Output

    This component doesn't have an output that can be connected to other nodes downstream. The generated tileset will be stored on the specified destination table.

    Create Vector Tileset

    Description

    This component creates a tileset from an input containing Point, Lines or Polygons.

    Inputs

    • Input table: This component expects a table that has at least one geospatial column of type GEOMETRY or GEOGRAPHY depending on the connection provider.

    Settings

    • Tileset destination: select a destination that is available on your connection to store the resulting tileset table.

• Geo column: A column in the input table that contains the geometries.

    • Minimum zoom level: Your tileset will be generated at a range of different zoom levels. This setting specifies the lower limit in that range.

• Maximum zoom level: Your tileset will be generated at a range of different zoom levels. This setting specifies the higher limit in that range.

• Properties: Select a list of properties from the input columns.

    Advanced options

    • Minimum zoom column: A column that controls the minimum zoom level where that specific point should be considered to be included in the tile.

    • Maximum zoom column: A column that controls the maximum zoom level where that specific point should be considered to be included in the tile.

• Maximum features per tile: The maximum number of features that can be included in the tileset. When the limit is reached, features are going to be dropped from the tile. This setting is not available in all data warehouses.

• Maximum kilobytes per tile: The maximum resulting size in kilobytes. This limit is based on an estimation that only considers the size of the geometries and not the included properties. When the limit is reached, features are going to be dropped from the tile. This setting is not available in all data warehouses.

• Tile feature order: When any of the limits above is reached, features are dropped from the tile. This setting expects a SQL expression to define the order of priority on the resulting features. The higher in the ordering, the less likely to be dropped.

• Generate this tileset as a custom boundary for CARTO + deck.gl: Check this option if your tileset is going to be used as a boundary. Learn more about Boundaries.

• Metadata: Use this setting (as a JSON input) to define "name", "description" and "legend" metadata objects to be included in the TileJSON. Other custom objects with arbitrary names can also be added, resulting in an "extra_metadata" object in the TileJSON.

    Output

    This component doesn't have an output that can be connected to other nodes downstream. The generated tileset will be stored on the specified destination table.

    Data Sources

    In the Sources explorer you can find all the data sources available via the data warehouse connection that you have selected when creating the workflow. Here you can learn more about how to create your data warehouse connections in CARTO.

    There are three different types of data source: Connection data, Data Observatory and Files

    Connection data

    Note

    In case you are using a service account to connect to Google BigQuery, note that it must have at least BigQuery Data Editor (roles/bigquery.dataEditor) permissions in order to edit schemas, create and edit tables, and get information regarding jobs.

    It is important to note that currently each workflow can only be associated with one single data warehouse connection and that it is defined at the time of creating the workflow (without being able to modify it afterwards).

    Through "Connection data" you will have access to tables in schemas/datasets in your data warehouse. These sources can be used in workflows by just adding them to the canvas.

While it is supported to insert or update data in your table sources, modifying the schema of these tables (such as adding or removing columns, or changing data types) may cause unexpected behavior or errors during execution, as the schema won't be automatically updated in the workflow.

If you expect schema changes in your tables, it is recommended to use a Custom SQL component and explicitly specify the column names to reflect those changes throughout the workflow.

    CARTO Data Warehouse in non-US regions

    If your CARTO organization has been provisioned in a cloud region outside of the US (Europe-West, Asia-Northeast or Australia-Southeast), you will be able to select between two different options in your Sources panel in Workflows:

    • Your organization's region. You will only have access to your own 'shared' and 'private' datasets within 'organization data'.

• CARTO's region (US). This one will grant access to a different 'organization data' provisioned in the US region and also to CARTO's 'demo data'.

    Files

If you want to use data that is not available via the data warehouse connection used in the workflow, you always have the possibility to drag and drop files to your workflow at any time, which will get imported to your data warehouse via the established connection. Please find the list of supported file formats and size limits here.

    The files added by drag and drop, or selected from your browser will be stored and remain accessible from the same Workflow.

You can star connections, databases/projects/schemas and tables to quickly access them later. For more information, see Starring items for quick access.

    Analytics on Embeddings

    An extension package for CARTO Workflows that provides comprehensive embedding analytics capabilities. This package enables users to analyze, cluster, compare, and visualize high-dimensional vector embeddings derived from spatial data, satellite imagery, or any other geospatial data source.

    All columns in the input table (except the ID column and date column, if specified) will be automatically used as embedding vector dimensions.

    All components in this extension package work with embeddings resulting from using the components in the extension package. They also work with other embeddings as inputs, as long as each embedding feature is stored on a column (as opposed to an array of vectors).

    PostgreSQL

  • Maximum H3 tileset resolution: Your tileset will be generated at a range of different H3 resolutions. This setting specifies the maximum resolution in that range. Take into account that H3 resolutions don't match exactly the zoom levels in a web map.

  • Aggregation resolution: On each resolution level in the range set by the minimum and maximum tileset resolution, cells will be aggregated into their parent cells, along with their properties. This setting controls the granularity of that aggregation. For example, let's take a dataset that contains H3 cells of resolution 14. When generating the lower resolution levels of the tileset (bigger cells), the original cells are smaller than a pixel on your screen and can't be visualized. In this case, assuming that 'Aggregation resolution' is set to 5:

    • at resolution 4, the original cells are going to be aggregated into their level 9 parents (4+5);

    • at resolution 5, original cells will be aggregated into their level 10 parents (5+5)s;

    • at resolution 6, original cells will be aggregated into their level 11 parents (6+5).

    • In general, cells will be aggregated into a resolution level that is the result of: current resolution level + aggregation resolution.

  • "formula": "APPROX_TOP_COUNT(category, 1)[OFFSET(0)].value",
    "type": "String"
    },
    ...
    }
    The expression in
    formula
    needs to follow your data warehouse's SQL syntax.
  • Aggregation resolution: On each zoom level in the range set with the options above, points will be aggregated into a grid, along with their properties. This setting controls the granularity of that aggregation. In other words, it controls the the size of the grid cells used to compute the aggregation. For example, assuming the Aggregation resolution is set to 8:

    • at zoom level 6, the points will be aggregated into a grid where cells are equivalent to tiles of zoom level 14 (6+8).

    • At zoom level 7, the points will be aggregated into a grid where cells are equivalent to tiles of zoom level 15 (7+8).

    • At zoom level 8, the points will be aggregated into a grid where cells are equivalent to tiles of zoom level 16 (8+8).

    • In general, the size of the grid's cells is going to be determined by current zoom level + aggregation resolution.

  • Aggregation placement: Since points are aggregated into a grid, there are different options for the resulting placement:

    • cell-centroid: The resulting feature in the tileset is a point a the cell's centroid.

    • cell: The resulting feature in the tileset is the cell's boundary.

    • features-any: The resulting feature in the tileset is a random point from the ones that are being aggregated on that cell. This helps avoiding a gridded effect on the visualization

    • features-centroid: The resulting feature in the tileset is a point that represents the centroid of the points aggregated on that cell. This helps avoiding a gridded effect, while keeping a sense of the distribution of points within the cell.

  • Properties: Select a property from the input columns and an aggregation method to add aggregated properties to the tileset. The aggregation will be computed for every cell on each resolution level, considering the child cells in the input's data.

  • Add custom expression: Check this option to define additional custom properties. The custom properties are specified on a JSON object that needs to follow this format: { "new_prop_name": { "formula": "COUNT(*)", "type": "Number" }, "most_common_category": { "formula": "APPROX_TOP_COUNT(category, 1)[OFFSET(0)].value", "type": "String" }, ... } The expression in formula needs to follow your data warehouse's SQL syntax.

  • This setting is not available in all data warehouses.

  • Maximum kilobytes per tile: The maximum resulting size in kilobytes. This limit is based on an estimation that only considers the size of the geometries and not the included properties. When the limit is reached, points are going to be dropped from the tile.

    • This setting is not available in all data warehouses.

  • Tile feature order: When any of the limits above is reached, points are dropped from the tile. This setting expects a SQL expression to define the order of priority on the resulting points. The higher in the ordering, the less likely to be dropped.

  • Metadata: Use this setting (as a JSON input) to define "name", "description" and "legend" metata objects to be included in the TileJSON. Other custom objects with arbitrary names can also be added, resulting in a "extra_metadata" object in the TileJSON.

  • Maximum Quadbin tileset resolution: Your tileset will be generated at a range of different Quadbin resolutions. This setting specifies the maximum resolution in that range.

  • Aggregation resolution: On each resolution level in the range set by the minimum and maximum tileset resolution, cells will be aggregated into their parent cells, along with their properties. This setting controls the granularity of that aggregation. For example, let's take a dataset that contains Quadbin cells of resolution 14. When generating the lower resolution levels of the tileset (bigger cells), the original cells are smaller than a pixel on your screen and can't be visualized. In this case, assuming that 'Aggregation resolution' is set to 5:

    • at resolution 4, the original cells are going to be aggregated into their level 9 parents (4+5);

    • at resolution 5, original cells will be aggregated into their level 10 parents (5+5)s;

    • at resolution 6, original cells will be aggregated into their level 11 parents (6+5).

    • In general, cells will be aggregated into a resolution level that is the result of: current resolution level + aggregation resolution.

  • "formula": "APPROX_TOP_COUNT(category, 1)[OFFSET(0)].value",
    "type": "String"
    },
    ...
    }
    The expression in
    formula
    needs to follow your data warehouse's SQL syntax.
  • Properties: Select a list of properties from the input columns.

  • This setting is not available in all data warehouses.

  • Maximum kilobytes per tile: The maximum resulting size in kilobytes. This limit is based on an estimation that only considers the size of the geometries and not the included properties. When the limit is reached, points are going to be dropped from the tile.

    • This setting is not available in all data warehouses.

  • Tile feature order: When any of the limits above is reached, points are dropped from the tile. This setting expects a SQL expression to define the order of priority on the resulting points. The higher in the ordering, the less likely to be dropped.

  • Generate this tileset as a custom boundary for CARTO + deck.gl: Check this option if your tileset is going to be used as a boundary. Learn more about Boundaries.

  • Metadata: Use this setting (as a JSON input) to define "name", "description" and "legend" metata objects to be included in the TileJSON. Other custom objects with arbitrary names can also be added, resulting in a "extra_metadata" object in the TileJSON.

  • BigQuery
    Snowflake
    Databricks
    Redshift
    Custom SQL component
    here
    Starring items for quick access
    column can be selected among that with the following types:
    • Boolean

    • Number

    • String

    More info about Snowflake Cluster optimization here.

    .

    Number

  • String

  • on which create GIST index. More about Indexes here.

    Whilst Cluster by index column can be selected among any types. More info about Posgtgres Cluster optimization here.

    here
    here
    here
    here
  • For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their variable slug;

  • for other sources they are the columns in the source table.

  • Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods. At the moment only numeric variables are supported.

  • For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their variable slug;

  • for other sources they are the columns in the source table.

  • Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods. At the moment only numeric variables are supported.

  • For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their variable slug;

  • for other sources they are the columns in the source table.

  • Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods. At the moment only numeric variables are supported.

    SUM
    : It assumes the aggregated variable is an
    (e.g. population). Accordingly, the value corresponding to the feature intersected is weighted by the fraction of the intersected weight variable.
  • MIN: It assumes the aggregated variable is an intensive property (e.g. temperature, population density). Thus, the value is not altered by the weight variable.

  • MAX: It assumes the aggregated variable is an intensive property (e.g. temperature, population density). Thus, the value is not altered by the weight variable.

  • AVG: It assumes the aggregated variable is an intensive property (e.g. temperature, population density). A weighted average is computed, using the value of the intersected weight variable as weights.

  • COUNT: It computes the number of features that contain the enrichment variable and are intersected by the input geography.
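
    To make the distinction between extensive and intensive variables more concrete, here is a minimal, illustrative sketch of such an enrichment in BigQuery-flavored SQL. This is not the component's actual implementation; table and column names are hypothetical.

```sql
-- Illustrative only: areal-weighted enrichment of target polygons from an
-- intersecting source layer (hypothetical tables and columns).
SELECT
  t.geoid,
  -- Extensive variable (e.g. population, SUM): weighted by the fraction of the
  -- source feature that is intersected.
  SUM(s.population * ST_AREA(ST_INTERSECTION(t.geom, s.geom)) / ST_AREA(s.geom)) AS population_sum,
  -- Intensive variable (e.g. population density, AVG): weighted average using
  -- the intersected area as the weight.
  SUM(s.pop_density * ST_AREA(ST_INTERSECTION(t.geom, s.geom)))
    / SUM(ST_AREA(ST_INTERSECTION(t.geom, s.geom))) AS pop_density_avg
FROM `project.dataset.target_polygons` AS t
JOIN `project.dataset.source_polygons` AS s
  ON ST_INTERSECTS(t.geom, s.geom)
GROUP BY t.geoid
```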


    Change Detection

    Description

    Analyzes temporal changes in embeddings by comparing vector representations across different time periods. It calculates distance metrics between consecutive time points for each location, enabling the identification of significant changes in spatial patterns, environmental conditions, or other time-varying characteristics encoded in the embeddings.

    This component is particularly useful for monitoring temporal evolution, detecting anomalies, tracking environmental changes, and identifying locations that have undergone significant transformations over time.

    Inputs

    • Embedding table: Table containing the embeddings with associated timestamps and location identifiers.

    Settings

    • ID column: Column containing unique identifiers for each embedding. This column is used to track changes for the same entity across different time periods.

    • Date column: Column containing the timestamps or dates associated with each embedding. This column is used to order the data chronologically and identify consecutive time periods for comparison.

    • Distance metric: Distance function used to compare vectors across time (default: Cosine).

      • Cosine: Measures the alignment between two vectors based on the cosine of the angle between them. Values closer to 0 indicate high similarity in direction, while values closer to 2 indicate larger change. This metric is ideal when you want to focus on the direction of change rather than magnitude.

      • Euclidean: Measures the straight-line distance between vectors in the embedding space. Values closer to 0 indicate high similarity, while larger values indicate greater changes. This metric considers both direction and magnitude of change, making it suitable when the absolute values of the embeddings are meaningful.

    • Comparison method: Determines how embeddings are compared across time (default: Rolling pairs).

      • Rolling pairs: Compares the embeddings of each timestamp with those of the previous time period, allowing you to track incremental changes over time. This mode is ideal for detecting gradual changes and trends.

      • Cross pairs: Compares the embeddings of each timestamp with all remaining time periods, providing a broader view of how a location evolves relative to any other time point. This mode is useful for identifying significant transformations and outliers.

    • Reduce embedding dimensionality: Whether to use Principal Component Analysis (PCA) to reduce the dimensionality of the vector embeddings before change detection (default: false). Enabling this can improve performance and reduce noise by focusing on the most significant patterns in your data.

    • PCA explained variance: Variance threshold for PCA dimensionality reduction (0.01-0.99, default: 0.9). This determines how much of the original variance should be preserved. Higher values (closer to 1.0) retain more information but may include more noise, while lower values focus on the most important patterns, with loss of information.

    Outputs

    • Output table: The change detection table with the following columns:

      • <geoid>: Original identifier for each location/entity.

      • <date>_from: Starting timestamp for the change period.

      • <date>_to: Ending timestamp for the change period.

      • distance: Calculated distance metric between the embeddings at the two time points.

      • change_score: Measure that quantifies the degree of change between two embedding vectors for the same location across different time periods. 0 means no change.

    • Metrics table: The metrics table with the following columns:

      • total_explained_variance_ratio: The proportion of variance in the original data that is explained by the PCA dimensionality reduction. This metric is particularly useful when PCA dimensionality reduction is applied, showing how much of the original information is preserved in the reduced dimensions.

      • number_of_principal_components: The number of principal components extracted from the PCA.

    The component requires complete temporal coverage for each location - every geoid must have embedding values for each date in the dataset. This ensures accurate change detection by maintaining consistent temporal baselines for comparison.
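
    For reference, the Rolling pairs comparison with the Distance metric setting described above corresponds roughly to SQL like the following. This is a minimal, illustrative sketch in BigQuery-flavored SQL; the table and column names, and the ARRAY<FLOAT64> embedding representation, are assumptions.

```sql
-- Illustrative sketch only (BigQuery-flavored SQL). Assumes a table with a geoid
-- column, a date column and an ARRAY<FLOAT64> embedding column; names are hypothetical.
WITH ordered AS (
  SELECT
    geoid,
    date,
    embedding,
    LEAD(date) OVER (PARTITION BY geoid ORDER BY date) AS next_date
  FROM `project.dataset.embeddings`
)
SELECT
  cur.geoid,
  cur.date      AS date_from,
  cur.next_date AS date_to,
  ML.DISTANCE(cur.embedding, nxt.embedding, 'COSINE')    AS cosine_distance,
  ML.DISTANCE(cur.embedding, nxt.embedding, 'EUCLIDEAN') AS euclidean_distance
FROM ordered AS cur
JOIN `project.dataset.embeddings` AS nxt
  ON nxt.geoid = cur.geoid
 AND nxt.date  = cur.next_date  -- "Rolling pairs": each timestamp compared with the next one
```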

    Clustering

    Description

    Uses BigQuery ML's K-means algorithm to group embeddings into k clusters, identifying locations with similar spatial patterns or features. It optionally runs dimensionality reduction via Principal Component Analysis to improve clustering performance and reduce computational cost by focusing on the most significant patterns.

    This component is particularly useful for discovering natural groupings in spatial data, segmenting areas based on their characteristics, and identifying outliers or distinct regions.

    Inputs

    • Embedding table: Table containing the embeddings to cluster.

    Settings

    • ID column: Column containing unique identifiers for each embedding. This column is used to maintain the relationship between input embeddings and their assigned clusters in the output.

    • Analysis type: Determines whether the analysis is spatial or spatio-temporal (default: Spatial).

      • Spatial: Analyzes embeddings at a single point in time or without temporal considerations. Use this for static spatial pattern analysis.

      • Spatio-temporal: Analyzes embeddings across multiple time periods, enabling detection of temporal changes in spatial patterns. Requires a date column to be specified.

    • Date column: Column containing temporal information for spatio-temporal analysis (required when Analysis type is Spatio-temporal). This column should contain date, datetime, timestamp, or numeric time values that identify when each embedding was captured.

    • Number of clusters: Number of clusters to create (2-100, default: 5). This parameter determines how many groups the algorithm will divide your embeddings into.

    • Distance type: Distance metric used for clustering calculations (default: EUCLIDEAN).

      • EUCLIDEAN: Measures straight-line distance between points in the embedding space. Best for embeddings where all dimensions have similar scales and importance.

      • COSINE: Measures the angle between vectors, ignoring magnitude. Ideal for embeddings where the direction matters more than the absolute values, such as normalized feature vectors.

    • Reduce embedding dimensionality: Whether to apply Principal Component Analysis (PCA) before clustering (default: false). Enabling this can improve performance and reduce noise by focusing on the most significant patterns in your data.

    • PCA explained variance: Variance threshold for PCA dimensionality reduction (0.01-0.99, default: 0.9). This determines how much of the original variance should be preserved. Higher values (closer to 1.0) retain more information but may include more noise, while lower values focus on the most important patterns, with loss of information.

    • Maximum iterations: Maximum number of iterations for the K-means algorithm (1-1000, default: 20). The algorithm will stop after this many iterations even if it hasn't converged. Increase this value for complex datasets that may need more iterations to find optimal clusters.

    • Early stopping: Whether to stop the algorithm early if it converges before reaching the maximum iterations (default: true). Enabling this improves performance by stopping as soon as cluster assignments stabilize, but may miss optimal solutions in some cases.

    • Minimum relative progress: Threshold for early stopping convergence detection (0.001-0.1, default: 0.01). The algorithm stops when the improvement between iterations falls below this threshold. Lower values allow more precise convergence but may require more iterations.

    Outputs

    • Output table: The clustering table with the following columns:

      • <geoid>: Original identifier for each embedding.

      • cluster_id: Assigned cluster ID (1 to k).

      • distance_to_centroid: Distance from the embedding to its cluster centroid.

    • Metrics table: The metrics table with the following columns:

      • davies_bouldin_index: A clustering quality metric that measures the ratio of within-cluster scatter to between-cluster separation. Lower values indicate better clustering quality, with values closer to 0 representing well-separated, compact clusters.

      • mean_squared_distance: The average squared distance from each point to its assigned cluster centroid. This metric helps assess how tightly packed the clusters are - lower values indicate more compact clusters.
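
    Since this component wraps BigQuery ML's K-means, the settings above map roughly onto statements like the following. This is an illustrative sketch only; model, table and column names are hypothetical, and the component's actual SQL may differ.

```sql
-- Illustrative sketch of the kind of BigQuery ML statements this component wraps.
CREATE OR REPLACE MODEL `project.dataset.embedding_kmeans`
OPTIONS (
  model_type = 'KMEANS',
  num_clusters = 5,            -- "Number of clusters"
  distance_type = 'EUCLIDEAN', -- "Distance type"
  max_iterations = 20,         -- "Maximum iterations"
  early_stop = TRUE,           -- "Early stopping"
  min_rel_progress = 0.01      -- "Minimum relative progress"
) AS
SELECT * EXCEPT (geoid)
FROM `project.dataset.embeddings`;

-- Assign each embedding to a cluster and compute its distance to the assigned centroid.
SELECT
  geoid,
  centroid_id AS cluster_id,
  (SELECT MIN(d.distance) FROM UNNEST(nearest_centroids_distance) AS d) AS distance_to_centroid
FROM ML.PREDICT(
  MODEL `project.dataset.embedding_kmeans`,
  (SELECT * FROM `project.dataset.embeddings`));
```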

    Similarity Search

    Description

    Identifies regions with comparable spatial or contextual characteristics by leveraging spatially aware embeddings. It compares the embeddings of one or more reference locations against a set of search location embeddings to identify areas that share similar spatial patterns or features. This component supports any embedding representation that encodes spatial relationships and can be applied across a wide range of domains, including remote sensing, environmental monitoring, and spatial data analysis.

    This component is particularly useful for finding locations with similar characteristics, identifying patterns across different regions, discovering areas that match specific reference conditions, and conducting spatial similarity analysis at scale.

    Inputs

    • Reference location(s): Table containing vector(s) representing the baseline data or items to compare against.

    • Search locations: Table containing vectors for which similarity to the reference set will be computed.

    Settings

    • Reference location(s) ID column: Column containing the IDs of the reference vector(s). This column is used to maintain the relationship between input embeddings and their similarity scores in the output.

    • Search location(s) ID column: Column containing the IDs of the search vector(s). This column is used to maintain the relationship between input embeddings and their similarity scores in the output.

    • Distance metric: Distance function used to compare vectors (default: Cosine).

      • Cosine: Measures the alignment between two vectors based on the cosine of the angle between them. Values closer to 0 indicate high similarity in direction, while values closer to 2 indicate high dissimilarity. This metric is ideal when you want to focus on the direction of change rather than magnitude.

      • Euclidean: Measures the straight-line distance between vectors in the embedding space. Values closer to 0 indicate high similarity, while larger values indicate greater dissimilarity. This metric considers both direction and magnitude, making it suitable when the absolute values of the embeddings are meaningful.

    • Aggregate across reference locations: Whether the similarity results from multiple reference locations should be aggregated into a single output (default: true). When enabled, the component computes similarity maps for each reference location and returns their mean or maximum, producing one consolidated similarity result.

    • Aggregate function: The function to use when aggregating similarity results from multiple reference locations (default: AVG).

      • AVG: Returns the average similarity score across all reference locations.

      • MAX: Returns the maximum similarity score across all reference locations.

    • Return top-k similar locations: Whether to return only the top-k most similar locations (default: true). When enabled, only the most similar locations are returned, improving performance and focusing on the most relevant results.

    • Top-k metric: Number of most similar locations to return for each reference vector (1-infinity, default: 10). This parameter determines how many of the most similar locations will be included in the output.

    • Reduce embedding dimensionality: Whether to use Principal Component Analysis (PCA) to reduce the dimensionality of the vector embeddings (default: false). Enabling this can improve performance and reduce noise by focusing on the most significant patterns in your data.

    • PCA explained variance: Variance threshold for PCA dimensionality reduction (0.01-0.99, default: 0.9). This determines how much of the original variance should be preserved. Higher values (closer to 1.0) retain more information but may include more noise, while lower values focus on the most important patterns, with loss of information.

    • Create vector index: Whether to create a vector index to optimize the similarity search at scale (default: false, advanced option). When enabled, uses Approximate Nearest Neighbor search to improve performance with the trade-off of returning more approximate results. Without a vector index, uses brute force search to measure distance for every record.

    Outputs

    • Output table: The similarity table with the following columns:

      • reference_<id>: Identifier of the reference location.

      • search_<id>: Identifier of the search location.

      • distance: Distance between the reference and search embeddings.

      • similarity_score: Calculated similarity metric between the reference and search embeddings. When Cosine distance is computed, the similarity score is based on the cosine similarity between the two vectors - ranging from 0 (opposite direction) to 1 (identical direction), where higher values indicate greater similarity. When Euclidean distance is computed, the similarity score compares the distance for each search location to the distance from the mean vector data. The score will be positive if and only if the search location is more similar to the reference than the mean vector data. The larger the score, the greater the similarity.

    • Metrics table: The metrics table with the following columns:

      • total_explained_variance_ratio: The proportion of variance in the original data that is explained by the PCA dimensionality reduction. This metric is particularly useful when PCA dimensionality reduction is applied, showing how much of the original information is preserved in the reduced dimensions.

      • number_of_principal_components: The number of principal components extracted from the PCA.

    The component requires that both reference and search tables contain compatible embedding vectors with the same dimensionality. The similarity calculation will be performed for all combinations of reference and search locations, or for the top-k most similar pairs when the top-k option is enabled.
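
    When no vector index is created, the search is conceptually a brute-force comparison of every reference/search pair. A minimal, illustrative sketch in BigQuery-flavored SQL (table and column names are hypothetical), returning the 10 most similar search locations per reference location:

```sql
-- Illustrative brute-force sketch; not the component's actual implementation.
WITH distances AS (
  SELECT
    r.geoid AS reference_id,
    s.geoid AS search_id,
    ML.DISTANCE(r.embedding, s.embedding, 'COSINE') AS distance
  FROM `project.dataset.reference_locations` AS r
  CROSS JOIN `project.dataset.search_locations` AS s
)
SELECT *
FROM distances
-- Keep the 10 most similar (smallest distance) search locations per reference location.
QUALIFY ROW_NUMBER() OVER (PARTITION BY reference_id ORDER BY distance) <= 10
```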

    Visualization

    Description

    Transforms high-dimensional embeddings into RGB color representations for intuitive visualization. It supports two dimensionality reduction approaches: Principal Component Analysis (PCA) for automatic feature extraction, or manual selection of specific embedding dimensions to map directly to red, green, and blue color channels.

    This component is particularly useful for creating visual representations of spatial patterns, identifying clusters through color similarity, and generating color-coded maps that reveal underlying data structures in your embeddings.

    Inputs

    • Embedding table: Table containing the embeddings to visualize.

    Settings

    • ID column: Column containing unique identifiers for each embedding. This column is used to maintain the relationship between input embeddings and their color assignments in the output.

    • Analysis type: Determines whether the analysis is spatial or spatio-temporal (default: Spatial).

      • Spatial: Analyzes embeddings at a single point in time or without temporal considerations. Use this for static spatial pattern visualization.

      • Spatio-temporal: Analyzes embeddings across multiple time periods, enabling visualization of temporal changes in spatial patterns. Requires a date column to be specified.

    • Date column: Column containing temporal information for spatio-temporal analysis (required when Analysis type is Spatio-temporal). This column should contain date, datetime, timestamp, or numeric time values that identify when each embedding was captured.

    • Dimensionality reduction technique: Method for reducing embeddings to 3 dimensions for RGB visualization (default: PCA).

      • PCA: Automatically extracts the three most significant principal components from your embeddings and maps them to RGB channels. This approach preserves the maximum variance in your data and is ideal when you want to discover the most important patterns automatically.

      • Manual: Allows you to manually select three specific embedding dimensions to map directly to red, green, and blue channels. Use this when you have domain knowledge about which features are most important for your visualization.

    • Red channel: Column containing values for the red color channel (required when using Manual reduction mode). Select the embedding dimension that should control the red intensity in your visualization.

    • Green channel: Column containing values for the green color channel (required when using Manual reduction mode). Select the embedding dimension that should control the green intensity in your visualization.

    • Blue channel: Column containing values for the blue color channel (required when using Manual reduction mode). Select the embedding dimension that should control the blue intensity in your visualization.

    Outputs

    • Output table: The image table with the following columns:

      • <geoid>: Original identifier for each embedding.

      • r_channel: Normalized red channel value (0-255).

      • g_channel: Normalized green channel value (0-255).

      • b_channel: Normalized blue channel value (0-255).

      • hex_color: Hexadecimal color representation (e.g., #FF5733) for easy use in mapping applications.

    • Metrics table: The metrics table with the following columns:

      • total_explained_variance_ratio: The proportion of variance in the original data that is explained by the dimensionality reduction. When using PCA, this indicates how much information is preserved in the visualization. Values closer to 1.0 indicate that the visualization captures most of the original data's structure.

      • luminance_variance: The variance of the luminance values across all visualized embeddings. Luminance is calculated using the standard RGB-to-luminance conversion formula (0.2989xR + 0.5870xG + 0.1140xB) and represents the perceived brightness of each color. Higher variance indicates greater diversity in brightness levels across your visualization, while lower variance suggests more uniform brightness distribution.
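
    As a point of reference, the hex_color and luminance values described above can be derived from the channel values with expressions like these. This is an illustrative, BigQuery-flavored sketch; the input table name is hypothetical.

```sql
-- Derive a hex color and perceived luminance from 0-255 r/g/b channel values.
SELECT
  geoid,
  FORMAT('#%02X%02X%02X',
         CAST(r_channel AS INT64),
         CAST(g_channel AS INT64),
         CAST(b_channel AS INT64))                              AS hex_color,
  0.2989 * r_channel + 0.5870 * g_channel + 0.1140 * b_channel  AS luminance
FROM `project.dataset.visualization_output`
```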

    Geospatial Foundation Models

    Territory Planning

    An extension package for CARTO Workflows that provides comprehensive territory planning and location allocation optimization capabilities. This package enables users to solve complex spatial optimization problems by creating balanced territorial boundaries, or finding optimal facility locations, allocating demand points to facilities, and managing territorial constraints while minimizing costs or maximizing coverage.

    Facilities Preparation

    Description

    Processes and formats facilities data for use in a Location Allocation analysis.

    Inputs

    • Candidate Facilities input table: The source table containing candidate facility data. These represent facilities that could be opened or activated based on the allocation strategy.

  • Required Facilities input table (optional): The source table containing required facility data. These are pre-selected locations that are always active, regardless of optimization results.

  • Competitor Facilities input table (optional): The source table containing competitor data. These are used to model competition and influence demand allocation or market share in the optimization process.

    Facility IDs must be unique across the overall set of facilities (candidate + required + competitor): duplicated IDs are not allowed.

    Settings

    • Candidate facilities ID column: The column containing the candidate facility unique IDs.

    • Candidate facilities geometry column: The column containing the candidate facility point locations.

    • Use required facilities: Whether to consider mandatory or pre-selected facilities. Requires a Required Facilities input table to be connected. If enabled:

    Outputs

    • Output table: The table with the prepared facilities data, which contains the following columns:

      • facility_id: The selected ID column.

      • geom: The selected Geometry column.

    Demand Points Preparation

    Description

    Processes and formats demand points (customers, delivery addresses, clinics...) data for use in a Location Allocation analysis.

    Inputs

    • Demand points input table: The source table containing demand points data.

    Settings

    • ID column: The column containing the unique identifier for each demand point.

    • Geometry column: The column containing each demand point location.

    • Use demand: Whether each demand point should have an associated demand value. If enabled:

      • Demand column

    When no demand is set, the optimization determines which facility serves each demand point. Otherwise, the optimization determines how much demand is served by each facility for each demand point.

    Outputs

    • Output table: The table with the prepared demand points data, which contains the following columns:

      • dpoint_id: The selected ID column.

      • geom: The selected Geometry column.

    Cost Matrix Preparation

    Description

    Processes and formats cost data for use in a Location Allocation analysis.

    Inputs

    • Costs input table: The source table containing the cost of assigning each facility to each demand point (such as travel distance, travel time...).

    Settings

    • Facilities ID column: The column containing the facility IDs.

    • Demand points ID column: The column containing the demand point IDs.

    • Costs column: The column containing the cost of assigning each facility to each demand point.

    • Apply transformation to cost: Whether to apply a transformation to the cost. If enabled:

    You can use native workflows components to compute costs based on distance or time, such as Distance or Create Routing Matrix.
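
    For illustration only, a simple straight-line cost matrix could also be computed with SQL like the following (BigQuery-flavored; table and column names are hypothetical), while routing-based costs would come from the components mentioned above:

```sql
-- Illustrative straight-line cost matrix (meters) between facilities and demand points.
SELECT
  f.facility_id,
  d.dpoint_id,
  ST_DISTANCE(f.geom, d.geom) AS cost
FROM `project.dataset.facilities` AS f
CROSS JOIN `project.dataset.demand_points` AS d
```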

    Outputs

    • Output table: The table with the prepared costs data, which contains the following columns:

      • facility_id: The ID of the facilities.

      • dpoint_id: The ID of the demand points.

    Constraints Definition

    Description

    Processes and formats constraint-specific additional data for use in a Location Allocation analysis.

    As of now, this component is used to prepare data for the following constraints:

    • compatibility: Use required/forbidden assignments. It contains the following columns:

      • facility_id

    Inputs

    • Required input table (optional): The source table containing required facility-demand point pairs (for every row k, facility k must serve demand point k).

    • Forbidden input table (optional): The source table containing banned facility-demand point pairs (for every row k, facility k cannot serve demand point k).

    Settings

    • Consider required facility-demand point pairs: Whether to consider mandatory relationships between facilities and demand points. Requires a Required input table to be connected. If enabled:

      • Facilities ID column: The column containing the facility IDs.

      • Demand points ID column: The column containing the demand point IDs.

    Outputs

    • Output table: The table with the metadata of the prepared constraints data, which contains the following columns:

      • constraint_id: The ID of the prepared data.

      • constraint_description: The description of the prepared data.


    Location Allocation

    Description

    Performs Location Allocation optimization, enabling users to determine the best locations for facilities (e.g., warehouses, stores, hospitals, service centers) and assign demand points to those facilities. The goal is to minimize total/maximum cost or maximize coverage, while respecting a variety of customizable constraints.

    This component uses Google's OR-Tools under the hood for solving the optimization problem.

    It is mandatory to use the Prepare components in advance, to ensure correct formatting of input sources.

    Inputs

    • Facilities input table: A table with facilities data, as produced by the Facilities Preparation component.

    • Demand points input table: A table with demand points data, as produced by the Demand Points Preparation component.

    • Costs matrix input table: A table with costs data, as produced by the Cost Matrix Preparation component.

    • Constraints input table

    Settings

    • Optimization strategy: The strategy of the solver to solve optimization. It can be one of the following:

      • Minimize total cost: Minimizes the total sum of costs between assigned demand points and open facilities.

      • Minimize maximum cost: Minimizes the highest individual cost between any assigned facility-demand point pairs.

    Outputs

    • Output table: The table with the optimized assignments:

      • facility_id: The ID of the open facilities.

      • dpoint_id: The ID of the demand point.

    Territory Balancing

    Description

    A component that splits a gridified area into a set of optimal, continuous territories, ensuring balance according to a specified metric while maintaining internal similarity within each territory.

    If the input grid contains disconnected elements—such as isolated cells or cell clusters—the algorithm applies a proxy mechanism to establish connectivity across all regions. However, the presence of numerous unconnected components may lead to unexpected behavior or suboptimal results.

    For more details, refer to the official METIS documentation.

    Inputs

    • Input table: The table with the gridify-enriched Area of Interest (AOI).

    Settings

    • Index column: Unique identifier for each spatial cell. Must be an H3 or Quadbin index.

    • Demand column: The business KPI used for balancing territories. This must be an extensive variable (i.e., one that adds up across space, such as number of points of sale, total population, etc.).

    • Similarity feature(s): Optional variable(s) used to measure similarity between neighboring cells. The similarity score influences the grouping of the cells by penalizing large differences between them:

    Outputs

    • Output table: The table with the specified territories that will contain the index column, the territory ID (cluster) and, if specified, the remaining input columns.

    Input / Output

    Components that allow you to import or export data to/from CARTO and your connected cloud data warehouse.

    Export to Bucket

    Description

    This component exports a table to a storage bucket.

  • total_explained_variance_ratio: The proportion of variance in the original data that is explained by the clustering solution. This metric is particularly useful when PCA dimensionality reduction is applied, showing how much of the original information is preserved in the reduced dimensions.

  • number_of_principal_components: The number of principal components extracted from the PCA.

  • Required facilities ID column: The column containing the required facility unique IDs.
  • Required facilities geometry column: The column containing the required facility point locations.

  • Use competitor facilities: Whether to consider competitors. Requires a Competitor Facilities input table to be connected. If enabled:

    • Competitor facilities ID column: The column containing the competitor facility unique IDs.

    • Competitor facilities geometry column: The column containing the competitor facility point locations.

  • Use facility groups: Whether to use facility groups (e.g. region, brand) to limit the number of facilities to be opened.

    • Candidate facilities group ID column: The column containing the ID representing the group each candidate facility belongs to.

    • Required facilities group ID column: The column containing the ID representing the group each required facility belongs to. Requires a Required Facilities input table to be connected.

  • Enforce minimum usage of facilities: Whether to use the minimum usage of a facility.

    • Candidate facilities minimum usage column: The column containing the minimum usage of each candidate facility.

    • Required facilities minimum usage column: The column containing the minimum usage of each required facility. Requires a Required Facilities input table to be connected.

  • Use facility maximum capacities: Whether to use the maximum capacity of a facility.

    • Candidate facilities maximum capacity column: The column containing the maximum capacity of each candidate facility.

    • Required facilities maximum capacity column: The column containing the maximum capacity of each required facility. Requires a Required Facilities input table to be connected.

  • Use facility costs of opening: Whether to use the fixed costs of opening a specific facility.

    • Candidate facilities cost of open column: The column containing the cost of opening each candidate facility.

    • Required facilities cost of open column: The column containing the cost of opening each required facility. Requires a Required Facilities input table to be connected.

  • facility_type: The type of facility: candidate (0), required (1) or competitor (2).

  • group_id: The selected Group ID column. If no group ID column is provided, it will be filled up with NULL values.

  • min_usage: The selected Minimum usage column. If no minimum usage column is provided, it will be filled up with NULL values.

  • max_capacity: The selected Maximum capacity column. If no maximum capacity column is provided, it will be filled up with NULL values.

  • cost_of_open: The selected Cost of open column. If no cost of open column is provided, it will be filled up with NULL values.

  • Demand column: The column containing demand values for each demand point.

  • demand: The selected Demand column. If no demand column is provided, it will be filled up with NULL values.

  • Transformation function: The function to apply: linear, power, or exponential.

  • Transformation parameter: The parameter value to use. Only applicable for power and exponential transformations.

  • cost: The resulting cost.

  • facility_id: The column containing the facility IDs.

  • dpoint_id: The column containing the demand point IDs.

  • compatibility: The column containing the status of the relationship between each specified facility and demand point: required (1) or banned (0). If the facility-demand point pair is not specified in this table, the Location Allocation algorithm will find the optimal assignment itself.

  • Consider forbidden facility-demand point pairs: Whether to consider prohibited relationships between facilities and demand points. Requires a Forbidden input table to be connected. If enabled:
    • Facilities ID column: The column containing the facility IDs.

    • Demand points ID column: The column containing the demand point IDs.

    • table_name: The name of the prepared table containing the necessary data to use constraint_id in the Location Allocation problem.

    • Constraints input table (optional): A table with additional constraints metadata, as output by Constraints Definition.

    • Maximize coverage: Maximizes the number of demand points within the specified coverage radius of each open facility.

  • Facilities coverage radius: Maximum distance or time (e.g., in kilometers or minutes) that a facility can cover. Used for the "Maximize coverage" strategy.

  • Use required facilities: Whether to consider mandatory facilities. If enabled, these facilities are guaranteed to be opened.

  • Use competitor facilities: Whether to account for competitor locations that may influence demand point assignment.

  • Cost threshold: The distance under which a demand point is considered fully influenced by a competitor, and thus removed from consideration.

  • Use facility costs of opening: If enabled, the fixed cost of opening each facility is added to the objective function.

  • Use maximum budget: Whether to consider a maximum budget for opening facilities. When enabled, the sum of opening costs of selected facilities must not exceed the specified budget limit.

  • Budget limit: Maximum budget allowed for opening facilities (in cost of open units). This parameter is only available when "Use maximum budget" is enabled.

  • Limit the total number of facilities: If enabled, restricts the number of facilities that can be opened.

  • Maximum number of facilities: The maximum number of facilities allowed to open if the above option is enabled.

  • Limit the total number of facilities per group: If enabled, restricts the number of facilities that can be opened per defined group (e.g., region, brand).

  • Maximum number of facilities per group: The maximum number of facilities allowed to open per group if the above option is enabled.

  • Use demand: If enabled, ensures that the full demand of each demand point is met across assigned facilities.

  • Enforce minimum usage of facilities: If enabled, restricts facilities from being opened unless they meet a minimum utilization level.

  • Use facility maximum capacities: If enabled, restricts the total amount of demand a facility can serve.

  • Use required/forbidden assignments: Allows enforcing or excluding specific demand point-facility assignments (e.g., a facility must serve a specific demand point). Requires the Constraints input table to be connected.

  • Relative Gap Tolerance (%): The maximum allowed relative optimality gap between the best known solution and the theoretical optimum.

  • Maximum Solving Time (sec): The maximum allowed time taken for the solver to solve the problem (in seconds).

  • demand: The amount of demand allocated from the facility to the demand point.
  • geom: The line geometry connecting the facility with the assigned demand point.

  • Metrics table: The table with solver's metrics and statistics:

    • objective_value: Final value of the objective function (e.g., total cost or coverage).

    • gap: The relative optimality gap between the best known solution and the theoretical optimum.

    • solving_time: Time taken for the solver to solve the problem (in seconds).

    • termination_reason: Reason why the solver stopped (e.g., optimal, infeasible...).

    • stats: Additional statistics such as open facilities ID, total demand satisfied or percentage of demand covered.

  • If multiple variables are selected, their values are normalized and averaged.

  • If a single variable is selected, it is normalized to the [0, 1] range.

  • If no similarity features are provided, only the demand column is used for partitioning, and similarity is not considered.

  • Number of territories: The desired number of partitions (i.e., territories) to divide the Area of Interest (AOI) into.

  • Keep input columns: Whether to include all input columns in the output table.

    For Self-Hosted deployments, a bucket owned by the customer needs to be configured. Please refer to this documentation for more information.

    Inputs

    • Source table

    Settings

    • File format: Select between CSV and JSON formats for the exported file(s).

    • Export compressed: Choose between a compressed file (.gz) or uncompressed.

    • GCS Bucket name: The bucket location provided by the user, entered by its name. Optionally, a path can be added to the bucket name, separated by slashes (/). The name of the bucket should follow the GCS specification and, in addition to GCS requirements, the name of the folder(s) should not contain single quotes (') or backslashes (\). The connection (the email which appears on the connections page) should at least have permission for creating and deleting objects. To download the URLs in a browser, permissions should be granted to the email of an account the user can log in with. This custom bucket won't work with CARTO Data Warehouse connections unless the bucket is public.

    Using this component requires that the connection has BigQuery User permission in the billing project.

    Outputs

    • Result table: A table containing links to the result of the export.

    Create Builder Map

    Description

    This component creates a Builder map on its first execution. On subsequent executions, it can update, overwrite or create a copy of the existing map.

    Inputs

    • This component can receive any number of source table inputs. Each connected node will result in a datasource (with its corresponding layer) in the created Builder map. These datasources will be tagged as "Temporal" in Builder, since they might be subject to an expiration date or live in a temporal dataset/schema (more info).

    If you need to create permanent data sources in your map, use the "Save as Table" component and connect its output to the "Create Builder Map" component. The result will be a standard, persistent data source in your Builder map.

    Settings

    • Map name: A string that defines the name of the created Builder map. Usage of expressions and variables is allowed to generate a name by concatenating strings and variables. For example: {{'Map of the state of ' || @state_name}}

    • Re-running mode: This setting controls the behaviour of the component after the first execution. It has three different modes:

      • Update (default): In this mode, executing the workflow updates data sources in the map, syncing them with the currently connected input nodes while keeping map styles, widgets and other map settings.

      • Overwrite: In this mode, executing the workflow deletes the current map and creates a new one from scratch with default settings. The new map will have a different ID and URL.

      • Create a copy: In this mode, executing the workflow creates a new copy of the map with same styles and other settings. The copy will reflect the currently connected input nodes.

    Output

    • This component generates a single-row table with three columns that contain the map ID, the map URL and a timestamp of the last execution.

    Get Table by Name

    Description

    Load a table into the workflow by giving its name as a string, e.g.: db.schema.table.

    Inputs

    • Source table reference: A string that contains the FQN of the table that will be loaded to the canvas.

    Output

    • Result table

    Import from URL

    Description

    This component imports a table from a URL.

    This component needs to be executed (by running the workflow) before its output can be connected to other components. This process may take several minutes.

    Inputs

    • Source URL: The public URL to be used to get the file that will be imported.

      • Check the "Automatically guess column data types in the imported table" checkbox to automatically guess data types on the input file. If disabled, all properties will be imported as a String.

    Outputs

    • Output table: The component generates a table that contains the result of importing the file from "Source URL".

    HTTP Request

    Description

    This component performs requests to external endpoints.

    Inputs

    • Source table (Optional): If the component has an input table, columns from this input can be used within expressions in other settings.

    Take into account that this component doesn't accept more than 10 rows as input. This limit is in place to ensure stability.

    Settings

    • URL: The URL that will be used to perform the HTTP request.

    • Options (Optional): JSON object with options for the HTTP Request. You can use this to set a different method than GET, specify headers or a body for your request. Follows the fetch() API options specification.

      • Example: { "method":"POST", "headers":{ "Content-Type":"application/json" }, "mode":"cors", "cache":"default", "body": "{ ...}" }

    Incorrect method values (like "method":"GETS", "method":"FOO") will be ignored and the request will default to using GET.

    Both URL and Options settings allow the usage of expressions and variables:

    • This is an example of using values from an input's column in the URL, concatenating strings with values from columns, all within an expression enclosed with double curly braces: {{'https://your-api-domain.com/coordinates_endpoint?lat=' || latitude || '&lon=' || longitude || '&otherparams=...'}}

    • Using expressions in the Options setting is a bit different, since the content of it is stringified on the data warehouse, and we can use column names between braces directly instead of concatenating strings, for example: { "method":"POST", "headers":{ "Content-Type":"application/json" }, "mode":"cors", "cache":"default", "body": "{\"lng\": {{longitude_column}}, \"lat\": {{latitude_column}} }" }

    In the examples above, values from latitude and longitude columns have been used with expressions to generate a different URL /options for each row in the input table.

    • Allowed hosts (Optional): When expressions or variables are used in the URL, you need to set a comma-separated list of allowed hosts for your requests. If expressions/variables are not used in URL, this setting is ignored.

    Warning - API Key Security: Including an API key or an authorization header in the URL or options object may expose them in data warehouse logs. Handle keys responsibly to prevent unauthorized access.

    Outputs

    • Output table: This component produces a table that contains all columns from the input table (if any was connected), plus a response_data column containing a string with the response's data.
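
    Since response_data is a string, it can be parsed in a downstream step (for example, a Custom SQL component). A minimal, hypothetical sketch assuming a JSON response body and a BigQuery connection:

```sql
-- Hypothetical example: extract a field from the JSON response stored in response_data.
SELECT
  *,
  JSON_VALUE(response_data, '$.status') AS response_status  -- hypothetical JSON path
FROM `project.dataset.http_request_output`                  -- hypothetical table name
```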

    External links

    This component makes use of the functions in the http_request module of the CARTO Analytics Toolbox. Check the links below for more specific documentation:

    • BigQuery

    • Snowflake

    • Redshift

    API Output

    Description

    This component defines the output of the Workflow when executed via API.

    Inputs

    • Source table

    Settings

    • Execution method: This setting controls the type of return value of the API call. Sync returns the result immediately, while Async returns a job ID.

      • Sync: When using this method, the API request is made to the /query endpoint in SQL API to call the procedure, and the response will include the resulting data. Bear in mind that the workflow needs to run within the standard timeout limit. This mode is only intended for very short-running workflows.

      • Async (default): When using this method, the API request is made to the /job endpoint in SQL API to call the procedure. The result of the workflow is stored in a temporary table whose FQN is returned in the API response as: "workflowOutputTableName": "workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279_out_33afd785675f081d", along with the job ID and other job metadata.

    This component can only be used once on a workflow.

    Find more information about running workflows via API here.

    MCP Tool Output

    Description

    This component defines the output of the Workflow when executed as an MCP Tool.

    Inputs

    • Source table

    Settings

    • Execution method: This setting controls the type of return value of the MCP Tool invocation. Sync returns the result immediately, while Async returns a job ID.

      • Sync: When using this method, the MCP Tool response will include the resulting data. Bear in mind that the workflow needs to run within the standard timeout limit. This mode is only intended for very short-running workflows.

      • Async (default): When using this method, the MCP Tool response will include a job ID. Agents can make use of internal tools in the CARTO MCP Server to poll the status of the execution and fetch the results when finished.

    This component can only be used once on a workflow.

    Find more information about running workflows as MCP Tools here.

    Save as Table

    Description

    This component writes an input table to a non-temporary location.

    If a table exists under that location, it will be overwritten.

    Inputs

    • Source table

    Settings

    • FQN of table to create/overwrite. There are two different options:

      • Use the UI to select a destination for the target table in the current connection.

      • Type a FQN or an expression that uses variables or concatenates strings and variables, like: {{@variable_fqn}} or {{'database.schema.result_table_' || @variable_name}}

    • Append [true/false]: Determines whether the result will be appended to an existing table or the table will be overwritten.

    Provided FQNs need to follow the conventions of each data warehouse. Invalid names (according to each data warehouse's constraints) will make the component fail. Check the documentation for each:

    • BigQuery

    • Snowflake

    Outputs

    • The output of this component references the table created in the defined location.

    Send by Email

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    For Self-Hosted deployments, a bucket owned by the customer needs to be configured in order to use the "Include data" option. Please refer to this documentation for more information.

    Description

    This component sends an email with a custom subject and body. Optionally, it can export data to a bucket and add the resulting link(s) to the email's body.

    Inputs

    • Source table [Table]

    Settings

    • Email address: The email address that will receive the email. This field supports multiple addresses as separate recipients for the email.

    • Subject: The subject of the email.

    • Body: The content of the email.

    • Include data: If enabled, data from the input table will be exported to a bucket and resulting link(s) will be added to the email's body.

    • File format:

      • CSV

      • JSON

    • GCS Bucket name (Optional): If provided, the data will be exported to a custom GCS bucket. Optionally, a path can be added to the bucket name, separated by slashes (/). The name of the bucket should follow the GCS specification and, in addition to GCS requirements, the name of the folder(s) should not contain single quotes (') or backslashes (\). The connection (the email which appears on the connections page) should at least have permission for creating and deleting objects. To download the URLs in a browser, permissions should be granted to the email of an account the user can log in with. This custom bucket won't work with CARTO Data Warehouse connections unless the bucket is public.

    Outputs

    • This component doesn't have an output that can be connected to another node.

    Statistics

    Components to perform spatial statistics calculations.

    These components require the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Cluster Time Series

    BigQuery ML

    Extension Package provided by CARTO

    The BigQuery ML extension package for CARTO Workflows includes a variety of components that enable users to integrate machine learning workflows with geospatial data. These components allow for creating, evaluating, explaining, forecasting, and managing ML models directly within CARTO Workflows, utilizing BigQuery ML’s capabilities.

    The following table summarises available components, and explains how different components can be connected to each other:

    Create Classification Model

    Description

    This component trains a classification model using a table of input data.
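
    As a rough illustration of what this wraps in BigQuery ML (the component's exact model types and options may differ), a classification model is created with a statement along these lines; table, column and option values are hypothetical:

```sql
-- Illustrative sketch only: training a classification model with BigQuery ML.
CREATE OR REPLACE MODEL `project.dataset.classification_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',     -- other BigQuery ML classification model types exist
  input_label_cols = ['label']     -- hypothetical label column
) AS
SELECT * FROM `project.dataset.training_data`;
```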

    Description

    This component computes a clusterization based on the values of a feature over time (e.g. sales of different stores, temperature in different H3 cells, etc.) and assigns a cluster number to each partition, using different methods to evaluate the values and their distribution over time.

    Inputs

    • Input table: This component expects a table containing at least a timestamp column, a value column, and an index column to be used as the partition (usually a spatial index or a feature ID).

    Settings

    • Timestamp column: Select a column in the input that contains timestamps/dates.

    • Partition column: Each unique value on this column will be assigned a cluster.

    • Value column: Clusters will be calculated based on the values of this column over time.

    • Number of clusters: Define the number of clusters to be generated.

    • Method:

      • Value

      • Profile

    Outputs

    • Output table: This component will generate a table with the following columns:

      • Partitioning column with the same name as the given column from inputs.

      • Cluster (String): Contains the assigned cluster number.

    Composite Score Supervised

    Description

    This component derives a spatial composite score as the residuals of a regression model which is used to detect areas of under- and over-prediction. The response variable should be measurable and correlated with the set of variables defining the score. For each data point, the residual is defined as the observed value minus the predicted value. Rows with a NULL value in any of the input variables are dropped.

    Inputs

    • Input table

    Settings

    • Column with unique geographic identifier (spatial index)

    • Input variables

    • Response variables

    • Model regressor

    • Model options

    • Output formatting

    • R-squared threshold

    Outputs

    • Output table

    External links

    BigQuery reference

    Snowflake reference

    Composite Score Unsupervised

    Description

    This component combines (spatial) input variables into a meaningful composite score. The composite score can be derived using different methods, scaling and aggregation functions and weights. Rows with a NULL value in any of the model predictors are dropped.

    Inputs

    • Input table

    Settings

    • Column with unique geographic identifier (spatial index)

    • Input variables

    • Scoring method

      • Entropy

      • Custom Weights:

        • Variables and weights: When defining custom weights, you will find a button that controls the direction of the weight:

          • By default, it is set as "Direct", which means that higher values of the variable will push the final score higher.

    • Output formatting

      • None

      • Equal Intervals

      • Jenks
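
    As an illustration of the Custom Weights method (not necessarily the component's exact implementation), a composite score can be expressed as a weighted sum of min-max scaled variables; the variable names and weights below are hypothetical:

```sql
-- Illustrative weighted composite score over two hypothetical variables v1 and v2.
WITH scaled AS (
  SELECT
    geoid,
    (v1 - MIN(v1) OVER ()) / NULLIF(MAX(v1) OVER () - MIN(v1) OVER (), 0) AS v1_s,
    (v2 - MIN(v2) OVER ()) / NULLIF(MAX(v2) OVER () - MIN(v2) OVER (), 0) AS v2_s
  FROM `project.dataset.input_table`
)
SELECT
  geoid,
  0.7 * v1_s + 0.3 * v2_s AS composite_score  -- hypothetical weights
FROM scaled
```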

    Outputs

    • Result table

    • Lookup table: only available if output formatting is Equal Intervals, Jenks, or Quantiles.

    External links

    BigQuery reference

    Snowflake reference

    Cronbach Alpha Coefficient

    Description

    This component computes the Cronbach Alpha Coefficient, which can be used to measure internal consistency of different variables.

    This component can be used to determine whether a combination of variables is consistent enough to be used to create a Composite Score using one of the components above.
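
    For reference, Cronbach's alpha can also be sketched directly in SQL. The example below is illustrative only and assumes three hypothetical variables v1, v2 and v3 in a hypothetical input table:

```sql
-- Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of the total score).
WITH stats AS (
  SELECT
    VAR_SAMP(v1) + VAR_SAMP(v2) + VAR_SAMP(v3) AS sum_item_variance,
    VAR_SAMP(v1 + v2 + v3)                     AS total_variance,
    3                                          AS k   -- number of variables
  FROM `project.dataset.input_table`
)
SELECT (k / (k - 1)) * (1 - sum_item_variance / total_variance) AS cronbach_alpha
FROM stats
```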

    Inputs

    • Input table

    Settings

    • Input variables

    Outputs

    • Output table

    External links

    BigQuery reference

    Snowflake reference

    Detect Space-time Anomalies

    Description

    This component can be used to detect space-time regions where the variable of interest is higher (or lower) than its baseline value. It implements the scan statistics framework developed in this R package.

    Inputs

    • Source table [Table]: The source table. It must contain the index_column, the date_column, the value_column, a column named value_column_baseline with the values that should be used as a baseline to detect the anomalies and, when the distributional model parameter is set to ‘GAUSSIAN’ also its variance, value_column_baseline_sigma2.

    Settings

    • Index column [Column]: The name of the column with the unique geographic identifier of the spatial index, either ‘H3’ or ‘QUADBIN’.

    • Date column [Column]: The name of the column with the date identifier.

    • Value column [Column]: The name of the column with the variable for which the space-time anomalies should be detected.

    • Time frequency [Selection]: The temporal frequency of the data selected from one of the following: ‘SECOND’, ‘MINUTE’, ‘HOUR’, ‘DAY’, ‘WEEK’, ‘MONTH’, ‘QUARTER’, ‘YEAR’.

    • Distributional model [Selection]: The distributional model of the data, either ‘POISSON’ or ‘GAUSSIAN’.

    • Estimation method [Selection]: The estimation method, either ‘EXPECTATION’ or ‘POPULATION’ for the expectation- and population-based methods respectively.

    • Prospective analysis [Boolean]: Option to specify if the analysis is retrospective or prospective. In a prospective analysis, only temporal zones that end with the last timestamp are considered and the interest lies in detecting new emerging anomalies. In a retrospective analysis, instead, the space-time anomalies can happen at any point in time over all the past data (a temporal zone can end at any timestamp).

    • High-mean anomalies [Boolean]: Option to specify if the analysis is for detecting space-time zones higher or lower than the baseline.

    • K-ring size (advanced, optional) [Range]: The minimum and maximum k-ring size used to define the spatial zones.

    • Temporal window (advanced) [Range]: The minimum and maximum temporal window used to define the temporal zones.

    • Number of permutations (advanced) [Number]: The number of permutations used to derive the random replicas to test the anomaly statistical significance

    • Max results (advanced) [Number]: The maximum number of space-time zones returned

    Outputs

    • Output table [Table]: The output table

    External links

    BigQuery reference

    Detect Spatial Anomalies

    This component can be used to detect spatial regions where the variable of interest is higher (or lower) than its baseline value. It implements the scan statistics framework developed in this R package.

    Inputs

    • Source table [Table]: The source table. It must contain the index_column, the date_column, the value_column, a column named value_column_baseline with the values that should be used as a baseline to detect the anomalies and, when the distributional model parameter is set to ‘GAUSSIAN’ also its variance, value_column_baseline_sigma2.

    Settings

    • Index column [Column]: The name of the column with the unique geographic identifier of the spatial index, either ‘H3’ or ‘QUADBIN’.

    • Value column [Column]: The name of the column with the variable for which the spatial anomalies should be detected.

    • Distributional model [Selection]: The distributional model of the data, either ‘POISSON’ or ‘GAUSSIAN’.

    • Estimation method [Selection]: The estimation method, either ‘EXPECTATION’ or ‘POPULATION’ for the expectation- and population-based methods respectively.

    • High-mean anomalies [Boolean]: Option to specify if the analysis is for detecting space-time zones higher or lower than the baseline.

    • K-ring size (advanced, optional) [Range]: The minimum and maximum k-ring size used to define the spatial zones.

    • Number of permutations (advanced) [Number]: The number of permutations used to derive the random replicas to test the anomaly statistical significance

    • Max results (advanced) [Number]: The maximum number of space-time zones returned

    Outputs

    • Output table [Table]: The output table

    External links

    BigQuery reference

    Getis Ord

    Description

    This component computes the Getis-Ord Gi* statistic for each spatial index in the source table.

    Inputs

    • Source table [Table]

    Settings

    • Index column [Column]

    Make sure that the input data for this component doesn't contain any NULL values on the index column. Otherwise the execution will fail.

    • Value column [Column]

    • Kernel function for spatial weights [Selection]

    • Size [Number]

    Outputs

    • Result table [Table]

    External links

    • BigQuery reference to H3 procedure

    • BigQuery reference to Quadbin procedure
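For reference, the sketch below is a minimal NumPy illustration of the classical Gi* statistic. It is only an illustration of the statistic itself, not the component's implementation, which runs in your data warehouse over H3/Quadbin neighborhoods with the selected kernel function; the value array x and the binary neighborhood matrix W are hypothetical inputs.

# Minimal reference sketch of the classical Getis-Ord Gi* statistic, assuming a
# dense binary weight matrix W (W[i][j] = 1 if j is a neighbor of i, including i
# itself). The Workflows component computes an equivalent statistic on the data
# warehouse, so the exact weighting may differ.
import numpy as np

def getis_ord_gi_star(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    n = x.size
    x_mean = x.mean()
    s = np.sqrt((x ** 2).sum() / n - x_mean ** 2)
    wx = W @ x                       # sum_j w_ij * x_j for every i
    w_sum = W.sum(axis=1)            # sum_j w_ij
    w_sq_sum = (W ** 2).sum(axis=1)  # sum_j w_ij^2
    denom = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return (wx - x_mean * w_sum) / denom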

    Getis Ord Spacetime

    Description

    This component computes the spacetime Getis-Ord Gi* statistic for each spatial index and datetime timestamp in the source table.

    Inputs

    • Source table [Table]

    • Index column [Column]

    Make sure that the input data for this component doesn't contain any NULL values on the index column. Otherwise the execution will fail.

    • Date column [Column]

    • Value column [Column]

• Kernel function for spatial weights [Selection]

• Kernel function for temporal weights [Selection]

    • Size [Number]

    • Temporal bandwidth [Number]

    • Time interval [Selection]

    Outputs

    • Result table [Table]

    External links

    • BigQuery reference to H3 procedure

    • BigQuery reference to Quadbin procedure

    GWR

    Description

    This component runs a spatial-index-based Geographically Weighted Regression (GWR) model.

    Inputs

    • Input table [Table]

    • Index column [Column]

    • Feature variables [Column]

    • Target variable [Column]

    • K-ring size [Number]

    • Kernel function [Selection]

    • Fit intercept [Boolean]

    Outputs

    • Result table [Table]

    External links

    • BigQuery reference

    Hotspot Analysis

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component is used to locate hotspot areas by calculating a combined Getis-Ord Gi* statistic using a uniform kernel over one or several variables.

    Inputs

    • Source table [Table]

    • Index column [Column]

    • Input variables [Column]

    • Variable weights [String]

    • K-ring size [Number]

    • Significance level [Number]

    Outputs

    • Result table [Table]

    External links

    • BigQuery reference

    Local Moran's I

    Description

    This component computes the local Moran's I statistic for each spatial index in the source table.

    Inputs

    • Source table [Table]

    • Index column [Column]

    • Value column [Column]

    • Size [Number]

    • Decay function [Selection]

    • Permutations [Number]

    Outputs

    • Result table [Table]

    External links

    • BigQuery reference to H3 procedure

    • BigQuery reference to Quadbin procedure

    Moran's I

    Description

    This component computes the Moran's I statistic for each spatial index in the source table.

    Inputs

    • Source table [Table]

    • Index column [Column]

    • Value column [Column]

    • Size [Number]

    • Decay function [Selection]

    Outputs

    • Result table [Table]

    External links

    • BigQuery reference to H3 procedure

    • BigQuery reference to Quadbin procedure

    Spacetime Hotspots Classification

    Description

    This component takes the output of Getis Ord Spacetime and classifies each location into specific types of hotspots or coldspots, based on patterns of spatial clustering and intensity trends over time.

    Inputs

• Input table: This component expects to be connected to the output of a successfully executed Getis Ord Spacetime node, or a table that contains that same result.

    Settings

    • Index column: Select a column in the input that contains a spatial index (Quadbin or H3).

• Date column: Select a column in the input that contains a date/timestamp.

    • Gi Value: Select a column in the input that contains a Gi value generated by a Getis Ord Spacetime component.

    • P Value: Select a column in the input that contains a P value generated by a Getis Ord Spacetime component.

    • Threshold: Select the threshold of the P value for a location to be considered as hotspot/coldspot.

• Algorithm: Select the algorithm to be used for the monotonic trend test (see the sketch after this section):

      • Mann-Kendall (default)

      • Modified Mann-Kendall

    Output

    • Output table: The component will return a table with the column that contains the index and the resulting columns from the procedure's execution.
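As a reference for the trend test mentioned above, here is a minimal Python sketch of the Mann-Kendall S statistic. It is only an illustration of the classic statistic, not the component's warehouse implementation (the modified variant additionally corrects for autocorrelated series); the values array is a hypothetical time-ordered series of Gi* values for a single cell.

# Minimal sketch of the Mann-Kendall S statistic, the monotonic trend test used
# to classify hotspot intensity trends over time. Plain-Python illustration only.
import numpy as np

def mann_kendall_s(values: np.ndarray) -> int:
    """Sum of sign(x_j - x_i) over all pairs i < j: a positive S suggests an
    increasing trend, a negative S a decreasing trend."""
    s = 0
    n = len(values)
    for i in range(n - 1):
        s += int(np.sign(values[i + 1:] - values[i]).sum())
    return s

# Example: Gi* values of one cell ordered by timestamp
# mann_kendall_s(np.array([0.1, 0.4, 0.3, 0.8, 1.2]))  # -> positive S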

Create Classification Model

Description

This component trains a classification model using a table of input data.

    For more details, refer to the official ML.CREATE_MODEL documentation.

    Inputs

    • Input table: A data table that is used as input for the model creation.

    Settings

    • Model's FQN: Fully qualified name for the model created by this component.

    • Unique identifier column: A column from the input table to be used as unique identifier for the model.

    • Input label column: A column from the input table to be used as source of labels for the model.

    • Model type: Select the type of model to be created. Options are:

      • LOGISTIC_REG

      • BOOSTED_TREE_CLASSIFIER

      • RANDOM_FOREST_CLASSIFIER

    • Fit intercept: Determines whether to fit an intercept term in the model. Only applies if Model type is "LOGISTIC_REG".

    • Max tree depth: Determines the maximum depth of the individual trees. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

    • Number of parallel trees: Determines the number of parallel trees constructed on each iteration. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

    • Minimum tree child weight: Determines the minimum sum of instance weight needed in a child. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

    • Subsample: Determines whether to subsample. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

    • Column sample by tree: Subsample ratio of columns when constructing each tree. A fraction between 0 and 1 that controls the number of columns used by each tree. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

• Column sample by node: Subsample ratio of columns when constructing each node (split). A fraction between 0 and 1 that controls the number of columns used by each node. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

    • Data split method: The method used to split the input data into training, evaluation and test data. Options are:

• AUTO_SPLIT automatically splits the data

• RANDOM splits randomly based on specified fractions

• CUSTOM uses a specified column

• SEQ splits sequentially

• NO_SPLIT uses all data for training.

    • Data split evaluation fraction: Fraction of data to use for evaluation. Only applies if Data split method is RANDOM or SEQ.

    • Data split test fraction: Fraction of data to use for testing. Only applies if Data split method is RANDOM or SEQ.

    • Data split column: Column to use for splitting the data. Only applies if Data split method is CUSTOM.

    Outputs

    • Output table: This component generates a single-row table with the FQN of the created model.
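As an illustration of what this component does under the hood, the sketch below shows the kind of BigQuery ML CREATE MODEL statement that would correspond to a simple LOGISTIC_REG configuration, submitted with the google-cloud-bigquery Python client. The project, dataset, table and column names are hypothetical placeholders, and the exact SQL generated by the component may differ.

# Rough sketch of a classification CREATE MODEL statement; placeholder names only.
from google.cloud import bigquery

client = bigquery.Client()

create_model_sql = """
CREATE OR REPLACE MODEL `my-project.my_dataset.churn_classifier`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churned'],
  fit_intercept = TRUE,
  data_split_method = 'RANDOM',
  data_split_eval_fraction = 0.2
) AS
SELECT * EXCEPT (customer_id)
FROM `my-project.my_dataset.customer_features`
"""
client.query(create_model_sql).result()  # waits until training finishes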

    Create Forecast Model

    Description

    This component trains a forecast model using a table of input data.

    For more details, refer to the official ML.CREATE_MODEL documentation.

    Inputs

    • Input table: A data table that is used as input for the model creation.

    • Holidays table: A table containing custom holidays to use during model training.

    Settings

    • Model's FQN: Fully qualified name for the model created by this component.

    • Model type: Select a type of model to be created. Options are:

      • ARIMA_PLUS

      • ARIMA_PLUS_XREG

    • Time-series ID column: Column from Input table that uniquely identifies each individual time series in the input data. Only applies if Model type is ARIMA_PLUS and Auto ARIMA is set to true.

    • Time-series timestamp column: Column from Input table containing timestamps for each data point in the time series.

    • Time-series data column: Column from Input table containing the target values to forecast for each data point in the time series.

    • Auto ARIMA: Automatically determine ARIMA hyperparameters.

    • P value: Number of autoregressive terms. Only applies if Auto ARIMA is set to false.

    • D value: Number of non-seasonal differences. Only applies if Auto ARIMA is set to false.

    • Q value: Number of lagged forecast errors. Only applies if Auto ARIMA is set to false.

    • Data frequency: Frequency of data points in the time series. Used by BigQuery ML to properly interpret the time intervals between data points for forecasting. AUTO_FREQUENCY will attempt to detect the frequency from the data. Options are:

• AUTO_FREQUENCY (default)

• PER_MINUTE

• HOURLY

• DAILY

• WEEKLY

• MONTHLY

• QUARTERLY

• YEARLY

• Holiday region: Region for which the holidays will be applied. Check the reference for available values.

    • Clean spikes and dips: Determines whether to remove spikes and dips from the time series data.

    Outputs

    • Output table: This component generates a single-row table with the FQN of the created model.
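For context, the sketch below shows the kind of BigQuery ML statement this component builds for an ARIMA_PLUS model, submitted with the google-cloud-bigquery Python client. Project, dataset, table and column names are hypothetical placeholders; the component generates and runs the equivalent SQL for you based on the settings above.

# Rough sketch of a forecast CREATE MODEL statement; placeholder names only.
from google.cloud import bigquery

client = bigquery.Client()

create_model_sql = """
CREATE OR REPLACE MODEL `my-project.my_dataset.sales_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'sales',
  time_series_id_col = 'store_id',
  auto_arima = TRUE,
  data_frequency = 'AUTO_FREQUENCY',
  holiday_region = 'US',
  clean_spikes_and_dips = TRUE
) AS
SELECT date, sales, store_id
FROM `my-project.my_dataset.sales_history`
"""
client.query(create_model_sql).result()  # waits for model training to finish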

    Create Regression Model

    Description

    This component trains a regression model using a table of input data.

    For more details, refer to the official ML.CREATE_MODEL documentation.

    Inputs

    • Input table: A data table that is used as input for the model creation.

    Settings

    • Model's FQN: Fully qualified name for the model created by this component.

    • Unique identifier column: A column from the input table to be used as unique identifier for the model.

    • Input label column: A column from the input table to be used as source of labels for the model.

    • Model type: Select the type of model to be created. Options are:

      • LINEAR_REG

      • BOOSTED_TREE_REGRESSOR

      • RANDOM_FOREST_REGRESSOR

    • Fit intercept: Determines whether to fit an intercept term in the model. Only applies if Model type is "LINEAR_REG".

    • Max tree depth: Determines the maximum depth of the individual trees. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

    • Number of parallel trees: Determines the number of parallel trees constructed on each iteration. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

    • Minimum tree child weight: Determines the minimum sum of instance weight needed in a child. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

    • Subsample: Determines whether to subsample. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

    • Column sample by tree: Subsample ratio of columns when constructing each tree. A fraction between 0 and 1 that controls the number of columns used by each tree. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

• Column sample by node: Subsample ratio of columns when constructing each node (split). A fraction between 0 and 1 that controls the number of columns used by each node. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

    • Data split method: The method used to split the input data into training, evaluation and test data. Options are:

• AUTO_SPLIT automatically splits the data

• RANDOM splits randomly based on specified fractions

• CUSTOM uses a specified column

• SEQ splits sequentially

• NO_SPLIT uses all data for training.

    • Data split evaluation fraction: Fraction of data to use for evaluation. Only applies if Data split method is RANDOM or SEQ.

    • Data split test fraction: Fraction of data to use for testing. Only applies if Data split method is RANDOM or SEQ.

    • Data split column: Column to use for splitting the data. Only applies if Data split method is CUSTOM.

    Outputs

    • Output table: This component generates a single-row table with the FQN of the created model.

    Evaluate

    Description

    Given a pre-trained ML model and an input table, this component evaluates its performance against some input data provided. The result will contain some metrics regarding the model performance. The user is offered some extra options when the model is a forecasting model.

    For more details, refer to the official ML.EVALUATE function documentation.

    Inputs

• Model: This component receives a trained model table as input.

    • Input table: This component receives a data table to be used as data input.

    Settings

    • Forecast: Determines whether to evaluate on the model as a forecast.

    • Perform aggregation: Determines whether to evaluate on the time series level or the timestep level. Only applies if Forecast is set to true.

    • Horizon: Number of forecasted time points to evaluate against. Only applies if Forecast is set to true.

    • Confidence level: Percentage of the future values that fall in the prediction interval. Only applies if Forecast is set to true.

    Outputs

• Output table: This component produces a table with a predictions column.

    Evaluate Forecast

    Description

    Given a pre-trained ML forecast model, this component evaluates its performance using the ARIMA_EVALUATE function in BigQuery.

    Inputs

• Model: This component receives a trained model table as input.

    Settings

    • Show all candidate models: Determines whether to show evaluation metrics for all candidate models or only for the best model.

    Outputs

    • Output table: This component produces a table with the evaluation metrics.

    Explain Forecast

    Description

    Given a pre-trained ML model and an input table, this component runs an explainability analysis invoking the EXPLAIN_FORECAST function in BigQuery.

    Inputs

• Model: This component receives a trained model table as input.

    • Input table: This component receives a data table to be used as data input. It can only be used with ARIMA models. The execution will fail if a different model is selected.

    Settings

    • Model type: Select the type of model to be used with the input. Options are:

      • ARIMA_PLUS

      • ARIMA_PLUS_XREG

    • Horizon: The number of time units to forecast into the future.

    • Confidence level: The confidence level to use for the prediction intervals.

    Outputs

    • Output table: This component produces a table with the explainability metrics.

    Explain Predict

    Description

    Given a pre-trained ML model, this component generates a predicted value and a set of feature attributions for each instance of the input data. Feature attributions indicate how much each feature in your model contributed to the final prediction for each given instance.

    For more details, refer to the official ML.EXPLAIN_PREDICT function documentation.

    Inputs

• Model: This component receives a trained model table as input.

    • Input table: This component receives a data table to be used as data input.

    Settings

    • Number of top features: The number of the top features to be returned.

    Outputs

    • Output table: This component produces a table with the attribution per feature for the input data.

    Forecast

    Description

Given a pre-trained ML model and an optional input table, this component infers the predictions for each of the input samples. Take into account that the actual forecasting happens when the model is created; this component only retrieves the desired results.

    For more details, refer to the ML.FORECAST function documentation.

    Inputs

• Model: This component receives a trained model table as input.

    • Input table: This component receives a data table to be used as data input. It can only be used with ARIMA models. The execution will fail if a different model is selected.

    Settings

    • Model type: Select the type of model to be used with the input. Options are:

      • ARIMA_PLUS

      • ARIMA_PLUS_XREG

    • Horizon: The number of time units to forecast into the future.

    • Confidence level: The confidence level to use for the prediction intervals.

    Outputs

    • Output table: This component produces a table with the predictions column.

    Get Model by Name

    Description

This component loads a pre-existing model in BigQuery into the expected Workflows format to be used with the rest of the BigQuery ML components.

    Inputs

    • Model FQN: Fully-qualified name to get the model from.

    Outputs

    • Output: This component returns a model that can be connected to other BigQuery ML components that expect a model as input.

    Global Explain

    Description

Given a pre-trained ML model, this component lets you provide explanations for the entire model by aggregating the local explanations of the evaluation data. It returns the attribution of each feature. In the case of a classification model, an option can be set to provide an explanation for each class of the model.

    For more details, refer to the official ML.GLOBAL_EXPLAIN function documentation.

    Inputs

• Model: This component receives a trained model table as input.

    Settings

    • Class level explain: Determines whether global feature importances are returned for each class in the case of classification.

    Outputs

    • Output table: This component produces a table with the attributions per row of the input data.

    Import model

    Description

    This component imports an ONNX model from Google Cloud Storage. The model will be loaded into BigQuery ML using the provided FQN and it will be ready to use in Workflows with the rest of ML components.

    Settings

    • Model path: Google Cloud Storage URI (gs://) of the pre-trained ONNX file.

    • Model FQN: A fully-qualified name to save the model to.

    • Overwrite model: Determines whether to overwrite the model if it already exists.

    Outputs

    • Output table: This component returns a model that can be connected to other BigQuery ML components that expect a model as input.

    Predict

    Description

    Given a pre-trained ML model and an input table, this component infers the predictions for each of the input samples. A new variable prediction will be returned. All columns in the input table will be returned by default; an option can be unmarked to select a single ID column that will be returned with the prediction.

    For more details, check out the ML.PREDICT function documentation.

    Inputs

• Model: This component receives a trained model table as input.

    • Input table: This component receives a data table to be used as data input.

    Settings

    • Keep input columns: Determines whether to keep all input columns in the output or not.

    • ID column: Select a column from the input table to be used as the unique identifier for the model. Only applies if Keep input columns is set to false.

    Outputs

    • Output table: This component produces a table with the predictions column.
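To illustrate, the sketch below shows roughly the ML.PREDICT call that this component wraps, submitted with the google-cloud-bigquery Python client. Model and table names are hypothetical placeholders; the component builds and runs the equivalent SQL on your connection.

# Rough sketch of the ML.PREDICT call behind this component; placeholder names only.
from google.cloud import bigquery

client = bigquery.Client()

predict_sql = """
SELECT *
FROM ML.PREDICT(
  MODEL `my-project.my_dataset.churn_model`,
  TABLE `my-project.my_dataset.customers_to_score`
)
"""
for row in client.query(predict_sql).result():
    print(dict(row))  # includes the predicted_* columns plus the input columns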

    Workflows as MCP Tools

    Workflows are the foundation for creating MCP Tools in CARTO. Each tool exposed through the CARTO MCP Server is powered by a workflow that defines how spatial problems are solved, the data required as input, and the results returned to the AI Agent.

    This page explains how to configure workflows for MCP, including how to define outputs, write clear tool descriptions, and keep tools in sync as you make updates. By following these steps, you ensure that your AI Agents can reliably use the tools you create to answer geospatial questions.

    Create a Workflow

    Each MCP Tool is backed by a Workflow. For every type of question or task you want the agent to answer, you will need to create a dedicated Workflow.

    Define the Output

    Use the MCP Tool Output component in Workflows to define the expected output of the tool.

    • Select Sync mode if the workflow returns results immediately.

    • Select Async mode if the workflow requires longer processing time and results are retrieved after execution.

    Writing Descriptions and Enabling the Tool

    Click on the three-dot options button in the top-right corner to access the MCP Tool dialog.

    Once in the dialog:

    • Write a clear description that explains the purpose of the tool.

    • Define all input parameters with explicit names and descriptions.

    • Enable the tool so it can be exposed through the MCP Server.

    Propagating Changes

    When you make updates to a workflow, ensure you sync the workflow. This propagates the changes to the MCP Tool so the AI Agent always uses the latest version.

    Provide your Agent with access to Workflows MCP Tools

    Once Workflows is enabled as an MCP Tool, you can give your Agents in CARTO or outside CARTO access to it.

To learn how to connect to the CARTO MCP Server and use your MCP Tools in other MCP-compliant agentic applications, see this section.

If you want to provide access to the Workflows MCP Tool to an AI Agent in CARTO, see this section.

    Spatial Indexes

Components to operate with spatial indexes: H3 and Quadbin. Learn more about how to leverage spatial indexes in your analyses.

    H3 Boundary

This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.


If set to "Inverse", the variable's value will weigh in to make the final score lower.

• When specifying custom weights, the component will internally apply a normalization so that the sum of all weights is equal to 1.

  • Scaling function:

    • MIN_MAX_SCALER: data is rescaled into the range [0,1] based on minimum and maximum values. Only numerical variables are allowed.

    • STANDARD_SCALER: data is rescaled by subtracting the mean value and dividing the result by the standard deviation. Only numerical variables are allowed.

• RANKING: data is replaced by its percent rank, that is, by values ranging from 0 (lowest) to 1 (highest). Both numerical and ordinal variables are allowed (categorical and boolean variables need to be transformed to ordinal).

• DISTANCE_TO_TARGET_MIN(_MAX,_AVG): data is rescaled by dividing by the minimum, maximum, or mean of all the values. Only numerical variables are allowed.

    • PROPORTION: data is rescaled by dividing by the sum total of all the values. Only numerical variables are allowed.

  • Aggregation Function:

    • LINEAR: the spatial composite is derived as the weighted sum of the scaled individual variables.

    • GEOMETRIC: the spatial composite is given by the product of the scaled individual variables, each to the power of its weight.

  • First PC

• Correlation variable: the spatial score will be positively correlated with the selected variable (i.e. the sign of the spatial score is set such that the correlation between the selected variable and the first principal component score is positive).

    • Correlation threshold: the minimum absolute value of the correlation between each individual variable and the first principal component score.

  • Quantiles

  • Return Range

Description

    This component takes a table with H3 cell indices and generates a new one with an additional column containing the boundaries of those H3 cells.

    Inputs

    • Source table [Table]

    • H3 column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    H3 Center

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component adds a new column containing the centers of H3 cells.

    Inputs

    • Source table [Table]

    • H3 column [Column]

    Outputs

    • Result table [Table]:

    External links

    BigQuery reference

    H3 Grid Distance

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component computes the H3 grid distance between two H3 index columns, creating a new table with the same columns as the source one and adding a new 'h3_distance' column.

    The H3 distance is defined as the length of the shortest path between the cells in the graph formed by connecting adjacent cells.

    This function will return 'null' if the cells are too far apart to compute the distance.

    Inputs

    • Source table [Table]

    • First H3 index column [Column]

    • Second H3 index column [Column]

    Outputs

    • Result table [Table]:

    External links

    BigQuery reference

    H3 from GeoPoint

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component adds a new column containing the indices of H3 cells corresponding to points.

    Inputs

    • Source table [Table]

    • Points column [Column]

    • Resolution [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    H3 KRing

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

Returns the neighboring H3 indexes in all directions within the given k-ring size.

    Inputs

    • Source table [Table]

    • H3 index [Column]

    • Size [Number]

• Include distance [Boolean]: whether to add a column with the value of the H3 distance.

• Include columns [Boolean]: whether to copy the source columns into the result table.

    Outputs

    • Result table [Table]:

    External links

    BigQuery reference

    Snowflake reference

    Postgres reference

    H3 Polyfill

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This generates a table with indices of all H3 cells included within a given extent for a given resolution.

    The extent is specified using a table with polygons.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Resolution [Number]

    • Keep input table columns: If checked, all the original columns of the source table will be added to the result table. Otherwise, only columns with the geometry and the H3 index will be added.

    • Mode:

• Center: Returns only the cells whose centroid intersects with the input features.

• Intersects: Returns all cells that intersect with the input features.

• Contains: Returns only the cells that are entirely contained within the input features.

    Outputs

    • Result table [Table]:

    External links

    BigQuery reference

    Snowflake reference

    Postgres reference

    H3 To Parent

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

Adds a new column named h3_parent with the value of the parent H3 cell at a specific resolution. If the required resolution is higher than that of the H3 column, a null value will be set.

    Inputs

    • Source table [Table]

    • H3 index [Column]

    • Resolution [Number]

    Outputs

    • Result table [Table]:

    External links

    BigQuery reference

    Snowflake reference

    Postgres reference

    Quadbin Boundary

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component takes a table with Quadbin cell indices and generates a new one with an additional column containing the boundaries of those Quadbin cells.

    Inputs

    • Source table [Table]

    • Quadbin column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    Quadbin Center

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component adds a new column containing the centers of Quadbin cells.

    Inputs

    • Source table [Table]

    • Quadbin column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    Quadbin from GeoPoint

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component adds a new column containing the indices of Quadbin cells corresponding to points.

    Inputs

    • Source table [Table]

    • Points column [Column]

    • Resolution [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    Quadbin Kring

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

Returns all Quadbin cell indexes and their distances in a filled square k-ring of a user-defined size centered at the origin, in no particular order.

    Inputs

    • Source table [Table]

    • Quadbin index [Column]

    • Size [Number]

• Include distance [Boolean]: whether to add a column with the value of the Quadbin distance.

• Include columns [Boolean]: whether to copy the source columns into the result table.

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    Quadbin Polyfill

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This generates a table with indices of all Quadbin cells included within a given extent for a given resolution.

    The extent is specified using a table with polygons.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Resolution [Number]

    • Mode:

• Center: Returns only the cells whose centroid intersects with the input features.

• Intersects: Returns all cells that intersect with the input features.

• Contains: Returns only the cells that are entirely contained within the input features.

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    Quadbin To Parent

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

It adds a new column named quadbin_parent with the value of the parent Quadbin cell at a specific resolution. If the required resolution is higher than that of the Quadbin column, an "Invalid resolution" query error will be returned.

    Inputs

    • Source table [Table]

    • Quadbin index [Column]

    • Resolution [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference


    Executing workflows via API

A workflow can be executed via an API call, which allows merging workflows into bigger processes, running analytics pipelines from external applications, and further integrations that enable a wide range of use cases.

For Databricks connections, executing Workflows via API is currently a work in progress. The functionality will be available soon; please get in touch with [email protected] if you have feedback about this missing functionality.

    Introduction

    CARTO Workflows translates the analytical pipelines designed in the canvas to cloud native SQL queries that are executed directly on the data warehouse. This generated SQL code has different forms:

• The code displayed in the SQL tab of the result panel contains all the control structures and creation of intermediate tables for each node in the workflow.

• A stored procedure that is created in workflows_temp, executed when a workflow execution is triggered via API call. This stored procedure might have input parameters that are used later in the code, created from the variables marked as 'Parameters'.

    The ability of workflows to generate SQL code that can be executed via an API call represents a significant advancement in integrating complex analytical processes with external systems. This capability not only streamlines the process of executing workflows in response to external triggers but also opens up a myriad of possibilities for automation and scalability within data analysis projects.

    Enabling API access for a workflow

    To enable API access for an existing workflow, click on the three dots in the upper-right corner and find 'API'. Click on 'Enable API access' and you will see a dialog screen that looks like this:

    The endpoint displayed in this dialog depends on the configuration of the API Output component.

• When using Sync, the /query endpoint will be used. The response will contain the data.

• When using Async (default), the /job endpoint will be used. The response will contain job metadata that can be used to poll the status and find the table that is storing the result. Learn more.

    This is an example of an API call that would trigger the execution of a workflow:

Synchronous GET example:

https://gcp-us-east1.api.carto.com/v3/sql/your_connection/query?q=CALL `workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279`(@number_of_clusters,@buffer_radius,@store_type)&queryParameters={"number_of_clusters":2,"buffer_radius":500,"store_type":"Supermarket"}&access_token=eyJhbGc(...)GUH-Lw

Asynchronous POST example:

curl --location 'https://gcp-us-east1.api.carto.com/v3/sql/your_connection/job' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer eyJhbGc(...)GUH-Lw' \
    --data '{
        "query": "CALL `workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279`(@number_of_clusters,@buffer_radius,@store_type)",
        "queryParameters": {"number_of_clusters":2,"buffer_radius":500,"store_type":"Supermarket"}
    }'

    There are a few things that are worth taking into consideration:

• It is actually a GET or POST request to CARTO's SQL API, which can be used to get the result of a query directly in the response, or to create asynchronous jobs that are executed in the data warehouse.

• The query is a CALL statement for a stored procedure. Workflows will generate a stored procedure in your workflows_temp schema/dataset. The generated SQL is the same that you would obtain by using the 'Export' functionality.

• The queryParameters object in the payload contains values for some variables that are passed as inputs to the stored procedure. These variables are the ones marked as 'Parameter' in the 'Variables' menu.

• The provided API call is authorized with an API Access Token that has the specific grants necessary to run that query through the same connection that was used to create the workflow.

    Running a workflow via Async API call

The provided example uses curl to illustrate the POST request that triggers the execution of the workflow. This example call should be easily adapted to other methods in different languages, like the requests library in Python.

The response to that call will look like the following:

{
  "externalId": "job_h8UWzpdzX0s2XAAXQd3rdWCdPUT9",
  "accountId": "ac_jfakef5m",
  "userId": "auth0|61164b7xxx77c006a259f53",
  "connectionId": "5a32a0ea-555a-48dd-aeb1-8768aae8ef1c",
  "metadata": {},
  "createdAt": "2023-10-04T16:10:35.732Z",
  "query": "CALL `workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279`(@number_of_clusters,@buffer_radius,@store_type)",
  "jobMetadata": {
    "location": "US",
    "workflowOutputTableName": "workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279_out_33afd785675f081d"
  },
  "token": "eyJhbGc(...)GUH-Lw"
}

    Checking status of a workflow execution

You can use that externalId to make a request to check the status of the execution, like:

curl --location 'https://gcp-us-east1.api.carto.com/v3/sql/your_connection/job/job_h8UWzpdzX0s2XAAXQd3rdWCdPUT9' \
--header 'Authorization: Bearer eyJhbGc(...)GUH-Lw'

The above is just an example that uses a specific API URL (gcp-us-east1.api.carto.com) and connection name. Please make sure that you use the same API endpoint that you obtained from the UI and that you used to create the job in the first place.

    The response to that API call is a JSON that contains metadata about the query execution, including a status object with information about the execution status.

The following is an example of a failed execution due to incorrect parameter types:

    "status": {
      "errorResult": {
        "reason": "invalidQuery",
        "location": "query",
        "message": "Query error: Cannot coerce expression @number_of_clusters to type INT64 at [1:158]"
      },
      "state": "DONE"
    }

While this would be an example of a successful execution:

    "status": {
      "state": "DONE"
    }

Get more information about using the job endpoint in the CARTO API documentation.
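As a complement to the curl examples above, the following Python sketch creates the asynchronous job and polls its status using the requests library. The API URL, connection name, access token and stored procedure name are the placeholder values from the examples above; replace them with the ones shown in your own API dialog.

# Minimal sketch: trigger a workflow via the async /job endpoint and poll its status.
import time
import requests

API_BASE = "https://gcp-us-east1.api.carto.com/v3/sql/your_connection"
TOKEN = "eyJhbGc(...)GUH-Lw"  # placeholder access token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# 1. Create the job
payload = {
    "query": "CALL `workflows-api-demo.workflows_temp.wfproc_f2f8df5df4ddf279`"
             "(@number_of_clusters,@buffer_radius,@store_type)",
    "queryParameters": {"number_of_clusters": 2, "buffer_radius": 500, "store_type": "Supermarket"},
}
job = requests.post(f"{API_BASE}/job", json=payload, headers=HEADERS).json()
print("Output table:", job["jobMetadata"]["workflowOutputTableName"])

# 2. Poll the job status until it is DONE
while True:
    status = requests.get(f"{API_BASE}/job/{job['externalId']}", headers=HEADERS).json()["status"]
    if status.get("state") == "DONE":
        break
    time.sleep(10)

if "errorResult" in status:
    print("Workflow failed:", status["errorResult"]["message"])
else:
    print("Workflow finished successfully")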

    Output of a Workflow executed via API

In order to define which node of your workflow will be used as the output of the API call, you need to use the Output component. This component ensures that the content of the node connected to it is stored in a temporary table, whose location is returned in the "workflowOutputTableName" object in the response of the API call.

    The location of the temporary table is returned when the job is created, but the content of the table won't be complete until the execution is complete.

API executions of the same workflow with different queryParameters will produce different output tables. On the other hand, API executions with the same queryParameters will yield the same result, as the same output table will be reused.

This behavior can be controlled with the cache settings in your Workflows UI. Setting it off will always force re-execution of the workflow, even if consecutive API calls with the same queryParameters are performed.

    Updating your workflow

    Whenever you want to propagate changes in your workflow to the corresponding stored procedure that is executed with an API call, you need to update it.

    For that, just click on the chip in the top header that says 'API enabled', which will open the API endpoint modal. Wait for a couple of seconds while CARTO checks for changes in the workflow and you will see this:

    Click on 'Update' to sync the workflow that is executed via API with the current state of the workflow in the UI.

    Disabling API access

    If you need to prevent API access for a workflow, just click on 'Disable API access' in the API endpoint modal and confirm:


    Temporary data in Workflows

    To improve performance and to be able to inspect results in intermediate steps, Workflows makes use of temporary data objects that are stored, by default, under a so called carto_temp schema/dataset in your data warehouse.

    Note that at the time of creating the connection to your data warehouse, you can also provide a custom location for storing the temporary data generated by Workflows. Check the Connections documentation for each provider for more details.

This system of intermediate tables functions as a cache, avoiding re-execution of all the steps in a workflow when they have not been modified; when you re-run your workflows, only the steps that have changed, or whose inputs have changed, are executed again.

    Specific permissions required by Workflows in your data warehouse

    In order for CARTO to be able to execute workflows in your data warehouse and temporarily store the data generated in intermediate steps of your analysis, there is a specific set of permissions required on the connection selected when creating your workflow.

    Since CARTO Workflows can be used with different providers (Google BigQuery, Snowflake, Amazon Redshift, and PostgreSQL), adapting the different terminology for each data warehouse, the recommended setup is:

The service account or user account used for creating the connection must have the following required roles:

    • BigQuery Data Editor

    • BigQuery User

In order to learn more details regarding data warehouse connection permissions, please check the Connections documentation.

    How to manage the temporary data generated by Workflows

    When your workflow runs, it stores the output results in intermediate tables. Depending on the data warehouse platform, a different strategy is applied to clean up the intermediate tables generated by CARTO Workflows. The default configuration removes the intermediate tables after 30 days.

    Below, you can find code and instructions for automating the process of cleaning up the intermediate tables in each data warehouse:

    When your workflow runs in BigQuery or CARTO Data Warehouse, these tables are temporary and will be automatically removed after 30 days.

You can find more information about temporary tables in the BigQuery documentation.

When your workflow runs in Snowflake, a task is automatically created in the WORKFLOWS_TEMP schema to remove the intermediate tables after 30 days. This task is executed periodically every day at 0h UTC.

    These are the required permissions to create the task:

    Note that the task affects only the tables in that schema. If the name of the schema is changed in the connection, for example to CARTO_TEMP_CUSTOM, a new task will be created for the new schema.

    Cache options

    When executing workflows from the UI, intermediate tables are created for each node in the workflow. Depending on the connection used for your workflow, these intermediate tables will be reused, avoiding recreation when the workflow structure or settings have not changed.

    In order to control this behavior, you will find the Cache Settings menu, next to the 'Run' button:

    When enabled, which is the default option, it will try and reuse previous successful executions of the workflow and its intermediate steps, with some differences depending on the data warehouse:

• CARTO Data Warehouse, BigQuery and Snowflake: Intermediate and result tables will be reused as long as the workflow structure has not changed.

      • This setting applies to executions from the UI, triggered by clicking on the 'Run' button.

      • This setting also applies to (beta).

When disabled, all intermediate and result tables will always be recomputed in all execution modes (UI, Schedule and API call), regardless of updates to source tables or parameter values.

    Spatial Operations

    Components to compute operations and measures on geometries/geographies, or to alter geometry/geography size or shape.

    Distance Matrix

    Description

    This component creates a new table with a distance matrix computed for the geometries contained in an input table.

    The output table contains three columns: 'id1', 'id2' and 'distance'.

Inputs

• Source table [Table]

• Geo column [Column]

• Id column [Column]

Outputs

• Result table [Table]

    Spatial Constructors

    Components that given the specifications in the input generate new sets of spatial data; those being points, line, polygons and even tilesets.

    BBOX from Values

    Description

    This component creates a new table containing a 'geom' column and a single row containing a geography that represents a bounding box as defined by the input min/max values.

    Inputs

• Minimum X [Number]

• Minimum Y [Number]

• Maximum X [Number]

• Maximum Y [Number]

Outputs

• Result table [Table]

External links

BigQuery reference

    Distance (single table)

    Description

    This component calculates the ellipsoidal direct point-to-point, point-to-edge, or the drive distance between two sets of spatial objects.

    There are three types of distance calculation relationships available:

    1. Point-to-point: The Distance tool calculates the distance between two sets of point type spatial objects. An example is calculating the distance from records in a customer data table (point type) to records in a store data table (point type).

    2. Point-to-edge (line): The Distance tool calculates the distance between a set of point type spatial objects and the closest edge of a set of line type spatial objects. An example is calculating the distance from the records in a customer data table (point type) to the nearest interstate highway (edge of line spatial type).

    3. Point-to-edge (polygon): The Distance tool calculates the distance between a set of point type spatial objects and the closest edge of a set of polygon type spatial objects.

      • When the points are contained within the polygons, you can choose one of two options: Return a distance of zero, or Return the distance to the nearest edge. If the point is located inside of a polygon the DISTANCE component returns a negative number for the distance to the edge of the containing polygon.

      • When the points are contained outside of the polygons, you can also choose to have the DISTANCE component calculate the Nearest Interior point within the nearest edge of a polygon.

      • While this method of calculating distance takes the curvature of the Earth into consideration, it is not a routing routine and does not follow street patterns or honor physical barriers such as bodies of water.

    Inputs

    • Source table [Table]

    • Point or centroid source [Column]

    • Point, line or polygon destination [Column]

    • When a point is inside a polygon [Selection]

    • Create nearest interior point inside of the polygon [Boolean]

    • Output direction in degrees [Boolean]

    • Output cardinal direction [Boolean]

    • Units [Selection]

    Outputs

    • Result table [Table]

    Distance to Nearest

    Description

    This component identifies the shortest distance between spatial objects in one table and the objects in a second table.

    There are many use cases for this component. For example, to find the closest store locations to consumers in the customer file (both point tables), identify the closest cell towers (point tables) to LATAs (polygon tables), or select congressional districts (polygon tables) within 50 miles of a major thoroughfare (line table).

    Inputs

    • Targets [Table]

    • Universe [Table]

    • Target Spatial Object Field [Column]

    • Universe Spatial Object Field [Column]

    • Id column in target table [Column]

    • Id column in universe table [Column]

    • Maximum Distance [Number]

    • Units [Selection]

    Outputs

    • Result table [Table]

    Prepare for Visualization

    Description

This component generates a new table that is ready to be visualized with the CARTO Maps API. Learn more about this in the Performance Considerations section.

    Inputs

    • Source table

    Settings

    • Geo column: Select a binary column that contains WKB geometries.

    Outputs

• Output table: This component returns a table with the same schema as the input one, plus additional columns describing the feature bounding box; the geometry column is stored as WKB binaries and the table is clustered by the bounding box columns. Learn more.

    Spatial Info

    Description

This component extracts tabular information about the spatial object. Attributes like area, spatial object type, number of parts, number of points, and centroid latitude/longitude coordinates can be revealed.

    Inputs

    • Source table [Table]

    • Spatial Object Field [Column]

    • Area (Square Kilometers) [Boolean]

    • Area (Square Miles) [Boolean]

    • Bounding Rectangle as Points [Boolean]

    • Bounding Rectangle as Polygon [Boolean]

    • Bounding Rectangle as X and Y Fields [Boolean]

    • Centroid as Spatial Object [Boolean]

    • Centroid as X and Y Fields [Boolean]

    • End Points as Spatial Object [Boolean]

    • End Points as X and Y Fields [Boolean]

    • Length (Kilometers) [Boolean]

    • Length (Miles) [Boolean]

    • Number of Parts [Boolean]

    • Number of Points [Boolean]

    • Spatial Object Type [Boolean]

    Outputs

    • Result table [Table]

    ST Area

    Description

    This component creates a new table with an additional column containing the area of the geographies in the input table. This only applies to the case of polygon geographies. For other geography types, the value will be zero.

    Inputs

    • Source polygons table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST Buffer

    Description

    This component adds a new column with a buffer of the original input geography.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Distance [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Redshift reference

    PostgreSQL reference

    ST Centroid

    Description

    This component creates a new table with a new column containing points representing the centroids of the original input geographies.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST Concave Hull

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component creates a new table containing a single row with the concave hull of point geographies in the input table. The concave hull is stored in a column named 'geo'.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Max edge (km) [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    ST Convex Hull

    Description

    This component creates a new table containing a single row with the convex hull of geographies in the input table. The convex hull is stored in a column named 'geo'.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Redshift reference

    PostgreSQL reference

    ST Distance

    Description

    This component calculates the minimum distance between geographies of an input table and those of a second table.

    It adds two new columns: 'nearest_id' and 'nearest_distance'.

    Inputs

    • Main table [Table]

    • Secondary table [Table]

    • Geo column in main table [Column]

    • Geo column in secondary table [Column]

    • Id column in main table [Column]

    • Id column in secondary table [Column]

    • Radius [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST DWithin

    Description

This component calculates whether geographies from an input table and those from a second table that match a given join criterion are within a given distance of each other.

    Inputs

    1. Source table

    2. Source table

    Settings

    • Geo column in main table

• Geo column in second table

    • Column in main table for joining

    • Column in second table for joining

    • Max distance

    Outputs

    • Output table

    ST Length

    Description

    This component creates a new table with an additional 'length' column containing the length of the lines in the input table.

    Inputs

    • Source lines table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST Line Interpolate Point

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component takes a table and columns containing lines and distances. It adds a new column with points along the lines, at a specified distance from the start of each line.

    Inputs

    • Source table [Table]

    • Lines column [Column]

    • Distance [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST Perimeter

    Description

    This component adds a new column containing the perimeter of the geographies in the input table. This only applies to the case of polygon geographies. For other geography types, the value will be zero.

    Inputs

    • Source polygons table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST Simplify

    Description

    This component adds a new column containing a simplified version of the original input geography.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Tolerance [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST Snap to Grid

    Description

This component takes an input table and creates a new one, replacing its geo column with a new column containing the original input geography snapped to a grid of a specified size (in degrees).

    Arbitrary grid sizes are not supported. The grid_size parameter is rounded so that it is of the form 10^n, where -10 < n < 0.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Grid size (degrees) [Number]

    Outputs

• Result table [Table]

    External links

    BigQuery reference

    PostgreSQL reference

    Subdivide

    Description

This component subdivides large features into smaller ones using a quadgrid. It returns multiple rows, each containing a smaller portion of the input feature along with the rest of the columns.

    Inputs

    • Source table [Table]

    • Geo column [column]

    • Size: The size is the approximate length, in meters, of the cell's side used for the subdivision.

    Trade Areas

    Description

    This component creates regions around specified point objects in the input table.

    Inputs

    • Source table [Table]

    • Spatial Object Field of Point Source [Column]

    • Radius, Doughnuts [String]

    • Units [Selection]

    • Include Point in Output [Boolean]

    • Eliminate Overlap [Boolean]

    Outputs

    • Result table [Table]


    Clip with Polygons

    Description

    This component creates a new table containing the geographies of the input table clipped to the combined boundary of the geographies in another table. Geographies in this second table must be of type polygon.

    Inputs

    • Table to clip [Table]

    • Polygons table [Table]

    • Geo column in table to clip [Column]

    • Geo column in polygons table [Column]

    Outputs

    • Result table [Table]

    Create Grid

    Description

    This component creates a new table containing points distributed in a regular grid, according to a given interval. Point geographies are stored in a column named 'geo'. The extent to cover is defined by an ancillary table with geographies.

    Inputs

    • Reference table (used for extent) [Table]

    • Geo column [Column]

    • Interval (degrees) [Number]

    Outputs

    • Result table [Table]

    Create Isolines

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component computes isolines.

    Inputs

    • Source table

    Settings

• Geo column: Select a column that contains the points that will be used as the origin of each isoline.

    • Mode: Select a mode of transport for the isoline calculation.

    • Type of range: Select between time or distance for the isoline calculation.

• Range value (seconds/meters): Input a numeric value that represents seconds if time was selected, or meters if distance was selected.

    Provider-specific settings:

    CARTO makes use of different isoline providers. The default one is TravelTime, but TomTom and HERE are also available. Please contact the team at [email protected] for more information.

    TravelTime:

    • Departure time: Take into account historical traffic conditions to generate the isoline. Accepted values are now or a timestamp in this format: YYYY-MM-DDThh:mm:ss, for example: 2024-01-31T09:30:00

    • Single shape (default true): Force the resulting isoline to be a single polygon.

    TomTom

    • Departure time: Accepted values are any, now or a timestamp in this format: YYYY-MM-DDThh:mm:ss, for example: 2024-01-31T09:30:00

    • Use traffic information if available: Check this option to take into account historical traffic conditions to generate the isoline.

    HERE

    • Arrival Time: any, now or timestamp (YYYY-MM-DDThh:mm:ss). Not compatible with Departure Time

    • Departure Time: any, now or timestamp (YYYY-MM-DDThh:mm:ss). Not compatible with Arrival time

    • Optimize for: balanced (default), quality, performance.

    • Routing mode: fast (default), short

    • Origin radius (m): Consider any position in the radius of the origin point as origin.

    Advanced Options:

    • Isoline Options: A JSON object that can contain other provider specific options, for example: {"option1":"value", "option2": 123}. Check each provider documentation for a complete reference of available options in the External links section below.

    Outputs

    • Result table: The component generates a table that contains the following columns:

      • All columns from input table.

      • An orig_geom column that contains the original point geometry in the selected geo column.

      • A geom column that contains the generated isoline.

      • A carto_isoline_metadata column that can contain additional information, usually details about why the isoline failed. In most scenarios it will be null if the isoline was generated correctly.

    External links

    • TravelTime API reference

    • HERE API reference

    • TomTom API reference

    Create Routes

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component creates routes between origin and destination points stored in a table.

    Inputs

    • Source table [Table]

    Settings

    • Origin column [Column]

    • Destination column [Column]

    • Name for column with routes [String]

    • Mode [Selection]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Create Routing Matrix

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component computes the routing matrix between a set of origins and destinations and returns the distance and duration of each route.

    Inputs

    • Origin table: A table that contains points with the origin points for the routes.

    • Destination table: A table that contains points with the destination points for the routes.

    Settings

    • Type of transport: Select the type of transport for the route computation. The available options will depend on the provider. See the 'External links' reference for a full description of the options.

    • Departure time: (Optional) Set the departure time for the route computation, as a date-time formatted "<YYYY-MM-DD>T<hh:mm:ss>". The default is "now" ("any" if your provider is TomTom).

    • Travel time: (Not available with TomTom) The maximum time allowed for the route calculation. If a pair of origin/destination produces a route that is longer than this setting (in seconds), it won't be computed.

    • Advanced settings: a free-text input that supports a JSON formatted string with advanced options. See 'External links' section below for a list of additional options supported on each provider.

    Advanced settings is preset with carto_origins_batch_size and carto_destinations_batch_size configurations. Together they define the size of the batches that will be made to optimize calls to the external provider. You can configure these settings to adapt to the shape of your matrix. For example, rectangular (fewer origins than destinations) or square (same number of origins and destinations) matrices. Please refer to the 'External links' section below for more details. NOTE: The product of both origins and destinations batch sizes can't be greater than 10000.
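
    For example, for a square 100 x 100 matrix, a hypothetical Advanced settings value could be {"carto_origins_batch_size": 100, "carto_destinations_batch_size": 100}, which keeps the product of both batch sizes exactly at the 10000 limit.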

    Outputs

    • Result table: This component generates a table that is the Cartesian product of both input tables, with additional columns for the route computation.

    External links

    • BigQuery

    • Snowflake

    • Redshift

    Create Tileset

    On July 17th, 2024, this component was deprecated in favor of the components in the Tileset Creation category.

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component creates a tileset.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Tileset name [String]

    • Tileset description [String]

    • Minimum zoom [Number]

    • Maximum zoom [Number]

    • Minimum zoom column [Column]

    • Maximum zoom column [Column]

    • Maximum tile size (KB) [Number]

    • Tile feature order [String]

    • Drop duplicates [Boolean]

    Outputs

    External links

    BigQuery reference

    Point from Static LatLon

    Description

    This component creates a new table containing a single row with a single point geography. The point is stored in a column named 'geo'.

    Inputs

    • Longitude [Number]

    • Latitude [Number]

    Outputs

    • Result table [Table]

    Remove Holes

    Description

    This component generates a new geo column with polygons without holes.

    Inputs

    • Source polygons table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    Spatial Process

    Description

    This component performs high-level spatial object editing from a simple, single component. You can combine multiple objects or cut the spatial objects of the input table.

    With this component you can combine objects, cut 1st from 2nd, cut 2nd from 1st, create the intersection object or create the inverse intersection object. Define the action that you want to do using the “Action” field.

    Inputs

    • Source table [Table]

    • 1st Spatial Field [Column]

    • 2nd Spatial Field [Column]

    • Action [Selection]

    Outputs

    • Result table [Table]

    ST Boundary

    Description

    This component adds a new column with the boundary of the input geography.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Redshift reference

    ST Bounding Box

    Description

    This component creates a new column containing rectangles that represent the bounding box of each geometry.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST Extent

    Description

    This component creates a new table containing a 'geom' column and a single row containing a geography that represents the bounding box of the input table.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    ST Generate Points

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component generates a table with a specified number of random points within an extent.

    The extent is specified using a table with polygons.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • Number of points [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    ST Geocode

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component geocodes an input table.

    It generates a new table with an additional column named 'geo' containing the geocoded addresses.

    Inputs

    • Source table [Table]

    • Column with addresses [Column]

    • Country [Selection]

    Outputs

    • Table with geocoded rows [Table]

    • Table with rows that could not be geocoded [Table]

    External links

    BigQuery reference

    ST Make Line

    Description

    This component creates a new table with an additional column with line geographies. Geographies are created based on another column containing arrays of points.

    Inputs

    • Source table [Table]

    • Points column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    PostgreSQL reference

    ST Make Polygon

    Description

    This component creates a new table with an additional column with polygon geographies. Geographies are created based on another column containing polylines.

    Inputs

    • Source table [Table]

    • Polylines column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    ST Polygonize

    This component requires the CARTO Analytics Toolbox installed in the chosen connection to build the workflow.

    Description

    This component adds a new column with polygons, created from a column containing arrays of lines representing polygon edges.

    Inputs

    • Source table [Table]

    • Geo column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    If the workflows_temp dataset does not exist, it will be automatically created in the region of the tenant (US, EU, AS, AU). In order to specify a custom region for the workflows_temp, manually create the dataset in the desired region and set the name (if different than workflows_temp) in the BigQuery connection > Advanced options.
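
    If you prefer to create the dataset yourself in a specific region, a minimal BigQuery DDL sketch (assuming a hypothetical project named my-project) would be:

    CREATE SCHEMA IF NOT EXISTS `my-project.workflows_temp`
    OPTIONS (location = 'EU');
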
    -- If the workflows_temp schema already exists:
    -- Grant permissions to manage the schema
    GRANT ALL PRIVILEGES ON SCHEMA DATABASE_NAME.WORKFLOWS_TEMP TO ROLE ROLE_NAME;
    
    -- If the workflows_temp schema does not exist yet:
    -- Grant permissions to create the schema
    GRANT USAGE ON DATABASE DATABASE_NAME TO ROLE ROLE_NAME;
    GRANT CREATE SCHEMA ON DATABASE DATABASE_NAME TO ROLE ROLE_NAME;
    

    Your Databricks user will need to have USE CATALOG and CREATE SCHEMA permission.

    GRANT USE CATALOG ON CATALOG my_catalog TO [email protected];
    GRANT CREATE SCHEMA ON CATALOG my_catalog TO [email protected];
    -- If the workflows_temp schema already exists:
    -- Grant permissions to manage the schema
    GRANT ALL ON SCHEMA workflows_temp TO user_name;
    
    -- If the workflows_temp schema does not exist yet:
    -- Grant permissions to create the schema
    GRANT CREATE ON DATABASE database_name TO user_name;

    These are the required permissions to manage the schema and to create the cleanup task:

    -- Grant permissions to manage the schema
    GRANT ALL PRIVILEGES ON SCHEMA DATABASE_NAME.CARTO_TEMP TO ROLE ROLE_NAME;
    
    -- Grant permissions to execute the task
    GRANT EXECUTE TASK ON ACCOUNT TO ROLE ROLE_NAME;

    The task is created once per schema, and will be removed when the schema is removed. It can also be removed manually:

    DROP TASK IF EXISTS DATABASE_NAME.CARTO_TEMP."carto_scheduler.workflow_clear_cache.workflows_temp"

    In Databricks, a user with enough privileges will need to set up a workflow or notebook that runs periodically to clean up the content of the carto_temp schema created in the selected catalog.

    When your workflow runs in Redshift, in order to remove the intermediate tables, you should create a scheduled query in Redshift to remove datasets older than 30 days.

    Scheduled queries are only available for provisioned Redshift clusters. If you are using Redshift Serverless you'll have to execute the clean-up manually, e.g. by calling the procedure defined below, or find other external means of executing it periodically.

    The following procedure should be created in your Redshift cluster to delete tables older than 30 days (you will have to replace all the occurrences of the $database$ placeholder with the name of your database):

    CREATE OR REPLACE PROCEDURE $database$.carto_temp._clear_cache_fn() as
    $$
    DECLARE
      statement RECORD;
      query VARCHAR(MAX);
    BEGIN
      query := 'SELECT
          \'DROP TABLE $database$.workflows_temp.\' || relname || \';\' as statement
        FROM
          pg_class_info
        LEFT JOIN
          pg_namespace
        ON
          pg_class_info.relnamespace = pg_namespace.oid
        WHERE
          reltype != 0
          AND TRIM(nspname) = \'workflows_temp\'
          AND datediff(day, relcreationtime, current_date) > 30
      ';
      FOR statement IN EXECUTE query
      LOOP
        EXECUTE statement.statement;
      END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    After that, you should define a Redshift scheduled query with the following CALL to execute the previous procedure once per day:

    CALL $database$.carto_temp._clear_cache_fn();

    When your workflow runs in PostgreSQL, a pg_cron task is automatically created in the carto_temp schema to remove the intermediate tables after 30 days. This task is executed periodically every day at 0h UTC.

    Here are some resources to learn how to use pg_cron on different managed services based on PostgreSQL:

    • Amazon RDS. Usage guide available here.

    • Amazon Aurora PostgreSQL supports pg_cron since version 12.

    • Google Cloud SQL. Setup instructions available here.

    • Azure Databases for PostgreSQL supports pg_cron since version 11 and provides some usage examples.

    This is the command to install the pg_cron extension:

    CREATE EXTENSION pg_cron;

    These are the required permissions to create the pg_cron task:

    -- Grant permissions to manage the schema
    GRANT ALL ON SCHEMA carto_temp TO user_name;
    
    -- Grant permissions to manage the task
    GRANT USAGE ON SCHEMA cron TO user_name;
    GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA cron TO user_name;

    Note that the task affects only the tables in that schema. If the name of the schema is changed in the connection, for example to carto_temp_custom, a new task will be created for the new schema.

    The task is created once per schema, and will be removed when the pg_cron extension is removed. It can also be removed manually:

    SELECT cron.unschedule('carto_scheduler.workflow_clear_cache.carto_temp');

    For workflows executed via API call, the Output table will be reused between API calls that have the exact same parameter values. If parameters are changed, the output table will be recomputed.
  • Redshift and PostgreSQL: Intermediate tables are never reused. This means that all nodes in a workflow are always computed completely.

    • This setting doesn't have an effect on executions from the UI, triggered by clicking on the 'Run' button.

    • This setting also doesn't have an effect on Scheduled executions (only available for PostgreSQL connections).


    Scheduling workflows

    Once you have designed a workflow, you might want it to be executed periodically. This allows the result of a workflow to be automatically updated when its input data sources have been modified between one execution and the next. For this purpose, CARTO offers a UI that helps create a scheduled query in the different data warehouses.

    Keep reading this page to learn more about scheduling the execution of a workflow:

    Creating a schedule

    Once a workflow has been executed successfully, you can click on the button that you will find next to 'Run'.

    Check the provider-specific information to learn more about the required permissions and prerequisites needed to enable this button.

    Clicking on that button will open a new modal window where you can select the frequency and other settings for the schedule:

    • Hours: The workflow will be executed every X hours, at o'clock times.

    • Days: The workflow will be executed every day at a specific time.

    • Weeks: The workflow will be executed weekly, on a specific day, at a specific time.

    • Months: The workflow will be executed monthly, on a specific day, at a specific time.

    • Custom: Use a custom expression to define the schedule. See below for more information.

    Custom expressions

    Please note that CARTO Workflows supports BigQuery syntax for BigQuery and CARTO Data Warehouse connections. For Snowflake and PostgreSQL, Cron syntax is supported. For Databricks connections, Quartz cron syntax is supported.
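
    For example, with Cron syntax, a hypothetical expression such as 0 3 * * 1 would run the workflow every Monday at 03:00.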

    Updating, modifying and deleting a schedule

    Once a workflow has been scheduled, there are different actions that can be performed:

    • If your workflow has been modified, after a successful execution you will find this small notification.

    • Click on the schedule icon to be able to sync the latest changes to the scheduled version of your workflow.

    • By clicking on the schedule icon at any time, you will be able to modify the frequency at which the workflow will be executed. Select a different setting and click on 'Save changes'.

    • Same as above, by clicking on the schedule icon at any time, you will find a 'Delete schedule' button that will permanently disable the schedule.

    Permissions and pre-requisites

    BigQuery

    In BigQuery, scheduled tasks are stored in the project, which allows the creation of tasks in different regions.

    Permissions

    The connection requires the following roles:

    • BigQuery User

    • BigQuery Data Editor

    The project used in the connection will need to have this API enabled:

    • BigQuery Data Transfer API: https://console.cloud.google.com/apis/library/bigquerydatatransfer.googleapis.com

    In the Connection settings, if the selected billing project is different than the Service Account project, the user must run this operation to enable Cross-project Service Account Authorization: https://cloud.google.com/bigquery/docs/enable-transfer-service#cross-project_service_account_authorization

    Data Transfer Version Info

    For connections created using 'Sign in with Google', a Data Transfer Version Info code needs to be provided.

    In order to create a Data Transfer Version Info code, you need to visit this page.

    Once there, select your Google account and Allow access:

    After allowing, you will be provided a code. Copy it and paste it into the Data Transfer Version Info setting in the connection's Advanced Options.

    Monitoring

    The scheduled queries can be monitored from the BigQuery console, in the "Scheduled queries" section. Notice that the name of the scheduled query will be "carto_scheduler.workflows.<id>". The id of the workflow is a UUID that can be found in the URL.

    Snowflake

    In Snowflake, scheduled tasks are created in a schema. In this case, it uses the workflows_temp schema. If the schema is removed, the task will be removed too.

    Snowflake makes use of Tasks to allow scheduled execution of SQL code. Tasks can be created via SQL statements using the CREATE TASK syntax.

    Grant permissions

    It requires access to the workflows_temp schema, and also a global grant "EXECUTE TASK" to be able to start a task.

    GRANT ALL PRIVILEGES ON SCHEMA <schema> TO ROLE <role>;
    GRANT EXECUTE TASK ON ACCOUNT TO ROLE <role>;

    Revoke permissions

    REVOKE ALL PRIVILEGES ON SCHEMA <schema> FROM ROLE <role>;
    REVOKE EXECUTE TASK ON ACCOUNT FROM ROLE <role>;

    Monitoring

    The created tasks can be monitored from "Data > Databases" > "MYDATABASE > MYSCHEMA" > Tasks. Notice that the name of the scheduled query will be "carto_scheduler.workflows.<id>". The id of the workflow is a UUID that can be found in the URL.
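
    As an alternative to browsing the UI, a minimal sketch of how the scheduled tasks could be located via SQL (MYDATABASE and MYSCHEMA are placeholders for your own database and schema):

    SHOW TASKS LIKE 'carto_scheduler.workflows.%' IN SCHEMA MYDATABASE.MYSCHEMA;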

    Databricks

    When scheduling a CARTO Workflow on a Databricks connection, a native Workflow will be created in Databricks, with a specific run schedule. In order to do so, your user will need the "Can Create" and "Can Manage" permissions for Jobs.

    "Can Create" Permissions

    This permission allows a user to create new jobs. Specifically, users with this permission can:

    • Define new jobs/workflows with tasks and configurations.

    • Attach tasks to clusters they have access to.

    • Set parameters like schedules, libraries, and task dependencies during job creation.

    "Can Manage" Permissions

    This permission provides full control over a job and is the highest level of access. Users with "Can Manage" permissions can:

    • Create, edit, delete, and clone jobs.

    • Modify schedules, task configurations, and parameters.

    • Manage job runs (e.g., start, stop, or restart runs).

    • Assign or modify permissions for the job, delegating "Can Edit" or "Can View" permissions to others.

    PostgreSQL

    For PostgreSQL, the pg_cron extension provides the means to schedule the execution of a SQL query. Here are some resources to learn how to leverage pg_cron on different managed services based on PostgreSQL:

    • Amazon RDS. Usage guide available here.

    • Amazon Aurora PostgreSQL supports pg_cron since version 12.

    • Google Cloud SQL. Extension setup instructions here.

    • Azure Databases for PostgreSQL supports pg_cron since version 11 and provides some usage examples.

    Grant permissions

    It requires the extension pg_cron to be installed and available. Also, access to the cron schema.

    CREATE EXTENSION pg_cron;
    GRANT USAGE ON SCHEMA cron TO postgres;
    GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA cron TO postgres;

    Revoke permissions

    REVOKE USAGE ON SCHEMA cron FROM postgres;
    REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cron FROM postgres;
    DROP EXTENSION pg_cron;

    Monitoring

    The created jobs can be monitored by running SQL queries in the database:

    -- Existing jobs
    select * from cron.job;
    
    -- Executed jobs
    select * from cron.job_run_details;

    Notice that the name of the scheduled query will be "carto_scheduler.workflows.<id>". The id of the workflow is a UUID that can be found in the URL.
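
    For example, a minimal sketch to list only the schedules created by Workflows (assuming a pg_cron version that records job names):

    SELECT jobid, jobname, schedule
    FROM cron.job
    WHERE jobname LIKE 'carto_scheduler.workflows.%';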

    Schedule a Workflow in Redshift

    Redshift doesn't offer an API to handle scheduled queries, which prevents CARTO from being able to schedule the execution of a workflow directly from the UI. However, users can schedule workflows on Redshift by following these simple steps:

    1. Go to the "SQL" tab of the results panel of your workflow.

    2. Click on the "Copy" button at the top right of the SQL tab to copy the SQL code corresponding to your entire workflow.

    3. Redshift allows creating a schedule to run SQL statements using the query editor. There are more details about how to use this feature here.


    Data Preparation

    Components to prepare your data for downstream analysis. This can include altering a table structure, re-ordering data, subsampling data, etc.

    Case When

    Description

    This component generates column values that depend on a set of specified conditions.
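
    Conceptually, the component builds a standard SQL CASE expression. A minimal sketch (not the exact generated SQL), assuming a hypothetical input table orders with an amount column and a result column named segment:

    SELECT
      *,
      CASE
        WHEN amount >= 1000 THEN 'high'
        WHEN amount >= 100 THEN 'medium'
        ELSE 'low'
      END AS segment
    FROM orders;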

    Inputs

    • Source table [Table]

  • Conditional expressions: The UI of this component helps create a conditional expression involving multiple columns and SQL operators. Each expression will produce a different result, as set on the component.

  • Result Column [Column]: Select a column that will contain the specified resulting value.

    Cast

    Description

    This component casts the content of a column to a given type

    Inputs

    • Source table [Table]

    • Column [Column]

    • New type [Selection]

    Outputs

    • Result table [Table]

    Columns to Array

    Description

    This component adds a new column with an array containing the values in a set of selected columns.

    Inputs

    • Source table [Table]

    • Columns

    • Array column name

    Outputs

    • Result table [Table]

    Create Column

    Description

    This component creates a new table with an additional column computed using an expression.

    Inputs

    • Source table [Table]

    • Name for new column [String]

    • Expression [String]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Drop Columns

    Description

    This component generates a new table with the same content as the input one, except one of its columns.

    The component will fail if the column to remove is the only one in the input table.

    Inputs

    • Source table [Table]

    • Column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Edit Schema

    Description

    This component simplifies the process of modifying table schemas. It allows to select specific columns, with the option to adjust their names and data types as required.

    Inputs

    • Source table [Table]

    • Columns: The component's UI allows selecting a column, giving a new name and selecting a data type to cast the column.

    Outputs

    • Result table [Table]

    When casting a STRING to TIMESTAMP the expected format is 1970-01-01T00:00:00.000Z. Having a differently formatted string might produce an incorrect timestamp.

    Extract from JSON

    Description

    This component creates a new column with values extracted from the JSON strings in another column. It uses the Data Warehouse syntax to specify the path to the key that needs to be extracted. See the documentation links below for more information.

    This component only extracts one single property per node. If you're looking to extract multiple keys from your JSON into separate columns, refer to the Parse JSON component.
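
    As an illustration of the underlying syntax, a minimal BigQuery-flavoured sketch (hypothetical table, column and JSON path; other warehouses use their own JSON functions and path syntax):

    SELECT
      *,
      JSON_VALUE(properties, '$.address.city') AS city
    FROM my_dataset.places;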

    Inputs

    • Source table [Table]

    • JSON column [Column]

    • JSON path [expression]

    • New column [Column]

    Output

    • Result table

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    Find and Replace

    Description

    This component finds a string in one column of a table and replaces it with the specified value from another table.

    As an alternative, columns from the lookup table can be added to the original table in those rows where the searched string is found. This is regulated by the Mode parameter.

    Inputs

    • Source table [Table]

    • Find within column [Column]

    • Lookup table [Table]

    • Find value column [Column]

    • Replacement column [Column]

    • Find mode [Selection]

    • Case insensitive [Boolean]

    • Match whole word [Boolean]

    • Mode [Selection]

    • Columns to append [Column] [Multiple]: only used if Append field(s) to record mode is selected

    Outputs

    • Result table [Table]

    Generate UUID

    Description

    This component creates a new table with an additional UUID column named id.

    Inputs

    • Source table [Table]

    Outputs

    • Result table [Table]

    BigQuery reference

    Snowflake reference

    Geography to Geometry

    Description

    This component converts a column from geography to geometry data type.

    Inputs

    • Source table [Table]

    • geography column [Column]

    Outputs

    • Result table [Table]

    Geometry to Geography

    Description

    This component converts a column from geometry to geography data type.

    Inputs

    • Source table [Table]

    • Geometry column [Column]

    Outputs

    • Result table [Table]

    Hex Color Generator

    Description

    This component creates a hex color for each distinct value of an input string column. NULL column values will be associated with a grey value. The component generates a copy of the source table with a new string column called: [name_of_input_col] + '_hex_color'.

    Inputs

    • Source table [Table]

    • Column with category values

    Outputs

    • Result table [Table]

    Is not Null

    Description

    This component filters an input table using the presence or absence of null values in a given column.

    Inputs

    • Source table [Table]

    • Column [Column]

    Outputs

    • Not null values table [Table]

    • Null values table [Table]

    Limit

    Description

    This component creates a new table with only the first N rows of the input table.

    Inputs

    • Source table [Table]

    • Number of rows [Number]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Multi-col formula

    Description

    This component computes new values based on a given expression and a set of fields to apply the expression to. Use $a to refer to the value of the current column.
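
    For example, applying the hypothetical expression $a * 1.21 to columns price and shipping, with mode 'Create new columns' and prefix gross_, is conceptually equivalent to:

    SELECT
      *,
      price * 1.21 AS gross_price,
      shipping * 1.21 AS gross_shipping
    FROM my_table;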

    Inputs

    • Source table[Table]

    • Expression [String]. The expression to apply

    • Mode [Selection]. The mode used to put new values in the table

    • Prefix [String]. Only for the case of mode='Create new columns'

    • Columns [Column][Multiple]. The columns to apply the formula to

    Outputs

    • Result table [Table]

    Multi-row formula

    Description

    This component creates a new table containing a new column computed using a multi-row formula based on one or several input columns.

    To refer to a value in the previous row, use {colname - 1} and to refer to a value in the next row, use {colname + 1}.
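
    Multi-row expressions are conceptually similar to SQL window functions such as LAG and LEAD. For instance, a hypothetical expression sales - {sales - 1}, sorted by order_date and grouped by store_id, roughly corresponds to:

    SELECT
      *,
      sales - LAG(sales) OVER (PARTITION BY store_id ORDER BY order_date) AS sales_change
    FROM my_table;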

    Inputs

    • Table[Table]

    • New column name [String]

    • New column type [Selection]

    • Expression [String]

    • Value for missing row values [Selection]

    • Column to sort by [Column]

    • Column to group by [Column]

    Outputs

    • Result table [Table]

    Normalize

    Description

    This component normalizes the values of a given column.

    It adds a new column named '[column_name]_norm'.

    Normalization can be computed as 0-1 values or as z-scores
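
    A minimal SQL sketch of the two modes (assuming a hypothetical numeric column revenue; the component itself only adds a single revenue_norm column):

    SELECT
      *,
      (revenue - MIN(revenue) OVER ()) / NULLIF(MAX(revenue) OVER () - MIN(revenue) OVER (), 0) AS revenue_norm,   -- 0-1 scaling
      (revenue - AVG(revenue) OVER ()) / NULLIF(STDDEV(revenue) OVER (), 0) AS revenue_zscore                      -- z-score
    FROM my_table;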

    Inputs

    • Source table [Table]

    • Column to normalize [Column]

    • Use z-scores [Boolean].

      • Disabled (default): When disabled, the resulting normalized values will range between 0 and 1.

      • Enabled: The normalized value will be calculated as a z-score or standard score: the number of standard deviations that the value is above or below the mean of the whole column.

    Outputs

    • Result table [Table]

    Order by

    Description

    This component generates a new table containing the rows of an input one sorted according to the values in one of its columns, and an optional second column.

    Columns to use cannot be of type geometry.

    Inputs

    • Table to order [Table]

    • Column to order by [Column]

    • Use descending order [Boolean]

    • Optional secondary column to order by [Column]

    • Use descending order in secondary column [Boolean]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    Parse JSON

    Description

    This component creates new columns with values extracted from the JSON string in other columns. It uses the data warehouse syntax to specify the path to the key that needs to be extracted.

    Input

    • Source table. This component expects an input table containing at least one column with a JSON object.

    Settings

    • Select JSON column: This allows selecting a string column that contains a JSON object

    • Add column name: Type a name for a new column that will be added with the content of a specific JSON key.

    • Add JSON path: Type a JSON path expression, in the syntax of your data warehouse to identify the key to be extracted

    Output

    • Output table: this component generates a table with the same schema as the input source table, plus an additional column per JSON key specified in the Settings.

    External links

    BigQuery reference

    Snowflake reference

    Redshift reference

    PostgreSQL reference

    Poly Build

    Description

    This component takes a group of spatial point objects and draws a polygon or polyline in a specific sort order to represent that group of points.

    This component can also be used for spatial layer development by translating a collection of GPS data into polygon or polyline objects, where a polygon is a simple bounded region, such as a state boundary, and a polyline contains multiple line segments with any number of points between its start and endpoints, such as a river or road.

    Inputs

    • Build Method [Selection]

    • Source table [Table]

    • Source Field [Column]

    • Source Field [Column]

    • Sequence Field [Column]

    Outputs

    • Result table [Table]

    Poly Split

    Description

    This component splits polygon or polyline objects into their component point, line, or region objects.

    This is a very specialized component used for spatial layer development. A typical use of this component is to disaggregate complex regions that may contain more than one polygon or to separate a polyline into its individual nodes.

    Inputs

    • Source table [Table]

    • Spatial Field [Column]

    • Split To [Selection]

    Outputs

    • Result table [Table]

    Remove Duplicated

    Description

    This component takes an input table and generates a new one in which duplicate rows from the input table have been removed.

    Inputs

    • Source table [Table]

    Outputs

    • Result table [Table]

    Rename Column

    Description

    This component generates a new table with the same content as the input one, renaming one or more of its columns.

    Inputs

    • Source table [Table]

    • Column to rename [Column]

    • New column name [String]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Row Number

    Description

    This component creates a new table with an additional column containing row numbers.

    Inputs

    • Source table [Table]

    Outputs

    • Result table [Table]

    Sample

    Description

    This component generates a new table with a random sample of N rows from an input table.

    Inputs

    • Source table [Table]

    • Number of rows to sample [Number]

    Outputs

    • Result table [Table]

    Select

    Description

    This component runs an arbitrary SELECT statement.

    It generates a new table from a SELECT statement applied on the input table.

    Inputs

    • Source table [Table]

    • SELECT statement [String]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Snowflake reference

    PostgreSQL reference

    Select Distinct

    Description

    This component generates a new table with the unique values that appear in a given column of an input table.

    Inputs

    • Source table [Table]

    • Column [Column]

    Outputs

    • Result table [Table]

    External links

    BigQuery reference

    Redshift reference

    Simple Filter

    Description

    This component filters an input table according to a filter expression based on a single column.

    It generates a new table with only the rows of the input table that meet the filter criteria and another one with those that do not meet it.

    Inputs

    • Source table [Table]

    • Column [Column]

    • Operator [Selection]

    • Value [String]

    Outputs

    • Table with rows that pass the filter [Table]

    • Table with rows that do not pass the filter [Table]

    BigQuery reference

    Redshift reference

    Spatial Filter

    Description

    This component filters an input table using a spatial predicate and a filter table.

    It generates a new table with only the rows of the input table that meet the filter criteria and another one with those that do not meet it.

    Inputs

    • Source table [Table]

    • Filter table [Table]

    • Geo column in source table [Column]

    • Geo column in filter table [Column]

    • Spatial predicate [Selection]

    Outputs

    • Table with rows that pass the filter [Table]

    • Table with rows that do not pass the filter [Table]

    ST SetSRID

    Description

    This component sets the SRID of a geo column

    Inputs

    • Source table [Table]

    • Geo column [Column]

    • SRID [String]

    Outputs

    • Result table [Table]

    Text to columns

    Description

    This component adds new columns based on splitting the text string in a text column.
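
    A minimal BigQuery-flavoured sketch of the 'Split to columns' mode (hypothetical table and column names):

    SELECT
      *,
      SPLIT(full_name, ' ')[SAFE_OFFSET(0)] AS full_name_1,
      SPLIT(full_name, ' ')[SAFE_OFFSET(1)] AS full_name_2
    FROM my_dataset.people;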

    Inputs

    • Table [Table]

    • Column to split [Column]

    • Delimiters [String]

    • Mode [Selection]. Whether to add new columns or new rows with the split strings

    • Number of new columns: Only used if mode = 'Split to columns'

    • Prefix for new column names [String]: Only used if mode = 'Split to columns'

    • Extra characters: What to do with extra characters if there are more tokens after dividing the string according to the delimiters than the ones defined in the 'Number of new columns' parameter

    Outputs

    • Result table [Table]

    Transpose / Unpivot

    Description

    This component rotates table columns into rows.

    Inputs

    • Table to unpivot [Table]

    • Key columns [Column][Multiple]: The columns to use for identifying rows

    • Data columns [Column][Multiple]: The columns to use for key-value pairs

    Outputs

    • Result table [Table]

    Unique

    Description

    This component separates unique rows and duplicated rows. Unique rows are defined using the values in one or several columns of the input table, as defined by the user.

    Inputs

    • Tables [Table]

    • Columns to find unique values [Column][Multiple]

    Outputs

    • Table with unique rows [Table]

    • Table with duplicated rows [Table]

    Where

    Description

    This component filters an input table according to a filter expression.

    It generates a new table with only the rows of the input table that meet the filter criteria and another one with those that do not meet it.

    Inputs

    • Source table [Table]

    • Filter expression [String]

    Outputs

    • Table with rows that pass the filter [Table]

    • Table with rows that do not pass the filter [Table]

    External links

    BigQuery reference

    Snowflake reference


    Components

    Components in CARTO Workflows are, together with the data sources, the available resources to build your analytical workflows. To use these elements, drag and drop them from the Components Explorer to the Workflow canvas.

    In this section you can find the list of all components in CARTO Workflows and their availability in the different cloud data warehouse platforms. Please check this page regularly, as we are continuously incorporating new components and completing their availability across platforms.

    This section also contains information about Extension Packages, which extend Workflows' functionality for specific use cases.


    Please note that the components available via your CARTO Data Warehouse connection are the same ones as for Google BigQuery.

    Components are organized in the categories below. For each component, the availability matrix indicates whether it is supported on BigQuery, Snowflake, Databricks, Oracle, Redshift and PostgreSQL connections.

    Aggregation

    Count, Group by, Point Stats in Polygons, ST Dump, Summarize, Unnest

    Control

    Conditional Split, Success/Error Split

    Custom

    Call Procedure, Custom Python Code, Custom SQL Select

    Data Enrichment

    Enrich H3 Grid, Enrich Points, Enrich Polygons, Enrich Polygons with Weights, Enrich Quadbin Grid

    Data Preparation

    Case When, Cast, Create Column, Columns to Array, Drop Columns, Edit Schema, Extract from JSON, Find Replace, Generate UUID, Geography to Geometry, Geometry to Geography, Hex Color Generator, Limit, Multi-col formula, Multi-row formula, Normalize, Order by, Parse JSON, Poly Build, Poly Split, Remove Duplicated, Rename Column, Row Number, Sample, Select, Select Distinct, Simple Filter, Spatial Filter, ST SetSRID, Text to columns, Transpose, Unique, Where

    Generative AI

    ML Generate Text

    Input / Output

    Export to Bucket, Create Builder Map, Get Table by Name, HTTP Request, Import from URL, API Output, MCP Tool Output, Save as Table, Send by Email

    Joins

    Cross Join, Join, Join (extended), Spatial Join, Spatial Match, ST Difference, ST Intersection, Union All

    Parsers

    Draw Custom Features, ST as GeoJSON, ST as Text, ST Geog from Text, ST GeogPoint, Table from GeoJSON, WKB from GeoJSON, WKB from Text

    Raster Operations

    Get Values from Raster, Intersect and Aggregate Raster

    Spatial Accessors

    ST Geometry Type, ST NumGeometries, ST NumPoints, ST Point N, ST X / ST Y

    Spatial Analysis

    K-Nearest Neighbors, ST Cluster DBSCAN, ST Cluster K-Means, ST Count Points in Polygons, ST Delaunay Polygons, ST Voronoi

    Spatial Constructors

    BBOX from Values, Clip with Polygons, Create Grid, Create Isolines, Create Routes, Create Routing Matrix, Point from Static LatLon, Remove Holes, Spatial Process, ST Boundary, ST Bounding Box, ST Extent, ST Generate Points, ST Geocode, ST Make Line, ST Make Polygon, ST Polygonize

    Spatial Indexes

    H3 Boundary, H3 Center, H3 Grid Distance, H3 from GeoPoint, H3 KRing, H3 Polyfill, H3 To Parent, Quadbin Boundary, Quadbin Center, Quadbin from GeoPoint, Quadbin KRing, Quadbin Polyfill, Quadbin To Parent

    Spatial Operations

    Distance Matrix, Distance (single table), Distance to Nearest, Prepare for Visualization, Spatial Info, ST Area, ST Buffer, ST Centroid, ST Concave Hull, ST Convex Hull, ST Distance, ST DWithin, ST Length, ST Line Interpolate Point, ST Perimeter, ST Simplify, ST Snap to Grid, Subdivide, Trade Areas

    Statistics

    Cluster Time Series, Composite Score Supervised, Composite Score Unsupervised, Cronbach Alpha Coefficient, Detect Space-time Anomalies, Detect Spatial Anomalies, Getis Ord, Getis Ord Spacetime, GWR, Hotspot Analysis, Local Moran's I, Moran's I, Spacetime Hotspots Classification

    Tileset Creation

    Create H3 Aggregation Tileset, Create Point Aggregation Tileset, Create Quadbin Aggregation Tileset, Create Vector Tileset

    Extension Packages

    Workflows' functionality can be extended and tailored to specific use cases, by creating custom components, which can be grouped and distributed together as Extension Packages.

    Extension Packages can be created for workflows that use BigQuery and Snowflake connections.

    In addition to the extensions created by external developers, partners and other third-parties, CARTO provides a collection of extension packages developed and maintained internally. These are described below, and are distributed via .zip files that can be installed from the Workflows UI.

    BigQuery ML (available for BigQuery connections)

    Create Classification Model, Create Forecast Model, Create Regression Model, Evaluate, Evaluate Forecast, Explain Forecast, Get Model by Name, Explain Predict, Forecast, Global Explain, Import Model, Predict

    Snowflake ML (available for Snowflake connections)

    Create Classification Model, Create Forecasting Model, Predict, Forecast, Evaluate Classification, Evaluate Forecast, Feature Importance (Classification), Feature Importance (Forecast)

    Google Earth Engine (available for BigQuery connections)

    Summary Region, Summary Table

    Google Environment APIs (available for BigQuery connections)

    Get Elevation, Get NDVI, Get Precipitation, Solar API Building Insights, Air Quality API Current Conditions, Air Quality API Forecast, Air Quality API History, Pollen API Forecast

    Geospatial Foundation Models (available for BigQuery connections)

    Google PDFM County Embeddings, Google PDFM ZCTA Embeddings, Google Earth Engine Satellite Embeddings

    Telco Signal Propagation Models (available for BigQuery connections)

    Path Profile, Path Profile Raster, Close In, Extended HATA

    Territory Planning (available for BigQuery and Snowflake connections)

    Facilities Preparation, Demand Points Preparation, Cost Matrix Preparation, Constraints Definition, Location-Allocation, Territory Balancing

    Analytics on Embeddings (available for BigQuery connections)

    Change Detection, Clustering, Similarity Search, Visualization

