BigQuery ML

Extension Package provided by CARTO

The BigQuery ML extension package for CARTO Workflows includes a variety of components that enable users to integrate machine learning workflows with geospatial data. These components allow for creating, evaluating, explaining, forecasting, and managing ML models directly within CARTO Workflows, utilizing BigQuery ML’s capabilities.

The following sections describe the available components and explain how they can be connected to each other:

Create Classification Model

Description

This component trains a classification model using a table of input data.

For more details, refer to the official ML.CREATE_MODEL documentation.

Inputs

  • Input table: A data table that is used as input for the model creation.

Settings

  • Model's FQN: Fully qualified name for the model created by this component.

  • Unique identifier column: A column from the input table to be used as a unique identifier for the model.

  • Input label column: A column from the input table to be used as the source of labels for the model.

  • Model type: Select the type of model to be created. Options are:

    • LOGISTIC_REG

    • BOOSTED_TREE_CLASSIFIER

    • RANDOM_FOREST_CLASSIFIER

  • Fit intercept: Determines whether to fit an intercept term in the model. Only applies if Model type is "LOGISTIC_REG".

  • Max tree depth: Determines the maximum depth of the individual trees. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

  • Number of parallel trees: Determines the number of parallel trees constructed on each iteration. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

  • Minimum tree child weight: Determines the minimum sum of instance weight needed in a child. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

  • Subsample: Determines whether to subsample. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

  • Column sample by tree: Subsample ratio of columns when constructing each tree. A fraction between 0 and 1 that controls the number of columns used by each tree. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

  • Column sample by node: Subsample ratio of columns for each node split. A fraction between 0 and 1 that controls the number of columns used at each split. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".

  • Data split method: The method used to split the input data into training, evaluation and test data. Options are:

    • AUTO_SPLIT automatically splits the data

    • RANDOM splits randomly based on specified fractions

    • CUSTOM uses a specified column

    • SEQ splits sequentially

    • NO_SPLIT uses all data for training.

  • Data split evaluation fraction: Fraction of data to use for evaluation. Only applies if Data split method is RANDOM or SEQ.

  • Data split test fraction: Fraction of data to use for testing. Only applies if Data split method is RANDOM or SEQ.

  • Data split column: Column to use for splitting the data. Only applies if Data split method is CUSTOM.

Outputs

  • Output table: This component generates a single-row table with the FQN of the created model.
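For reference, this component runs a statement equivalent to BigQuery ML's CREATE MODEL. A minimal sketch of what that looks like; all project, dataset, table, and column names are placeholders:

```sql
CREATE OR REPLACE MODEL `my_project.my_dataset.churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',          -- "Model type"
  input_label_cols = ['churned'],       -- "Input label column"
  data_split_method = 'RANDOM',         -- "Data split method"
  data_split_eval_fraction = 0.2        -- "Data split evaluation fraction"
) AS
SELECT * EXCEPT (customer_id)           -- remaining columns are used as features
FROM `my_project.my_dataset.customers`;
```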

Create Forecast Model

Description

This component trains a forecast model using a table of input data.

For more details, refer to the official ML.CREATE_MODEL documentation.

Inputs

  • Input table: A data table that is used as input for the model creation.

  • Holidays table: A table containing custom holidays to use during model training.

Settings

  • Model's FQN: Fully qualified name for the model created by this component.

  • Model type: Select the type of model to be created. Options are:

    • ARIMA_PLUS

    • ARIMA_PLUS_XREG

  • Time-series ID column: Column from Input table that uniquely identifies each individual time series in the input data. Only applies if Model type is ARIMA_PLUS and Auto ARIMA is set to true.

  • Time-series timestamp column: Column from Input table containing timestamps for each data point in the time series.

  • Time-series data column: Column from Input table containing the target values to forecast for each data point in the time series.

  • Auto ARIMA: Automatically determine ARIMA hyperparameters.

  • P value: Number of autoregressive terms. Only applies if Auto ARIMA is set to false.

  • D value: Number of non-seasonal differences. Only applies if Auto ARIMA is set to false.

  • Q value: Number of lagged forecast errors. Only applies if Auto ARIMA is set to false.

  • Data frequency: Frequency of data points in the time series. Used by BigQuery ML to properly interpret the time intervals between data points for forecasting. AUTO_FREQUENCY will attempt to detect the frequency from the data. Options are:

    • AUTO_FREQUENCY (default)

    • PER_MINUTE

    • HOURLY

    • DAILY

    • WEEKLY

    • MONTHLY

    • QUARTERLY

    • YEARLY

  • Holiday region: Region for which the holidays will be applied. Check the reference for available values.

  • Clean spikes and dips: Determines whether to remove spikes and dips from the time series data.

Outputs

  • Output table: This component generates a single-row table with the FQN of the created model.
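Under the hood, this maps to BigQuery ML's CREATE MODEL with time-series options. A minimal sketch with placeholder names:

```sql
CREATE OR REPLACE MODEL `my_project.my_dataset.sales_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',                 -- "Model type"
  time_series_timestamp_col = 'sale_date',   -- "Time-series timestamp column"
  time_series_data_col = 'revenue',          -- "Time-series data column"
  time_series_id_col = 'store_id',           -- "Time-series ID column"
  auto_arima = TRUE,                         -- "Auto ARIMA"
  data_frequency = 'AUTO_FREQUENCY',         -- "Data frequency"
  holiday_region = 'US',                     -- "Holiday region"
  clean_spikes_and_dips = TRUE               -- "Clean spikes and dips"
) AS
SELECT sale_date, revenue, store_id
FROM `my_project.my_dataset.daily_sales`;
```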

Create Regression Model

Description

This component trains a regression model using a table of input data.

For more details, refer to the official ML.CREATE_MODEL documentation.

Inputs

  • Input table: A data table that is used as input for the model creation.

Settings

  • Model's FQN: Fully qualified name for the model created by this component.

  • Unique identifier column: A column from the input table to be used as a unique identifier for the model.

  • Input label column: A column from the input table to be used as the source of labels for the model.

  • Model type: Select the type of model to be created. Options are:

    • LINEAR_REG

    • BOOSTED_TREE_REGRESSOR

    • RANDOM_FOREST_REGRESSOR

  • Fit intercept: Determines whether to fit an intercept term in the model. Only applies if Model type is "LINEAR_REG".

  • Max tree depth: Determines the maximum depth of the individual trees. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

  • Number of parallel trees: Determines the number of parallel trees constructed on each iteration. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

  • Minimum tree child weight: Determines the minimum sum of instance weight needed in a child. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

  • Subsample: Determines whether to subsample. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

  • Column sample by tree: Subsample ratio of columns when constructing each tree. A fraction between 0 and 1 that controls the number of columns used by each tree. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

  • Column sample by node: Subsample ratio of columns for each node split. A fraction between 0 and 1 that controls the number of columns used at each split. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".

  • Data split method: The method used to split the input data into training, evaluation and test data. Options are:

    • AUTO_SPLIT automatically splits the data

    • RANDOM splits randomly based on specified fractions

    • CUSTOM uses a specified column

    • SEQ splits sequentially

    • NO_SPLIT uses all data for training.

  • Data split evaluation fraction: Fraction of data to use for evaluation. Only applies if Data split method is RANDOM or SEQ.

  • Data split test fraction: Fraction of data to use for testing. Only applies if Data split method is RANDOM or SEQ.

  • Data split column: Column to use for splitting the data. Only applies if Data split method is CUSTOM.

Outputs

  • Output table: This component generates a single-row table with the FQN of the created model.
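The underlying statement mirrors the classification sketch above, switching to a regression model type. Again with placeholder names:

```sql
CREATE OR REPLACE MODEL `my_project.my_dataset.price_model`
OPTIONS (
  model_type = 'LINEAR_REG',        -- "Model type"
  input_label_cols = ['price'],     -- "Input label column"
  fit_intercept = TRUE              -- "Fit intercept"
) AS
SELECT * EXCEPT (listing_id)
FROM `my_project.my_dataset.listings`;
```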

Evaluate

Description

Given a pre-trained ML model and an input table, this component evaluates the model's performance against the provided input data. The result contains metrics that describe the model's performance. Extra options are available when the model is a forecasting model.

For more details, refer to the official ML.EVALUATE function documentation.

Inputs

  • Model: This component receives a trained model table as input.

  • Input table: This component receives a data table to be used as data input.

Settings

  • Forecast: Determines whether to evaluate the model as a forecasting model.

  • Perform aggregation: Determines whether to evaluate on the time series level or the timestep level. Only applies if Forecast is set to true.

  • Horizon: Number of forecasted time points to evaluate against. Only applies if Forecast is set to true.

  • Confidence level: Percentage of the future values that fall in the prediction interval. Only applies if Forecast is set to true.

Outputs

  • Output table: This component produces a table with the evaluation metrics.
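For reference, the component wraps ML.EVALUATE. Two minimal sketches with placeholder names, one for a standard model and one with the forecast options enabled:

```sql
-- Standard evaluation against a held-out table
SELECT *
FROM ML.EVALUATE(
  MODEL `my_project.my_dataset.churn_model`,
  TABLE `my_project.my_dataset.customers_holdout`
);

-- Forecast evaluation: the STRUCT maps to the settings above
SELECT *
FROM ML.EVALUATE(
  MODEL `my_project.my_dataset.sales_forecast`,
  TABLE `my_project.my_dataset.actual_sales`,
  STRUCT(TRUE AS perform_aggregation, 30 AS horizon, 0.9 AS confidence_level)
);
```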

Evaluate Forecast

Description

Given a pre-trained ML forecast model, this component evaluates its performance using the ARIMA_EVALUATE function in BigQuery.

Inputs

  • Model: This component receives a trained model table as input.

Settings

  • Show all candidate models: Determines whether to show evaluation metrics for all candidate models or only for the best model.

Outputs

  • Output table: This component produces a table with the evaluation metrics.
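This maps to a call like the following (model name is a placeholder):

```sql
SELECT *
FROM ML.ARIMA_EVALUATE(
  MODEL `my_project.my_dataset.sales_forecast`,
  STRUCT(TRUE AS show_all_candidate_models)  -- "Show all candidate models"
);
```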

Explain Forecast

Description

Given a pre-trained ML model and an input table, this component runs an explainability analysis by invoking the EXPLAIN_FORECAST function in BigQuery.

Inputs

  • Model: This component receives a trained model table as input.

  • Input table: This component receives a data table to be used as data input. It can only be used with ARIMA models. The execution will fail if a different model is selected.

Settings

  • Model type: Select the type of model to be used with the input. Options are:

    • ARIMA_PLUS

    • ARIMA_PLUS_XREG

  • Horizon: The number of time units to forecast into the future.

  • Confidence level: The confidence level to use for the prediction intervals.

Outputs

  • Output table: This component produces a table with the explainability metrics.
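A minimal sketch of the underlying call for an ARIMA_PLUS model, with placeholder names (for ARIMA_PLUS_XREG the input table of regressors is passed as well):

```sql
SELECT *
FROM ML.EXPLAIN_FORECAST(
  MODEL `my_project.my_dataset.sales_forecast`,
  STRUCT(30 AS horizon, 0.9 AS confidence_level)
);
```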

Explain Predict

Description

Given a pre-trained ML model, this component generates a predicted value and a set of feature attributions for each instance of the input data. Feature attributions indicate how much each feature in your model contributed to the final prediction for each given instance.

For more details, refer to the official ML.EXPLAIN_PREDICT function documentation.

Inputs

  • Model: This component receives a trained model table as input.

  • Input table: This component receives a data table to be used as data input.

Settings

  • Number of top features: The number of top features to be returned.

Outputs

  • Output table: This component produces a table with the attribution per feature for the input data.
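A minimal sketch of the underlying call, with placeholder names:

```sql
SELECT *
FROM ML.EXPLAIN_PREDICT(
  MODEL `my_project.my_dataset.churn_model`,
  TABLE `my_project.my_dataset.customers_holdout`,
  STRUCT(3 AS top_k_features)  -- "Number of top features"
);
```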

Forecast

Description

Given a pre-trained ML model and an optional input table, this component infers the predictions for each of the input samples. Note that the actual forecasting happens when the model is created; this component only retrieves the desired results.

For more details, refer to the ML.FORECAST function documentation.

Inputs

  • Model: This component receives a trained model table as input.

  • Input table: This component receives a data table to be used as data input. It can only be used with ARIMA models. The execution will fail if a different model is selected.

Settings

  • Model type: Select the type of model to be used with the input. Options are:

    • ARIMA_PLUS

    • ARIMA_PLUS_XREG

  • Horizon: The number of time units to forecast into the future.

  • Confidence level: The confidence level to use for the prediction intervals.

Outputs

  • Output table: This component produces a table with the predictions column.
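A minimal sketch of the underlying call for an ARIMA_PLUS model, with placeholder names:

```sql
SELECT *
FROM ML.FORECAST(
  MODEL `my_project.my_dataset.sales_forecast`,
  STRUCT(30 AS horizon, 0.9 AS confidence_level)
);
```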

Get Model by Name

Description

This component loads a pre-existing BigQuery model into the format expected by Workflows, so that it can be used with the rest of the BigQuery ML components.

Inputs

  • Model FQN: Fully-qualified name to get the model from.

Outputs

  • Output: This component returns a model that can be connected to other BigQuery ML components that expect a model as input.

Global Explain

Description

Given a pre-trained ML model, this component provides explanations for the entire model by aggregating the local explanations of the evaluation data. It returns the attribution of each feature. In the case of a classification model, an option can be set to provide an explanation for each class of the model.

For more details, refer to the official ML.GLOBAL_EXPLAIN function documentation.

Inputs

  • Model: This component receives a trained model table as input.

Settings

  • Class level explain: Determines whether global feature importances are returned for each class in the case of classification.

Outputs

  • Output table: This component produces a table with the aggregated attribution of each feature.
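A minimal sketch of the underlying call, with placeholder names:

```sql
SELECT *
FROM ML.GLOBAL_EXPLAIN(
  MODEL `my_project.my_dataset.churn_model`,
  STRUCT(TRUE AS class_level_explain)  -- "Class level explain"
);
```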

Import model

Description

This component imports an ONNX model from Google Cloud Storage. The model will be loaded into BigQuery ML using the provided FQN, and it will be ready to use in Workflows with the rest of the BigQuery ML components.

Settings

  • Model path: Google Cloud Storage URI (gs://) of the pre-trained ONNX file.

  • Model FQN: A fully-qualified name to save the model to.

  • Overwrite model: Determines whether to overwrite the model if it already exists.

Outputs

  • Output: This component returns a model that can be connected to other BigQuery ML components that expect a model as input.
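The import maps to a CREATE MODEL statement with the ONNX model type. A minimal sketch with placeholder names:

```sql
CREATE OR REPLACE MODEL `my_project.my_dataset.imported_model`
OPTIONS (
  model_type = 'ONNX',
  model_path = 'gs://my-bucket/models/model.onnx'  -- "Model path"
);
```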

Predict

Description

Given a pre-trained ML model and an input table, this component infers predictions for each of the input samples, returning them in a new prediction column. By default, all columns in the input table are returned alongside the prediction; unchecking that option lets you select a single ID column to be returned instead.

For more details, check out the ML.PREDICT function documentation.

Inputs

  • Model: This component receives a trained model table as input.

  • Input table: This component receives a data table to be used as data input.

Settings

  • Keep input columns: Determines whether to keep all input columns in the output.

  • ID column: Select a column from the input table to be returned as a unique identifier alongside the prediction. Only applies if Keep input columns is set to false.

Outputs

  • Output table: This component produces a table with the predictions column.
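A minimal sketch of the underlying call, with placeholder names:

```sql
SELECT *
FROM ML.PREDICT(
  MODEL `my_project.my_dataset.churn_model`,
  TABLE `my_project.my_dataset.new_customers`
);
```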