BigQuery ML
Extension Package provided by CARTO
The BigQuery ML extension package for CARTO Workflows includes a variety of components that enable users to integrate machine learning workflows with geospatial data. These components allow for creating, evaluating, explaining, forecasting, and managing ML models directly within CARTO Workflows, utilizing BigQuery ML’s capabilities.
Create Classification Model
Description
This component trains a classification model using a table of input data.
For more details, refer to the official BigQuery ML CREATE MODEL statement documentation.
Inputs
Input table: A data table that is used as input for the model creation.
Settings
Model's FQN: Fully qualified name for the model created by this component.
Unique identifier column: A column from the input table to be used as unique identifier for the model.
Input label column: A column from the input table to be used as source of labels for the model.
Model type: Select the type of model to be created. Options are:
LOGISTIC_REG
BOOSTED_TREE_CLASSIFIER
RANDOM_FOREST_CLASSIFIER
Fit intercept: Determines whether to fit an intercept term in the model. Only applies if Model type is "LOGISTIC_REG".
Max tree depth: Determines the maximum depth of the individual trees. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".
Number of parallel trees: Determines the number of parallel trees constructed on each iteration. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".
Minimum tree child weight: Determines the minimum sum of instance weight needed in a child. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".
Subsample: Subsample ratio of the training instances. A fraction between 0 and 1 that controls how much of the training data is sampled. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".
Column sample by tree: Subsample ratio of columns when constructing each tree. A fraction between 0 and 1 that controls the number of columns used by each tree. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".
Column sample by node: Subsample ratio of columns for each node (split). A fraction between 0 and 1 that controls the number of columns considered at each split. Only applies if Model type is "BOOSTED_TREE_CLASSIFIER".
Data split method: The method used to split the input data into training, evaluation and test data. Options are:
AUTO_SPLIT automatically splits the data
RANDOM splits randomly based on specified fractions
CUSTOM uses a specified column
SEQ splits sequentially
NO_SPLIT uses all data for training.
Data split evaluation fraction: Fraction of data to use for evaluation. Only applies if Data split method is RANDOM or SEQ.
Data split test fraction: Fraction of data to use for testing. Only applies if Data split method is RANDOM or SEQ.
Data split column: Column to use for splitting the data. Only applies if Data split method is CUSTOM.
Outputs
Output table: This component generates a single-row table with the FQN of the created model.
Create Forecast Model
Description
This component trains a forecast model using a table of input data.
For more details, refer to the official BigQuery ML CREATE MODEL statement documentation.
Inputs
Input table: A data table that is used as input for the model creation.
Holidays table: A table containing custom holidays to use during model training.
Settings
Model's FQN: Fully qualified name for the model created by this component.
Model type: Select a type of model to be created. Options are:
ARIMA_PLUS
ARIMA_PLUS_XREG
Time-series ID column: Column from Input table that uniquely identifies each individual time series in the input data. Only applies if Model type is ARIMA_PLUS and Auto ARIMA is set to true.
Time-series timestamp column: Column from Input table containing timestamps for each data point in the time series.
Time-series data column: Column from Input table containing the target values to forecast for each data point in the time series.
Auto ARIMA: Automatically determine ARIMA hyperparameters.
P value: Number of autoregressive terms. Only applies if Auto ARIMA is set to false.
D value: Number of non-seasonal differences. Only applies if Auto ARIMA is set to false.
Q value: Number of lagged forecast errors. Only applies if Auto ARIMA is set to false.
Data frequency: Frequency of data points in the time series. Used by BigQuery ML to properly interpret the time intervals between data points for forecasting. AUTO_FREQUENCY will attempt to detect the frequency from the data. Options are:
AUTO_FREQUENCY (default)
PER_MINUTE
HOURLY
DAILY
WEEKLY
MONTHLY
QUARTERLY
YEARLY
Holiday region: Region for which the holidays will be applied. Check the reference for available values.
Clean spikes and dips: Determines whether to remove spikes and dips from the time series data.
Outputs
Output table: This component generates a single-row table with the FQN of the created model.
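The time-series settings above correspond closely to options of the BigQuery ML CREATE MODEL statement for ARIMA_PLUS models. The sketch below, with a hypothetical function name and placeholder identifiers, shows one way they could be rendered. In particular, it assumes that the P, D and Q values are passed together as NON_SEASONAL_ORDER when Auto ARIMA is disabled. The custom holidays table is not covered.

```python
def forecast_model_sql(model_fqn, input_table, timestamp_col, data_col,
                       model_type="ARIMA_PLUS", id_col=None, auto_arima=True,
                       order=None, data_frequency="AUTO_FREQUENCY",
                       holiday_region=None, clean_spikes_and_dips=True):
    """Build a CREATE MODEL statement for a time-series model (hypothetical helper)."""
    if model_type not in ("ARIMA_PLUS", "ARIMA_PLUS_XREG"):
        raise ValueError(f"unsupported model type: {model_type}")
    opts = [
        f"MODEL_TYPE = '{model_type}'",
        f"TIME_SERIES_TIMESTAMP_COL = '{timestamp_col}'",
        f"TIME_SERIES_DATA_COL = '{data_col}'",
    ]
    if id_col:
        opts.append(f"TIME_SERIES_ID_COL = '{id_col}'")
    if auto_arima:
        opts.append("AUTO_ARIMA = TRUE")
    else:
        if order is None:
            raise ValueError("order=(p, d, q) is required when auto_arima is False")
        p, d, q = order
        opts.append("AUTO_ARIMA = FALSE")
        opts.append(f"NON_SEASONAL_ORDER = ({p}, {d}, {q})")
    opts.append(f"DATA_FREQUENCY = '{data_frequency}'")
    if holiday_region:
        opts.append(f"HOLIDAY_REGION = '{holiday_region}'")
    opts.append(f"CLEAN_SPIKES_AND_DIPS = {'TRUE' if clean_spikes_and_dips else 'FALSE'}")
    body = ",\n  ".join(opts)
    return (
        f"CREATE OR REPLACE MODEL `{model_fqn}`\n"
        f"OPTIONS(\n  {body}\n) AS\n"
        f"SELECT * FROM `{input_table}`"
    )


print(forecast_model_sql(
    "my-project.my_dataset.sales_forecast",  # placeholder names
    "my-project.my_dataset.daily_sales",
    timestamp_col="day", data_col="sales",
    auto_arima=False, order=(2, 1, 1),
    data_frequency="DAILY", holiday_region="US",
))
```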
Create Regression Model
Description
This component trains a regression model using a table of input data.
For more details, refer to the official BigQuery ML CREATE MODEL statement documentation.
Inputs
Input table: A data table that is used as input for the model creation.
Settings
Model's FQN: Fully qualified name for the model created by this component.
Unique identifier column: A column from the input table to be used as unique identifier for the model.
Input label column: A column from the input table to be used as source of labels for the model.
Model type: Select the type of model to be created. Options are:
LINEAR_REG
BOOSTED_TREE_REGRESSOR
RANDOM_FOREST_REGRESSOR
Fit intercept: Determines whether to fit an intercept term in the model. Only applies if Model type is "LINEAR_REG".
Max tree depth: Determines the maximum depth of the individual trees. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".
Number of parallel trees: Determines the number of parallel trees constructed on each iteration. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".
Minimum tree child weight: Determines the minimum sum of instance weight needed in a child. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".
Subsample: Subsample ratio of the training instances. A fraction between 0 and 1 that controls how much of the training data is sampled. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".
Column sample by tree: Subsample ratio of columns when constructing each tree. A fraction between 0 and 1 that controls the number of columns used by each tree. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".
Column sample by node: Subsample ratio of columns for each node (split). A fraction between 0 and 1 that controls the number of columns considered at each split. Only applies if Model type is "BOOSTED_TREE_REGRESSOR".
Data split method: The method used to split the input data into training, evaluation and test data. Options are:
AUTO_SPLIT automatically splits the data
RANDOM splits randomly based on specified fractions
CUSTOM uses a specified column
SEQ splits sequentially
NO_SPLIT uses all data for training.
Data split evaluation fraction: Fraction of data to use for evaluation. Only applies if Data split method is RANDOM or SEQ.
Data split test fraction: Fraction of data to use for testing. Only applies if Data split method is RANDOM or SEQ.
Data split column: Column to use for splitting the data. Only applies if Data split method is CUSTOM.
Outputs
Output table: This component generates a single-row table with the FQN of the created model.
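Several of the settings above are conditional on the model type or the data split method. The following sketch encodes those "Only applies if" rules as a small Python function, which can help when checking a configuration before running the component. The setting keys (for example "colsample_by_tree") are hypothetical names chosen for this illustration and are not identifiers used by CARTO Workflows.

```python
# Settings that the text above limits to BOOSTED_TREE_REGRESSOR.
TREE_SETTINGS = {
    "max_tree_depth", "num_parallel_trees", "min_tree_child_weight",
    "subsample", "colsample_by_tree", "colsample_by_node",
}


def applicable_settings(model_type, data_split_method):
    """Return the set of settings that apply for a regression configuration."""
    settings = {"model_fqn", "id_column", "label_column",
                "model_type", "data_split_method"}
    if model_type == "LINEAR_REG":
        settings.add("fit_intercept")
    elif model_type == "BOOSTED_TREE_REGRESSOR":
        settings |= TREE_SETTINGS
    if data_split_method in ("RANDOM", "SEQ"):
        settings |= {"data_split_eval_fraction", "data_split_test_fraction"}
    elif data_split_method == "CUSTOM":
        settings.add("data_split_column")
    return settings


# Example: a linear regression with a custom split column.
print(sorted(applicable_settings("LINEAR_REG", "CUSTOM")))
```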
Evaluate
Description
Given a pre-trained ML model and an input table, this component evaluates the model's performance on the input data. The result contains metrics describing the model's performance. Additional options are available when the model is a forecasting model.
For more details, refer to the official ML.EVALUATE function documentation.
Inputs
Model: This component receives a trained model table as input.
Input table: This component receives a data table to be used as data input.
Settings
Forecast: Determines whether to evaluate the model as a forecasting model.
Perform aggregation: Determines whether to evaluate on the time series level or the timestep level. Only applies if Forecast is set to true.
Horizon: Number of forecasted time points to evaluate against. Only applies if Forecast is set to true.
Confidence level: Percentage of the future values that fall in the prediction interval. Only applies if Forecast is set to true.
Outputs
Output table: This component produces a table with the evaluation metrics.
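A call to ML.EVALUATE that reflects these settings might look like the output of the sketch below. The function name is hypothetical, and the mapping of the Forecast settings to the perform_aggregation, horizon and confidence_level fields of the ML.EVALUATE options struct is an assumption about how the component behaves.

```python
def evaluate_sql(model_fqn, input_table, forecast=False,
                 perform_aggregation=True, horizon=None, confidence_level=None):
    """Build an ML.EVALUATE query (hypothetical helper)."""
    args = [f"MODEL `{model_fqn}`", f"TABLE `{input_table}`"]
    if forecast:
        # The options struct is only added for forecasting models.
        fields = [f"{'TRUE' if perform_aggregation else 'FALSE'} AS perform_aggregation"]
        if horizon is not None:
            fields.append(f"{horizon} AS horizon")
        if confidence_level is not None:
            fields.append(f"{confidence_level} AS confidence_level")
        args.append("STRUCT(" + ", ".join(fields) + ")")
    return "SELECT * FROM ML.EVALUATE(" + ", ".join(args) + ")"


# Placeholder model and table names.
print(evaluate_sql("my-project.my_dataset.sales_forecast",
                   "my-project.my_dataset.recent_sales",
                   forecast=True, perform_aggregation=False,
                   horizon=30, confidence_level=0.9))
```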
Evaluate Forecast
Description
Given a pre-trained ML forecast model, this component evaluates its performance using the ML.ARIMA_EVALUATE function in BigQuery.
Inputs
Model: This component receives a trained model table as input.
Settings
Show all candidate models: Determines whether to show evaluation metrics for all candidate models or only for the best model.
Outputs
Output table: This component produces a table with the evaluation metrics.
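For reference, a query of the following shape is one plausible equivalent in BigQuery. The sketch assumes that the Show all candidate models setting maps to the show_all_candidate_models field of ML.ARIMA_EVALUATE; the function name and the model name are illustrative.

```python
def arima_evaluate_sql(model_fqn, show_all_candidate_models=False):
    """Build an ML.ARIMA_EVALUATE query (hypothetical helper)."""
    flag = "TRUE" if show_all_candidate_models else "FALSE"
    return (
        f"SELECT * FROM ML.ARIMA_EVALUATE(MODEL `{model_fqn}`, "
        f"STRUCT({flag} AS show_all_candidate_models))"
    )


print(arima_evaluate_sql("my-project.my_dataset.sales_forecast",
                         show_all_candidate_models=True))
```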
Explain Forecast
Description
Given a pre-trained ML model and an input table, this component runs an explainability analysis by invoking the ML.EXPLAIN_FORECAST function in BigQuery.
Inputs
Model: This component receives a trained model table as input.
Input table: This component receives a data table to be used as data input. It can only be used with ARIMA models. The execution will fail if a different model is selected.
Settings
Model type: Select the type of model to be used with the input. Options are:
ARIMA_PLUS
ARIMA_PLUS_XREG
Horizon: The number of time units to forecast into the future.
Confidence level: The confidence level to use for the prediction intervals.
Outputs
Output table: This component produces a table with the explainability metrics.
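A minimal sketch of the corresponding ML.EXPLAIN_FORECAST call for an ARIMA_PLUS model is shown below. It only covers the case without an input table, validates the two settings with simple range checks that are assumptions of this example, and uses a hypothetical function name and default values.

```python
def explain_forecast_sql(model_fqn, horizon=3, confidence_level=0.95):
    """Build an ML.EXPLAIN_FORECAST query for an ARIMA_PLUS model (hypothetical helper)."""
    if horizon < 1:
        raise ValueError("horizon must be a positive number of time units")
    if not 0 <= confidence_level < 1:
        raise ValueError("confidence_level must be in [0, 1)")
    return (
        f"SELECT * FROM ML.EXPLAIN_FORECAST(MODEL `{model_fqn}`, "
        f"STRUCT({horizon} AS horizon, {confidence_level} AS confidence_level))"
    )


print(explain_forecast_sql("my-project.my_dataset.sales_forecast", horizon=14))
```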
Explain Predict
Description
Given a pre-trained ML model, this component generates a predicted value and a set of feature attributions for each instance of the input data. Feature attributions indicate how much each feature in your model contributed to the final prediction for each given instance.
For more details, refer to the official ML.EXPLAIN_PREDICT function documentation.
Inputs
Model: This component receives a trained model table as input.
Input table: This component receives a data table to be used as data input.
Settings
Number of top features: The number of the top features to be returned.
Outputs
Output table: This component produces a table with the attribution per feature for the input data.
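As a rough illustration, the Number of top features setting plausibly corresponds to the top_k_features field of ML.EXPLAIN_PREDICT. The sketch below builds such a query; the function name, the default value and the table names are assumptions made for this example.

```python
def explain_predict_sql(model_fqn, input_table, top_k_features=5):
    """Build an ML.EXPLAIN_PREDICT query (hypothetical helper)."""
    if not isinstance(top_k_features, int) or top_k_features < 1:
        raise ValueError("top_k_features must be a positive integer")
    return (
        f"SELECT * FROM ML.EXPLAIN_PREDICT(MODEL `{model_fqn}`, "
        f"TABLE `{input_table}`, STRUCT({top_k_features} AS top_k_features))"
    )


print(explain_predict_sql("my-project.my_dataset.churn_model",
                          "my-project.my_dataset.new_customers",
                          top_k_features=3))
```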
Forecast
Description
Given a pre-trained ML model and an optional input table, this component infers the predictions for each of the input samples. Note that the actual forecasting happens when the model is created; this component only retrieves the desired results.
For more details, refer to the official ML.FORECAST function documentation.
Inputs
Model: This component receives a trained model table as input.
Input table: This component receives a data table to be used as data input. It can only be used with ARIMA models. The execution will fail if a different model is selected.
Settings
Model type: Select the type of model to be used with the input. Options are:
ARIMA_PLUS
ARIMA_PLUS_XREG
Horizon: The number of time units to forecast into the future.
Confidence level: The confidence level to use for the prediction intervals.
Outputs
Output table: This component produces a table with the predictions column.
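The sketch below shows how a forecast query with these settings could be built using the BigQuery ML.FORECAST function. It assumes that an input table is passed as a third argument only for ARIMA_PLUS_XREG models, which need future values of the extra regressors, and that ARIMA_PLUS models take none. The function name and all identifiers are hypothetical.

```python
def forecast_sql(model_fqn, model_type="ARIMA_PLUS", horizon=3,
                 confidence_level=0.95, input_table=None):
    """Build an ML.FORECAST query (hypothetical helper)."""
    if model_type not in ("ARIMA_PLUS", "ARIMA_PLUS_XREG"):
        raise ValueError(f"unsupported model type: {model_type}")
    args = [
        f"MODEL `{model_fqn}`",
        f"STRUCT({horizon} AS horizon, {confidence_level} AS confidence_level)",
    ]
    if model_type == "ARIMA_PLUS_XREG":
        # Assumption: XREG models need a table with the regressor values.
        if input_table is None:
            raise ValueError("ARIMA_PLUS_XREG requires an input table")
        args.append(f"TABLE `{input_table}`")
    return "SELECT * FROM ML.FORECAST(" + ", ".join(args) + ")"


print(forecast_sql("my-project.my_dataset.sales_forecast",
                   horizon=30, confidence_level=0.8))
```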
Get Model by Name
Description
This component loads a pre-existing BigQuery model into the format expected by Workflows, so that it can be used with the rest of the BigQuery ML components.
Inputs
Model FQN: Fully qualified name of the model to load.
Outputs
Output: This component returns a model that can be connected to other BigQuery ML components that expect a model as input.
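A fully qualified model name in BigQuery is usually written as project.dataset.model. The following sketch splits such a name into its parts and rejects malformed input. It assumes the three-part form, optionally wrapped in backticks; the class and function names are hypothetical and are not part of CARTO Workflows.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    project: str
    dataset: str
    model: str


def parse_model_fqn(fqn):
    """Split 'project.dataset.model' into its three parts."""
    parts = fqn.strip().strip("`").split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected project.dataset.model, got: {fqn!r}")
    return ModelRef(*parts)


ref = parse_model_fqn("`my-project.my_dataset.my_model`")
print(ref.project, ref.dataset, ref.model)
```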
Global Explain
Description
Given a pre-trained ML model, this component provides explanations for the entire model by aggregating the local explanations of the evaluation data. It returns the attribution of each feature. For classification models, an option can be set to provide an explanation for each class of the model.
For more details, refer to the official ML.GLOBAL_EXPLAIN function documentation.
Inputs
Model: This component receives a trained model table as input.
Settings
Class level explain: Determines whether global feature importances are returned for each class in the case of classification.
Outputs
Output table: This component produces a table with the attribution of each feature.
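The equivalent BigQuery call is likely similar to the query produced by this sketch, which assumes that the Class level explain setting maps to the class_level_explain field of ML.GLOBAL_EXPLAIN. The function and model names are illustrative.

```python
def global_explain_sql(model_fqn, class_level_explain=False):
    """Build an ML.GLOBAL_EXPLAIN query (hypothetical helper)."""
    flag = "TRUE" if class_level_explain else "FALSE"
    return (
        f"SELECT * FROM ML.GLOBAL_EXPLAIN(MODEL `{model_fqn}`, "
        f"STRUCT({flag} AS class_level_explain))"
    )


print(global_explain_sql("my-project.my_dataset.churn_model",
                         class_level_explain=True))
```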
Import model
Description
This component imports an ONNX model from Google Cloud Storage. The model will be loaded into BigQuery ML using the provided FQN and it will be ready to use in Workflows with the rest of ML components.
Settings
Model path: Google Cloud Storage URI (gs://) of the pre-trained ONNX file.
Model FQN: A fully qualified name to save the model to.
Overwrite model: Determines whether to overwrite the model if it already exists.
Outputs
Output table: This component returns a model that can be connected to other BigQuery ML components that expect a model as input.
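In BigQuery ML, an ONNX model is imported with a CREATE MODEL statement that sets MODEL_TYPE to 'ONNX' and MODEL_PATH to the Cloud Storage URI. The sketch below renders such a statement. Treating Overwrite model as the choice between CREATE OR REPLACE MODEL and a plain CREATE MODEL (which fails if the model already exists) is an assumption, as are the function name and the example bucket path.

```python
def import_onnx_model_sql(model_fqn, model_path, overwrite=False):
    """Build a CREATE MODEL statement that imports an ONNX file (hypothetical helper)."""
    if not model_path.startswith("gs://"):
        raise ValueError("model_path must be a Google Cloud Storage URI (gs://...)")
    # Assumption: without overwrite, creation fails if the model already exists.
    create = "CREATE OR REPLACE MODEL" if overwrite else "CREATE MODEL"
    return (
        f"{create} `{model_fqn}`\n"
        f"OPTIONS(MODEL_TYPE = 'ONNX', MODEL_PATH = '{model_path}')"
    )


print(import_onnx_model_sql("my-project.my_dataset.imported_model",
                            "gs://my-bucket/models/model.onnx",  # placeholder URI
                            overwrite=True))
```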
Predict
Description
Given a pre-trained ML model and an input table, this component infers the predictions for each of the input samples. A new prediction column is added to the output. All columns in the input table are returned by default; an option can be unchecked to return only a single ID column alongside the prediction.
For more details, check out the ML.PREDICT function documentation.
Inputs
Model: This component receives a trained model table as input.
Input table: This component receives a data table to be used as data input.
Settings
Keep input columns: Determines whether to keep all input columns in the output or not.
ID column: Select a column from the input table to be used as the unique identifier for the model. Only applies if Keep input columns is set to false.
Outputs
Output table: This component produces a table with the predictions column.
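The effect of the Keep input columns and ID column settings can be sketched with ML.PREDICT as follows. For supervised models, ML.PREDICT names its output column predicted_<label column>, so this example takes a hypothetical label_col parameter to select it. How the component itself names the prediction column, and the function name used here, are assumptions.

```python
def predict_sql(model_fqn, input_table, label_col,
                keep_input_columns=True, id_col=None):
    """Build an ML.PREDICT query (hypothetical helper)."""
    source = f"ML.PREDICT(MODEL `{model_fqn}`, TABLE `{input_table}`)"
    if keep_input_columns:
        # ML.PREDICT returns the input columns together with the prediction.
        return f"SELECT * FROM {source}"
    if id_col is None:
        raise ValueError("id_col is required when keep_input_columns is False")
    return f"SELECT {id_col}, predicted_{label_col} FROM {source}"


print(predict_sql("my-project.my_dataset.churn_model",
                  "my-project.my_dataset.new_customers",
                  label_col="churned",
                  keep_input_columns=False, id_col="customer_id"))
```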