# Snowflake ML

The Snowflake ML extension package for CARTO Workflows includes a variety of components that enable users to integrate machine learning workflows with geospatial data. These components allow for creating, evaluating, explaining, forecasting, and managing ML models directly within CARTO Workflows, utilizing Snowflake ML’s capabilities.

## Get Model by Name

**Description**

This component imports a pre-trained model into the current workflow. If the name provided is not fully qualified, it will default to the connection's default database and the `PUBLIC` schema. The component assumes that the provided FQN points to an existing Snowflake ML model.

**Settings**

* **Model's FQN:** Fully qualified name for the model to be imported.

**Outputs**

* **Output table:** This component generates a single-row table with the FQN of the imported model.
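The naming rule described above can be sketched in Python as follows. This is a minimal illustration, not the component's implementation. `resolve_model_fqn` is a hypothetical helper. It assumes unquoted identifiers (no dots inside names), and it assumes that a two-part name (`SCHEMA.MODEL`) keeps its schema and only receives the default database.

```python
def resolve_model_fqn(name: str, default_database: str,
                      default_schema: str = "PUBLIC") -> str:
    """Expand a model name into DATABASE.SCHEMA.MODEL (hypothetical helper).

    A bare name gets the connection's default database and the PUBLIC schema.
    Assumption: a two-part name is SCHEMA.MODEL and only gets the default database.
    """
    parts = name.split(".")
    if any(not part for part in parts):
        raise ValueError(f"Empty identifier in model name: {name!r}")
    if len(parts) == 1:
        return f"{default_database}.{default_schema}.{parts[0]}"
    if len(parts) == 2:
        return f"{default_database}.{parts[0]}.{parts[1]}"
    if len(parts) == 3:
        return name  # already fully qualified
    raise ValueError(f"Too many name parts in {name!r}")


# A bare name resolves to the default database and the PUBLIC schema.
print(resolve_model_fqn("churn_model", "ANALYTICS"))  # ANALYTICS.PUBLIC.churn_model
```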

## Create Classification Model

**Description**

This component trains a classification model on the provided input data. If the name provided is not fully qualified, it will default to the connection's default database and the `PUBLIC` schema.

For more details, please refer to the [`SNOWFLAKE.ML.CLASSIFICATION`](https://docs.snowflake.com/en/user-guide/ml-functions/classification) official documentation in Snowflake.

**Inputs**

* **Input table:** A data table that is used as input for the model creation.

**Settings**

* **Model's FQN:** Fully qualified name under which the trained model is saved.
* **ID Column:** Column containing a unique identifier per sample.
* **Target Column**: Column to be used as label in the training data.
* **Data Split**: Whether to perform a train/evaluation split on the data. Choosing `SPLIT` is required to use the [Evaluate Classification](#evaluate-classification) component with the resulting model.
* **Test Fraction**: Fraction of the data to reserve for evaluation. For example, 0.2 reserves 20% of the data for evaluation. Only applies if Data Split is `SPLIT`.

**Outputs**

* **Output table:** This component generates a single-row table with the FQN of the trained model.
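As a rough illustration of how these settings could map onto Snowflake, the sketch below builds a `CREATE SNOWFLAKE.ML.CLASSIFICATION` statement and performs a reproducible train/evaluation split on the ID column. It is a sketch under stated assumptions, not what the component actually runs: excluding the ID column from the features, mapping `SPLIT` to the `evaluate` key of `CONFIG_OBJECT`, and the helpers `build_create_classification_sql` and `split_ids` are all assumptions made for illustration. No identifier quoting or escaping is performed.

```python
import random


def build_create_classification_sql(model_fqn: str, input_table: str,
                                    id_column: str, target_column: str,
                                    evaluate: bool) -> str:
    """Sketch of a CREATE SNOWFLAKE.ML.CLASSIFICATION statement.

    Assumption: the ID column is excluded so it is not used as a feature.
    Identifiers are interpolated as-is (no escaping).
    """
    query = f"SELECT * EXCLUDE ({id_column}) FROM {input_table}"
    evaluate_sql = "TRUE" if evaluate else "FALSE"
    return (
        f"CREATE OR REPLACE SNOWFLAKE.ML.CLASSIFICATION {model_fqn}(\n"
        f"  INPUT_DATA => SYSTEM$QUERY_REFERENCE('{query}'),\n"
        f"  TARGET_COLNAME => '{target_column}',\n"
        f"  CONFIG_OBJECT => {{'evaluate': {evaluate_sql}}}\n"
        ")"
    )


def split_ids(ids: list, test_fraction: float, seed: int = 0) -> tuple:
    """Hypothetical reproducible train/evaluation split of sample IDs."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = sorted(ids)                 # fixed order before shuffling
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train_ids, eval_ids)


print(build_create_classification_sql(
    "ANALYTICS.ML.churn_model", "ANALYTICS.PUBLIC.customers",
    "customer_id", "churned", evaluate=True))
train_ids, eval_ids = split_ids(list(range(100)), test_fraction=0.2)
print(len(train_ids), len(eval_ids))  # 80 20
```

With a Test Fraction of 0.2, 20 of 100 samples are held out for evaluation; seeding the shuffle keeps the split stable across runs.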

## Create Forecasting Model

**Description**

This component trains a forecasting model on the provided input data. If the name provided is not fully qualified, it will default to the connection's default database and the `PUBLIC` schema.

For more details, please refer to the [`SNOWFLAKE.ML.FORECAST`](https://docs.snowflake.com/sql-reference/classes/forecast/commands/create-forecast) official documentation in Snowflake.

**Inputs**

* **Input table:** A data table that is used as input for the model creation.

**Settings**

* **Model's FQN:** Fully qualified name under which the trained model is saved.
* **Time Series ID Column:** Column containing a unique identifier per time series.
* **Timestamp Column:** Column containing the series' timestamp in `DATE` or `DATETIME` format.
* **Target Column**: Column to be used as target in the training data.
* **Consider exogenous variables:** Whether to use exogenous variables for forecasting. If checked, the future values of these variables must be provided when forecasting. All input columns except the time series ID, timestamp, and target columns are treated as exogenous variables.
* **Method**: Method to use when fitting the model. It can be `best` or `fast`.
* **Sample Frequency**: Frequency of the time series. It can be `auto` or `manual`.
* **Period**: Number of time units that make up one sampling interval. For example, a Period of 15 with a Time Unit of `minutes` defines a 15-minute frequency. Only applies when Sample Frequency is set to `manual`.
* **Time Unit**: Time unit used to define the frequency. It can be `seconds`, `minutes`, `hours`, `days`, `weeks`, `months`, `quarters`, or `years`. Only applies when Sample Frequency is set to `manual`.
* **Aggregation (categorical)**: Aggregation function applied to categorical columns when rows must be aggregated to match the sampling frequency. It can be `mode`, `first`, or `last`.
* **Aggregation (numeric)**: Aggregation function applied to numeric columns when rows must be aggregated to match the sampling frequency. It can be `mean`, `median`, `mode`, `min`, `max`, `sum`, `first`, or `last`.
* **Aggregation (target)**: Aggregation function applied to the target column when rows must be aggregated to match the sampling frequency. It can be `mean`, `median`, `mode`, `min`, `max`, `sum`, `first`, or `last`.

**Outputs**

* **Output table:** This component generates a single-row table with the FQN of the trained model.
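The settings above line up with the `CONFIG_OBJECT` keys of `CREATE SNOWFLAKE.ML.FORECAST` as we understand them (`method`, `frequency`, `aggregation_categorical`, `aggregation_numeric`, `aggregation_target`). The sketch below is a minimal, hypothetical mapping, not the component's code. The helper names, the singular unit for a Period of 1 (`'1 day'`), the upper-cased aggregation names, and passing only the series, timestamp, and target columns when exogenous variables are off are all assumptions; check them against the Snowflake reference before relying on them. Identifiers are not escaped.

```python
from typing import Optional

VALID_METHODS = {"best", "fast"}
VALID_UNITS = {"seconds", "minutes", "hours", "days",
               "weeks", "months", "quarters", "years"}
CATEGORICAL_AGGS = {"mode", "first", "last"}
NUMERIC_AGGS = {"mean", "median", "mode", "min", "max", "sum", "first", "last"}


def to_snowflake_object(config: dict) -> str:
    """Render a dict of string values as a Snowflake object constant."""
    items = ", ".join(f"'{key}': '{value}'" for key, value in config.items())
    return "{" + items + "}"


def build_forecast_config(method: str, agg_categorical: str, agg_numeric: str,
                          agg_target: str, period: Optional[int] = None,
                          time_unit: Optional[str] = None) -> dict:
    """Map the component settings to a CONFIG_OBJECT dict (sketch).

    period/time_unit are only given for a manual Sample Frequency; with
    `auto`, 'frequency' is omitted so the frequency is inferred.
    """
    if method not in VALID_METHODS:
        raise ValueError(f"Unknown method: {method}")
    if agg_categorical not in CATEGORICAL_AGGS:
        raise ValueError(f"Invalid categorical aggregation: {agg_categorical}")
    for agg in (agg_numeric, agg_target):
        if agg not in NUMERIC_AGGS:
            raise ValueError(f"Invalid numeric aggregation: {agg}")
    config = {"method": method}
    if period is not None or time_unit is not None:
        if period is None or period <= 0 or time_unit not in VALID_UNITS:
            raise ValueError("Manual frequency needs a positive period and a valid unit")
        unit = time_unit[:-1] if period == 1 else time_unit  # '1 day', '3 days'
        config["frequency"] = f"{period} {unit}"
    config["aggregation_categorical"] = agg_categorical.upper()
    config["aggregation_numeric"] = agg_numeric.upper()
    config["aggregation_target"] = agg_target.upper()
    return config


def build_create_forecast_sql(model_fqn: str, input_table: str, series_col: str,
                              timestamp_col: str, target_col: str,
                              use_exogenous: bool, config: dict) -> str:
    """Sketch of a CREATE SNOWFLAKE.ML.FORECAST statement."""
    # Without exogenous variables, only the series, timestamp and target columns are passed.
    columns = "*" if use_exogenous else f"{series_col}, {timestamp_col}, {target_col}"
    query = f"SELECT {columns} FROM {input_table}"
    return (
        f"CREATE OR REPLACE SNOWFLAKE.ML.FORECAST {model_fqn}(\n"
        f"  INPUT_DATA => SYSTEM$QUERY_REFERENCE('{query}'),\n"
        f"  SERIES_COLNAME => '{series_col}',\n"
        f"  TIMESTAMP_COLNAME => '{timestamp_col}',\n"
        f"  TARGET_COLNAME => '{target_col}',\n"
        f"  CONFIG_OBJECT => {to_snowflake_object(config)}\n"
        ")"
    )


cfg = build_forecast_config("fast", "mode", "mean", "sum", period=1, time_unit="days")
print(build_create_forecast_sql("ANALYTICS.ML.sales_model", "ANALYTICS.PUBLIC.sales",
                                "store_id", "day", "units", False, cfg))
```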

## Predict

**Description**

This component uses a pre-trained classification model (obtained with the [Get Model by Name](#get-model-by-name) or [Create Classification Model](#create-classification-model) components) to generate predictions for the input data.

For more details, please refer to the official Snowflake documentation for the [`!PREDICT`](https://docs.snowflake.com/en/sql-reference/classes/classification/methods/predict) function.

**Inputs**

* **Model table:** The pre-trained classification model.
* **Input table:** A data table that is used as input for inference.

**Settings**

* **Keep input columns:** Whether to keep all the input columns in the output table.
* **ID Column:** Column containing a unique identifier per sample. Only applies when Keep input columns is set to false.

**Outputs**

* **Output table:** The model's predictions.
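The sketch below shows the kind of inference query the settings above describe, using the `!PREDICT` method with `{*}`, which packs every column of the row into an object passed to the model. It is an illustration, not the component's SQL: the helper name and the `prediction` output column are hypothetical, and it assumes the input table holds the same feature columns used at training time. According to the Snowflake documentation, each prediction is returned as an OBJECT containing the predicted class and per-class probabilities.

```python
from typing import Optional


def build_predict_sql(model_fqn: str, input_table: str,
                      keep_input_columns: bool,
                      id_column: Optional[str] = None) -> str:
    """Sketch of a classification inference query using !PREDICT."""
    if keep_input_columns:
        selected = "*"        # keep every input column next to the prediction
    elif id_column:
        selected = id_column  # keep only the ID to join results back later
    else:
        raise ValueError("id_column is required when input columns are not kept")
    # {{*}} renders as {*}: an object built from all columns of the row.
    return (f"SELECT {selected}, {model_fqn}!PREDICT(INPUT_DATA => {{*}}) "
            f"AS prediction FROM {input_table}")


print(build_predict_sql("ANALYTICS.ML.churn_model", "ANALYTICS.PUBLIC.new_customers",
                        keep_input_columns=False, id_column="customer_id"))
```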

## Forecast

**Description**

This component uses a pre-trained forecast model (obtained with the [Get Model by Name](#get-model-by-name) or [Create Forecasting Model](#create-forecasting-model) components) to forecast future values of each time series.

For more details, please refer to the official Snowflake documentation for the [`!FORECAST`](https://docs.snowflake.com/en/sql-reference/classes/forecast/methods/forecast) function.

**Inputs**

* **Model table:** The pre-trained forecast model.
* **Input table:** A data table that is used as input for inference. Only needed if the model has been trained using exogenous variables.

**Settings**

* **Consider exogenous variables**: Whether the model was trained with exogenous variables.
* **Number of periods**: Number of periods to forecast per time series. Only applies if Consider exogenous variables is false.
* **Time Series ID Column:** Column containing a unique identifier per time series. Only applies if Consider exogenous variables is true.
* **Timestamp Column:** Column containing the series' timestamp in `DATE` or `DATETIME` format. Only applies if Consider exogenous variables is true.
* **Prediction Interval**: Confidence level of the prediction interval, as a value between 0 and 1 (for example, 0.95).
* **Keep input columns:** Whether to keep all the input columns in the output table.

**Outputs**

* **Output table:** The model's predictions.
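The two forecasting modes above can be sketched as the `CALL ... !FORECAST(...)` statements below. This is a hypothetical helper, not the component's code: without exogenous variables the number of periods is passed as `FORECASTING_PERIODS`; with them, the future values come from the input table. Argument names follow the Snowflake `!FORECAST` reference as we understand it, and identifiers are not escaped.

```python
from typing import Optional


def build_forecast_call(model_fqn: str, prediction_interval: float,
                        periods: Optional[int] = None,
                        input_table: Optional[str] = None,
                        series_col: Optional[str] = None,
                        timestamp_col: Optional[str] = None) -> str:
    """Sketch of a CALL to a forecast model's !FORECAST method."""
    if not 0.0 < prediction_interval < 1.0:
        raise ValueError("prediction_interval must be between 0 and 1")
    config = f"{{'prediction_interval': {prediction_interval}}}"
    if input_table is None:
        # No exogenous variables: forecast a fixed number of periods per series.
        if periods is None or periods <= 0:
            raise ValueError("periods must be a positive integer")
        args = f"FORECASTING_PERIODS => {periods}"
    else:
        # Exogenous variables: future values are read from the input table.
        if not (series_col and timestamp_col):
            raise ValueError("series_col and timestamp_col are required with input data")
        args = (f"INPUT_DATA => SYSTEM$REFERENCE('TABLE', '{input_table}'), "
                f"SERIES_COLNAME => '{series_col}', "
                f"TIMESTAMP_COLNAME => '{timestamp_col}'")
    return f"CALL {model_fqn}!FORECAST({args}, CONFIG_OBJECT => {config})"


print(build_forecast_call("ANALYTICS.ML.sales_model", 0.95, periods=7))
# CALL ANALYTICS.ML.sales_model!FORECAST(FORECASTING_PERIODS => 7, CONFIG_OBJECT => {'prediction_interval': 0.95})
```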

## Evaluate Classification

**Description**

This component returns evaluation metrics for a pre-trained classification model obtained with the [Get Model by Name](#get-model-by-name) or [Create Classification Model](#create-classification-model) components. Metrics are only available if the model was created with the Data Split setting set to `SPLIT`.

For more details, please refer to the official Snowflake documentation for the [`!SHOW_EVALUATION_METRICS`](https://docs.snowflake.com/sql-reference/classes/classification/methods/show_evaluation_metrics) and [`!SHOW_GLOBAL_EVALUATION_METRICS`](https://docs.snowflake.com/sql-reference/classes/classification/methods/show_global_evaluation_metrics) functions.

**Inputs**

* **Model table:** The pre-trained classification model.

**Settings**

* **Class level metrics:** If checked, the component returns per-class evaluation metrics; otherwise, it returns the overall evaluation metrics.

**Outputs**

* **Output table:** The model's evaluation metrics.
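The Class level metrics setting effectively chooses between two Snowflake methods. A minimal sketch of that choice follows; the helper name is hypothetical and this is not the component's code.

```python
def build_classification_metrics_call(model_fqn: str, class_level: bool) -> str:
    """Pick the Snowflake method that matches the Class level metrics setting."""
    method = ("SHOW_EVALUATION_METRICS" if class_level      # per-class metrics
              else "SHOW_GLOBAL_EVALUATION_METRICS")        # overall metrics
    return f"CALL {model_fqn}!{method}()"


print(build_classification_metrics_call("ANALYTICS.ML.churn_model", class_level=False))
# CALL ANALYTICS.ML.churn_model!SHOW_GLOBAL_EVALUATION_METRICS()
```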

## Evaluate Forecast

**Description**

This component returns evaluation metrics for a pre-trained forecast model obtained with the [Get Model by Name](#get-model-by-name) or [Create Forecasting Model](#create-forecasting-model) components.

For more details, please refer to the official Snowflake documentation for the [`!SHOW_EVALUATION_METRICS`](https://docs.snowflake.com/sql-reference/classes/forecast/methods/show_evaluation_metrics) function.

**Inputs**

* **Model table:** The pre-trained forecast model.
* **Input table (optional):** Additional out-of-sample data to compute the metrics on.

**Settings**

* **Compute metrics on additional out-of-sample data:** When checked, the component computes the evaluation metrics on the out-of-sample data provided in the input table. Otherwise, it returns the cross-validation metrics generated at training time.
* **Time Series ID Column:** Column containing a unique identifier per time series. Only applies when the metrics are being computed on additional out-of-sample data.
* **Timestamp Column:** Column containing the series' timestamp in `DATE` or `DATETIME` format. Only applies when the metrics are being computed on additional out-of-sample data.
* **Target Column**: Column containing the observed target values in the input data. Only applies when the metrics are being computed on additional out-of-sample data.
* **Prediction Interval**: Expected confidence of the prediction interval. Only applies when the metrics are being computed on additional out-of-sample data.

**Outputs**

* **Output table:** The model's evaluation metrics.
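The two behaviours of this component can be sketched as two forms of the `!SHOW_EVALUATION_METRICS` call. This is a hypothetical helper, not the component's implementation; the argument names for the out-of-sample form follow the Snowflake reference as we understand it, and the 0.95 default is an assumption of this sketch. Identifiers are not escaped.

```python
from typing import Optional


def build_forecast_metrics_call(model_fqn: str,
                                input_table: Optional[str] = None,
                                series_col: Optional[str] = None,
                                timestamp_col: Optional[str] = None,
                                target_col: Optional[str] = None,
                                prediction_interval: float = 0.95) -> str:
    """Sketch of !SHOW_EVALUATION_METRICS for a forecast model."""
    if input_table is None:
        # Return the metrics computed at training time.
        return f"CALL {model_fqn}!SHOW_EVALUATION_METRICS()"
    if not (series_col and timestamp_col and target_col):
        raise ValueError("series_col, timestamp_col and target_col are required")
    if not 0.0 < prediction_interval < 1.0:
        raise ValueError("prediction_interval must be between 0 and 1")
    # Score the model against the additional out-of-sample data.
    return (
        f"CALL {model_fqn}!SHOW_EVALUATION_METRICS(\n"
        f"  INPUT_DATA => SYSTEM$REFERENCE('TABLE', '{input_table}'),\n"
        f"  SERIES_COLNAME => '{series_col}',\n"
        f"  TIMESTAMP_COLNAME => '{timestamp_col}',\n"
        f"  TARGET_COLNAME => '{target_col}',\n"
        f"  CONFIG_OBJECT => {{'prediction_interval': {prediction_interval}}}\n"
        ")"
    )


print(build_forecast_metrics_call("ANALYTICS.ML.sales_model"))
print(build_forecast_metrics_call("ANALYTICS.ML.sales_model", "ANALYTICS.PUBLIC.holdout",
                                  "store_id", "day", "units", 0.9))
```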

## Feature Importance (Classification)

**Description**

This component returns the feature importance of each variable in a pre-trained classification model.

For more details, please refer to Snowflake's [`!SHOW_FEATURE_IMPORTANCE`](https://docs.snowflake.com/en/sql-reference/classes/classification/methods/show_feature_importance) function documentation.

**Inputs**

* **Model table:** The pre-trained classification model.

**Outputs**

* **Output table:** A table with the feature importance of each variable.
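Every model-consuming component receives the model as a single-row table holding its FQN. The sketch below reads that FQN and builds the `!SHOW_FEATURE_IMPORTANCE` call. It assumes, hypothetically, that the rows arrive as a list of dicts and that the table has exactly one column; the column name `MODEL_FQN` in the example is invented for illustration. Snowflake exposes a method with the same name on forecast models, so the same call shape applies to the forecast variant of this component.

```python
def model_fqn_from_table(rows: list) -> str:
    """Read the FQN from a single-row model table (hypothetical shape).

    Assumes a list of dicts with one row and one column; the column name
    itself is not relied upon.
    """
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("Expected a single-row table with a single FQN value")
    (fqn,) = rows[0].values()
    return fqn


def build_feature_importance_call(model_fqn: str) -> str:
    """Build the CALL that returns the feature importance of each variable."""
    return f"CALL {model_fqn}!SHOW_FEATURE_IMPORTANCE()"


model_table = [{"MODEL_FQN": "ANALYTICS.ML.churn_model"}]  # hypothetical column name
print(build_feature_importance_call(model_fqn_from_table(model_table)))
# CALL ANALYTICS.ML.churn_model!SHOW_FEATURE_IMPORTANCE()
```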

## Feature Importance (Forecast)

**Description**

This component returns the feature importance of each variable in a pre-trained forecast model.

For more details, please refer to Snowflake's [`!SHOW_FEATURE_IMPORTANCE`](https://docs.snowflake.com/en/sql-reference/classes/forecast/methods/show_feature_importance) function documentation.

**Inputs**

* **Model table:** The pre-trained forecast model.

**Outputs**

* **Output table:** A table with the feature importance of each variable.
