# Data Preparation

Components to prepare your data for downstream analysis. This can include altering a table's structure, re-ordering data, subsampling data, and more.

## Case When

**Description**

This component generates column values that depend on a set of specified conditions.

**Inputs**

* `Source table [Table]`
* `Conditional expressions`: The UI of this component helps you create conditional expressions involving multiple columns and SQL operators. Each expression produces a different result, as configured on the component.
* `Result column [Column]`: Select the column that will contain the specified resulting value.
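Conceptually, the component evaluates an ordered list of conditions per row and keeps the first matching result. A minimal Python sketch of that behavior (the conditions, results, and column names here are made up for illustration):

```python
# Illustrative sketch of CASE WHEN semantics: each row is checked against
# an ordered list of (condition, result) pairs; the first match wins,
# and a default covers rows that match no condition.
def case_when(row, conditions, default=None):
    for condition, result in conditions:
        if condition(row):
            return result
    return default

rows = [{"population": 150}, {"population": 5000}, {"population": 2000000}]
conditions = [
    (lambda r: r["population"] < 1000, "village"),
    (lambda r: r["population"] < 100000, "town"),
]
labels = [case_when(r, conditions, default="city") for r in rows]
# labels == ["village", "town", "city"]
```

Because the first matching condition wins, the order of the expressions matters when conditions overlap.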

## Cast

**Description**

This component casts the content of a column to a given type.

**Inputs**

* `Source table [Table]`
* `Column [Column]`
* `New type [Selection]`

**Outputs**

* `Result table [Table]`

## Columns to Array

**Description**

This component adds a new column with an array containing the values in a set of selected columns.

**Inputs**

* `Source table [Table]`
* `Columns`
* `Array column name`

**Outputs**

* `Result table [Table]`

## Create Column

**Description**

This component creates a new table with an additional column computed using an expression.

**Inputs**

* `Source table [Table]`
* `Name for new column [String]`
* `Expression [String]`

**Outputs**

* `Result table [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_add_column_statement)

## Drop Columns

**Description**

This component generates a new table with the same content as the input one, except one of its columns.

The component will fail if the column to remove is the only one in the input table.

**Inputs**

* `Source table [Table]`
* `Column [Column]`

**Outputs**

* `Result table [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_drop_column_statement)

## Edit Schema

**Description**

This component simplifies the process of modifying table schemas. It allows you to select specific columns, with the option to adjust their names and data types as required.

**Inputs**

* `Source table [Table]`
* `Columns`: The component's UI allows selecting a column, giving it a new name, and selecting a data type to cast it to.

**Outputs**

* `Result table [Table]`

{% hint style="info" %}
When casting a `STRING` to `TIMESTAMP`, the expected format is `1970-01-01T00:00:00.000Z`. A differently formatted string might produce an incorrect timestamp.
{% endhint %}
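The expected timestamp layout above is an ISO 8601 string with millisecond precision and a literal `Z` (UTC) suffix. A small Python sketch showing how a string in that format parses (this only illustrates the format; the actual cast is performed by your data warehouse):

```python
from datetime import datetime, timezone

# Validate a string against the expected format "1970-01-01T00:00:00.000Z":
# "%f" parses the fractional seconds, and the literal "Z" marks UTC.
def parse_expected_timestamp(value):
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.replace(tzinfo=timezone.utc)

ts = parse_expected_timestamp("1970-01-01T00:00:00.000Z")
# ts.year == 1970, ts.tzinfo is timezone.utc
```

Strings that omit the fractional seconds or the `Z` suffix would fail this parse, which mirrors why differently formatted input can produce incorrect timestamps.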

## Extract from JSON

**Description**

This component creates a new column with values extracted from the JSON strings in another column. It uses your **data warehouse's syntax** to specify the path to the key that needs to be extracted. See the documentation links below for more information.

{% hint style="info" %}
This component only extracts a single property per execution. If you're looking to extract multiple keys from your JSON into separate columns, refer to the [Parse JSON](#parse-json) component.
{% endhint %}

**Inputs**

* `Source table [Table]`
* `JSON column [Column]`
* `JSON path [Expression]`
* `New column [Column]`

**Outputs**

* `Result table [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#JSONPath_format)

[Snowflake reference](https://docs.snowflake.com/en/sql-reference/functions/json_extract_path_text)

[Redshift reference](https://docs.aws.amazon.com/redshift/latest/dg/JSON_EXTRACT_PATH_TEXT.html)

[PostgreSQL reference](https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH)
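Conceptually, the component evaluates a path such as `$.address.city` against each JSON string and returns the value found at that key. A minimal Python sketch of that behavior, restricted to simple dotted paths (the path syntax actually accepted depends on your data warehouse; see the links above):

```python
import json

# Minimal sketch: follow a dotted path (e.g. "$.address.city") into a
# parsed JSON object, returning None when any key is missing.
def extract_json_path(json_string, path):
    value = json.loads(json_string)
    for key in path.lstrip("$.").split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

row = '{"address": {"city": "Madrid", "zip": "28001"}}'
city = extract_json_path(row, "$.address.city")
# city == "Madrid"
```

Real warehouse functions also handle array indices, quoting, and type coercion; this sketch only shows the key-lookup idea.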

## Find and Replace

**Description**

This component finds a string in one column of a table and replaces it with the specified value from another table.

As an alternative, columns from the lookup table can be added to the original table in those rows where the searched string is found. This is regulated by the `Mode` parameter.

**Inputs**

* `Source table [Table]`
* `Find within column [Column]`
* `Lookup table [Table]`
* `Find value column [Column]`
* `Replacement column [Column]`
* `Find mode [Selection]`
* `Case insensitive [Boolean]`
* `Match whole word [Boolean]`
* `Mode [Selection]`
* `Columns to append [Column] [Multiple]`: only used if `Append field(s) to record` mode is selected

**Outputs**

* `Result table [Table]`
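In replace mode the behavior resembles a dictionary lookup built from the lookup table. An illustrative Python sketch, simplified to whole-value matches (the column names `find`, `replace`, and `city` are made up for this example):

```python
# Sketch of "replace" mode: build a find -> replacement mapping from the
# lookup table, then substitute values found in the source column.
lookup_table = [
    {"find": "NY", "replace": "New York"},
    {"find": "SF", "replace": "San Francisco"},
]
mapping = {r["find"]: r["replace"] for r in lookup_table}

source = [{"city": "NY"}, {"city": "Boston"}, {"city": "SF"}]
result = [{"city": mapping.get(r["city"], r["city"])} for r in source]
# Unmatched values ("Boston") pass through unchanged.
```

The actual component additionally supports substring matching, case insensitivity, and whole-word matching via its `Find mode`, `Case insensitive`, and `Match whole word` inputs, which this sketch omits.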

## Generate UUID

**Description**

This component creates a new table with an additional UUID column named `id`.

**Inputs**

* `Source table [Table]`

**Outputs**

* `Result table [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/utility-functions#generate_uuid)

[Snowflake reference](https://docs.snowflake.com/en/sql-reference/functions/uuid_string)

## Geography to Geometry

**Description**

This component converts a column from geography to geometry data type.

**Inputs**

* `Source table [Table]`
* `Geography column [Column]`

**Outputs**

* `Result table [Table]`

## Geometry to Geography

**Description**

This component converts a column from geometry to geography data type.

**Inputs**

* `Source table [Table]`
* `Geometry column [Column]`

**Outputs**

* `Result table [Table]`

## Hex Color Generator

**Description**

This component creates a hex color for each distinct value of an input string column. `NULL` values are assigned a grey color. The component generates a copy of the source table with a new string column named: \[name\_of\_input\_col] + '\_hex\_color'.

**Inputs**

* `Source table [Table]`
* `Column with category values`

**Outputs**

* `Result table [Table]`
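The key property is that equal values always get the same color and `NULL` maps to grey. A Python sketch of one way to achieve that, using a hash for stability (the hashing scheme and the exact grey value are assumptions for illustration, not the component's actual algorithm):

```python
import hashlib

GREY = "#808080"  # assumed grey color for NULL values

# Sketch: derive a stable hex color per distinct value by hashing it,
# so equal values always map to the same color.
def hex_color(value):
    if value is None:
        return GREY
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return "#" + digest[:6]

colors = [hex_color(v) for v in ["cat", "dog", "cat", None]]
# colors[0] == colors[2], colors[3] == GREY
```

Any deterministic value-to-color assignment would satisfy the component's contract; hashing is just one simple choice.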

## Is not Null

**Description**

This component filters an input table using the presence or absence of null values in a given column.

**Inputs**

* `Source table [Table]`
* `Column [Column]`

**Outputs**

* `Not null values table [Table]`
* `Null values table [Table]`

## Limit

**Description**

This component creates a new table with only the first N rows of the input table.

**Inputs**

* `Source table [Table]`
* `Number of rows [Number]`

**Outputs**

* `Result table [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#limit_and_offset_clause)

[Snowflake reference](https://docs.snowflake.com/en/sql-reference/constructs/limit.html)

## Multi-col formula

**Description**

This component computes new values based on a given expression and a set of fields to apply the expression to. Use `$a` to refer to the value of the current column.

**Inputs**

* `Source table [Table]`
* `Expression [String]`: The expression to apply.
* `Mode [Selection]`: How the new values are added to the table.
* `Prefix [String]`: Only used if mode = 'Create new columns'.
* `Columns [Column][Multiple]`: The columns to apply the formula to.

**Outputs**

* `Result table [Table]`
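Because `$a` stands for the current column's value, one expression can be applied across many columns at once. A Python sketch of the 'Create new columns' mode, with a made-up expression and prefix:

```python
# Sketch of applying the expression "$a * 2" to several columns,
# creating new prefixed columns ("Create new columns" mode).
def multi_col_formula(rows, columns, formula, prefix):
    out = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            new_row[prefix + col] = formula(row[col])  # formula($a)
        out.append(new_row)
    return out

rows = [{"x": 1, "y": 10}, {"x": 2, "y": 20}]
result = multi_col_formula(rows, ["x", "y"], lambda a: a * 2, "doubled_")
# result[0] == {"x": 1, "y": 10, "doubled_x": 2, "doubled_y": 20}
```

In the alternative mode the computed values replace the original columns in place instead of being added under a prefix.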

## Multi-row formula

**Description**

This component creates a new table containing a new column computed using a multi-row formula based on one or several input columns.

To refer to a value in the previous row, use `{colname - 1}` and to refer to a value in the next row, use `{colname + 1}`.

**Inputs**

* `Table [Table]`
* `New column name [String]`
* `New column type [Selection]`
* `Expression [String]`
* `Value for missing row values [Selection]`
* `Column to sort by [Column]`
* `Column to group by [Column]`

**Outputs**

* `Result table [Table]`
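The `{colname - 1}` and `{colname + 1}` references behave like SQL `LAG`/`LEAD`: each row sees its neighbors after sorting by the chosen column. A Python sketch of a running-difference formula, `{value} - {value - 1}`, with a configurable fill for the missing neighbor of the first row (column names are made up):

```python
# Sketch of a multi-row formula "{value} - {value - 1}": sort the rows,
# then subtract the previous row's value; the first row has no previous
# row, so a configured fill value is used instead.
def row_difference(rows, sort_key, column, missing=0):
    ordered = sorted(rows, key=lambda r: r[sort_key])
    out = []
    for i, row in enumerate(ordered):
        prev = ordered[i - 1][column] if i > 0 else missing
        out.append({**row, "diff": row[column] - prev})
    return out

rows = [{"day": 2, "value": 15}, {"day": 1, "value": 10}, {"day": 3, "value": 30}]
result = row_difference(rows, "day", "value")
# diffs per sorted day: [10 - 0, 15 - 10, 30 - 15] == [10, 5, 15]
```

With a `Column to group by`, the same logic would restart at each group boundary, which this sketch omits.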

## Normalize

**Description**

This component normalizes the values of a given column.

It adds a new column named '\[column\_name]\_norm'.

Normalization can be computed as 0-1 values or as z-scores.

**Inputs**

* `Source table [Table]`
* `Column to normalize [Column]`
* `Use z-scores [Boolean]`:
  * Disabled (default): The resulting normalized values will range between 0 and 1.
  * Enabled: The normalized value will be calculated as a z-score or standard score: the number of standard deviations that the value is above or below the mean of the whole column. See [reference](https://en.wikipedia.org/wiki/Standard_score).

**Outputs**

* `Result table [Table]`
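The two options correspond to min-max scaling and the standard score. A Python sketch of both formulas (whether the component uses the population or sample standard deviation is not specified here; the sketch assumes the population form):

```python
import statistics

# Min-max scaling: (x - min) / (max - min), yielding values in [0, 1].
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Z-score: (x - mean) / stdev, the number of standard deviations
# a value lies above or below the column mean.
def z_scores(values):
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(v - mean) / stdev for v in values]

scaled = min_max([0, 5, 10])  # [0.0, 0.5, 1.0]
scores = z_scores([2, 4, 6])  # the middle value equals the mean, so its score is 0.0
```

Note that min-max scaling is undefined for a constant column (max equals min), a case the component would need to handle separately.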

## Order by

**Description**

This component generates a new table containing the rows of the input table, sorted according to the values in one of its columns and an optional secondary column.

The columns used for sorting cannot be of type geometry.

**Inputs**

* `Table to order [Table]`
* `Column to order by [Column]`
* `Use descending order [Boolean]`
* `Optional secondary column to order by [Column]`
* `Use descending order in secondary column [Boolean]`

**Outputs**

* `Result table [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#order_by_clause)

[Snowflake reference](https://docs.snowflake.com/en/sql-reference/constructs/order-by.html)

[Redshift reference](https://docs.aws.amazon.com/redshift/latest/dg/r_ORDER_BY_clause.html)

## Parse JSON

**Description**

This component creates new columns with values extracted from the JSON strings in other columns. It uses the data warehouse syntax to specify the path to the key that needs to be extracted.

**Input**

* Source table: This component expects an input table containing at least one column with a JSON object.

**Settings**

* Select JSON column: Select a string column that contains a JSON object.
* Add column name: Type a name for the new column that will be added with the content of a specific JSON key.
* Add JSON path: Type a JSON path expression, in the syntax of your data warehouse, identifying the key to be extracted.

**Output**

* Output table: This component generates a table with the same schema as the input source table, plus an additional column per JSON key specified in the Settings.

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#JSONPath_format)

[Snowflake reference](https://docs.snowflake.com/en/sql-reference/functions/json_extract_path_text)

[Redshift reference](https://docs.aws.amazon.com/redshift/latest/dg/JSON_EXTRACT_PATH_TEXT.html)

[PostgreSQL reference](https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH)

## Poly Build

**Description**

This component takes a group of spatial point objects and draws a polygon or polyline in a specific sort order to represent that group of points.

This component can also be used for spatial layer development by translating a collection of GPS data into polygon or polyline objects, where a polygon is a simple bounded region, such as a state boundary, and a polyline contains multiple line segments with any number of points between its start and endpoints, such as a river or road.

**Inputs**

* `Build Method [Selection]`
* `Source table [Table]`
* `Source Field [Column]`
* `Sequence Field [Column]`

**Outputs**

* `Result table [Table]`

## Poly Split

**Description**

This component splits polygon or polyline objects into their component point, line, or region objects.

This is a very specialized component used for spatial layer development. A typical use of this component is to disaggregate complex regions that may contain more than one polygon or to separate a polyline into its individual nodes.

**Inputs**

* `Source table [Table]`
* `Spatial Field [Column]`
* `Split To [Selection]`

**Outputs**

* `Result table [Table]`

## Remove Duplicated

**Description**

This component takes an input table and generates a new one in which duplicate rows from the input table have been removed.

**Inputs**

* `Source table [Table]`

**Outputs**

* `Result table [Table]`

## Rename Column

**Description**

This component generates a new table with the same content as the input one, renaming one or multiple of its columns.

**Inputs**

* `Source table [Table]`
* `Column to rename [Column]`
* `New column name [String]`

**Outputs**

* `Result table [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_rename_column_statement)

## Row Number

**Description**

This component creates a new table with an additional column containing row numbers.

**Inputs**

* `Source table [Table]`

**Outputs**

* `Result table [Table]`

## Sample

**Description**

This component generates a new table with a random sample of N rows from an input table.

**Inputs**

* `Source table [Table]`
* `Number of rows to sample [Number]`

**Outputs**

* `Result table [Table]`

## Select

**Description**

Executes a custom SQL SELECT statement to transform, compute, or filter data. Use SQL expressions to create calculated columns, apply functions, or reshape your data. The source table columns are available as column names in your SELECT expression.

**Inputs**

* `Source table [Table]`
* `SELECT statement [String]`

**Outputs**

* `Result table [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_list)

[Snowflake reference](https://docs.snowflake.com/en/sql-reference/sql/select)

[PostgreSQL reference](https://postgis.net/workshops/postgis-intro/simple_sql.html)

## Select Distinct

**Description**

This component generates a new table with the unique values that appear in a given column of an input table.

**Inputs**

* `Source table [Table]`
* `Column [Column]`

**Outputs**

* `Result table [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_distinct)

[Redshift reference](https://docs.aws.amazon.com/es_es/redshift/latest/dg/r_DISTINCT_examples.html)

## Simple Filter

**Description**

This component filters an input table according to a filter expression based on a single column.

It generates a new table with only the rows of the input table that meet the filter criteria and another one with those that do not meet it.

**Inputs**

* `Source table [Table]`
* `Column [Column]`
* `Operator [Selection]`
* `Value [String]`

**Outputs**

* `Table with rows that pass the filter [Table]`
* `Table with rows that do not pass the filter [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#where_clause)

[Redshift reference](https://docs.aws.amazon.com/redshift/latest/dg/r_WHERE_clause.html)

## Spatial Filter

**Description**

This component filters an input table using a spatial predicate and a filter table.

It generates a new table with only the rows of the input table that meet the filter criteria and another one with those that do not meet it.

**Inputs**

* `Source table [Table]`
* `Filter table [Table]`
* `Geo column in source table [Column]`
* `Geo column in filter table [Column]`
* `Spatial predicate [Selection]`

**Outputs**

* `Table with rows that pass the filter [Table]`
* `Table with rows that do not pass the filter [Table]`

## ST SetSRID

**Description**

This component sets the SRID of a geo column.

**Inputs**

* `Source table [Table]`
* `Geo column [Column]`
* `SRID [String]`

**Outputs**

* `Result table [Table]`

## Text to columns

**Description**

This component adds new columns based on splitting the text string in a text column.

**Inputs**

* `Table [Table]`
* `Column to split [Column]`
* `Delimiters [String]`
* `Mode [Selection]`: Whether to add new columns or new rows with the split strings.
* `Number of new columns`: Only used if mode = 'Split to columns'.
* `Prefix for new column names [String]`: Only used if mode = 'Split to columns'.
* `Extra characters`: What to do with extra characters when splitting the string according to the delimiters yields more tokens than the number defined in the 'Number of new columns' parameter.

**Outputs**

* `Result table [Table]`
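A Python sketch of the 'Split to columns' mode, assuming extra characters are dropped and short rows are padded with nulls (the prefix and column naming are made up for this example):

```python
# Sketch of "Split to columns" mode: split on the delimiter, keep at most
# `num_columns` tokens (dropping extras), and pad short rows with None.
def split_to_columns(value, delimiter, num_columns, prefix):
    tokens = value.split(delimiter)[:num_columns]
    tokens += [None] * (num_columns - len(tokens))
    return {f"{prefix}{i + 1}": tok for i, tok in enumerate(tokens)}

result = split_to_columns("a,b,c,d", ",", 3, "part_")
# result == {"part_1": "a", "part_2": "b", "part_3": "c"}  ("d" is dropped)
```

In 'Split to rows' mode the tokens would instead become separate rows, so no column count or prefix is needed.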

## Transpose / Unpivot <a href="#transpose-unpivot" id="transpose-unpivot"></a>

**Description**

This component rotates table columns into rows.

**Inputs**

* `Table to unpivot [Table]`
* `Key columns [Column][Multiple]`: The columns to use for identifying rows
* `Data columns [Column][Multiple]`: The columns to use for key-value pairs

**Outputs**

* `Result table [Table]`
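An illustrative sketch of the unpivot: key columns are repeated on every output row, while each data column becomes one key/value pair (the output column names `key` and `value` are assumptions for this example):

```python
# Sketch of unpivoting: for each input row, emit one output row per data
# column, carrying the key columns along and storing the data column's
# name and value as a key/value pair.
def unpivot(rows, key_columns, data_columns):
    out = []
    for row in rows:
        for col in data_columns:
            record = {k: row[k] for k in key_columns}
            record["key"] = col
            record["value"] = row[col]
            out.append(record)
    return out

rows = [{"city": "Madrid", "jan": 10, "feb": 12}]
result = unpivot(rows, ["city"], ["jan", "feb"])
# result == [{"city": "Madrid", "key": "jan", "value": 10},
#            {"city": "Madrid", "key": "feb", "value": 12}]
```

The output therefore has one row per input row per data column, which is why unpivoting makes tables longer and narrower.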

## Unique

**Description**

Identifies and separates unique rows from duplicates based on selected columns. Outputs two tables: 'Unique' containing the first occurrence of each distinct combination, and 'Duplicated' containing all subsequent occurrences.

**Inputs**

* `Source table [Table]`
* `Columns to find unique values [Column][Multiple]`

**Outputs**

* `Table with unique rows [Table]`
* `Table with duplicated rows [Table]`

## Where

**Description**

This component filters an input table according to a filter expression.

It generates a new table with only the rows of the input table that meet the filter criteria and another one with those that do not meet it.

**Inputs**

* `Source table [Table]`
* `Filter expression [String]`

**Outputs**

* `Table with rows that pass the filter [Table]`
* `Table with rows that do not pass the filter [Table]`

**External links**

[BigQuery reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#where_clause)

[Snowflake reference](https://docs.snowflake.com/en/sql-reference/constructs/where.html)

