How to optimize tables
It is usual that once you have designed a workflow, you would like to optimize the result table creating a cluster over one or more columns.
Cluster and index optimization is the way to physically and/or logically locate records with similar relations (cluster column) to reduce the number of fetches or access time with prefetching technics.
These advanced options depend on destination data warehouse and are available for any component that generates one o more output tables.
Users can optimize their data by following these simple steps:
- 1.Create a model with one or more component generating an output table and select this component e.g. as in the model below

- 2.Fill all component parameters. This is necessary to allow final output schema generation that allow to fill up the dropdown menu of the optimization column.
- 3.Select the column among that listed in the pull down named depending on datawareouse used. e.g:
- if BigQuery then
Cluster by
- if SnowFlake then two pull down named
Search optimization column
andCluster by
- if Redshift then
Cluster by
- if Postgres then two pull down named
Index column
andCluster by index

- 4.If a column is selected, the user can check that the clustering or indexing action is added into the generated SQL. Note that generated string would vary depending on data warehouse.

BigQuery
Snowflake
Redshift
PostgreSQL
BigQuery allow select
Cluster by
column that have the following types:- Geography
- Geometry
- Boolean
- Number
- String
- Date
- Datetime
- Time
- Timestamp
Snowflake allow select
Search optimization column
that have the following types:- Geography
- Geometry
Whilst
Cluster by
column can be selected among that with the following types:- Boolean
- Number
- String
Redshift allow select
Cluster by
column as SORTKEY that have the following types:- Boolean
- Number
- String
Last modified 1mo ago