How to optimize tables
It is usual that once you have designed a workflow, you would like to optimize the result table creating a cluster over one or more columns.
Cluster and index optimization is the way to physically and/or logically locate records with similar relations (cluster column) to reduce the number of fetches or access time with prefetching technics.
These advanced options depend on destination data warehouse and are available for any component that generates one o more output tables.
Users can optimize their data by following these simple steps:
Create a model with one or more component generating an output table and select this component e.g. as in the model below

Fill all component parameters. This is necessary to allow final output schema generation that allow to fill up the dropdown menu of the optimization column.
Select the column among that listed in the pull down named depending on datawareouse used. e.g:
if BigQuery then
Cluster by
if SnowFlake then two pull down named
Search optimization column
andCluster by
if Redshift then
Cluster by
if Postgres then two pull down named
Index column
andCluster by index

If a column is selected, the user can check that the clustering or indexing action is added into the generated SQL. Note that generated string would vary depending on data warehouse.

BigQuery allow select Cluster by
column that have the following types:
Geography
Geometry
Boolean
Number
String
Date
Datetime
Time
Timestamp
More info about BigQuery clustering here.
Last updated
Was this helpful?