How to optimize tables
It is usual that once you have designed a workflow, you would like to optimize the result table creating a cluster over one or more columns.
Cluster and index optimization is the way to physically and/or logically locate records with similar relations (cluster column) to reduce the number of fetches or access time with prefetching technics.
These advanced options depend on destination data warehouse and are available for any component that generates one o more output tables.
Users can optimize their data by following these simple steps:
- 1.Create a model with one or more component generating an output table and select this component e.g. as in the model below
- 2.Fill all component parameters. This is necessary to allow final output schema generation that allow to fill up the dropdown menu of the optimization column.
- 3.Select the column among that listed in the pull down named depending on datawareouse used. e.g:
- if BigQuery then
- if SnowFlake then two pull down named
Search optimization columnand
- if Redshift then
- if Postgres then two pull down named
Cluster by index
- 4.If a column is selected, the user can check that the clustering or indexing action is added into the generated SQL. Note that generated string would vary depending on data warehouse.
BigQuery allow select
Cluster bycolumn that have the following types:
Snowflake allow select
Search optimization columnthat have the following types:
Cluster bycolumn can be selected among that with the following types:
Redshift allow select
Cluster bycolumn as SORTKEY that have the following types: