How to optimize tables

It is usual that once you have designed a workflow, you would like to optimize the result table creating a cluster over one or more columns.

Cluster and index optimization is the way to physically and/or logically locate records with similar relations (cluster column) to reduce the number of fetches or access time with prefetching technics.

These advanced options depend on destination data warehouse and are available for any component that generates one o more output tables.

Users can optimize their data by following these simple steps:

  1. Create a model with one or more component generating an output table and select this component e.g. as in the model below

  1. Fill all component parameters. This is necessary to allow final output schema generation that allow to fill up the dropdown menu of the optimization column.

  2. Select the column among that listed in the pull down named depending on datawareouse used. e.g:

    • if BigQuery then Cluster by

    • if SnowFlake then two pull down named Search optimization column and Cluster by

    • if Redshift then Cluster by

    • if Postgres then two pull down named Index column and Cluster by index

  1. If a column is selected, the user can check that the clustering or indexing action is added into the generated SQL. Note that generated string would vary depending on data warehouse.

BigQuery allow select Cluster by column that have the following types:

  • Geography

  • Geometry

  • Boolean

  • Number

  • String

  • Date

  • Datetime

  • Time

  • Timestamp

More info about BigQuery clustering here.

Last updated

Was this helpful?