Performance Considerations
Builder always tries to get the data in the most convenient format for a performant visualization. Depending on the size of the data you are loading as a source in Builder, different mechanisms will be used.
For small tables, the complete dataset is loaded client-side, into the browser's memory. This means that no further requests to the server are needed when panning the map and moving across different zoom levels. Once loaded, this method offers very good performance and a very smooth user experience across zoom levels.
The limits for this mode depend on the type of source and the data warehouse used for the connection:
|            | BigQuery | Redshift | Snowflake | PostgreSQL |
| ---------- | -------- | -------- | --------- | ---------- |
| Table size | 30MB     | 30k rows | 30MB      | 30MB       |
Some tips to keep your data within these limits:
- Try to load only the columns that you need for your visualization.
- Aggregate data before visualization; it helps when dealing with big volumes of data.
- If you don't need very precise geometries, try simplifying them (see the sketch after this list).
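As a minimal sketch of these tips (assuming a PostgreSQL/PostGIS source; the table, columns, and the simplification tolerance are placeholders), a source query might prune columns, aggregate rows, and simplify geometries before Builder loads them:

```sql
-- Keep only the columns the map needs, pre-aggregate rows, and
-- simplify geometries. All names here are hypothetical.
SELECT
  region_id,
  SUM(population) AS total_population,       -- aggregate instead of raw rows
  ST_SimplifyPreserveTopology(               -- drop unneeded vertex detail
    ST_Union(geom), 100                      -- tolerance in projection units
  ) AS geom
FROM my_schema.census_blocks
GROUP BY region_id;
```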
For all SQL queries, and for datasets bigger than the limits in the table above, data needs to be loaded progressively as vector tiles. These tiles are dynamically generated via SQL queries pushed down to your data warehouse and rendered client-side as you pan the map.
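The SQL that CARTO generates is internal and warehouse-specific, but as a rough illustration of the idea, a PostGIS-style query for a single vector tile (the zoom and tile x/y values below are placeholders) looks roughly like:

```sql
-- Conceptual sketch of a dynamic tile query (PostGIS syntax).
-- Builds one Mapbox Vector Tile for tile z=12, x=1205, y=1539.
SELECT ST_AsMVT(q, 'layer')
FROM (
  SELECT
    id,
    ST_AsMVTGeom(geom, ST_TileEnvelope(12, 1205, 1539)) AS geom
  FROM my_table
  WHERE geom && ST_TileEnvelope(12, 1205, 1539)  -- only features in this tile
) AS q;
```

Note that `ST_TileEnvelope` assumes Web Mercator (EPSG:3857) coordinates by default, which is one reason projection matters for dynamic tiles (see further below).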
Response times and general performance for dynamic tiles will be different depending on many factors:
- The size of the table or query result.
- The size and complexity of the individual geometries.
- The zoom level. The lower the zoom level, the more geometries fit in a single tile and the more costly the query that needs to be performed.
- The data structure. Depending on the data warehouse, mechanisms such as indexing, clustering, or partitioning can greatly improve the execution performance of the queries that generate dynamic tiles.
When there are a lot of geometries to be included in a tile, CARTO will automatically ignore some of them to make sure that the tile size is kept within reasonable limits for transferring and rendering. This can happen in different scenarios, like:
- When requesting a tile at a lower zoom level than recommended for the extent of your geometries
- When geometries are very densely distributed
- When geometries are small but very complex
There are optimizations that can be applied to a table to improve query performance and reduce processing cost. These optimizations can be applied via the Data Explorer UI.
The queries below show how the optimizations can be applied manually from your data warehouse console or SQL client:
In BigQuery, use clustering by the geometry column to ensure that data is structured in a way that is fast to access. Check out BigQuery's documentation on clustered tables for more information. In order to create a clustered table out of an existing table with geometries, you can try something like:
```sql
CREATE TABLE your_dataset.clustered_table
CLUSTER BY geom
AS (
  SELECT * FROM your_original_table
);
```
In Snowflake, use `ST_GEOHASH(geom)` to order your table, like:

```sql
CREATE TABLE POINTS_OPTIMIZED AS
SELECT * FROM points
ORDER BY ST_GEOHASH(geom);
```
Activate the Search Optimization Service (only available in Snowflake Enterprise Edition) explicitly for the GEO index on the GEOGRAPHY column:

```sql
ALTER TABLE POINTS_OPTIMIZED ADD SEARCH OPTIMIZATION ON GEO(geom);
```
Also, take into account that your Snowflake role must have been granted the `SEARCH OPTIMIZATION` privilege on the relevant schema:

```sql
GRANT ADD SEARCH OPTIMIZATION ON SCHEMA <schema_name> TO ROLE <role>;
```
In PostgreSQL, create a spatial (GiST) index on the geometry column:

```sql
CREATE INDEX nyc_census_blocks_geom_idx
  ON nyc_census_blocks
  USING GIST (geom);
```

And use the index to cluster the table:

```sql
CLUSTER nyc_census_blocks USING nyc_census_blocks_geom_idx;
```
Remember that the cluster needs to be recreated if the data changes.
Also, to avoid intermediate transformations in PostgreSQL, geometries should be projected into EPSG:3857, and make sure that the SRID is set for the column. Take a look at the `ST_Transform` and `ST_SetSRID` function references.

In Redshift, for optimal performance, geometries need to be projected into EPSG:4326, again making sure that the SRID is set for the column; the same `ST_Transform` and `ST_SetSRID` functions apply there.
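As a minimal sketch of the reprojection step (PostGIS syntax; table and column names are placeholders), you could add a projected column and populate it like this:

```sql
-- Add a Web Mercator (EPSG:3857) geometry column and populate it.
-- Assumes the source column already has its SRID set.
ALTER TABLE my_table
  ADD COLUMN geom_3857 geometry(Geometry, 3857);

UPDATE my_table
  SET geom_3857 = ST_Transform(geom, 3857);

-- If the coordinates are already in the target projection but the SRID
-- is missing, declare it instead of transforming:
-- UPDATE my_table SET geom = ST_SetSRID(geom, 3857);
```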
Working with spatial indexes has many performance advantages by itself, but there are some optimizations that can be applied to your tables to improve query performance and reduce the processing cost.
In BigQuery, cluster the table by the column containing the spatial index:

```sql
CREATE TABLE your_dataset.clustered_table
CLUSTER BY h3
AS (SELECT * FROM your_dataset.original_table);
```

or

```sql
CREATE TABLE your_dataset.clustered_table
CLUSTER BY quadbin
AS (SELECT * FROM your_dataset.original_table);
```
In Snowflake, cluster the table by the column containing the spatial index:

```sql
ALTER TABLE table_name CLUSTER BY (h3);
```

or

```sql
ALTER TABLE table_name CLUSTER BY (quadbin);
```
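To check how well the table is clustered afterwards, Snowflake exposes the `SYSTEM$CLUSTERING_INFORMATION` system function (shown here with a placeholder table name):

```sql
-- Inspect clustering quality for the h3 column (Snowflake).
SELECT SYSTEM$CLUSTERING_INFORMATION('table_name', '(h3)');
```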
In Databricks, optimize the table using a `ZORDER BY` expression, like:

```sql
OPTIMIZE table_name ZORDER BY h3;
```
In Redshift, use the `SORTKEY`:

```sql
ALTER TABLE table_name ALTER SORTKEY (h3);
```

or

```sql
ALTER TABLE table_name ALTER SORTKEY (quadbin);
```
In PostgreSQL, create an index on the column containing the spatial index:

```sql
CREATE INDEX index_name ON table_name (h3);
```

or

```sql
CREATE INDEX index_name ON table_name (quadbin);
```

and use the index to cluster the table:

```sql
CLUSTER table_name USING index_name;
```
Remember that the cluster needs to be recreated if the data changes.
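Since `CLUSTER` in PostgreSQL is a one-time physical reorder, you can re-run it after the data changes; called without a `USING` clause it reuses the index recorded by the previous run:

```sql
-- Re-cluster the table using the index from the previous CLUSTER run.
CLUSTER table_name;
```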
When dynamic tile generation is not an option due to the table size, the complexity of the geometries, or any of the other caveats mentioned before, the best option to achieve a performant visualization is to generate a tileset.
Generating a tileset basically means that the table (or SQL query) will be pre-processed and a new table containing all the tiles for a selected zoom range will be produced. This method is great for visualizing large volumes of data, and it leverages advanced functionality from the CARTO Analytics Toolbox, available for the different cloud data warehouses.
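As an illustration only (procedure names, locations, and option keys vary by warehouse and Analytics Toolbox version; treat every identifier below as an assumption and check the tiler module reference for your platform), creating a simple tileset from a BigQuery source could look roughly like:

```sql
-- Hypothetical sketch: pre-generate tiles for zooms 0-10 from a query.
-- Procedure name, option keys, and table identifiers are assumptions;
-- consult the CARTO Analytics Toolbox documentation for the exact signature.
CALL `carto-un`.carto.CREATE_SIMPLE_TILESET(
  'SELECT geom, population FROM `my-project.my_dataset.my_table`',
  '`my-project.my_dataset.my_table_tileset`',
  '{ "zoom_min": 0, "zoom_max": 10, "properties": { "population": "Number" } }'
);
```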