Performance considerations
CARTO Builder strives to load data in the most efficient format for optimal visualization performance. Depending on the size of the data source, different mechanisms and performance recommendations are applied:
Up to large datasets and SQL Queries: These include simple features, SQL query sources, and spatial indexes. Data is loaded progressively as vector tiles generated dynamically via SQL queries to your data warehouse, optimizing performance while maintaining responsiveness.
Very large datasets: For very large datasets, tilesets are pre-generated to handle high data volumes efficiently. This method is ideal for complex geometries or extensive datasets where dynamic tile generation would be too slow.
Up to large datasets and SQL Queries
For all SQL queries, spatial index source types and datasets bigger than the limits in the chart above, data is loaded progressively as vector tiles, a method named Dynamic Tiling. These tiles will be dynamically generated via SQL queries pushed down to your data warehouse and rendered client-side as you pan the map.
Visualization optimization
When using dynamic tiling, large-scale data visualization is optimized by adjusting the display of different geometry types. For point geometries, fewer points are shown when the map is zoomed out. For lines and polygons, features are simplified or selectively rendered based on zoom level. This keeps tiles small enough to render quickly while the map stays legible.
Best practices
Response times depend on table size, geometry complexity, zoom level, and data structure. Indexing, clustering, or partitions can enhance query performance.
Several optimizations can be applied to a table to improve query performance and reduce processing cost. They can be applied via the Data Explorer UI or manually from your data warehouse console or SQL clients.
BigQuery
If your source contains simple features, you should cluster your table by the geometry column to ensure that data is structured in a way that is fast to access:
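A minimal sketch of clustering by the geometry column, assuming a hypothetical source table `my_dataset.my_table` with a GEOGRAPHY column named `geom`. Both names are placeholders, not taken from the CARTO documentation:

```sql
-- Hypothetical names: my_dataset.my_table, geom.
-- Writes a copy of the table whose storage is clustered by the geometry column.
CREATE TABLE my_dataset.my_table_clustered
CLUSTER BY geom
AS
SELECT * FROM my_dataset.my_table;
```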
For spatial index sources, you must cluster the tables by the column containing the spatial index, as per this example:
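One way this could look, assuming a hypothetical table `my_dataset.my_h3_table` whose spatial index lives in a column named `h3` (use the quadbin column instead if that is what the table stores):

```sql
-- Hypothetical names: my_dataset.my_h3_table, h3.
-- Writes a copy of the table clustered by the spatial index column.
CREATE TABLE my_dataset.my_h3_table_clustered
CLUSTER BY h3
AS
SELECT * FROM my_dataset.my_h3_table;
```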
Check out this documentation page for more information.
Snowflake
When working with simple features, use ST_GEOHASH(geom) to order your table:
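A minimal sketch of the ordering step, assuming a hypothetical table `my_table` with a GEOGRAPHY column named `geom`:

```sql
-- Hypothetical names: my_table, my_table_ordered, geom.
-- Rewrites the table so that rows close in space are stored close together.
CREATE OR REPLACE TABLE my_table_ordered AS
SELECT *
FROM my_table
ORDER BY ST_GEOHASH(geom);
```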
Activate Search Optimization Service (only available in Snowflake Enterprise Edition) explicitly for the GEO index on the GEOGRAPHY column:
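As an illustration, assuming the same hypothetical table `my_table_ordered` and GEOGRAPHY column `geom`, the GEO search method can be enabled like this:

```sql
-- Hypothetical names: my_table_ordered, geom.
-- Enables search optimization for geospatial predicates on the GEOGRAPHY column.
ALTER TABLE my_table_ordered ADD SEARCH OPTIMIZATION ON GEO(geom);
```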
Also, take into account that your Snowflake role must have been granted the ADD SEARCH OPTIMIZATION privilege on the relevant schema:
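A sketch of the grant, assuming a hypothetical schema `my_db.my_schema` and role `my_role`. Snowflake names this schema-level privilege `ADD SEARCH OPTIMIZATION`:

```sql
-- Hypothetical names: my_db.my_schema, my_role.
-- Run with a role that is allowed to grant privileges on the schema.
GRANT ADD SEARCH OPTIMIZATION ON SCHEMA my_db.my_schema TO ROLE my_role;
```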
If you are working with spatial indexes, cluster the tables by the column containing the spatial index:
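For example, assuming a hypothetical table `my_h3_table` with the spatial index stored in a column named `h3`:

```sql
-- Hypothetical names: my_h3_table, h3.
-- Defines the spatial index column as the clustering key of the table.
ALTER TABLE my_h3_table CLUSTER BY (h3);
```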
Databricks
CARTO supports simple features in tables with a couple of requirements in order to be able to make fast geospatial queries to generate tiles dynamically:
The geo column must be of binary type and contain a WKB representation of the geography.
Each row must contain four additional columns, __carto_xmin, __carto_ymin, __carto_xmax, and __carto_ymax, that describe the bounding box of each feature. These columns help store the table in a way that allows fast queries and avoids full scans on each query.
This is an example query that uses Databricks Spatial SQL (running on Photon) to prepare a table with these requirements. In this example, the geom column contains features as WKT strings.
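A sketch of such a query, not the exact one from the CARTO documentation. It assumes hypothetical tables `source_table` and `prepared_table`, and that the Spatial SQL functions `st_geomfromwkt`, `st_asbinary`, `st_xmin`, `st_ymin`, `st_xmax`, and `st_ymax` are enabled in your workspace:

```sql
-- Hypothetical names: source_table, prepared_table.
-- Assumption: source_table.geom holds WKT strings.
CREATE OR REPLACE TABLE prepared_table AS
WITH parsed AS (
  -- Parse the WKT once and drop the original string column.
  SELECT * EXCEPT (geom), st_geomfromwkt(geom) AS g
  FROM source_table
)
SELECT
  * EXCEPT (g),
  st_asbinary(g) AS geom,        -- WKB binary expected by CARTO
  st_xmin(g) AS __carto_xmin,    -- bounding box of each feature
  st_ymin(g) AS __carto_ymin,
  st_xmax(g) AS __carto_xmax,
  st_ymax(g) AS __carto_ymax
FROM parsed;
```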
If your original table already contains geographies as WKB binary, the query could be a bit simpler:
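A possible simpler variant, under the same assumptions about hypothetical table names and enabled Spatial SQL functions, for a `geom` column that already holds WKB binary:

```sql
-- Hypothetical names: source_table, prepared_table.
-- Assumption: source_table.geom is already WKB binary, so it is kept as-is.
CREATE OR REPLACE TABLE prepared_table AS
SELECT
  *,
  st_xmin(st_geomfromwkb(geom)) AS __carto_xmin,
  st_ymin(st_geomfromwkb(geom)) AS __carto_ymin,
  st_xmax(st_geomfromwkb(geom)) AS __carto_xmax,
  st_ymax(st_geomfromwkb(geom)) AS __carto_ymax
FROM source_table;
```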
Your Databricks workspace needs to be enabled with Spatial SQL functions, which are currently in Private Preview.
The Databricks team has made this form available to request access to the functions. Please get in touch with them through the form to gain access to all Spatial SQL functions.
When working with an H3 spatial index, you should optimize the table using a ZORDER BY expression, like:
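For instance, assuming a hypothetical Delta table `my_h3_table` with the index stored in a column named `h3`:

```sql
-- Hypothetical names: my_h3_table, h3.
-- Co-locates rows with nearby H3 values in the same files.
OPTIMIZE my_h3_table ZORDER BY (h3);
```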
PostgreSQL (with PostGIS)
Database indexes will also help with performance. For example:
And use the index to cluster the table:
Remember that the cluster needs to be recreated if the data changes.
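The index and cluster steps above can be sketched as follows, assuming a hypothetical table `my_table` with a PostGIS geometry column named `geom`:

```sql
-- Hypothetical names: my_table, geom, my_table_geom_idx.
-- Spatial (GiST) index on the geometry column.
CREATE INDEX my_table_geom_idx ON my_table USING GIST (geom);

-- Physically reorder the table following the spatial index.
CLUSTER my_table USING my_table_geom_idx;

-- After significant data changes, re-cluster using the index chosen above.
CLUSTER my_table;
```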
To avoid intermediate transformations, geometries should be projected into EPSG:3857, and the SRID must be set for the column. Take a look at the ST_Transform and ST_SetSRID functions reference.
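A minimal sketch of the reprojection, assuming a hypothetical table `my_table` whose `geom` column currently holds longitude/latitude coordinates (EPSG:4326). Adjust the source SRID to match your data:

```sql
-- Hypothetical names: my_table, geom.
-- Assumption: coordinates are EPSG:4326; ST_SetSRID labels them as such
-- in case the SRID is missing, then ST_Transform reprojects to EPSG:3857.
ALTER TABLE my_table
  ALTER COLUMN geom TYPE geometry(Geometry, 3857)
  USING ST_Transform(ST_SetSRID(geom, 4326), 3857);
```

Declaring the column as `geometry(Geometry, 3857)` also records the SRID in the column type, so later inserts with a different SRID are rejected.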
Creating an index and using it to cluster the table:
or
and use the index to cluster the table:
Remember that the cluster needs to be recreated if the data changes.
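One possible reading of the steps above, assuming the table stores a spatial index in a column hypothetically named `h3` (a `quadbin` column would be handled the same way). The table, column, and index names are placeholders:

```sql
-- Hypothetical names: my_table, h3, quadbin, my_table_h3_idx.
-- Index on the spatial index column.
CREATE INDEX my_table_h3_idx ON my_table (h3);
-- or, for a quadbin column:
-- CREATE INDEX my_table_quadbin_idx ON my_table (quadbin);

-- Physically reorder the table following that index.
CLUSTER my_table USING my_table_h3_idx;
```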
Redshift
For optimal performance, geometries need to be projected into EPSG:4326, and the SRID must be set for the column. Take a look at the ST_Transform and ST_SetSRID functions reference.
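A sketch of the reprojection, assuming a hypothetical table `my_table` whose GEOMETRY column `geom` currently holds Web Mercator coordinates (EPSG:3857). Replace the source SRID with the one your data actually uses:

```sql
-- Hypothetical names: my_table, geom.
-- Assumption: coordinates are EPSG:3857; ST_SetSRID labels them as such
-- in case the SRID is missing, then ST_Transform reprojects to EPSG:4326.
UPDATE my_table
SET geom = ST_Transform(ST_SetSRID(geom, 3857), 4326);
```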
If working with spatial indexes, you can use the SORTKEY:
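For example, assuming a hypothetical table `my_h3_table` with the spatial index stored in a column named `h3`:

```sql
-- Hypothetical names: my_h3_table, h3.
-- Sorts the table on disk by the spatial index column.
ALTER TABLE my_h3_table ALTER SORTKEY (h3);
```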
Very large datasets
When the dynamic tile generation is not an option due to the table size, the complexity of the geometries, or any other of the possible caveats mentioned before, the best option to achieve a performant visualization is to generate a tileset.
Generating a tileset means that the table (or SQL query) is pre-processed and a new table containing all the tiles for a selected zoom range is produced. This method is well suited to visualizing large volumes of data.
You can create pre-generated tilesets using CARTO Workflows or the CARTO Analytics Toolbox for each cloud data warehouse, as listed below:
Tilesets in BigQuery
Tilesets in Snowflake
Tilesets in Redshift
Tilesets in PostgreSQL
Tilesets in Databricks