Performance considerations

CARTO Builder strives to load data in the most efficient format for optimal visualization performance. Depending on the size of the data source, different mechanisms and performance recommendations are applied:

  • Up to large datasets and SQL Queries: these include simple features, SQL query sources, and spatial indexes. Data is loaded progressively as vector tiles generated dynamically via SQL queries to your data warehouse, optimizing performance while keeping the visualization responsive.

  • Very large datasets: tilesets are pre-generated to handle high data volumes efficiently. This method is ideal for complex geometries or very extensive datasets, ensuring high-performance visualizations.

Up to large datasets and SQL Queries

For all SQL queries, spatial index source types, and datasets bigger than the limits in the chart above, data is loaded progressively as vector tiles, a method called Dynamic Tiling. These tiles are generated dynamically via SQL queries pushed down to your data warehouse and rendered client-side as you pan the map.
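As a purely illustrative sketch, the query pushed down to generate a single tile is conceptually similar to the following (the table, columns, and bounding box are hypothetical, and the SQL that CARTO actually generates is more elaborate and warehouse-specific):

-- Hypothetical per-tile query (BigQuery dialect): fetch only the
-- features that intersect the bounding box of the requested tile.
SELECT id, geom
FROM your_dataset.your_table
WHERE ST_INTERSECTSBOX(geom, -10.0, 35.0, 5.0, 45.0)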

Visualization optimization

When using dynamic tiling, large-scale data visualization is optimized by adjusting how each geometry type is displayed. For point geometries, only a subset of points is rendered at zoom levels where tiles would otherwise contain too many features. For lines and polygons, features are simplified or selectively rendered based on zoom level. This approach keeps map rendering efficient, scalable, and clear.

Best practices

Response times depend on table size, geometry complexity, zoom level, and data structure. Indexing, clustering, or partitioning can enhance query performance.

The following optimizations can be applied to a table to improve query performance and reduce processing cost, either via the Data Explorer UI or manually from your data warehouse console or SQL client.

BigQuery

If your source contains simple features, you should cluster your table by the geometry column to ensure that data is structured in a way that is fast to access:

CREATE TABLE your_dataset.clustered_table
CLUSTER BY geom
AS
(SELECT * FROM your_original_table)

For spatial index sources, you must cluster the tables by the column containing the spatial index, as per this example:

CREATE TABLE your_dataset.clustered_table
CLUSTER BY h3
AS
(SELECT * FROM your_original_table)

Check out the BigQuery documentation on clustered tables for more information.

Snowflake

When working with simple features, use ST_GEOHASH(geom) to order your table:

CREATE TABLE POINTS_OPTIMIZED AS SELECT * FROM points ORDER BY ST_GEOHASH(geom);

Activate Search Optimization Service (only available in Snowflake Enterprise Edition) explicitly for the GEO index on the GEOGRAPHY column:

ALTER TABLE POINTS_OPTIMIZED ADD SEARCH OPTIMIZATION ON GEO(geom);

Also, take into account that your Snowflake role must have been granted the SEARCH OPTIMIZATION privilege on the relevant schema:

GRANT ADD SEARCH OPTIMIZATION ON SCHEMA <schema_name> TO ROLE <role>;

If you are working with spatial indexes, cluster the tables by the column containing the spatial index:

ALTER TABLE table_name CLUSTER BY (h3);
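Optionally, you can verify how well the table is clustered using Snowflake's SYSTEM$CLUSTERING_INFORMATION function (the table and column names below are placeholders):

SELECT SYSTEM$CLUSTERING_INFORMATION('table_name', '(h3)');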

Databricks

CARTO supports simple features in Databricks tables, with two requirements that make it possible to run fast geospatial queries when generating tiles dynamically:

  • The geo column must be of binary type and contain a WKB representation of the geography.

  • Each row must contain four additional columns __carto_xmin, __carto_ymin, __carto_xmax, __carto_ymax that describe the bounding box of each feature. These columns help store the table in a way that allows fast queries and avoids full scans on each query.

This is an example query that uses Databricks Spatial SQL (running on Photon) to prepare a table that meets these requirements. In this example, the geom column contains features as WKT strings.

CREATE TABLE simple_features_table_manually_prepared
CLUSTER BY (__carto_xmin, __carto_xmax, __carto_ymin, __carto_ymax)
AS (
  SELECT
      -- Bounding box columns, computed from the parsed WKT geometry
      st_xmin(st_geomfromtext(geom)) AS __carto_xmin,
      st_xmax(st_geomfromtext(geom)) AS __carto_xmax,
      st_ymin(st_geomfromtext(geom)) AS __carto_ymin,
      st_ymax(st_geomfromtext(geom)) AS __carto_ymax,
      -- WKB binary representation of the geometry, as required by CARTO
      st_asbinary(st_geomfromtext(geom)) AS geom,
      * EXCEPT(geom)
  FROM simple_features_wkt_table
  ORDER BY 1, 2, 3, 4
)

If your original table already contains geographies as WKB binary, the query could be a bit simpler:

CREATE TABLE simple_features_table_manually_prepared
CLUSTER BY (__carto_xmin, __carto_xmax, __carto_ymin, __carto_ymax)
AS (
  SELECT
      -- Bounding box columns, computed by parsing the WKB geometry
      st_xmin(st_geomfromwkb(geom)) AS __carto_xmin,
      st_xmax(st_geomfromwkb(geom)) AS __carto_xmax,
      st_ymin(st_geomfromwkb(geom)) AS __carto_ymin,
      st_ymax(st_geomfromwkb(geom)) AS __carto_ymax,
      -- geom is already WKB binary, so it is included as-is by *
      *
  FROM simple_features_wkb_table
  ORDER BY 1, 2, 3, 4
)

Your Databricks workspace needs to be enabled with Spatial SQL functions, which are currently in Private Preview.

The Databricks team has made a form available to request access; please get in touch with them through that form to enable all Spatial SQL functions in your workspace.

When working with an H3 spatial index, you should optimize the table using a ZORDER BY expression:

OPTIMIZE table_name ZORDER BY (h3)

PostgreSQL (with PostGIS)

For simple features, a spatial (GiST) index on the geometry column will also help with performance. For example:

CREATE INDEX nyc_census_blocks_geom_idx
  ON nyc_census_blocks
  USING GIST (geom);

And use the index to cluster the table:

CLUSTER nyc_census_blocks USING nyc_census_blocks_geom_idx;

Remember that the cluster needs to be recreated if the data changes.
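For example, after new data has been loaded you can simply re-run the clustering; PostgreSQL remembers the index used in the most recent CLUSTER command for the table:

CLUSTER nyc_census_blocks;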

To avoid intermediate transformations, geometries should be projected into EPSG:3857, and the SRID should be set for the column. Take a look at the ST_Transform and ST_SetSRID function references.
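For instance, a minimal sketch that reprojects an existing geometry column in place, assuming the source SRID is already set on the column (otherwise apply ST_SetSRID first):

ALTER TABLE table_name
  ALTER COLUMN geom TYPE geometry(Geometry, 3857)
  USING ST_Transform(geom, 3857);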

If you are working with spatial indexes, create an index on the column containing the spatial index and use it to cluster the table:

CREATE INDEX index_name ON table_name (h3);

or

CREATE INDEX index_name ON table_name (quadbin);

and use the index to cluster the table:

CLUSTER table_name USING index_name;

Remember that the cluster needs to be recreated if the data changes.

Redshift

For optimal performance, geometries need to be projected into EPSG:4326, and the SRID should be set for the column. Take a look at the ST_Transform and ST_SetSRID function references.
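For example, a minimal sketch that creates a reprojected copy of a table (the table and column names are placeholders):

CREATE TABLE points_4326 AS
SELECT id, ST_Transform(geom, 4326) AS geom
FROM points;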

If working with spatial indexes, you can use the SORTKEY:

ALTER TABLE table_name ALTER SORTKEY (h3);

Very large datasets

When dynamic tile generation is not an option due to the table size, the complexity of the geometries, or any of the other caveats mentioned above, the best option to achieve a performant visualization is to generate a tileset.

Generating a tileset means that the table (or SQL query) will be pre-processed and a new table containing all the tiles for a selected zoom range will be produced. This method is great for visualizing large volumes of data.

You can create pre-generated tilesets using CARTO Workflows or the CARTO Analytics Toolbox for the different cloud data warehouses, as shown in the example below:
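As an illustration, this is roughly what creating a tileset with the CREATE_TILESET procedure of the CARTO Analytics Toolbox for BigQuery looks like (the table names and options below are placeholders; check the Analytics Toolbox reference for your data warehouse for the exact procedure and supported options):

CALL carto.CREATE_TILESET(
  'your-project.your_dataset.your_table',
  'your-project.your_dataset.your_table_tileset',
  '{ "zoom_min": 0, "zoom_max": 12 }'
);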
