Visualize massive datasets
Learn how to visualize massive datasets (up to billions of rows) in a performant and cost-effective way
In this guide we're going to visualize data for all the buildings in the world (publicly available in OpenStreetMap) in a performant and cost-effective way. This dataset has 481M polygons and a size of 119 GB.
When visualizing a dataset in CARTO with the CartoLayer, a technique called tiles is used. Map tiles were popularized by Google Maps. In short, instead of rendering the entire map each time the user zooms in or out, the map is broken down into many smaller parts.
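To make the tiling idea concrete, here is a minimal sketch of the standard Web Mercator ("slippy map") tiling math — generic tile-scheme arithmetic, not CARTO-specific code:

```javascript
// Convert a longitude/latitude and zoom level to XYZ tile coordinates
// (standard Web Mercator tiling scheme; illustrative, not CARTO internals).
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom; // the world is split into an n x n grid of tiles at this zoom
  const x = Math.floor(((lon + 180) / 360) * n);
  const latRad = (lat * Math.PI) / 180;
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { x, y, zoom };
}

// At zoom 0 a single tile covers the whole world; each extra zoom
// level quadruples the number of tiles, so only the tiles in the
// current viewport ever need to be requested and rendered.
const tile = lonLatToTile(-73.98, 40.74, 12); // a tile around New York City
```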
Tiles map representation
Our Maps API is responsible for generating the tiles by performing requests to the data warehouse. The CARTO backend forwards the request and handles the response containing the data for a given tile. Finally, deck.gl renders it using the GPU on your computer.
In the CartoLayer (our layer for deck.gl), you can specify the type of the layer: Table, Query, or Tileset. With Table or Query, the backend generates the tiles on the fly; with Tileset, the tiles have been pre-generated beforehand.
Generating tiles is not a simple task: it requires running multiple geospatial operations (calculating intersections with tile boundaries, simplifying polygons, dropping invisible features, etc.). For small datasets this can be done in real time, but for large datasets (like the one in this guide) it is better to pre-generate the tiles as a tileset to get the best performance.
The CARTO platform provides an advanced CDN integration so that the API is only reached once if the source dataset has not been modified, no matter the number of requests.
In other words, if you run 1M requests to get a tile, 999K requests will hit the CDN, and only 1 request will reach the API and the warehouse.
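This works because a tile request is fully determined by its URL (source, zoom, x, y), so identical requests collapse onto one cached entry. A minimal sketch of the idea — an illustrative in-memory cache, not CARTO's actual CDN, with a hypothetical tile URL scheme:

```javascript
// Illustrative sketch of CDN-style caching: only the first request for a
// given tile URL reaches the origin; the rest are served from the cache.
function makeTileCache(fetchFromOrigin) {
  const cache = new Map();
  let originHits = 0;
  return {
    get(url) {
      if (!cache.has(url)) {
        originHits += 1; // cache miss: go to the origin (API + warehouse)
        cache.set(url, fetchFromOrigin(url));
      }
      return cache.get(url); // cache hit: origin is never contacted
    },
    originHits: () => originHits,
  };
}

// Hypothetical tile URL scheme, for illustration only.
const tileUrl = (z, x, y) => `https://cdn.example.com/tiles/${z}/${x}/${y}.mvt`;

const cdn = makeTileCache((url) => `payload-for-${url}`);
for (let i = 0; i < 1000; i++) cdn.get(tileUrl(4, 8, 5)); // 1000 identical requests
// cdn.originHits() is 1: every request after the first was a cache hit
```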
In this guide you will learn:
- How to pregenerate a tileset.
- How to visualize it using deck.gl.
During this guide, we're using the CARTO Data Warehouse. The process explained here is also compatible with other warehouses like BigQuery, Snowflake, Redshift, or Postgres: instead of using connection=carto_dw, use connection=<your_connection>.
CREATE TABLE cartobq.public_account.osm_buildings
CLUSTER BY geometry
AS SELECT osm_id, geometry, all_tags
-- assumed source: the public OpenStreetMap dataset available in BigQuery
FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE 'building' IN (SELECT key FROM UNNEST(all_tags)) AND geometry IS NOT NULL
This table is huge: it has 481M polygons and 119 GB of data.
So it seems obvious that we need to create a tileset to visualize our 481M-polygon dataset in a performant way.
-- CREATE_TILESET procedure from the CARTO Analytics Toolbox
CALL carto.CREATE_TILESET(
  'SELECT geometry AS geom, osm_id FROM cartobq.public_account.osm_buildings',
  -- output tileset table (adjust to your own destination)
  'cartobq.public_account.osm_buildings_tileset',
  STRUCT(
    'osm_buildings_tileset' AS name,
    NULL AS description,
    NULL AS legend,
    4 AS zoom_min,
    15 AS zoom_max,
    'geom' AS geom_column_name,
    NULL AS zoom_min_column,
    NULL AS zoom_max_column,
    NULL AS max_tile_size_kb,
    NULL AS tile_feature_order,
    NULL AS drop_duplicates,
    NULL AS extra_metadata
  )
);
Once the tileset is created, we can visualize it using deck.gl.
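As a sketch, a minimal deck.gl setup for this tileset might look like the following. It assumes deck.gl v8 with the @deck.gl/carto module; the API base URL, access token, output table name, and styling props are placeholders or assumptions to adjust for your account:

```javascript
import { Deck } from '@deck.gl/core';
import { CartoLayer, MAP_TYPES, setDefaultCredentials } from '@deck.gl/carto';

setDefaultCredentials({
  apiBaseUrl: 'https://gcp-us-east1.api.carto.com', // assumption: use your region's URL
  accessToken: '<your_access_token>',
});

new Deck({
  initialViewState: { longitude: 0, latitude: 20, zoom: 2 },
  controller: true,
  layers: [
    new CartoLayer({
      id: 'osm-buildings',
      type: MAP_TYPES.TILESET, // pre-generated tiles, as opposed to TABLE or QUERY
      connection: 'carto_dw',
      // assumption: the output table chosen when creating the tileset
      data: 'cartobq.public_account.osm_buildings_tileset',
      getFillColor: [131, 44, 247],
      getLineColor: [255, 255, 255, 80],
      lineWidthMinPixels: 0.5,
    }),
  ],
});
```

Because the type is MAP_TYPES.TILESET, the Maps API serves the pre-generated tiles directly instead of querying the warehouse on the fly.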
It should look like this:
World Buildings visualization
In general, the API takes less than 1 s for a tile request:
In the case of the CARTO Data Warehouse or BigQuery, the bytes billed by a tile request are:
This price is approximate; it can vary based on the number of features in the tile and the configuration of the tileset.