Visualize massive datasets
Learn how to visualize massive datasets (up to billions of rows) in a performant and cost-effective way.
In this guide we visualize data for all the buildings in the world (publicly available in OpenStreetMap) in a performant and cost-effective way. The dataset has 481M polygons and a size of 119GB.
When you visualize a dataset with CARTO + deck.gl, the map is rendered using a technique called tiling. Map tiles were popularized by Google Maps. In short, instead of rendering the entire map each time the user pans or zooms, the map is broken down into many smaller parts (tiles), and only the tiles covering the current view are loaded.
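To make the idea of tiles concrete, here is a minimal sketch of the widely used XYZ (Web Mercator) tiling scheme, which maps a longitude/latitude pair to a tile index at a given zoom level. It assumes the conventional scheme; the function name `lonlat_to_tile` is hypothetical and is not part of CARTO or deck.gl.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the XYZ tile that contains a point.

    Assumes the conventional Web Mercator scheme: at zoom z the world is
    a grid of 2**z by 2**z tiles, with (0, 0) in the top-left corner.
    """
    n = 2 ** zoom  # number of tiles along each axis
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern or southern edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# At zoom 1 the world is a 2x2 grid; the origin falls in the bottom-right tile.
print(lonlat_to_tile(0.0, 0.0, 1))
```

Each extra zoom level quadruples the number of tiles, which is why a client only ever requests the handful of tiles that cover the current viewport.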
Our Maps API is responsible for generating the tiles by sending requests to the data warehouse. The CARTO backend forwards each request and handles the response containing the data for the given tile. Finally, deck.gl renders the tile using the GPU in your computer.
In CARTO + deck.gl you can specify the type of data source: Table, Query, or Tileset. With Table or Query, CARTO automatically generates the tiles on the fly; with Tileset, the tiles have been pre-generated.
Generating tiles is not a simple task: it requires running multiple geospatial operations (calculating intersections with your tiles, simplifying polygons, dropping invisible features, etc.). For small datasets this can be done in real time, but for large datasets (like the one in this guide) it is better to pre-generate the tiles in a tileset to get the best performance.
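As an illustration of the first of those operations, the sketch below computes the geographic bounds of an XYZ tile and keeps only the features whose bounding box intersects it. This is a simplified stand-in, not CARTO's tiler: a real tiler works on full geometries and also simplifies and drops features. All names (`tile_bounds`, `bbox_intersects`, `features_for_tile`) and the sample features are hypothetical.

```python
import math

# A bounding box is (west, south, east, north) in degrees.


def tile_bounds(x: int, y: int, zoom: int) -> tuple:
    """Return the (west, south, east, north) bounds of an XYZ tile."""
    n = 2 ** zoom

    def lon(col: int) -> float:
        return col / n * 360.0 - 180.0

    def lat(row: int) -> float:
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    # Row y is the northern edge of the tile, row y + 1 the southern edge.
    return lon(x), lat(y + 1), lon(x + 1), lat(y)


def bbox_intersects(a: tuple, b: tuple) -> bool:
    """True when two (west, south, east, north) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def features_for_tile(features: list, x: int, y: int, zoom: int) -> list:
    """Return the names of the features whose bbox intersects the tile."""
    bounds = tile_bounds(x, y, zoom)
    return [name for name, bbox in features if bbox_intersects(bbox, bounds)]


# Two made-up building footprints, reduced to their bounding boxes.
sample = [("a", (10.0, -20.0, 11.0, -19.0)), ("b", (-10.0, 10.0, -9.0, 11.0))]
print(features_for_tile(sample, 1, 1, 1))
```

Running this kind of filter for every tile of a 481M-polygon table is what makes on-the-fly generation expensive, and why pre-generating the result pays off.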
The CARTO platform provides an advanced CDN integration, so as long as the source dataset has not been modified, the API is reached only once per tile, no matter the number of requests.
In other words, if you make 1M requests for the same tile, 999,999 of them are served by the CDN, and only 1 request reaches the API and the warehouse.
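The caching behaviour described above can be sketched with a small in-memory simulation. This is only a model of the idea (a cache in front of an origin), not CARTO's CDN integration; the class `CachedTileEndpoint`, the function `fake_origin`, and the tile key are hypothetical.

```python
class CachedTileEndpoint:
    """Toy model of a CDN sitting in front of a tile API."""

    def __init__(self, origin):
        self._origin = origin  # callable that produces a tile (API + warehouse)
        self._cache = {}
        self.origin_hits = 0
        self.cache_hits = 0

    def get(self, tile_key):
        if tile_key in self._cache:
            self.cache_hits += 1
        else:
            # Cache miss: only now is the origin actually queried.
            self.origin_hits += 1
            self._cache[tile_key] = self._origin(tile_key)
        return self._cache[tile_key]

    def invalidate(self):
        """Simulate the source dataset being modified."""
        self._cache.clear()


def fake_origin(tile_key):
    # Stand-in for the real tile payload.
    return f"tile-bytes-for-{tile_key}".encode()


endpoint = CachedTileEndpoint(fake_origin)
for _ in range(1_000_000):
    endpoint.get((12, 2006, 1544))  # (zoom, x, y) of an arbitrary tile

print(endpoint.origin_hits, endpoint.cache_hits)
```

Calling `invalidate()` models a change in the source dataset: the next request for each tile goes back to the origin once, and is then cached again.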
In this guide you will learn:
How to pre-generate a tileset.
How to visualize it using deck.gl.
CARTO provides procedures to create Tilesets inside the Data Warehouse.
Throughout this guide we use the CARTO Data Warehouse. The process explained here is also compatible with other warehouses such as BigQuery, Snowflake, Redshift, or Postgres. Instead of using connection=carto_dw, you need to use connection=<your_connection>.
BigQuery makes the OpenStreetMap dataset available for public use, and with the following query we can extract all the buildings in the world. To run this query you need access to the console of the CARTO Data Warehouse (recommended), or you can use the SQL API.
This table is huge: it has 481M polygons and 119GB of data.
A tileset is therefore the natural choice for visualizing our 481M-polygon dataset in a performant way.
The tiler module in the CARTO Analytics Toolbox allows us to perform that operation. Use the following SQL query to create a tileset containing all the buildings:
For more info visit the CREATE TILESET reference.
Once the tileset is created, we can visualize it using deck.gl.
It should look like this:
In general, the API takes less than 1s to serve a tile request:
Our tileset solution is cost-effective in terms of computing and storage. More details here.
With the CARTO Data Warehouse or BigQuery, the bytes billed for a tile request against the CARTO Data Warehouse are:
On-demand processing costs $5 per TB, so a single tile request costs about $0.00021, or 21 cents per 1,000 requests.
This price is approximate. It can vary based on the number of features in the tile and the configuration of the tileset.
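The cost arithmetic above can be reproduced with a short sketch. The $5 per TB price and the $0.00021 per request figure come from the text; the use of decimal terabytes (10^12 bytes) is an assumption, and the roughly 42 MB of bytes billed is only what those two figures imply, not a measured value. The function names are hypothetical.

```python
PRICE_PER_TB_USD = 5.0   # on-demand price quoted in the text
BYTES_PER_TB = 10 ** 12  # assumption: decimal terabytes


def cost_per_request(bytes_billed: float,
                     price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Cost in USD of one tile request that bills `bytes_billed` bytes."""
    return bytes_billed / BYTES_PER_TB * price_per_tb


def implied_bytes_billed(cost_usd: float,
                         price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Bytes billed that would produce `cost_usd` at the given price."""
    return cost_usd / price_per_tb * BYTES_PER_TB


per_request = 0.00021  # USD per tile request, as stated in the text
print(f"{per_request * 1000:.2f} USD per 1,000 requests")
print(f"about {implied_bytes_billed(per_request) / 1e6:.0f} MB billed per request")
```

Because of the CDN behaviour described earlier, this cost applies only to requests that actually reach the warehouse, not to every tile a user loads.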