Visualize massive datasets
Learn how to visualize massive datasets (up to billions of rows) in a performant and cost-effective way
Following this guide, we're going to visualize data for all the buildings in the world (publicly available as a BigQuery dataset) in a performant and cost-effective way. This dataset has 481M polygons and a size of 119GB.
When visualizing a dataset, CARTO uses a technique called map tiles. Map tiles were popularized by Google when creating Google Maps. In short, instead of rendering the entire map each time the user zooms in or out, the map is broken down into many smaller parts that are loaded only as needed.
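The mapping from a longitude/latitude pair to a tile index at a given zoom level follows the standard Web Mercator ("slippy map") scheme used by Google Maps, OpenStreetMap, and others. A minimal sketch:

```javascript
// Convert a longitude/latitude (in degrees) and a zoom level to XYZ tile
// indices, using the standard Web Mercator tiling scheme.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom; // number of tiles per axis at this zoom level
  const x = Math.floor(((lon + 180) / 360) * n);
  const latRad = (lat * Math.PI) / 180;
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return {x, y, zoom};
}

// At zoom 0 the whole world fits in a single tile:
console.log(lonLatToTile(0, 0, 0)); // → { x: 0, y: 0, zoom: 0 }
```

As the user pans and zooms, the client computes which tile indices cover the viewport and requests only those tiles.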
Our API is responsible for generating the tiles by performing requests to the data warehouse. The CARTO backend passes this request along and also handles the response with the data for a given tile. Finally, deck.gl renders it using the GPU on your computer.
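Each tile is addressed by its `{z}/{x}/{y}` indices, so on the client side a tile request boils down to filling a URL template. The endpoint below is a placeholder, not CARTO's actual tile URL:

```javascript
// Build a tile request URL from a {z}/{x}/{y} template.
// The template here is a placeholder for illustration only.
function tileUrl(template, {x, y, zoom}) {
  return template
    .replace('{z}', String(zoom))
    .replace('{x}', String(x))
    .replace('{y}', String(y));
}

console.log(tileUrl('https://example.com/tiles/{z}/{x}/{y}.mvt', {x: 3, y: 5, zoom: 4}));
// → https://example.com/tiles/4/3/5.mvt
```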
Generating tiles is not a simple task: it requires running multiple geospatial operations (calculating intersections with tile boundaries, simplifying polygons, dropping invisible features, etc.). For small datasets this can be done in real time, but for large datasets (like the one in this guide) it is better to pre-generate the tiles as a tileset to get the best performance.
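A quick way to see why on-the-fly generation stops scaling: the Web Mercator pyramid has 4^z candidate tiles at zoom level z, so the number of tiles a server might be asked to compute grows exponentially with zoom.

```javascript
// Number of candidate tiles at a given zoom level in the Web Mercator pyramid.
const tilesAtZoom = (z) => 4 ** z;

console.log(tilesAtZoom(0));  // → 1 (the whole world in one tile)
console.log(tilesAtZoom(10)); // → 1048576
console.log(tilesAtZoom(15)); // → 1073741824 (over a billion candidate tiles)
```

Pre-generating the tileset pays this cost once, instead of re-running expensive geospatial queries for every tile a user happens to request.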
In this guide you will learn:
How to pre-generate a tileset.
How to visualize it using deck.gl.
This table is huge: it has 481M polygons and 119GB of data.
So it seems obvious that we need to create a tileset to visualize our 481M-polygon dataset in a performant way.
Once the tileset is created, we can visualize it using deck.gl.
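A minimal sketch of the deck.gl side, assuming the `CartoLayer` from the `@deck.gl/carto` module; the connection and tileset names below are placeholders:

```javascript
// Layer props for a pre-generated tileset, following the shape of
// deck.gl's @deck.gl/carto CartoLayer. In an application you would do:
//
//   import {CartoLayer, MAP_TYPES} from '@deck.gl/carto';
//   const layer = new CartoLayer(tilesetLayerProps);
//
const tilesetLayerProps = {
  id: 'buildings-tileset',
  type: 'tileset',                     // i.e. MAP_TYPES.TILESET
  connection: 'carto_dw',              // or your own connection name
  data: 'project.dataset.buildings_tileset', // placeholder tileset identifier
  getFillColor: [131, 44, 247],
};

console.log(tilesetLayerProps.id); // → buildings-tileset
```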
It should look like this:
The API generally takes less than 1 second to serve a tile request:
In the case of CARTO Data Warehouse or BigQuery, the bytes billed per tile request are:
When adding a data source you can specify its type: Table, Query, or Tileset. When using Table or Query, CARTO automatically generates tiles on the fly; when using Tileset, the tiles have been pre-generated beforehand.
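The three source types can be sketched as layer props that differ only in `type` and in what `data` means (names are placeholders; the prop shape follows deck.gl's `@deck.gl/carto` `CartoLayer`):

```javascript
// For Table and Query sources, tiles are generated on the fly from the
// table or the query result; for Tileset, data points at a pre-generated
// tileset. All identifiers below are placeholders.
const tableSource   = {type: 'table',   connection: 'carto_dw', data: 'project.dataset.buildings'};
const querySource   = {type: 'query',   connection: 'carto_dw', data: 'SELECT * FROM project.dataset.buildings'};
const tilesetSource = {type: 'tileset', connection: 'carto_dw', data: 'project.dataset.buildings_tileset'};

console.log(tilesetSource.type); // → tileset
```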
CARTO provides procedures to create tilesets inside the Data Warehouse.
During this guide we're using the CARTO Data Warehouse. The process explained here is also compatible with other warehouses such as BigQuery, Snowflake, Redshift, or PostgreSQL: instead of using connection=carto_dw
, you need to use connection=<your_connection>
BigQuery has the dataset available for public usage, and using the following query we can extract all the buildings in the world. To run this query you will need access to BigQuery.
A procedure in the CARTO Analytics Toolbox allows us to perform that operation. Use the following SQL query to create a tileset containing all the buildings:
For more info, visit the Analytics Toolbox reference.
If you're using another Data Warehouse, check the corresponding reference for BigQuery, Snowflake, Redshift, or PostgreSQL.
You can follow the steps explained in other guides to build an application, and then simply change your layer to a Tileset layer to integrate the tileset into your application.
Our tileset solution is cost-effective in terms of both computing and storage.
With on-demand pricing at $5 per TB processed, a single tile request costs about $0.00021, i.e. 21 cents per 1,000 requests.
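The arithmetic behind those numbers, assuming roughly 42 MB billed per tile request (a figure back-computed from the per-tile cost above, not taken from a measurement):

```javascript
// On-demand processing price, and an *assumed* bytes-billed figure per tile.
const PRICE_PER_TB = 5;  // dollars per TB processed
const MB_PER_TILE = 42;  // assumption: ~42 MB billed per tile request

const tbPerTile = MB_PER_TILE / 1e6;          // 1 TB = 1e6 MB
const costPerTile = tbPerTile * PRICE_PER_TB; // ≈ $0.00021 per tile

console.log(costPerTile * 1000); // ≈ 0.21 dollars (21 cents) per 1000 requests
```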