Visualize massive datasets

Learn how to visualize massive datasets (up to billions of rows) in a performant and cost-effective way.

In this guide, we're going to visualize every building in the world (publicly available in OpenStreetMap) in a performant and cost-effective way. This dataset has 481M polygons and a size of 119 GB.

When you visualize a dataset in CARTO using the CartoLayer, the data is delivered with a technique called map tiles, which was popularized by Google Maps. In short, instead of rendering the entire map each time the user zooms in and out, the map is broken down into many smaller parts (tiles), and only the parts needed for the current view are loaded.
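To make the idea of tiles concrete, here is a minimal sketch of how a longitude/latitude pair maps to a tile index. It assumes the common Web Mercator XYZ ("slippy map") scheme, where zoom level z has 2^z by 2^z tiles; the function name `lonLatToTile` is hypothetical and is not part of the CARTO or deck.gl APIs.

```javascript
// Convert a longitude/latitude pair (degrees) to XYZ tile coordinates.
// Assumption: standard Web Mercator tiling with 2^zoom tiles per axis.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom; // number of tiles along each axis at this zoom
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so that lon = 180 or extreme latitudes stay inside the grid.
  const clamp = (v) => Math.min(n - 1, Math.max(0, v));
  return { x: clamp(x), y: clamp(y), z: zoom };
}

// At zoom 0 the whole world is a single tile.
console.log(lonLatToTile(0, 0, 0));
// A point in central Madrid at the tileset's maximum zoom used later in this guide.
console.log(lonLatToTile(-3.7038, 40.4168, 15));
```

Each time the zoom level increases by one, every tile splits into four, which is why only the handful of tiles covering the current viewport need to be requested.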

Our Maps API is responsible for generating the tiles by performing requests to the data warehouse. The CARTO backend passes each request on to the warehouse and handles the response containing the data for a given tile. Finally, deck.gl renders the tile using the GPU in your computer.

In the CartoLayer (our layer for deck.gl), you can specify the type of the layer: Table, Query, or Tileset. When using Table or Query, the backend generates the tiles on the fly; when using Tileset, the tiles have been generated beforehand.

Generating tiles is not a simple task: it requires running multiple geospatial operations (calculating intersections with the tile boundaries, simplifying polygons, dropping invisible features, etc.). For small datasets this can be done in real time, but for large datasets (like the one in this guide) it is better to pre-generate the tiles in a tileset to get the best performance.

The CARTO platform provides an advanced CDN integration so that, as long as the source dataset has not been modified, a given tile reaches the API only once, no matter how many times it is requested.

In other words, if 1M requests are made for the same tile, 999,999 of them are served by the CDN, and only 1 request reaches the API and the warehouse.
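The effect of the cache can be illustrated with a toy model. This is only a sketch of the general idea, not CARTO's CDN: it assumes a single cache that never expires, whereas a real CDN has multiple edge nodes, TTLs, and invalidation when the source data changes. The names `createCachedFetcher` and the example tile key are hypothetical.

```javascript
// Toy model of a cache in front of a tile API:
// the origin (API + warehouse) is called only on a cache miss.
function createCachedFetcher(fetchFromOrigin) {
  const cache = new Map();
  let originHits = 0;
  return {
    get(key) {
      if (!cache.has(key)) {
        originHits += 1; // cache miss: go to the origin once
        cache.set(key, fetchFromOrigin(key));
      }
      return cache.get(key); // cache hit: served without touching the origin
    },
    stats: () => ({ originHits, cachedKeys: cache.size }),
  };
}

// Simulate 1M requests for the same (hypothetical) tile key.
const cdn = createCachedFetcher((key) => `tile-bytes-for-${key}`);
const totalRequests = 1_000_000;
for (let i = 0; i < totalRequests; i++) {
  cdn.get('15/16044/12366');
}
const { originHits } = cdn.stats();
console.log(originHits, totalRequests - originHits); // 1 999999
```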

In this guide you will learn:

  • How to pre-generate a tileset.

  • How to visualize it using deck.gl.

CARTO provides procedures to create Tilesets inside the Data Warehouse.

Throughout this guide, we're using the CARTO Data Warehouse. The process explained here is also compatible with other warehouses such as BigQuery, Snowflake, Redshift, or Postgres. Instead of using connection=carto_dw, you need to use connection=<your_connection>.

Gathering a large dataset

BigQuery has the OpenStreetMap dataset available for public usage, and with the following query we can extract all the buildings in the world. To run this query you will need to access the console of the CARTO Data Warehouse (recommended) or use the SQL API.

CREATE TABLE cartobq.public_account.osm_buildings CLUSTER BY geometry 
AS
  SELECT * 
    FROM `bigquery-public-data.geo_openstreetmap.planet_features`
    WHERE 'building' IN (SELECT key FROM UNNEST(all_tags)) AND geometry IS NOT NULL

This table is huge: it has 481M polygons and 119 GB of data.

Create a tileset

Given that size, we need to create a tileset to visualize our 481M-polygon dataset in a performant way.

The tiler module in the CARTO Analytics Toolbox allows us to perform that operation. Use the following SQL query to create a tileset containing all the buildings:

CALL `carto-un`.carto.CREATE_TILESET(
  '(
    SELECT geometry AS geom, osm_id 
      FROM `cartobq.public_account.osm_buildings` 
     WHERE ST_GeometryType(geometry)=\'ST_Polygon\'
   )',
  '`cartobq.public_account.osm_buildings_tileset`',
  STRUCT(
    'osm_buildings_tileset' AS name,
    NULL AS description,
    NULL AS legend,
    4 AS zoom_min,
    15 AS zoom_max,
    'geom' AS geom_column_name,
    NULL AS zoom_min_column,
    NULL AS zoom_max_column,
    NULL AS max_tile_size_kb,
    NULL AS tile_feature_order,
    NULL AS drop_duplicates,
    NULL AS extra_metadata
  )
);
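To get a feel for what the zoom_min and zoom_max options imply, the sketch below computes an upper bound on the number of tiles in that zoom range. It assumes a standard XYZ pyramid with 4^z tiles at zoom z; the helper `maxTilesInRange` is hypothetical, and the real tileset is expected to be much smaller because tiles without any buildings (oceans, for example) need not be stored.

```javascript
// Upper bound on the number of tiles in an XYZ pyramid between two zoom levels.
// Assumption: 4^z tiles exist at zoom z (2^z columns by 2^z rows).
function maxTilesInRange(zoomMin, zoomMax) {
  let total = 0;
  for (let z = zoomMin; z <= zoomMax; z++) {
    total += 4 ** z;
  }
  return total;
}

console.log(maxTilesInRange(15, 15)); // 1073741824 tiles at zoom 15 alone
console.log(maxTilesInRange(4, 15));  // 1431655680 tiles for zooms 4 to 15
```

Because each extra zoom level multiplies the tile count by four, the highest zoom level dominates the total, so zoom_max is the setting with the most impact on generation time and storage.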

For more info visit the CREATE TILESET reference.

If you're using another data warehouse, check the corresponding reference: BigQuery, Snowflake, Postgres, Redshift.

Visualize the tileset

Once the tileset is created, we can visualize it using deck.gl.

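import {CartoLayer, MAP_TYPES} from '@deck.gl/carto';

// Layer backed by the pre-generated tileset created in the previous step.
const layer = new CartoLayer({
  connection: 'carto_dw',
  type: MAP_TYPES.TILESET,
  data: 'cartobq.public_account.osm_buildings_tileset',
  // Colors are [R, G, B] arrays.
  getFillColor: [18, 147, 154],
  getLineColor: [241, 92, 23],
  getLineWidth: 2
});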

It should look like this:

In general, the API takes less than 1 s to respond to a tile request:

You can follow the steps explained in other guides to build a public or private application, and then simply change your layer to a Tileset Layer to integrate a tileset in your application.

What about the cost of the data warehouse?

Our tileset solution is cost-effective in terms of computing and storage. More details here.

In the case of CARTO Data Warehouse or BigQuery, the bytes billed for a tile request (measured here against the CARTO Data Warehouse) are:

The cost of on-demand processing is $5 per TB, so a single tile request costs about $0.00021, or 21 cents per 1,000 requests.

This price is approximate. It can vary based on the number of features in the tile and the configuration of the tileset.
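The arithmetic behind this estimate can be sketched as follows. The bytes-billed value used here (42 MB) is not taken from this guide; it is an assumption, chosen because it is the value that reproduces $0.00021 per request at $5 per TB when a TB is treated as 10^12 bytes. Your warehouse may define the billing unit differently, so both are parameters of the hypothetical helper `costPerRequestUsd`.

```javascript
// Estimate the on-demand cost of one tile request from the bytes it bills.
// Assumptions: price in USD per TB, and 1 TB = 1e12 bytes (adjust if your
// warehouse bills in binary units).
function costPerRequestUsd(bytesBilled, usdPerTb = 5, bytesPerTb = 1e12) {
  return (bytesBilled / bytesPerTb) * usdPerTb;
}

// Assumed 42 MB billed per tile request (back-calculated, see the text above).
const perRequest = costPerRequestUsd(42e6);
console.log(perRequest.toFixed(5));          // "0.00021" USD per request
console.log((perRequest * 1000).toFixed(2)); // "0.21" USD per 1,000 requests
```

Keep in mind that this cost only applies to requests that actually reach the warehouse; requests answered by the CDN do not trigger a query.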
