# Visualize massive datasets

In this guide we visualize data for *all* the **buildings** in the world (publicly available in [OpenStreetMap](https://console.cloud.google.com/marketplace/product/openstreetmap/geo-openstreetmap?project=cartodb-on-gcp-backend-team)) in a performant and cost-effective way. This dataset has **481M polygons** and a size of **119 GB**.

When you visualize a dataset with [CARTO + deck.gl](https://github.com/CartoDB/gitbook-documentation/blob/master/carto-for-developers/key-concepts/carto-for-deck.gl), the data is delivered to the map as **tiles**. Map tiles were popularized by Google Maps. In short, instead of rendering the entire map each time the user zooms in or out, the map is broken down into many smaller parts, and only the parts in view are loaded.

<figure><img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-1a2f692416988f043456a7c9da5926233b916f6a%2Fimage.png?alt=media" alt="" width="346"><figcaption><p>Tiles map representation</p></figcaption></figure>
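To make the tiling idea concrete, here is a minimal sketch of how a longitude/latitude pair maps to a tile index. It assumes the common XYZ ("slippy map") Web Mercator scheme, where zoom level `z` splits the world into `2^z × 2^z` tiles. The function name `lonLatToTile` is hypothetical and is not part of the CARTO or deck.gl APIs.

```typescript
// Tile index in the XYZ Web Mercator scheme (assumed here for illustration).
interface TileIndex {
  x: number;
  y: number;
  z: number;
}

// Hypothetical helper: returns the tile that contains a lon/lat point at a zoom level.
function lonLatToTile(lon: number, lat: number, zoom: number): TileIndex {
  const n = Math.pow(2, zoom); // number of tiles per axis at this zoom
  const latRad = (lat * Math.PI) / 180;

  const xRaw = Math.floor(((lon + 180) / 360) * n);
  const yRaw = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );

  // Clamp so that edge values (for example lon = 180) stay inside the grid.
  const clamp = (v: number): number => Math.min(n - 1, Math.max(0, v));
  return {x: clamp(xRaw), y: clamp(yRaw), z: zoom};
}

// At zoom 4 the world is a 16 x 16 grid; the point (0, 0) falls in tile 8/8.
console.log(lonLatToTile(0, 0, 4));
```

Each zoom level quadruples the number of tiles, which is why the client only requests the handful of tiles that cover the current viewport.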

Our [Maps API](https://docs.carto.com/key-concepts/apis#maps) is responsible for generating the tiles by querying the data warehouse. The CARTO backend forwards each tile request to the warehouse and handles the response, which contains the data for that tile. Finally, deck.gl renders the tile using the GPU in your computer.

In [CARTO + deck.gl](https://github.com/CartoDB/gitbook-documentation/blob/master/carto-for-developers/key-concepts/carto-for-deck.gl) you can specify the type of data source: Table, Query, or Tileset. With **Table or Query**, CARTO automatically generates the tiles on the fly. With **Tileset**, the tiles have been generated in advance.

Generating tiles is not a simple task: it requires running multiple geospatial operations (calculating intersections with the tiles, simplifying polygons, dropping invisible features, etc.). For small datasets this can be done in real time, but for large datasets (like the one in this guide) it is better to pre-generate the tiles in a tileset to get the best performance.
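As an illustration of the first of those operations, the sketch below computes the geographic bounding box of a tile and checks whether a feature's bounding box intersects it. It assumes the XYZ Web Mercator tiling scheme and uses simple bounding boxes instead of real geometries. The names `tileToBBox` and `bboxIntersects` are hypothetical, and this is not how the CARTO tiler is implemented.

```typescript
// Bounding box in degrees (longitude/latitude).
interface BBox {
  west: number;
  south: number;
  east: number;
  north: number;
}

// Latitude of the horizontal tile edge at row y, for a grid of n rows.
function tileEdgeLatitude(y: number, n: number): number {
  const rad = Math.atan(Math.sinh(Math.PI * (1 - (2 * y) / n)));
  return (rad * 180) / Math.PI;
}

// Hypothetical helper: bounding box covered by tile x/y at zoom z.
function tileToBBox(x: number, y: number, z: number): BBox {
  const n = Math.pow(2, z);
  return {
    west: (x / n) * 360 - 180,
    east: ((x + 1) / n) * 360 - 180,
    north: tileEdgeLatitude(y, n),
    south: tileEdgeLatitude(y + 1, n)
  };
}

// Hypothetical helper: true when the two boxes overlap or touch.
function bboxIntersects(a: BBox, b: BBox): boolean {
  return (
    a.west <= b.east && a.east >= b.west && a.south <= b.north && a.north >= b.south
  );
}

// A small box around a building (illustrative coordinates).
const building: BBox = {west: -3.71, south: 40.41, east: -3.7, north: 40.42};
console.log(bboxIntersects(building, tileToBBox(0, 0, 1))); // north-west quadrant
console.log(bboxIntersects(building, tileToBBox(1, 1, 1))); // south-east quadrant
```

A real tiler repeats this kind of test, plus clipping and simplification, for every feature and every tile, which is what makes on-the-fly generation expensive for hundreds of millions of polygons.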

{% hint style="info" %}
The CARTO platform provides an advanced CDN integration so that the API is only reached once if the source dataset has not been modified, no matter the number of requests.

In other words, if you make 1M requests for the same tile, 999,999 of them will be served by the CDN, and only 1 request will reach the API and the warehouse.
{% endhint %}
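The caching behavior described in the note can be pictured with a toy model. The sketch below is only an in-memory illustration of "the first request reaches the origin, later ones are served from cache". It is not CARTO's CDN integration, it ignores cache invalidation when the source dataset changes, and the names `TileCache` and `fetchFromOrigin` are hypothetical.

```typescript
// Function that produces a tile at the origin (API + warehouse in the real system).
type TileFetcher = (tileKey: string) => string;

// Toy cache that counts how many requests reach the origin.
class TileCache {
  private cache = new Map<string, string>();
  private fetchFromOrigin: TileFetcher;
  originRequests = 0;
  cacheHits = 0;

  constructor(fetchFromOrigin: TileFetcher) {
    this.fetchFromOrigin = fetchFromOrigin;
  }

  get(tileKey: string): string {
    const cached = this.cache.get(tileKey);
    if (cached !== undefined) {
      this.cacheHits++;
      return cached;
    }
    // Cache miss: go to the origin once and remember the result.
    this.originRequests++;
    const tile = this.fetchFromOrigin(tileKey);
    this.cache.set(tileKey, tile);
    return tile;
  }
}

// 1,000 requests for the same tile: 1 reaches the origin, 999 are cache hits.
const cdn = new TileCache((key) => `tile-data-for-${key}`);
for (let i = 0; i < 1000; i++) {
  cdn.get('15/16000/12000');
}
console.log(cdn.originRequests, cdn.cacheHits);
```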

In this guide you will learn:

* How to pregenerate a tileset.
* How to visualize it using deck.gl.

CARTO provides procedures to create [Tilesets](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/key-concepts/tilesets) inside the Data Warehouse.

{% hint style="info" %}
In this guide we use the [CARTO Data Warehouse](https://docs.carto.com/carto-user-manual/connections/carto-data-warehouse). The process explained here is also compatible with other warehouses such as BigQuery, Snowflake, Redshift, Databricks, Oracle, or Postgres. Instead of using <mark style="color:orange;">`connection=carto_dw`</mark>, you need to use <mark style="color:orange;">`connection=<your_connection>`</mark>.
{% endhint %}

## Gathering a large dataset

BigQuery hosts the [OpenStreetMap](https://console.cloud.google.com/marketplace/product/openstreetmap/geo-openstreetmap?project=cartodb-on-gcp-backend-team) dataset for public use, and the following query extracts all the buildings in the world. To run this query you need to [access the console of the CARTO Data Warehouse](https://docs.carto.com/carto-user-manual/connections/carto-data-warehouse#accessing-the-console) (recommended) or use the [SQL API](https://docs.carto.com/key-concepts/apis#sql).

```sql
CREATE TABLE cartobq.public_account.osm_buildings CLUSTER BY geometry 
AS
  SELECT * 
    FROM `bigquery-public-data.geo_openstreetmap.planet_features`
    WHERE 'building' IN (SELECT key FROM UNNEST(all_tags)) AND geometry IS NOT NULL
```

This table is huge: it has 481M polygons and 119 GB of data.

## Create a tileset

With a dataset of this size, we need to create a tileset to visualize our 481M polygons in a performant way.

The [tiler module](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/tiler) in the **CARTO Analytics Toolbox** allows us to perform that operation. Use the following SQL query to create a tileset containing all the buildings:

```sql
CALL `carto-un`.carto.CREATE_TILESET(
  '(
    SELECT geometry AS geom, osm_id 
      FROM `cartobq.public_account.osm_buildings` 
     WHERE ST_GeometryType(geometry)=\'ST_Polygon\'
   )',
  '`cartobq.public_account.osm_buildings_tileset`',
  STRUCT(
    'osm_buildings_tileset' AS name,
    NULL AS description,
    NULL AS legend,
    4 AS zoom_min,
    15 AS zoom_max,
    'geom' AS geom_column_name,
    NULL AS zoom_min_column,
    NULL AS zoom_max_column,
    NULL AS max_tile_size_kb,
    NULL AS tile_feature_order,
    NULL AS drop_duplicates,
    NULL AS extra_metadata
  )
);
```

For more information, see the [`CREATE_TILESET`](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/tiler#create_tileset) reference.
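To get a feel for what the `zoom_min` and `zoom_max` values in the call above imply, the following sketch computes the theoretical maximum number of tiles in a zoom range, assuming the usual scheme in which zoom level `z` contains `4^z` tiles. This is only an upper bound for illustration: tiles without any buildings (oceans, for example) hold no data, so the real tileset is expected to contain far fewer. The function name `maxTilesInZoomRange` is hypothetical.

```typescript
// Hypothetical helper: upper bound on the number of tiles between two zoom levels,
// assuming 4^z tiles at zoom level z.
function maxTilesInZoomRange(zoomMin: number, zoomMax: number): number {
  let total = 0;
  for (let z = zoomMin; z <= zoomMax; z++) {
    total += Math.pow(4, z);
  }
  return total;
}

// Zoom range used in this guide (zoom_min = 4, zoom_max = 15).
console.log(maxTilesInZoomRange(4, 15)); // 1431655680
```

Each additional zoom level multiplies the tile count by four, so `zoom_max` has by far the largest effect on the size of the generation job.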

{% hint style="info" %}
If you're using another data warehouse, check the corresponding reference: [BigQuery](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/key-concepts/tilesets), [Snowflake](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-snowflake/key-concepts/tilesets), [Postgres](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-postgresql/key-concepts/tilesets), [Redshift](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-redshift/key-concepts/tilesets).
{% endhint %}

## Visualize the tileset

Once the tileset is created, we can visualize it using deck.gl.

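```typescript
import {Deck} from '@deck.gl/core';
import {VectorTileLayer, vectorTilesetSource} from '@deck.gl/carto';

// cartoConfig (credentials and connection) and INITIAL_VIEW_STATE are
// assumed to be defined elsewhere in your application.

// Point the source at the pre-generated tileset table.
const dataSource = vectorTilesetSource({
  ...cartoConfig,
  tableName: 'cartobq.public_account.osm_buildings_tileset'
});

const deck = new Deck({
  canvas: 'deck-canvas',
  initialViewState: INITIAL_VIEW_STATE,
  controller: true,
  layers: [
    // Render the building polygons from the tileset.
    new VectorTileLayer({
      id: 'buildings',
      data: dataSource,
      getFillColor: [18, 147, 154],
      getLineColor: [241, 92, 23],
      getLineWidth: 2
    })
  ]
});
```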

It should look like this:

<figure><img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-6a17d6972474491d7f030da0bfb30ff3f84d4f4c%2Fimage.png?alt=media" alt=""><figcaption><p>World Buildings visualization</p></figcaption></figure>
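The deck.gl snippet above references `cartoConfig` and `INITIAL_VIEW_STATE` without defining them. A minimal sketch of what they could look like is shown below. Every value is a placeholder chosen for illustration: the access token, the API base URL for your region, and the map center are assumptions that you need to replace with the values for your own account and use case.

```typescript
// Placeholder connection settings (replace with your own values).
const cartoConfig = {
  // Assumption: base URL of the CARTO API for your region.
  apiBaseUrl: 'https://gcp-us-east1.api.carto.com',
  // Assumption: a token with access to the tileset.
  accessToken: '<your_access_token>',
  // Connection used throughout this guide.
  connectionName: 'carto_dw'
};

// Placeholder camera position (illustrative coordinates).
// The zoom sits inside the 4 to 15 range the tileset was generated for.
const INITIAL_VIEW_STATE = {
  longitude: -73.98,
  latitude: 40.75,
  zoom: 14,
  pitch: 0,
  bearing: 0
};
```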

In general, the API responds to a tile request in less than 1 s:

![](https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-37d7d83113621dc1ccbace93bc49a2e98cf5d1fe%2Fimage.png?alt=media)

{% hint style="info" %}
You can follow the steps explained in other guides to build a [public](https://docs.carto.com/carto-for-developers/guides/build-a-public-application) or [private](https://docs.carto.com/carto-for-developers/guides/build-a-private-application) application, and then simply change your layer to a Tileset Layer to integrate a tileset in your application.
{% endhint %}

## What about the cost of the data warehouse?

Our tileset solution is cost-effective in terms of computing and storage. More details [here](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/key-concepts/tilesets#benefits).

In the case of the CARTO Data Warehouse or BigQuery, the bytes billed by a single tile request are:

![](https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-7c78e6d1c24637dd0217ab2f07b16c204b0e6aee%2Fimage.png?alt=media)

The cost of processing using [On-Demand](https://cloud.google.com/bigquery/pricing) pricing is $5 per TB, so a single tile request costs about $0.00021, or 21 cents per 1,000 requests.

{% hint style="info" %}
This price is approximate. It can vary based on the number of features in the tile and the configuration of the tileset.
{% endhint %}
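The arithmetic behind these numbers can be sketched as follows. The example assumes 1 TB = 10^12 bytes and uses 42 MB as the bytes billed per request. That 42 MB figure is back-calculated from the $0.00021 and $5 per TB values quoted above, not read from the screenshot, so treat it as an assumption. The function names are hypothetical.

```typescript
// Assumption: decimal terabytes. Check how your warehouse defines the billing unit.
const BYTES_PER_TB = 1e12;

// Hypothetical helper: cost in USD of one request, given the bytes it bills.
function costPerRequestUsd(bytesBilled: number, pricePerTbUsd: number): number {
  return (bytesBilled / BYTES_PER_TB) * pricePerTbUsd;
}

// Hypothetical helper: cost in USD of many requests that actually reach the warehouse.
function costForRequestsUsd(requests: number, perRequestUsd: number): number {
  return requests * perRequestUsd;
}

const perRequest = costPerRequestUsd(42e6, 5); // about $0.00021
const perThousand = costForRequestsUsd(1000, perRequest); // about $0.21

console.log(perRequest.toFixed(5), perThousand.toFixed(2));
```

Only requests that reach the warehouse are billed, so tile requests answered by the CDN do not add to this cost.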

