Data sources

When you open a map, the left panel will appear on the screen. There you can add data sources that will be visualized as map layers.

Data source types

There are different types of data that can be loaded in Builder: aggregated data sources, such as grids based on spatial indexes, and simple features.

Aggregated grids

Based on Discrete Global Grid (DGG) systems, this kind of data source uses a spatial index (H3 or Quadbin) to reference each cell of the grid. Think of a spatial index as an ID that always refers to the same portion of the Earth's surface.

  • This portion of the Earth is called a cell.

  • The shape of the cell depends on the type of index. For example, H3 uses hexagons, while Quadbin uses squares.

  • The size of the cell depends on the resolution. The higher the resolution, the smaller the size of the cell.

DGG systems are hierarchical, which means that every cell contains a constant number of smaller cells at a higher resolution:

The above is an example of how each H3 cell is sub-divided into smaller cells at higher resolutions.
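For instance, H3 is an aperture-7 system: each cell subdivides into exactly 7 children at the next resolution. As a minimal sketch, assuming a BigQuery connection with the `carto-un` Analytics Toolbox available, you can verify this for the resolution-10 cell used in the table below:

```sql
-- Count the children of a resolution-10 H3 cell at resolution 11.
-- H3 is an aperture-7 system, so each cell has 7 direct children.
SELECT ARRAY_LENGTH(
  `carto-un`.carto.H3_TOCHILDREN('8a0c0036a49ffff', 11)
) AS num_children
```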

One of the advantages of working with spatial indexes is that operating with them in data warehouses is far more efficient and cost-effective than computing geometries. They are also smaller in size, helping save storage and reduce the volume of transferred data.

When working with DGGs, Builder will dynamically aggregate your data into cells at a meaningful resolution depending on the current map zoom level. See the animation below for an example:

This is what a table containing H3 indexes looks like, with some additional columns that contain aggregated socio-demographic data for each hexagon:

| h3              | population | avg_rent |
|-----------------|------------|----------|
| 8a0c0036a49ffff | 103.0      | 1344.56  |
| 8a0c002e4c0ffff | 1093.0     | 2087.04  |
| 8a0c002e4caffff | 209.0      | 3098.39  |

The h3 column contains the indexes for H3 cells at level 10. That’s what we call the native resolution of the data.

However, if you load the table in a Builder map and zoom out to a low zoom level, it will be shown at a lower resolution, which means we would actually be visualizing an aggregated version of our table. This aggregation will be generated on the fly, using SQL queries that are pushed from CARTO into the data warehouse where the table lives.

The above implies that hexagons will be aggregated into their parents: the bigger hexagons that contain them at a lower resolution.

Since the data is aggregated, Builder will always need you to pick an aggregation method for the data used in the map. This applies to all selectors where you can pick a property for cartography settings, pop-ups, and widgets.
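As a sketch of what such an on-the-fly aggregation looks like (not Builder's exact internal query), rolling resolution-10 H3 cells up to their resolution-6 parents with the `carto-un` Analytics Toolbox might be expressed as follows, with hypothetical table and column names:

```sql
-- H3_TOPARENT returns the parent cell index at the requested (lower) resolution.
SELECT
  `carto-un`.carto.H3_TOPARENT(h3, 6) AS h3,
  SUM(population) AS population,  -- aggregation method: sum
  AVG(avg_rent) AS avg_rent       -- aggregation method: average
FROM my_project.my_dataset.my_h3_table
GROUP BY h3
```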

When loading a table that contains a spatial index, there is a column name convention that should be followed to ensure that CARTO can fully support it.

  • For H3 ids, the column should be named h3

  • For Quadbin ids, the column should be named quadbin

There are some performance and processing cost optimizations that should be applied to this kind of table. Find them in this section.

Simple features

Simple features are defined by a standard that specifies the digital storage of geographical data (usually points, lines, or polygons) with both spatial and non-spatial attributes.

Most data warehouses support simple features through different data types, such as geometry or geography. This table shows the types currently supported by each data warehouse:

|            | Geography     | Geometry      |
|------------|---------------|---------------|
| BigQuery   | Supported     | Not Supported |
| CARTO DW   | Supported     | Not Supported |
| Redshift   | Not Supported | Supported     |
| Snowflake  | Supported     | Supported     |
| PostgreSQL | Not Supported | Supported     |
| Databricks | Not Supported | Supported     |

Simple features are widespread and have been traditionally used by GIS software to store the shape and properties of phenomena that occur on the surface of the Earth.

CARTO supports simple features stored as geometry or geography in cloud data warehouses. There are different methods to load a data source that contains simple features in a Builder map. These methods ensure the best performance when rendering data on a map, and they're selected automatically based on the type and size of the data source:

  • For small data sources, data can be fully loaded at once on the map.

  • For bigger data sources, and also those defined as arbitrary SQL queries, data is loaded progressively via vector tiles. The data for these tiles is extracted by pushing down SQL queries to the data warehouse, and they're requested as you zoom in and out or pan the map.

Find more information about the different methods mentioned above in this section.

When loading a table that contains simple features, there is a column name convention that should be followed to ensure that CARTO can fully support it.

CARTO expects to find a column named geom that will be used to render the features in the map.
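If your table stores the geometry under a different name, you can either alias it in a custom query or rename the column in the warehouse. For example, in BigQuery (with hypothetical table and column names):

```sql
-- One-off fix so CARTO picks up the geometry column automatically
ALTER TABLE my_project.my_dataset.my_table
RENAME COLUMN my_geography_column TO geom;
```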

Adding a data source

Click on Add source from to access data from your existing data warehouse connections. If you haven't added a data source to the map yet, you will see the following page:

Just click on "Add source from" to display the following options:

Add source from your connections

If you select "Data Explorer", the default tab is "From your connection", which will let you browse your connections and select a table or a tileset. Use the search bar and the breadcrumbs to find your data and feel free to switch to tree view in the top right if that's easier.

Once you've found your data, click on "Add source".

At this point, you may see a warning about the table potentially being slow because it's not optimized for geospatial analysis. This process is covered in Optimizing your data.

Once the process is finished, the table or tileset is included in the Builder map as a new layer. The map displays the basemap and the new layer on top. You can add additional layers, or start applying styling and analysis features.

Partitioned BigQuery tables will fail to load as a plain table source, since they always require a WHERE clause that filters by the column used for the partition.

If you need to load a BigQuery partitioned table in Builder, the best option is to add it as a SQL Query source like:

SELECT *
FROM project.dataset.my_partitioned_table
WHERE partition_column = 'value'

Once you have added your datasets to the map, you can always add a new layer or delete the source. Click on the three dots icon and select Add layer or Delete source. When you click the Delete quick action, a dialog will appear allowing you to confirm that you want to delete the selected data source and warning you that it will affect the layers created with this source.

You can also rename a source by clicking on the “Rename” option.

Rename the source and press enter to save your changes.

Note that you cannot leave a source unnamed; if you try, the last name set will be kept as the default.

Once the source is renamed, the new name will replace the old one in the data table and when you add a new widget.

Add source from a custom query

If you've selected "Custom Query (SQL)", a new dialog screen will open allowing you to create your own query or run a template SQL analysis to data on your connection. Select an option and click Add source.

Once the process is finished, the SQL Editor appears in the Builder interface, where you can type your query and then click Run when it is active to execute the query. Please make sure to use fully qualified table names.
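A fully qualified table name includes the project (or database), dataset (or schema), and table, rather than just the table name. For example, a minimal query against the CARTO demo data (assuming access to the `carto-demo-data` project) would look like:

```sql
-- project.dataset.table, fully qualified
SELECT *
FROM `carto-demo-data`.demo_tables.retail_stores
LIMIT 100
```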

From the SQL Editor, you can easily toggle to full screen mode, minimize, maximize or close the panel.

Full screen mode allows you to easily work with long SQL queries. Once your query is executed, you can always restore to default screen by clicking on Exit full screen.

Also note that while typing the query, a label will indicate that the SQL Editor is in edit mode.

For BigQuery data sources, when you enter a query in the SQL Editor, a query validator (dry-run) verifies the query syntax and provides an estimate of the number of bytes read by the query. You can check out this documentation page for more information.

If the query is valid, then a check mark automatically appears along with the amount of data that the query will process.

If the query is invalid, then an exclamation point appears along with an error message (syntax error, permission issues, etc.).

Bear in mind that the dry-run option is only available for BigQuery data sources (CARTO Data Warehouse and BigQuery connections).

When you click on Run, a timer will appear while the query is running, informing you how long the query is taking until it's finished. A label will also appear indicating the connection being used to execute the query. In this example, the BigQuery connection named custom_name.

You can click on Cancel at any time to stop running the query. At this point, the layer loading will also be cancelled.

If you keep the query running and it executes successfully, the table will be included in the Builder map tool as a layer.

Remember, when running queries that return geometries, you should use an alias in the query to make sure the column that contains the geometry is called geom. For example:

SELECT population, geometry as geom FROM demographic_data

Use spatial indexes in custom queries

CARTO supports H3 and Quadbin spatial indexes. In order to render a map from a data source that contains a spatial index instead of a geometry, there are some nuances to take into account.

First, if you are going to type a query that returns a spatial index (H3 or Quadbin), you should use the Spatial data type selector on your SQL Panel to select the type of data that you're working with.

The Spatial data type selector will recognize patterns such as AS h3 or AS quadbin in your query and will automatically be set to the appropriate value:

  • If your query is going to return H3 indexes, make sure the Spatial data type selector is set to H3 and make sure the column that contains the H3 indexes is called h3. For example:

SELECT
  `carto-un`.carto.H3_FROMGEOGPOINT(geom, 10) as h3,
  count(*) as num_points
FROM 10M_points_table
GROUP BY h3
  • If your query is going to return Quadbin indexes, make sure the Spatial data type selector is set to Quadbin and make sure the column that contains the indexes is called quadbin. For example:

SELECT
  `carto-un`.carto.QUADBIN_FROMGEOGPOINT(geom, 10) as quadbin,
  count(*) as num_points
FROM 10M_points_table
GROUP BY quadbin

Learn more about using the CARTO Analytics Toolbox to work with spatial indexes here.

Also, when working with spatial indexes, there are a few important details to take into account in order to optimize performance and reduce the associated computing cost. Learn more about it here.

Create a table from query

Additionally, you can also persist the query as a table by clicking on the Create table from query button that becomes available when the query completes successfully.

Click on the Create table from query button. A new dialog will open allowing you to create a table from your query in the available connections. This interface will allow you to set the location and name of the output table. Click on Create table from query to continue, or click on Cancel if you don't want the changes to be applied.

This option is only available for the same connection used for running the query. If you try to select a different location, a message will appear warning you that the connection cannot be changed in this process.

Select the location and click on Create table from query.

Once the process is finished, the table is included in the Builder map as a new layer. The map displays the basemap and the new layer on top. You can add additional layers, or start applying styling and analysis features.

Custom queries using the Analytics Toolbox

You can also use the Analytics Toolbox functions in your custom SQL queries. For example, you can perform a simple clustering using the ST_CLUSTERKMEANS function by running this query from your CARTO Data Warehouse connection:

WITH clustered_points AS
(
    SELECT `carto-un`.carto.ST_CLUSTERKMEANS(ARRAY_AGG(geom ignore nulls), 6) AS cluster_arr
    FROM `carto-demo-data`.demo_tables.sample_customer_home_locations
)

SELECT cluster_element.cluster, cluster_element.geom AS geom FROM clustered_points, UNNEST(cluster_arr) AS cluster_element

This query computes six clusters from the points of the sample_customer_home_locations table. As a result, each point is assigned a cluster ID. By styling the layer by this cluster attribute, we get the following result:


To learn more, please visit the Documentation page of the CARTO Analytics Toolbox for each provider:

Add source from a local or remote file

CARTO allows creating geospatial tables in your connections by importing files from your computer or via URL. Once a file is imported, the resulting table can be previewed in Data Explorer and used in Builder and external applications to create maps.

Find more information about compatible data warehouses, supported formats, column names, and delimiters in our Importing Data documentation.

To import files directly from Builder, select "Import file" after clicking on "Add source from". A new dialog will open allowing you to import your data into the available connections.

You can upload your data through two different methods: Local or Remote.

  • Local

To import a local file, select the icon on the left.

This interface will allow you to upload data from your computer. Once you have selected the file, click on Continue.

The next screen will allow you to set the location and name of the output table. Once you have completed this configuration, click on Save here.

The last screen will show you a summary with the import details. Click on Import to confirm or click on Back if you want to edit anything before importing.

If you continue importing the file, CARTO will give you the option to follow the status of the import in this screen, or to minimize it and follow the status in the top right corner of the screen.

Once the data has been imported, the dataset is included in the Builder map tool as a new layer. You can then add additional layers, or apply styling and analysis features.

  • Remote

To import a remote URL, select the icon on the right.

This interface will allow you to enter a supported URL file. Once you have entered the URL, click on Continue.

The next screen will allow you to set the location and name of the output table. Once you have completed this configuration, click on Continue.

The last screen will show you a summary with the import details. Click on Add Source to confirm or click on Edit details if you want to edit anything before importing.

A new dialog will open informing you that the import may take a while to process and that once the file has been imported, it will be added as a data source to the map.

Once the data has been imported, the dataset is included in the Builder map tool as a new layer. You can then add additional layers, or apply styling and analysis features.

Add source from Data Observatory

From the Layers tab, go to the Sources panel and click on Add source from…. Go to the “Data Observatory” tab. A new dialog screen will open allowing you to select your subscriptions or samples from one of your connections. Select a subscription or a sample and click on Add source.

Bear in mind that subscriptions are only available from CARTO Data Warehouse, BigQuery and Snowflake connections, while samples are only available from CARTO Data Warehouse and BigQuery connections.

If you try to visualize a sample from an unavailable connection, a message will appear warning you that this sample is not available from that connection.

To learn more about how to visualize your Data Observatory datasets in Builder, please visit its dedicated documentation page.

Refreshing a data source

By opening the contextual menu on a data source card, you will find the option to “Refresh data source”.

This option will reload the data source and recreate the associated layers. With this option, CARTO will invalidate any previously cached objects related to this source and will push a SQL query to the data warehouse to fetch fresh data.

Bear in mind that using this option will increase the amount of data processed in your data warehouse, which might have a significant cost associated with it.

The cached objects associated with the data source will be invalidated, and the SQL queries that were executed to generate them will run again.
