Enriching data
Last updated
Enrichments are moving to CARTO Workflows
This documentation only applies to organizations created before April 1st, 2024. The assisted UI to enrich data starting from Data Explorer covered in this documentation page will be removed for all existing organizations on October 1st, 2024.
Why this change?
Currently, the best way to do an enrichment (both for custom data sources and Data Observatory datasets) is by using CARTO Workflows. Workflows provide a low-code visual UI to build enrichment processes, with expanded abilities such as scheduling or flexible combinations with other components and analytical pipelines. We will continue supporting and improving the enrichment capabilities in CARTO Workflows.
Get started with Enrichment in CARTO Workflows
Templates for Data Enrichment in CARTO Workflows
Data Enrichment components in CARTO Workflows
Currently, the “Enrich table” functionality is available for data accessible via BigQuery (including the CARTO Data Warehouse) and Snowflake connections. You need active data subscriptions from the Data Observatory, and the Analytics Toolbox must be installed in the data warehouse you will use.
Enrichment is the process of augmenting your data tables with new external variables by means of a spatial join between your data and a dataset from a Data Observatory subscription, followed by the application of an aggregation method (e.g. sum, average, max, min).
To illustrate the case of enriching polygons, as in the image below, imagine that we have polygons representing municipalities (named A, B and C in the image) and we want to enrich them with the population attribute of a known buffer (named D) coming from an external dataset. We don’t know how the population is distributed inside these municipalities or inside the buffer. It is probably concentrated in cities somewhere, but since we don’t know where, our best guess is to assume that the population is evenly distributed across the geometries involved in the process (i.e. every point inside the municipalities or the buffer has the same population density). Population is an extensive property (it grows with area), so we can split it into parts and aggregate those parts by summing. In this case, we’d calculate the population inside each part of the circle that intersects with a municipality: each municipality receives the buffer’s population multiplied by the fraction of the buffer’s area that falls inside it.
On the other hand, when enriching points, the process returns the values of the variables from the Data Observatory subscription in the areas that intersect the locations of the target points (the points to be enriched).
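The polygon case can be sketched in a few lines of Python. This is a minimal illustration, not CARTO’s implementation: to stay self-contained it assumes axis-aligned rectangles instead of real polygons (and a rectangle instead of the circular buffer D), and the municipality layout and the population of 1,000 are hypothetical. Each target receives the source value multiplied by the fraction of the source’s area it overlaps, which is what the uniform-density assumption implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; a simplification standing in for real polygons."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

    def intersection_area(self, other: "Rect") -> float:
        # Overlap along each axis, clamped at zero when the rectangles are disjoint.
        dx = min(self.xmax, other.xmax) - max(self.xmin, other.xmin)
        dy = min(self.ymax, other.ymax) - max(self.ymin, other.ymin)
        return max(0.0, dx) * max(0.0, dy)


def areal_weighted_sum(targets, sources):
    """Split each source's extensive value across targets in proportion to overlap area.

    targets: dict of name -> Rect; sources: list of (Rect, value) pairs.
    Assumes each source value is spread uniformly over its geometry."""
    result = {}
    for name, target in targets.items():
        total = 0.0
        for source, value in sources:
            if source.area() == 0:
                continue  # a degenerate source has no area to distribute
            total += value * target.intersection_area(source) / source.area()
        result[name] = total
    return result


# Hypothetical municipalities A, B, C and a source area D holding 1,000 people.
municipalities = {
    "A": Rect(0, 0, 4, 4),
    "B": Rect(4, 0, 8, 4),
    "C": Rect(0, 4, 8, 8),
}
d = Rect(2, 2, 6, 6)  # area 16
print(areal_weighted_sum(municipalities, [(d, 1000.0)]))
# {'A': 250.0, 'B': 250.0, 'C': 500.0}
```

Because A, B and C together cover all of D, the three shares add back up to D’s 1,000 people, which is what makes summing valid for an extensive property such as population.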
If you do not have any active data subscriptions from the Data Observatory yet, start by browsing our Spatial Data Catalog, where you will find information about the more than 11,000 spatial datasets from public and premium sources that we offer.
To enrich one of your data tables, go to the Data Explorer and select a connection.
Next, click the table you want to enrich in the list.
Then, click on the Enrich table button from the available options at the top right of the screen.
A new dialog will open where you can choose the Data Observatory subscription with which you want to enrich your data table.
It is important to note that for the enrichment process to yield any results, your target table and the Data Observatory subscription need to have overlapping geometries. For example, to enrich your table of census block groups in Chicago, the dataset from the Data Observatory must also have data for that geographic area. Otherwise, the spatial join between the two data sources will return NULL values.
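The point case and the NULL behaviour described above can be sketched as a point-in-polygon lookup. This is a hypothetical illustration, not CARTO’s implementation: the coverage area, its value of 42.0 and the function names are made up, the lookup returns the value of the first area that contains the point, and Python’s None stands in for NULL when no area overlaps the point.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray casting test (points exactly on an edge may go either way)."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does a horizontal ray going right from p cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def enrich_point(p: Point, areas) -> Optional[float]:
    """Return the value of the first source area containing p, or None (NULL)."""
    for poly, value in areas:
        if point_in_polygon(p, poly):
            return value
    return None


# Hypothetical Data Observatory coverage: one square area with value 42.0.
coverage = [([(0, 0), (10, 0), (10, 10), (0, 10)], 42.0)]
print(enrich_point((5, 5), coverage))   # 42.0: the point lies inside the coverage
print(enrich_point((50, 5), coverage))  # None: no overlapping geometry, i.e. NULL
```

Returning None rather than raising mirrors the behaviour the text describes: a target with no overlapping source geometry simply gets an empty value.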
The table you want to enrich and the dataset from the Data Observatory subscription must be available through the same data warehouse connection in order to perform the enrichment.
Once you have selected the Data Observatory subscription, select the specific variables with which you want to enrich your table. Some variables (columns) from the dataset are disabled because they are not applicable to the enrichment operation. These include variables such as geoid and do_date, which are internal to CARTO, and variables that cannot be aggregated, such as those of string data type.
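The eligibility rule described above can be approximated as a simple column filter. This is a sketch under assumptions: the schema, every column name other than geoid and do_date, and the set of numeric type names are hypothetical, and treating “can be aggregated” as “has a numeric type” is our simplification, not CARTO’s documented rule.

```python
# Columns the text names as internal to CARTO.
INTERNAL_COLUMNS = {"geoid", "do_date"}
# Assumption: data warehouse type names we treat as aggregatable (numeric).
NUMERIC_TYPES = {"INT64", "INTEGER", "FLOAT64", "FLOAT", "NUMERIC", "BIGNUMERIC", "NUMBER"}


def enrichable_columns(schema):
    """Return the names of columns that could be offered for enrichment.

    schema is a list of (column_name, data_type) pairs."""
    return [
        name
        for name, dtype in schema
        if name.lower() not in INTERNAL_COLUMNS and dtype.upper() in NUMERIC_TYPES
    ]


# Hypothetical Data Observatory dataset schema.
schema = [
    ("geoid", "STRING"),
    ("do_date", "DATE"),
    ("total_pop", "FLOAT64"),
    ("median_income", "NUMERIC"),
    ("county_name", "STRING"),
]
print(enrichable_columns(schema))  # ['total_pop', 'median_income']
```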
To enrich your table with external variables, you also need to specify which aggregation method will be applied when intersecting the target geometries with those from the Data Observatory subscription. CARTO has identified a default aggregation method for each of the available variables in Data Observatory datasets. However, you can override the default and pick any of the supported operations: Sum, Minimum, Maximum, Average, and Count.
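What each supported operation does to the values of the source features that intersect a single target geometry can be sketched as follows. This is an illustrative sketch, not CARTO’s implementation: the function name and input values are hypothetical, it leaves out any weighting by intersection area, and we assume SQL-style semantics in which NULL inputs (None here) are skipped, Count counts non-NULL values, and the other methods return NULL when nothing intersects.

```python
from statistics import mean

# The five supported aggregation methods, mapped to Python equivalents.
AGGREGATIONS = {
    "Sum": sum,
    "Minimum": min,
    "Maximum": max,
    "Average": mean,
    "Count": len,
}


def aggregate(values, method):
    """Combine the values of the source features that intersect one target geometry.

    None plays the role of SQL NULL: it is skipped, and an empty input yields
    None for every method except Count, which yields 0."""
    present = [v for v in values if v is not None]
    if not present and method != "Count":
        return None
    return AGGREGATIONS[method](present)


# Hypothetical values of the Data Observatory features intersecting one target.
values = [120.0, 80.0, None, 200.0]
for method in AGGREGATIONS:
    print(method, aggregate(values, method))
```

The choice matters: Sum suits extensive variables such as population, while Average, Minimum or Maximum are usually more meaningful for intensive variables such as a median income.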
After selecting the variables and their aggregation methods, select where you want to save the results of the enrichment. Depending on your permissions on the target table (write or read-only), you will be able to “Create a new table” with the result of the enrichment or to “Enrich current table”, which appends the new columns to the target table selected at the beginning of the process.
If you select “Create new table”, the next step asks you to select the destination where the new table will be stored. This destination must be accessible via the same data warehouse connection as the target table and the Data Observatory subscription, and you must have write permissions on it.
If you don’t have the necessary write permission on the selected destination, you will see a message such as the one illustrated in the image below.
Once you have selected a valid output destination and table name, click the “Create table” button to start the enrichment process.
Attention: Depending on the characteristics of the datasets involved and the number of variables you have selected, the enrichment process can take from several seconds to a few minutes. Once the process has finished, you can click “Done” and start using your new enriched table in your spatial analysis.
If there is a problem with the process, the system returns an error message. You can click Done to close the window or Try Again to rerun the process.
To learn more about the enrichment methods in CARTO’s Analytics Toolbox, see the following sections of our documentation: