Analytics Toolbox for Snowflake

Analytics Toolbox for Snowflake

Data enrichment using the Data Observatory

In this guide you will learn how to perform data enrichment using Data Observatory data and the Analytics Toolbox. In this guide you will learn how to perform data enrichment using Data Observatory data and the Analytics Toolbox from your Snowflake console. To use the Enrichment UI available in the Data Explorer, please refer to this guide.

1. Create a connection with Snowflake in the CARTO Workspace

  1. Sign into your CARTO Workspace. If you still don’t have an account, you can sign-up here for a 14-day trial.
  2. Navigate to the Connections section.
  3. Create a new connection with Snowflake.

For more details, please refer to the documentation.

2. Subscribe to the Data Observatory datasets

  1. Navigate to the Data Observatory section of the CARTO Workspace.
  2. Using the Spatial Data Catalog, subscribe to the following datasets. You can find these datasets by using the search bar or the filter column on the left of the screen:
    • Spatial Features - Spain (H3 Resolution 8) from CARTO.
    • Census Sections (Spain), with displaced Canary Islands (2020) from CARTO.
Create Data Observatory subscriptions
  1. In order for you to access any of your Data Observatory subscriptions from Snowflake, our team need to manually provision the data in a database within your Snowflake account. Go to the “Access In > Snowflake” of at least one of your Data Observatory subscriptions and place a request so our engineering team start the provisioning process.
Create Data Observatory subscriptions
  1. Once our team confirms you that the data subscriptions have been provisioned in your Snowflake account, go back to the “Access in” section of your subscription and copy the Snowflake database and schema where the data has been made available.
Find the location of your Data Observatory subscriptions
  1. Now, in the Snowflake console, confirm that you can see all the relevant data subscriptions by running the command below, which makes use of the DATAOBS_SUBSCRIPTIONS procedure. Please replace the Snowflake database and schema with those you copied in the previous step.
1
2
3
USE DATABASE <ANALYTICS_TOOLBOX_DB>;
CALL carto.DATAOBS_SUBSCRIPTIONS('carto_dev_data.carto','');
-- CALL carto.DATAOBS_SUBSCRIPTIONS('MY_DATAOBS_DB.MY_DATAOBS_SCHEMA', '');

enrichment_sf_subscriptions

3. Choose variables for the enrichment

We can list all the variables (data columns) available in our Data Observatory subscriptions by running the following query, which makes use of the DATAOBS_SUBSCRIPTION_VARIABLES procedure. Please remember to replace the Snowflake database and schema with those you used in the previous command.

1
2
3
USE DATABASE <ANALYTICS_TOOLBOX_DB>;
CALL carto.DATAOBS_SUBSCRIPTION_VARIABLES('carto_dev_data.carto','');
-- CALL carto.DATAOBS_SUBSCRIPTION_VARIABLES('MY_DATAOBS_DB.MY_DATAOBS_SCHEMA', '');
Select the Data Observatory variables for enrichment in BigQuery

enrichment_sf_subscription_variables

In this particular example we are going to enrich our data with the following variables. Please note that these variables are uniquely identified by their variable_slug.

  • population_3c8da397, retail_4bcd08da,elevation_f0733359 and tavg_jan_c35c48dc: these variables are from the CARTO Spatial Features dataset for Spain, aggregated in an H3 grid of resolution 8. As we can see in the variable_description column, they represent the total population, their number of retail POIs, their average elevation and their average temperature in January, respectively.

4. Run the enrichment

The enrichment is performed using the DATAOBS_ENRICH_POLYGONS procedure of the Analytics Toolbox. Please note that this particular procedure makes use of spatial indexes and does not require the input data to have a geometry column.

The following inputs are needed:

  • The input table to be enriched.
  • The list of variables to be used for the enrichment and their aggregation method. As explained earlier, these variables are identified using their variable_slug. For more information about the aggregation methods, please refer to the documentation.
  • Name of the output table where the result of the enrichment will be stored.
  • Location of your Data Observatory subscriptions. This is the same database.schema we used to run the DATAOBS_SUBSCRIPTIONS and DATAOBS_SUBSCRIPTION_VARIABLES in previous steps of this guide.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
USE DATABASE <ANALYTICS_TOOLBOX_DB>;
CALL carto.DATAOBS_ENRICH_POLYGONS
('MY_DATAOBS_DB.MY_DATAOBS_SCHEMA.CENSUSSECTIONDISPLACEDCANARYISLANDS_2020', 'GEOM', 
 ARRAY_CONSTRUCT(
  OBJECT_CONSTRUCT('variable', 'population_3c8da397', 'aggregation', 'SUM'),
  OBJECT_CONSTRUCT('variable', 'retail_4bcd08da', 'aggregation', 'SUM'),
  OBJECT_CONSTRUCT('variable', 'elevation_f0733359', 'aggregation', 'AVG'),
  OBJECT_CONSTRUCT('variable', 'tavg_jan_c35c48dc', 'aggregation', 'AVG')), 
 NULL, 
 TO_ARRAY('MY_DATAOBS_DB.MY_DATAOBS_SCHEMA.CENSUSSECTIONDISPLACEDCANARYISLANDS_2020_ENRICHED'), 
 'MY_DATAOBS_DB.MY_DATAOBS_SCHEMA');
-- The table `CENSUSSECTIONDISPLACEDCANARYISLANDS_2020_ENRICHED` will be created
Preview of the enrichment result

5. Analyze the enrichment result

The table resulting from running the previous query, CENSUSSECTIONDISPLACEDCANARYISLANDS_2020_ENRICHED, will include all the columns of the input table plus four additional columns, containing the value of each enrichment variable in each polygon. As shown below, the enrichment result can be analyzed with the help of a map and a set of interactive widgets created using Builder, our map making tool available from the CARTO Workspace.

To get started creating maps, we recommend the following resources from the documentation:

EU flag

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.