How to access your Data Observatory subscriptions
This guide shows how to access the data from your Data Observatory subscriptions available in your CARTO Data Warehouse using the Analytics Toolbox from a Python notebook. You can find the original notebook here.
To learn more about how to explore and subscribe to data from our Data Observatory, please check our documentation.
We first authenticate to our CARTO account with the carto_auth library in order to access the CARTO Data Warehouse resources. We then use the Python client to explore our Data Observatory subscriptions and select the variables of interest. Finally, we enrich a sample dataset with one of our subscriptions.
Authentication to CARTO
We start by using the carto_auth package to authenticate to our CARTO account and to get the details needed to interact with the data available in the CARTO Data Warehouse. Note that the CARTO Data Warehouse is based on Google BigQuery, so we will be using that platform for storing and computing on the data. This also means that we will be leveraging the implementation of the Analytics Toolbox for BigQuery.
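Below is a minimal sketch of this step, assuming the carto-auth package is installed (pip install carto-auth); check the carto-auth documentation for the authentication method that best fits your setup.

```python
from carto_auth import CartoAuth

# Authenticate with CARTO using OAuth (a browser window opens the first time;
# the token is cached locally for subsequent runs).
carto_auth = CartoAuth.from_oauth()

# Get a Google BigQuery client pointing at the CARTO Data Warehouse,
# which we will use to run queries and Analytics Toolbox functions.
bq_client = carto_auth.get_carto_dw_client()
```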
Listing our Data Observatory subscriptions and exploring their metadata
We first retrieve a list of all our subscriptions as a pandas dataframe to explore which Data Observatory datasets we have available. For more details about how to use the following SQL functions, please refer to the Analytics Toolbox documentation.
To understand how the Data Observatory structures the datasets, we recommend reading the Terminology section of the Data Observatory documentation.
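As a sketch of this step, we can call the DATAOBS_SUBSCRIPTIONS procedure of the Analytics Toolbox for BigQuery through the client created above. The Data Observatory location below is a placeholder to replace with the one of your organization, and the exact procedure signature is described in the Analytics Toolbox documentation.

```python
# Placeholder: the project.dataset where your Data Observatory subscriptions
# are stored (you can find it in the Data Observatory section of the CARTO Workspace).
dataobs_location = "carto-data.ac_xxxxxxxx"

# Retrieve all our Data Observatory subscriptions as a pandas dataframe.
subscriptions_df = bq_client.query(f"""
    CALL `carto-un`.carto.DATAOBS_SUBSCRIPTIONS('{dataobs_location}', '');
""").to_dataframe()

subscriptions_df.head()
```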
Let’s take a look at what subscriptions we have that are specifically for the “United States”.
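For instance, we can filter the subscriptions dataframe by country; the column name below is an assumption, so inspect the columns returned by DATAOBS_SUBSCRIPTIONS to confirm it.

```python
# Keep only the subscriptions covering the United States. The "dataset_country"
# column name is an assumption; check subscriptions_df.columns for the exact name.
usa_subscriptions = subscriptions_df[
    subscriptions_df["dataset_country"] == "United States"
]
usa_subscriptions
```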
After exploring the available datasets and their metadata, for this example we pick the “Population” dataset from Worldpop and explore the variables it contains. For that we use its dataset_slug.
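A sketch of that lookup with the DATAOBS_SUBSCRIPTION_VARIABLES procedure follows; the dataset slug is a placeholder to replace with the value found in the subscriptions dataframe, and the filter syntax should be checked against the Analytics Toolbox documentation.

```python
# Placeholder slug for the Worldpop "Population" dataset; take the real value
# from the dataset_slug column of subscriptions_df.
dataset_slug = "wp_population_xxxxxxxx"

# List the variables of that dataset together with their metadata.
variables_df = bq_client.query(f"""
    CALL `carto-un`.carto.DATAOBS_SUBSCRIPTION_VARIABLES(
        '{dataobs_location}',
        "dataset_slug = '{dataset_slug}'"
    );
""").to_dataframe()

variables_df
```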
Accessing and exporting data from a Data Observatory subscription
Once we have explored the available variables and we know what we want to do with the data, we can use the Python client for the CARTO Data Warehouse connection and call any available function from the Analytics Toolbox. Additionally, we can export the data to a GeoDataFrame or even to local files, for example in CSV or Parquet format.
In this example, we will retrieve the population variable for a 10 km buffer around Atlanta.
We will need the IDs of both the data table and the geography table of the specific subscription we want to work with.
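As a sketch under those assumptions, we can join the data table with its associated geography table and intersect the geographies with a 10 km buffer around downtown Atlanta. The table names, and the geoid and population column names, are placeholders to replace with the values reported in the subscriptions and variables dataframes.

```python
# Placeholders: fully qualified names of the data and geography tables of the
# subscription (see the dataset_table and associated_geography_table columns
# of subscriptions_df).
data_table = "carto-data.ac_xxxxxxxx.sub_worldpop_population_data_xxx"
geography_table = "carto-data.ac_xxxxxxxx.sub_worldpop_population_geography_xxx"

# Retrieve the population values whose geographies intersect a 10 km buffer
# around downtown Atlanta (longitude -84.388, latitude 33.749).
population_df = bq_client.query(f"""
    SELECT d.population, g.geom
    FROM `{data_table}` d
    JOIN `{geography_table}` g
      ON d.geoid = g.geoid
    WHERE ST_INTERSECTS(
        g.geom,
        ST_BUFFER(ST_GEOGPOINT(-84.388, 33.749), 10000)
    )
""").to_dataframe()

population_df.head()
```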
Now that we have our data of interest in a dataframe, we can also save it on our local machine in several formats.
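For example, assuming the geometry came back as WKT text in a column named geom, we can keep CSV and Parquet copies and also build a GeoDataFrame for spatial formats.

```python
import geopandas as gpd

# BigQuery GEOGRAPHY columns are returned as WKT strings in the dataframe;
# convert them to build a GeoDataFrame (assumes the column is named "geom").
population_gdf = gpd.GeoDataFrame(
    population_df,
    geometry=gpd.GeoSeries.from_wkt(population_df["geom"]),
    crs="EPSG:4326",
).drop(columns=["geom"])

# Save local copies in a few common formats.
population_df.to_csv("atlanta_population.csv", index=False)
population_df.to_parquet("atlanta_population.parquet", index=False)
population_gdf.to_file("atlanta_population.gpkg", driver="GPKG")
```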
Enriching data with a Data Observatory subscription
The retail_stores dataset contains information about the revenue and size of retail stores in the USA, and it is available by default as demo data in your CARTO Data Warehouse. We are going to enrich this table with the population variable from the previous example (slug_id: population_e3a78133), based on the population reported by Worldpop at the location of each retail store.
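A sketch of that enrichment using the DATAOBS_ENRICH_POINTS procedure is shown below; the table names are placeholders, and the argument order follows the Data Enrichment section of the SQL Reference mentioned below, so double-check it against your Analytics Toolbox version.

```python
# Placeholders: where the retail_stores demo table lives in your CARTO Data
# Warehouse and where the enriched copy should be written.
retail_stores_table = "your-cdw-project.your-dataset.retail_stores"
output_table = "your-cdw-project.your-dataset.retail_stores_enriched"

# Enrich each retail store point with the Worldpop population variable
# (slug_id population_e3a78133), aggregating with a sum.
bq_client.query(f"""
    CALL `carto-un`.carto.DATAOBS_ENRICH_POINTS(
        'SELECT * FROM `{retail_stores_table}`',
        'geom',
        [('population_e3a78133', 'sum')],
        NULL,
        ['`{output_table}`'],
        '{dataobs_location}'
    );
""").result()
```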
We define an output table where the enriched data will be placed, also within the CARTO Data Warehouse. Later we use the pydeck-carto package to visualize the results, rendering directly from the table in the data warehouse.
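The sketch below outlines that visualization with pydeck-carto's CartoLayer; the layer registration helpers and property names depend on the pydeck-carto version installed, so treat it as an outline and check the pydeck-carto documentation.

```python
import pydeck as pdk
from pydeck_carto import get_layer_credentials, register_carto_layer
from pydeck_carto.layer import CartoConnection, MapType

# Make the CartoLayer type available to pydeck.
register_carto_layer()

# Render the enriched table directly from the CARTO Data Warehouse.
layer = pdk.Layer(
    "CartoLayer",
    data=output_table,
    type_=MapType.TABLE,
    connection=CartoConnection.CARTO_DW,
    credentials=get_layer_credentials(carto_auth),
    get_fill_color=[238, 77, 90],
    point_radius_min_pixels=3,
    pickable=True,
)

view_state = pdk.ViewState(latitude=39.8, longitude=-98.5, zoom=3)
pdk.Deck(layer, initial_view_state=view_state)
```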
To learn more about the Data Enrichment functions, please check the relevant section of the SQL Reference of the Analytics Toolbox. There is also additional information about the enrichment workflow with the Analytics Toolbox.