How to work with your data in the CARTO Data Warehouse
This notebook guides the user through the process for connecting to the CARTO account and leverage CARTO’s Analytics Toolbox and CARTO’s integration with Pydeck to be able to perform spatial analytics at scale and create map visualizations from Python notebooks. You can find the original notebook here.
The outline of this notebooks is as follows:
Authenticating with your CARTO account: to get access to the objects within the CARTO Data Warehouse, to run analysis functions from CARTO’s Analytics Toolbox, and to be able to use ‘CartoLayer’ in Pydeck for visualizing your data
Running data opeartions and anlyses using the Python client for the CARTO Data Warehouse
Creating map visualizations with CARTO and Pydeck
Install all necessary libraries
Please run the following commands to install the CARTO’s Python packages (pydeck-carto and carto-auth) and all other required libraries.
Note that in order to install the Python client to access the CARTO Data Warehouse you need to identify the extra parameter [carto-dw] when installing the carto-auth package. Note that this client is a wrapper to the Python client for Google BigQuery.
import pydeck as pdkimport pydeck_carto as pdkcimport pandas as pdimport geopandas as gpdfrom carto_auth import CartoAuto
Authenticate to CARTO
In this step, we use the carto_auth package to authenticate to our CARTO account and to get the necessary details to interact with data available in the CARTO Data Warehouse.
# Using the Oauth autentication methodcarto_auth = CartoAuth.from_oauth()# CARTO Data Warehouse clientcarto_dw_client = carto_auth.get_carto_dw_client()
#to take the dataset_id in CARTO DWdatasets =list(carto_dw_client.list_datasets())if datasets:print("Datasets in CARTO Data Warehouse:")for dataset in datasets:print("\t{}".format(dataset.dataset_id))else:print("CARTO Data Warehouse project does not contain any datasets.")
We can already list datasets, data objects, and even read tables by passing the “table id”. In this example we show how to get the properties from a table.
We can also list all tables contained in the “shared” dataset of the CARTO Data Warehouse.
#to list the tables contained in the dataset_id previously obtaineddataset_id ='shared'tables = carto_dw_client.list_tables('shared')# Make an API request.print("Tables contained in '{}':".format(dataset_id))for table in tables:print("{}.{}.{}".format(table.project, table.dataset_id, table.table_id))
Loading data from the CARTO Data Warehouse into a dataframe
Next, you can also load data available in the CARTO Data Warehouse into a geodataframe in Python.
CARTO’s Analytics Toolbox is a suite of functions and procedures that run natively in SQL within the different supported data warehouses. This means that the user can run functions from the Analytics Toolbox using the Python clients from BigQuery, Snowflake, Redshift and so on. In the case of the CARTO Data Warehouse the carto_dw_client provides a wrapper to the BigQuery client, so we will be leveragin the Analytics Toolbox for BigQuery implementation.
query_string_air =f"""WITH h3_airbnb AS ( SELECT `carto-un`.carto.H3_FROMGEOGPOINT(geom, 8) AS h3, * FROM shared.01_listings_la_2021_5_reviews), aggregated_h3 AS ( SELECT h3, ROUND(AVG(price_num), 2) price, ROUND(AVG(review_scores_rating), 2) overall_rating, ROUND(AVG(review_scores_value), 2) value_vs_price_rating, ROUND(AVG(review_scores_cleanliness), 2) cleanliness_rating, ROUND(AVG(review_scores_location), 2) location_rating, COUNT(*) AS total_listings FROM h3_airbnb GROUP BY h3 HAVING COUNT(*) > 3)SELECT *FROM aggregated_h3"""gdf_air = carto_dw_client.query(query_string_air).result().to_dataframe(create_bqstorage_client=False)gdf_air['overall_rating']= gdf_air['overall_rating'].astype('float')gdf_air['value_vs_price_rating']= gdf_air['value_vs_price_rating'].astype('float')gdf_air['cleanliness_rating']= gdf_air['cleanliness_rating'].astype('float')gdf_air['location_rating']= gdf_air['location_rating'].astype('float')gdf_air.head()
Uploading the result of our analysis as a new table in our data warehouse
Once you have the desired result, you might want to store it as a new table in your CARTO Data Warehouse.