How to access CARTO's Analytics Toolbox for BigQuery and create visualizations via Python notebooks
This notebook guides you through connecting to your CARTO account and leveraging CARTO’s Analytics Toolbox and CARTO’s integration with Pydeck, so you can perform spatial analytics at scale and create map visualizations from Python notebooks. You can find the original notebook here.
The outline of this notebook is as follows:
Authentication with BigQuery to access our data and CARTO’s Analytics Toolbox functions, and to be able to use ‘CartoLayer’ in Pydeck
Operations and analysis using BigQuery’s SQL Client for Python
Map visualizations with CARTO and Pydeck
Install all necessary libraries
Please run the following commands to install pydeck-carto and all other required libraries.
!pip install pydeck-carto geopandas db_dtypes -q
import pydeck as pdk
import pydeck_carto as pdkc
import pandas as pd
import geopandas as gpd
import google.cloud.bigquery as bigquery
from carto_auth import CartoAuth
from google.colab import auth
Authenticate to CARTO and BigQuery
In this step, we use the carto_auth package to authenticate to our CARTO account. We also want to authenticate to our Google account, so that we can connect to our projects in Google BigQuery and operate on our data tables using BigQuery’s SQL Client for Python notebooks.
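The authentication step can be sketched as follows. This is a minimal sketch assuming a Colab environment; the project ID is a placeholder you should replace with your own.

```python
def authenticate_clients(project_id: str):
    """Authenticate to CARTO and Google, and return both handles.

    A sketch of the authentication step; run it from a Colab notebook.
    The project_id argument is a placeholder for your own GCP project.
    """
    # Imports kept local because carto_auth and google.colab are
    # only available once the install/setup cells above have run.
    from carto_auth import CartoAuth
    from google.colab import auth
    import google.cloud.bigquery as bigquery

    carto_auth = CartoAuth.from_oauth()   # opens CARTO's OAuth flow in the browser
    auth.authenticate_user()              # Google authentication inside Colab
    bq_client = bigquery.Client(project=project_id)
    return carto_auth, bq_client
```

Calling `carto_auth, bq_client = authenticate_clients("your-gcp-project")` gives us the `bq_client` object used in the rest of the notebook.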
Then, we can list datasets, data objects, and even read tables by passing the table ID. In short, we can work with the entire BigQuery client. In this example, we list all the tables contained in a specific BigQuery dataset.
dataset_id = 'bqcartodemos.sample_tables'

# Make an API request.
tables = list(bq_client.list_tables(dataset_id))

print("Tables contained in '{}':".format(dataset_id))
for table in tables:
    print("{}.{}.{}".format(table.project, table.dataset_id, table.table_id))
We can also show the properties of a table.
# Set table_id to the ID of the table to fetch.
table_id = 'bqcartodemos.sample_tables.01_listings_la_2021_5_reviews'

# Make an API request.
table = bq_client.get_table(table_id)

# View table properties
print(f"Got table {table_id}.")
print(f"Table schema: {table.schema}")
print(f"Table description: {table.description}")
print(f"Table has {table.num_rows} rows")
Load data from BigQuery into a dataframe
Next, you can also load data available in BigQuery into a geodataframe in Python.
CARTO’s Analytics Toolbox is a suite of functions and procedures that run natively in SQL within each of the supported data warehouses. This means that you can run functions from the Analytics Toolbox using the BigQuery SQL Client.
To be able to access the Analytics Toolbox here, you will need to first connect your BigQuery account to CARTO in the Connections section of the CARTO Workspace.
query_string_air = f"""
WITH h3_airbnb AS (
    SELECT `carto-un`.carto.H3_FROMGEOGPOINT(geom, 8) AS h3, *
    FROM bqcartodemos.sample_tables.01_listings_la_2021_5_reviews
),
aggregated_h3 AS (
    SELECT
        h3,
        ROUND(AVG(price_num), 2) price,
        ROUND(AVG(review_scores_rating), 2) overall_rating,
        ROUND(AVG(review_scores_value), 2) value_vs_price_rating,
        ROUND(AVG(review_scores_cleanliness), 2) cleanliness_rating,
        ROUND(AVG(review_scores_location), 2) location_rating,
        COUNT(*) AS total_listings
    FROM h3_airbnb
    GROUP BY h3
    HAVING COUNT(*) > 3
)
SELECT *
FROM aggregated_h3
"""

gdf_air = bq_client.query(query_string_air).result().to_dataframe()
gdf_air['overall_rating'] = gdf_air['overall_rating'].astype('float')
gdf_air['value_vs_price_rating'] = gdf_air['value_vs_price_rating'].astype('float')
gdf_air['cleanliness_rating'] = gdf_air['cleanliness_rating'].astype('float')
gdf_air['location_rating'] = gdf_air['location_rating'].astype('float')
gdf_air.head()
Write a result in a new table on BigQuery
Once you have the desired result, you might want to store it in a new table in your BigQuery account. Let’s see how to do it.