How to access CARTO’s Analytics Toolbox for Snowflake and create visualizations via Python notebooks
This notebook guides you through connecting to your CARTO and Snowflake accounts and leveraging CARTO’s Analytics Toolbox and CARTO’s integration with Pydeck to perform spatial analytics at scale and create map visualizations from Python notebooks. You can find the original notebook here.
The outline of this notebook is as follows:
Authentication to CARTO, needed to use ‘CartoLayer’ in Pydeck
Authentication to Snowflake, with credentials that have access to the database connected to CARTO where the Analytics Toolbox is installed
Operations and analysis using the Snowpark Python connector and CARTO’s Analytics Toolbox
Map visualizations with CARTO and Pydeck
NOTE: snowflake-snowpark-python is only compatible with Python >= 3.8, so be sure to run the notebook in an appropriate environment.
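If you want the notebook to fail fast on an unsupported interpreter, a small guard at the top is enough. This is a minimal sketch, not part of the original notebook; the helper name `python_is_supported` and the constant `MIN_PYTHON` are hypothetical and only encode the version requirement stated in the note.

```python
import sys

# Minimum Python version required by snowflake-snowpark-python, per the note.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of the interpreter meets the minimum (hypothetical helper)."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    raise RuntimeError(
        f"snowflake-snowpark-python needs Python >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]}, "
        f"found {sys.version_info.major}.{sys.version_info.minor}"
    )
```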
Downloading data from Snowflake into a Python dataframe
“Crossfit” is a gym chain located in California. We will run a location analysis of “Crossfit” venues versus their competitors.
We use the H3 module in CARTO’s Analytics Toolbox for Snowflake to compute the H3 cell (at resolution 5) of each gym in the “Crossfit” and “Competition” tables and count the gyms per cell. We then join both counts by H3 index and download the data.
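The join runs inside Snowflake, but its semantics can be pictured locally with pandas. In this sketch the cell ids and counts are made up, and `crossfit_count` / `competition_count` are plain DataFrames standing in for the two CTEs of the query. It shows why a FULL OUTER JOIN is used: cells that contain only Crossfit gyms or only competitors are kept, with a missing value in the other count.

```python
import pandas as pd

# Hypothetical per-cell counts standing in for the crossfit_count and
# competition_count CTEs; real rows are keyed by H3 indexes.
crossfit_count = pd.DataFrame({"h3": ["cell_a", "cell_b"], "crossfit_gyms": [2, 1]})
competition_count = pd.DataFrame({"h3": ["cell_b", "cell_c"], "competition_gyms": [4, 3]})

# how="outer" keeps every cell that appears in either table, like FULL OUTER JOIN.
# Merging on the shared "h3" column produces a single key column, which plays the
# role of coalesce(a.h3, b.h3) in the SQL.
joined = crossfit_count.merge(competition_count, on="h3", how="outer")

# A cell found in only one table gets NaN in the other table's count column.
print(joined)
```

Those missing counts are the reason the downloaded dataframe is later passed through `fillna(0)`.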
q = """
WITH crossfit_count AS (
    SELECT CARTO_DEV_DATA.carto.H3_FROMGEOGPOINT(geom, 5) h3, COUNT(*) crossfit_gyms
    FROM SFDATABASE.CARTO.GYMS_CA_CROSSFIT
    GROUP BY h3
),
competition_count AS (
    SELECT CARTO_DEV_DATA.carto.H3_FROMGEOGPOINT(geom, 5) h3, COUNT(*) competition_gyms
    FROM SFDATABASE.CARTO.GYMS_CA_COMPETITION
    GROUP BY h3
)
SELECT coalesce(a.h3, b.h3) h3, crossfit_gyms, competition_gyms,
       CARTO_DEV_DATA.carto.H3_BOUNDARY(coalesce(a.h3, b.h3)) geom
FROM crossfit_count a
FULL OUTER JOIN competition_count b ON a.h3 = b.h3
"""
gyms_df = sf_client.sql(q).to_pandas()
We can export the output of a query directly as a pandas dataframe. The geometry column is downloaded as GeoJSON text.
gyms_df.head()
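import json
from shapely.geometry import shape

# Convert each GeoJSON string into a Shapely polygon
text_to_geom = lambda t: shape(json.loads(t))
gyms_df["GEOM"] = gyms_df.GEOM.apply(text_to_geom)
# Cells missing from one of the tables have NaN counts; replace them with 0
gyms_df = gyms_df.fillna(0)
gyms_df.head()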
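To see what `json.loads` hands to `shape`, here is a standard-library-only sketch that parses one GeoJSON Polygon string and returns its bounding box. The GeoJSON text is hypothetical and deliberately short (real H3 boundaries have more vertices), and `polygon_bbox` is an illustrative helper, not part of the notebook or of Shapely.

```python
import json

# Hypothetical GeoJSON text shaped like one value of the GEOM column.
# The ring is closed (first and last vertices are equal) as GeoJSON requires.
geom_text = (
    '{"type": "Polygon", "coordinates": '
    '[[[-118.0, 34.0], [-117.9, 34.0], [-117.9, 34.1], [-118.0, 34.0]]]}'
)


def polygon_bbox(text):
    """Parse GeoJSON Polygon text and return (min_lon, min_lat, max_lon, max_lat)."""
    geometry = json.loads(text)
    if geometry["type"] != "Polygon":
        raise ValueError(f"expected a Polygon, got {geometry['type']}")
    exterior = geometry["coordinates"][0]  # the first ring is the exterior boundary
    lons = [lon for lon, lat in exterior]  # GeoJSON positions are [longitude, latitude]
    lats = [lat for lon, lat in exterior]
    return min(lons), min(lats), max(lons), max(lats)


print(polygon_bbox(geom_text))
```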
Uploading a dataframe back to Snowflake
We add a total gym count per cell to our dataframe and upload it back into our Snowflake database.
total_gyms = gyms_df.drop(columns=["GEOM"])
total_gyms["TOTAL_GYMS"] = gyms_df.CROSSFIT_GYMS + gyms_df.COMPETITION_GYMS
# Go from a pandas DataFrame to a Snowpark DataFrame. This creates a temporary
# table with the data, which is dropped at the end of the session.
snowflake_df = sf_client.create_dataframe(total_gyms)
# Persist the data as a permanent table
snowflake_df.write.save_as_table("SFDATABASE.CARTO.GYMS_CA_TOTAL_CENTROID", mode="overwrite")
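Because the outer join produced missing values, the count columns can arrive in pandas as floats, and `fillna(0)` keeps them that way. If you prefer to upload integer counts, an optional local step is to cast them before computing `TOTAL_GYMS`. This is a sketch with hypothetical data, not something the original notebook does; it also checks that no cell has a zero total, which matters for the ratio computed later.

```python
import pandas as pd

# Hypothetical stand-in for gyms_df after fillna(0): counts stored as floats.
gyms = pd.DataFrame({
    "H3": ["cell_a", "cell_b", "cell_c"],
    "CROSSFIT_GYMS": [2.0, 1.0, 0.0],
    "COMPETITION_GYMS": [0.0, 4.0, 3.0],
})

count_cols = ["CROSSFIT_GYMS", "COMPETITION_GYMS"]
total = gyms.copy()
# Cast the counts to integers so they are no longer stored as floats.
total[count_cols] = total[count_cols].astype("int64")
total["TOTAL_GYMS"] = total["CROSSFIT_GYMS"] + total["COMPETITION_GYMS"]

# Every cell comes from at least one gym, so TOTAL_GYMS is never zero and
# CROSSFIT_GYMS / TOTAL_GYMS cannot divide by zero.
assert (total["TOTAL_GYMS"] > 0).all()
print(total)
```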
Visualizing data in Snowflake with the pydeck-carto library
Here we visualize the uploaded data in two layers, using the new styling functions and the Analytics Toolbox installed in Snowflake.
hexagons: renders the H3 cells with a continuous color style representing the dominance ratio, i.e. the number of Crossfit gyms divided by the total number of gyms in the cell
points: plots the location of the gyms, with a color category style representing the gym type (Crossfit gyms vs. competition gyms)
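The dominance ratio is a simple share, so it is easy to sanity-check locally before mapping. In this sketch the counts are hypothetical; it computes the same `CROSSFIT_GYMS / TOTAL_GYMS` expression as the hexagon query, and also evaluates the domain list that the hexagon layer passes to `color_continuous`, which runs from 0.0 to 0.9.

```python
import pandas as pd

# Hypothetical per-cell counts, shaped like GYMS_CA_TOTAL_CENTROID.
cells = pd.DataFrame({
    "H3": ["cell_a", "cell_b", "cell_c"],
    "CROSSFIT_GYMS": [2, 1, 0],
    "TOTAL_GYMS": [2, 5, 3],
})

# Share of gyms in each cell that are Crossfit:
# 1.0 means only Crossfit gyms, 0.0 means only competitors.
cells["DOMINANCE_RATIO"] = cells["CROSSFIT_GYMS"] / cells["TOTAL_GYMS"]

# The domain list used by the hexagon layer's continuous color style.
domain = [x / 10 for x in range(10)]

print(cells["DOMINANCE_RATIO"].tolist())  # [1.0, 0.2, 0.0]
print(domain[0], domain[-1])  # 0.0 0.9
```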
import pydeck as pdk
import pydeck_carto as pdkc

# Note that the attribute names passed to the styling functions must be uppercase,
# even though they are lowercase (unquoted) in the query. This is because
# Snowflake stores unquoted column names in uppercase.

# Register CartoLayer in pydeck
pdkc.register_carto_layer()

hexagons_query = """
SELECT CARTO_DEV_DATA.carto.H3_BOUNDARY("H3") H3_GEOM,
       CROSSFIT_GYMS / TOTAL_GYMS AS dominance_ratio
FROM SFDATABASE.CARTO.GYMS_CA_TOTAL_CENTROID
"""
credentials = pdkc.get_layer_credentials(carto_auth)
hexagons = pdk.Layer(
    "CartoLayer",
    data=hexagons_query,
    geo_column=pdk.types.String("H3_GEOM"),
    type_=pdkc.MapType.QUERY,
    connection=pdk.types.String("snowflake"),
    credentials=credentials,
    opacity=0.2,
    stroked=True,
    get_fill_color=pdkc.styles.color_continuous(
        "DOMINANCE_RATIO", [x / 10 for x in range(10)], colors="Tropic"
    ),
    get_line_color=[0, 42, 42],
    line_width_min_pixels=2,
)

points_query = """
SELECT GEOM, 'crossfit' AS CATEGORY
FROM SFDATABASE.CARTO.GYMS_CA_CROSSFIT
UNION ALL
SELECT GEOM, 'competitors' AS CATEGORY
FROM SFDATABASE.CARTO.GYMS_CA_COMPETITION
"""
points = pdk.Layer(
    "CartoLayer",
    data=points_query,
    geo_column=pdk.types.String("GEOM"),
    type_=pdkc.MapType.QUERY,
    connection=pdk.types.String("snowflake"),
    credentials=credentials,
    opacity=0.8,
    stroked=True,
    pickable=True,
    point_radius_min_pixels=2,
    get_fill_color=pdkc.styles.color_categories(
        "CATEGORY", ["competitors", "crossfit"], colors="Tropic"
    ),
)

view_state = pdk.ViewState(latitude=33.64, longitude=-117.94, zoom=5)
r = pdk.Deck(
    [hexagons, points],
    initial_view_state=view_state,
    map_style=pdk.map_styles.LIGHT,
)
r.to_html(iframe_height=700)
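The casing rule mentioned in the code comments follows Snowflake's default identifier resolution: unquoted identifiers are stored in uppercase, while double-quoted identifiers keep their exact case. The hypothetical helper `snowflake_resolved_name` below is a sketch of that rule (assuming the default setting, where quoted identifiers are case-sensitive), useful for working out which name to pass to `color_continuous` or `color_categories`.

```python
def snowflake_resolved_name(identifier: str) -> str:
    """Return the column name as Snowflake stores it by default (hypothetical helper).

    Unquoted identifiers are stored in uppercase. Double-quoted identifiers keep
    their exact case; the surrounding quotes are removed and doubled quotes ("")
    are unescaped.
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1].replace('""', '"')
    return identifier.upper()


# The alias in hexagons_query is unquoted, so the styling function needs the
# uppercase form.
print(snowflake_resolved_name("dominance_ratio"))  # DOMINANCE_RATIO
print(snowflake_resolved_name('"MixedCase"'))  # MixedCase
```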