Standard (former Shared) cluster

Follow the instructions below to install the CARTO Analytics Toolbox for Databricks on a Shared cluster. In order to complete all steps, you will need:

A .jar package that will be installed in the cluster.
An carto_sql_init.sh that will be used as init script for the cluster.

Download the installation package

Upload your installer to a Databricks catalog

Unity catalog requires a 'Metastore Admin' to whitelist the JAR package and the init script so that they can be used for creating a cluster later. This is a one-off operation that needs to occur once per metastore per version of the Analytics Toolbox.

The installer files need to be uploaded to a Unity Catalog Volume location.

For this, go to 'Catalog' in your Databricks workspace UI, navigate to the desired location and click on 'Upload to this volume'.

Allow JAR and Init Script

Before creating a cluster, the JAR package and the Init Script need to be allowed by a user with 'Metastore Admin' privileges.

Go to the 'Catalog' section in your Databricks workspace and click on the settings button (the one shaped like a cog). Click on 'Metastore'.

Check the 'Allowed JARs/Init Scripts' tab:

Add your JAR package
- Type: JAR
- Source Type: Volume
- Source: Volume path of the JAR file
Add Init Script
- Type: Init Script
- Source Type: Volume
- Source: Volume path of the JAR file

After this, you can proceed to create the cluster using the whitelisted JAR and Init Script.

Create a cluster

In the Databricks workspace go to 'Compute' and make sure you are on the 'All-purpose compute' tab. Click on 'Create compute' button.

Different policies might hide some of the required settings, or have conflicts with them. If that's the case please get in touch with your Databricks admin so they can evaluate.

When creating the cluster, take the following into consideration:

Photon Acceleration needs to be enabled in the cluster
Recommended DBR is 15.4 LTS.
- Some DBR versions don't support Photon Acceleration. Make sure you select a DBR that allows enabling Photon.
Bear in mind that performance will vary with the cluster size.

Now, scroll down to Advance Options.

Due to different policy settings, you might find already some configurations in these sections. If that's the case, just add the above to the pre-existing configurations and variables.

Check the 'Spark' tab and enter the following:

Spark config

spark.sql.extensions com.carto.analytics.toolbox.sql.SparkExtension
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator

Environment variables

SCALAPY_PYTHON_LIBRARY=python3.11
SCALAPY_PYTHON_PROGRAMNAME=/databricks/python3/bin/python
CARTO_AT_LOCATION=/Volumes/path_to_AT_folder

The code above will be interpreted in a bash script. Please be careful when setting environment variables in your cluster and ensure there are no intermediate or trailing spaces after any variable. Even small formatting issues like this can cause errors that are difficult to debug.

You can get the path to your JAR file from the Workspace UI, by clicking on the options (three-dots button) of the file from the directory where it was uploaded:

Still within Advanced Options, check the 'Init Scripts' tab. And use the UI to locate and add the File path to the script provided with the installer package. Use 'Volumes' as source:

Click on 'Create compute' and wait for the process to finish.

🎉Congrats! we're done and you should be able to use the Analytics Toolbox functions and procedures from your Databricks notebooks.

PreviousPersonal (former Single User) cluster NextReference

Last updated 4 months ago

Was this helpful?