This guide explains all the steps to install the Analytics Toolbox in your Databricks environment.
The CARTO Analytics Toolbox contains two packages:
- core: this is the public and open-source package.
- advanced: this is a premium package. It contains the Tiler module, which makes it possible to process and visualize very large spatial datasets stored in Databricks.
This guide explains how to install the core package. In order to access the advanced features, please contact [email protected].
To install the core package of the Analytics Toolbox in your Databricks cluster, follow the instructions below on your Databricks workspace UI:
- Click on Compute
- Select the cluster where you want to install the Analytics Toolbox
- Open the Libraries tab
- Click on Install new
- Select Maven as Library Source
- Click on Search Packages, select Maven Central and search for carto.analyticstoolbox. Select the latest version of the package whose “Artifact Id” starts with “core_”. The other package, whose “Artifact Id” starts with “hiveless”, is a dependency that is installed under the hood, so you do not need to install it yourself.
- Click on Select
- Click Install to finish the process. Dependencies of the package will be installed transitively.
Once the package is installed, you need to create the SQL UDFs in your cluster. Open a SQL console and run this script:
Running the script above will install the functions in your Databricks default database, because function names that are not qualified with a database name are created there. If you need your UDFs in a different database, you will need to qualify the function name, for example:
CREATE OR REPLACE FUNCTION your_db.st_area as 'com.carto.analyticstoolbox.core.ST_Area';
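As a sketch of the qualified approach, the statements below create a separate database and register st_area in it using the class name shown above. The database name carto_at is a hypothetical placeholder; only the class com.carto.analyticstoolbox.core.ST_Area comes from this guide, and every other function in the installation script would need the same database prefix.

```sql
-- Hypothetical target database; replace carto_at with your own name.
CREATE DATABASE IF NOT EXISTS carto_at;

-- Register the UDF under the qualified name carto_at.st_area
-- instead of in the default database.
CREATE OR REPLACE FUNCTION carto_at.st_area AS 'com.carto.analyticstoolbox.core.ST_Area';

-- Show how Databricks resolves the newly registered function.
DESCRIBE FUNCTION carto_at.st_area;
```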
Take this into account when creating a Databricks connection in your CARTO Workspace: the connection requires the UDFs to be installed in the database that it uses.
In order to leverage the spatial functionality provided by the Analytics Toolbox, you need to create a Databricks connection in your CARTO account. See this section of the user manual to get more information about creating connections.
The connection parameters need to be obtained from the Databricks workspace UI:
Click on Compute. Select your cluster and expand the Advanced options. Open the JDBC/ODBC tab to find the following parameters:
- Server Hostname. e.g.:
- Port. e.g.:
- HTTP Path. e.g.:
To get a Token, click on Settings > User Settings.
- Make sure you are on the Access Tokens tab and click on Generate New Token.
- Give the token a name and set its lifetime.
- Click on Generate and copy your token. This is the only time you will be able to see it.
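Before creating the connection in CARTO, one quick way to check that the database the connection will use actually contains the UDFs is to list the user-defined functions registered in it. Here your_db is a placeholder for that database, and the 'st_*' pattern assumes the Analytics Toolbox functions follow the st_ naming seen in st_area; functions named differently would not match it.

```sql
-- List user-defined functions in the database the CARTO connection will use.
-- your_db is a placeholder; in SHOW FUNCTIONS patterns, * matches any characters,
-- so 'st_*' matches names starting with st_.
SHOW USER FUNCTIONS IN your_db LIKE 'st_*';
```

If the result is empty, the functions were most likely created in another database (for example, the default one).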