Analytics Toolbox for Databricks

Installation

This guide explains all the steps to install the Analytics Toolbox in your Databricks environment.

The CARTO Analytics Toolbox contains two packages:

  • core: this is the public and open-source package.
  • advanced: this is a premium package. It contains the Tiler module, which lets you process and visualize very large spatial datasets stored in Databricks.

To install the core package of the Analytics Toolbox in your Databricks cluster, follow these steps in your Databricks workspace UI:

  • Click on Compute
  • Select the cluster where you want to install the Analytics Toolbox
  • Open the Libraries tab
  • Click on Install new
  • Select Maven as Library Source
  • Click on Search Packages, select Maven Central and search for carto.analyticstoolbox. Select the latest version of the package whose “Artifact Id” starts with “core_”. You do not need to install the package whose “Artifact Id” starts with “hiveless”: it is a dependency that the core package installs under the hood.
  • Click on Select
  • Click Install to finish the process. The package's dependencies are installed transitively.

Once the package is installed, you need to create the SQL UDFs in your cluster. Open a SQL console and run this script:

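The full script is not reproduced in this text. As a minimal sketch, assuming each function in the script follows the same pattern as the ST_Area example in this guide (a function name mapped to a class in the com.carto.analyticstoolbox.core package), every statement registers one UDF. Only ST_Area is shown because it is the only class name this guide confirms; the DESCRIBE FUNCTION statement is an optional check, not part of the official script.

```sql
-- Sketch only: registers ST_Area in the current (default) database.
-- The real script registers many more functions whose class names are not listed here.
CREATE OR REPLACE FUNCTION st_area AS 'com.carto.analyticstoolbox.core.ST_Area';

-- Optional check: show how Databricks resolves the newly created function.
DESCRIBE FUNCTION EXTENDED st_area;
```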

Unqualified function names create the UDFs in your Databricks default database. If you need the UDFs in a different database, qualify the function name, for example:

CREATE OR REPLACE FUNCTION your_db.st_area as 'com.carto.analyticstoolbox.core.ST_Area';

Take this into account when creating a Databricks connection in your CARTO Workspace: the UDFs must be installed in the database that the connection uses.
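One way to confirm that the UDFs exist in the database a connection will use is to list them from a SQL console. In this sketch, your_db is a placeholder (as in the example above) for that database, and the st_ prefix assumes the functions were created with names like st_area.

```sql
-- your_db is a placeholder for the database used by the CARTO connection.
-- Lists user-defined functions in that database whose names start with "st_".
SHOW USER FUNCTIONS IN your_db LIKE 'st_*';

-- Inspect a single qualified function.
DESCRIBE FUNCTION your_db.st_area;
```

If the list comes back empty, the functions were most likely created in a different database (for example, the default one) and need to be recreated with qualified names.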

Connection parameters

To use the spatial functionality provided by the Analytics Toolbox, you need to create a Databricks connection in your CARTO account. See this section of the user manual for more information about creating connections.

The connection parameters need to be obtained from the Databricks workspace UI:

Click on Compute. Select your cluster and expand Advanced options. Open the JDBC/ODBC tab to find the following parameters:

  • Server Hostname, e.g.: adb-XXXXXXXXXXXXXXXX.X.azuredatabricks.net
  • Port, e.g.: 443
  • HTTP Path, e.g.: sql/protocolv1/o/XXXXXXXXXXXXXXXX/0000-0000000-aaaaaaaaa

To get an access token, click on Settings > User Settings.

  • Make sure you are on the Access Tokens tab and click on Generate New Token.
  • Give the token a name and set the lifetime for it.
  • Click on Generate and copy your token. This is the only time you will be able to see it.