Databricks

CARTO can connect to Databricks to push down SQL queries, which are executed by your Databricks SQL Warehouse, or to create and run jobs on your Databricks cluster.

Currently, Builder supports Databricks connections for loading tilesets and H3 data sources. Out-of-the-box support for Point, Line and Polygon tables is coming later in 2024.

Support for Databricks connections in Workflows is also coming in 2024.

To create a connection to Databricks, select the Databricks connector in the New connection dialog, then click the Setup connection button.

These are the parameters you need to provide:

  • Name: A name to identify this connection across different CARTO interfaces.

  • Instance Name: Your Databricks instance hostname, for example dbc-xxxxx113-0000.cloud.databricks.com or carto-data-science.cloud.databricks.com. Learn more about this on the official Databricks documentation.

  • Token: A Databricks user token. The connection inherits the permissions and access privileges of the user who generated the token.

  • Catalog: Pick a catalog from the list to use with this connection.

  • SQL Warehouse: Pick a SQL Warehouse from the list. CARTO uses it to push down SQL queries: listing resources, getting data for widgets and fetching map tiles.

  • All-purpose compute: Pick a cluster from the list. CARTO uses it to create and run jobs, such as Workflows.
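The Instance Name examples above are bare hostnames, without `https://` or a path. If you copy the workspace URL from your browser, a small helper like the one below can strip the extra parts before you paste the value into the dialog. This is a minimal sketch; `normalize_instance_name` is a hypothetical helper, not part of CARTO or Databricks.

```python
from urllib.parse import urlparse


def normalize_instance_name(raw: str) -> str:
    """Reduce a pasted workspace URL to the bare hostname (hypothetical helper)."""
    value = raw.strip()
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in value:
        value = "https://" + value
    host = urlparse(value).hostname  # drops scheme, port, path, query and fragment
    if not host:
        raise ValueError(f"could not find a hostname in {raw!r}")
    return host
```

For example, `normalize_instance_name("https://carto-data-science.cloud.databricks.com/?o=42")` returns `carto-data-science.cloud.databricks.com`.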

CARTO connections require Unity Catalog in your Databricks workspace.
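Unity Catalog addresses tables with a three-level name: catalog, schema and table. When you write SQL against the catalog picked for a connection, a sketch like the one below builds a fully qualified, safely quoted name. It assumes Databricks SQL identifier quoting with backticks, where a literal backtick is escaped by doubling it; the catalog, schema and table names in the example are placeholders.

```python
def quote_identifier(name: str) -> str:
    # Wrap in backticks and double any backtick inside the name.
    return "`" + name.replace("`", "``") + "`"


def qualified_table_name(catalog: str, schema: str, table: str) -> str:
    # Unity Catalog three-level namespace: catalog.schema.table
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))
```

For example, `qualified_table_name("main", "geo", "h3_cells")` returns `` `main`.`geo`.`h3_cells` ``.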

Once you have entered the parameters, you can click the Connect button. CARTO will try to connect to Databricks, and if everything is OK, your new connection will be registered.

Advanced Options

  • Max number of concurrent queries: This setting controls the maximum number of simultaneous queries that CARTO sends to Databricks using this connection.

  • Max query timeout: This setting controls the maximum allowed duration of queries that CARTO runs in Databricks using this connection.
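To illustrate what these two settings mean, the sketch below runs a batch of queries with a concurrency cap and a per-query timeout using `asyncio`. It is not CARTO's implementation: `execute` stands in for whatever sends a query to the warehouse, and reporting a timed-out query as `None` is an assumption made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Optional


async def run_with_limits(
    queries: Iterable[str],
    execute: Callable[[str], Awaitable[str]],
    max_concurrent: int,
    timeout_s: float,
) -> List[Optional[str]]:
    """Run queries with at most max_concurrent in flight and a per-query timeout."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(query: str) -> Optional[str]:
        # Queries beyond the limit wait here until a slot frees up.
        async with semaphore:
            try:
                # In this sketch the timeout starts once the query holds a slot.
                return await asyncio.wait_for(execute(query), timeout=timeout_s)
            except asyncio.TimeoutError:
                # Hypothetical handling: report a timed-out query as None.
                return None

    # gather keeps results in the same order as the input queries.
    return await asyncio.gather(*(run_one(q) for q in queries))
```

Raising the concurrency limit lets more queries share the warehouse at once, while the timeout bounds how long any single query can hold a slot.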

Require Viewer Credentials

Connections to Databricks can be set up to require viewer credentials. This means that instead of using the credentials (token) of the user that created the connection, each user will have to provide their own credentials to use it.

Click on "Permissions and Sharing > Manage options" to set your Databricks connection to require viewer credentials.

After the setup, other logged-in users (regardless of their role) will be prompted for their own Databricks token every time they want to use the connection. This happens when creating maps, when previewing data in Data Explorer and when consuming maps as a viewer.

All Databricks users can obtain a token by clicking User Settings > Access Tokens. From there, create a new token for CARTO, set the desired expiration and paste it into CARTO.
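If you already hold a Databricks token and prefer to script this step, the Databricks Token API offers `POST /api/2.0/token/create`, which accepts a `comment` and a `lifetime_seconds` value. The sketch below only prepares that request with the standard library and does not send it; the helper name, the example hostname and the existing token are placeholders.

```python
import json
import urllib.request


def build_token_create_request(instance_name: str, existing_token: str,
                               comment: str, lifetime_days: int) -> urllib.request.Request:
    """Prepare (but do not send) a Databricks Token API create request."""
    body = json.dumps({
        "comment": comment,
        # The Token API expects the lifetime in seconds.
        "lifetime_seconds": lifetime_days * 24 * 60 * 60,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{instance_name}/api/2.0/token/create",
        data=body,
        method="POST",
        headers={
            # Authenticate with a token you already hold.
            "Authorization": f"Bearer {existing_token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` should return a JSON body whose `token_value` field holds the new token to paste into CARTO; check the Databricks Token API reference for your workspace before relying on it.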

The token can be removed from CARTO at any time from the connection panel or the map topbar, and a different one can be added afterwards.

The connection requires viewers to input their own credentials unless the map is public.
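The rules above can be summarized as a small decision function. This is a mental model only, not CARTO code: `ConnectionSettings` and `token_for_request` are hypothetical, and the assumption that public maps keep using the connection owner's token is an inference from the statement that viewers are not prompted for public maps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionSettings:
    # Hypothetical model of a Databricks connection, for illustration only.
    owner_token: str
    require_viewer_credentials: bool


def token_for_request(conn: ConnectionSettings, map_is_public: bool,
                      viewer_token: Optional[str]) -> str:
    """Pick which Databricks token a request would run with (hypothetical)."""
    # Assumption: without the viewer-credentials option, or on a public map,
    # queries keep running with the connection owner's token.
    if not conn.require_viewer_credentials or map_is_public:
        return conn.owner_token
    # Otherwise the viewer has to supply a token of their own.
    if viewer_token is None:
        raise PermissionError("this connection requires your own Databricks token")
    return viewer_token
```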

IP Whitelisting

If you're using the cloud version of CARTO (SaaS), CARTO will connect to Databricks using a set of static IPs for each region. Check this guide to find the IPs you need to allow for your specific region.
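Before saving an allowlist rule (for example, a Databricks IP access list or a firewall rule), you can check that the CARTO addresses for your region fall inside the ranges you entered. The sketch below uses Python's `ipaddress` module; the ranges shown are documentation-only placeholders (RFC 5737), not CARTO's real IPs, so replace them with the ones listed for your region.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks); NOT CARTO's real IPs.
CARTO_REGION_IPS = ["192.0.2.10/32", "198.51.100.0/28"]


def is_allowed(source_ip: str, allowlist=CARTO_REGION_IPS) -> bool:
    """Return True if source_ip falls inside any CIDR block in the allowlist."""
    addr = ipaddress.ip_address(source_ip)
    # Membership is False (not an error) when IPv4 and IPv6 are mixed.
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)
```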
