Databricks

CARTO can connect to your Databricks Data Warehouse, allowing you to use your data for building Maps, Workflows and custom applications. There are three ways to set up a Databricks connection.

Recommended methods:

OAuth (U2M): Users authenticate into Databricks using their individual Databricks credentials, generating an access token for each user. This is the recommended setup, but it needs to be configured by an Admin first.
OAuth (M2M): Users authorize unattended access to Databricks resources with a service principal. This method is ideal when developing applications or using service accounts.

Other methods:

Personal Access Token (PAT): Connect to Databricks using a Personal Access Token. This method provides a straightforward setup without requiring an OAuth configuration, but the methods above represent a more secure strategy for production environments.

Databricks strongly recommends using OAuth over Personal Access Tokens. OAuth tokens are automatically refreshed by default and do not require the direct management of the access token, improving your security against token hijacking and unwanted access.

CARTO is a fully cloud-native platform that runs queries on your behalf to power maps, workflows, etc. We never create or maintain any copies of your data.

What it means to be fully cloud native.

Once connected to your Databricks account, CARTO pushes SQL queries that are executed in your Databricks SQL Warehouse, or creates and runs jobs on your Databricks All-purpose compute cluster. Currently, the level of support varies for different geospatial data formats:

Simple features (as WKB binaries)

H3 indexes

Maps * - Prepared tables

✅

Maps * - SQL Queries

❌

✅

Workflows

✅

(*) Maps in Builder, Data Explorer, Workflows previews and custom apps created with our developer tools

Setup requirements

These requirements apply regardless of the authentication method used:

CARTO connections require Unity Catalog in your Databricks workspace.
Photon acceleration must be enabled in the All-purpose compute clusters that you use with CARTO.
Databricks Runtime Release (DBR) 15.4 LTS is recommended. Minimum DBR required is 14.2.
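
The runtime and Photon requirements above can be checked before you point a connection at a cluster. The following is a minimal sketch, not a CARTO or Databricks API: the names parse_dbr and check_cluster are hypothetical, and it assumes you already know the cluster's DBR version and whether Photon is enabled.

```python
# Illustrative only: the threshold comes from the requirements listed above;
# the function and argument names are hypothetical.
MINIMUM_DBR = (14, 2)  # minimum Databricks Runtime required


def parse_dbr(version: str) -> tuple[int, int]:
    """Turn a version such as '15.4' or '15.4 LTS' into (15, 4)."""
    major, minor = version.split()[0].split(".")[:2]
    return int(major), int(minor)


def check_cluster(dbr_version: str, photon_enabled: bool) -> list[str]:
    """Return the unmet requirements; an empty list means none were found."""
    errors = []
    if parse_dbr(dbr_version) < MINIMUM_DBR:
        errors.append(f"DBR {dbr_version} is below the minimum 14.2")
    if not photon_enabled:
        errors.append("Photon acceleration is not enabled")
    return errors
```

For example, check_cluster("15.4 LTS", True) returns an empty list, while a cluster on DBR 13.3 without Photon would produce two messages.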

For visualizing simple features (as WKB binaries), tables need to be prepared for visualization. In order to do so, your Databricks workspace needs to be enabled with Spatial SQL functions, which are currently in Private Preview.

The Databricks team has made this form available to request access to the functions. Please get in touch with them through the form to gain access to all Spatial SQL functions.

Connecting to Databricks using OAuth (U2M)

CARTO can connect to Databricks with OAuth user-to-machine (U2M) for interactive access to Databricks resources.

As a prerequisite, an organization Admin needs to create a Databricks OAuth integration first. Once this is done, Databricks OAuth U2M connections will be available to all users within the organization. Read more about setting up a Databricks OAuth (U2M) integration.

Head to the Connections settings from the side menu, click on Databricks and select Setup connection with OAuth U2M. You will be taken to your Databricks login page, where you will have to authenticate with your own personal credentials.

Once authenticated, you will be redirected to the new connection form. These are the fields you need to provide:

Name: The name for the connection you're creating.
Catalog: The Unity Catalog to use with this connection. This should be a catalog that your Databricks user has access to and that contains the data you want to use in CARTO.
SQL Warehouse: The SQL Warehouse that will be used to execute SQL queries. This is used to list resources, get data for widgets and fetch map tiles.
All-purpose compute: The All-purpose compute cluster that will be used to create and run jobs. This is used to create and run Workflows.

Make sure that you follow the setup requirements when creating a new Databricks connection.

Connecting to Databricks using OAuth (M2M)

CARTO can connect to Databricks with OAuth machine-to-machine (M2M), which provides unattended access to your resources.

As a prerequisite, a Service Principal must be created in Databricks, as well as an OAuth Secret for that Service Principal. For detailed steps on how to do this, please follow Databricks' official guide:

Authorize unattended access to Databricks resources with a service principal using OAuth

Once you have created the Service Principal and its OAuth Secret, head to the Connections settings from the side menu, click on Databricks and select Setup connection with OAuth M2M. This will open the new connection form:

These are the fields you need to provide:

Name: The name for the connection you're creating.
Host: Your Databricks instance name host, for example dbc-xxxxx113-0000.cloud.databricks.com or carto-data-science.cloud.databricks.com (without https://). Learn more about this on the official Databricks documentation.
Service Principal Client ID: The Client ID of the Service Principal you wish to use.
Service Principal Secret: The OAuth secret of the Service Principal you wish to use.
Catalog: The Unity Catalog to use with this connection. This should be the catalog your Service Principal has access to and contains the data you want to use in CARTO.
SQL Warehouse: The SQL Warehouse that will be used to execute SQL queries. This is used to list resources, get data for widgets and fetch map tiles.
All-purpose compute: The All-purpose compute cluster that will be used to create and run jobs. This is used to create and run Workflows.

Make sure that you follow the setup requirements when creating a new Databricks connection.
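
For context on what the Client ID and Secret are used for: Databricks documents a client-credentials token request against the workspace's /oidc/v1/token endpoint. The sketch below only builds that request with the Python standard library and does not send it. How CARTO performs this exchange internally is not described here, and the helper name and placeholder credentials are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(
    host: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # The service principal's client ID and secret travel as HTTP Basic auth.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        url=f"https://{host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values; replace them with your own host and credentials.
request = build_m2m_token_request(
    "dbc-xxxxx113-0000.cloud.databricks.com", "my-client-id", "my-secret"
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it; a successful response is a JSON document that includes an access_token. This can be a quick way to confirm that the Service Principal and its secret are valid before entering them in the connection form.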

Connecting to Databricks using Personal Access Tokens (PATs)

CARTO can connect to Databricks using a Personal Access Token, which provides access to resources at the Databricks Workspace level. To create a new token, follow the steps in Databricks' official documentation:

Databricks personal access token authentication

Then head to the Connections settings from the side menu, click on Databricks and select Connect using personal access token. This will open the new connection form:

These are the fields you need to provide:

Name: The name for the connection you're creating.
Host: Your Databricks instance name host, for example dbc-xxxxx113-0000.cloud.databricks.com or carto-data-science.cloud.databricks.com (without https://). Learn more about this on the official Databricks documentation.
Token: The Personal Access Token. The connection will inherit permission and access privileges of the user that generates the token.
Catalog: The Unity Catalog to use with this connection.
SQL Warehouse: The SQL Warehouse that will be used to execute SQL queries. This is used to list resources, get data for widgets and fetch map tiles.
All-purpose compute: The All-purpose compute cluster that will be used to create and run jobs. This is used to create and run Workflows.

Make sure that you follow the setup requirements when creating a new Databricks connection.
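
The Host field expects the bare instance name, without https://. If you are copying the value from a browser URL, a small helper like the one below can strip the scheme and path first. This is a minimal sketch with hypothetical helper names; it also shows how a Personal Access Token is conventionally presented to the Databricks REST API, as a Bearer token.

```python
from urllib.parse import urlparse


def normalize_host(value: str) -> str:
    """Return the bare host name, dropping any scheme, path or trailing slash."""
    value = value.strip()
    # urlparse only fills in .netloc when the input starts with a scheme or "//".
    parsed = urlparse(value if "://" in value else f"//{value}")
    return parsed.netloc or value


def pat_headers(token: str) -> dict[str, str]:
    """Build the Authorization header for a Personal Access Token."""
    return {"Authorization": f"Bearer {token}"}
```

For instance, both "https://carto-data-science.cloud.databricks.com/" and "carto-data-science.cloud.databricks.com" normalize to the same host name.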

Advanced options

Connections can be set up with these advanced options:

Data location

CARTO temp location: Location to store temporary tables used during workflow execution, including the intermediate tables with hashed names that workflow nodes create. By default, CARTO uses a carto_temp schema within a Unity Catalog database. For shared connections that require Viewer Credentials, a carto_temp_<user> schema is created per user. Example: my_catalog.my_db.carto_temp
CARTO Workspace location: Location to store persistent objects related to workflows, such as API stored procedures and imported files. By default, CARTO uses a carto_workspace schema within the same database. For shared connections that require Viewer Credentials, a carto_workspace_<user> schema is created per user. Example: my_catalog.my_db.carto_workspace
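
The default naming described above can be sketched as a small function. This is only an illustration that follows the shape of the examples given (my_catalog.my_db.carto_temp). The function name is hypothetical, and how CARTO actually derives the per-user suffix is an assumption.

```python
from typing import Optional


def default_locations(
    catalog: str, database: str, user: Optional[str] = None
) -> dict[str, str]:
    """Build the default temp and workspace locations.

    Pass `user` only for shared connections that require Viewer Credentials.
    The exact form of the per-user suffix is an assumption in this sketch.
    """
    suffix = f"_{user}" if user else ""
    return {
        "temp": f"{catalog}.{database}.carto_temp{suffix}",
        "workspace": f"{catalog}.{database}.carto_workspace{suffix}",
    }
```

Calling default_locations("my_catalog", "my_db") yields the two example locations shown in the list, and passing a user adds the per-user suffix to both.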

Query management options

Max number of concurrent queries: The maximum number of simultaneous queries that CARTO will send to Databricks in that connection.
Max query timeout: This sets the maximum allowed duration of queries that CARTO runs in Databricks in that connection.
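
To illustrate how these two limits interact, here is a minimal sketch that uses a semaphore for the concurrency cap and a per-query timeout. It is not CARTO's implementation: the function names are hypothetical, the queries are simulated with sleeps, and it assumes the timeout starts counting once a query begins to run rather than while it is queued.

```python
import asyncio


async def run_with_limits(queries, max_concurrent: int, timeout_s: float):
    """Run query coroutine factories with a concurrency cap and a timeout."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(query):
        # Waits here while max_concurrent queries are already running.
        async with semaphore:
            try:
                return await asyncio.wait_for(query(), timeout=timeout_s)
            except asyncio.TimeoutError:
                return "timed out"

    # gather returns results in the same order as the input queries.
    return await asyncio.gather(*(run_one(q) for q in queries))


async def fake_query(duration_s: float) -> str:
    """Stand-in for a real query; it only sleeps."""
    await asyncio.sleep(duration_s)
    return "done"


results = asyncio.run(
    run_with_limits(
        [lambda: fake_query(0.01), lambda: fake_query(1.0)],
        max_concurrent=1,
        timeout_s=0.1,
    )
)
print(results)  # ['done', 'timed out']
```

With a limit of one concurrent query, the second query only starts after the first finishes, and it is cancelled once it exceeds the timeout.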

CARTO for Developer options

Restrict this connection to only use Named Sources: When enabled, this connection will only work within apps that use Named Sources, and will NOT work in Data Explorer, Builder and Workflows. This prevents the usage of arbitrary SQL in applications for this connection.

Requiring viewer credentials on shared Databricks OAuth U2M connections

Databricks OAuth U2M connections can be set up to require viewer credentials. This means that instead of using the credentials of the user that created the connection, each user will have to provide their own credentials to use it.

To require viewer credentials on your Databricks OAuth U2M connection, head to the Connections section, click on Permissions and Sharing in the Connection card, and set the Share mode to Organization. By default, the Viewer Credentials option is checked.

After the setup, other users (regardless of their role) will be prompted for their own credentials every time they want to use the connection. This happens when creating or viewing maps that use that connection, or when previewing the connection's data in the Data Explorer.

If a map is public, users won't be asked for their credentials even if the connection requires viewer credentials.

IP Whitelisting

If you're using the cloud version of CARTO (SaaS), CARTO will connect to Databricks using a set of static IPs for each region. Check this guide to find the IPs you need to allow for your specific region.
