Databricks
CARTO connects to Databricks to push down SQL queries that are executed through your Databricks SQL Warehouse, or to create and run jobs on your Databricks all-purpose compute cluster.
Currently, the level of support varies for different geospatial data formats:
Maps * - Prepared tables: ✅ ✅
Maps * - SQL Queries: ❌ ✅
Workflows: ✅ ✅
(*) Maps in Builder, Data Explorer, Workflows previews and custom apps created with our developer tools.
For visualizing simple features (as WKB binaries), tables need to be prepared for visualization. In order to do so, your Databricks workspace needs to be enabled with Spatial SQL functions, which are currently in Private Preview.
The Databricks team has made this form available to request access to the functions. Please get in touch with them through the form to gain access to all Spatial SQL functions.
To create a connection to Databricks, select the Databricks connector in the New connection dialog, then click the Setup connection button.
CARTO connections require Unity Catalog in your Databricks workspace.
Photon acceleration must also be enabled in all-purpose compute clusters.
DBR 15.4 LTS is recommended. Minimum DBR required is 14.2.
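The runtime requirement above can be checked ahead of time with a small helper. This is a minimal sketch, not part of CARTO: it assumes the runtime is given as a version string that starts with "major.minor" (for example "15.4.x-scala2.12"), and the function names are hypothetical.

```python
import re

MIN_DBR = (14, 2)          # minimum runtime stated above
RECOMMENDED_DBR = (15, 4)  # 15.4 LTS, the recommended runtime stated above


def parse_dbr_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as '15.4.x-scala2.12' (assumed format)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_dbr(version: str) -> str:
    """Classify a runtime as 'unsupported', 'supported', or 'recommended'."""
    parsed = parse_dbr_version(version)
    if parsed < MIN_DBR:
        return "unsupported"
    if parsed == RECOMMENDED_DBR:
        return "recommended"
    return "supported"
```

Only 15.4 is reported as recommended here because that is the only version the requirement names. Newer runtimes are reported as supported.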
These are the parameters you need to provide:
Name: A name to identify this connection across different CARTO interfaces.
Instance Name: Your Databricks instance hostname, for example dbc-xxxxx113-0000.cloud.databricks.com or carto-data-science.cloud.databricks.com. Learn more about this in the official Databricks documentation.
Token: A Databricks user token. The connection will inherit the permissions and access privileges of the user who generates the token.
Catalog: Pick a catalog from the list to use with this connection.
SQL Warehouse: Pick a SQL Warehouse from the list that CARTO will use to push down SQL queries. This is used to list resources, get data for widgets and fetch map tiles.
All-purpose compute: Pick a cluster from the list that will be used to create and run jobs. This is used to create and run Workflows.
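Before filling in the form, it can help to confirm that the instance name is a bare hostname and that the token is accepted by the workspace. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the Databricks REST endpoint /api/2.0/sql/warehouses with a Bearer token header, which is not necessarily what CARTO calls internally. The request is built but not sent.

```python
from urllib.parse import urlparse
from urllib.request import Request


def instance_hostname(value: str) -> str:
    """Return the bare hostname from a workspace URL or hostname."""
    value = value.strip()
    # urlparse only fills in .hostname when a scheme is present.
    parsed = urlparse(value if "://" in value else f"https://{value}")
    if not parsed.hostname:
        raise ValueError(f"Cannot extract a hostname from {value!r}")
    return parsed.hostname


def list_warehouses_request(instance: str, token: str) -> Request:
    """Build (but do not send) a request that lists SQL warehouses.

    Assumption: the workspace exposes GET /api/2.0/sql/warehouses and
    accepts the user token as a Bearer token.
    """
    url = f"https://{instance_hostname(instance)}/api/2.0/sql/warehouses"
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing the returned object to urllib.request.urlopen would send it. A successful response suggests that the hostname and token are valid for that workspace.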
Once you have entered the parameters, you can click the Connect button. CARTO will try to connect to Databricks, and if everything is OK, your new connection will be registered.
Max number of concurrent queries: This setting controls the maximum number of simultaneous queries that CARTO will send to Databricks using this connection.
Max query timeout: This setting controls the maximum allowed duration of queries that CARTO runs in Databricks using this connection.
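Conceptually, the two settings above behave like a concurrency cap combined with a per-query time limit. The following is a sketch of that general pattern, not CARTO's implementation: run_limited and its parameters are hypothetical names, and each query is modeled as a function that returns a coroutine.

```python
import asyncio


async def run_limited(queries, max_concurrent: int, timeout_s: float):
    """Run query coroutine factories with a concurrency cap and a per-query timeout."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(query):
        async with semaphore:  # waits while max_concurrent queries are already running
            try:
                # The timeout covers execution time only, not time spent queued.
                return await asyncio.wait_for(query(), timeout=timeout_s)
            except asyncio.TimeoutError:
                return "timeout"

    # Results come back in the same order as the input queries.
    return await asyncio.gather(*(run_one(q) for q in queries))
```

In this sketch a query that exceeds the limit is cancelled and reported as "timeout", while queries beyond the cap simply wait for a free slot.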
Connections to Databricks can be set up to require viewer credentials. This means that instead of using the credentials of the user that created the connection, each user will have to provide their own credentials to use it.
Click on "Permissions and Sharing > Manage options" to set your Databricks connection to require viewer credentials.
After the setup, other logged-in users (regardless of their role) will be prompted for their own credentials every time they want to use the connection. This happens when creating maps, when previewing data in Data Explorer, and when consuming maps as a viewer.
All Databricks users can obtain a token by clicking on User Settings > Access Tokens. Once inside, you can create a new token for CARTO, set the desired expiration and copy-paste it into CARTO.
The token can be removed from CARTO at any time from the connection panel or the map topbar, and a different one can be added afterwards.
The connection will require viewers to input their own credentials unless the map is public.
If you're using the cloud version of CARTO (SaaS), CARTO will connect to Databricks using a set of static IPs for each region. Check this guide to find the IPs you need to allow for your specific region.
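If your Databricks workspace restricts access by IP address, a quick check like the one below can show which of the required addresses are still missing from your allow list. This is a minimal sketch: the addresses are placeholders from the RFC 5737 documentation range, not CARTO's real IPs, and missing_from_allowlist is a hypothetical helper.

```python
import ipaddress

# Placeholder addresses from the RFC 5737 documentation range.
# These are NOT CARTO's real IPs; take the real ones from the guide for your region.
CARTO_STATIC_IPS = ["203.0.113.10", "203.0.113.11"]


def missing_from_allowlist(required_ips, allowlist_cidrs):
    """Return the required IPs not covered by any allow-listed CIDR or address."""
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in allowlist_cidrs]
    return [
        ip for ip in required_ips
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]
```

An empty result means every required address is already covered by the allow list.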