Analytics Toolbox for Databricks

Working with geospatial data in Databricks

Databricks does not natively support geospatial data types, so geospatial data cannot be stored in a geometry column.

The CARTO Analytics Toolbox for Databricks provides geospatial capabilities through its functions, but most of them expect geometry data as input and return geometry data as output.

How can we make them work when no geometry data type is available? This guide will help you with that.

Storing geospatial data

The CARTO Maps API can work directly with geospatial data represented as WKT or WKB strings.

That means you can preview your data in Data Explorer, load it in Builder to create maps and use it in your custom applications, as long as it is stored as a text string in either of those formats.
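As a minimal sketch of this storage pattern, the statements below create a table whose geometry column is a plain string holding WKT. The table name points, its columns and the sample coordinates are hypothetical and only serve as illustration.

```sql
-- Hypothetical table: geometries are kept as WKT text in a STRING column.
CREATE TABLE IF NOT EXISTS points (
  id   INT,
  geom STRING  -- WKT, for example 'POINT(-87.5 35.0)' (x = longitude, y = latitude)
);

-- Sample rows, made up for illustration.
INSERT INTO points VALUES
  (1, 'POINT(-87.5 35.0)'),
  (2, 'POINT(-70.0 42.0)');
```

Because the column is ordinary text, no geometry data type is needed to store the table.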

Preview geospatial data in your Data Explorer

Spatial SQL with the Analytics Toolbox

As mentioned above, many of the functions from the Analytics Toolbox expect a geometry data type as input. See the SQL Reference to get more detailed information on each function.

If the geometries in your tables or files are stored as GeoJSON, WKT or WKB strings, they need to be converted to the Geometry type before they can be used. The functions in the Geometry Constructors section can help with that.
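The conversion can be sketched with literal values, as below. st_geomFromWKT is the constructor used later in this guide. st_geomFromGeoJSON is an assumed name for the GeoJSON counterpart, so check the Geometry Constructors section of the SQL Reference for the exact names and signatures available in your installation.

```sql
-- Build geometries from text literals; in practice the argument would be a column.
SELECT
  st_geomFromWKT('POINT(-87.5 35.0)') AS from_wkt,
  -- Assumed function name: verify it against the SQL Reference.
  st_geomFromGeoJSON('{"type":"Point","coordinates":[-87.5,35.0]}') AS from_geojson;
```

The resulting geometries exist only within the query, where they can be passed to other Analytics Toolbox functions.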

For example, if we want to get the points that intersect a specific bounding box, we can use this query:

SELECT * FROM points
WHERE st_Intersects(
  st_geomfromWKT(geom),  -- convert the WKT string to a geometry
  st_makeBBOX(-90.0, 30.0, -85.0, 40.0))  -- lowerX, lowerY, upperX, upperY

Since our points are stored as WKT strings in the geom column, we just need to create the geometry with st_geomfromWKT(geom).
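The same idea applies in the other direction: a geometry returned by a function has to be turned back into text before it can be stored. The sketch below assumes that a st_asText function is available to serialize a geometry as WKT (check the SQL Reference), and the output table name points_in_bbox is hypothetical.

```sql
-- Keep the matching points and store the bounding box itself as WKT text.
CREATE TABLE points_in_bbox AS
SELECT
  id,
  geom,  -- already a WKT string, stored unchanged
  -- Assumed function: serializes the geometry back to WKT.
  st_asText(st_makeBBOX(-90.0, 30.0, -85.0, 40.0)) AS bbox_wkt
FROM points
WHERE st_intersects(
  st_geomFromWKT(geom),
  st_makeBBOX(-90.0, 30.0, -85.0, 40.0));
```

Every column in the output is a plain string, so the table can be stored in Databricks and read wherever WKT strings are accepted.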

Work with Spatial SQL in Builder

The map above shows a point layer and, as an overlay, the intersection between that layer and the bounding box defined in the example query.