Importing rasters

Raster data plays a critical role in geospatial applications, enabling the visualization and analysis of spatial patterns, trends, and relationships. CARTO provides end-to-end support for raster data, allowing users to store, analyze, and render raster datasets directly in their cloud data warehouses.

This documentation outlines the complete process of preparing raster data, ensuring it meets CARTO’s specifications, and importing it into supported cloud data warehouses, including Google BigQuery, Snowflake, and Databricks.

Preparing your raster data

Before importing raster data into your cloud data warehouse, make sure it meets CARTO's format, tiling, and projection requirements.

Required specifications:

  • Format: Cloud Optimized GeoTIFF (COG)

  • Tiling Schema: Google Maps Tiling Schema

Setting up the environment

To process and prepare raster data efficiently, certain software dependencies must be installed, particularly Python and GDAL for geospatial data manipulation.

Check Python installation

Ensure that Python 3 is installed on your system. Run the following command to verify:

python3 --version

If Python is not installed, download and install it from Python.org.

Set up a virtual environment (Recommended)

Using a virtual environment helps manage dependencies and prevents conflicts with system-wide packages. Run the following commands to create and activate a virtual environment:

python3 -m venv carto_raster_env
source carto_raster_env/bin/activate  # For Linux/macOS
carto_raster_env\Scripts\activate     # For Windows

Install GDAL (Python bindings)

Once the virtual environment is activated, install GDAL using pip:

pip install GDAL

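This installs the GDAL bindings for Python, allowing you to manipulate and process raster data. Note that the pip package builds against the GDAL library, so GDAL itself must already be installed on your system. That system installation also provides the command-line tools used in the following steps, such as gdalinfo, gdal_translate, and gdalwarp.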

Inspecting raster metadata

Use gdalinfo to inspect your file's metadata, such as its size, coordinate reference system, band types, and NODATA values. This information is useful for debugging and for the preparation steps that follow.

gdalinfo raster.tif
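
For scripted checks, gdalinfo can also emit JSON when called with the -json flag. The following Python sketch picks out the fields most relevant to the preparation steps. The helper names (run_gdalinfo, summarize_gdalinfo) and the sample dictionary are illustrative assumptions; the sample only mimics the shape of real output and is not real data.

```python
import json
import subprocess


def run_gdalinfo(path):
    """Run `gdalinfo -json` on a file and return the parsed report (needs GDAL on PATH)."""
    result = subprocess.run(
        ["gdalinfo", "-json", path], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def summarize_gdalinfo(report):
    """Extract driver, size, and per-band type and NODATA value from a gdalinfo JSON report."""
    return {
        "driver": report.get("driverShortName"),
        "size": tuple(report.get("size", ())),
        "bands": [
            {
                "band": band.get("band"),
                "type": band.get("type"),
                # "noDataValue" is only present when a NODATA value is defined.
                "nodata": band.get("noDataValue"),
            }
            for band in report.get("bands", [])
        ],
    }


# Hand-written sample with the same shape as gdalinfo's JSON output (not real data).
sample_report = {
    "driverShortName": "GTiff",
    "size": [1024, 512],
    "bands": [{"band": 1, "type": "Byte", "noDataValue": 0}, {"band": 2, "type": "Byte"}],
}
summary = summarize_gdalinfo(sample_report)
print(summary)
```

With GDAL installed, summarize_gdalinfo(run_gdalinfo("raster.tif")) would produce the same kind of summary for a real file. A band whose "nodata" entry is None has no NODATA value defined.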

NODATA values

NODATA values represent missing or invalid data within a raster file. You can inspect them with the gdalinfo command. CARTO automatically ignores these values to ensure accurate analysis and visualization, so defining them properly is crucial: NODATA pixels are not displayed on the map and are excluded from analytical queries. If your file does not define a NODATA value, you can assign one with gdal_translate. The following example sets it to 0:

gdal_translate -a_nodata 0 input.tif output_nodata.tif
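
To see why a correct NODATA value matters for analysis, consider this small Python sketch. It is a toy illustration, not CARTO's implementation: the function name and pixel values are made up, and it only shows how excluding NODATA pixels changes a simple statistic.

```python
def mean_ignoring_nodata(values, nodata):
    """Average the pixel values, skipping any pixel equal to the NODATA value."""
    valid = [v for v in values if v != nodata]
    if not valid:
        return None  # every pixel is NODATA
    return sum(valid) / len(valid)


pixels = [0, 10, 20, 0, 30]  # 0 marks missing data in this toy example

print(mean_ignoring_nodata(pixels, nodata=0))     # NODATA pixels excluded
print(mean_ignoring_nodata(pixels, nodata=None))  # no NODATA defined: every pixel counted
```

When 0 is declared as NODATA, the two missing pixels no longer drag the average down.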

Reprojecting raster data

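It is advisable to reproject your raster to EPSG:4326 before converting it into a Cloud Optimized GeoTIFF. You can do this with gdalwarp. In the following example the source raster is in EPSG:5070; replace the -s_srs value with the coordinate reference system of your own file: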

gdalwarp -wm 1024 -multi -s_srs EPSG:5070 -t_srs EPSG:4326 input.tif output_4326.tif

Generating a Cloud Optimized GeoTIFF

CARTO requires raster files to be in the Cloud Optimized GeoTIFF (COG) format. To convert your raster, use GDAL's gdalwarp tool with the COG output driver and the Google Maps tiling scheme, as in the example below:

gdalwarp -of COG \
-co TILING_SCHEME=GoogleMapsCompatible \
-co COMPRESS=DEFLATE -co OVERVIEWS=IGNORE_EXISTING -co ADD_ALPHA=NO \
-co RESAMPLING=NEAREST \
output_4326.tif raster_cog.tif
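
If you prepare many files, it can help to assemble this command in a script. The following Python sketch builds the same gdalwarp invocation as an argument list. The function name build_cog_command and its parameters are illustrative assumptions; the GDAL options themselves are the ones shown above.

```python
import shlex


def build_cog_command(src, dst, resampling="NEAREST", add_alpha=False):
    """Return the gdalwarp arguments that write `dst` as a COG in the Google Maps tiling scheme."""
    creation_options = {
        "TILING_SCHEME": "GoogleMapsCompatible",
        "COMPRESS": "DEFLATE",
        "OVERVIEWS": "IGNORE_EXISTING",
        "ADD_ALPHA": "YES" if add_alpha else "NO",
        "RESAMPLING": resampling,
    }
    command = ["gdalwarp", "-of", "COG"]
    for key, value in creation_options.items():
        command += ["-co", f"{key}={value}"]
    return command + [src, dst]


command = build_cog_command("output_4326.tif", "raster_cog.tif")
print(shlex.join(command))
# To execute it (requires GDAL on PATH): subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.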

Resampling methods

Reprojecting a raster during the conversion to COG can introduce artifacts, especially when interpolating pixel values.

The RESAMPLING=NEAREST method is used to avoid these distortions by assigning the value of the nearest pixel rather than interpolating between multiple pixels. This method is particularly useful for categorical data (such as land cover classifications), where preserving exact pixel values is essential to maintain data integrity.

Other resampling methods exist, such as BILINEAR or CUBIC, but they are better suited for continuous data like elevation models or temperature maps, where smooth transitions between pixels are desirable.
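
The difference can be illustrated with a toy example in Python. This is a simplified sketch, not GDAL's algorithm: plain block averaging stands in for the interpolating methods, and the class codes are made up. It assumes the grid dimensions are divisible by the downsampling factor.

```python
def downsample_nearest(grid, factor=2):
    """Keep one source pixel per block, so every output value exists in the input."""
    return [row[::factor] for row in grid[::factor]]


def downsample_average(grid, factor=2):
    """Average each factor x factor block, which can create values absent from the input."""
    out = []
    for r in range(0, len(grid), factor):
        out_row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[i][j] for i in range(r, r + factor) for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Toy land cover grid: 1 = water, 3 = forest, 5 = urban (class codes are made up).
land_cover = [
    [1, 1, 3, 3],
    [1, 3, 3, 3],
    [5, 5, 3, 5],
    [5, 5, 5, 5],
]

print(downsample_nearest(land_cover))  # [[1, 3], [5, 3]]
print(downsample_average(land_cover))  # [[1.5, 3.0], [5.0, 4.5]]
```

The averaged output contains 1.5 and 4.5, which are not valid class codes. For continuous data such as elevation, the same averaging would be a reasonable estimate.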

Overviews

Raster overviews, also known as pyramids, are lower-resolution copies of the original raster data stored within the file. Overviews allow CARTO to display raster data efficiently at different zoom levels by loading a lower-resolution version of the raster when the user is zoomed out, reducing processing and loading times.

The -co OVERVIEWS=IGNORE_EXISTING option tells GDAL to ignore any overviews already present in the source file and to generate new ones for the output COG. CARTO can then request the appropriate resolution dynamically based on the zoom level. Without overviews, CARTO would have to load the full-resolution raster even at distant zoom levels, leading to slow rendering performance.
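
As a rough illustration of how resolution relates to zoom level, the Python sketch below computes the ground resolution of the Web Mercator grid used by the Google Maps tiling scheme, assuming 256-pixel tiles. The overview count is an assumption for illustration (each overview halves the previous level until the raster fits in one block); the COG driver decides the actual levels it writes.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis used by Web Mercator
TILE_SIZE_PX = 256


def resolution_at_zoom(zoom):
    """Ground resolution in meters per pixel at the equator for a given zoom level."""
    return (2 * math.pi * EARTH_RADIUS_M) / (TILE_SIZE_PX * 2 ** zoom)


def overview_levels(width_px, height_px, block_size=256):
    """Count the halvings needed until the raster fits within a single block."""
    levels = 0
    while max(width_px, height_px) > block_size:
        width_px = math.ceil(width_px / 2)
        height_px = math.ceil(height_px / 2)
        levels += 1
    return levels


for zoom in (0, 5, 10):
    print(zoom, round(resolution_at_zoom(zoom), 2))

print(overview_levels(4096, 2048))
```

Each zoom level doubles the resolution of the previous one, which is why a stack of overviews at factors of two lets a viewer pick a level close to the current zoom.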

Alpha band

When creating your COG, -co ADD_ALPHA=NO is the safer general option. However, in some cases it is advisable to convert your NODATA values to an alpha band by using -co ADD_ALPHA=YES instead.

gdalwarp supports many other options when creating a COG. Take a look at the complete documentation of the COG driver to see all of them.


Importing raster data into your data warehouse

This section outlines the available methods for importing raster data into CARTO when using Google BigQuery, Snowflake and Databricks as your data warehouse. The method you choose will depend on the file size, the level of control required during the upload process and the data warehouse provider.

Available import methods:

  • Import interface: Best suited for smaller raster files (≤1GB) where advanced configuration is not necessary.

  • Raster Loader: Recommended for larger raster files or cases where more control is needed during the upload process.

Using CARTO import interface

Recommended for files smaller than 1GB. This is the most straightforward approach but has limitations in terms of file size and complexity. Currently, it is supported only for Google BigQuery and Snowflake; it is not supported for Databricks.
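
The choice between the two methods can be expressed as a simple rule. The Python sketch below is only an illustration of the guidance in this section: the function name is hypothetical, and treating 1GB as 1024^3 bytes is an assumption.

```python
ONE_GB = 1024 ** 3  # assumption: the 1GB limit is interpreted as 1 GiB here
IMPORT_INTERFACE_WAREHOUSES = {"bigquery", "snowflake"}


def suggest_import_method(file_size_bytes, warehouse):
    """Suggest 'import interface' or 'raster loader' from file size and target warehouse."""
    if warehouse.lower() in IMPORT_INTERFACE_WAREHOUSES and file_size_bytes <= ONE_GB:
        return "import interface"
    return "raster loader"


print(suggest_import_method(200 * 1024 ** 2, "BigQuery"))    # small file on BigQuery
print(suggest_import_method(200 * 1024 ** 2, "Databricks"))  # Databricks needs Raster Loader
print(suggest_import_method(5 * ONE_GB, "Snowflake"))        # too large for the interface
```

Raster Loader remains a valid choice for small files as well, for example when you need more control over the upload.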

Using CARTO Raster Loader

The CARTO Raster Loader is a Python utility that can import a COG raster file to Google BigQuery, Snowflake and Databricks as a CARTO raster table.

Installation

The raster-loader library can be installed with pip:

pip install raster-loader

Installation within a virtual environment is highly recommended.

python -m venv rasterenv
source rasterenv/bin/activate
pip install raster-loader

Authentication

Before uploading rasters to BigQuery, ensure you have the gcloud SDK installed. If not, install it from https://cloud.google.com/sdk/docs/install.

Then, authenticate with Google Cloud by executing the following command:

gcloud auth application-default login

For Snowflake and Databricks, no separate login step is needed: the credentials are passed as options to the upload command itself.

Uploading raster data to BigQuery

Before you can upload a raster file, you need to have set up the following in BigQuery:

  • A GCP project

  • A BigQuery dataset

To use the bigquery utilities, use the carto bigquery command. Find a complete guide and reference at the Raster Loader documentation.

The basic command to upload a COG to BigQuery as a CARTO raster table is:

carto bigquery upload \
  --file_path /path/to/raster.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-table \
  --overwrite

Uploading raster data to Snowflake

Before you can upload a raster file, you need to have set up the following in Snowflake:

  • A Snowflake account

  • A Snowflake database

  • A Snowflake schema

To use the snowflake utilities, use the carto snowflake command. Find a complete guide and reference at the Raster Loader documentation.

The basic command to upload a COG to Snowflake as a CARTO raster table is:

carto snowflake upload \
  --file_path /path/to/raster.tif \
  --database my-database \
  --schema my-schema \
  --table my-table \
  --account my-snowflake-account \
  --username my-username \
  --password my-password \
  --overwrite
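
Typing a password directly into the command leaves it in your shell history. One possible alternative, sketched below in Python, is to build the command in a script that reads the password from an environment variable. The variable name SNOWFLAKE_PASSWORD and the helper names are assumptions for illustration; the command-line options are the ones shown above.

```python
import os
import shlex


def build_snowflake_upload(file_path, database, schema, table, account, username,
                           password, overwrite=False):
    """Build the argument list for `carto snowflake upload`."""
    command = [
        "carto", "snowflake", "upload",
        "--file_path", file_path,
        "--database", database,
        "--schema", schema,
        "--table", table,
        "--account", account,
        "--username", username,
        "--password", password,
    ]
    if overwrite:
        command.append("--overwrite")
    return command


def redact(command, secret_flags=("--password",)):
    """Return a copy of the command with secret values masked, safe to print or log."""
    redacted = list(command)
    for i, arg in enumerate(command[:-1]):
        if arg in secret_flags:
            redacted[i + 1] = "****"
    return redacted


command = build_snowflake_upload(
    "/path/to/raster.tif", "my-database", "my-schema", "my-table",
    "my-snowflake-account", "my-username",
    password=os.environ.get("SNOWFLAKE_PASSWORD", ""),  # hypothetical variable name
    overwrite=True,
)
print(shlex.join(redact(command)))
# To execute it: subprocess.run(command, check=True)
```

Note that the password is still passed to the process as an argument when the command runs; this approach only keeps it out of scripts, logs, and shell history.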

Uploading raster data to Databricks

Before you can upload a raster file, you need to have set up the following in Databricks:

  • A Databricks server hostname

  • A Databricks cluster id

  • A Databricks token

To use the databricks utilities, use the carto databricks command. Find a complete guide and reference at the Raster Loader documentation.

The basic command to upload a COG to Databricks as a CARTO raster table is:

carto databricks upload \
  --file_path /path/to/my/raster/file.tif \
  --catalog my-databricks-catalog \
  --schema my-databricks-schema \
  --table my-databricks-table \
  --server-hostname my-databricks-server-hostname \
  --cluster-id my-databricks-cluster-id \
  --token my-databricks-token \
  --overwrite

Advanced options

Options for raster bands

By default, Raster Loader will upload the first band in the raster file, but it's possible to specify a different band with a command like:

--band 2

Uploading multiple bands, optionally with custom names, is supported by repeating the --band option for each band to include and, if required, the --band_name option for each label.

--band 1 \
--band 2 \
--band_name red \
--band_name green
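
When the band list comes from a script, these repeated options can be generated. The Python sketch below uses a hypothetical helper, band_arguments, and assumes that you provide either one name per band or no names at all.

```python
def band_arguments(bands, names=None):
    """Build repeated --band options, followed by --band_name options when names are given."""
    if names is not None and len(names) != len(bands):
        raise ValueError("provide one name per band, or no names at all")
    args = []
    for band in bands:
        args += ["--band", str(band)]
    for name in names or []:
        args += ["--band_name", name]
    return args


print(band_arguments([1, 2], ["red", "green"]))
print(band_arguments([2]))
```

The resulting list can be appended to the argument list of any of the upload commands above.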

Options for very large files

For large raster files, you can use the --chunk_size flag to specify the number of rows to upload at once. The default chunk size is 1000 rows.

For example, the following command uploads the raster in chunks of 2000 rows:

--chunk_size 2000
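
As a back-of-the-envelope aid, the number of upload batches follows from the total row count and the chunk size. The Python sketch below uses a hypothetical helper and takes the default of 1000 from the text above; how many rows a given raster produces depends on how Raster Loader splits it, which is not covered here.

```python
import math


def upload_batches(total_rows, chunk_size=1000):
    """Number of batches needed to upload `total_rows` rows, `chunk_size` rows at a time."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return math.ceil(total_rows / chunk_size)


print(upload_batches(4500))                   # default chunk size
print(upload_batches(4500, chunk_size=2000))  # larger chunks, fewer batches
```

Larger chunks mean fewer round trips but more data held in memory per batch.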

For large raster files, you can also enable the --compress flag. It compresses the band data with gzip, which can significantly reduce storage size.

--compress
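
To get a feel for what gzip does to raster-like data, the Python sketch below compresses a block of repetitive bytes with the standard library. It is only an illustration of gzip itself: the block contents are made up, and it does not reproduce how Raster Loader encodes band data.

```python
import gzip

# 65536 bytes, the size of a 256x256 block of 8-bit pixels, mostly a single value.
block = bytes([0] * 60000 + [7] * 5536)

compressed = gzip.compress(block)
restored = gzip.decompress(compressed)

print(len(block), len(compressed))
print(restored == block)  # compression is lossless
```

Rasters with large uniform areas, such as NODATA regions or categorical classes, tend to compress well, while noisy continuous data benefits less.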

Analyzing and visualizing raster data

Once your raster data is stored in your cloud data warehouse, you can analyze and visualize it using the CARTO platform. For more details, explore CARTO’s documentation.
