Working with Raster data

Raster data plays a crucial role in many geospatial applications, providing a powerful tool for visualizing and analyzing spatial patterns, trends, and relationships.

CARTO offers comprehensive support for raster data, empowering users to harness its capabilities for a wide range of analytics and visualization tasks.

This documentation guide delves into the process of working with raster data in CARTO, guiding you through the steps of preparing, uploading, analyzing, and visualizing raster data to extract meaningful insights from your spatial data.

Prepare your raster data

Before you can leverage the power of raster data in CARTO, it's essential to prepare your raster files for optimal performance and compatibility with the platform's raster table format. This section outlines the key considerations and steps involved in preparing your raster data for seamless integration with CARTO.

We'll elaborate on topics like resampling methods, specifying a nodata value, determining an appropriate block size, creating overviews for efficient access, and converting to Cloud-Optimized GeoTIFF (COG) format to enhance performance and scalability.

By following these guidelines, you'll ensure that your raster data is properly formatted and optimized for seamless integration with CARTO's raster capabilities.

For the following steps, we'll be using different GDAL tools that will allow us to prepare our raster to be imported in BigQuery.

Get information about the raster file

Use gdalinfo to get information and metadata about your file that will be useful for debugging and preparation:

$ gdalinfo -stats your_raster_file.tif 
Driver: GTiff/GeoTIFF
Files: your_raster_file.tif
Size is 176783, 66440
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-119.278749999991476,45.829305555550739)
Pixel Size = (0.000277777777778,-0.000277777777778)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-119.2787500,  45.8293056) (119d16'43.50"W, 45d49'45.50"N)
Lower Left  (-119.2787500,  27.3737500) (119d16'43.50"W, 27d22'25.50"N)
Upper Right ( -70.1723611,  45.8293056) ( 70d10'20.50"W, 45d49'45.50"N)
Lower Right ( -70.1723611,  27.3737500) ( 70d10'20.50"W, 27d22'25.50"N)
Center      ( -94.7255556,  36.6015278) ( 94d43'32.00"W, 36d36' 5.50"N)
Band 1 Block=176783x1 Type=Int16, ColorInterp=Gray
  Minimum=-32767.000, Maximum=1000.000, Mean=-0.026, StdDev=29.425
  Metadata:
    STATISTICS_MAXIMUM=1000
    STATISTICS_MEAN=-0.026405042072327
    STATISTICS_MINIMUM=-32767
    STATISTICS_STDDEV=29.425488249582
    STATISTICS_VALID_PERCENT=100

NODATA values

From the information above, we can observe a few things that might be wrong with our file. For example, the minimum value is -32767 which is the larger minimum number that can be represented with the Int16 type. This value is often used as NODATA value in rasters. Other usual NODATA values that might be distorting your data are -9999, -999, or -3.4e38.

Setting the correct NODATA value is a needed step in order to have our raster table correctly generated:

$ gdal_translate -a_nodata -32767 -ot Int16 -stats raster_file.tif raster_file_nodata_ok.tif

After this, checking our new raster_file_nodata_ok.tif stats should produce the correct output:

$ gdalinfo -stats raster_file_nodata_ok.tif
Band 1 Block=145x28 Type=Int16, ColorInterp=Gray
  Minimum=0.000, Maximum=219.000, Mean=0.664, StdDev=8.977
  NoData Value=-32767

Reproject your raster to EPSG:4326

In some cases, it is advisable to reproject your raster to EPSG:4326 before converting it into a Cloud Optimized GeoTiff. This is easily done with gdalwarp

gdalwarp -s_srs EPSG:3857 -t_srs EPSG:4326 raster_file_nodata_ok.tif raster_4326.tif

Conversion to Cloud Optimized GeoTIFF

CARTO requires that raster files are in the Cloud Optimized GeoTIFF (COG) format. We will use GDAL's gdalwarp tool to transform our raster to this projection using the Google Maps tiling scheme:

$ gdalwarp -of COG \
  -co TILING_SCHEME=GoogleMapsCompatible \
  -co COMPRESS=DEFLATE -co OVERVIEWS=IGNORE_EXISTING -co ADD_ALPHA=NO \
  -co RESAMPLING=NEAREST \
  raster_4326.tif raster_cog.tif

Resampling methods

Sometimes, the transformation introduces alterations our our data. For example, different resampling methods might produce different stats in our transformed raster. In this case, using the RESAMPLING=NEAREST option produced a COG with the same stats as the original file:

$ gdalinfo -stats raster_cog.tif
Band 1 Block=512x512 Type=Int16, ColorInterp=Gray
  Minimum=0.000, Maximum=1000.000, Mean=0.000, StdDev=0.073
  NoData Value=-32767

Overviews

CARTO supports overviews by storing different resolutions on of the raster in the same table and making Maps API request them depending on the zoom level of the map. To ensure that they're generated when creating our COG, use the -co OVERVIEWS=IGNORE_EXISTING option.

Alpha band

In terms of adding an alpha band to your COG, -co ADD_ALPHA=NO is the safer general option. However, in some cases it's advisable to convert your NO_DATA values to an alpha band and use -co ADD_ALPHA=YES instead.

gdalwarp supports many other options when creating a COG. Take a look at the complete documentation of the COG driver to see all of them.

Upload your raster file

Once your raster data has been prepared for upload, you can utilize CARTO's Raster Loader utility to seamlessly integrate your raster data into the CARTO platform. This section guides you through the process of installing the Raster Loader, authenticating with your CARTO account, and executing the raster upload commands to effectively store your raster data in BigQuery as a table in the CARTO raster format.

Installation of Raster Loader

Raster Loader is a Python utility that can upload a COG raster file to BigQuery as a CARTO raster table.

The raster-loader library can be installed from pip like:

pip install raster-loader

However, it's much advisable to use a virtual environment:

python -m venv rasterenv
source rasterenv/bin/activate
pip install raster-loader

Find more exhaustive information about the installation of Raster Loader in the documentation.

Authentication

In order to create raster tables in BigQuery using Raster Loader, you will need to be authenticated in Google Cloud. Run this command:

gcloud auth application-default login

Execution

Find a complete guide and reference at the Raster Loader documentation.

The basic command to upload a COG to BigQuery as a CARTO raster table is:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset

You could also add some options, like the resulting table name or overwrite/append existing tables with --overwrite or --append options, for example:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --overwrite

Options for raster bands

By default, Raster Loader will upload the first band in the raster file, but it's possible to specify a different band with a command like:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --band 2

We could also assign a name to the band, like:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --band 2 \
  --band_name red

Uploading multiple bands, with (optionally) custom names would look like:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --band 1 \
  --band 2 \
  --band_name red \
  --band_name green

In this case, the order of the bands and their names is determinant: the first band will receive the first name; the second band will get the second name, etc.

Options for very large files

For large raster files, you can use the --chunk_size flag to specify the number of rows to upload at once, and preventing BigQuery from showing you an exception like the following, due to excessive operations in the destination table:

Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

The default chunk size is 1000 rows.

For example, the following command uploads the raster in chunks of 2000 rows:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --chunk_size 2000

Analysis with raster tables using the CARTO Analytics Toolbox

Once your raster data is stored as a BigQuery table, you can leverage the power of CARTO's Analytics Toolbox to perform advanced spatial analyses involving raster and vector data sources.

This section covers some key use cases for analyzing raster data, demonstrating how to extract raster values, perform spatial intersections, and aggregate raster data values using SQL and the procedures provided by the raster module in the CARTO Analytics Toolbox.

Check the complete SQL reference for the raster module.

Get values from the pixels that intersect with a specific point

CALL `carto-un`.carto.RASTER_VALUES(
    'my-project.my-dataset.my-raster-table',
    '''SELECT ST_GEOGPOINT(-6.61403, 37.47571) AS geom'''
    'band_1',
    NULL,
    NULL
);
-- The table `my-project.my-dataset.my-output-table` will be created
-- with columns: pixel, band_1 (one pixel intersecting the point)

Get values from the pixels that intersect with points from a table

CALL `carto-un`.carto.RASTER_VALUES(
    'my-project.my-dataset.my-raster-table',
    '''
    SELECT id, point AS geom
    FROM `my-project.my-dataset.my-point-table`
    ''',
    '''
    band_1 AS alias_1,
    band_3 AS alias_2
    ''',
    'my-project.my-dataset.my-output-table',
    NULL
);
-- The table `my-project.my-dataset.my-output-table` will be created
-- with columns: id, pixel, alias_1, alias_2 (one pixel intersecting each point)

Get a calculated value from different bands for each pixel that intersects with points on table

CALL `carto-un`.carto.RASTER_VALUES(
    'my-project.my-dataset.my-raster-table',
    '''
    SELECT id, point AS geom
    FROM `my-project.my-dataset.my-point-table`
    ''',
    '''
    band_1 as nir,
    band_2 as red,
    (band_1-band_2)/(band_1+band_2) as nvdi
    ''',
    'my-project.my-dataset.my-output-table',
    NULL
);
-- The table `my-project.my-dataset.my-output-table` will be created
-- with columns: pixel, nir, red, nvdi

Get all pixels that intersect with a polygon and their band's values

CALL `carto-un`.carto.RASTER_VALUES(
    'my-project.my-dataset.my-raster-table',
    ST_GEOGFROMTEXT('''
        POLYGON((
            0.14538491988787428 -0.19997006321490718,
            0.1595469834865071 -0.22803650229192843,
            0.14881814742693678 -0.20391825036028427,
            0.1734515550197102 -0.22898063235646787,
            0.15165056014666334 -0.19885427102265768,
            0.16169275069842115 -0.21413203430647013,
            0.14538491988787428 -0.19997006321490718
        ))
    '''),
    'band_1, band_2',
    'my-project.my-dataset.my-output-table',
    NULL
);
-- The table `my-project.my-dataset.my-output-table` will be created
-- with columns: pixel, band_1, band_2 (all the pixels intersecting the polygon)

Intersect with a polygon table and get aggregated data

CALL `carto-un`.carto.RASTER_AGG_VALUES(
    'my-project.my-dataset.my-raster-table',
    'my-project.my-dataset.my-geom-table', -- columns: name, geom
    '''
    AVG(band_1) AS band_1_avg,
    MAX(band_2) AS band_2_max,
    SUM(band_1 + band_2) AS total_sum,
    COUNT(band_1) AS count
    ''',
    'my-project.my-dataset.my-output-table',
    NULL
);
-- The table `my-project.my-dataset.my-output-table` will be created
-- with columns: name, band_1_avg, band_2_max, total_sum, count
-- (aggregation of all the pixels intersecting each polygon)

Raster visualization with deck.gl

To effectively visualize and gain insights from your raster data, we'll explore the integration of CARTO's Maps API with deck.gl.

This section delves into the process of creating an interactive raster visualization using deck.gl, showcasing how to load a layer that displays raster data extracted from a CARTO raster table.

Find the complete reference in deck.gl's CARTO module documentation.

The following snippet shows how to load a layer coming from a CARTO raster table in BigQuery:

import DeckGL from '@deck.gl/react';
import {CartoLayer, MAP_TYPES} from '@deck.gl/carto';

function App({viewState}) {
  const layer = new CartoLayer({
    type: MAP_TYPES.RASTER,
    connection: 'bigquery',
    data: 'project.dataset.pv_out_spain',
    getFillColor: d => {
      const {band_1} = d.properties;
      return [(band_1 - 4.2) * 425, 0, 0];
    }
  })

  return <DeckGL viewState={viewState} layers={[layer]} />;
}

The most relevant part is the return expression within the getFillColor property:

return [(band_1 - 4.2) * 425, 0, 0];

getFillColor expects an array of numbers that range from 0 to 255, to specify an RGBa color, following the format [r, g, b, [a]].

This provides a lot of flexibility when it comes to visualizing values from your raster, but you should make sure that the values will range from 0 to 255.

In this example, we're visualizing photovoltaic potential in Spain. The values of the band_1 are normalized so they range between 0-255, and the value is passed to the red channel, so the brighter the red, the higher the photovoltaic potential.

Last updated