CARTO User Manual

CARTO User Manual

Tutorials

In this section we provide a set of tutorials that showcase how to get the most out of the CARTO platform, in particular through the Workspace and Builder interfaces that are detailed in this User Manual.

These tutorials leverage the demo data provided by default via the CARTO Data Warehouse connection, so you can start creating maps and exploring our tools from the very beginning.

The data you need for these tutorials are available and ready to be used in the Data Explorer and Builder and you can start using them with all other components of the CARTO platform.

Computing US airport connections and route interpolations

Generating connections

In this example we will showcase how easily we can compute all the paths that interconnect the main four US airports using the Analytics Toolbox.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
WITH data AS(
  SELECT 'SEA' AS abbrev, TO_GEOGRAPHY('POINT(-122.302289722924 47.4435819127259)') as geog UNION
  SELECT 'MIA', TO_GEOGRAPHY('POINT(-80.2789718277441 25.7949407212406)') UNION
  SELECT 'LAX', TO_GEOGRAPHY('POINT(-118.402468548522 33.9441742543586)') UNION
  SELECT 'JFK', TO_GEOGRAPHY('POINT(-73.7863268609295 40.6459595584081)')
)
SELECT carto.ST_GREATCIRCLE(t1.geog, t2.geog, 25) AS geom
FROM data AS t1
CROSS JOIN data AS t2
WHERE t1.abbrev != t2.abbrev

This query first creates all the possible combinations between airports and then generates the paths between them using the ST_GREATCIRCLE function. The resulting paths contain 25 points, but you can set the number of points in order to make the lines smoother if needed (you could also not include this parameter).

The result is displayed in this visualization. Notice that we are not using straight lines to interconnect the different airports, but great circles instead.

Routes interpolation

Now let’s put to the test how to perform line interpolations using the ST_LINE_INTERPOLATE_POINT function. In this example we will compute the airplane position after taking off from the different airports and travelling a certain distance.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
WITH data AS(
  SELECT 'SEA' AS abbrev, TO_GEOGRAPHY('POINT(-122.302289722924 47.4435819127259)') as geog UNION
  SELECT 'MIA', TO_GEOGRAPHY('POINT(-80.2789718277441 25.7949407212406)') UNION
  SELECT 'LAX', TO_GEOGRAPHY('POINT(-118.402468548522 33.9441742543586)') UNION
  SELECT 'JFK', TO_GEOGRAPHY('POINT(-73.7863268609295 40.6459595584081)')
)
SELECT CONCAT(t1.abbrev, ' - ', t2.abbrev) as route,
  carto.ST_LINE_INTERPOLATE_POINT(carto.ST_GREATCIRCLE(t1.geog, t2.geog, 25), 500) AS geom
FROM data AS t1
CROSS JOIN data AS t2
WHERE t1.abbrev != t2.abbrev

This query uses the ST_LINE_INTERPOLATE_POINT function over each great circle in order to calculate the location of the plane after travelling 500 kilometers. In the following visualization you can see the resulting locations as well as their origin and destination airports.

New police stations based on Chicago crime location clusters

In this example we are going to use points clustering to analyze where to locate five new police stations in Chicago based on 5000 samples of crime locations.

Generating the clusters

First, we calculate crime locations clusters using the ST_CLUSTERKMEANS function:

𝅺
1
2
3
4
5
6
WITH clustered_points AS
(
    SELECT `carto-un`.carto.ST_CLUSTERKMEANS(ARRAY_AGG(ST_GEOGPOINT(longitude, latitude)), 5) AS cluster_arr
    FROM cartobq.docs.chicago_crime_sample
)
SELECT cluster_element.cluster, cluster_element.geom AS geom FROM clustered_points, UNNEST(cluster_arr) AS cluster_element

This query takes the crimes samples data, gathers their geometries in order to establish clusters and then groups the different geometry clusters. In the following visualization we can easily identify the different clusters by colors.

Calculating the clusters' centers

Once we have split the sample of points into clusters, we can easily work with them to calculate their centers, envelopes, concave/convex hulls and other different transformations. In this particular example we are interested in finding the center of the clusters, since that is where we are going to place the police stations. The Analytics Toolbox offers different functions for this task: ST_CENTERMEAN, ST_CENTERMEDIAN and ST_CENTEROFMASS. In this case let’s use ST_CENTERMEDIAN to calculate the location of the new police stations:

𝅺
1
2
3
4
5
6
WITH clustered_points AS
(
    SELECT `carto-un`.carto.ST_CLUSTERKMEANS(ARRAY_AGG(ST_GEOGPOINT(longitude, latitude)), 5) AS cluster_arr
    FROM cartobq.docs.chicago_crime_sample
)
SELECT cluster_element.cluster, `carto-un`.carto.ST_CENTERMEDIAN(ST_UNION_AGG(cluster_element.geom)) AS geom FROM clustered_points, UNNEST(cluster_arr) AS cluster_element GROUP BY cluster_element.cluster

We can see the result in the following visualization, where the bigger dots represent the police stations we have decided to open based on our analysis:

EU flag

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.

New supplier offices based on store locations clusters

In this example we are going to use points clustering to analyze where to locate 10 new supplier offices in US so they can best serve all Starbucks locations.

Generating the clusters

First, we calculate Starbucks locations clusters using the ST_CLUSTERKMEANS function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
WITH data AS(
  SELECT geog
  FROM sfcarto.public.starbucks_locations_usa
  WHERE geog IS NOT null
  ORDER BY id
),
clustered_points AS
(
    SELECT carto.ST_CLUSTERKMEANS(ARRAY_AGG(ST_ASGEOJSON(geog)::STRING), 10) AS cluster_arr
    FROM data
)
SELECT GET(VALUE, 'cluster') AS cluster, TO_GEOGRAPHY(GET(VALUE, 'geom')) AS geom
FROM clustered_points, lateral FLATTEN(input => cluster_arr)

This query gathers the geometries of the Starbucks locations in order to establish clusters and then groups the different geometry clusters. Each of them is represented in a different colour in the following visualization.

Calculating the clusters' centers

Once we have split the sample of points into clusters, we can easily work with them to calculate their centers, envelopes, concave/convex hulls and other different transformations. In this particular example we are interested in finding the center of the clusters, since that is where we are going to place the offices. The Analytics Toolbox offers different functions for this task, for example ST_CENTERMEAN, ST_CENTERMEDIAN and ST_CENTEROFMASS.

In this case we are going to use ST_CENTEROFMASS to calculate the location of the new offices:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
WITH data AS(
  SELECT geog
  FROM sfcarto.public.starbucks_locations_usa
  WHERE geog IS NOT null
  ORDER BY id
),
clustered_points AS
(
    SELECT carto.ST_CLUSTERKMEANS(ARRAY_AGG(ST_ASGEOJSON(geog)::STRING), 10) AS cluster_arr
    FROM data
)
SELECT GET(VALUE, 'cluster') AS cluster, carto.ST_CENTEROFMASS(ST_COLLECT(TO_GEOGRAPHY(GET(VALUE, 'geom')))) AS geom
FROM clustered_points, lateral FLATTEN(input => cluster_arr)
GROUP BY cluster

We can see the result in the following visualization, where the bigger dots represent the supplier offices locations we have decided to open based on our analysis.

Computing US airport connections and route interpolations

Generating connections

In this example we will showcase how easily we can compute all the paths that interconnect the main four US airports using the Analytics Toolbox.

𝅺
1
2
3
4
5
6
7
8
9
WITH data AS(
    SELECT *
    FROM `carto-do-public-data`.natural_earth.geography_glo_airports_410
	WHERE abbrev = 'JFK' OR abbrev = 'LAX'  OR abbrev = 'SEA'  OR abbrev = 'MIA'
)
SELECT `carto-un`.carto.ST_GREATCIRCLE(t1.geom, t2.geom, 25) AS geo
FROM data AS t1
CROSS JOIN data AS t2
WHERE t1.abbrev != t2.abbrev

This query first creates all the possible combinations between airports and then generates the paths between them using the ST_GREATCIRCLE function. The resulting paths contain 25 points, but you can set the number of points in order to make the lines smoother if needed.

The result is displayed in this visualization. Notice we are not using straight lines to interconnect the different airports, but great circles instead.

Routes interpolation

Now let’s put to the test how to perform line interpolations using the ST_LINE_INTERPOLATE_POINT function. In this example we will compute the airplane position after taking off from the different airports and travelling a certain distance.

𝅺
1
2
3
4
5
6
7
8
9
WITH data AS(
    SELECT *
    FROM `carto-do-public-data`.natural_earth.geography_glo_airports_410
	WHERE abbrev = 'JFK' OR abbrev = 'LAX'  OR abbrev = 'SEA'  OR abbrev = 'MIA'
)
SELECT CONCAT(t1.abbrev, " - ", t2.abbrev) as route, `carto-un`.carto.ST_LINE_INTERPOLATE_POINT(`carto-un`.carto.ST_GREATCIRCLE(t1.geom, t2.geom, 25), 500,'kilometers') AS geo
FROM data AS t1
CROSS JOIN data AS t2
WHERE t1.abbrev != t2.abbrev

This query uses the ST_LINE_INTERPOLATE_POINT function over each great circle in order to calculate the location of the plane after travelling 500 kilometers. In the following visualization you can see these locations as well as their origin and destination airports.

EU flag

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.

Identifying earthquake-prone areas in the state of California

In this example we are going to use some of the functions included in CARTO’s Analytics Toolbox in order to highlight zones prone to earthquakes, using a BigQuery public dataset.

First we define our region of interest, which in this case is a bounding box enclosing the state of California, using the function ST_MAKEENVELOPE. After filtering the earthquake locations with this bounding box, we compute the concave hull polygon enclosing the resulting points using the ST_CONCAVEHULL function. For visualization purposes, this polygon is smoothed out by means of the ST_BEZIERSPLINE function. Finally, we construct the polygon defining the earthquake-prone area using the ST_POLYGONIZE function.

𝅺
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
WITH bounds AS (
    SELECT `carto-un`.carto.ST_MAKEENVELOPE(-126.98746757203217, 31.72298737861544, -118.1856191911019, 40.871240645013735) AS bbox
),
data AS (
    SELECT ST_GEOGPOINT(longitude, latitude) AS points
    FROM `bigquery-public-data`.noaa_significant_earthquakes.earthquakes
    JOIN bounds
    ON ST_CONTAINS(bounds.bbox, ST_GEOGPOINT(longitude, latitude))
    WHERE longitude IS NOT NULL AND latitude IS NOT NULL
),
bezier_spline AS (
    SELECT `carto-un`.carto.ST_BEZIERSPLINE(
        ST_BOUNDARY(
        `carto-un`.carto.ST_CONCAVEHULL(ARRAY_AGG(points), 300, "kilometres")),
        null,
        0.9) AS geom
    FROM data
),
polygon_array AS (
    SELECT `carto-un`.carto.ST_POLYGONIZE(ARRAY_AGG(geom)) AS geom
    FROM bezier_spline
)
SELECT unnested FROM polygon_array, UNNEST(geom) AS unnested

EU flag

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.

Bikeshare stations within a San Francisco buffer

In this example we are going to showcase how easily we can compute buffers around geometries using the Analytics Toolbox.

Creating a buffer of a neighborhood

The following query creates a buffer with a radius of 50 meters around San Francisco’s Financial District neighborhood using the ST_BUFFER function. The number of steps could be modified in order to make the lines smoother if needed.

𝅺
1
SELECT `carto-un`.carto.ST_BUFFER(neighborhood_geom, 50, 'meters', 5) AS geo FROM `bigquery-public-data`.san_francisco_neighborhoods.boundaries WHERE neighborhood = "Financial District"

In this visualization you can see the Financial Disctrict (darker blue) and its buffer around it. Notice that buffers radius are not necesarily positive numbers. Negative numbers would generate a buffer in the interior of the district’s geomery.

Bikeshare stations within a buffer

Now let’s use the buffer as a way of defining a bigger region around the Financial District of San Francisco and filtering some other geometries.

𝅺
1
2
3
SELECT ST_GEOGPOINT(d2.longitude,d2.latitude) AS geo FROM `bigquery-public-data`.san_francisco_neighborhoods.boundaries d1,
`bigquery-public-data`.san_francisco.bikeshare_stations d2
WHERE d1.neighborhood = "Financial District" AND ST_CONTAINS(`carto-un`.carto.ST_BUFFER(d1.neighborhood_geom, 50, 'meters', 5), ST_GEOGPOINT(d2.longitude, d2.latitude))

This query uses the ST_BUFFER and ST_CONTAINS functions in order to filter those bikeshare stations that are contained inside the buffered geometry. The result is displayed below, where bikeshare stations are represented as yellow dots.

EU flag

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.