Auditing SQL queries from CARTO

When you create a connection in CARTO, it will be used to push down SQL queries, which ultimately power your maps, workflows, applications and all the other data interactions in the platforms.

Each of these queries contains a fingerprint in the form of SQL comments that gives information about the connection, user, and feature that triggered such query. This information can be used to perform audits on performance, data governance, or other use cases.

CARTO SQL Audit Trail structure

Here's an example of a BigQuery SQL query with its audit trail comments:

/* CARTO/3.0 (GPN:CARTODB_Inc; ACCID:ac_bj3xw0in; USERID:auth0|685ab6bef3767efff1a98e98; CM:maps_api_compute_builder; connectionId:154438ab-84ed-41cb-b833-6b6de56c2a54; mapId:d4596571-611f-4a0a-a0ab-81dc2b27e9f5) */ 
          SELECT s.*
          FROM `my-project.my_dataset.osm_pois_usa` s
          WHERE geom IS NOT NULL
          LIMIT 1

If we break down the SQL Audit Trail, we can find the following information:

  • CARTO identifier (CARTO/3.0 ): Initial identifier to start the audit trail as generated by CARTO

  • CARTO partner name (GPN:CARTODB_Inc ): Partner name tag that tells Google/Snowflake/etc the partner, using their partner program structure, that was responsible for this query.

  • Organization ID (ACCID:ac_bj3xw0in ): Your own organization ID, as found in Settings.

  • User ID (USERID:auth0|685ab6bef3767efff1a98e98 ): The ID of the user that triggered the query. For a map, this would be the viewer of the map, not necessarily the map's owner. You can find the list of user IDs in Activity Data.

  • Metric (CM:maps_api_compute_builder): A composed string that describes the component or metric that triggered this query. Its structure is similar to API + method + feature.

  • Connection ID (connectionId:154438ab-84ed-41cb-b833-6b6de56c2a54): The ID of the connection that was used to execute the query.

  • Map ID (mapId:d4596571-611f-4a0a-a0ab-81dc2b27e9f5): The ID of the Builder map that was used to execute the query.

  • Workflow ID (workflowId:d4596571-611f-4a0a-a0ab-81dc2b27e9f59): The ID of the Workflow that was used to execute the query.

Audit examples

The following examples assume BigQuery syntax and structure, please modify accordingly depending on your data warehouse.

List all queries performed by CARTO

SQL Example
-- All Jobs that contain the CARTO audit-trail comment
SELECT
  creation_time,
  user_email,
  job_id,
  total_bytes_processed,
  total_bytes_billed,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_type = 'QUERY'
  AND REGEXP_CONTAINS(query, r'/\s*CARTO/3\.0')   -- matches /* CARTO/3.0 … */
ORDER BY
  creation_time DESC;

Find out which workflows scan more data

SQL Example
-- Summarizes bytes read by each CARTO component (CM) and workflowId
WITH carto_jobs AS (
  SELECT
    total_bytes_processed,
    REGEXP_EXTRACT(query, r'workflowId:([a-f0-9-]+)') AS workflow_id
  FROM
    `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE
    job_type = 'QUERY'
    AND REGEXP_CONTAINS(query, r'/\s*CARTO/3\.0')
)
SELECT
  workflow_id,
  COUNT(*)                       AS num_queries,
  SUM(total_bytes_processed)     AS bytes_processed
FROM
  carto_jobs
GROUP BY
  workflow_id
ORDER BY
  bytes_processed DESC;          -- “heaviest” workflows first

Understand which connections are being used in which maps

SQL Example
-- Maps every CARTO mapId to the connectionId(s) it hit
WITH carto_jobs AS (
  SELECT
    REGEXP_EXTRACT(query, r'mapId:([a-f0-9-]+)')        AS map_id,
    REGEXP_EXTRACT(query, r'connectionId:([a-f0-9-]+)') AS connection_id,
    creation_time
  FROM
    `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE
    job_type = 'QUERY'
    AND REGEXP_CONTAINS(query, r'/\s*CARTO/3\.0')
)
SELECT
  map_id,
  connection_id,
  COUNT(*)          AS num_queries,
  MIN(creation_time) AS first_seen,
  MAX(creation_time) AS last_seen
FROM
  carto_jobs
GROUP BY
  map_id, connection_id
ORDER BY
  num_queries DESC;

Last updated

Was this helpful?