retail

This module contains procedures that address specific retail analytics use cases, such as revenue prediction.

BUILD_REVENUE_MODEL_DATA

BUILD_REVENUE_MODEL_DATA(stores_query, stores_variables, competitors_query, aoi_query, grid_type, grid_level, kring, decay, do_variables, do_source, custom_variables, custom_query, output_prefix)

Description

This procedure is the first step of the Revenue Prediction analysis workflow. It prepares the model data to be used in the training and prediction phases by performing the following steps:

  1. Polyfill the geometry from the area of interest using the grid type and resolution level.

  2. Enrich the grid cells with the revenue, stores, Data Observatory (DO) variables and custom variables.

  3. Apply a k-ring decay function to the enriched DO variables and custom variables. This operation smooths the features for a given cell by taking into account the values of these features in the neighboring cells (defined as those within the specified k-ring size), applying a scaling factor determined by the decay function of choice.

  4. Create the revenue_model_data table (see the output description for more details).

  5. Create the revenue_model_data_stats table (see the output description for more details).

Input parameters

  • stores_query: STRING query with variables related to the stores to be used in the model, including their revenue per store (required) and other variables (optional). It must contain the columns revenue (revenue of the store), store (store unique id) and geom (the geographical point of the store). The values of these columns cannot be NULL.

  • stores_variables: ARRAY<STRUCT<variable STRING, aggregation STRING>> list with the columns of the stores_query and their corresponding aggregation method (sum, avg, max, min, count) that will be used to enrich the grid cells. It can be set to NULL. The aggregation of the revenue (avg) and store (count) variables should not be included as they are performed anyway.

  • competitors_query: STRING query with the competitors information to be used in the model. It must contain the columns competitor (competitor store unique id) and geom (the geographical point of the store).

  • aoi_query: STRING query with the geography of the area of interest. It must contain a column geom with a single area (Polygon or MultiPolygon).

  • grid_type: STRING type of the cell grid. Supported values are h3 and quadbin.

  • grid_level: INT64 level or resolution of the cell grid. Check the available resolution levels for H3 and Quadbin grids.

  • kring: INT64 size of the kring where the decay function will be applied. This value can be 0, in which case no kring will be computed and the decay function won't be applied.

  • decay: STRING decay function. Supported values are uniform, inverse, inverse_square and exponential. If set to NULL or '', uniform is used by default.

  • do_variables: ARRAY<STRUCT<variable STRING, aggregation STRING>> variables of the Data Observatory that will be used to enrich the grid cells and therefore train the revenue prediction model in the subsequent step of the Revenue Prediction workflow. For each variable, its slug and the aggregation method must be provided. Use default to use the variable's default aggregation method. Valid aggregation methods are: sum, avg, max, min, count. The catalog procedure DATAOBS_SUBSCRIPTION_VARIABLES can be used to find the available variables, their slugs and default aggregation (see the sketch after this list). It can be set to NULL.

  • do_source: STRING name of the location where the Data Observatory subscriptions of the user are stored, in <my-dataobs-project>.<my-dataobs-dataset> format. If only the <my-dataobs-dataset> is included, it uses the project carto-data by default. It can be set to NULL or ''.

  • custom_variables: ARRAY<STRUCT<variable STRING, aggregation STRING>> list with the columns of the custom_query and their corresponding aggregation method (sum, avg, max, min, count) that will be used to enrich the grid cells. It can be set to NULL.

  • custom_query: STRING query that contains a geography column geom and the columns with the custom data that will be used to enrich the grid cells. It can be set to NULL or ''.

  • output_prefix: STRING destination prefix for the output tables. It must contain the project, dataset and a prefix which will be prepended to each output table name. For example <my-project>.<my-dataset>.<output-prefix>.
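
For reference, the Data Observatory variable slugs can be looked up before calling this procedure using DATAOBS_SUBSCRIPTION_VARIABLES, mentioned above. A minimal sketch follows; the signature shown (Data Observatory location plus an optional filter string) is an assumption and may differ in your Analytics Toolbox version, so check the data module reference of your installation first.

-- Assumed signature: (Data Observatory location, optional filter string).
CALL `carto-un`.carto.DATAOBS_SUBSCRIPTION_VARIABLES(
    '<my-dataobs-project>.<my-dataobs-dataset>',
    ''
);
-- Lists the available variables with their slugs and default aggregation methods.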

Output

The procedure will output two tables:

  1. Model data table: contains an index column with the cell ids and all the enriched columns: revenue_avg, store_count, competitor_count, stores_variables suffixed by aggregation method, DO variables and custom variables. The name of the table includes the suffix _revenue_model_data, for example <my-project>.<my-dataset>.<output-prefix>_revenue_model_data.

  2. Model data stats table: contains the morans_i value computed for the revenue_avg column, computed with kring 1 and decay uniform. The name of the table includes the suffix _revenue_model_data_stats, for example <my-project>.<my-dataset>.<output-prefix>_revenue_model_data_stats.
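
As a quick sanity check, both output tables can be inspected with standard SQL. A minimal sketch, assuming the example output prefix used below:

-- Spatial autocorrelation of the average revenue (computed with kring 1 and uniform decay)
SELECT morans_i
FROM `<my-project>.<my-dataset>.<output-prefix>_revenue_model_data_stats`;

-- Preview the enriched grid cells with the highest average revenue
SELECT index, revenue_avg, store_count, competitor_count
FROM `<my-project>.<my-dataset>.<output-prefix>_revenue_model_data`
ORDER BY revenue_avg DESC
LIMIT 10;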

Example

CALL `carto-un`.carto.BUILD_REVENUE_MODEL_DATA(
    -- Stores: revenue, store, geom and optional store information
    '''SELECT revenue, store, geom, store_area FROM `<project>.<dataset>.input_stores_data`''',
    -- Stores information variables
    [('store_area','sum')],
    -- Competitors: competitor, geom
    '''SELECT competitor, geom FROM `<project>.<dataset>.input_competitors_data`''',
    -- Area of interest: geom
    '''SELECT geom FROM `<project>.<dataset>.area_of_interest`''',
    -- Grid params: grid type and level
    'h3', 6,
    -- Decay params: kring size and decay function
    3, 'exponential',
    -- Data Observatory enrichment
    [('POPCY_4534fac4', 'sum'), ('INCCYPCAP_7c8377cf', 'avg')],
    '<my-dataobs-project>.<my-dataobs-dataset>',
    -- Custom data enrichment
    [('var1', 'sum'), ('var2', 'avg')],
    '''SELECT var1, var2, geom FROM `<project>.<dataset>.custom_data`''',
    -- Output destination prefix
    '<my-project>.<my-dataset>.<output-prefix>'
);
-- Table `<my-project>.<my-dataset>.<output-prefix>_revenue_model_data` will be created
-- with columns: index, revenue_avg, store_count, store_area_sum, competitor_count, POPCY_4534fac4_sum, INCCYPCAP_7c8377cf_avg, var1_sum, var2_avg
-- Table `<my-project>.<my-dataset>.<output-prefix>_revenue_model_data_stats` will be created
-- with the column: morans_i

The same call is also available through the `carto-un-eu` project and, for installations of the Analytics Toolbox in your own project, directly as carto.BUILD_REVENUE_MODEL_DATA; only the prefix of the CALL changes. The same applies to the rest of the examples in this module, which are shown with the `carto-un` prefix only.

BUILD_REVENUE_MODEL

BUILD_REVENUE_MODEL(revenue_model_data, options, output_prefix)

Description

This procedure is the second step of the Revenue Prediction analysis workflow. It trains the model used to predict revenue by performing the following steps:
  1. Compute the model from the input query and options.

  2. Create the revenue_model_shap and revenue_model_stats tables (see the output description for more details).

Input parameters

  • revenue_model_data: STRING the fully qualified name of the revenue_model_data table generated in the previous step by BUILD_REVENUE_MODEL_DATA.

  • options: STRING JSON string to overwrite the model default options. The following fixed options cannot be modified:

    • ENABLE_GLOBAL_EXPLAIN: TRUE

    • INPUT_LABEL_COLS: ['revenue_avg']

    • BOOSTED_TREE_REGRESSOR:

      • SUBSAMPLE: 0.85

      • EARLY_STOP: FALSE

      • MAX_ITERATIONS: 50

      • DATA_SPLIT_METHOD: NO_SPLIT

    • RANDOM_FOREST_REGRESSOR:

      • COLSAMPLE_BYTREE: 0.5

      • DATA_SPLIT_METHOD: NO_SPLIT

    • LINEAR_REG:

      • EARLY_STOP: FALSE

      • MAX_ITERATIONS: 50

      • DATA_SPLIT_METHOD: NO_SPLIT

  • output_prefix: STRING destination prefix for the output tables. It must contain the project, dataset and prefix. For example <my-project>.<my-dataset>.<output-prefix>.

Output

The procedure will output the following:

  1. Model: contains the trained model to be used for the revenue prediction. The name of the model includes the suffix _revenue_model, for example <my-project>.<my-dataset>.<output-prefix>_revenue_model.

  2. Model shap table: contains the model feature importance (SHAP) values. The name of the table includes the suffix _revenue_model_shap, for example <my-project>.<my-dataset>.<output-prefix>_revenue_model_shap.

  3. Model stats table: contains the statistics of the trained model. The name of the table includes the suffix _revenue_model_stats, for example <my-project>.<my-dataset>.<output-prefix>_revenue_model_stats.
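
To explore the trained model, the auxiliary tables can be queried directly and, since the output is a standard BigQuery ML model trained with ENABLE_GLOBAL_EXPLAIN, ML.GLOBAL_EXPLAIN can retrieve global feature importance. A minimal sketch, assuming the example output prefix used below:

-- Auxiliary output tables
SELECT * FROM `<my-project>.<my-dataset>.<output-prefix>_revenue_model_stats`;
SELECT * FROM `<my-project>.<my-dataset>.<output-prefix>_revenue_model_shap`;

-- Global feature importance of the trained model
SELECT *
FROM ML.GLOBAL_EXPLAIN(MODEL `<my-project>.<my-dataset>.<output-prefix>_revenue_model`);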

Example

CALL `carto-un`.carto.BUILD_REVENUE_MODEL(
    -- Model data
    '<my-project>.<my-dataset>.<output-prefix>_revenue_model_data',
    -- Options
    '{"MAX_ITERATIONS": 100}',
    -- Output destination prefix
    '<my-project>.<my-dataset>.<output-prefix>'
);
-- Model `<my-project>.<my-dataset>.<output-prefix>_revenue_model` will be created
-- Table `<my-project>.<my-dataset>.<output-prefix>_revenue_model_shap` will be created
-- Table `<my-project>.<my-dataset>.<output-prefix>_revenue_model_stats` will be created

PREDICT_REVENUE_AVERAGE

PREDICT_REVENUE_AVERAGE(index, revenue_model, revenue_model_data, candidate_data, stores_variables)

Description

This procedure uses the model trained with BUILD_REVENUE_MODEL to predict the average revenue of an additional store located in the specified grid cell(s).
Input parameters

  • index: ANY TYPE cell index where the new store will be located. It can be an H3 or a Quadbin index. For Quadbin, the value should be INT64, whereas for H3 the value should be STRING. It can also be 'ALL', in which case the predictions for all the grid cells of the model data are returned.

  • revenue_model: STRING the fully qualified model name.

  • revenue_model_data: STRING the fully qualified revenue_model_data table name.

  • candidate_data: STRING the fully qualified name of a 1-row table containing the values for any of the stores_variables to be aggregated with the current values in the grid cell(s) specified by the index parameter. If set to NULL, only the store_count variable is considered, with 1 store added in the grid cell(s) specified by the index parameter.

  • stores_variables: ARRAY<STRUCT<variable STRING, aggregation STRING>> list with the columns of the stores_query and their corresponding aggregation method (sum, avg, max, min, count) that will be used for prediction. It can be set to NULL.

Output

The procedure will output the index, the predicted_revenue_avg value in the cell (in the same units as the revenue column), and shap_values, an array of key-value pairs with the SHAP values of the features for each prediction. It also includes a baseline_prediction, which is the expected revenue without considering the impact of any other features.

Example

CREATE TABLE `<my-project>.<my-dataset>.<output-prefix>_candidate_data` AS (SELECT 25 AS store_area_sum);

CALL `carto-un`.carto.PREDICT_REVENUE_AVERAGE(
    '862676d1fffffff',
    '<my-project>.<my-dataset>.<output-prefix>_revenue_model',
    '<my-project>.<my-dataset>.<output-prefix>_revenue_model_data',
    '<my-project>.<my-dataset>.<output-prefix>_candidate_data',
    [('store_area','sum')]
);
-- index, predicted_revenue_avg

FIND_WHITESPACE_AREAS

FIND_WHITESPACE_AREAS(revenue_model, revenue_model_data, generator_query, aoi_query, minimum_revenue, max_results, with_own_stores, with_competitors)

Description

This is a postprocessing step that may be used after completing a Revenue Prediction analysis workflow. It allows you to identify cells with the highest potential revenue (whitespaces), while satisfying a series of criteria (e.g. presence of competitors).

Input parameters

  • revenue_model: STRING with the fully qualified model name.

  • revenue_model_data: STRING with the fully qualified model_data table name.

  • generator_query: STRING query with the location of a set of generator points as a geography column named geom. The algorithm will look for whitespaces in the surroundings of these locations, therefore avoiding results in locations that are not of interest to the user. Good options to use as generator locations are, for instance, the location of the stores and competitors, or a collection of POIs that are known to drive commercial activity to an area (see the sketch after this list).

  • aoi_query: STRING query with the geography of the area of interest in which to perform the search. May be NULL, in which case no spatial filter will be applied.

  • minimum_revenue: FLOAT64 the minimum revenue to filter results by. May be NULL, in which case no revenue threshold will be applied.

  • max_results: INT64 the maximum number of results, ordered by decreasing predicted revenue. May be NULL, in which case all eligible cells are returned.

  • with_own_stores: BOOL specifying whether to consider cells that already have own stores in them. If NULL, defaults to TRUE.

  • with_competitors: BOOL specifying whether to consider cells that already have competitors in them. If NULL, defaults to TRUE.
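
As noted for the generator_query parameter above, the stores and competitors used earlier in the workflow are good generator locations. A minimal sketch of such a query, assuming the example input tables from BUILD_REVENUE_MODEL_DATA:

-- Generator points: union of own stores and competitors
SELECT geom FROM `<project>.<dataset>.input_stores_data`
UNION ALL
SELECT geom FROM `<project>.<dataset>.input_competitors_data`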

Output

The procedure will output a table of cells with the following columns:

  • index: identifying the H3 or Quadbin cell.

  • predicted_revenue_avg: average revenue of an additional store located in the grid cell.

  • store_count: number of own stores present in the grid cell.

  • competitor_count: number of competitors present in the grid cell.

Example

CALL `carto-un`.carto.FIND_WHITESPACE_AREAS(
    '<my-project>.<my-dataset>.<output-prefix>_model',
    '<my-project>.<my-dataset>.<output-prefix>_model_data',
    'SELECT geom FROM `<my-project>.<my-dataset>.<generator-table>`',
    'SELECT geom FROM `<my-project>.<my-dataset>.<area_of_interest_table>`', -- Area of Interest filter
    10000, -- Minimum predicted revenue filter
    5, -- Maximum number of results
    TRUE, -- Whether to include cells with own stores
    FALSE -- Whether to include cells with competitors
);

COMMERCIAL_HOTSPOTS

COMMERCIAL_HOTSPOTS(input, output, index_column, index_type, variable_columns, variable_weights, kring, pvalue_thresh)

Description

This procedure identifies hotspot areas, i.e., cells with unusually high values of the selected variables, by computing a combined Getis-Ord Gi* statistic that takes into account the values of the neighboring cells within the specified k-ring.
Input parameters

  • input: STRING name of the table containing the input data. It should include project and dataset, i.e., follow the format <project-id>.<dataset-id>.<table-name>.

  • output: STRING name of the table where the output data will be stored. It should include project and dataset, i.e., follow the format <project-id>.<dataset-id>.<table-name>. If NULL, the procedure will return the output but it will not be persisted.

  • index_column: STRING name of the column containing the H3 or Quadbin indexes.

  • index_type: STRING type of the input cell indexes. Supported values are 'h3', or 'quadbin'.

  • variable_columns: ARRAY<STRING> names of the columns containing the variables to take into account when computing the combined Gi* statistic.

  • variable_weights: ARRAY<FLOAT64> containing the weights associated with each of the variables. These weights can take any value but will be normalized to sum up to 1. If NULL, uniform weights will be considered

  • kring: INT64 size of the k-ring (distance from the origin). This defines the area around each cell that will be taken into account to compute its Gi* statistic.

  • pvalue_thresh: Threshold for the Gi* value significance, ranging from 0 (most significant) to 1 (least significant). It defaults to 0.05. Cells with a p-value above this threshold won't be returned.

Output

The output will contain the following columns:

  • index: STRING containing the cell index.

  • combined_gi: FLOAT64 with the resulting combined Gi*.

  • p_value: FLOAT64 with the p-value associated with the combined Gi* statistic.

If the output table is not specified when calling the procedure, the result will be returned but it won't be persisted.
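
Since the output columns are documented above, the most significant hotspots can be ranked with a simple query. A minimal sketch, assuming the output table of the examples below:

-- Strongest hotspots first (all returned cells already satisfy the p-value threshold)
SELECT index, combined_gi, p_value
FROM `project_id.dataset_id.my_output_table`
ORDER BY combined_gi DESC
LIMIT 20;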

Examples

CALL `carto-un`.carto.COMMERCIAL_HOTSPOTS(
    'project_id.dataset_id.my_input_table',
    'project_id.dataset_id.my_output_table',
    'index',
    'h3',
    ['feature_0', 'feature_1'],
    [0.7, 0.3],
     3,
     0.01
)
-- Table project_id.dataset_id.my_output_table will be created.
-- with columns: index, combined_gi, p_value
CALL `carto-un`.carto.COMMERCIAL_HOTSPOTS(
    'project_id.dataset_id.my_input_table',
    'project_id.dataset_id.my_output_table',
    'index',
    'quadbin',
    ['feature_0', 'feature_1'],
    [0.5, 0.5],
    1,
    0.05
);
-- Table project_id.dataset_id.my_output_table will be created.
-- with columns: index, combined_gi, p_value

BUILD_CANNIBALIZATION_DATA

BUILD_CANNIBALIZATION_DATA(grid_type, store_query, resolution, method, do_variables, do_urbanity_index, do_source, output_destination, output_prefix, options)

Description

This procedure is the first step of the Cannibalization analysis workflow. It prepares the data needed to quantify the potential overlap between store trade areas by performing the following steps:
  1. For each store location, the urbanity level based on CARTO Spatial Features dataset is retrieved.

  2. For each store location, given the radius specified, the cells of the influence area are found.

  3. All cells are enriched with the specified features from Data Observatory subscriptions (e.g. population, footfall, etc.).

  4. A table with store_id, cell_id, and the feature values is created.

Input parameters

  • grid_type: STRING type of the cell grid. Supported values are h3 and quadbin.

  • store_query: STRING query with variables related to the stores to be used in the model, including their id and location. It must contain the columns store_id (store unique id) and geom (the geographical point of the store). Optionally it can contain a third column custom_geom which is the custom Polygon that represents the trade area for each location. It is used only when method is set to custom. The values of these columns cannot be NULL.

  • method: STRING indicates the method of trade area generation. Four options are available: buffer, kring, isoline and custom. This method applies to all locations provided.

  • do_urbanity_index: STRING|NULL urbanity index variable slug_id in a CARTO Spatial Features subscription from the Data Observatory. If set to NULL, the urbanity is not considered and only one distance (the first one in the options argument) is taken into account.

  • do_source: STRING name of the location where the Data Observatory subscriptions of the user are stored, in <my-dataobs-project>.<my-dataobs-dataset> format. If only the <my-dataobs-dataset> is included, it uses the project carto-data by default. It can be set to NULL or ''.

  • output_destination: STRING destination dataset for the output tables. It must contain the project and dataset. For example <my-project>.<my-dataset>.

  • output_prefix: STRING prefix for the output table.

Output

This procedure will output one table:

  • Table containing the store_id, cell_id, distance from store_id (integer), the values for each Data Observatory feature, the method type and the parameters for each method. The output table can be found at the output destination with the name <output-prefix>_output. Overall path <my-project>.<my-dataset>.<output-prefix>_output.
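
A quick way to verify the result is to check how many influence-area cells were assigned to each store. A minimal sketch, assuming the example output destination and prefix used below:

-- Number of influence-area cells per store
SELECT store_id, COUNT(*) AS cell_count
FROM `<my-project>.<my-dataset>.<output-prefix>_output`
GROUP BY store_id
ORDER BY cell_count DESC;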

Examples

CALL `carto-un`.carto.BUILD_CANNIBALIZATION_DATA(
    --grid_type
    'h3',
    --store_query
    '''SELECT store_id, geom FROM `<project>.<dataset>.<table_name_with_stores>`''',
    --resolution
    8,
    --method,
    'buffer',
    --do_variables
    [('population_f5b8d177','sum')],
    --do_urbanity_index
    'urbanity_e1a58891',
    --do_source
    '<my-dataobs-project>.<my-dataobs-dataset>',
    --output_destination
    '<my-project>.<my-dataset>',
    --output_prefix
    '<output-prefix>',
    --options
    '{"distances":[5., 3., 1.]}'
);
CALL `carto-un`.carto.BUILD_CANNIBALIZATION_DATA(
    --grid_type
    'h3',
    --store_query
    '''SELECT store_id, geom FROM `<project>.<dataset>.<table_name_with_stores>`''',
    --resolution
    8,
    --method,
    'kring',
    --do_variables
    [('population_f5b8d177','sum')],
    --do_urbanity_index
    'urbanity_e1a58891',
    --do_source
    '<my-dataobs-project>.<my-dataobs-dataset>',
    --output_destination
    '<my-project>.<my-dataset>',
    --output_prefix
    '<output-prefix>',
    --options
    '{"distances":[10, 5, 3]}',
);
CALL `carto-un`.carto.BUILD_CANNIBALIZATION_DATA(
    --grid_type
    'h3',
    --store_query
    '''SELECT store_id, geom FROM `<project>.<dataset>.<table_name_with_stores>`''',
    --resolution
    8,
    --method,
    'isoline',
    --do_variables
    [('population_f5b8d177','sum')],
    --do_urbanity_index
    'urbanity_e1a58891',
    --do_source
    '<my-dataobs-project>.<my-dataobs-dataset>',
    --output_destination
    '<my-project>.<my-dataset>',
    --output_prefix
    '<output-prefix>',
    --options
    '{"modes":[car, car, walk], "distances":[600, 600, 300], "api_base_url":"your url", "api_access_token":"your token"}'
);
CALL `carto-un`.carto.BUILD_CANNIBALIZATION_DATA(
    --grid_type
    'h3',
    --store_query
    '''SELECT store_id, geom, custom_geom FROM `<project>.<dataset>.<table_name_with_stores>`''',
    --resolution
    8,
    --method,
    'custom',
    --do_variables
    [('population_f5b8d177','sum')],
    --do_urbanity_index
    NULL,
    --do_source
    '<my-dataobs-project>.<my-dataobs-dataset>',
    --output_destination
    '<my-project>.<my-dataset>',
    --output_prefix
    '<output-prefix>',
    --options
    NULL
);

CANNIBALIZATION_OVERLAP

CANNIBALIZATION_OVERLAP(data_table, new_locations_query, method, do_urbanity_index, do_source, output_destination, output_prefix, options)

Description

This procedure is the second step of the Cannibalization analysis workflow. It computes, for each new store location, the overlap with existing stores in terms of area and of the Data Observatory features included in the analysis.
Input parameters

  • data_table: STRING Table with columns store_id, cell_id, distance from store_id (integer) and the values for each Data Observatory feature.

  • new_locations_query: STRING query with store_id and location of new stores. Optionally it can contain a third column custom_geom which is the custom Polygon that represents the trade area for each location. It is used only when method is set to custom.

  • method: STRING indicates the method of trade area generation. Four options are available: buffer, kring, isoline and custom. This method applies to all locations provided.

  • do_urbanity_index: STRING|NULL urbanity index variable name from the Data Observatory subscriptions. If set to NULL, the urbanity is not considered and only one distance (the first one in the options argument) is taken into account.

  • do_source: STRING name of the location where the Data Observatory subscriptions of the user are stored, in <my-dataobs-project>.<my-dataobs-dataset> format. If only the <my-dataobs-dataset> is included, it uses the project carto-data by default. It can be set to NULL or ''.

  • output_destination: STRING destination dataset for the output tables. It must contain the project and dataset. For example <my-project>.<my-dataset>.

  • output_prefix: STRING The prefix for each table in the output destination.

Output

This procedure will output one table:

  • Output overlap table, which contains the store_id that receives the "cannibalization", the store_id that causes it, the area overlap and the feature overlap for each Data Observatory feature included in the analysis. The output table can be found at the output destination with the name <output-prefix>_output_overlap. Overall path <my-project>.<my-dataset>.<output-prefix>_output_overlap.

Examples

CALL `carto-un`.carto.CANNIBALIZATION_OVERLAP(
    --data_table
    '<my-project>.<my-dataset>.output_<suffix from step1>',
    --new_locations_query
     '''SELECT store_id, geom FROM `<project>.<dataset>.<table_name_with_new_stores>`''',
    --method,
    'buffer',
    --do_urbanity_index
    'urbanity_e1a58891',
    --do_source
    '<my-dataobs-project>.<my-dataobs-dataset>',
    --output_destination
    '<my-project>.<my-dataset>',
    --output_prefix
    '<output-prefix>',
    --options
    '{"distances":[5.0, 3.0, 1.0]}'
);
CALL `carto-un`.carto.CANNIBALIZATION_OVERLAP(
    --data_table
    '<my-project>.<my-dataset>.output_<suffix from step1>',
    --new_locations_query
     '''SELECT store_id, geom FROM `<project>.<dataset>.<table_name_with_new_stores>`''',
    --method,
    'kring',
    --do_urbanity_index
    'urbanity_e1a58891',
    --do_source
    '<my-dataobs-project>.<my-dataobs-dataset>',
    --output_destination
    '<my-project>.<my-dataset>',
    --output_prefix
    '<output-prefix>',
    --options
    '{"distances":[10, 5, 3]}'
);
CALL `carto-un`.carto.CANNIBALIZATION_OVERLAP(
    --data_table
    '<my-project>.<my-dataset>.output_<suffix from step1>',
    --new_locations_query
     '''SELECT store_id, geom FROM `<project>.<dataset>.<table_name_with_new_stores>`''',
    --method,
    'isoline',
    --do_urbanity_index
    'urbanity_e1a58891',
    --do_source
    '<my-dataobs-project>.<my-dataobs-dataset>',
    --output_destination
    '<my-project>.<my-dataset>',
    --output_prefix
    '<output-prefix>',
    --options
    '{"modes":[car, car, walk], "distances":[600, 600, 300], "api_base_url":"your url", "api_access_token":"your token"}'
);
CALL `carto-un`.carto.CANNIBALIZATION_OVERLAP(
    --data_table
    '<my-project>.<my-dataset>.output_<suffix from step1>',
    --new_locations_query
     '''SELECT store_id, geom, custom_geom FROM `<project>.<dataset>.<table_name_with_new_stores>`''',
    --method,
    'custom',
    --do_urbanity_index
    'urbanity_e1a58891',
    --do_source
    '<my-dataobs-project>.<my-dataobs-dataset>',
    --output_destination
    '<my-project>.<my-dataset>',
    --output_prefix
    '<output-prefix>',
    --options
    NULL
);

BUILD_TWIN_AREAS_MODEL

BUILD_TWIN_AREAS_MODEL(origin_query, target_query, index_column, output_prefix, options)

Description

This procedure is the first step of the Twin Areas analysis workflow. It builds the Principal Component Analysis (PCA) model, together with the transformed origin and target data, used in the subsequent FIND_TWIN_AREAS step.
Input parameters

  • origin_query: STRING the query or the fully qualified name of the table containing the origin cells data.

  • target_query: STRING the query or the fully qualified name of the table containing the target cells data.

  • index_column: STRING the name of the index column.

  • output_prefix: STRING destination prefix for the output tables. It must contain the project, dataset and prefix. For example <my-project>.<my-dataset>.<output-prefix>.

  • options: STRING the JSON string containing the available options, described below:

    • categorical_variables: ARRAY<STRING> The array containing the names of the categorical (a.k.a. qualitative) columns.

    • NUM_PRINCIPAL_COMPONENTS, PCA_EXPLAINED_VARIANCE_RATIO, PCA_SOLVER: options of the underlying BigQuery ML PCA model, set through the model_options object as shown in the examples below.

Return type

The procedure will output the following:

  • Target data table: contains the transformed data for the data in the target cells, that will be used to create the PCA model. The name of the table includes the suffix _target_data, for example <my-project>.<my-dataset>.<output-prefix>_target_data.

  • Origin data table: contains the transformed data for the data in the origin cells. The name of the table includes the suffix _origin_data, for example <my-project>.<my-dataset>.<output-prefix>_origin_data.

  • The PCA model. The model name includes the suffix _model, for example <my-project>.<my-dataset>.<output-prefix>_model.
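
Once created, the retained principal components of the PCA model can be inspected with BigQuery ML's ML.PRINCIPAL_COMPONENT_INFO. A minimal sketch, assuming the model produced with the output prefix used above:

-- Explained variance of each retained principal component of the twin areas PCA model
SELECT *
FROM ML.PRINCIPAL_COMPONENT_INFO(MODEL `<my-project>.<my-dataset>.<output-prefix>_model`);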

Examples

CALL `carto-un`.carto.BUILD_TWIN_AREAS_MODEL
(
    '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_w_mixed_quadbin`''',
    '''SELECT * FROM `cartobq.docs.twin_areas_target_enriched_w_mixed_quadbin`''',
    'quadbin',
    '<my-project>.<my-dataset>.<my-prefix>',
    '''{
        "categorical_variables":["urbanity_66f7575b_max"],
        "model_options":{
            "NUM_PRINCIPAL_COMPONENTS":3
        }
    }'''
)
-- Tables `<my-project>.<my-dataset>.<my-prefix>_target_data` and `<my-project>.<my-dataset>.<my-prefix>_origin_data` and model `<my-project>.<my-dataset>.<my-prefix>_model` will be created
CALL `carto-un-eu`.carto.BUILD_TWIN_AREAS_MODEL
(
    '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_w_mixed_quadbin`''',
    '''SELECT * FROM `cartobq.docs.twin_areas_target_enriched_w_mixed_quadbin`''',
    'quadbin',
    '<my-project>.<my-dataset>.<my-prefix>',
    '''{
        "categorical_variables":["urbanity_66f7575b_max"],
        "model_options":{
            "NUM_PRINCIPAL_COMPONENTS":3
        }
    }'''
)
-- Tables `<my-project>.<my-dataset>.<my-prefix>_target_data` and `<my-project>.<my-dataset>.<my-prefix>_origin_data` and model `<my-project>.<my-dataset>.<my-prefix>_model` will be created
CALL carto.BUILD_TWIN_AREAS_MODEL
(
    '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_w_mixed_quadbin`''',
    '''SELECT * FROM `cartobq.docs.twin_areas_target_enriched_w_mixed_quadbin`''',
    'quadbin',
    '<my-project>.<my-dataset>.<my-prefix>',
    '''{
        "categorical_variables":["urbanity_66f7575b_max"],
        "model_options":{
            "NUM_PRINCIPAL_COMPONENTS":3
        }
    }'''
)
-- Tables `<my-project>.<my-dataset>.<my-prefix>_target_data` and `<my-project>.<my-dataset>.<my-prefix>_origin_data` and model `<my-project>.<my-dataset>.<my-prefix>_model` will be created
CALL `carto-un`.carto.BUILD_TWIN_AREAS_MODEL
(
    '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_w_mixed_h3` WHERE urbanity_66f7575b_max = "rural"''',
    '''SELECT * FROM `cartobq.docs.twin_areas_target_enriched_w_mixed_h3`''',
    'h3',
    '<my-project>.<my-dataset>.<my-prefix>',
    '''{
        "categorical_variables":["urbanity_66f7575b_max"],
        "model_options":{
            "NUM_PRINCIPAL_COMPONENTS":3
        }
    }'''
)
-- Tables `<my-project>.<my-dataset>.<my-prefix>_target_data` and `<my-project>.<my-dataset>.<my-prefix>_origin_data` and model `<my-project>.<my-dataset>.<my-prefix>_model` will be created
CALL `carto-un-eu`.carto.BUILD_TWIN_AREAS_MODEL
(
    '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_w_mixed_h3` WHERE urbanity_66f7575b_max = "rural"''',
    '''SELECT * FROM `cartobq.docs.twin_areas_target_enriched_w_mixed_h3`''',
    'h3',
    '<my-project>.<my-dataset>.<my-prefix>',
    '''{
        "categorical_variables":["urbanity_66f7575b_max"],
        "model_options":{
            "NUM_PRINCIPAL_COMPONENTS":3
        }
    }'''
)
-- Tables `<my-project>.<my-dataset>.<my-prefix>_target_data` and `<my-project>.<my-dataset>.<my-prefix>_origin_data` and model `<my-project>.<my-dataset>.<my-prefix>_model` will be created
CALL carto.BUILD_TWIN_AREAS_MODEL
(
    '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_w_mixed_h3` WHERE urbanity_66f7575b_max = "rural"''',
    '''SELECT * FROM `cartobq.docs.twin_areas_target_enriched_w_mixed_h3`''',
    'h3',
    '<my-project>.<my-dataset>.<my-prefix>',
    '''{
        "categorical_variables":["urbanity_66f7575b_max"],
        "model_options":{
            "NUM_PRINCIPAL_COMPONENTS":3
        }
    }'''
)
-- Tables `<my-project>.<my-dataset>.<my-prefix>_target_data` and `<my-project>.<my-dataset>.<my-prefix>_origin_data` and model `<my-project>.<my-dataset>.<my-prefix>_model` will be created

FIND_TWIN_AREAS

FIND_TWIN_AREAS(twin_areas_model, index_column, output_table, options)

Description

Procedure to obtain the twin areas for a given origin location in a target area. The procedure first computes a similarity score as the Euclidean distance between the principal component scores of the origin cell, derived with the Principal Component Analysis (PCA) model, and those of each of the cells in the target area, and then it derives a similarity skill score that is used to rank the results.

We recommend using the GRIDIFY_ENRICH procedure to prepare the data in the format expected by the BUILD_TWIN_AREAS_MODEL procedure, which must precede the use of this procedure.

Input parameters

  • twin_areas_model: STRING the fully qualified name of the Principal Component Analysis model based on the target data.

  • index_column: STRING the name of the index column.

  • output_table: STRING the fully qualified name of the output table. It must contain the project, dataset and table name: <my-project>.<my-dataset>.<my-table>.

  • options: STRING the JSON string containing the available options as described in the table below.

    Option
    Description

    origin_index

    STRING the index of the origin cell. Must be specified if origin_coords is not specified.

    origin_coords

    ARRAY<FLOAT64> the longitude and the latitude of the origin location. Must be specified if origin_index is not specified.

    max_results

    INT64 the maximum number of results to be returned. If set to NULL, or if it is greater than the number of target cells with a positive similarity skill score, only the cells with a positive score are returned.

Return type

A table containing in each row the index of the target cell (index_column) and its associated similarity_score and similarity_skill_score. The similarity_score corresponds to the distance between the origin and target cell in the Principal Component (PC) Scores space; the similarity_skill_score for a given target cell *t* is computed as 1 - similarity_score(*t*) / similarity_score(<*t*>), where <*t*> is the average target cell, computed by averaging each retained PC score over all the target cells. This similarity_skill_score is a relative measure: the score is positive if and only if the target cell is more similar to the origin than this average target cell, with a score of 1 meaning a perfect match (zero distance). Therefore, a target cell with a larger score is more similar to the origin under this scoring rule. Only the cells in the target area for which the similarity skill score is positive are returned.
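
Since only target cells with a positive similarity skill score are returned, the best candidate twin areas can be ranked directly from the output table. A minimal sketch, assuming a Quadbin index column and the output table name used in the examples below:

-- Top 10 twin areas, most similar to the origin first
SELECT quadbin, similarity_score, similarity_skill_score
FROM `<my-project>.<my-dataset>.<my-table>`
ORDER BY similarity_skill_score DESC
LIMIT 10;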

Examples

In the following query, the index of the cell containing the origin location is specified:

CALL `carto-un`.carto.FIND_TWIN_AREAS
(
    'cartobq.docs.twin_areas_test_model',
    'quadbin',
    '<my-project>.<my-dataset>.<my-table>',
    '''{
        "origin_index":"5256404338758123519",
        "max_results":10
    }'''
)
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: quadbin, similarity_score, similarity_skill_score
CALL `carto-un-eu`.carto.FIND_TWIN_AREAS
(
    'cartobq.docs.twin_areas_test_model',
    'quadbin',
    '<my-project>.<my-dataset>.<my-table>',
    '''{
        "origin_index":"5256404338758123519",
        "max_results":10
    }'''
)
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: quadbin, similarity_score, similarity_skill_score
CALL carto.FIND_TWIN_AREAS
(
    'cartobq.docs.twin_areas_test_model',
    'quadbin',
    '<my-project>.<my-dataset>.<my-table>',
    '''{
        "origin_index":"5256404338758123519",
        "max_results":10
    }'''
)
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: quadbin, similarity_score, similarity_skill_score

Here, instead, the longitude and the latitude of the origin location are specified:

CALL `carto-un`.carto.FIND_TWIN_AREAS
(
    'cartobq.docs.twin_areas_test_h3_model',
    'h3',
    '<my-project>.<my-dataset>.<my-table>',
    '''{
        "origin_coords":[-91.416, 41.915]
    }'''
)
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: h3, similarity_score, similarity_skill_score
CALL `carto-un-eu`.carto.FIND_TWIN_AREAS
(
    'cartobq.docs.twin_areas_test_h3_model',
    'h3',
    '<my-project>.<my-dataset>.<my-table>',
    '''{
        "origin_coords":[-91.416, 41.915]
    }'''
)
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: h3, similarity_score, similarity_skill_score
CALL carto.FIND_TWIN_AREAS
(
    'cartobq.docs.twin_areas_test_h3_model',
    'h3',
    '<my-project>.<my-dataset>.<my-table>',
    '''{
        "origin_coords":[-91.416, 41.915]
    }'''
)
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: h3, similarity_score, similarity_skill_score

Additional examples

  • Find Twin Areas of top-performing stores

FIND_TWIN_AREAS_WEIGHTED

FIND_TWIN_AREAS_WEIGHTED(origin_query, target_query, index_column, output_table, options)

Description

Procedure to obtain the twin areas for a given origin location in a target area. The function is similar to FIND_TWIN_AREAS, where the full description of the method, based on Principal Component Analysis (PCA), can be found. Here, however, no PCA is performed; instead, the user can specify weights for the features and check the similarities between the origin and the target area. The sum of the weights must be less than or equal to 1, and not all of them need to be defined: the undefined features share the remaining 1 - sum(weights) evenly. When weights are provided, no PCA takes place and the features are standardized.

The output twin areas are those of the target area considered to be the most similar to the origin location, based on the values of a set of variables. Only variables with numerical values are supported. Both origin and target areas should be provided in grid format (H3 or Quadbin) at the same resolution. We recommend using the GRIDIFY_ENRICH procedure to prepare the data in the format expected by this procedure.

Input parameters

  • origin_query: STRING the query or the fully qualified name of the table containing the origin cells data.

  • target_query: STRING the query or the fully qualified name of the table containing the target cells data.

  • index_column: STRING the name of the index column.

  • output_table: STRING the fully qualified name of the output table. It must contain the project, dataset and table name: <my-project>.<my-dataset>.<my-table>.

  • options: STRING the JSON string containing the available options as described in the table below.

    Option
    Description

    max_results

    INT64 the maximum number of twin areas returned. If set to NULL, all target cells are returned.

    weights

    ARRAY<STRUCT<name STRING, value FLOAT64>> the weights for the features. If set to NULL, all features are weighted equally. This parameter is only taken into account if it contains at least one weight. The sum of the weights must be less than or equal to 1. If fewer weights than the number of features are provided, the remaining 1 - sum(weights) is distributed evenly among the undefined features.
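
As a worked illustration of the redistribution rule (the feature counts and weights below are hypothetical): with four features and explicit weights for two of them summing to 0.6, each of the two undefined features receives (1 - 0.6) / 2 = 0.2.

-- Hypothetical case: 4 features, 2 explicit weights (0.35 + 0.25 = 0.6);
-- the remaining weight is split evenly across the 2 undefined features.
SELECT (1 - (0.35 + 0.25)) / 2 AS weight_per_undefined_feature;  -- 0.2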

Output

The procedure outputs a table containing in each row the index of the target cell (index_column) and its associated similarity_score and similarity_skill_score. The similarity_score corresponds to the distance between the origin and target cell, taking into account the user-defined weights; the similarity_skill_score for a given target cell *t* is computed as 1 - similarity_score(*t*) / similarity_score(<*t*>), where <*t*> is the average target cell, computed by averaging each feature over all the target cells. This similarity_skill_score is a relative measure: the score is positive if and only if the target cell is more similar to the origin than this average target cell, with a score of 1 meaning a perfect match (zero distance). Therefore, a target cell with a larger score is more similar to the origin under this scoring rule.
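
Unlike FIND_TWIN_AREAS, the output may also include cells with a non-positive similarity skill score (for instance when max_results is NULL and all target cells are returned), so it can be useful to keep only the cells that are more similar to the origin than the average target cell. A minimal sketch, assuming the output table name used in the examples below:

-- Keep only target cells that are more similar to the origin than the average target cell
SELECT quadbin, similarity_score, similarity_skill_score
FROM `<my-project>.<my-dataset>.<my-table>`
WHERE similarity_skill_score > 0
ORDER BY similarity_skill_score DESC;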

Examples

In this example, we are using default equal weights:

CALL `carto-un`.carto.FIND_TWIN_AREAS_WEIGHTED(
  -- Input queries
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin` LIMIT 1''',
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin`''',
  -- Index column, output table and options (NULL keeps the default equal weights)
  'quadbin',
  '<my-project>.<my-dataset>.<my-table>',
  NULL
);
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: quadbin, similarity_score, similarity_skill_score
CALL `carto-un-eu`.carto.FIND_TWIN_AREAS_WEIGHTED(
  -- Input queries
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin` LIMIT 1''',
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin`''',
  -- Index column, output table and options (NULL keeps the default equal weights)
  'quadbin',
  '<my-project>.<my-dataset>.<my-table>',
  NULL
);
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: quadbin, similarity_score, similarity_skill_score
CALL carto.FIND_TWIN_AREAS_WEIGHTED(
  -- Input queries
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin` LIMIT 1''',
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin`''',
  -- Index column, output table and options (NULL keeps the default equal weights)
  'quadbin',
  '<my-project>.<my-dataset>.<my-table>',
  NULL
);
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: quadbin, similarity_score, similarity_skill_score

In this example, we are using user-specified weights:

CALL `carto-un`.carto.FIND_TWIN_AREAS_WEIGHTED(
  -- Input queries
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin`''',
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin`''',
  'quadbin',
  '<my-project>.<my-dataset>.<my-table>',
  '''{
    "origin_coords":[-91.4117431, 41.9145412],
    "weights":
      {
        "total_pop_3cf008b3_sum":0.35,
        "households_58bf2ab1_sum":0.35,
        "count_qualified_count":0.35
      }
    }'''
);
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: quadbin, similarity_score, similarity_skill_score
CALL `carto-un-eu`.carto.FIND_TWIN_AREAS_WEIGHTED(
  -- Input queries
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin`''',
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin`''',
  'quadbin',
  '<my-project>.<my-dataset>.<my-table>',
  '''{
    "origin_coords":[-91.4117431, 41.9145412],
    "weights":
      {
        "total_pop_3cf008b3_sum":0.35,
        "households_58bf2ab1_sum":0.35,
        "count_qualified_count":0.35
      }
    }'''
);
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: quadbin, similarity_score, similarity_skill_score
CALL carto.FIND_TWIN_AREAS_WEIGHTED(
  -- Input queries
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin`''',
  '''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched_quadbin`''',
  'quadbin',
  '<my-project>.<my-dataset>.<my-table>',
  '''{
    "origin_coords":[-91.4117431, 41.9145412],
    "weights":
      {
        "total_pop_3cf008b3_sum":0.35,
        "households_58bf2ab1_sum":0.35,
        "count_qualified_count":0.35
      }
    }'''
);
-- Table `<my-project>.<my-dataset>.<my-table>` will be created
-- with the columns: quadbin, similarity_score, similarity_skill_score

This procedure is the second step of the Revenue Prediction analysis workflow. It creates the model and its description tables from the input model data (output of the BUILD_REVENUE_MODEL_DATA procedure). It performs the following steps:

revenue_model_data: STRING table with the revenue model data generated with the BUILD_REVENUE_MODEL_DATA procedure.

If set to NULL or empty, it will use the default options for each allowed model type, as detailed below. Models currently supported are BOOSTED_TREE_REGRESSOR (DEFAULT), RANDOM_FOREST_REGRESSOR, and LINEAR_REG.

Check the model documentation for more information.

Shap table: contains a list of the features and their attribution to the model, computed with ML.GLOBAL_EXPLAIN. The name of the table includes the suffix _revenue_model_shap, for example <my-project>.<my-dataset>.<output-prefix>_revenue_model_shap.

Stats table: contains the model stats (mean_error, variance, etc.), computed with ML.EVALUATE. The name of the table includes the suffix _revenue_model_stats, for example <my-project>.<my-dataset>.<output-prefix>_revenue_model_stats.

To learn more about how to evaluate the results of your model through the concept of explainability, refer to this article (https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-xai-overview).
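
For instance, the per-feature attributions stored in the Shap table can be ranked to see which variables drive the revenue model. A minimal sketch, assuming the table name above and the feature/attribution columns produced by ML.GLOBAL_EXPLAIN:

-- Features ordered by their global attribution to the revenue model
SELECT feature, attribution
FROM `<my-project>.<my-dataset>.<output-prefix>_revenue_model_shap`
ORDER BY attribution DESC;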

This procedure is the third and final step of the Revenue Prediction analysis workflow. It predicts the average revenue of an additional store located in the specified grid cell. It requires as input the model data (output of the BUILD_REVENUE_MODEL_DATA procedure) and the trained model (output of the BUILD_REVENUE_MODEL procedure).

It requires as input the model data (output of the BUILD_REVENUE_MODEL_DATA procedure) and the trained model (output of the BUILD_REVENUE_MODEL procedure), as well as a query with points to use as generators for the area of applicability of the model, plus a series of optional filters.

A cell is eligible to be considered a whitespace if it complies with the filtering criteria (minimum revenue, presence of competitors, etc.) and is within the area of applicability of the revenue model provided.

This procedure is used to locate hotspot areas by calculating a combined statistic using a uniform kernel over several variables. The input data should be in either an H3 or Quadbin grid. The individual Getis-Ord Gi* statistics are combined using Stouffer's Z-score method, which also allows introducing individual weights, with the combined statistic following a standard normal distribution. The hotspots are identified as those cells with a positive combined Gi* statistic that is significant at the specified significance level, i.e. whose p-value is below the p-value threshold (pvalue_thresh) set by the user.
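
As a worked illustration of Stouffer's method with weights (the z-scores and weights below are hypothetical): two Gi* z-scores of 2.1 and 1.3 with weights 0.6 and 0.4 are combined into a single z-score that, under the null hypothesis, still follows a standard normal distribution.

-- Weighted Stouffer combination: sum(w_i * z_i) / sqrt(sum(w_i^2))
SELECT (0.6 * 2.1 + 0.4 * 1.3) / SQRT(0.6 * 0.6 + 0.4 * 0.4) AS combined_gi_zscore;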

This procedure is the first of two from the Cannibalization analysis workflow. It builds the dataset for the existing locations to be used by the CANNIBALIZATION_OVERLAP procedure to estimate the overlap between existing stores and the potentially new ones.

For the isoline method, the use of this procedure requires providing authorization credentials. Two parameters are needed: api_base_url and api_access_token. Both the API base URL and your API access token can be accessed through the developers section of the CARTO user interface. Please check our documentation for Developers for more details.

resolution: INT64 level or resolution of the cell grid. Check the available H3 levels and Quadbin levels.

do_variables: ARRAY<STRUCT<variable STRING, aggregation STRING>> variables of the Data Observatory that will be used to enrich the grid cells and therefore compute the overlap between store locations in the subsequent step of the Cannibalization workflow. For each variable, its slug and the aggregation method must be provided. Use 'default' to use the variable's default aggregation method. Valid aggregation methods are: sum, avg, max, min, count. The DATAOBS_SUBSCRIPTION_VARIABLES catalog procedure can be used to find the available variables, their slugs and default aggregation. It can be set to NULL.
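
The expected shape of each do_variables entry can be previewed as a plain BigQuery array of structs. A minimal sketch (the variable slug below is hypothetical):

-- One Data Observatory variable, aggregated with sum
SELECT [STRUCT('population_93405ad7' AS variable, 'sum' AS aggregation)] AS do_variables;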

options: JSON A JSON string containing the required parameters for the specified method. For buffer: {distances: [radius in km for Remote/Rural/Low_density_urban, radius in km for Medium_density_urban, radius in km for High/Very_High_density_urban locations] - ARRAY<FLOAT64>}. For kring: {distances: [number of layers for Remote/Rural/Low_density_urban, number of layers for Medium_density_urban, number of layers for High/Very_High_density_urban locations] - ARRAY<INT64>}. For isoline: {mode: type of transport, supported: 'walk', 'car' - [type for Remote/Rural/Low_density_urban, type for Medium_density_urban, type for High/Very_High_density_urban locations] - ARRAY<STRING>, time: range of the isoline in seconds - [seconds for Remote/Rural/Low_density_urban, seconds for Medium_density_urban, seconds for High/Very_High_density_urban locations] - ARRAY<INT64>, api_base_url: URL of the API where the customer account is stored - STRING, api_access_token: an API Access Token that is allowed to use the LDS API - STRING}.

This procedure is the second step of the Cannibalization analysis workflow. It takes as input the generated table from BUILD_CANNIBALIZATION_DATA and the location of the new store, and estimates the overlap of areas and spatial features that the new store would have with the existing stores included in the generated table.

For the isoline method, the use of this procedure requires providing authorization credentials. Two parameters are needed: api_base_url and api_access_token. Both the API base URL and your API access token can be accessed through the developers section of the CARTO user interface. Please check our documentation for Developers for more details.

options: JSON A JSON string containing the required parameters for the specified method. For buffer: {distances: [radius in km for Remote/Rural/Low_density_urban, radius in km for Medium_density_urban, radius in km for High/Very_High_density_urban locations] - ARRAY<FLOAT64>}. For kring: {distances: [number of layers for Remote/Rural/Low_density_urban, number of layers for Medium_density_urban, number of layers for High/Very_High_density_urban locations] - ARRAY<INT64>}. For isoline: {mode: type of transport, supported: 'walk', 'car' - [type for Remote/Rural/Low_density_urban, type for Medium_density_urban, type for High/Very_High_density_urban locations] - ARRAY<STRING>, time: range of the isoline in seconds - [seconds for Remote/Rural/Low_density_urban, seconds for Medium_density_urban, seconds for High/Very_High_density_urban locations] - ARRAY<INT64>, api_base_url: URL of the API where the customer account is stored - STRING, api_access_token: an API Access Token that is allowed to use the LDS API - STRING}.
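
For reference, a hedged sketch of a CANNIBALIZATION_OVERLAP call using the buffer method, following the same argument order as the examples above; the method name is taken from the options description and the distances (radii in km per urbanity group) are purely illustrative:

CALL `carto-un`.carto.CANNIBALIZATION_OVERLAP(
    --data_table
    '<my-project>.<my-dataset>.output_<suffix from step1>',
    --new_locations_query
    '''SELECT store_id, geom FROM `<project>.<dataset>.<table_name_with_new_stores>`''',
    --method
    'buffer',
    --do_urbanity_index
    'urbanity_e1a58891',
    --do_source
    '<my-dataobs-project>.<my-dataobs-dataset>',
    --output_destination
    '<my-project>.<my-dataset>',
    --output_prefix
    '<output-prefix>',
    --options (radii in km for Remote/Rural/Low, Medium and High/Very_High density urban locations)
    '{"distances":[3.0, 1.5, 0.5]}'
);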

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 960401.
