retail
BETA
This module contains procedures to perform analyses that solve specific retail analytics use cases, such as revenue prediction.
BUILD_REVENUE_MODEL_DATA
Description
This procedure is the first step of the Revenue Prediction analysis workflow. It prepares the model data to be used in the training and prediction phases by performing the following steps:
1. Polyfill the geometry from the area of interest using the grid type and resolution level.
2. Enrich the grid cells with the revenue, stores, Data Observatory (DO) variables and custom variables.
3. Apply a k-ring decay function to the enriched DO variables and custom variables. This operation smooths the features for a given cell by taking into account the values of these features in the neighboring cells (defined as those within the specified k-ring size), applying a scaling factor determined by the decay function of choice.
4. Create the `revenue_model_data` table (see the output description for more details).
5. Create the `revenue_model_data_stats` table (see the output description for more details).
Input parameters
- `stores_query` (`STRING`): query with variables related to the stores to be used in the model, including their revenue per store (required) and other variables (optional). It must contain the columns `revenue` (revenue of the store), `store` (store unique id) and `geom` (the geographical point of the store). The values of these columns cannot be `NULL`.
- `stores_variables` (`ARRAY<STRUCT<variable STRING, aggregation STRING>>`): list with the columns of the `stores_query` and their corresponding aggregation method (`sum`, `avg`, `max`, `min`, `count`) that will be used to enrich the grid cells. It can be set to `NULL`. The aggregations of the `revenue` (`avg`) and `store` (`count`) variables should not be included, as they are always performed.
- `competitors_query` (`STRING`): query with the competitors information to be used in the model. It must contain the columns `competitor` (competitor store unique id) and `geom` (the geographical point of the store).
- `aoi_query` (`STRING`): query with the geography of the area of interest. It must contain a column `geom` with a single area (Polygon or MultiPolygon).
- `grid_type` (`STRING`): type of the cell grid. Supported values are `h3` and `quadbin`.
- `grid_level` (`INT64`): level or resolution of the cell grid. Check the available H3 levels and Quadbin levels.
- `kring` (`INT64`): size of the k-ring where the decay function will be applied. This value can be 0, in which case no k-ring will be computed and the decay function won't be applied.
- `decay` (`STRING`): decay function. Supported values are `uniform`, `inverse`, `inverse_square` and `exponential`. If set to `NULL` or `''`, `uniform` is used by default.
- `do_variables` (`ARRAY<STRUCT<variable STRING, aggregation STRING>>`): variables of the Data Observatory that will be used to enrich the grid cells and therefore train the revenue prediction model in the subsequent step of the Revenue Prediction workflow. For each variable, its slug and the aggregation method must be provided. Use `default` to use the variable's default aggregation method. Valid aggregation methods are `sum`, `avg`, `max`, `min` and `count`. The catalog procedure `DATAOBS_SUBSCRIPTION_VARIABLES` can be used to find the available variables, their slugs and default aggregations. It can be set to `NULL`.
- `do_source` (`STRING`): name of the location where the Data Observatory subscriptions of the user are stored, in `<my-dataobs-project>.<my-dataobs-dataset>` format. If only the `<my-dataobs-dataset>` is included, the project `carto-data` is used by default. It can be set to `NULL` or `''`.
- `custom_variables` (`ARRAY<STRUCT<variable STRING, aggregation STRING>>`): list with the columns of the `custom_query` and their corresponding aggregation method (`sum`, `avg`, `max`, `min`, `count`) that will be used to enrich the grid cells. It can be set to `NULL`.
- `custom_query` (`STRING`): query that contains a geography column `geom` and the columns with the custom data that will be used to enrich the grid cells. It can be set to `NULL` or `''`.
- `output_prefix` (`STRING`): destination prefix for the output tables. It must contain the project, dataset and a prefix that will be prepended to each output table name. For example `<my-project>.<my-dataset>.<output-prefix>`.
Output
The procedure will output two tables:
- Model data table: contains an `index` column with the cell ids and all the enriched columns: `revenue_avg`, `store_count`, `competitor_count`, the `stores_variables` suffixed by their aggregation method, the DO variables and the custom variables. The name of the table includes the suffix `_revenue_model_data`, for example `<my-project>.<my-dataset>.<output-prefix>_revenue_model_data`.
- Model data stats table: contains the `morans_i` value computed for the `revenue_avg` column, using a k-ring of 1 and the `uniform` decay function. The name of the table includes the suffix `_revenue_model_data_stats`, for example `<my-project>.<my-dataset>.<output-prefix>_revenue_model_data_stats`.
Example
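A minimal sketch of a call, assuming the toolbox is installed under the `carto-un` project; all table names and the Data Observatory variable slug are placeholders:

```sql
-- Sketch only: project, table names and the DO slug below are placeholders.
CALL `carto-un`.carto.BUILD_REVENUE_MODEL_DATA(
    'SELECT revenue, store, geom FROM `my-project.my-dataset.stores`',   -- stores_query
    NULL,                                                                -- stores_variables
    'SELECT competitor, geom FROM `my-project.my-dataset.competitors`',  -- competitors_query
    'SELECT geom FROM `my-project.my-dataset.area_of_interest`',         -- aoi_query
    'h3',                                                                -- grid_type
    8,                                                                   -- grid_level
    1,                                                                   -- kring
    'uniform',                                                           -- decay
    [('population_xxxxxxxx', 'sum')],                                    -- do_variables (placeholder slug)
    'my-dataobs-project.my-dataobs-dataset',                             -- do_source
    NULL,                                                                -- custom_variables
    NULL,                                                                -- custom_query
    'my-project.my-dataset.my_prefix'                                    -- output_prefix
);
```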
BUILD_REVENUE_MODEL
Description
This procedure is the second step of the Revenue Prediction analysis workflow. It creates the model and its description tables from the input model data (output of the `BUILD_REVENUE_MODEL_DATA` procedure). It performs the following steps:
1. Compute the model from the input query and options.
2. Compute the `model_shap` and `model_stats` tables (see the output description for more details).
Input parameters
- `revenue_model_data` (`STRING`): table with the revenue model data generated with the `BUILD_REVENUE_MODEL_DATA` procedure.
- `options` (`STRING`): JSON string to overwrite the model default options. The following fixed options cannot be modified:
  - `ENABLE_GLOBAL_EXPLAIN: TRUE`
  - `INPUT_LABEL_COLS: ['revenue_avg']`

  If set to `NULL` or empty, the default options of the chosen model type are used, as detailed below. The models currently supported are `BOOSTED_TREE_REGRESSOR` (the default), `RANDOM_FOREST_REGRESSOR` and `LINEAR_REG`.
  - `BOOSTED_TREE_REGRESSOR`: `SUBSAMPLE: 0.85`, `EARLY_STOP: FALSE`, `MAX_ITERATIONS: 50`, `DATA_SPLIT_METHOD: NO_SPLIT`
  - `RANDOM_FOREST_REGRESSOR`: `COLSAMPLE_BYTREE: 0.5`, `DATA_SPLIT_METHOD: NO_SPLIT`
  - `LINEAR_REG`: `EARLY_STOP: FALSE`, `MAX_ITERATIONS: 50`, `DATA_SPLIT_METHOD: NO_SPLIT`

  Check the model documentation for more information.
- `output_prefix` (`STRING`): destination prefix for the output tables. It must contain the project, dataset and prefix. For example `<my-project>.<my-dataset>.<output-prefix>`.
Output
The procedure will output the following:
- Model: contains the trained model to be used for the revenue prediction. The name of the model includes the suffix `_revenue_model`, for example `<my-project>.<my-dataset>.<output-prefix>_revenue_model`.
- Shap table: contains a list of the features and their attribution to the model, computed with `ML.GLOBAL_EXPLAIN`. The name of the table includes the suffix `_revenue_model_shap`, for example `<my-project>.<my-dataset>.<output-prefix>_revenue_model_shap`.
- Stats table: contains the model stats (mean error, variance, etc.), computed with `ML.EVALUATE`. The name of the table includes the suffix `_revenue_model_stats`, for example `<my-project>.<my-dataset>.<output-prefix>_revenue_model_stats`.
To learn more about how to evaluate the results of your model through the concept of explainability, refer to this article (https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-xai-overview).
Example
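A sketch of a typical call, again assuming the `carto-un` prefix; the model data table is the output of the previous step and the options JSON is illustrative:

```sql
-- Sketch only: names are placeholders; options here simply bump MAX_ITERATIONS.
CALL `carto-un`.carto.BUILD_REVENUE_MODEL(
    'my-project.my-dataset.my_prefix_revenue_model_data',  -- revenue_model_data
    '{"MAX_ITERATIONS": 100}',                             -- options (NULL for defaults)
    'my-project.my-dataset.my_prefix'                      -- output_prefix
);
```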
PREDICT_REVENUE_AVERAGE
Description
This procedure is the third and final step of the Revenue Prediction analysis workflow. It predicts the average revenue of an additional store located in the specified grid cell. It requires as input the model data (output of the `BUILD_REVENUE_MODEL_DATA` procedure) and the trained model (output of the `BUILD_REVENUE_MODEL` procedure).
Input parameters
- `index` (`ANY TYPE`): cell index where the new store will be located. It can be an H3 or a Quadbin index. For Quadbin the value should be an `INT64`, whereas for H3 the value should be a `STRING`. It can also be `'ALL'`, in which case the predictions for all the grid cells of the model data are returned.
- `revenue_model` (`STRING`): the fully qualified `model` name.
- `revenue_model_data` (`STRING`): the fully qualified `revenue_model_data` table name.
- `candidate_data` (`STRING`): the fully qualified name of a 1-row table containing the values for any of the `stores_variables` to be aggregated with the current values in the grid cell(s) specified by the `index` parameter. If set to `NULL`, only the `store_count` variable is considered, with 1 store added in the grid cell(s) specified by the `index` parameter.
- `stores_variables` (`ARRAY<STRUCT<variable STRING, aggregation STRING>>`): list with the columns of the `stores_query` and their corresponding aggregation method (`sum`, `avg`, `max`, `min`, `count`) that will be used for prediction. It can be set to `NULL`.
Output
The procedure will output the `index`, the `predicted_revenue_avg` value in the cell (in the same units as the `revenue` column), and `shap_values`, an array of key-value pairs with the SHAP values of the features for each prediction. It also includes a `baseline_prediction`, which is the expected revenue without considering the impact of any other features.
Example
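A sketch of a single-cell prediction, assuming the `carto-un` prefix:

```sql
-- Sketch only: the H3 index and table/model names are placeholders.
CALL `carto-un`.carto.PREDICT_REVENUE_AVERAGE(
    '8718496d8ffffff',                                     -- index (STRING for H3; INT64 for Quadbin; or 'ALL')
    'my-project.my-dataset.my_prefix_revenue_model',       -- revenue_model
    'my-project.my-dataset.my_prefix_revenue_model_data',  -- revenue_model_data
    NULL,                                                  -- candidate_data
    NULL                                                   -- stores_variables
);
```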
FIND_WHITESPACE_AREAS
Description
This is a postprocessing step that may be used after completing a Revenue Prediction analysis workflow. It allows you to identify cells with the highest potential revenue (whitespaces), while satisfying a series of criteria (e.g. presence of competitors).
It requires as input the model data (output of the `BUILD_REVENUE_MODEL_DATA` procedure) and the trained model (output of the `BUILD_REVENUE_MODEL` procedure), as well as a query with points to use as generators for the area of applicability of the model, plus a series of optional filters.
A cell is eligible to be considered a whitespace if it complies with the filtering criteria (minimum revenue, presence of competitors, etc.) and is within the area of applicability of the revenue model provided.
Input parameters
- `revenue_model` (`STRING`): the fully qualified `model` name.
- `revenue_model_data` (`STRING`): the fully qualified `model_data` table name.
- `generator_query` (`STRING`): query with the locations of a set of generator points as a geography column named `geom`. The algorithm will look for whitespaces in the surroundings of these locations, therefore avoiding results in locations that are not of interest to the user. Good options to use as generator locations are, for instance, the locations of the stores and competitors, or a collection of POIs that are known to drive commercial activity to an area.
- `aoi_query` (`STRING`): query with the geography of the area of interest in which to perform the search. May be `NULL`, in which case no spatial filter will be applied.
- `minimum_revenue` (`FLOAT64`): the minimum revenue to filter results by. May be `NULL`, in which case no revenue threshold will be applied.
- `max_results` (`INT64`): the maximum number of results, ordered by decreasing predicted revenue. May be `NULL`, in which case all eligible cells are returned.
- `with_own_stores` (`BOOL`): whether to consider cells that already contain own stores. If `NULL`, defaults to `TRUE`.
- `with_competitors` (`BOOL`): whether to consider cells that already contain competitors. If `NULL`, defaults to `TRUE`.
Output
The procedure will output a table of cells with the following columns:
- `index`: identifies the H3 or Quadbin cell.
- `predicted_revenue_avg`: average revenue of an additional store located in the grid cell.
- `store_count`: number of own stores present in the grid cell.
- `competitor_count`: number of competitors present in the grid cell.
Example
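A sketch of a whitespace search, assuming the `carto-un` prefix; the revenue threshold and result limit are arbitrary illustrative values:

```sql
-- Sketch only: names and thresholds are placeholders.
CALL `carto-un`.carto.FIND_WHITESPACE_AREAS(
    'my-project.my-dataset.my_prefix_revenue_model',       -- revenue_model
    'my-project.my-dataset.my_prefix_revenue_model_data',  -- revenue_model_data
    'SELECT geom FROM `my-project.my-dataset.stores`',     -- generator_query
    NULL,                                                  -- aoi_query: no spatial filter
    10000.0,                                               -- minimum_revenue
    10,                                                    -- max_results
    FALSE,                                                 -- with_own_stores
    TRUE                                                   -- with_competitors
);
```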
COMMERCIAL_HOTSPOTS
Description
This procedure is used to locate hotspot areas by calculating a combined Getis-Ord Gi* statistic using a uniform kernel over several variables. The input data should be in either an H3 or Quadbin grid. The individual Gi* statistics are combined using Stouffer's Z-score method, which also allows individual weights to be introduced, with the combined statistic following a standard normal distribution. Hotspots are identified as those cells with a positive combined Gi* statistic that is significant at the specified significance level, i.e. whose p-value is below the p-value threshold (`pvalue_thresh`) set by the user.
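For reference, a sketch of the weighted form of Stouffer's method (the exact normalization used internally by the procedure is an assumption here):

$$
G^*_{\text{combined}} = \frac{\sum_{i=1}^{k} w_i \, G^*_i}{\sqrt{\sum_{i=1}^{k} w_i^2}}
$$

where \(G^*_i\) is the Gi* statistic of the \(i\)-th variable and \(w_i\) its normalized weight. Since each \(G^*_i\) is asymptotically standard normal, the combined statistic is too, which is what allows the p-value to be computed from the standard normal distribution.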
Input parameters
- `input` (`STRING`): name of the table containing the input data. It should include project and dataset, i.e., follow the format `<project-id>.<dataset-id>.<table-name>`.
- `output` (`STRING`): name of the table where the output data will be stored. It should include project and dataset, i.e., follow the format `<project-id>.<dataset-id>.<table-name>`. If `NULL`, the procedure will return the output but it will not be persisted.
- `index_column` (`STRING`): name of the column containing the H3 or Quadbin indexes.
- `index_type` (`STRING`): type of the input cell indexes. Supported values are `h3` and `quadbin`.
- `variable_columns` (`ARRAY<STRING>`): names of the columns containing the variables to take into account when computing the combined Gi* statistic.
- `variable_weights` (`ARRAY<FLOAT64>`): weights associated with each of the variables. These weights can take any value but will be normalized to sum up to 1. If `NULL`, uniform weights will be considered.
- `kring` (`INT64`): size of the k-ring (distance from the origin). This defines the area around each cell that will be taken into account to compute its Gi* statistic.
- `pvalue_thresh` (`FLOAT64`): threshold for the Gi* value significance, ranging from 0 (most significant) to 1 (least significant). It defaults to 0.05. Cells with a p-value above this threshold won't be returned.
Output
The output will contain the following columns:
- `index` (`STRING`): the cell index.
- `combined_gi` (`FLOAT64`): the resulting combined Gi* statistic.
- `p_value` (`FLOAT64`): the p-value associated with the combined Gi* statistic.
If the output table is not specified when calling the procedure, the result will be returned but it won't be persisted.
Examples
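A sketch of a call, assuming the `carto-un` prefix; the variable columns and weights are placeholders:

```sql
-- Sketch only: table and column names are placeholders.
CALL `carto-un`.carto.COMMERCIAL_HOTSPOTS(
    'my-project.my-dataset.my_input',
    'my-project.my-dataset.my_output',  -- or NULL to return results without persisting them
    'index',                            -- index_column
    'h3',                               -- index_type
    ['population', 'footfall'],         -- variable_columns
    [0.6, 0.4],                         -- variable_weights
    1,                                  -- kring
    0.05                                -- pvalue_thresh
);
```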
Additional examples
BUILD_CANNIBALIZATION_DATA
Description
This procedure is the first of two in the Cannibalization analysis workflow. It builds the dataset for the existing locations to be used by the procedure `CANNIBALIZATION_OVERLAP` to estimate the overlap between existing stores and potentially new ones. It performs the following steps:
- For each store location, the urbanity level is retrieved, based on the CARTO Spatial Features dataset.
- For each store location, given the specified radius, the cells of the influence area are found.
- All cells are enriched with the specified features from Data Observatory subscriptions (e.g. population, footfall, etc.).
- A table with `store_id`, `cell_id` and feature values is created.

For the `isoline` method, the use of this procedure requires providing authorization credentials. Two parameters are needed: `api_base_url` and `api_access_token`. Both the API base URL and your API access token can be accessed through the developers section of the CARTO user interface. Please check our documentation for developers for more details.
Input parameters
- `grid_type` (`STRING`): type of the cell grid. Supported values are `h3` and `quadbin`.
- `store_query` (`STRING`): query with variables related to the stores to be used in the model, including their id and location. It must contain the columns `store_id` (store unique id) and `geom` (the geographical point of the store). Optionally, it can contain a third column `custom_geom` with the custom Polygon that represents the trade area of each location; it is used only when `method` is set to `custom`. The values of these columns cannot be `NULL`.
- `resolution` (`INT64`): level or resolution of the cell grid. Check the available H3 levels and Quadbin levels.
- `method` (`STRING`): indicates the method of trade area generation. Four options are available: `buffer`, `kring`, `isoline` and `custom`. The method applies to all locations provided.
- `do_variables` (`ARRAY<STRUCT<variable STRING, aggregation STRING>>`): variables of the Data Observatory that will be used to enrich the grid cells and therefore compute the overlap between store locations in the subsequent step of the Cannibalization workflow. For each variable, its slug and the aggregation method must be provided. Use `default` to use the variable's default aggregation method. Valid aggregation methods are `sum`, `avg`, `max`, `min` and `count`. The catalog procedure `DATAOBS_SUBSCRIPTION_VARIABLES` can be used to find the available variables, their slugs and default aggregations. It can be set to `NULL`.
- `do_urbanity_index` (`STRING`|`NULL`): urbanity index variable slug_id in a CARTO Spatial Features subscription from the Data Observatory. If set to `NULL`, the urbanity is not considered and only one `distance` (the first one from the `options` argument) is taken into account.
- `do_source` (`STRING`): name of the location where the Data Observatory subscriptions of the user are stored, in `<my-dataobs-project>.<my-dataobs-dataset>` format. If only the `<my-dataobs-dataset>` is included, the project `carto-data` is used by default. It can be set to `NULL` or `''`.
- `output_destination` (`STRING`): destination dataset for the output tables. It must contain the project and dataset. For example `<my-project>.<my-dataset>`.
- `output_prefix` (`STRING`): prefix for the output table.
- `options` (`JSON`): a JSON string containing the required parameters for the specified method:
  - `buffer`: `{"distances": [...]}`, where `distances` (`ARRAY<FLOAT64>`) contains the radius in km for Remote/Rural/Low_density_urban, Medium_density_urban and High/Very_High_density_urban locations, in that order.
  - `kring`: `{"distances": [...]}`, where `distances` (`ARRAY<INT64>`) contains the number of layers for Remote/Rural/Low_density_urban, Medium_density_urban and High/Very_High_density_urban locations, in that order.
  - `isoline`: `{"mode": [...], "time": [...], "api_base_url": ..., "api_access_token": ...}`, where `mode` (`ARRAY<STRING>`) is the type of transport (supported: `'walk'`, `'car'`) for each of the three urbanity groups above, `time` (`ARRAY<INT64>`) is the range of the isoline in seconds for each of the three urbanity groups, `api_base_url` (`STRING`) is the URL of the API where the customer account is stored, and `api_access_token` (`STRING`) is an API Access Token that is allowed to use the LDS API.
Output
This procedure will output one table, containing the `store_id`, `cell_id`, distance from `store_id` (integer), the values of each Data Observatory feature, the method type and the parameters of the method. The output table can be found at the output destination with the name `<output-prefix>_output`, i.e. at the overall path `<my-project>.<my-dataset>.<output-prefix>_output`.
Examples
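A sketch of a call using the `buffer` method (which needs no API credentials), assuming the `carto-un` prefix; table names and Data Observatory slugs are placeholders:

```sql
-- Sketch only: names and slugs are placeholders.
CALL `carto-un`.carto.BUILD_CANNIBALIZATION_DATA(
    'h3',                                                         -- grid_type
    'SELECT store_id, geom FROM `my-project.my-dataset.stores`',  -- store_query
    8,                                                            -- resolution
    'buffer',                                                     -- method
    [('population_xxxxxxxx', 'sum')],                             -- do_variables (placeholder slug)
    'urbanity_xxxxxxxx',                                          -- do_urbanity_index (placeholder slug)
    'my-dataobs-project.my-dataobs-dataset',                      -- do_source
    'my-project.my-dataset',                                      -- output_destination
    'cannib',                                                     -- output_prefix
    JSON '{"distances": [3.0, 2.0, 1.0]}'                         -- options (km per urbanity group)
);
```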
CANNIBALIZATION_OVERLAP
Description
This procedure is the second step of the Cannibalization analysis workflow. It takes as input the table generated by `BUILD_CANNIBALIZATION_DATA` and the locations of the new stores, and estimates the overlap of areas and spatial features that each new store would have with the existing stores included in the generated table.

For the `isoline` method, the use of this procedure requires providing authorization credentials. Two parameters are needed: `api_base_url` and `api_access_token`. Both the API base URL and your API access token can be accessed through the developers section of the CARTO user interface. Please check our documentation for developers for more details.
Input parameters
- `data_table` (`STRING`): table with columns `store_id`, `cell_id`, distance from `store_id` (integer) and the values of each Data Observatory feature.
- `new_locations_query` (`STRING`): query with the `store_id` and location of the new stores. Optionally, it can contain a third column `custom_geom` with the custom Polygon that represents the trade area of each location; it is used only when `method` is set to `custom`.
- `method` (`STRING`): indicates the method of trade area generation. Four options are available: `buffer`, `kring`, `isoline` and `custom`. The method applies to all locations provided.
- `do_urbanity_index` (`STRING`|`NULL`): urbanity index variable name from the Data Observatory subscriptions. If set to `NULL`, the urbanity is not considered and only one `distance` (the first one from the `options` argument) is taken into account.
- `do_source` (`STRING`): name of the location where the Data Observatory subscriptions of the user are stored, in `<my-dataobs-project>.<my-dataobs-dataset>` format. If only the `<my-dataobs-dataset>` is included, the project `carto-data` is used by default. It can be set to `NULL` or `''`.
- `output_destination` (`STRING`): destination dataset for the output tables. It must contain the project and dataset. For example `<my-project>.<my-dataset>`.
- `output_prefix` (`STRING`): the prefix for each table in the output destination.
- `options` (`JSON`): a JSON string containing the required parameters for the specified method:
  - `buffer`: `{"distances": [...]}`, where `distances` (`ARRAY<FLOAT64>`) contains the radius in km for Remote/Rural/Low_density_urban, Medium_density_urban and High/Very_High_density_urban locations, in that order.
  - `kring`: `{"distances": [...]}`, where `distances` (`ARRAY<INT64>`) contains the number of layers for Remote/Rural/Low_density_urban, Medium_density_urban and High/Very_High_density_urban locations, in that order.
  - `isoline`: `{"mode": [...], "time": [...], "api_base_url": ..., "api_access_token": ...}`, where `mode` (`ARRAY<STRING>`) is the type of transport (supported: `'walk'`, `'car'`) for each of the three urbanity groups above, `time` (`ARRAY<INT64>`) is the range of the isoline in seconds for each of the three urbanity groups, `api_base_url` (`STRING`) is the URL of the API where the customer account is stored, and `api_access_token` (`STRING`) is an API Access Token that is allowed to use the LDS API.
Output
This procedure will output one overlap table, which contains the `store_id` that receives the "cannibalization", the `store_id` that causes it, the area overlap, and the feature overlap for each Data Observatory feature included in the analysis. The output table can be found at the output destination with the name `<output-prefix>_output_overlap`, i.e. at the overall path `<my-project>.<my-dataset>.<output-prefix>_output_overlap`.
Examples
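A sketch of a call, assuming the `carto-un` prefix; the data table is the output of `BUILD_CANNIBALIZATION_DATA` and all other names are placeholders:

```sql
-- Sketch only: names and slugs are placeholders.
CALL `carto-un`.carto.CANNIBALIZATION_OVERLAP(
    'my-project.my-dataset.cannib_output',                            -- data_table
    'SELECT store_id, geom FROM `my-project.my-dataset.new_stores`',  -- new_locations_query
    'buffer',                                                         -- method
    'urbanity_xxxxxxxx',                                              -- do_urbanity_index (placeholder slug)
    'my-dataobs-project.my-dataobs-dataset',                          -- do_source
    'my-project.my-dataset',                                          -- output_destination
    'cannib',                                                         -- output_prefix
    JSON '{"distances": [3.0, 2.0, 1.0]}'                             -- options
);
```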
BUILD_TWIN_AREAS_MODEL
Description
This procedure runs the first step of the Twin Areas analysis, which can be used to find the cells in a target area that are most similar (a.k.a. the twin areas) to an origin cell of interest, based on a set of input variables. For both the origin and the target cells, this procedure transforms the input data by standardizing the numerical variables and creating a standardized indicator matrix for the categorical variables, and then it creates a Principal Component Analysis (PCA) model using the processed target data as input. More details on the data processing can be found in the BUILD_PCAMIX_DATA procedure, which is used to return the transformed data.

Both the origin and target cells should be provided in grid format (Quadbin or H3) at the same resolution. We recommend using the GRIDIFY_ENRICH procedure to prepare the data in the format expected by this procedure.
Input parameters
- `origin_query` (`STRING`): the query or the fully qualified name of the table containing the origin cells data.
- `target_query` (`STRING`): the query or the fully qualified name of the table containing the target cells data.
- `index_column` (`STRING`): the name of the index column.
- `output_prefix` (`STRING`): destination prefix for the output tables. It must contain the project, dataset and prefix. For example `<my-project>.<my-dataset>.<output-prefix>`.
- `options` (`STRING`): the JSON string containing the available options as described in the table below.

| Option | Type | Description |
| --- | --- | --- |
| `categorical_variables` | `ARRAY<STRING>` | The array containing the names of the categorical (a.k.a. qualitative) columns |
| `NUM_PRINCIPAL_COMPONENTS` | `INT64` | Number of principal components to keep, as defined in the BigQuery ML CREATE MODEL statement for PCA models |
| `PCA_EXPLAINED_VARIANCE_RATIO` | `FLOAT64` | As defined in the BigQuery ML CREATE MODEL statement for PCA models |
| `PCA_SOLVER` | `STRING` | As defined in the BigQuery ML CREATE MODEL statement for PCA models |
Return type
The procedure will output the following:
- Target data table: contains the transformed data for the target cells, which will be used to create the PCA model. The name of the table includes the suffix `_target_data`, for example `<my-project>.<my-dataset>.<output-prefix>_target_data`.
- Origin data table: contains the transformed data for the origin cells. The name of the table includes the suffix `_origin_data`, for example `<my-project>.<my-dataset>.<output-prefix>_origin_data`.
- The PCA model: the model name includes the suffix `_model`, for example `<my-project>.<my-dataset>.<output-prefix>_model`.
Examples
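A sketch of a call, assuming the `carto-un` prefix; table names and the option value are placeholders:

```sql
-- Sketch only: table names and option values are placeholders.
CALL `carto-un`.carto.BUILD_TWIN_AREAS_MODEL(
    'SELECT * FROM `my-project.my-dataset.origin_enriched`',  -- origin_query
    'SELECT * FROM `my-project.my-dataset.target_enriched`',  -- target_query
    'index',                                                  -- index_column
    'my-project.my-dataset.twin_areas',                       -- output_prefix
    '{"NUM_PRINCIPAL_COMPONENTS": 5}'                         -- options
);
```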
FIND_TWIN_AREAS
Description
Procedure to obtain the twin areas for a given origin location in a target area. The procedure first computes a similarity score, defined as the Euclidean distance between the principal component scores derived with the BUILD_TWIN_AREAS_MODEL procedure for the origin cell and those of each of the cells in the target area, and then derives a similarity skill score that is used to rank the results.

We recommend using the GRIDIFY_ENRICH procedure to prepare the data in the format expected by the BUILD_TWIN_AREAS_MODEL procedure, which must precede the use of this procedure.
Input parameters
- `twin_areas_model` (`STRING`): the fully qualified name of the Principal Component Analysis model based on the target data.
- `index_column` (`STRING`): the name of the index column.
- `output_table` (`STRING`): the fully qualified name of the output table. It must contain the project, dataset and table name: `<my-project>.<my-dataset>.<my-table>`.
- `options` (`STRING`): the JSON string containing the available options as described in the table below.

| Option | Type | Description |
| --- | --- | --- |
| `origin_index` | `STRING` | The index of the origin cell. Must be specified if `origin_coords` is not specified |
| `origin_coords` | `ARRAY<FLOAT64>` | The longitude and latitude of the origin location. Must be specified if `origin_index` is not specified |
| `max_results` | `INT64` | The maximum number of results to be returned. If it is set to `NULL` or is greater than the number of target cells with a positive similarity skill score, only those cells are returned |
Return type
A table containing in each row the index of the target cell (`index_column`) and its associated `similarity_score` and `similarity_skill_score`. The `similarity_score` corresponds to the distance between the origin and target cell in the Principal Component (PC) scores space; the `similarity_skill_score` for a given target cell `t` is computed as `1 - similarity_score(t) / similarity_score(<t>)`, where `<t>` is the average target cell, computed by averaging each retained PC score over all the target cells. The `similarity_skill_score` is therefore a relative measure: it is positive if and only if the target cell is more similar to the origin than the mean target cell is, with a score of 1 meaning a perfect match (zero distance), and a target cell with a larger score is more similar to the origin under this scoring rule. Only the cells in the target area for which the similarity skill score is positive are returned.
Examples
In the following query, the index of the cell containing the origin location is specified:
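The call below is a sketch, assuming the `carto-un` prefix; the model name, output table and H3 origin index are placeholders:

```sql
-- Sketch only: model name, output table and the H3 origin index are placeholders.
CALL `carto-un`.carto.FIND_TWIN_AREAS(
    'my-project.my-dataset.twin_areas_model',    -- twin_areas_model
    'index',                                     -- index_column
    'my-project.my-dataset.twin_areas_results',  -- output_table
    '{"origin_index": "8718496d8ffffff", "max_results": 10}'
);
```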
Here, instead, the longitude and the latitude of the origin location are specified:
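Again a sketch under the same assumptions, with placeholder coordinates:

```sql
-- Sketch only: the coordinates (longitude, latitude) are placeholders.
CALL `carto-un`.carto.FIND_TWIN_AREAS(
    'my-project.my-dataset.twin_areas_model',
    'index',
    'my-project.my-dataset.twin_areas_results',
    '{"origin_coords": [-74.0060, 40.7128], "max_results": 10}'
);
```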
Additional examples
FIND_TWIN_AREAS_WEIGHTED
Description
Procedure to obtain the twin areas for a given origin location in a target area. The function is similar to FIND_TWIN_AREAS, where the full description of the method, based on Principal Component Analysis (PCA), can be found. Here, however, no PCA is performed; instead, the user can specify weights for the features and check the similarities between the origin and target areas. The sum of the weights must be less than or equal to 1, and not all of them need to be defined: each undefined feature is assigned an equal share of the remaining weight so that the total reaches 1. When weights are provided, no PCA takes place and the features are standardized.

The output twin areas are those of the target area considered to be the most similar to the origin location, based on the values of a set of variables. Only variables with numerical values are supported. Both origin and target areas should be provided in grid format (H3 or Quadbin) of the same resolution. We recommend using the GRIDIFY_ENRICH procedure to prepare the data in the format expected by this procedure.
Input parameters
- `origin_query` (`STRING`): the query or the fully qualified name of the table containing the origin cells data.
- `target_query` (`STRING`): the query or the fully qualified name of the table containing the target cells data.
- `index_column` (`STRING`): the name of the index column.
- `output_table` (`STRING`): the fully qualified prefix for the output tables, for example `<my-project>.<my-dataset>.<output-prefix>`.
- `options` (`STRING`): the JSON string containing the available options as described in the table below.

| Option | Type | Description |
| --- | --- | --- |
| `max_results` | `INT64` | The maximum number of twin areas returned. If set to `NULL`, all target cells are returned |
| `weights` | `ARRAY<STRUCT<name STRING, value FLOAT64>>` | The weights on the features. If set to `NULL`, all features are treated equally. This parameter is considered only if it contains at least one weight. The sum of the weights must be less than or equal to 1. If fewer weights than features are provided, the remaining 1 - sum(weights) is distributed evenly among the undefined features |
Output
The procedure outputs a table containing in each row the index of the target cell (`index_column`) and its associated `similarity_score` and `similarity_skill_score`. The `similarity_score` corresponds to the distance between the origin and target cell taking into account the user-defined weights; the `similarity_skill_score` for a given target cell `t` is computed as `1 - similarity_score(t) / similarity_score(<t>)`, where `<t>` is the average target cell, computed by averaging each feature over all the target cells. The `similarity_skill_score` is a relative measure: it is positive if and only if the target cell is more similar to the origin than the mean target cell is, with a score of 1 meaning a perfect match (zero distance). Therefore, a target cell with a larger score is more similar to the origin under this scoring rule.
Example
In this example, we are using default equal weights:
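The call below is a sketch, assuming the `carto-un` prefix; table names are placeholders and a `NULL`-like weights entry leaves all features equally weighted:

```sql
-- Sketch only: table names are placeholders; null weights mean equal treatment.
CALL `carto-un`.carto.FIND_TWIN_AREAS_WEIGHTED(
    'SELECT * FROM `my-project.my-dataset.origin_enriched`',  -- origin_query
    'SELECT * FROM `my-project.my-dataset.target_enriched`',  -- target_query
    'index',                                                  -- index_column
    'my-project.my-dataset.twin_areas_weighted',              -- output_table
    '{"max_results": 10, "weights": null}'                    -- options
);
```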
In this example, we are using user-specified weights:
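Again a sketch under the same assumptions; the feature names and the JSON encoding of the `weights` structs are assumptions for illustration:

```sql
-- Sketch only: feature names and the JSON encoding of the weights are assumptions.
CALL `carto-un`.carto.FIND_TWIN_AREAS_WEIGHTED(
    'SELECT * FROM `my-project.my-dataset.origin_enriched`',
    'SELECT * FROM `my-project.my-dataset.target_enriched`',
    'index',
    'my-project.my-dataset.twin_areas_weighted',
    '{"max_results": 10, "weights": [{"name": "population", "value": 0.6}, {"name": "footfall", "value": 0.2}]}'
);
```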