statistics
ADVANCED
This module contains functions to perform spatial statistics calculations.
P_VALUE(z_score)
Description
This function computes the one tail p-value (upper-tail test) of a given z-score assuming the population follows a normal distribution where the mean is 0 and the standard deviation is 1. The z-score is a measure of how many standard deviations below or above the population mean a value is. It gives you an idea of how far from the mean a data point is. The p-value is the probability that a randomly sampled point has a value at least as extreme as the point whose z-score is being tested.
z_score
:FLOAT64
Return type
FLOAT64
Example
carto-un
carto-un-eu
manual
SELECT `carto-un`.carto.P_VALUE(u) as p_value
FROM UNNEST([-2,-1,0,1,2]) u;
​
-- [ 0.9772499371127437, 0.8413447361676363, 0.49999999949999996, 0.15865526383236372, 0.02275006288725634]
SELECT `carto-un-eu`.carto.P_VALUE(u) as p_value
FROM UNNEST([-2,-1,0,1,2]) u;
​
-- [ 0.9772499371127437, 0.8413447361676363, 0.49999999949999996, 0.15865526383236372, 0.02275006288725634]
SELECT carto.P_VALUE(u) as p_value
FROM UNNEST([-2,-1,0,1,2]) u;
​
-- [ 0.9772499371127437, 0.8413447361676363, 0.49999999949999996, 0.15865526383236372, 0.02275006288725634]
KNN(points, k)
Description
points
:ARRAY<STRUCT<geoid STRING, geo GEOGRAPHY>>
input data with unique id and geography.k
:INT64
number of nearest neighbors (positive, typically small).
Return type
ARRAY<STRUCT<geo GEOGRAPHY, geo_knn GEOGRAPHY, geoid STRING, geoid_knn STRING, distance FLOAT64, knn INT64>>
where:
geo
: the geometry of the considered point.geo_knn
: the k-nearest neighbor point.geoid
: the unique identifier of the considered point.geoid_knn
: the unique identifier of the k-nearest neighbor.distance
: the k-nearest neighbor distance to the considered point.knn
: the k-order (knn)
Example
carto-un
carto-un-eu
manual
SELECT *
FROM UNNEST((
SELECT `carto-un`.carto.KNN(myarray, 10)
FROM (
SELECT ARRAY_AGG(STRUCT(format('%08x', uid),position_geom)) myarray
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY hour) AS uid, position_geom
FROM `bigquery-public-data.catalonian_mobile_coverage.mobile_data_2015_2017`
WHERE date = '2017-12-31'
)
)
))
ORDER BY geoid
--{
-- "geo": "POINT(2.82263 41.97118)",
-- "geo_knn": "POINT(2.8225 41.97117)",
-- "geoid": "00000001",
-- "geoid_knn": "00000624",
-- "distance": "10.804663098937658",
-- "knn": "1"
--},
--{
-- "geo": "POINT(2.82263 41.97118)",
-- "geo_knn": "POINT(2.823 41.9712)",
-- "geoid": "00000001",
-- "geoid_knn": "00000666",
-- "distance": "30.66917920746894",
-- "knn": "2"
--},
--{
-- "geo": "POINT(2.82263 41.97118)",
-- "geo_knn": "POINT(2.82298 41.9713)",
-- "geoid": "00000001",
-- "geoid_knn": "00000618",
-- "distance": "31.863463704968353",
-- "knn": "3"
--},
-- ...
SELECT *
FROM UNNEST((
SELECT `carto-un-eu`.carto.KNN(myarray, 10)
FROM (
SELECT ARRAY_AGG(STRUCT(format('%08x', uid),position_geom)) myarray
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY hour) AS uid, position_geom
FROM `bigquery-public-data.catalonian_mobile_coverage.mobile_data_2015_2017`
WHERE date = '2017-12-31'
)
)
))
ORDER BY geoid
--{
-- "geo": "POINT(2.82263 41.97118)",
-- "geo_knn": "POINT(2.8225 41.97117)",
-- "geoid": "00000001",
-- "geoid_knn": "00000624",
-- "distance": "10.804663098937658",
-- "knn": "1"
--},
--{
-- "geo": "POINT(2.82263 41.97118)",
-- "geo_knn": "POINT(2.823 41.9712)",
-- "geoid": "00000001",
-- "geoid_knn": "00000666",
-- "distance": "30.66917920746894",
-- "knn": "2"
--},
--{
-- "geo": "POINT(2.82263 41.97118)",
-- "geo_knn": "POINT(2.82298 41.9713)",
-- "geoid": "00000001",
-- "geoid_knn": "00000618",
-- "distance": "31.863463704968353",
-- "knn": "3"
--},
-- ...
SELECT *
FROM UNNEST((
SELECT carto.KNN(myarray, 10)
FROM (
SELECT ARRAY_AGG(STRUCT(format('%08x', uid),position_geom)) myarray
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY hour) AS uid, position_geom
FROM `bigquery-public-data.catalonian_mobile_coverage.mobile_data_2015_2017`
WHERE date = '2017-12-31'
)
)
))
ORDER BY geoid
--{
-- "geo": "POINT(2.82263 41.97118)",
-- "geo_knn": "POINT(2.8225 41.97117)",
-- "geoid": "00000001",
-- "geoid_knn": "00000624",
-- "distance": "10.804663098937658",
-- "knn": "1"
--},
--{
-- "geo": "POINT(2.82263 41.97118)",
-- "geo_knn": "POINT(2.823 41.9712)",
-- "geoid": "00000001",
-- "geoid_knn": "00000666",
-- "distance": "30.66917920746894",
-- "knn": "2"
--},
--{
-- "geo": "POINT(2.82263 41.97118)",
-- "geo_knn": "POINT(2.82298 41.9713)",
-- "geoid": "00000001",
-- "geoid_knn": "00000618",
-- "distance": "31.863463704968353",
-- "knn": "3"
--},
-- ...
LOF_TABLE(src_fullname STRING, target_fullname STRING, geoid_column_name STRING, geo_column_name STRING, k INT64)
Description
This function computes the Local Outlier Factor for each point of a specified column and stores the result in an output table along with the other input columns.
src_fullname
:STRING
The input table. ASTRING
of the formproject-id.dataset-id.table-name
is expected. Theproject-id
can be omitted (in which case the default one will be used).target_fullname
:STRING
The resulting table where the LOF will be stored. ASTRING
of the formproject-id.dataset-id.table-name
is expected. Theproject-id
can be omitted (in which case the default one will be used). The dataset must exist and the caller needs to have permissions to create a new table in it. The process will fail if the target table already exists.geoid_column_name
:STRING
The column name with a unique identifier for each point.geo_column_name
:STRING
The column name containing the points.lof_target_column_name
:STRING
The column name where the resulting Local Outlier Factor will be stored in the output table.k
:INT64
Number of nearest neighbors (positive, typically small).
Example
carto-un
carto-un-eu
manual
CALL `carto-un`.carto.LOF_TABLE(
'bigquery-public-data.new_york_subway.stations',
'myproject.mydataset.my_output_table',
'station_id',
'station_geom',
'lof',
10
);
-- The table `'myproject.mydataset.my_output_table` will be created
-- with an extra column containing the `lof` value.
CALL `carto-un-eu`.carto.LOF_TABLE(
'bigquery-public-data.new_york_subway.stations',
'myproject.mydataset.my_output_table',
'station_id',
'station_geom',
'lof',
10
);
-- The table `'myproject.mydataset.my_output_table` will be created
-- with an extra column containing the `lof` value.
CALL carto.LOF_TABLE(
'bigquery-public-data.new_york_subway.stations',
'myproject.mydataset.my_output_table',
'station_id',
'station_geom',
'lof',
10
);
-- The table `'myproject.mydataset.my_output_table` will be created
-- with an extra column containing the `lof` value.
LOF(points, k)
Description
points
:ARRAY<STRUCT<geoid STRING, geo GEOGRAPHY>>
input data points with unique id and geography.k
:INT64
number of nearest neighbors (positive, typically small).
Return type
ARRAY<STRUCT<geo GEOGRAPHY, geoid GEOGRAPHY, lof FLOAT64>>
where:geo
: the geometry of the considered point.geoid
: the unique identifier of the considered point.lof
: the Local Outlier Factor score.
Example
carto-un
carto-un-eu
manual
SELECT *
FROM UNNEST((
SELECT `carto-un`.carto.LOF(myarray, 10)
FROM (
SELECT ARRAY_AGG(STRUCT(format('%08x', uid),position_geom)) myarray
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY hour) AS uid, position_geom
FROM `bigquery-public-data.catalonian_mobile_coverage.mobile_data_2015_2017`
WHERE date = '2017-12-31'
)
)
))
ORDER BY geoid
-- {"geo": POINT(2.82263 41.97118), "geoid": "00000001", "lof": 1.3217599116891428}
-- {"geo": POINT(2.35705 41.49786), "geoid": "00000002", "lof": 1.235551000737416}
-- {"geo": POINT(2.13967 41.3838), "geoid": "00000003", "lof": 1.1305674032876687}
-- ...
SELECT *
FROM UNNEST((
SELECT `carto-un-eu`.carto.LOF(myarray, 10)
FROM (
SELECT ARRAY_AGG(STRUCT(format('%08x', uid),position_geom)) myarray
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY hour) AS uid, position_geom
FROM `bigquery-public-data.catalonian_mobile_coverage.mobile_data_2015_2017`
WHERE date = '2017-12-31'
)
)
))
ORDER BY geoid
-- {"geo": POINT(2.82263 41.97118), "geoid": "00000001", "lof": 1.3217599116891428}
-- {"geo": POINT(2.35705 41.49786), "geoid": "00000002", "lof": 1.235551000737416}
-- {"geo": POINT(2.13967 41.3838), "geoid": "00000003", "lof": 1.1305674032876687}
-- ...
SELECT *
FROM UNNEST((
SELECT carto.LOF(myarray, 10)
FROM (
SELECT ARRAY_AGG(STRUCT(format('%08x', uid),position_geom)) myarray
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY hour) AS uid, position_geom
FROM `bigquery-public-data.catalonian_mobile_coverage.mobile_data_2015_2017`
WHERE date = '2017-12-31'
)
)
))
ORDER BY geoid
-- {"geo": POINT(2.82263 41.97118), "geoid": "00000001", "lof": 1.3217599116891428}
-- {"geo": POINT(2.35705 41.49786), "geoid": "00000002", "lof": 1.235551000737416}
-- {"geo": POINT(2.13967 41.3838), "geoid": "00000003", "lof": 1.1305674032876687}
-- ...
GFUN(points)
Description
points
:ARRAY<GEOGRAPHY>
input data points.
Return type
ARRAY<STRUCT<distance FLOAT64, gfun_G FLOAT64, gfun_ev FLOAT64>>
where:
distance
: the nearest neighbors distances.gfun_G
: the empirical G evaluated for each distance in the support.gfun_ev
: the theoretical Poisson G evaluated for each distance in the support.
Example
carto-un
carto-un-eu
manual
SELECT *
FROM UNNEST((
SELECT `carto-un`.carto.GFUN(myarray)
FROM (
SELECT ARRAY_AGG(position_geom) myarray
FROM `bigquery-public-data.catalonian_mobile_coverage.mobile_data_2015_2017`
WHERE date = '2017-12-31'
)
))
ORDER BY distance
--{
-- "distance": "38.599968853183",
-- "gfun_G": "0.4319167389418907",
-- "gfun_ev": "4.037383876246414E-4"
--},
--{
-- "distance": "77.199937706366",
-- "gfun_G": "0.5771899392888118",
-- "gfun_ev": "0.0016139757856029613"
--},
--{
-- "distance": "115.799906559549",
-- "gfun_G": "0.6522116218560278",
-- "gfun_ev": "0.003627782844736638"
--},
-- ...
SELECT *
FROM UNNEST((
SELECT `carto-un-eu`.carto.GFUN(myarray)
FROM (
SELECT ARRAY_AGG(position_geom) myarray
FROM `bigquery-public-data.catalonian_mobile_coverage.mobile_data_2015_2017`
WHERE date = '2017-12-31'
)
))
ORDER BY distance
--{
-- "distance": "38.599968853183",
-- "gfun_G": "0.4319167389418907",
-- "gfun_ev": "4.037383876246414E-4"
--},
--{
-- "distance": "77.199937706366",
-- "gfun_G": "0.5771899392888118",
-- "gfun_ev": "0.0016139757856029613"
--},
--{
-- "distance": "115.799906559549",
-- "gfun_G": "0.6522116218560278",
-- "gfun_ev": "0.003627782844736638"
--},
-- ...
SELECT *
FROM UNNEST((
SELECT carto.GFUN(myarray)
FROM (
SELECT ARRAY_AGG(position_geom) myarray
FROM `bigquery-public-data.catalonian_mobile_coverage.mobile_data_2015_2017`
WHERE date = '2017-12-31'
)
))
ORDER BY distance
--{
-- "distance": "38.599968853183",
-- "gfun_G": "0.4319167389418907",
-- "gfun_ev": "4.037383876246414E-4"
--},
--{
-- "distance": "77.199937706366",
-- "gfun_G": "0.5771899392888118",
-- "gfun_ev": "0.0016139757856029613"
--},
--{
-- "distance": "115.799906559549",
-- "gfun_G": "0.6522116218560278",
-- "gfun_ev": "0.003627782844736638"
--},
-- ...
CREATE_SPATIAL_COMPOSITE_SUPERVISED(input_query,index_column, output_prefix, options)
Description
This procedure derives a spatial composite score as the standardized residuals of a regression model which is used to detect areas of under- and over-prediction. The response variable should be measurable and correlated with the set of variables defining the score. For each data point. the residual is defined as the observed value minus the predicted value. Rows with a NULL value in any of the individual variables are dropped.
Input parameters
input_query
:STRING
the query to the data used to compute the spatial composite. It must contain all the individual variables that should be included in the computation of the composite as well as a unique geographic id for each row. A qualified table name can be given as well, e.g. 'project-id.dataset-id.table-name'.index_column
:STRING
the name of the column with the unique geographic identifier.output_prefix
:STRING
the prefix for the output table. It should include project and dataset, e.g. 'project-id.dataset-id.table-name'.options
:STRING
containing a valid JSON with the different options. Valid options are described below.model_transform
:STRING
containing the TRANSFORM clause in a BigQuery ML CREATE MODEL statement. If NULL no TRANSFORM clause is applied.model_options
:JSON
with the different options allowed by BigQuery ML CREATE MODEL statement for regression models. Any model is allowed as long as it can deal with numerical inputs for the response variable. At least theINPUT_LABEL_COLS
andMODEL_TYPE
parameters must be specified.r2_thr
:FLOAT64
the minimum allowed value for the R2 model score. If the R2 of the regression model is lower than this threshold this implies poor fitting and a warning is raised. The default value is 0.5.return_range
:ARRAY<FLOAT64>
The user-defined normalization range of the spatial composite score, e.g [0.0,1.0]. The default value is NULL. Ignored ifbucketize_method
is specified.bucketize_method
:STRING
the method used to discretize the spatial composite score. The default value is NULL. Possible options are:- EQUAL_INTERVALS: the values of the spatial composite score are discretized into buckets of equal widths.
- QUANTILES: the values of the spatial composite score are discretized into buckets based on quantiles.
- JENKS: the values of the spatial composite score are discretized into buckets obtained using k-means clustering.
nbuckets
:INT64
the number of buckets used when a bucketization method is specified. The default number of buckets is selected using Freedman and Diaconis’s (1981) rule. For bucketize_method = JENKS the maximum value is 100, aka the maximum number of clusters allowed by BigQuery with k-means clustering.
Return type
The output table with the following columns:
index_column
:STRING
the unique geographic identifier.spatial_score
:FLOAT64
the value of the composite score.
Example
carto-un
carto-un-eu
manual
CALL `carto-un`.carto.CREATE_SPATIAL_COMPOSITE_SUPERVISED(
'SELECT * FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"model_transform":[
"revenue_change",
"fempop_15_44, public_transport, education, pois, urbanity"
],
"model_options":{
"MODEL_TYPE":"LINEAR_REG",
"INPUT_LABEL_COLS":['revenue_change'],
"DATA_SPLIT_METHOD":"no_split",
"OPTIMIZE_STRATEGY":"NORMAL_EQUATION",
"CATEGORY_ENCODING_METHOD":"ONE_HOT_ENCODING",
"ENABLE_GLOBAL_EXPLAIN":true
},
"r2_thr":0.4,
"bucketize_method":"JENKS"
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL `carto-un-eu`.carto.CREATE_SPATIAL_COMPOSITE_SUPERVISED(
'SELECT * FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"model_transform":[
"revenue_change",
"fempop_15_44, public_transport, education, pois, urbanity"
],
"model_options":{
"MODEL_TYPE":"LINEAR_REG",
"INPUT_LABEL_COLS":['revenue_change'],
"DATA_SPLIT_METHOD":"no_split",
"OPTIMIZE_STRATEGY":"NORMAL_EQUATION",
"CATEGORY_ENCODING_METHOD":"ONE_HOT_ENCODING",
"ENABLE_GLOBAL_EXPLAIN":true
},
"r2_thr":0.4,
"bucketize_method":"JENKS"
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL carto.CREATE_SPATIAL_COMPOSITE_SUPERVISED(
'SELECT * FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"model_transform":[
"revenue_change",
"fempop_15_44, public_transport, education, pois, urbanity"
],
"model_options":{
"MODEL_TYPE":"LINEAR_REG",
"INPUT_LABEL_COLS":['revenue_change'],
"DATA_SPLIT_METHOD":"no_split",
"OPTIMIZE_STRATEGY":"NORMAL_EQUATION",
"CATEGORY_ENCODING_METHOD":"ONE_HOT_ENCODING",
"ENABLE_GLOBAL_EXPLAIN":true
},
"r2_thr":0.4,
"bucketize_method":"JENKS"
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(input_query, index_column, output_prefix, options)
Description
This procedure combines (spatial) variables into a meaningful composite score. The composite score can be derived using different methods, scaling and aggregation functions and weights. Rows with a NULL value in any of the model predictors are dropped.
Input parameters
input_query
:STRING
the query to the data used to compute the spatial composite. It must contain all the individual variables that should be included in the computation of the composite as well as a unique geographic id for each row. A qualified table name can be given as well, e.g. 'project-id.dataset-id.table-name'.index_column
:STRING
the name of the column with the unique geographic identifier.output_prefix
:STRING
the prefix for the output table. It should include project and dataset, e.g. 'project-id.dataset-id.table-name'.options
:STRING
containing a valid JSON with the different options. Valid options are described below. If options is set to NULL then all options are set to default values, as specified in the table below.scoring_method
:STRING
Possible options are ENTROPY, CUSTOM_WEIGHTS, FIRST_PC. With the ENTROPY method the spatial composite is derived as the weighted sum of the proportion of the min-max scaled individual variables, where the weights are based on the entropy of the proportion of each variable. Only numerical variables are allowed. With the CUSTOM_WEIGHTS method, the spatial composite is computed by first scaling each individual variable and then aggregating them according to user-defined scaling and aggregation methods and individual weights. Depending on the scaling parameter, both numerical and ordinal variables are allowed (categorical and boolean variables need to be transformed to ordinal). With the FIRST_PC method, the spatial composite is derived from a Principal Component Analysis as the first principal component score. Only numerical variables are allowed.weights
:STRUCT
the (optional) weights for each variable used to compute the spatial composite when scoring_method is set to CUSTOM_WEIGHTS, passed as{"name":value, …}
. If a different scoring method is selected, then this input parameter is ignored. If specified, the sum of the weights must be lower than 1. If no weights are specified, equal weights are assumed. If weights are specified only for some variables and the sum of weights is less than 1, the remainder is distributed equally between the remaining variables. If weights are specified for all the variables and the sum of weights is less than 1, the remainder is distributed equally between all the variables.scaling
:STRING
the user-defined scaling when the scoring_method is set to CUSTOM_WEIGHTS. Possible options are:- MIN_MAX_SCALER: data is rescaled into the range [0,1] based on minimum and maximum values. Only numerical variables are allowed.
- STANDARD_SCALER: data is rescaled by subtracting the mean value and dividing the result by the standard deviation. Only numerical variables are allowed.
- RANKING: data is replaced by its percent rank, that is by values ranging from 0 lowest to 1. Both numerical and ordinal variables are allowed (categorical and boolean variables need to be transformed to ordinal).
- DISTANCE_TO_TARGET_MIN(_MAX,_AVG):data is rescaled by dividing by the minimum, maximum, or mean of all the values. Only numerical variables are allowed.
- PROPORTION: data is rescaled by dividing by the sum total of all the values. Only numerical variables are allowed.
aggregation
:STRING
the aggregation function used when the scoring_method is set to CUSTOM_WEIGHTS. Possible options are:- LINEAR: the spatial composite is derived as the weighted sum of the scaled individual variables.
- GEOMETRIC: the spatial composite is given by the product of the scaled individual variables, each to the power of its weight.
correlation_var
:STRING
when scoring_method is set to FIRST_PC, the spatial score will be positively correlated with the selected variable (i.e. the sign the spatial score is set such that the correlation between the selected variable and the first principal component score is positive).correlation_thr
:FLOAT64
the minimum absolute value of the correlation between each individual variable and the first principal component score when scoring_method is set to FIRST_PC.return_range
:ARRAY<FLOAT64>
the user-defined normalization range of the spatial composite score, e.g [0.0,1.0]. Ignored ifbucketize_method
is specified.bucketize_method
:STRING
the method used to discretize the spatial composite score. Possible options are:- EQUAL_INTERVALS: the values of the spatial composite score are discretized into buckets of equal widths.
- QUANTILES: the values of the spatial composite score are discretized into buckets based on quantiles.
- JENKS: the values of the spatial composite score are discretized into buckets obtained using k-means clustering.
nbuckets
:INT64
the number of buckets used when a bucketization method is specified. The default number of buckets is selected using Freedman and Diaconis’s (1981) rule. For bucketize_method = JENKS the maximum value is 100, aka the maximum number of clusters allowed by BigQuery with k-means clustering.
Option | ENTROPY | CUSTOM_WEIGHTS | FIRST_PC | Valid options | Default value |
scoring_method | Optional | Optional | Optional | ENTROPY, CUSTOM_WEIGHTS, FIRST_PC | ENTROPY |
weights | Ignored | Optional | Ignored | {"name":value…} | NULL |
scaling | Ignored | Optional | Ignored | MIN_MAX_SCALER, STANDARD_SCALER, RANKING, DISTANCE_TO_TARGET_MIN, DISTANCE_TO_TARGET_MAX, DISTANCE_TO_TARGET_AVG, PROPORTION | MIN_MAX_SCALER |
aggregation | Ignored | Optional | Ignored | LINEAR, GEOMETRIC | LINEAR |
correlation_var | Ignored | Optional | Mandatory | - | NULL |
correlation_thr | Ignored | Optional | Optional | - | NULL |
return_range | Ignored | Optional | Optional | - | NULL |
bucketize_method | Ignored | Optional | Optional | EQUAL_INTERVALS, QUANTILES, JENKS | NULL |
nbuckets | Ignored | Optional | Optional | - |
Return type
The output table with the following columns:
index_column
:STRING
the unique geographic identifier.spatial_score
:FLOAT64
the value of the composite score.
Examples
With the
ENTROPY
method:carto-un
carto-un-eu
manual
CALL `carto-un`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT * EXCEPT(geom, revenue_change, urbanity, urbanity_ordinal) FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"scoring_method":"ENTROPY",
"return_range":[0.0,1.0]
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL `carto-un-eu`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT * EXCEPT(geom, revenue_change, urbanity, urbanity_ordinal) FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"scoring_method":"ENTROPY",
"return_range":[0.0,1.0]
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT * EXCEPT(geom, revenue_change, urbanity, urbanity_ordinal) FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"scoring_method":"ENTROPY",
"return_range":[0.0,1.0]
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
With the
CUSTOM_WEIGHTS
method:carto-un
carto-un-eu
manual
CALL `carto-un`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT * EXCEPT(geom, revenue_change, urbanity) FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"scoring_method":"CUSTOM_WEIGHTS",
"weights":{"fempop_15_44":0.2,"education":0.1,"urbanity_ordinal":0.1,"pois":0.1},
"scaling":"RANKING",
"aggregation":"LINEAR",
"bucketize_method":"JENKS"
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL `carto-un-eu`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT * EXCEPT(geom, revenue_change, urbanity) FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"scoring_method":"CUSTOM_WEIGHTS",
"weights":{"fempop_15_44":0.2,"education":0.1,"urbanity_ordinal":0.1,"pois":0.1},
"scaling":"RANKING",
"aggregation":"LINEAR",
"bucketize_method":"JENKS"
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT * EXCEPT(geom, revenue_change, urbanity) FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"scoring_method":"CUSTOM_WEIGHTS",
"weights":{"fempop_15_44":0.2,"education":0.1,"urbanity_ordinal":0.1,"pois":0.1},
"scaling":"RANKING",
"aggregation":"LINEAR",
"bucketize_method":"JENKS"
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
With the
FIRST_PC
method:carto-un
carto-un-eu
manual
CALL `carto-un`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT * EXCEPT(geom, revenue_change, urbanity, urbanity_ordinal) FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"scoring_method":"FIRST_PC",
"correlation_var":"fempop_15_44",
"correlation_thr":0.6,
"bucketize_method":"QUANTILES"
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL `carto-un-eu`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT * EXCEPT(geom, revenue_change, urbanity, urbanity_ordinal) FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"scoring_method":"FIRST_PC",
"correlation_var":"fempop_15_44",
"correlation_thr":0.6,
"bucketize_method":"QUANTILES"
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT * EXCEPT(geom, revenue_change, urbanity, urbanity_ordinal) FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<project-id>.<dataset-id>.<table-name>',
'''{
"scoring_method":"FIRST_PC",
"correlation_var":"fempop_15_44",
"correlation_thr":0.6,
"bucketize_method":"QUANTILES"
}
'''
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CRONBACH_ALPHA_COEFFICIENT(input_query, output_prefix)
Description
This procedure computes the Cronbach’s alpha coefficient for a set of (spatial) variables. This coefficient can be used as a measure of internal consistency or reliability of the data, based on the strength of correlations between individual variables. Cronbach’s alpha reliability coefficient normally ranges between 0 and 1 but there is actually no lower limit to the coefficient. Higher alpha (closer to 1) vs lower alpha (closer to 0) means higher vs lower consistency, with usually 0.65 being the minimum acceptable value of internal consistency. Rows with a NULL value in any of the individual variables are dropped.
Input parameters
input_query
:STRING
the query to the data used to compute the coefficient. It must contain all the individual variables that should be included in the computation of the coefficient. A qualified table name can be given as well, e.g. 'project-id.dataset-id.table-name'.output_prefix
:STRING
the prefix for the output table. It should include project and dataset, e.g. 'project-id.dataset-id.table-name'.
Return type
The output table with the following columns:
cronbach_alpha_coef
:FLOAT64
the computed Cronbach Alpha coefficient.k
:INT64
the number of the individual variables used to compute the composite.mean_var
:FLOAT64
the mean variance of all individual variables.mean_cov
:FLOAT64
the mean covariance of all individual variables.
Example
carto-un
carto-un-eu
manual
CALL `carto-un`.carto.CRONBACH_ALPHA_COEFFICIENT(
'SELECT * EXCEPT(geoid, geom, revenue_change, urbanity, urbanity_ordinal) FROM `cartobq.docs.spatial_scoring_input`',
'<project-id>.<dataset-id>.<table-name>'
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL `carto-un-eu`.carto.CRONBACH_ALPHA_COEFFICIENT(
'SELECT * EXCEPT(geoid, geom, revenue_change, urbanity, urbanity_ordinal) FROM `cartobq.docs.spatial_scoring_input`',
'<project-id>.<dataset-id>.<table-name>'
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
CALL carto.CRONBACH_ALPHA_COEFFICIENT(
'SELECT * EXCEPT(geoid, geom, revenue_change, urbanity, urbanity_ordinal) FROM `cartobq.docs.spatial_scoring_input`',
'<project-id>.<dataset-id>.<table-name>'
)
-- Table `<my-project>.<my-dataset>.<table-name>` will be created
GWR_GRID(input_table, features_columns, label_column, cell_column, cell_type, kring_distance, kernel_function, fit_intercept, output_table)
Description
Geographically weighted regression (GWR) models local relationships between spatially varying predictors and an outcome of interest using a local least squares regression.
This procedures performs a local least squares regression for every input cell. In each regression, the data of each cell and that of the neighboring cells, defined by the
kring_distance
parameter, will be taken into account. The data of the neighboring cells will be assigned a lower weight the further they are from the origin cell, following the function specified in the kernel_function
.input_table
:STRING
name of the source dataset. It should be a quoted qualified table with project and dataset:<project-id>.<dataset-id>.<table-name>
.features_columns
:ARRAY<STRING>
array of column names frominput_table
to be used as features in the GWR.label_column
:STRING
name of the target variable column.cell_column
:STRING
name of the column containing the cell ids.cell_type
:STRING
spatial index type as 'h3', 'quadbin'.kring_distance
:INT64
distance of the neighboring cells whose data will be included in the local regression of each cell.kernel_function
:STRING
kernel function to compute the spatial weights across the kring. Available functions are: 'uniform', 'triangular', 'quadratic', 'quartic' and 'gaussian'.fit_intercept
:BOOL
whether to calculate the interception of the model or to force it to zero if, for example, the input data is already supposed to be centered. If NULL,fit_intercept
will be considered asTRUE
.output_table
:STRING
name of the output table. It should be a quoted qualified table with project and dataset:<project-id>.<dataset-id>.<table-name>
. The process will fail if the target table already exists. If NULL, the result will be returned directly by the query and not persisted.
Output
The output table will contain a column with the cell id, a column for each feature column containing its corresponding coefficient estimate and one extra column for intercept if
fit_intercept
is TRUE
.Examples
carto-un
carto-un-eu
manual