
Analytics Toolbox for BigQuery
statistics
This module contains functions to perform spatial statistics calculations.
GETIS_ORD_H3
Description
This function computes the Getis-Ord Gi* statistic for each H3 index in the input array.
input: ARRAY<STRUCT<index STRING, value FLOAT64>>. Input data with the indexes and values of the cells.
size: INT64. Size of the H3 kring (distance from the origin). This defines the area around each index cell that will be taken into account to compute its Gi* statistic.
kernel: STRING. Kernel function to compute the spatial weights across the kring. Available functions are: uniform, triangular, quadratic, quartic and gaussian.
Return type
ARRAY<STRUCT<index STRING, gi FLOAT64, p_value FLOAT64>>
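As a rough illustration of the statistic itself (not of the toolbox's implementation), the sketch below computes Gi* with the standard Getis-Ord formula for cells laid out on a line rather than on an H3 grid. The function name getis_ord_gi_star, the 1-D neighborhood, the uniform weights and the use of the population standard deviation are all assumptions made for this example.

```python
import math

def getis_ord_gi_star(values, size):
    """Gi* for cells on a line; neighbors are the cells within `size` steps (self included)."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of all values (assumed convention).
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    result = []
    for i in range(n):
        # Uniform kernel: every cell in the neighborhood gets weight 1.
        neighbors = range(max(0, i - size), min(n, i + size + 1))
        weights = [1.0 for _ in neighbors]
        sum_w = sum(weights)
        sum_w2 = sum(w * w for w in weights)
        weighted_sum = sum(w * values[j] for w, j in zip(weights, neighbors))
        numerator = weighted_sum - mean * sum_w
        denominator = std * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        result.append(numerator / denominator)
    return result

# A single high value surrounded by low values: the cells around it score positive.
gi = getis_ord_gi_star([1, 1, 1, 10, 1, 1, 1], size=1)
```

Positive scores indicate clusters of high values and negative scores clusters of low values. The sketch assumes the values are not all equal and that no neighborhood covers every cell, since either case makes the denominator zero.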
GETIS_ORD_QUADKEY
Description
This function computes the Getis-Ord Gi* statistic for each quadkey index in the input array.
input: ARRAY<STRUCT<index STRING, value FLOAT64>>. Input data with the indexes and values of the cells.
size: INT64. Size of the quadkey kring (distance from the origin). This defines the area around each index cell that will be taken into account to compute its Gi* statistic.
kernel: STRING. Kernel function to compute the spatial weights across the kring. Available functions are: uniform, triangular, quadratic, quartic and gaussian.
Return type
ARRAY<STRUCT<index STRING, gi FLOAT64, p_value FLOAT64>>
GFUN
Description
This function computes the G-function of a given set of points.
points: ARRAY<GEOGRAPHY>. Input data points.
Return type
ARRAY<STRUCT<distance FLOAT64, gfun_G FLOAT64, gfun_ev FLOAT64>>
where:
distance: the nearest-neighbor distances.
gfun_G: the empirical G evaluated for each distance in the support.
gfun_ev: the theoretical Poisson G evaluated for each distance in the support.
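The two G columns can be sketched in a few lines for planar points. This is a minimal illustration under stated assumptions, not the toolbox's implementation: coordinates are treated as Cartesian rather than GEOGRAPHY, the study-region area and the support are passed in explicitly, and the name g_function is hypothetical. The empirical G is the fraction of points whose nearest-neighbor distance is at most d, and the theoretical value for a homogeneous Poisson process with intensity λ is 1 - exp(-λ π d²).

```python
import math

def g_function(points, area, distances):
    """Empirical and theoretical (Poisson) G for planar points.

    points: list of (x, y) tuples; area: area of the study region;
    distances: the support, i.e. the distances at which G is evaluated.
    """
    n = len(points)
    # Nearest-neighbor distance of every point (brute force).
    nn = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    intensity = n / area  # points per unit area
    rows = []
    for d in distances:
        empirical = sum(1 for x in nn if x <= d) / n
        theoretical = 1.0 - math.exp(-intensity * math.pi * d * d)
        rows.append({"distance": d, "gfun_G": empirical, "gfun_ev": theoretical})
    return rows

rows = g_function(
    [(0, 0), (1, 0), (0, 3), (4, 4)], area=16.0, distances=[0.5, 1.0, 3.0, 5.0]
)
```

An empirical curve above the theoretical one suggests clustering, and a curve below it suggests a regular (dispersed) pattern.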
GWR_GRID
Description
Geographically weighted regression (GWR) models local relationships between spatially varying predictors and an outcome of interest using a local least squares regression.
This procedure performs a local least squares regression for every input cell. Each regression takes into account the data of the cell itself and that of its neighboring cells, as defined by the kring_distance parameter. The data of the neighboring cells is assigned a lower weight the further they are from the origin cell, following the function specified in kernel_function.
input_table: STRING. Name of the source table. It should be a quoted qualified table with project and dataset: <project-id>.<dataset-id>.<table-name>.
features_columns: ARRAY<STRING>. Array of column names from input_table to be used as features in the GWR.
label_column: STRING. Name of the target variable column.
cell_column: STRING. Name of the column containing the cell ids.
cell_type: STRING. Spatial index type as ‘h3’ or ‘quadkey’.
kring_distance: INT64. Distance of the neighboring cells whose data will be included in the local regression of each cell.
kernel_function: STRING. Kernel function to compute the spatial weights across the kring. Available functions are: ‘uniform’, ‘triangular’, ‘quadratic’, ‘quartic’ and ‘gaussian’.
fit_intercept: BOOL. Whether to calculate the intercept of the model or to force it to zero if, for example, the input data is already supposed to be centered. If NULL, fit_intercept will be considered as TRUE.
output_table: STRING. Name of the output table. It should be a quoted qualified table with project and dataset: <project-id>.<dataset-id>.<table-name>. The process will fail if the target table already exists. If NULL, the result will be returned directly by the query and not persisted.
Output
The output table will contain a column with the cell id, a column for each feature column containing its corresponding coefficient estimate and one extra column for the intercept if fit_intercept is TRUE.
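To make the idea of a local, distance-weighted regression concrete, here is a minimal sketch of the fit for a single origin cell with one feature and an intercept. It is an illustration under assumptions, not the procedure's implementation: the name local_weighted_fit is hypothetical, and the triangular weight 1 - d / (kring_distance + 1) is an assumed form of the kernel.

```python
def local_weighted_fit(xs, ys, ring_distances, kring_distance):
    """Weighted least squares (one feature plus intercept) for a single origin cell.

    ring_distances[i] is the grid distance of observation i from the origin cell.
    """
    # Triangular kernel (assumed form): the weight decays linearly with distance.
    weights = [max(0.0, 1.0 - d / (kring_distance + 1)) for d in ring_distances]
    sw = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / sw
    y_mean = sum(w * y for w, y in zip(weights, ys)) / sw
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Observations from the origin cell (distance 0) and its neighbors up to distance 2.
slope, intercept = local_weighted_fit(
    xs=[0, 1, 2, 3], ys=[1, 3, 5, 7], ring_distances=[0, 1, 1, 2], kring_distance=2
)
```

Repeating this fit with every cell as the origin yields one set of coefficients per cell, which matches the shape of the output described above.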
KNN
Description
This function returns for each point the k-nearest neighbors of a given set of points.
points: ARRAY<STRUCT<geoid STRING, geo GEOGRAPHY>>. Input data with unique id and geography.
k: INT64. Number of nearest neighbors (positive, typically small).
Return type
ARRAY<STRUCT<geo GEOGRAPHY, geo_knn GEOGRAPHY, geoid STRING, geoid_knn STRING, distance FLOAT64, knn INT64>>
where:
geo: the geometry of the considered point.
geo_knn: the k-nearest neighbor point.
geoid: the unique identifier of the considered point.
geoid_knn: the unique identifier of the k-nearest neighbor.
distance: the k-nearest neighbor distance to the considered point.
knn: the k-order (knn).
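The shape of the result, one row per point and neighbor rank, can be sketched with a brute-force search over planar points. This is only an illustration: the name knn is hypothetical, coordinates are assumed to be Cartesian rather than GEOGRAPHY, and the toolbox may compute distances and break ties differently.

```python
import math

def knn(points, k):
    """Brute-force k-nearest neighbors for planar points.

    points: list of (geoid, (x, y)) pairs with unique geoids.
    Returns one row per (point, neighbor) pair, ordered by neighbor rank.
    """
    rows = []
    for geoid, geo in points:
        others = [
            (math.dist(geo, other_geo), other_id, other_geo)
            for other_id, other_geo in points
            if other_id != geoid
        ]
        others.sort()  # closest first
        for rank, (distance, other_id, other_geo) in enumerate(others[:k], start=1):
            rows.append({
                "geo": geo, "geo_knn": other_geo,
                "geoid": geoid, "geoid_knn": other_id,
                "distance": distance, "knn": rank,
            })
    return rows

rows = knn([("a", (0, 0)), ("b", (1, 0)), ("c", (5, 0))], k=2)
```

With three points and k = 2, the result has six rows: each point paired with its first and second nearest neighbor.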
LOCAL_MORANS_I_H3
Description
This function computes the local Moran’s I spatial autocorrelation from the input array of H3 indexes.
input: ARRAY<STRUCT<index STRING, value FLOAT64>>. Input data with the indexes and values of the cells.
size: INT64. Size of the H3 kring (distance from the origin). This defines the area around each index cell where the distance decay will be applied.
decay: STRING. Decay function to compute the distance decay. Available functions are: uniform, inverse, inverse_square and exponential.
Return type
ARRAY<STRUCT<index STRING, value FLOAT64>>
LOCAL_MORANS_I_QUADKEY
Description
This function computes the local Moran’s I spatial autocorrelation from the input array of quadkey indexes.
input: ARRAY<STRUCT<index INT64, value FLOAT64>>. Input data with the indexes and values of the cells.
size: INT64. Size of the quadkey kring (distance from the origin). This defines the area around each index cell where the distance decay will be applied.
decay: STRING. Decay function to compute the distance decay. Available functions are: uniform, inverse, inverse_square and exponential.
Return type
ARRAY<STRUCT<index INT64, value FLOAT64>>
LOF
Description
This function computes the Local Outlier Factor of each point of a given set of points.
points: ARRAY<STRUCT<geoid STRING, geo GEOGRAPHY>>. Input data points with unique id and geography.
k: INT64. Number of nearest neighbors (positive, typically small).
Return type
ARRAY<STRUCT<geo GEOGRAPHY, geoid STRING, lof FLOAT64>>
where:
geo: the geometry of the considered point.
geoid: the unique identifier of the considered point.
lof: the Local Outlier Factor score.
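For readers unfamiliar with the score, the following sketch computes the Local Outlier Factor for planar points by brute force. It is a simplified illustration rather than the toolbox's implementation: the name local_outlier_factor is hypothetical, coordinates are assumed to be Cartesian, points are assumed to be distinct, and each neighborhood holds exactly k points even when distances tie.

```python
import math

def local_outlier_factor(points, k):
    """Local Outlier Factor for planar points given as (geoid, (x, y)) pairs."""
    ids = [geoid for geoid, _ in points]
    geo = dict(points)

    def dist(a, b):
        return math.dist(geo[a], geo[b])

    # k nearest neighbors and k-distance of every point (brute force).
    neighbors, k_distance = {}, {}
    for a in ids:
        ranked = sorted((dist(a, b), b) for b in ids if b != a)[:k]
        neighbors[a] = [b for _, b in ranked]
        k_distance[a] = ranked[-1][0]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = {}
    for a in ids:
        reach = [max(k_distance[b], dist(a, b)) for b in neighbors[a]]
        lrd[a] = len(reach) / sum(reach)

    # LOF: mean ratio between the neighbors' density and the point's own density.
    return {
        a: sum(lrd[b] for b in neighbors[a]) / (len(neighbors[a]) * lrd[a])
        for a in ids
    }

# Four corners of a unit square plus one distant point.
lof = local_outlier_factor(
    [("a", (0, 0)), ("b", (1, 0)), ("c", (0, 1)), ("d", (1, 1)), ("e", (10, 10))], k=2
)
```

Scores close to 1 mean a point is about as dense as its neighbors, while scores well above 1 flag likely outliers, such as point e in this example.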
LOF_TABLE
Description
This function computes the Local Outlier Factor for each point of a specified column and stores the result in an output table along with the other input columns.
src_fullname: STRING. The input table. A STRING of the form projectID.dataset.tablename is expected. The projectID can be omitted (in which case the default one will be used).
target_fullname: STRING. The resulting table where the LOF will be stored. A STRING of the form projectID.dataset.tablename is expected. The projectID can be omitted (in which case the default one will be used). The dataset must exist and the caller needs to have permissions to create a new table in it. The process will fail if the target table already exists.
geoid_column_name: STRING. The column name with a unique identifier for each point.
geo_column_name: STRING. The column name containing the points.
lof_target_column_name: STRING. The column name where the resulting Local Outlier Factor will be stored in the output table.
k: INT64. Number of nearest neighbors (positive, typically small).
MORANS_I_H3
Description
This function computes the Moran’s I spatial autocorrelation from the input array of H3 indexes.
input: ARRAY<STRUCT<index STRING, value FLOAT64>>. Input data with the indexes and values of the cells.
size: INT64. Size of the H3 kring (distance from the origin). This defines the area around each index cell where the distance decay will be applied.
decay: STRING. Decay function to compute the distance decay. Available functions are: uniform, inverse, inverse_square and exponential.
Return type
FLOAT64
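The statistic can be sketched with the standard global Moran's I formula, here for cells laid out on a line instead of an H3 grid. The name morans_i, the 1-D neighborhood and the exact forms of the four decay functions are assumptions for this illustration; the toolbox's own definitions may differ.

```python
import math

# Assumed decay forms; d is the grid distance between two cells (d >= 1).
DECAYS = {
    "uniform": lambda d: 1.0,
    "inverse": lambda d: 1.0 / d,
    "inverse_square": lambda d: 1.0 / (d * d),
    "exponential": lambda d: math.exp(-d),
}

def morans_i(values, size, decay="uniform"):
    """Global Moran's I for cells on a line; neighbors are within `size` steps."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight = DECAYS[decay]
    cross, total_weight = 0.0, 0.0
    for i in range(n):
        for j in range(max(0, i - size), min(n, i + size + 1)):
            if i == j:
                continue  # a cell is not its own neighbor
            w = weight(abs(i - j))
            cross += w * dev[i] * dev[j]
            total_weight += w
    return (n / total_weight) * cross / sum(d * d for d in dev)

trend = morans_i([1, 2, 3, 4], size=1)           # similar neighbors
checkerboard = morans_i([1, -1, 1, -1], size=1)  # dissimilar neighbors
```

Values near 1 indicate that similar values cluster together, values near -1 indicate that neighbors tend to differ, and values near 0 indicate no spatial autocorrelation.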
MORANS_I_QUADKEY
Description
This function computes the Moran’s I spatial autocorrelation from the input array of quadkey indexes.
input: ARRAY<STRUCT<index INT64, value FLOAT64>>. Input data with the indexes and values of the cells.
size: INT64. Size of the quadkey kring (distance from the origin). This defines the area around each index cell where the distance decay will be applied.
decay: STRING. Decay function to compute the distance decay. Available functions are: uniform, inverse, inverse_square and exponential.
Return type
FLOAT64
ORDINARY_KRIGING
Description
This function uses ordinary kriging to compute the interpolated values of an array of points, given another array of points with known associated values and a variogram. This variogram may be computed with the VARIOGRAM function.
sample_points: ARRAY<STRUCT<point GEOGRAPHY, value FLOAT64>>. Input array with the sample points and their values.
interp_points: ARRAY<GEOGRAPHY>. Input array with the points whose values will be interpolated.
max_distance: FLOAT64. Maximum distance to compute the semivariance.
variogram_params: ARRAY<FLOAT64>. Parameters [P0, P1, P2] of the variogram model.
n_neighbors: INT64. Maximum number of neighbors of a point to be taken into account for interpolation.
model: STRING. Type of model for fitting the semivariance. It can be either exponential or spherical and it should be the same type of model as the one used to compute the variogram:
exponential: P0 * (1. - exp(-xi / (P1 / 3.0))) + P2
spherical: P1 * (1.5 * (xi / P0) - 0.5 * (xi / P0)**3) + P2
Return type
ARRAY<STRUCT<point GEOGRAPHY, value FLOAT64>>
ORDINARY_KRIGING_TABLE
Description
This procedure uses Ordinary kriging to compute the interpolated values of a set of points stored in a table, given another set of points with known associated values.
input_table: STRING. Name of the table with the sample point locations and their values, stored in columns named point (type GEOGRAPHY) and value (type FLOAT), respectively. It should be a qualified table name including project and dataset: <project-id>.<dataset-id>.<table-name>.
interp_table: STRING. Name of the table with the point locations whose values will be interpolated, stored in a column named point of type GEOGRAPHY. It should be a qualified table name including project and dataset: <project-id>.<dataset-id>.<table-name>.
target_table: STRING. Name of the output table where the result of the kriging will be stored. It should be a qualified table name including project and dataset: <project-id>.<dataset-id>.<table-name>. The process will fail if the target table already exists. If NULL, the result will be returned by the procedure and won’t be persisted.
n_bins: INT64. Number of bins to compute the semivariance.
max_distance: FLOAT64. Maximum distance to compute the semivariance.
n_neighbors: INT64. Maximum number of neighbors of a point to be taken into account for interpolation.
model: STRING. Type of model for fitting the semivariance. It can be either:
exponential: P0 * (1. - exp(-xi / (P1 / 3.0))) + P2
spherical: P1 * (1.5 * (xi / P0) - 0.5 * (xi / P0)**3) + P2
P_VALUE
Description
This function computes the one-tailed p-value (upper-tail test) of a given z-score, assuming the population follows a normal distribution with mean 0 and standard deviation 1. The z-score measures how many standard deviations a value lies below or above the population mean. The p-value is the probability that a randomly sampled point has a value at least as extreme as the point whose z-score is being tested.
z_score: FLOAT64.
Return type
FLOAT64
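The upper-tail probability of a standard normal variable has a closed form in terms of the complementary error function, which the short sketch below uses. The Python function name p_value simply mirrors the documented one and is not the toolbox's code.

```python
import math

def p_value(z_score):
    """Upper-tail probability P(Z >= z_score) for a standard normal Z."""
    return 0.5 * math.erfc(z_score / math.sqrt(2.0))

p_center = p_value(0.0)   # half of the distribution lies above the mean
p_tail = p_value(1.96)    # roughly 0.025, the familiar one-sided threshold
```

Because the test is upper-tailed, negative z-scores give p-values above 0.5, and p_value(z) + p_value(-z) always equals 1.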
SMOOTHING_MRF_H3
Description
This procedure computes a Markov Random Field (MRF) smoothing for a table containing H3 cell indices and their associated values.
This implementation is based on the work of Christopher J. Paciorek: “Spatial models for point and areal data using Markov random fields on a fine grid.” Electron. J. Statist. 7 946 - 972, 2013. https://doi.org/10.1214/13-EJS791
input: STRING. Name of the source table. It should be a fully qualified table name including project and dataset: <project-id>.<dataset-id>.<table-name>.
output: STRING. Name of the output table. It should be a fully qualified table name including project and dataset: <project-id>.<dataset-id>.<table-name>. The process will fail if the table already exists. If NULL, the result will be returned directly by the procedure and not persisted.
index_column: STRING. Name of the column containing the cell ids.
variable_column: STRING. Name of the target variable column.
options: STRING. JSON string to overwrite the model’s default options. If set to NULL or empty, it will use the default values.
  closing_distance: INT64. Distance of closing. It defaults to 0. If strictly positive, the algorithm performs a morphological closing on the cells by the closing_distance, defined in number of cells, before performing the smoothing. No closing is performed otherwise.
  output_closing_cell: BOOL. Controls whether the cells generated by the closing are added to the output. It defaults to FALSE.
  lambda: FLOAT64. Iteration update factor. It defaults to 1.6. For more details, see https://doi.org/10.1214/13-EJS791, page 963.
  iter: INT64. Number of iterative queries to perform the smoothing. It defaults to 10. Increasing this parameter might help if the convergence_limit is not reached by the end of the procedure’s execution. Tip: if this limit has been reached, the status of the second-to-last step of the procedure will throw an error.
  intra_iter: INT64. Number of iterations per query. It defaults to 50. Reducing this parameter might help if a resource error is reached during the procedure’s execution.
  convergence_limit: FLOAT64. Threshold condition to stop iterations. If this threshold is not reached, then the procedure will finish its execution after the maximum number of iterations (iter) is reached. It defaults to 10e-5. For more details, see https://doi.org/10.1214/13-EJS791, page 963.
Return type
FLOAT64
SMOOTHING_MRF_QUADKEY
Description
This procedure computes a Markov Random Field (MRF) smoothing for a table containing QUADINT cell indices and their associated values.
This implementation is based on the work of Christopher J. Paciorek: “Spatial models for point and areal data using Markov random fields on a fine grid.” Electron. J. Statist. 7 946 - 972, 2013. https://doi.org/10.1214/13-EJS791
input: STRING. Name of the source table. It should be a fully qualified table name including project and dataset: <project-id>.<dataset-id>.<table-name>.
output: STRING. Name of the output table. It should be a fully qualified table name including project and dataset: <project-id>.<dataset-id>.<table-name>. The process will fail if the table already exists. If NULL, the result will be returned directly by the procedure and not persisted.
index_column: STRING. Name of the column containing the cell ids.
variable_column: STRING. Name of the target variable column.
options: STRING. JSON string to overwrite the model’s default options. If set to NULL or empty, it will use the default values.
  closing_distance: INT64. Distance of closing. It defaults to 0. If strictly positive, the algorithm performs a morphological closing on the cells by the closing_distance, defined in number of cells, before performing the smoothing. No closing is performed otherwise.
  output_closing_cell: BOOL. Controls whether the cells generated by the closing are added to the output. It defaults to FALSE.
  lambda: FLOAT64. Iteration update factor. It defaults to 1.6. For more details, see https://doi.org/10.1214/13-EJS791, page 963.
  iter: INT64. Number of iterative queries to perform the smoothing. It defaults to 10. Increasing this parameter might help if the convergence_limit is not reached by the end of the procedure’s execution. Tip: if this limit has been reached, the status of the second-to-last step of the procedure will throw an error.
  intra_iter: INT64. Number of iterations per query. It defaults to 50. Reducing this parameter might help if a resource error is reached during the procedure’s execution.
  convergence_limit: FLOAT64. Threshold condition to stop iterations. If this threshold is not reached, then the procedure will finish its execution after the maximum number of iterations (iter) is reached. It defaults to 10e-5. For more details, see https://doi.org/10.1214/13-EJS791, page 963.
Return type
FLOAT64
VARIOGRAM
Description
This function computes the Variogram from the input array of points and their associated values.
It returns a STRUCT with the parameters of the variogram, the x values, the y values, the predicted y values and the number of values aggregated per bin.
input: ARRAY<STRUCT<point GEOGRAPHY, value FLOAT64>>. Input array with the points and their associated values.
n_bins: INT64. Number of bins to compute the semivariance.
max_distance: FLOAT64. Maximum distance to compute the semivariance.
model: STRING. Type of model for fitting the semivariance. It can be either:
exponential: P0 * (1. - exp(-xi / (P1 / 3.0))) + P2
spherical: P1 * (1.5 * (xi / P0) - 0.5 * (xi / P0)**3) + P2
Return type
STRUCT<variogram_params ARRAY<FLOAT64>, x ARRAY<FLOAT64>, y ARRAY<FLOAT64>, yp ARRAY<FLOAT64>, count ARRAY<INT64>>
where:
variogram_params: array containing the parameters [P0, P1, P2] fitted to the model.
x: array with the x values used to fit the model.
y: array with the y values used to fit the model.
yp: array with the y values as predicted by the model.
count: array with the number of elements aggregated in the bin.
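The relationship between variogram_params, x and yp can be sketched by evaluating the two documented model formulas directly. The helper name variogram_model and the sample parameter values are hypothetical; the formulas are transcribed as written above, including the different roles that P0 and P1 play in each model.

```python
import math

def variogram_model(xi, params, model):
    """Evaluate the semivariance model at distance xi, using the formulas as documented."""
    p0, p1, p2 = params
    if model == "exponential":
        return p0 * (1.0 - math.exp(-xi / (p1 / 3.0))) + p2
    if model == "spherical":
        return p1 * (1.5 * (xi / p0) - 0.5 * (xi / p0) ** 3) + p2
    raise ValueError(f"unknown model: {model}")

# Predicted semivariances (yp) for a few distances (x) and made-up parameters.
x = [0.0, 10.0, 20.0]
yp = [variogram_model(xi, [2.0, 30.0, 0.5], "exponential") for xi in x]
```

At distance 0 both models return P2. The spherical formula is applied here exactly as documented, with no plateau added beyond xi = P0.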

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.