# statistics

This module contains functions to perform spatial statistics calculations.

## P\_VALUE <a href="#p_value" id="p_value"></a>

```sql
P_VALUE(z_score)
```

**Description**

This function computes the p-value (two-tails test) of a given z-score assuming the population follows a normal distribution where the mean is 0 and the standard deviation is 1. The [z-score](https://en.wikipedia.org/wiki/Standard_score) is a measure of how many standard deviations below or above the population mean a value is. It gives you an idea of how far from the mean a data point is. The [p-value](https://en.wikipedia.org/wiki/P-value) is the probability that a randomly sampled point has a value at least as extreme as the point whose z-score is being tested.

**Input parameters**

* `z_score`: `DOUBLE`

**Return type**

`DOUBLE`

**Example**

{% code overflow="wrap" lineNumbers="true" %}

```sql
SELECT carto.P_VALUE(z) AS p_value
FROM VALUES (-2), (-1), (0), (1), (2) AS t(z);
-- 0.04550012577451279,
-- 0.31731052766472745,
-- 0.999999999,
-- 0.31731052766472745,
-- 0.04550012577451279
```

{% endcode %}

## CREATE\_SPATIAL\_COMPOSITE\_UNSUPERVISED <a href="#create_spatial_composite_unsupervised" id="create_spatial_composite_unsupervised"></a>

```sql
CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(input, index_column, output_table, options)
```

**Description**

This procedure combines (spatial) variables into a meaningful composite score. The composite score can be derived using different methods, scaling and aggregation functions and weights. Rows with a NULL value in any of the model predictors are dropped.

**Input parameters**

* `input`: `STRING` the query to the data used to compute the spatial composite. It must contain all the individual variables that should be included in the computation of the composite as well as a unique geographic id for each row. A qualified table name can be given as well, e.g. `'<my-catalog>.<my-schema>.<my-table>'`.
* `index_column`: `STRING` the name of the column with the unique geographic identifier.
* `output_table`: `STRING` the name for the output table. It should include catalog and schema, e.g. `'<my-catalog>.<my-schema>.<my-table>'`.
* `options`: `STRING` containing a valid JSON with the different options. Valid options are described below. If options is set to NULL then all options are set to default values, as specified in the table below.
  * `scoring_method`: `STRING` Possible options are ENTROPY, CUSTOM\_WEIGHTS, FIRST\_PC. With the ENTROPY method the spatial composite is derived as the weighted sum of the proportion of the min-max scaled individual variables, where the weights are based on the entropy of the proportion of each variable. Only numerical variables are allowed. With the CUSTOM\_WEIGHTS method, the spatial composite is computed by first scaling each individual variable and then aggregating them according to user-defined scaling and aggregation methods and individual weights. Depending on the scaling parameter, both numerical and ordinal variables are allowed (categorical and boolean variables need to be transformed to ordinal). With the FIRST\_PC method, the spatial composite is derived from a Principal Component Analysis as the first principal component score. Only numerical variables are allowed.
  * `weights`: `OBJECT` the (optional) weights for each variable used to compute the spatial composite when scoring\_method is set to CUSTOM\_WEIGHTS, passed as `{"name":value, ...}`. If a different scoring method is selected, then this input parameter is ignored. If specified, the sum of the weights must be lower than 1. If no weights are specified, equal weights are assumed. If weights are specified only for some variables and the sum of weights is less than 1, the remainder is distributed equally between the remaining variables. If weights are specified for all the variables and the sum of weights is less than 1, the remainder is distributed equally between all the variables.
  * `scaling`: `STRING` the user-defined scaling when the scoring\_method is set to CUSTOM\_WEIGHTS. Possible options are:
    * MIN\_MAX\_SCALER: data is rescaled into the range \[0,1] based on minimum and maximum values. Only numerical variables are allowed.
    * STANDARD\_SCALER: data is rescaled by subtracting the mean value and dividing the result by the standard deviation. Only numerical variables are allowed.
    * RANKING: data is replaced by its percent rank, that is by values ranging from 0 lowest to 1. Both numerical and ordinal variables are allowed (categorical and boolean variables need to be transformed to ordinal).
    * DISTANCE\_TO\_TARGET\_MIN(\_MAX,\_AVG):data is rescaled by dividing by the minimum, maximum, or mean of all the values. Only numerical variables are allowed.
    * PROPORTION: data is rescaled by dividing by the sum total of all the values. Only numerical variables are allowed.
  * `aggregation`: `STRING` the aggregation function used when the scoring\_method is set to CUSTOM\_WEIGHTS. Possible options are:
    * LINEAR: the spatial composite is derived as the weighted sum of the scaled individual variables.
    * GEOMETRIC: the spatial composite is given by the product of the scaled individual variables, each to the power of its weight.
  * `correlation_var`: `STRING` when scoring\_method is set to FIRST\_PC, the spatial score will be positively correlated with the selected variable (i.e. the sign the spatial score is set such that the correlation between the selected variable and the first principal component score is positive).
  * `correlation_thr`: `DOUBLE` the minimum absolute value of the correlation between each individual variable and the first principal component score when scoring\_method is set to FIRST\_PC.
  * `return_range`: `ARRAY` the user-defined normalization range of the spatial composite score, e.g \[0.0,1.0]. Ignored if `bucketize_method` is specified.
  * `bucketize_method`: `STRING` the method used to discretize the spatial composite score. Possible options are:
    * EQUAL\_INTERVALS: the values of the spatial composite score are discretized into buckets of equal widths.
    * QUANTILES: the values of the spatial composite score are discretized into buckets based on quantiles.
    * JENKS: the values of the spatial composite score are discretized into buckets obtained using k-means clustering.
  * `nbuckets`: `INT` the number of buckets used when a bucketization method is specified. When `bucketize_method` is set to EQUAL\_INTERVALS, if `nbuckets` is NULL, the default number of buckets is selected using [Freedman and Diaconis's (1981) rule](https://robjhyndman.com/papers/sturges.pdf). When `bucketize_method` is set to JENKS or QUANTILES, `nbuckets` cannot be NULL.

| Option             | `ENTROPY` | `CUSTOM_WEIGHTS` | `FIRST_PC` | Valid options                                                                                                                            | Default value                                                                                                                                        |
| ------------------ | --------- | ---------------- | ---------- | ---------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `scoring_method`   | Optional  | Optional         | Optional   | ENTROPY, CUSTOM\_WEIGHTS, FIRST\_PC                                                                                                      | ENTROPY                                                                                                                                              |
| `weights`          | Ignored   | Optional         | Ignored    | `{"name":value...}`                                                                                                                      | NULL                                                                                                                                                 |
| `scaling`          | Ignored   | Optional         | Ignored    | MIN\_MAX\_SCALER, STANDARD\_SCALER, RANKING, DISTANCE\_TO\_TARGET\_MIN, DISTANCE\_TO\_TARGET\_MAX, DISTANCE\_TO\_TARGET\_AVG, PROPORTION | MIN\_MAX\_SCALER                                                                                                                                     |
| `aggregation`      | Ignored   | Optional         | Ignored    | LINEAR, GEOMETRIC                                                                                                                        | LINEAR                                                                                                                                               |
| `correlation_var`  | Ignored   | Optional         | Mandatory  | -                                                                                                                                        | NULL                                                                                                                                                 |
| `correlation_thr`  | Ignored   | Optional         | Optional   | -                                                                                                                                        | NULL                                                                                                                                                 |
| `return_range`     | Optional  | Optional         | Optional   | -                                                                                                                                        | NULL                                                                                                                                                 |
| `bucketize_method` | Optional  | Optional         | Optional   | EQUAL\_INTERVALS, QUANTILES, JENKS                                                                                                       | NULL                                                                                                                                                 |
| `nbuckets`         | Optional  | Optional         | Optional   | -                                                                                                                                        | When `bucketize_method` is set to EQUAL\_INTERVALS, selected using [Freedman and Diaconis's (1981) rule](https://robjhyndman.com/papers/sturges.pdf) |

**Return type**

The results are stored in the table named `<output_table>`, which contains the following columns:

* `index_column`: the unique geographic identifier. The type of this column depends on the type of `index_column` in `input`.
* `spatial_score`: the value of the composite score. The type of this column is `DOUBLE` if the score is not discretized and `INT` otherwise.

When the score is discretized by specifying the `bucketize_method` parameter, the procedure also returns a lookup table named `<output_table>_lookup_table` with the following columns:

* `lower_bound`: `DOUBLE` the lower bound of the bin.
* `upper_bound`: `DOUBLE` the upper bound of the bin.
* `spatial_score`: `INT` the value of the (discretized) composite score.

**Examples**

With the `ENTROPY` method:

```sql
CALL carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
    'SELECT * FROM <my-catalog>.<my-schema>.<my-table>',
    'geoid',
    '<my-catalog>.<my-schema>.<my-output-table>',
    '{
        "scoring_method":"ENTROPY",
        "return_range":[0.0,1.0]
    }'
)
-- The table `<my-catalog>.<my-schema>.<my-output-table>` will be created
```

With the `CUSTOM_WEIGHTS` method:

```sql
CALL carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
    'SELECT * FROM <my-catalog>.<my-schema>.<my-table>',
    'geoid',
    '<my-catalog>.<my-schema>.<my-output-table>',
    '{
        "scoring_method":"CUSTOM_WEIGHTS",
        "weights":{"fempop_15_44":0.2,"education":0.1,"urbanity_ordinal":0.1,"pois":0.1},
        "scaling":"RANKING",
        "aggregation":"LINEAR",
        "bucketize_method":"JENKS"
    }'
)
-- The table `<my-catalog>.<my-schema>.<my-output-table>` will be created
```

With the `FIRST_PC` method:

```sql
CALL carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
    'SELECT * FROM <my-catalog>.<my-schema>.<my-table>',
    'geoid',
    '<my-catalog>.<my-schema>.<my-output-table>',
    '{
        "scoring_method":"FIRST_PC",
        "correlation_var":"fempop_15_44",
        "correlation_thr":0.6,
        "bucketize_method":"QUANTILES"
    }'
)
-- The table `<my-catalog>.<my-schema>.<my-output-table>` will be created
```

With default options:

```sql
CALL carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
    'SELECT geoid, var1, var2, var3 FROM <my-catalog>.<my-schema>.<my-table>',
    'geoid',
    '<my-catalog>.<my-schema>.<my-composite-score>',
    NULL
)
-- The table `<my-catalog>.<my-schema>.<my-composite-score>` will be created
```

## GWR\_GRID <a href="#gwr_grid" id="gwr_grid"></a>

```sql
GWR_GRID(input, features_columns, label_column, index_column, cell_type, kring_distance, kernel_function, fit_intercept, output_table)
```

**Description**

Geographically weighted regression (GWR) models local relationships between spatially varying predictors and an outcome of interest using a local least squares regression.

This procedure performs a local least squares regression for every input cell. This approach was selected to improve computation time and efficiency. The number of models is controlled by the selected cell resolution, thus the user can increase or decrease the resolution of the cell index to perform more or less regressions. Note that you need to provide the cell ID (spatial index) for every location as input (see `index_column` parameter), i.e., the cell type and resolution are not passed explicitly, but rather the index has to be computed previously. Hence if you want to increase or decrease the resolution, you need to precompute the corresponding cell ID of every location (see [H3](#h3) or [Quadbin](https://github.com/CartoDB/gitbook-documentation/blob/master/data-and-analysis/analytics-toolbox-for-databricks/reference/.md/README.md#quadbin) modules).

In each regression, the data of the locations in each cell and those of the neighboring cells, defined by the `kring_distance` parameter, will be taken into account. The data of the neighboring cells will be assigned a lower weight the further they are from the origin cell, following the function specified in the `kernel_function`. For example, considering cell `i` and `kring_distance` of 1. Having `n` locations located inside cell `i`, and in the neighboring cells \[`n_1`, `n_2`, ..., `n_k`], then the regression of the cell `i` will have in total `n` + `n_1` + `n_2` + ... + `n_k` points.

**Input parameters**

* `input`: `STRING` the query to the input data. A qualified table name can be given as well, e.g. `<my-catalog>.<my-schema>.<my-table>`.
* `features_columns`: `ARRAY<STRING>` array of column names from the input to be used as features in the GWR.
* `label_column`: `STRING` name of the target variable column.
* `index_column`: `STRING` name of the column containing the cell ids.
* `cell_type`: `STRING` type of spatial index used. Supported values are `'H3'` and `'QUADBIN'`.
* `kring_distance`: `INT` distance of the neighboring cells whose data will be included in the local regression of each cell.
* `kernel_function`: `STRING` [kernel function](https://en.wikipedia.org/wiki/Kernel_\(statistics\)) to compute the spatial weights across the kring. Available functions are: 'uniform', 'triangular', 'quadratic', 'quartic' and 'gaussian'.
* `fit_intercept`: `BOOLEAN` whether to calculate the interception of the model or to force it to zero if, for example, the input data is already supposed to be centered. If NULL, `fit_intercept` will be considered as `TRUE`.
* `output_table`: `STRING` qualified name of the output table, e.g. `<my-catalog>.<my-schema>.<my-output-table>`.

**Output**

The output table will contain a column named `H3` (`STRING`) or `QUADBIN` (`BIGINT`) depending on the `cell_type`, storing the unique geographic identifier of each grid cell, and a column for each feature column containing its corresponding coefficient estimate and one extra column for the intercept if `fit_intercept` is set to `TRUE`.

**Example**

{% code overflow="wrap" lineNumbers="true" %}

```sql
CALL carto.GWR_GRID(
    '<my-catalog>.<my-schema>.<my-table>',
    ARRAY('bedrooms', 'bathrooms'), -- [ beds feature, bathrooms feature ]
    'price', -- price (target variable)
    'h3_z6',
    'H3',
    3, 'gaussian', TRUE,
    '<my-catalog>.<my-schema>.<my-table>_OUTPUT'
);
-- The table `<my-catalog>.<my-schema>.<my-table>_OUTPUT` will be created
```

{% endcode %}

## GETIS\_ORD\_H3 <a href="#getis_ord_h3" id="getis_ord_h3"></a>

```sql
GETIS_ORD_H3(input, output_table, index_column, value_column, size, kernel)
```

**Description**

This procedure computes the Getis-Ord Gi\* statistic for each row in the input table.

**Input parameters**

* `input`: `STRING` the query to the data used to compute the coefficient. A qualified table name can be given as well, e.g. `<my-catalog>.<my-schema>.<my-table>`.
* `output_table`: `STRING` qualified name of the output table, e.g. `<my-catalog>.<my-schema>.<my-output-table>`.
* `index_column`: `STRING` name of the column with the H3 indexes.
* `value_column`: `STRING` name of the column with the values for each H3 cell.
* `size`: `INT` size of the H3 kring (distance from the origin). This defines the area around each index cell that will be taken into account to compute its Gi\* statistic.
* `kernel`: `STRING` [kernel function](https://en.wikipedia.org/wiki/Kernel_\(statistics\)) to compute the spatial weights across the kring. Available functions are: uniform, triangular, quadratic, quartic and gaussian.

{% hint style="warning" %}
The `index_column` cannot contain NULL values, otherwise a `Invalid input origin` error will be returned.
{% endhint %}

**Output**

The results are stored in the table named `<output_table>`, which contains the following columns:

* `h3`: `STRING` the H3 index.
* `gi`: `DOUBLE` computed Gi\* value.
* `p_value`: `DOUBLE` computed P value.

**Example**

{% code overflow="wrap" lineNumbers="true" %}

```sql
CALL carto.GETIS_ORD_H3(
    '<my-catalog>.<my-schema>.<my-table>',
    '<my-catalog>.<my-schema>.<my-table>_OUTPUT',
    'h3',
    'value',
    3,
    'gaussian'
);
-- The table `<my-catalog>.<my-schema>.<my-table>_OUTPUT` will be created
```

{% endcode %}

## GETIS\_ORD\_QUADBIN <a href="#getis_ord_quadbin" id="getis_ord_quadbin"></a>

```sql
GETIS_ORD_QUADBIN(input, output_table, index_column, value_column, size, kernel)
```

**Description**

This procedure computes the Getis-Ord Gi\* statistic for each row in the input table.

**Input parameters**

* `input`: `STRING` the query to the data used to compute the coefficient. A qualified table name can be given as well, e.g. `<my-catalog>.<my-schema>.<my-table>`.
* `output_table`: `STRING` qualified name of the output table, e.g. `<my-catalog>.<my-schema>.<my-output-table>`.
* `index_column`: `STRING` name of the column with the Quadbin indexes.
* `value_column`: `STRING` name of the column with the values for each Quadbin cell.
* `size`: `INT` size of the Quadbin kring (distance from the origin). This defines the area around each index cell that will be taken into account to compute its Gi\* statistic.
* `kernel`: `STRING` [kernel function](https://en.wikipedia.org/wiki/Kernel_\(statistics\)) to compute the spatial weights across the kring. Available functions are: uniform, triangular, quadratic, quartic and gaussian.

{% hint style="warning" %}
The `index_column` cannot contain NULL values, otherwise a `Invalid input origin` error will be returned.
{% endhint %}

**Output**

The results are stored in the table named `<output_table>`, which contains the following columns:

* `quadbin`: `BIGINT` the Quadbin index.
* `gi`: `DOUBLE` computed Gi\* value.
* `p_value`: `DOUBLE` computed P value.

**Example**

{% code overflow="wrap" lineNumbers="true" %}

```sql
CALL carto.GETIS_ORD_QUADBIN(
    '<my-catalog>.<my-schema>.<my-table>',
    '<my-catalog>.<my-schema>.<my-table>_OUTPUT',
    'quadbin',
    'value',
    3,
    'gaussian'
);
-- The table `<my-catalog>.<my-schema>.<my-table>_OUTPUT` will be created
```

{% endcode %}

## MORANS\_I\_H3 <a href="#morans_i_h3" id="morans_i_h3"></a>

```sql
MORANS_I_H3(input, output_table, index_column, value_column, size, decay)
```

**Description**

This procedure computes the [Moran's I spatial autocorrelation](https://en.wikipedia.org/wiki/Moran%27s_I) from the input table with H3 indexes.

**Input parameters**

* `input`: `STRING` the query to the data used to compute the coefficient. A qualified table name can be given as well, e.g. `<my-catalog>.<my-schema>.<my-table>`.
* `output_table`: `STRING` qualified name of the output table, e.g. `<my-catalog>.<my-schema>.<my-output-table>`.
* `index_column`: `STRING` name of the column with the H3 indexes.
* `value_column`: `STRING` name of the column with the values for each H3 cell.
* `size`: `INT` size of the H3 *k-ring* (distance from the origin). This defines the area around each index cell where the distance decay will be applied. If no neighboring cells are found, the weight of the corresponding index cell is set to zero.
* `decay`: `STRING` decay function to compute the [distance decay](https://en.wikipedia.org/wiki/Distance_decay). Available functions are: uniform, inverse, inverse\_square and exponential.

{% hint style="warning" %}
The `index_column` cannot contain NULL values, otherwise a `Invalid input origin` error will be returned.
{% endhint %}

**Output**

The results are stored in the table named `<output_table>`, which contains the following column:

* `morans_i`: `DOUBLE` Moran's I spatial autocorrelation.

If all cells have no neighbours, then the procedure will fail.

**Example**

{% code overflow="wrap" lineNumbers="true" %}

```sql
CALL carto.MORANS_I_H3(
    '<my-catalog>.<my-schema>.<my-table>',
    '<my-catalog>.<my-schema>.<my-table>_OUTPUT',
    'h3',
    'value',
    5,
    'uniform'
);
-- The table `<my-catalog>.<my-schema>.<my-table>_OUTPUT` will be created
```

{% endcode %}

## MORANS\_I\_QUADBIN <a href="#morans_i_quadbin" id="morans_i_quadbin"></a>

```sql
MORANS_I_QUADBIN(input, output_table, index_column, value_column, size, decay)
```

**Description**

This procedure computes the [Moran's I spatial autocorrelation](https://en.wikipedia.org/wiki/Moran%27s_I) from the input table with Quadbin indexes.

**Input parameters**

* `input`: `STRING` the query to the data used to compute the coefficient. A qualified table name can be given as well, e.g. `<my-catalog>.<my-schema>.<my-table>`.
* `output_table`: `STRING` qualified name of the output table, e.g. `<my-catalog>.<my-schema>.<my-output-table>`.
* `index_column`: `STRING` name of the column with the Quadbin indexes.
* `value_column`: `STRING` name of the column with the values for each Quadbin cell.
* `size`: `INT` size of the Quadbin *k-ring* (distance from the origin). This defines the area around each index cell where the distance decay will be applied. If no neighboring cells are found, the weight of the corresponding index cell is set to zero.
* `decay`: `STRING` decay function to compute the [distance decay](https://en.wikipedia.org/wiki/Distance_decay). Available functions are: uniform, inverse, inverse\_square and exponential.

{% hint style="warning" %}
The `index_column` cannot contain NULL values, otherwise a `Invalid input origin` error will be returned.
{% endhint %}

**Output**

The results are stored in the table named `<output_table>`, which contains the following column:

* `morans_i`: `DOUBLE` Moran's I spatial autocorrelation.

If all cells have no neighbours, then the procedure will fail.

**Example**

{% code overflow="wrap" lineNumbers="true" %}

```sql
CALL carto.MORANS_I_QUADBIN(
    '<my-catalog>.<my-schema>.<my-table>',
    '<my-catalog>.<my-schema>.<my-table>_OUTPUT',
    'quadbin',
    'value',
    5,
    'uniform'
);
-- The table `<my-catalog>.<my-schema>.<my-table>_OUTPUT` will be created
```

{% endcode %}

## LOCAL\_MORANS\_I\_H3 <a href="#local_morans_i_h3" id="local_morans_i_h3"></a>

```sql
LOCAL_MORANS_I_H3(input, output_table, index_column, value_column, size, decay, permutations)
```

**Description**

This procedure computes the local Moran's I spatial autocorrelation from the input table with H3 indexes. It outputs the H3 `index`, local Moran's I spatial autocorrelation `value`, simulated p value `psim`, Conditional randomization null - expectation `EIc`, Conditional randomization null - variance `VIc`, Total randomization null - expectation `EI`, Total randomization null - variance `VI`, and the `quad` HH=1, LL=2, LH=3, HL=4.

**Input parameters**

* `input`: `STRING` the query to the data used to compute the coefficient. A qualified table name can be given as well, e.g. `<my-catalog>.<my-schema>.<my-table>`.
* `output_table`: `STRING` qualified name of the output table, e.g. `<my-catalog>.<my-schema>.<my-output-table>`.
* `index_column`: `STRING` name of the column with the H3 indexes.
* `value_column`: `STRING` name of the column with the values for each H3 cell.
* `size`: `INT` size of the H3 *k-ring* (distance from the origin). This defines the area around each index cell where the distance decay will be applied. If no neighboring cells are found, the weight of the corresponding index cell is set to zero.
* `decay`: `STRING` decay function to compute the [distance decay](https://en.wikipedia.org/wiki/Distance_decay). Available functions are: uniform, inverse, inverse\_square and exponential.
* `permutations`: `INT` number of permutations for the estimation of p-value.

{% hint style="warning" %}
The `index_column` cannot contain NULL values, otherwise a `Invalid input origin` error will be returned.
{% endhint %}

**Output**

The results are stored in the table named `<output_table>`, which contains the following columns:

* `h3`: `STRING` the H3 index.
* `value`: `DOUBLE` local Moran's I spatial autocorrelation.
* `psim`: `DOUBLE` simulated p value.
* `EIc`: `DOUBLE` conditional randomization null - expectation.
* `VIc`: `DOUBLE` conditional randomization null - variance.
* `EI`: `DOUBLE` total randomization null - expectation.
* `VI`: `DOUBLE` total randomization null - variance.
* `quad`: `INTEGER` HH=1, LL=2, LH=3, HL=4.

**Example**

{% code overflow="wrap" lineNumbers="true" %}

```sql
CALL carto.LOCAL_MORANS_I_H3(
    '<my-catalog>.<my-schema>.<my-table>',
    '<my-catalog>.<my-schema>.<my-table>_OUTPUT',
    'h3',
    'value',
    3,
    'exponential',
    100
);
-- The table `<my-catalog>.<my-schema>.<my-table>_OUTPUT` will be created
```

{% endcode %}

## LOCAL\_MORANS\_I\_QUADBIN <a href="#local_morans_i_quadbin" id="local_morans_i_quadbin"></a>

```sql
LOCAL_MORANS_I_QUADBIN(input, output_table, index_column, value_column, size, decay, permutations)
```

**Description**

This procedure computes the local Moran's I spatial autocorrelation from the input table with Quadbin indexes. It outputs the Quadbin `index`, local Moran's I spatial autocorrelation `value`, simulated p value `psim`, Conditional randomization null - expectation `EIc`, Conditional randomization null - variance `VIc`, Total randomization null - expectation `EI`, Total randomization null - variance `VI`, and the `quad` HH=1, LL=2, LH=3, HL=4.

**Input parameters**

* `input`: `STRING` the query to the data used to compute the coefficient. A qualified table name can be given as well, e.g. `<my-catalog>.<my-schema>.<my-table>`.
* `output_table`: `STRING` qualified name of the output table, e.g. `<my-catalog>.<my-schema>.<my-output-table>`.
* `index_column`: `STRING` name of the column with the Quadbin indexes.
* `value_column`: `STRING` name of the column with the values for each Quadbin cell.
* `size`: `INT` size of the Quadbin *k-ring* (distance from the origin). This defines the area around each index cell where the distance decay will be applied. If no neighboring cells are found, the weight of the corresponding index cell is set to zero.
* `decay`: `STRING` decay function to compute the [distance decay](https://en.wikipedia.org/wiki/Distance_decay). Available functions are: uniform, inverse, inverse\_square and exponential.
* `permutations`: `INT` number of permutations for the estimation of p-value.

{% hint style="warning" %}
The `index_column` cannot contain NULL values, otherwise a `Invalid input origin` error will be returned.
{% endhint %}

**Output**

The results are stored in the table named `<output_table>`, which contains the following columns:

* `quadbin`: `BIGINT` the Quadbin index.
* `value`: `DOUBLE` local Moran's I spatial autocorrelation.
* `psim`: `DOUBLE` simulated p value.
* `EIc`: `DOUBLE` conditional randomization null - expectation.
* `VIc`: `DOUBLE` conditional randomization null - variance.
* `EI`: `DOUBLE` total randomization null - expectation.
* `VI`: `DOUBLE` total randomization null - variance.
* `quad`: `INTEGER` HH=1, LL=2, LH=3, HL=4.

**Example**

{% code overflow="wrap" lineNumbers="true" %}

```sql
CALL carto.LOCAL_MORANS_I_QUADBIN(
    '<my-catalog>.<my-schema>.<my-table>',
    '<my-catalog>.<my-schema>.<my-table>_OUTPUT',
    'quadbin',
    'value',
    3,
    'exponential',
    100
);
-- The table `<my-catalog>.<my-schema>.<my-table>_OUTPUT` will be created
```

{% endcode %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.carto.com/data-and-analysis/analytics-toolbox-for-databricks/reference/statistics.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.