# Data Enrichment

Components to enrich your data with variables from other data sources. These components work with both simple features and spatial index grids.

## Downscaling

**Description**

The **Downscaling** component distributes variables from larger source polygons to smaller target polygons based on weights. It takes aggregate data from larger geographic areas (e.g., countries, states) and distributes it to smaller areas (e.g., counties, census tracts) in proportion to a provided weighting factor, which acts as a *proxy variable* for the distribution.

**Note**: This component is designed for geometries that are perfectly contained within others (e.g., US Census Blocks within US Census Block Groups), where each target polygon centroid falls within exactly one source polygon. If this is not the case, consider using [*Enrich Polygons with Weights*](https://docs.carto.com/carto-user-manual/workflows/components/data-enrichment#enrich-polygons-with-weights) instead.

**Inputs**

Source and Target Tables:

* The **source table**, containing larger polygons with variables to distribute.
* The **target table**, containing smaller polygons with weights.
* An optional **lookup table**, containing a mapping from source IDs to target IDs.

**Settings**

* **Source ID column**: the column that uniquely identifies each row from the source table.
* **Target ID column**: the column that uniquely identifies each row from the target table.
* **Weight column**: the column from the target table that will be used to weight the distribution.
* **Variables to downscale**: each of the variables to be distributed from the source table. There are different distribution methods to be chosen per column:
  * *Extensive*: Distributes values proportionally based on weights. The sum of distributed values across all target polygons equals the original source value. Suitable for count data or totals (e.g., total population, total revenue).
  * *Intensive*: Redistributes values using weighted averages. It first upscales target values to the source level, then applies a normalization ratio. Suitable for rates, densities, or averages (e.g., population density, average income) that require reconciliation with a higher level of aggregation. **Requires the variable to exist in both source and target tables with the same name**; the target-level value can be, for example, an outdated value or a handcrafted measure.
  * *Uniform*: Copies the same value from the source to all matching target polygons without modification. Suitable for categorical or constant values (e.g., region name, policy indicator, join keys).
* **Matching mode**: Choose how to match target polygons to source polygons:
  * *Infer*: Computes target polygon centroids and uses spatial containment to match them with source polygons. Requires geometry columns from both tables to compute the spatial relations. By default, it performs
  * *Lookup*: Uses a precomputed lookup table with source-target relationships. Faster when relationships are precomputed or can be obtained through a simpler computation.
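
The three distribution methods can be illustrated with a minimal Python sketch. This is one possible reading of the descriptions above, not the component's actual implementation: the sample data, the column names, and in particular the normalization used for the *Intensive* method (rescaling target values so that their weighted average matches the source value) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: one source polygon and three target polygons inside it.
source = {"S1": {"population": 1000.0, "avg_income": 50.0, "region": "North"}}
targets = [
    {"id": "T1", "source_id": "S1", "weight": 1.0, "avg_income": 40.0},
    {"id": "T2", "source_id": "S1", "weight": 3.0, "avg_income": 60.0},
    {"id": "T3", "source_id": "S1", "weight": 6.0, "avg_income": 45.0},
]


def downscale(source, targets):
    # Per source polygon: total weight and weighted sum of the target-level variable.
    weight_sum = defaultdict(float)
    weighted_income = defaultdict(float)
    for t in targets:
        weight_sum[t["source_id"]] += t["weight"]
        weighted_income[t["source_id"]] += t["weight"] * t["avg_income"]

    result = []
    for t in targets:
        s = source[t["source_id"]]
        total_w = weight_sum[t["source_id"]]
        # Extensive: each target gets a share of the source total proportional to its weight.
        extensive = s["population"] * t["weight"] / total_w
        # Intensive (assumed formula): upscale target values to a weighted average,
        # then rescale each target by the ratio source value / upscaled value.
        upscaled = weighted_income[t["source_id"]] / total_w
        intensive = t["avg_income"] * s["avg_income"] / upscaled
        result.append({
            **t,
            "population_extensive": extensive,
            "avg_income_intensive": intensive,
            # Uniform: copy the source value unchanged.
            "region_uniform": s["region"],
        })
    return result


result = downscale(source, targets)
```

In this sketch the extensive values add up to the source total, and the weighted average of the intensive values equals the source-level value.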

**Advanced Options**

* *Allow unmatched targets*: When using *Infer* as matching mode, the component performs a quick check to verify that there are no obvious issues with the data and the contract of the process (perfectly contained, smaller geometries): that no target polygon is contained in multiple source polygons, and no target polygons are left outside source polygons. Enable this checkbox to skip those checks.
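
As a rough illustration of the kind of check described above, the following Python sketch flags targets that match no source polygon and targets that match more than one. The function name and the input shape (a list of target/source ID pairs produced by some containment test) are hypothetical; the component's real check is not documented here.

```python
def check_matches(target_ids, matches):
    """Return (unmatched, ambiguous) target IDs.

    matches is a list of (target_id, source_id) pairs, e.g. the output of a
    centroid-in-polygon test (hypothetical input shape).
    """
    sources_per_target = {t: set() for t in target_ids}
    for target_id, source_id in matches:
        sources_per_target.setdefault(target_id, set()).add(source_id)

    # Targets that fall outside every source polygon.
    unmatched = sorted(t for t, s in sources_per_target.items() if not s)
    # Targets that fall inside more than one source polygon.
    ambiguous = sorted(t for t, s in sources_per_target.items() if len(s) > 1)
    return unmatched, ambiguous


unmatched, ambiguous = check_matches(
    ["T1", "T2", "T3"],
    [("T1", "S1"), ("T2", "S1"), ("T2", "S2")],
)
```

Here `T3` has no match and `T2` matches two sources, so both would violate the contract of perfectly contained geometries.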

**Outputs**

* *Result table*: Output table containing *all* columns from the target table plus the downscaled variables with suffixes indicating the distribution method:
  * `{variable_name}_extensive` for extensive variables
  * `{variable_name}_intensive` for intensive variables
  * `{variable_name}_uniform` for uniform variables

## Enrich H3 Grid

**Description**

This component enriches a *target* table with data from a *source*. Enriching here means adding columns with aggregated data from the source that matches the target geographies.

* The *target* (the upper input connection of this component) must have a column that contains H3 indices, which will be used to join with the *source*.
* The *source* (the lower input connection) can be either a CARTO Data Observatory subscription or a table (or the result of another component) with a geography column.

For the enrichment operation the CARTO Analytics Toolbox is required, and one of the following procedures will be called:

* `DATAOBS_ENRICH_GRID` if the source is a Data Observatory subscription
* `ENRICH_GRID` otherwise

**Inputs**

* `Target geo column`: the column of the target that will be used to join with the source and to select the rows that will be aggregated for each target row.
* `Source geo column`: (only necessary for non-DO sources) the column of the source that will be joined with the target.
* `Variables`: this allows selecting the data from the source that will be aggregated and added to the target.

  * For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their *variable slug*;
  * for other sources they are the columns in the source table.

  Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods.\
  At the moment only numeric variables are supported.

For spatially smoothed enrichments that take into account the surrounding cells, use the following input parameters:

* `Kring size`: size of the k-ring where the decay function will be applied. This value can be 0, in which case no k-ring will be computed and the decay function won't be applied.
* `Decay function`: decay function to aggregate and smooth the data. Supported values are `uniform`, `inverse`, `inverse_square` and `exponential`.
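
To give an intuition for how a decay function smooths values over a k-ring, here is a small Python sketch. The exact formulas used by the Analytics Toolbox are not stated on this page; the formulas below, and the use of a plain weighted sum, are assumptions chosen for illustration.

```python
import math

# Assumed decay formulas as a function of the k-ring distance d
# (d = 0 is the cell itself). These are illustrative, not confirmed.
DECAY = {
    "uniform": lambda d: 1.0,
    "inverse": lambda d: 1.0 / (1.0 + d),
    "inverse_square": lambda d: 1.0 / (1.0 + d * d),
    "exponential": lambda d: math.exp(-d),
}


def smooth(values_by_distance, decay):
    """Weighted sum of (distance, value) pairs for the cells of one k-ring."""
    weight = DECAY[decay]
    return sum(weight(d) * v for d, v in values_by_distance)


# One center cell (distance 0) and two neighbors at distance 1.
cells = [(0, 10.0), (1, 4.0), (1, 6.0)]
uniform_total = smooth(cells, "uniform")
inverse_total = smooth(cells, "inverse")
```

With a k-ring size of 0 only the center cell contributes, and under these assumed formulas every decay function gives it a weight of 1, so the value is left unchanged.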

**Outputs**

* `Result table [Table]`

## Enrich Points

**Description**

This component enriches a *target* table with data from a *source*. Enriching here means adding columns with aggregated data from the source that matches (intersects) the target geographies.

* The *target* (the upper input connection of this component) must have a geo column, which will be used to intersect with the *source*.
* The *source* (the lower input connection) can be either a CARTO Data Observatory subscription or a table (or the result of another component) with a geo column.

For the enrichment operation the CARTO Analytics Toolbox is required, and one of the following procedures will be called:

* `DATAOBS_ENRICH_POINTS` if the source is a Data Observatory subscription
* `ENRICH_POINTS` otherwise

**Inputs**

* `Target geo column`: the column of the target that will be used to intersect with the source and to select the rows that will be aggregated for each target row.
* `Source geo column`: (only necessary for non-DO sources) the column of the source that will be intersected with the target.
* `Variables`: this allows selecting the data from the source that will be aggregated and added to the target.

  * For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their *variable slug*;
  * for other sources they are the columns in the source table.

  Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods.\
  At the moment only numeric variables are supported.
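
Conceptually, the enrichment finds the source features that intersect each target point and aggregates the selected variable over them. The Python sketch below mimics that idea with a simple ray-casting point-in-polygon test. It is an illustration only: the function names, the data layout, and the `{variable}_{method}` output column names are hypothetical, and the real work is done by the Analytics Toolbox procedures listed above.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


AGGREGATIONS = {
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "count": len,
}


def enrich_points(points, polygons, variable, methods):
    enriched = []
    for p in points:
        # Values of the source polygons that contain this target point.
        matched = [poly[variable] for poly in polygons
                   if point_in_polygon(p["x"], p["y"], poly["ring"])]
        row = dict(p)
        for m in methods:
            # The same variable can be aggregated with several methods.
            row[f"{variable}_{m}"] = AGGREGATIONS[m](matched) if matched else None
        enriched.append(row)
    return enriched


# Hypothetical data: two overlapping square source polygons and three target points.
polygons = [
    {"ring": [(0, 0), (4, 0), (4, 4), (0, 4)], "sales": 10.0},
    {"ring": [(2, 2), (6, 2), (6, 6), (2, 6)], "sales": 30.0},
]
points = [{"id": "P1", "x": 1, "y": 1}, {"id": "P2", "x": 3, "y": 3}, {"id": "P3", "x": 9, "y": 9}]
enriched = enrich_points(points, polygons, "sales", ["sum", "avg"])
```

In this example `P2` falls inside both squares, so its sum and average combine both source values, while `P3` matches nothing and gets empty values.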

**Outputs**

* `Result table [Table]`

## Enrich Polygons

**Description**

This component enriches a *target* table with data from a *source*. Enriching here means adding columns with aggregated data from the source that matches (intersects) the target geographies.

* The *target* (the upper input connection of this component) must have a geo column, which will be used to intersect with the *source*.
* The *source* (the lower input connection) can be either a CARTO Data Observatory subscription or a table (or the result of another component) with a geo column.

For the enrichment operation the CARTO Analytics Toolbox is required, and one of the following procedures will be called:

* `DATAOBS_ENRICH_POLYGONS` if the source is a Data Observatory subscription
* `ENRICH_POLYGONS` otherwise

**Inputs**

* `Target geo column`: the column of the target that will be used to intersect with the source and to select the rows that will be aggregated for each target row.
* `Source geo column`: (only necessary for non-DO sources) the column of the source that will be intersected with the target.
* `Variables`: this allows selecting the data from the source that will be aggregated and added to the target.

  * For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their *variable slug*;
  * for other sources they are the columns in the source table.

  Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods.\
  At the moment only numeric variables are supported.

**Outputs**

* `Result table [Table]`

## Enrich Polygons with Weights

**Description**

This component enriches a target table with data from a source table, using a weights table for proportional attribution. It uses a weight-projected enrichment method to distribute values from the source to the target polygons based on the spatial distribution of weights.

**Inputs**

* Target table: polygons to be enriched
* Source table: table with data for the enrichment
* Weights table: table with data to weight the enrichment

**Settings**

* Target polygons geo column: Select the column from the target table that contains a valid geography.
* Source table geo column: Select the column from the source table that contains a valid geography.
* Variables: Select a list of variables and aggregation method from the source table to be used to enrich the target table. Valid aggregation methods are:
  * **`SUM`**: It assumes the aggregated variable is an [*extensive property*](https://en.wikipedia.org/wiki/Intensive_and_extensive_properties) (e.g. population). Accordingly, the value corresponding to the feature intersected **is weighted by the fraction of the intersected weight variable**.
  * **`AVG`**: It assumes the aggregated variable is an [*intensive property*](https://en.wikipedia.org/wiki/Intensive_and_extensive_properties) (e.g. temperature, population density). **A** [**weighted average**](https://en.wikipedia.org/wiki/Weighted_arithmetic_mean) **is computed, using the value of the intersected weight variable as weights**.
* Weights geo column: Select the column from the weights table that contains a valid geography.
* Weight variable: Select a numeric column from the weights table to be used as the weight for proportional attribution.

{% hint style="info" %}
If your weight variables are included in the same table as the source variables, you can connect the same node to both inputs in this component.
{% endhint %}
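
The two aggregation methods can be sketched in Python as follows. This is a simplified model under stated assumptions, not the component's implementation: the spatial intersections are taken as already computed, and each record in the hypothetical `pieces` list holds the amount of the weight variable that falls in the overlap of one source feature and one target polygon (`None` when it falls outside every target).

```python
from collections import defaultdict

# Hypothetical source features with one extensive and one intensive variable.
sources = {
    "S1": {"population": 100.0, "temperature": 20.0},
    "S2": {"population": 200.0, "temperature": 30.0},
}
# (source_id, target_id, weight in the overlap); assumed to be precomputed.
pieces = [
    ("S1", "T1", 30.0),
    ("S1", "T2", 70.0),
    ("S2", "T2", 50.0),
    ("S2", None, 50.0),
]


def enrich_with_weights(sources, pieces, sum_var, avg_var):
    # Total weight per source feature, including weight outside all targets.
    source_weight = defaultdict(float)
    for source_id, _, w in pieces:
        source_weight[source_id] += w

    sums = defaultdict(float)
    avg_num = defaultdict(float)
    avg_den = defaultdict(float)
    for source_id, target_id, w in pieces:
        if target_id is None:
            continue
        # SUM: source value scaled by the fraction of the source's weight in the target.
        sums[target_id] += sources[source_id][sum_var] * w / source_weight[source_id]
        # AVG: weighted average using the intersected weight as the weight.
        avg_num[target_id] += sources[source_id][avg_var] * w
        avg_den[target_id] += w

    return {
        t: {f"{sum_var}_sum": sums[t], f"{avg_var}_avg": avg_num[t] / avg_den[t]}
        for t in sums
    }


enriched = enrich_with_weights(sources, pieces, "population", "temperature")
```

For `T2`, the sum takes 70% of `S1` and 50% of `S2`, while the average weights the two temperatures by 70 and 50.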

**Outputs**

* Output table with the following schema
  * All columns from Target
  * A column for each variable in 'Variables', named like `name_sum` or `name_avg`, depending on the original column name and the aggregation method.

## Enrich Polygons with Weights (Legacy)

**Description**

This is the legacy version of the Enrich Polygons with Weights component. It uses a data source (either a table or a Data Observatory subscription) to enrich another target table using weights to control the enrichment. It supports a wider range of aggregation methods but is only available on BigQuery.

**Inputs**

* Target table to be enriched
* Source table with data for the enrichment (can be a Data Observatory subscription or a standard table)
* Weights table with data to weight the enrichment

**Settings**

* Target polygons geo column: Select the column from the target table that contains a valid geography.
* Source table geo column: Select the column from the source table that contains a valid geography.
* Variables: Select a list of variables and aggregation method from the source table to be used to enrich the target table. Valid aggregation methods are:
  * **`SUM`**: It assumes the aggregated variable is an [*extensive property*](https://en.wikipedia.org/wiki/Intensive_and_extensive_properties) (e.g. population). Accordingly, the value corresponding to the feature intersected **is weighted by the fraction of the intersected weight variable**.
  * **`MIN`**: It assumes the aggregated variable is an [*intensive property*](https://en.wikipedia.org/wiki/Intensive_and_extensive_properties) (e.g. temperature, population density). Thus, **the value is not altered by the weight variable**.
  * **`MAX`**: It assumes the aggregated variable is an [*intensive property*](https://en.wikipedia.org/wiki/Intensive_and_extensive_properties) (e.g. temperature, population density). Thus, **the value is not altered by the weight variable**.
  * **`AVG`**: It assumes the aggregated variable is an [*intensive property*](https://en.wikipedia.org/wiki/Intensive_and_extensive_properties) (e.g. temperature, population density). **A** [**weighted average**](https://en.wikipedia.org/wiki/Weighted_arithmetic_mean) **is computed, using the value of the intersected weight variable as weights**.
  * **`COUNT`**: It computes the number of features that contain the enrichment variable and are intersected by the input geography.

{% hint style="info" %}
The component will return an error if all variables selected are aggregated as MIN or MAX, since the result wouldn't actually be weighted.
{% endhint %}

* Weights geo column: Select the column from the weights table that contains a valid geography.
* Weights variable: Select one variable and aggregation operation to be used as weight for the enrichment.

{% hint style="info" %}
If your weight variables are included in the same table as the source variables, you can connect the same node to both inputs in this component.
{% endhint %}

{% hint style="warning" %}
When the source for the enrichment is a standard table, the weights source can't be a DO subscription.

The same limitation applies when the source for the enrichment is a DO subscription; the weights source can't be a standard table.
{% endhint %}
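
The two constraints described in the hints above can be expressed as a small pre-flight check. The Python sketch below is purely illustrative: the function name, its parameters, and the error messages are hypothetical and do not correspond to the component's actual validation code.

```python
def validate_legacy_settings(aggregations, source_is_do, weights_is_do):
    """Return a list of human-readable problems (empty if the settings look valid)."""
    errors = []
    # If every variable uses MIN or MAX, the weights would never be applied.
    if aggregations and all(a.upper() in ("MIN", "MAX") for a in aggregations):
        errors.append("At least one variable must use SUM, AVG or COUNT.")
    # Source and weights must both be DO subscriptions or both be standard tables.
    if source_is_do != weights_is_do:
        errors.append("Source and weights must be of the same kind (DO or table).")
    return errors


ok = validate_legacy_settings(["SUM", "MAX"], source_is_do=False, weights_is_do=False)
bad = validate_legacy_settings(["MIN", "MAX"], source_is_do=True, weights_is_do=False)
```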

**Outputs**

* Output table with the following schema
  * All columns from Target
  * A column for each variable in 'Variables', named like `name_sum`, `name_avg` or `name_max`, depending on the original column name and the aggregation method.

## Enrich Quadbin Grid

**Description**

This component enriches a *target* table with data from a *source*. Enriching here means adding columns with aggregated data from the source that matches the target geographies.

* The *target* (the upper input connection of this component) must have a column that contains Quadbin indices, which will be used to join with the *source*.
* The *source* (the lower input connection) can be either a CARTO Data Observatory subscription or a table (or the result of another component) with a geography column.

For the enrichment operation the CARTO Analytics Toolbox is required, and one of the following procedures will be called:

* `DATAOBS_ENRICH_GRID` if the source is a Data Observatory subscription
* `ENRICH_GRID` otherwise

**Inputs**

* `Target geo column`: the column of the target that will be used to join with the source and to select the rows that will be aggregated for each target row.
* `Source geo column`: (only necessary for non-DO sources) the column of the source that will be joined with the target.
* `Variables`: this allows selecting the data from the source that will be aggregated and added to the target.

  * For Data Observatory subscriptions, the variables can be selected from the DO variables of the subscription, identified by their *variable slug*;
  * for other sources they are the columns in the source table.

  Each variable added must be assigned an aggregation method. You can add the same variable with different aggregation methods.\
  At the moment only numeric variables are supported.

For spatially smoothed enrichments that take into account the surrounding cells, use the following input parameters:

* `Kring size`: size of the k-ring where the decay function will be applied. This value can be 0, in which case no k-ring will be computed and the decay function won't be applied.
* `Decay function`: decay function to aggregate and smooth the data. Supported values are `uniform`, `inverse`, `inverse_square` and `exponential`.
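
On a square grid such as Quadbin, a k-ring can be pictured as the square block of cells around a center cell. The Python sketch below enumerates that neighborhood on plain tile coordinates and assigns each cell a ring distance. It is a simplification under stated assumptions: it works on hypothetical `(x, y)` tile coordinates rather than real Quadbin indices, it assumes the Chebyshev distance as the ring distance, and it ignores wrapping at the edges of the tile grid.

```python
def square_kring(x, y, k):
    """Return ((x, y), distance) pairs for all cells within k steps of (x, y)."""
    cells = []
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            # Chebyshev distance: number of rings between the cell and the center.
            distance = max(abs(dx), abs(dy))
            cells.append(((x + dx, y + dy), distance))
    return cells


ring_1 = square_kring(10, 20, 1)
ring_0 = square_kring(10, 20, 0)
```

A k-ring of size 1 contains the center cell plus its 8 neighbors, and a size of 0 contains only the center cell, which matches the case where no smoothing is applied.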

**Outputs**

* `Result table [Table]`


