# clustering

This module contains functions that perform clustering on geographies.

## CREATE\_CLUSTERKMEANS <a href="#create_clusterkmeans" id="create_clusterkmeans"></a>

```sql
CREATE_CLUSTERKMEANS(input, output_table, geom_column, number_of_clusters)
```

**Description**

Takes a set of points as input and partitions them into clusters using the k-means algorithm. Creates a new table with the same columns as `input` plus a `cluster_id` column with the cluster index for each of the input features.

**Input parameters**

* `input`: `VARCHAR` name of the table or literal SQL query to be clustered.
* `output_table`: `VARCHAR(MAX)` qualified name of the output table, e.g. `<my-schema>.<my-output-table>`. The process will fail if the table already exists.
* `geom_column`: `VARCHAR` name of the geometry column containing the points to be clustered.
* `number_of_clusters`: `INT` number of clusters that will be generated.

{% hint style="warning" %}
**warning**

Keep in mind that, due to restrictions on the Redshift `VARCHAR` size, the maximum number of features (points) that can be clustered is around 2500.
{% endhint %}

**Examples**

{% code overflow="wrap" lineNumbers="true" %}

```sql
CALL carto.CREATE_CLUSTERKMEANS('<my-schema>.<my-table>', '<my-schema>.<my-output-table>', 'geom', 5);
-- The table `<my-schema>.<my-output-table>` will be created
-- adding the column cluster_id to those in `<my-schema>.<my-table>`.
```

{% endcode %}
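
To check the result, a query like the following sketch counts how many features were assigned to each cluster. It relies only on the `cluster_id` column described above; the table name is the same placeholder used in the example.

{% code overflow="wrap" lineNumbers="true" %}

```sql
-- Count the input features assigned to each cluster in the output table.
SELECT cluster_id, COUNT(*) AS n_features
FROM <my-schema>.<my-output-table>
GROUP BY cluster_id
ORDER BY cluster_id;
```

{% endcode %}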

{% code overflow="wrap" lineNumbers="true" %}

```sql
CALL carto.CREATE_CLUSTERKMEANS('SELECT * FROM <my-schema>.<my-table>', '<my-schema>.<my-output-table>', 'geom', 5);
-- The table `<my-schema>.<my-output-table>` will be created
-- adding the column cluster_id to those returned in the input query.
```

{% endcode %}
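
The procedure fails if `output_table` already exists and handles only around 2500 points, so a re-run might look like the sketch below. `DROP TABLE IF EXISTS` and `COUNT(*)` are standard Redshift SQL. `<my-condition>` is a hypothetical placeholder for whatever filter reduces your input to a manageable subset.

{% code overflow="wrap" lineNumbers="true" %}

```sql
-- The procedure fails if the output table exists, so drop any previous result first.
DROP TABLE IF EXISTS <my-schema>.<my-output-table>;

-- Check the input size: around 2500 points is the practical maximum.
SELECT COUNT(*) AS n_points FROM <my-schema>.<my-table>;

-- If the input is too large, pass a query that keeps only a subset.
-- <my-condition> is a hypothetical placeholder; single quotes inside it must be doubled ('').
CALL carto.CREATE_CLUSTERKMEANS(
  'SELECT * FROM <my-schema>.<my-table> WHERE <my-condition>',
  '<my-schema>.<my-output-table>',
  'geom',
  5
);
```

{% endcode %}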

## ST\_CLUSTERKMEANS <a href="#st_clusterkmeans" id="st_clusterkmeans"></a>

```sql
ST_CLUSTERKMEANS(geog [, numberOfClusters])
```

**Description**

Takes a set of points as input and partitions them into clusters using the k-means algorithm. Returns an array with one entry per input feature, containing its cluster index and its geometry.

**Input parameters**

* `geog`: `GEOMETRY` points to be clustered.
* `numberOfClusters` (optional): `INT` number of clusters that will be generated. It defaults to the square root of half the number of points (`sqrt(<NUMBER OF POINTS>/2)`). The number of output clusters cannot be greater than the number of distinct points in `geog`.
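
To preview the default value of `numberOfClusters` for a given input, you can evaluate the formula with Redshift's `ST_NPOINTS` and `SQRT`. This is only a sketch: how the value is rounded to an integer is not stated on this page, and `ST_NPOINTS` counts every point, including duplicates.

{% code overflow="wrap" lineNumbers="true" %}

```sql
-- Evaluate sqrt(<NUMBER OF POINTS>/2) for the four-point example input.
SELECT SQRT(ST_NPOINTS(ST_GEOMFROMTEXT('MULTIPOINT ((0 0), (0 1), (5 0), (1 0))')) / 2.0) AS default_k;
-- about 1.41 for these 4 points
```

{% endcode %}

For the four points used in the examples, this gives about 1.41, which is consistent with the first example below placing every point in cluster 0.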

**Return type**

`SUPER`: containing objects with `cluster` as the cluster id and `geom` as the geometry in GeoJSON format.

**Examples**

{% code overflow="wrap" lineNumbers="true" %}

```sql
SELECT carto.ST_CLUSTERKMEANS(ST_GEOMFROMTEXT('MULTIPOINT ((0 0), (0 1), (5 0), (1 0))'));
-- {"cluster":0,"geom":{"type":"Point","coordinates":[0.0,0.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[0.0,1.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[5.0,0.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[1.0,0.0]}}
```

{% endcode %}

{% code overflow="wrap" lineNumbers="true" %}

```sql
SELECT carto.ST_CLUSTERKMEANS(ST_GEOMFROMTEXT('MULTIPOINT ((0 0), (0 1), (5 0), (1 0))'), 2);
-- {"cluster":0,"geom":{"type":"Point","coordinates":[0.0,0.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[0.0,1.0]}}
-- {"cluster":1,"geom":{"type":"Point","coordinates":[5.0,0.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[1.0,0.0]}}
```

{% endcode %}
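
Because the result is a single `SUPER` value, you may want one row per point instead. The sketch below uses Redshift's PartiQL unnesting syntax. It assumes, as the description states, that the value is an array of objects with `cluster` and `geom` attributes, and that `geom` is a GeoJSON object as shown in the example output.

{% code overflow="wrap" lineNumbers="true" %}

```sql
-- Run ST_CLUSTERKMEANS once and keep its SUPER result.
WITH clustered AS (
  SELECT carto.ST_CLUSTERKMEANS(
           ST_GEOMFROMTEXT('MULTIPOINT ((0 0), (0 1), (5 0), (1 0))'), 2) AS result
)
-- Unnest the array: one row per input point.
SELECT item.cluster AS cluster_id,
       -- Assumption: geom is a GeoJSON object, so serialize it and parse it back into a GEOMETRY.
       ST_GEOMFROMGEOJSON(JSON_SERIALIZE(item.geom)) AS geom
FROM clustered AS c, c.result AS item;
```

{% endcode %}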


