Analytics Toolbox for Redshift

Analytics Toolbox for Redshift

clustering

This module contains functions that perform clustering on geographies.

CREATE_CLUSTERKMEANS

Description

Takes a set of points as input and partitions them into clusters using the k-means algorithm. Creates a new table with the same columns as input plus a cluster_id column with the cluster index for each of the input features.

  • input: VARCHAR name of the table or literal SQL query to be clustered.
  • output_table: VARCHAR name of the output table.
  • geom_column: VARCHAR name of the column to be clusterd.
  • number_of_clusters: INT number of clusters that will be generated.

Examples

1
2
3
CALL carto.CREATE_CLUSTERKMEANS('my-schema.my-input-table', 'my-schema.my-output-table', 'geom', 5);
-- The table `my-schema.my-output-table` will be created
-- adding the column cluster_id to those in `my-schema.my-input-table`.
1
2
3
CALL carto.CREATE_CLUSTERKMEANS('select * my-schema.my-input-table', 'my-schema.my-output-table', 'geom', 5);
-- The table `my-schema.my-output-table` will be created
-- adding the column cluster_id to those returned in the input query.

ST_CLUSTERKMEANS

Description

Takes a set of points as input and partitions them into clusters using the k-means algorithm. Returns an array of tuples with the cluster index for each of the input features and the input geometry.

  • geog: GEOMETRY points to be clustered.
  • numberOfClusters (optional): INT number of clusters that will be generated. It defaults to the square root of half the number of points (sqrt(<NUMBER OF POINTS>/2)).

Return type

SUPER: containing objects with cluster as the cluster id and geom as the geometry in GeoJSON format.

Examples

1
2
3
4
5
SELECT carto.ST_CLUSTERKMEANS(ST_GEOMFROMTEXT('MULTIPOINT ((0 0), (0 1), (5 0), (1 0))'));
-- {"cluster":0,"geom":{"type":"Point","coordinates":[0.0,0.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[0.0,1.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[5.0,0.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[1.0,0.0]}}
1
2
3
4
5
SELECT carto.ST_CLUSTERKMEANS(ST_GEOMFROMTEXT('MULTIPOINT ((0 0), (0 1), (5 0), (1 0))'), 2);
-- {"cluster":0,"geom":{"type":"Point","coordinates":[0.0,0.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[0.0,1.0]}}
-- {"cluster":1,"geom":{"type":"Point","coordinates":[5.0,0.0]}}
-- {"cluster":0,"geom":{"type":"Point","coordinates":[1.0,0.0]}}