
Revision as of 11:57, 12 October 2014

Point Clustering: Various Approaches

Please add any approaches you have tried for point clustering, along with code snippets. Include a discussion of why a particular method worked well or poorly, and the circumstances it is suited to.

Possible Approaches

Coordinate interleaving (i.e. 1. rounding the input coordinates to a grid, 2. grouping/aggregating the points that share a rounded coordinate, and 3. averaging their original coordinates, so that each cluster is positioned at the mean coordinate of all its input geometries).
K-means Clustering
Hierarchical Clustering
Distance calculation for each coordinate pair
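The three coordinate-interleaving steps can be sketched in Python as follows. This is a minimal illustration, not code contributed to this page. It assumes planar (projected) x/y coordinates held in memory, and it snaps each point to a square cell using floor division, which is only one possible rounding rule. The function name grid_cluster and its cell_size parameter (the map grid width) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def grid_cluster(points: List[Point], cell_size: float) -> List[Tuple[Point, int]]:
    """Hypothetical helper: cluster points on a square grid of width cell_size.

    Returns one (centroid, count) pair per occupied grid cell.
    """
    # Steps 1 and 2: round each point down to its grid cell and group by cell key.
    cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))

    # Step 3: place each cluster at the mean of its members' ORIGINAL coordinates,
    # not at the cell corner or centre.
    clusters: List[Tuple[Point, int]] = []
    for members in cells.values():
        n = len(members)
        cx = sum(px for px, _ in members) / n
        cy = sum(py for _, py in members) / n
        clusters.append(((cx, cy), n))
    return clusters


if __name__ == "__main__":
    pts = [(0.1, 0.2), (0.3, 0.4), (5.2, 5.1), (5.4, 5.3), (9.9, 0.1)]
    for centroid, count in grid_cluster(pts, cell_size=1.0):
        print(centroid, count)
```

With a cell size of 1.0, the five sample points fall into three cells, giving two clusters of two points and one singleton. The returned count can be used to size or label the cluster symbol. Within PostGIS, a comparable query would group on ST_SnapToGrid(geom, cell_size) and take ST_Centroid(ST_Collect(geom)) per group, although ST_SnapToGrid's rounding rule differs slightly from the floor-based one used here.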

Input Parameters

Depending on the algorithm:

Partitioning methods

Map grid width ("square / Manhattan world", see coordinate interleaving/rounding)
A self-correlation threshold (see, e.g., k-means)
Predefined irregular polygons (e.g. zip code boundaries)
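In its standard (Lloyd's) formulation, k-means is driven mainly by the number of clusters k rather than by a threshold. The following is a minimal sketch of that formulation, assuming planar coordinates and Euclidean distance (geographic lon/lat should be projected first). The function name kmeans and its max_iter and seed parameters are illustrative assumptions, and the self-correlation threshold mentioned above is not modelled. For real workloads, a library such as PyCluster (listed in the references) would normally be preferred.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Illustrative Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise with k input points sampled without replacement.
    centroids: List[Point] = list(rng.sample(list(points), k))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, k=2))
```

Unlike the grid approach, the result depends on the initial centroids. Running with several seeds and keeping the lowest total within-cluster distance is a common way to reduce that sensitivity.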

Implementations

MapServer CLUSTER

References

Wikipedia article on data clustering
PostGIS Mailing List thread on clustering points
Point Clustering Utility Trigger: enhancement idea reported as a ticket on the PostGIS Trac
MapServer Mailing List threads on clustering points (two threads)
PyCluster: Python Cluster Functions
Using Genetic Algorithms in Clustering Problems: paper from GeoComputation 2000 conference
Automatic clustering via boundary extraction for mining massive point-data sets: paper from GeoComputation 2000 conference
