K-Means Clustering

An unsupervised clustering algorithm partitioning data points into K clusters.

What is K-Means Clustering?

K-Means Clustering is an unsupervised machine learning algorithm used to partition a dataset into K distinct, non-overlapping clusters. It works iteratively by assigning data points to their nearest centroid, and then recalculating the centroid coordinates as the mean of all points assigned to that cluster, repeating until convergence.

Key Characteristics:

  • Unsupervised: works on unlabeled spatial data
  • K parameter specifies total centroids
  • Iteratively minimizes Within-Cluster Sum of Squares (WCSS)
  • Guarantees convergence to local minimum

Complexity Analysis

Avg Time Complexity: O(T * K * N * D) where T = iterations, N = points, D = dimensions
Space Complexity: O(N * D + K * D) storage matrix
Best Case: O(K * N) (converges in single step)
Worst Case: O(T * K * N) (runs full epoch count)

How it Works Step-by-Step

  1. Initialize Centroids: Place K centroids randomly on the coordinate space.
  2. Assign Points: Associate each data point with the closest centroid.
  3. Update Centroids: Reposition centroids to the mean coordinates of all assigned points.
  4. Repeat: Iterate steps 2 and 3 until centroids stabilize.

Code Implementation

Worked Trace Example

Grouping points (2,2), (8,8) with K=2:
1. Centroids randomly placed at (1,1) [C1] and (9,9) [C2]
2. (2,2) assigned to C1; (8,8) assigned to C2
3. Recalculate means: C1 moves to (2,2), C2 moves to (8,8)
Result: Optimal convergence.