K-means Clustering

K-means: widely used clustering technique! ,Initialization: blind random on input data!
Drawback: very sensitive to choice of initial clustercenters (seeds)!
Local optimal can be arbitrarily bad wrt. objective function, compared to global optimal clustering

Idea: spread the k initial cluster centers away from each other.!
O(log k)-competitive with the optimal clustering" substantial convergence time speedups (empirical)!

C - Sample a point uniformly at random from X
    While `C´ < k do
    Sample x € X with probability prop, to DSquare (x)
    c <- C U {x}
end while

c € c: Cluster Center
x € X: Data Point'D(x) distance between x and nearest Ck that has already chosen

Test dataset
200 Clustering runs, each with and without k-means initialization
Measure RSS (Intra-Class variance)

K.Means optimal clustering 115 times (57.5%)

Implementation Test Dataset: 4 Square (n=16)

Expected: 4 nice Cluster

Evaluation on Test Dataset!
• 200 clustering runs, each with and without kmeans++ initialization!
• Measure RSS (intra-class variance)!
• K-means! optimal clustering 115 times (57.5%) !
• K-means++ ! optimal clustering 182 times (91%)!

Comparison of the frequency distribution of RSS values between k-means and k-means
++ on the evaluation dataset (n=200)!

Comparison of the frequency distribution of RSS values between k-means and k-means
++ on the UCI real world dataset (n=500)!

Big Data and Cloud Computing

Search This Blog

K-means Clustering

Comments

Post a Comment

Popular posts from this blog

Python and Parquet Performance

Kubernetes Configuration Provider to load data from Secrets and Config Maps

Andriod Bug