Big Data and Cloud Computing

K-means Clustering

K-means: widely used clustering technique! ,Initialization: blind random on input data! Drawback: very sensitive to choice of initial clustercenters (seeds)! Local optimal can be arbitrarily bad wrt. objective function, compared to global optimal clustering Idea: spread the k initial cluster centers away from each other.! O(log k)-competitive with the optimal clustering" substantial convergence time speedups (empirical)! C - Sample a point uniformly at random from X While `C´ < k do Sample x € X with probability prop, to DSquare (x) c <- C U {x} end while c € c: Cluster Center x € X: Data Point'D(x) distance between x and nearest Ck that has already chosen Test dataset 200 Clustering runs, each with and without k-means initialization Measure RSS (Intra-Class variance) K.Means optimal clustering 115 times (57.5%) Implementation Test Dataset: 4 Square (n=16) Expected: 4 nice Cluster Evaluation on Test ...

Big Data and Cloud Computing

Search This Blog

Posts

K-means Clustering