Question 14 LLoyd's algorithm  empty clusters
While checking my implementation of LLoyd's algorithm, I received an error of float division by zero on one of my runs (it seems to happen rarely). I realized that if on the initial or subsequent iteration one of the clusters has zero elements then the update for mu for that clustered needs to be altered (see slide 11/20 Draft Lecture 16 for update rule). There seem to be two possibilities  either use the old value of mu for the cluster (i.e. not update it) or seed a new random value. If you use the old value then it is possible that the cluster will never get any members (for example if your points have a relatively empty region in the square and the mu for the cluster is in that region. Effectively you are using one less cluster if that happens. The alternative is to use a new random seed for that cluster and hopefully get out of the empty region but I am worried about convergence issues.
I am curious if others have encountered this issue and how they think it should be dealt with. 
Re: Question 14 LLoyd's algorithm  empty clusters
I generated a new random center when a center has no nearby points. The algo could still converge.

Re: Question 14 LLoyd's algorithm  empty clusters
Quote:

Re: Question 14 LLoyd's algorithm  empty clusters
I'm starting Lloyd's algorithm over anytime it comes up with fewer clusters than desired, so that each iteration I'm using the full number. Thanks to this original post I realized that this is an issue. Usually Lloyd's algorithm comes up with the correct number of clusters, but occasionally something weird happens. The larger the value of K (# desired clusters), the more frequently one ends up with less than the requisite number. I was very surprised to see rare cases where only one cluster remained (Perhaps I SHOULD be surprised, as all I've noticed is a bug in my code!). Anyone else see Lloyd's algorithm occasionally (rarely) coming up with only one region?
Thanks Mark, for asking the question. Also thanks to our illustrious professor! :) 
Re: Question 14 LLoyd's algorithm  empty clusters
I haven't had any problem since I used the approach of updating the mean of a cluster to a random point if the cluster was empty. And to my surprise once I figured out the parameters the scilearn python package has a routine which implements lloyds algorithm and it agreed with my own coding once I set equivalent parameters and seeds.

Re: Question 14 LLoyd's algorithm  empty clusters
It's fuzzy to me right now, but it looks like when I had no points in a cluster, its center snapped to the origin! Oops. I'm not sure this happened enough to harm my score, though.

Re: Question 14 LLoyd's algorithm  empty clusters
Quote:
a) Leave points with 0 nearby events alone, and use them as cluster centers. b) When there was a point with 0 nearby events, restart from new initial centers. c) Do 40 random initializations of Lloyd's algorithm and pick the one with the lowest minimum. That always (10000 trials) turned out to have no unoccupied cluster centers. For K=12 method c) was best at preventing regular RBF from being worse than SVM (problem 16), and for K=9 b) gave a somewhat higher probability of E_in =0 for regular RBF (problem 19). But the three methods were so close together in results that they all gave the same best choice among the multiple choices for problems 1519. 
Re: Question 14 LLoyd's algorithm  empty clusters
Quote:
I already ran 20 or more times the simulation, where each one has 1000 runs, for K=9, K=12, various gamma, etc. I only saw 1 abort due to an "isolated" center. So, I think that this is not an issue in statistical terms. Hope I don't have bugs in the implementation... Regarding your observation dudefromdayton, perhaps you reset your cluster centers to zero before Lloyd's algorithm, and somehow it (or they) stays zero if no X points are near it.:clueless: 
Re: Question 14 LLoyd's algorithm  empty clusters
Quote:

All times are GMT 7. The time now is 07:30 AM. 
Powered by vBulletin® Version 3.8.3
Copyright ©2000  2020, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. AbuMostafa, Malik MagdonIsmail, and HsuanTien Lin, and participants in the Learning From Data MOOC by Yaser S. AbuMostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.