LFD Book Forum  

Go Back   LFD Book Forum > Course Discussions > Online LFD course > The Final

Reply
 
Thread Tools Display Modes
  #1  
Old 06-08-2012, 10:10 AM
markweitzman markweitzman is offline
Invited Guest
 
Join Date: Apr 2012
Location: Las Vegas
Posts: 69
Default Question 14 LLoyd's algorithm - empty clusters

While checking my implementation of LLoyd's algorithm, I received an error of float division by zero on one of my runs (it seems to happen rarely). I realized that if on the initial or subsequent iteration one of the clusters has zero elements then the update for mu for that clustered needs to be altered (see slide 11/20 Draft Lecture 16 for update rule). There seem to be two possibilities - either use the old value of mu for the cluster (i.e. not update it) or seed a new random value. If you use the old value then it is possible that the cluster will never get any members (for example if your points have a relatively empty region in the square and the mu for the cluster is in that region. Effectively you are using one less cluster if that happens. The alternative is to use a new random seed for that cluster and hopefully get out of the empty region but I am worried about convergence issues.

I am curious if others have encountered this issue and how they think it should be dealt with.
Reply With Quote
  #2  
Old 06-08-2012, 10:57 AM
kkkkk kkkkk is offline
Invited Guest
 
Join Date: Mar 2012
Posts: 71
Default Re: Question 14 LLoyd's algorithm - empty clusters

I generated a new random center when a center has no nearby points. The algo could still converge.
Reply With Quote
  #3  
Old 06-08-2012, 11:45 AM
yaser's Avatar
yaser yaser is offline
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Default Re: Question 14 LLoyd's algorithm - empty clusters

Quote:
Originally Posted by kkkkk View Post
I generated a new random center when a center has no nearby points.
This is a valid approach. You can also restart from new initial centers.
__________________
Where everyone thinks alike, no one thinks very much
Reply With Quote
  #4  
Old 06-09-2012, 05:09 PM
mathprof mathprof is offline
Invited Guest
 
Join Date: Apr 2012
Location: Bakersfield, California
Posts: 36
Default Re: Question 14 LLoyd's algorithm - empty clusters

I'm starting Lloyd's algorithm over anytime it comes up with fewer clusters than desired, so that each iteration I'm using the full number. Thanks to this original post I realized that this is an issue. Usually Lloyd's algorithm comes up with the correct number of clusters, but occasionally something weird happens. The larger the value of K (# desired clusters), the more frequently one ends up with less than the requisite number. I was very surprised to see rare cases where only one cluster remained (Perhaps I SHOULD be surprised, as all I've noticed is a bug in my code!). Anyone else see Lloyd's algorithm occasionally (rarely) coming up with only one region?

Thanks Mark, for asking the question. Also thanks to our illustrious professor!
:-)
Reply With Quote
  #5  
Old 06-09-2012, 08:01 PM
markweitzman markweitzman is offline
Invited Guest
 
Join Date: Apr 2012
Location: Las Vegas
Posts: 69
Default Re: Question 14 LLoyd's algorithm - empty clusters

I haven't had any problem since I used the approach of updating the mean of a cluster to a random point if the cluster was empty. And to my surprise once I figured out the parameters the scilearn python package has a routine which implements lloyds algorithm and it agreed with my own coding once I set equivalent parameters and seeds.
Reply With Quote
  #6  
Old 06-11-2012, 09:45 AM
dudefromdayton dudefromdayton is offline
Invited Guest
 
Join Date: Apr 2012
Posts: 140
Default Re: Question 14 LLoyd's algorithm - empty clusters

It's fuzzy to me right now, but it looks like when I had no points in a cluster, its center snapped to the origin! Oops. I'm not sure this happened enough to harm my score, though.
Reply With Quote
  #7  
Old 06-11-2012, 01:58 PM
Yellin Yellin is offline
Member
 
Join Date: Apr 2012
Posts: 26
Default Re: Question 14 LLoyd's algorithm - empty clusters

Quote:
Originally Posted by yaser View Post
This is a valid approach. You can also restart from new initial centers.
I tried
a) Leave points with 0 nearby events alone, and use them as cluster centers.
b) When there was a point with 0 nearby events, restart from new initial centers.
c) Do 40 random initializations of Lloyd's algorithm and pick the one with the lowest minimum. That always (10000 trials) turned out to have no unoccupied cluster centers.

For K=12 method c) was best at preventing regular RBF from being worse than SVM (problem 16), and for K=9 b) gave a somewhat higher probability of E_in =0 for regular RBF (problem 19). But the three methods were so close together in results that they all gave the same best choice among the multiple choices for problems 15-19.
Reply With Quote
  #8  
Old 09-10-2012, 02:36 PM
TonySuarez TonySuarez is offline
Member
 
Join Date: Jul 2012
Location: Lisboa, Portugal
Posts: 35
Default Re: Question 14 LLoyd's algorithm - empty clusters

Quote:
Originally Posted by dudefromdayton View Post
It's fuzzy to me right now, but it looks like when I had no points in a cluster, its center snapped to the origin! Oops. I'm not sure this happened enough to harm my score, though.
I implemented an "abort" counter for these cases, and subtracted its final value from the number of runs in the batch to calculate the statistics.

I already ran 20 or more times the simulation, where each one has 1000 runs, for K=9, K=12, various gamma, etc. I only saw 1 abort due to an "isolated" center. So, I think that this is not an issue in statistical terms. Hope I don't have bugs in the implementation...

Regarding your observation dudefromdayton, perhaps you reset your cluster centers to zero before Lloyd's algorithm, and somehow it (or they) stays zero if no X points are near it.
Reply With Quote
  #9  
Old 09-10-2012, 05:05 PM
yaser's Avatar
yaser yaser is offline
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Default Re: Question 14 LLoyd's algorithm - empty clusters

Quote:
Originally Posted by TonySuarez View Post
I implemented an "abort" counter for these cases, and subtracted its final value from the number of runs in the batch to calculate the statistics.

I already ran 20 or more times the simulation, where each one has 1000 runs, for K=9, K=12, various gamma, etc. I only saw 1 abort due to an "isolated" center. So, I think that this is not an issue in statistical terms. Hope I don't have bugs in the implementation...

Regarding your observation dudefromdayton, perhaps you reset your cluster centers to zero before Lloyd's algorithm, and somehow it (or they) stays zero if no X points are near it.
This time, I fixed the problem statement to discard the runs that have empty clusters in order to standardize the answer.
__________________
Where everyone thinks alike, no one thinks very much
Reply With Quote
Reply

Tags
lloyd's algorithm

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT -7. The time now is 04:55 PM.


Powered by vBulletin® Version 3.8.3
Copyright ©2000 - 2019, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.