LFD Book Forum How Many Iterations to Pick Best K-Cluster?
#1
09-17-2012, 06:34 PM
 DavidNJ Member Join Date: Jul 2012 Posts: 28
How Many Iterations to Pick Best K-Cluster?

To pick the best set of K-means cluster centroids for each training set, how many times should we generate the centroids to find the best? 10? 100? 1000? A higher number gives a higher probability of finding the centroids with the lowest Eout.

Thanks,

David
#2
09-17-2012, 07:00 PM
 MLearning Senior Member Join Date: Jul 2012 Posts: 56

Quote:
 Originally Posted by DavidNJ: To pick the best set of K-means cluster centroids for each training set, how many times should we generate the centroids to find the best? 10? 100? 1000? A higher number gives a higher probability of finding the centroids with the lowest Eout. Thanks, David
You need a condition that stops iteration when the centroids do not change.
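
As a rough illustration of that stopping rule, here is a minimal pure-Python sketch of Lloyd's algorithm that halts once an update leaves every center unchanged. The function name `lloyd`, the `max_iter` safety cap, and the choice to leave an empty cluster's center where it is are assumptions made for this sketch, not something specified in the thread.

```python
import random


def lloyd(points, centers, max_iter=100):
    """Run Lloyd's algorithm until the centers stop changing.

    points and centers are lists of equal-length tuples of floats.
    max_iter is a safety cap (an assumption for this sketch).
    Returns the final centers and the number of iterations used.
    """
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)

        # Update step: move each center to the mean of its cluster.
        # An empty cluster simply keeps its old center in this sketch.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]

        # Stopping condition: no center moved, so the assignments are stable.
        if new_centers == centers:
            return centers, iteration
        centers = new_centers
    return centers, max_iter


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
start = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(3)]
final_centers, n_iter = lloyd(points, start)
print(n_iter, final_centers)
```

The exact equality test works because identical assignments produce identical means; the cap only guards against pathological cases.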
#3
09-17-2012, 07:31 PM
 DavidNJ Member Join Date: Jul 2012 Posts: 28

Different problem. If you start with different initial centroids, you get different results. If you run it enough times, you can materially change Ein and Eout because of a better centroid result.

By itself K-means clustering doesn't guarantee an optimal result although it reaches a stable result quickly (especially with so few data points).
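
For reference, the quantity that Lloyd's algorithm drives down is the within-cluster sum of squared distances. The notation here (centers $\mu_k$, clusters $S_k$, data points $x_n$) is assumed for illustration:

```latex
J(\mu_1,\dots,\mu_K;\, S_1,\dots,S_K)
  = \sum_{k=1}^{K} \sum_{x_n \in S_k} \lVert x_n - \mu_k \rVert^2
```

Neither the assignment step nor the update step can increase $J$, so the iterations settle quickly, but only at a local minimum that depends on the starting centroids.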
#4
09-17-2012, 09:05 PM
 MLearning Senior Member Join Date: Jul 2012 Posts: 56

Quote:
 Originally Posted by DavidNJ: Different problem. If you start with different initial centroids, you get different results. If you run it enough times, you can materially change Ein and Eout because of a better centroid result. By itself K-means clustering doesn't guarantee an optimal result although it reaches a stable result quickly (especially with so few data points).
I agree that K-means clustering only reaches a local optimum.
#5
09-18-2012, 01:06 AM
 rainbow Member Join Date: Jul 2012 Posts: 41

It was mentioned in the lectures that, in general, you should try different initializations.

However, I didn't do this. I just started with K random real data points. This strategy favours the SVM model when we compare classification performance, but since we run the experiment many times, this type of bias should diminish.
#6
09-18-2012, 05:37 AM
 DavidNJ Member Join Date: Jul 2012 Posts: 28

My answer shifts by two letter choices depending on whether I use the first K-means clustering returned or test 50 clusterings per training data set.

If I can't get an official answer: for those who submitted the correct answer, was it one iteration or a high number of iterations to choose the centroids for RBF?
#7
09-18-2012, 07:46 AM
 JohnH Member Join Date: Jul 2012 Posts: 43

Each experiment should start with a random selection of points without regard to the data set. Lloyd's algorithm is then applied to these points to discover a set of centroids. This is unsupervised learning; i.e., the training data labels are not considered. It is not intended that one should find the optimal centroids, only that one finds some set of k-means clusters.
#8
09-18-2012, 07:53 AM
 DavidNJ Member Join Date: Jul 2012 Posts: 28

The location of the centroids is fully dependent on the training data. Lloyd's algorithm doesn't find an optimal solution; K-means clustering requires multiple centroid selections to determine the best fit. This reduces the 'luck' in your initial random centroid selections.
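
One way to read "multiple centroid selections" is to rerun Lloyd's algorithm from several random starts and keep the run with the smallest within-cluster sum of squared distances. The sketch below does that in plain Python. The names `kmeans_once` and `best_of`, the seeding, starting from K data points, and using the clustering cost (rather than Ein or Eout) as the selection criterion are all assumptions for illustration.

```python
import random


def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans_once(points, k, rng, max_iter=100):
    """One Lloyd run started from k randomly chosen data points.

    Returns (centers, cost), where cost is the within-cluster sum of squares.
    """
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old center in this sketch.
        new_centers = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


def best_of(points, k, restarts, seed=0):
    """Run kmeans_once `restarts` times and keep the lowest-cost result."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])


rng = random.Random(42)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
_, cost_1 = best_of(points, k=5, restarts=1)
_, cost_50 = best_of(points, k=5, restarts=50)
# With the same seed, the single run is the first of the 50,
# so cost_50 can never be larger than cost_1.
print(cost_1, cost_50)
```

Selecting by clustering cost keeps the choice unsupervised; selecting by Eout would instead use test information to pick the centroids.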
#9
09-18-2012, 09:36 AM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478

Quote:
 Originally Posted by DavidNJ: If I can't get an official answer: for those who submitted the correct answer, was it one iteration or a high number of iterations to choose the centroids for RBF?
The problem specifies that each run starts with random centers, and the only time they are revised (starting with another set of random centers) is when one of the clusters becomes empty during the iterations of Lloyd's algorithm.

There are different approaches to choosing the centers that may lead to different performance. The above approach is the one used in this problem.
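
A minimal sketch of that procedure, in plain Python: draw random centers, iterate Lloyd's algorithm, and discard the run and draw fresh centers if any cluster becomes empty. The function name `run_lloyd`, the box `low`/`high` that the centers are drawn from, and the `max_iter` cap are assumptions for illustration, not details taken from the problem statement.

```python
import random


def run_lloyd(points, k, rng, low=-1.0, high=1.0, max_iter=1000):
    """One run: random starting centers, restarted if a cluster goes empty.

    low/high bound the box the centers are drawn from (assumed values).
    Returns the list of k converged centers.
    """
    dim = len(points[0])
    while True:
        # Fresh random centers, chosen without looking at the data.
        centers = [
            tuple(rng.uniform(low, high) for _ in range(dim)) for _ in range(k)
        ]
        for _ in range(max_iter):
            # Assignment step: nearest center for every point.
            clusters = [[] for _ in range(k)]
            for p in points:
                d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
                clusters[d.index(min(d))].append(p)

            # Empty cluster: throw this run away and start over.
            if any(not cl for cl in clusters):
                break

            # Update step: each center moves to its cluster mean.
            new_centers = [
                tuple(sum(x) / len(cl) for x in zip(*cl)) for cl in clusters
            ]
            if new_centers == centers:
                return centers
            centers = new_centers
        else:
            # Iteration cap reached without an empty cluster.
            return centers


rng = random.Random(1)
data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
centers = run_lloyd(data, k=9, rng=rng)
print(len(centers))
```

Note that the restart here is triggered only by an empty cluster; no run is kept or rejected based on how well it performs.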
#10
09-18-2012, 11:54 AM
 DavidNJ Member Join Date: Jul 2012 Posts: 28

Oops...less than an hour and I have to rerun all 5...


All times are GMT -7.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.