How Many Iterations to Pick Best KCluster?
To pick the best set of K-means cluster centroids for each training set, how many times should we generate the centroids to find the best? 10? 100? 1000? A higher number gives a higher probability of finding the centroids with the lowest Eout.
Thanks, David 
Re: How Many Iterations to Pick Best KCluster?
Different problem. If you start with different initial centroids, you get different results. If you run it enough times, you can materially change Ein and Eout because of a better centroid result.
By itself, K-means clustering doesn't guarantee an optimal result, although it reaches a stable result quickly (especially with so few data points).
Re: How Many Iterations to Pick Best KCluster?
It was mentioned in the lectures that, in general, you should try different initializations.
However, I didn't do this... I just started with K random real data points. This strategy favours the SVM model when we compare classification performance, but since we execute the experiment many times, this type of bias should diminish.
Re: How Many Iterations to Pick Best KCluster?
I'm getting a difference of two letters in the answer between using the first K-means clustering returned and testing 50 clusterings per training data set.
If I can't get an official answer... for those who submitted the correct answer, was it one iteration or a high number of iterations to choose the centroids for RBF?
Re: How Many Iterations to Pick Best KCluster?
Each experiment should start with a random selection of points without regard to the data set. Lloyd's algorithm is then applied to these points to discover a set of centroids. This is unsupervised learning; i.e., the training data labels are not considered. It is not intended that one should find the optimal centroids, only that one finds some set of K-means clusters.
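
The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the official solution: the function name lloyd, the 2-D input, the [-1, 1] range for the initial centers, and returning None when a cluster ends up empty are all choices made for this sketch.

```python
import random

def lloyd(points, k, rng, low=-1.0, high=1.0, max_iter=100):
    """One run of Lloyd's algorithm on 2-D points (labels are never used).

    Initial centers are drawn uniformly from [low, high]^2 (an assumed
    input range), without regard to the data. Returns (centers, labels),
    or None if some cluster becomes empty.
    """
    centers = [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda j: (x - centers[j][0]) ** 2 + (y - centers[j][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                return None  # empty cluster; the caller decides what to do
            centers[j] = (sum(p[0] for p in members) / len(members),
                          sum(p[1] for p in members) / len(members))
    return centers, labels

# Hypothetical usage: 100 random training inputs, one clustering attempt.
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
result = lloyd(points, k=3, rng=rng)
```

Note that only the inputs are passed in; the training labels play no role in finding the centers.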

Re: How Many Iterations to Pick Best KCluster?
The location of the centroids is fully dependent on the training data. Lloyd's algorithm doesn't find an optimal solution; K-means clustering requires multiple centroid selections to determine the best fit. This reduces the 'luck' in your initial random centroid selections.
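
A minimal sketch of the multiple-restart idea, for illustration only. The assumptions here are mine: "best fit" is taken to mean the lowest within-cluster sum of squared distances (a label-free criterion, not Eout), the initial centers are K distinct training points picked at random, and the helper names kmeans_once and best_of_restarts are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans_once(points, k, rng, max_iter=100):
    """Lloyd's algorithm started from k randomly chosen data points."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centers, labels

def within_cluster_ss(points, centers, labels):
    """K-means objective: total squared distance to the assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

def best_of_restarts(points, k, n_restarts, rng):
    """Run Lloyd's algorithm n_restarts times; keep the lowest-cost centers."""
    best = None
    for _ in range(n_restarts):
        centers, labels = kmeans_once(points, k, rng)
        cost = within_cluster_ss(points, centers, labels)
        if best is None or cost < best[0]:
            best = (cost, centers)
    return best

# Hypothetical usage: compare a single run with the best of 10 restarts.
data_rng = random.Random(0)
points = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(100)]
cost_1, _ = best_of_restarts(points, 9, 1, random.Random(5))
cost_10, _ = best_of_restarts(points, 9, 10, random.Random(5))
```

Because both calls use the same seed, the first restart is identical in each, so cost_10 can never be higher than cost_1. How much it improves depends on the data.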

Re: How Many Iterations to Pick Best KCluster?
Quote:
There are different approaches to choosing the centers that may lead to different performance. The above approach is the one used in this problem. 
Re: How Many Iterations to Pick Best KCluster?
Oops...less than an hour and I have to rerun all 5...

Re: How Many Iterations to Pick Best KCluster?
How close is the criterion for no change? .1? .5?
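
One way to look at this question, offered as a sketch and not as the official criterion: on a finite data set, Lloyd's algorithm eventually reaches an iteration where no point changes cluster, so "no change" can be tested exactly on the assignments. A tolerance on how far the centers moved is an alternative. Both function names and the default tol value below are assumptions for illustration.

```python
import math

def assignments_unchanged(old_labels, new_labels):
    """Exact test: every point stayed in the same cluster."""
    return old_labels == new_labels

def centers_moved_less_than(old_centers, new_centers, tol=1e-9):
    """Tolerance test: every center moved by less than tol (assumed value)."""
    return all(math.dist(a, b) < tol
               for a, b in zip(old_centers, new_centers))

# Hypothetical usage inside a Lloyd's loop:
stable = assignments_unchanged([0, 1, 1], [0, 1, 1])
tiny_move = centers_moved_less_than([(0.0, 0.0)], [(0.0, 1e-12)])
```

If the assignments are unchanged, the recomputed centers are identical too, so the exact test is usually enough.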

Re: How Many Iterations to Pick Best KCluster?
You don't need a lot of runs or a lot of test points to get good results :)
My tests took ~3 minutes per test (you only need 3) and gave me the good answers for that part of the final. 
Re: How Many Iterations to Pick Best KCluster?
Each data set has 100 entries. I run the test 500 times, each time creating a new training and test data set. I compute a new set of centroids for each new training set, then evaluate the result.
What tolerance should be used to decide whether two Ein values or two Eout values are the same?
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.