#21




Re: Exercises and Problems
Question about the pocket algorithm described on p. 80 and illustrated on p. 83.
For data that is not linearly separable, I originally thought the pocket algorithm would run exactly like the PLA and simply take the best performing w vector over the entire iteration range and report it back. However, the description on p. 80 says that the w vector is not updated if Ein is worse. Suppose after a certain number of iterations that the set of misclassified points is such that every misclassified point used to perform an update returns a worse performing w vector. In this case, the algorithm will never be able to find a better w vector. My experience watching the PLA as it iterates is that it seems to jump about, and eventually (randomly) converges (with linearly separable data). I've seen proofs on the internet that it will converge, but do not really understand them. My point is that each iteration of the PLA does not necessarily improve g(x), but that in order to reach the best g(x), this jumping about is part of the process. Can you please comment? 
#22




Re: Exercises and Problems
Quote:
There are two vectors that the pocket algorithm keeps track of. The "pocket weight vector" is the one that is not updated if gets worse. However, the other weight vector , which is identical to its counterpart in PLA, keeps getting updated with every iteration using a misclassified point. The algorithm then reports as its final solution.
__________________
Where everyone thinks alike, no one thinks very much 
#23




Re: Exercise 1.10
On exercise 1.10 (c).. I'm getting confused on how to plot P[vu>e] as a function of e... I assumed the distribution is binomial right?

#24




Re: Exercises and Problems
I conducted experiment mentioned in 1.10(b), and i got following result.
V1 = 90, 999, 4375, 11803, 20329, 24793, 20411, 11685, 4450, 988, 77 Vrand = 103, 1022, 4389, 11691, 20444, 24489, 20653, 11669, 4502, 941, 97 Vmin = 62376, 37622, 2, 0, 0, 0, 0, 0, 0, 0, 0 i.e; 90 times out of 100,000 times, i get 0 heads in 10 tosses of first coin 999 times out of 100,000 times, i get 1 head in 10 tosses of first coin .... 77 times out of 100,000 times, i get 10 heads in 10 tosses of first coin and 103 times out of 100,000 times, i get 0 heads in 10 tosses of coin chosen at random .... 97 times out of 100,000 times, i get 10 heads in 10 tosses of coin chosen at random and 62376 times out of 100,000 times, i get 0 heads in 10 tosses of coin for which number of heads was minimum across 1000 coins So, it is as expected that distribution of V1 and Vrand are similar. Can someone please explain how we should interpret result for Vmin? Is distribution of Vmin suggesting that one should be careful about overfitting ? Because if we have many hypothesis there will almost always be some hypothesis which fits data set exactly. So what should be one's strategy for selecting hypothesis ? 
#25




Re: Exercises and Problems
Your connection to learning is correct, that when there many hypotheses, you should be more careful about overfitting. But the main point of the exercise is to realize that if you pick a coin carefully based on the data of the flips (in this case "carefully" means having minimum nu), then the distribution of the nu you see will not be what you would expect if you tossed that *same* coin again. If you tossed that c_min again, you expect to see a binomial distribution for heads. But the distribution of nu_min is clearly not binomial.
Continuing to learning, if you pick a hypothesis "carefully", say having minimum Ein, then the Ein you get will not necessarily reflect the distribution of errors you get when you "toss that hypothesis again"  i.e test it on new data. Quote:
__________________
Have faith in probability 
#26




Re: Exercises and Problems
Quote:
would you like to share your answer please? Thanks 
#27




Re: Exercises and Problems
Mr Lin, I am studying python language in order to complete Problem 1.4.
Would you please share the answer(code with python) to provide an example for me? Thanks. 
Tags 
hoeffding's inequality, hoeffdinginequality 
Thread Tools  
Display Modes  

