LFD Book Forum Exercises and Problems

#21
01-26-2013, 11:04 PM
 srport@pacbell.net Junior Member Join Date: Jan 2013 Posts: 1
Re: Exercises and Problems

Question about the pocket algorithm described on p. 80 and illustrated on p. 83.

For data that is not linearly separable, I originally thought the pocket algorithm would run exactly like the PLA and simply take the best performing w vector over the entire iteration range and report it back. However, the description on p. 80 says that the w vector is not updated if Ein is worse.

Suppose that, after a certain number of iterations, the set of misclassified points is such that every update performed with a misclassified point returns a worse-performing w vector. In this case, the algorithm will never be able to find a better w vector.

My experience watching the PLA as it iterates is that it seems to jump about, and eventually (randomly) converges (with linearly separable data). I've seen proofs on the internet that it will converge, but do not really understand them. My point is that each iteration of the PLA does not necessarily improve g(x), but that in order to reach the best g(x), this jumping about is part of the process.

#22
01-26-2013, 11:56 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: Exercises and Problems

Quote:
 Originally Posted by srport@pacbell.net [...] My point is that each iteration of the PLA does not necessarily improve g(x), but that in order to reach the best g(x), this jumping about is part of the process. Can you please comment?
Hi,

There are two weight vectors that the pocket algorithm keeps track of. The "pocket weight vector" ŵ is the one that is not updated if E_in gets worse. However, the other weight vector w, which is identical to its counterpart in PLA, keeps getting updated at every iteration using a misclassified point. The algorithm then reports ŵ as its final solution.
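For concreteness, here is a minimal sketch of that bookkeeping in Python (my own illustration, not code from the book; the function name and data layout are assumptions):

```python
import numpy as np

def pocket(X, y, max_iters=1000, rng=None):
    """Pocket algorithm: run PLA updates, but keep the best weights seen.

    X: (N, d+1) array with a leading column of 1s; y: (N,) array of +/-1.
    """
    rng = np.random.default_rng(rng)
    w = np.zeros(X.shape[1])           # PLA weights, updated on every iteration
    w_hat = w.copy()                   # pocket weights, replaced only on improvement
    best_err = np.mean(np.sign(X @ w) != y)

    for _ in range(max_iters):
        misclassified = np.flatnonzero(np.sign(X @ w) != y)
        if misclassified.size == 0:
            return w, 0.0              # data separated: w is already perfect
        i = rng.choice(misclassified)  # standard PLA update on one bad point
        w = w + y[i] * X[i]
        err = np.mean(np.sign(X @ w) != y)
        if err < best_err:             # pocket step: keep w only if E_in improved
            best_err = err
            w_hat = w.copy()
    return w_hat, best_err             # report the pocket vector, not w
```

Note that w keeps "jumping about" exactly as in PLA; only ŵ (here `w_hat`) is protected from getting worse.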
__________________
Where everyone thinks alike, no one thinks very much
#23
04-15-2013, 09:32 AM
 mariovela Junior Member Join Date: Apr 2012 Posts: 5
Re: Exercise 1.10

On Exercise 1.10(c), I'm getting confused about how to plot P[|ν − μ| > ε] as a function of ε. I assumed the distribution is binomial, right?
#24
09-09-2015, 11:17 AM
 giridhar1202 Junior Member Join Date: Sep 2015 Posts: 2
Re: Exercises and Problems

I conducted the experiment mentioned in 1.10(b), and I got the following result.

V1 = 90, 999, 4375, 11803, 20329, 24793, 20411, 11685, 4450, 988, 77
Vrand = 103, 1022, 4389, 11691, 20444, 24489, 20653, 11669, 4502, 941, 97
Vmin = 62376, 37622, 2, 0, 0, 0, 0, 0, 0, 0, 0

i.e., 90 times out of 100,000 trials I got 0 heads in 10 tosses of the first coin,
999 times out of 100,000 trials I got 1 head in 10 tosses of the first coin,
....
77 times out of 100,000 trials I got 10 heads in 10 tosses of the first coin,

and

103 times out of 100,000 trials I got 0 heads in 10 tosses of the coin chosen at random,
....
97 times out of 100,000 trials I got 10 heads in 10 tosses of the coin chosen at random,

and

62376 times out of 100,000 trials I got 0 heads in 10 tosses of the coin with the minimum number of heads across the 1,000 coins.

So, as expected, the distributions of V1 and Vrand are similar.
Can someone please explain how we should interpret the result for Vmin?

Is the distribution of Vmin suggesting that one should be careful about overfitting? Because if we have many hypotheses, there will almost always be some hypothesis that fits the data set exactly. So what should one's strategy be for selecting a hypothesis?
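For anyone who wants to reproduce these numbers, here is a sketch of the 1.10(b) experiment in Python (my own code, not an official solution; I use 1,000 runs instead of 100,000 to keep it fast):

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_coins, n_flips = 1000, 1000, 10  # fewer runs than above, for speed

counts = {'nu_1': np.zeros(11, dtype=int),
          'nu_rand': np.zeros(11, dtype=int),
          'nu_min': np.zeros(11, dtype=int)}

for _ in range(n_runs):
    # heads[c] = number of heads in 10 flips of fair coin c
    heads = rng.binomial(n_flips, 0.5, size=n_coins)
    counts['nu_1'][heads[0]] += 1                         # the first coin
    counts['nu_rand'][heads[rng.integers(n_coins)]] += 1  # a coin chosen at random
    counts['nu_min'][heads.min()] += 1                    # the minimum-heads coin

print(counts)  # nu_1 and nu_rand look binomial; nu_min piles up at 0 heads
```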
#25
09-21-2015, 04:33 PM
 magdon RPI Join Date: Aug 2009 Location: Troy, NY, USA. Posts: 597
Re: Exercises and Problems

Your connection to learning is correct: when there are many hypotheses, you should be more careful about overfitting. But the main point of the exercise is to realize that if you pick a coin carefully based on the data of the flips (in this case "carefully" means having the minimum ν), then the distribution of the ν you see will not be what you would expect if you tossed that *same* coin again. If you tossed c_min again, you would expect to see a binomial distribution for the number of heads. But the distribution of ν_min is clearly not binomial.

Carrying this over to learning: if you pick a hypothesis "carefully", say the one with minimum E_in, then the E_in you get will not necessarily reflect the distribution of errors you get when you "toss that hypothesis again", i.e., test it on new data.
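To put a number on "clearly not binomial" (my own back-of-the-envelope calculation, not from the book): a single fair coin gives 0 heads in 10 flips with probability 2^-10 ≈ 0.001, but the minimum over 1,000 coins gives 0 heads with probability 1 − (1 − 2^-10)^1000 ≈ 0.62, which matches the 62376 out of 100,000 reported above.

```python
# P[nu_min = 0]: at least one of 1000 fair coins shows 0 heads in 10 flips.
n_coins, n_flips = 1000, 10
p_zero_one_coin = 0.5 ** n_flips                   # binomial: P[0 heads] ~ 0.000977
p_min_zero = 1 - (1 - p_zero_one_coin) ** n_coins  # ~ 0.624: far from binomial
print(p_zero_one_coin, p_min_zero)
```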

Quote:
 Originally Posted by giridhar1202 [...] Can someone please explain how we should interpret result for Vmin?
__________________
Have faith in probability
#26
05-01-2016, 08:42 AM
 williamshakespare Junior Member Join Date: Mar 2016 Posts: 6
Re: Exercises and Problems

Quote:
 Originally Posted by tadworthington
Problem 1.4

WHAT I LIKED: This is an excellent problem, and a model one in my opinion for helping a student to understand material. It starts out easy, with a small data set that, importantly, is user-generated. Having to generate a data set - even one as simple as a linearly separable data set in 2 dimensions - goes a long way toward helping understand how the Perceptron works and why it wouldn't work if the data were not linearly separable. We gradually progress to bigger data sets, all along the way plotting the data so that we can see what is happening. It is not only instructive but also quite exhilarating to see the results of the PLA (plotting both the target function and the hypothesis generated by the PLA) actually graphed out in front of you on a computer screen! I also thought that the progression to 10 dimensions only after working through the more easily visualized 2-dimensional examples is perfect. Finally, the histogram approach to understanding the computational complexity of the PLA was, I thought, genius.

TIME SPENT: Overall I spent about 90 minutes on the problem, although a lot of that was spent documenting my code for future reference and general "prettying" of my plots and results; so I would guess that this problem can be completed comfortably in an hour, assuming knowledge of programming in a language where plotting is easy (I used Python, but Matlab/Octave would also be excellent choices for quick-and-dirty graphics programming of this type). For someone with no programming experience, the problem would probably take much more time.

CLARITY: I think the problem is exceptionally clear.

DIFFICULTY: I thought the problem was relatively easy, but the Perceptron is a relatively easy concept, so I think making it harder is neither necessary nor appropriate. For metrics purposes, on a 1-10 scale of difficulty I would give this a 3.

#27
05-13-2016, 08:34 PM
 williamshakespare Junior Member Join Date: Mar 2016 Posts: 6
Re: Exercises and Problems

Quote:
 Originally Posted by htlin Thank you so much for the valuable feedback!
Mr. Lin, I am studying the Python language in order to complete Problem 1.4.
Would you please share the answer (Python code) to provide an example for me?
Thanks.
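In the meantime, here is a minimal Python sketch of the kind of experiment Problem 1.4 asks for (my own illustrative code, not the book's solution; the target and data set are generated randomly):

```python
import numpy as np

def run_pla(X, y):
    """Run PLA until convergence; returns (weights, number of updates)."""
    w = np.zeros(X.shape[1])
    updates = 0
    while True:
        mis = np.flatnonzero(np.sign(X @ w) != y)
        if mis.size == 0:
            return w, updates
        i = mis[0]           # pick the first misclassified point
        w = w + y[i] * X[i]  # the PLA update step
        updates += 1

rng = np.random.default_rng(1)
N = 20
# Random target weights and a linearly separable data set in 2D (plus bias term)
w_target = rng.uniform(-1, 1, size=3)
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, size=(N, 2))])
y = np.sign(X @ w_target)
y[y == 0] = 1                # avoid points exactly on the boundary

w, updates = run_pla(X, y)
print(updates, np.all(np.sign(X @ w) == y))
```

From here, plotting the target line and the final hypothesis (e.g., with matplotlib), and histogramming `updates` over many random data sets, gives the experiment described in the quoted post above.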
#28
03-12-2018, 02:08 PM
 MaxvonHippel Junior Member Join Date: Mar 2018 Posts: 2
Re: Exercises and Problems - 1.3

For question 1.3 we are asked to argue that the move from w(t) to w(t+1) is a move "in the right direction". I think I may be misunderstanding the question and/or Figure 1.3. My impression is that the figure shows us the case of R^2 (i.e., d = 1), but that for arbitrary R^d we are considering the case where we change w(t+1) and then argue that the resulting change in the location of the boundary plane is a move toward (x, y), and therefore more likely than not to pass by (x, y), thereby putting (x, y) on the opposite (and correct) side of the classification boundary.

This is easy enough to say in English, as I just did. But unless I'm missing something, I don't think the analytic proof is necessarily so straightforward. It seems to me that I'd have to show that the plane moves closer to the point (x, y). I think then I would have to argue that moving closer implies increasing the likelihood of passing by/through the point. (This part seems straightforward). And then I would argue that increasing the likelihood of passing by/through the point is logically equivalent to increasing the likelihood of correctly classifying the point.
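One self-contained way to check the algebra behind this, without the geometric picture (my own illustration; the point and weights below are hypothetical): assuming the PLA update w(t+1) = w(t) + y(t)x(t), part (b) of the exercise amounts to the identity y(t) wᵀ(t+1) x(t) = y(t) wᵀ(t) x(t) + ||x(t)||², so the signed agreement y wᵀx, which is negative exactly when the point is misclassified, strictly increases with every update:

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical misclassified point: y * (w . x) < 0
w = rng.normal(size=4)
x = rng.normal(size=4)
y = -np.sign(w @ x)      # choose the label so the point is misclassified
assert y * (w @ x) < 0

w_next = w + y * x       # the PLA update
before = y * (w @ x)
after = y * (w_next @ x)

# The signed quantity increases by exactly ||x||^2: a move in the right direction
assert np.isclose(after - before, x @ x)
assert after > before
print(before, after)
```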

Is my overarching logic here sound? And is a mathematical (specifically, analytic) proof of this argument what you are intending for an answer? Or is the intent just for the reader to formulate an English explanation such as that which I attempt to give above?

Thank you very much for the help! I am really enjoying your book.
#29
03-17-2018, 04:33 PM
 htlin NTU Join Date: Aug 2009 Location: Taipei, Taiwan Posts: 610
Re: Exercises and Problems

Hint: the answer to (c) is meant to be formed from the answer to (b). Hope this helps.
__________________
When one teaches, two learn.
#30
03-18-2018, 05:20 PM
 MaxvonHippel Junior Member Join Date: Mar 2018 Posts: 2
Re: Exercises and Problems

Ah, that makes sense. (I was vastly overthinking this.)
Thank you!
