LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 1 - The Learning Problem (http://book.caltech.edu/bookforum/forumdisplay.php?f=108)
-   -   Exercises and Problems (http://book.caltech.edu/bookforum/showthread.php?t=257)

 srport@pacbell.net 01-26-2013 11:04 PM

Re: Exercises and Problems

Question about the pocket algorithm described on p. 80 and illustrated on p. 83.

For data that is not linearly separable, I originally thought the pocket algorithm would run exactly like the PLA and simply report back the best-performing w vector over the entire iteration range. However, the description on p. 80 says that the w vector is not updated if Ein gets worse.

Suppose that after a certain number of iterations, the set of misclassified points is such that every update performed using a misclassified point returns a worse-performing w vector. In that case, the algorithm will never be able to find a better w vector.

My experience watching the PLA as it iterates is that it seems to jump about and eventually (randomly) converges (with linearly separable data). I've seen proofs on the internet that it will converge, but I don't really understand them. My point is that each iteration of the PLA does not necessarily improve g(x), but that this jumping about is part of the process of reaching the best g(x).

 yaser 01-26-2013 11:56 PM

Re: Exercises and Problems

Quote:
 Originally Posted by srport@pacbell.net (Post 9019) Question about the pocket algorithm described on p. 80 and illustrated on p. 83. For data that is not linearly separable, I originally thought the pocket algorithm would run exactly like the PLA and simply take the best performing w vector over the entire iteration range and report it back. However, the description on p. 80 says that the w vector is not updated if Ein is worse. Suppose after a certain number of iterations that the set of misclassified points is such that every misclassified point used to perform an update returns a worse performing w vector. In this case, the algorithm will never be able to find a better w vector. My experience watching the PLA as it iterates is that it seems to jump about, and eventually (randomly) converges (with linearly separable data). I've seen proofs on the internet that it will converge, but do not really understand them. My point is that each iteration of the PLA does not necessarily improve g(x), but that in order to reach the best g(x), this jumping about is part of the process. Can you please comment?
Hi,

There are two weight vectors that the pocket algorithm keeps track of. The "pocket" weight vector is the one that is not updated if Ein gets worse. However, the other weight vector w, which is identical to its counterpart in PLA, keeps getting updated at every iteration using a misclassified point. The algorithm then reports the pocket weight vector as its final solution.
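To make the two-vector bookkeeping concrete, here is a minimal sketch of the pocket algorithm in Python/NumPy. This is not the book's code; the function name, the random choice of misclassified point, and the iteration budget are my own assumptions.

```python
import numpy as np

def pocket(X, y, max_iters=1000, rng=None):
    """Pocket algorithm: run PLA updates, but keep ("pocket") the best
    weight vector seen so far, as measured by in-sample error Ein."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(X.shape[1])         # PLA weights, updated every iteration
    w_pocket = w.copy()              # best weights seen so far
    best_err = np.mean(np.sign(X @ w) != y)
    for _ in range(max_iters):
        mis = np.where(np.sign(X @ w) != y)[0]
        if mis.size == 0:            # data separated: PLA weights are perfect
            w_pocket, best_err = w.copy(), 0.0
            break
        i = rng.choice(mis)          # pick one misclassified point
        w = w + y[i] * X[i]          # standard PLA update (may make Ein worse)
        err = np.mean(np.sign(X @ w) != y)
        if err < best_err:           # the pocket vector only ever improves
            w_pocket, best_err = w.copy(), err
    return w_pocket, best_err
```

Note that w itself jumps around exactly as in PLA; only w_pocket is monotone in Ein, which is why the algorithm can escape the situation described above.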

 mariovela 04-15-2013 09:32 AM

Re: Exercise 1.10

On exercise 1.10 (c), I'm getting confused about how to plot P[|ν − μ| > ε] as a function of ε. I assumed the distribution is binomial, right?
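For what it's worth, one hedged sketch of 1.10(c) for a single fair coin (as for ν₁): the number of heads in N = 10 tosses is Binomial(10, 0.5), so P[|ν − μ| > ε] can be computed exactly and plotted against the Hoeffding bound 2·exp(−2ε²N). The code below is my own illustration, not the book's solution.

```python
from math import comb, exp

N, mu = 10, 0.5  # 10 tosses of a fair coin

def p_deviation(eps):
    """Exact P[|nu - mu| > eps] where nu = (#heads)/N, heads ~ Binomial(N, mu)."""
    return sum(comb(N, k) * mu**k * (1 - mu)**(N - k)
               for k in range(N + 1)
               if abs(k / N - mu) > eps)

def hoeffding(eps):
    """Hoeffding bound 2 exp(-2 eps^2 N), capped at 1 for plotting."""
    return min(1.0, 2 * exp(-2 * eps**2 * N))

# Tabulate both curves; these are the values one would plot against eps.
for eps in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    print(f"eps={eps:.1f}  P={p_deviation(eps):.4f}  bound={hoeffding(eps):.4f}")
```

Since the bound must always sit above the exact probability, the plot gives a visual check that Hoeffding holds here.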

 giridhar1202 09-09-2015 11:17 AM

Re: Exercises and Problems

I conducted the experiment mentioned in 1.10(b), and I got the following results.

V1 = 90, 999, 4375, 11803, 20329, 24793, 20411, 11685, 4450, 988, 77
Vrand = 103, 1022, 4389, 11691, 20444, 24489, 20653, 11669, 4502, 941, 97
Vmin = 62376, 37622, 2, 0, 0, 0, 0, 0, 0, 0, 0

i.e., 90 times out of 100,000, I got 0 heads in 10 tosses of the first coin,
999 times out of 100,000, I got 1 head in 10 tosses of the first coin,
....
77 times out of 100,000, I got 10 heads in 10 tosses of the first coin,

and

103 times out of 100,000, I got 0 heads in 10 tosses of the coin chosen at random,
....
97 times out of 100,000, I got 10 heads in 10 tosses of the coin chosen at random,

and

62376 times out of 100,000, I got 0 heads in 10 tosses of the coin for which the number of heads was minimum across the 1000 coins.

So, as expected, the distributions of V1 and Vrand are similar.
Can someone please explain how we should interpret the result for Vmin?

Is the distribution of Vmin suggesting that one should be careful about overfitting? Because if we have many hypotheses, there will almost always be some hypothesis that fits the data set exactly. So what should one's strategy for selecting a hypothesis be?
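For readers who want to reproduce these histograms, here is a minimal sketch of the experiment. It assumes fair coins, and uses 10,000 runs rather than the post's 100,000 so it finishes quickly; the counts will therefore be roughly one tenth of those above.

```python
import numpy as np

rng = np.random.default_rng(0)               # arbitrary seed
runs, n_coins, n_flips = 10_000, 1000, 10    # fewer runs than the post, for speed

# Each run: flip 1000 fair coins 10 times each; record head counts.
heads = rng.binomial(n=n_flips, p=0.5, size=(runs, n_coins))
h1    = heads[:, 0]                                                  # first coin
hrand = heads[np.arange(runs), rng.integers(0, n_coins, size=runs)]  # random coin
hmin  = heads.min(axis=1)                                            # fewest heads

# Histograms of head counts (0..10) across runs, as in the post:
for name, h in [("nu_1", h1), ("nu_rand", hrand), ("nu_min", hmin)]:
    print(name, np.bincount(h, minlength=n_flips + 1))
```

With 1000 coins per run, some coin comes up all tails most of the time, which is why the nu_min histogram piles up at 0 while nu_1 and nu_rand stay binomial-looking.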

 magdon 09-21-2015 04:33 PM

Re: Exercises and Problems

Your connection to learning is correct: when there are many hypotheses, you should be more careful about overfitting. But the main point of the exercise is to realize that if you pick a coin carefully based on the data of the flips (in this case "carefully" means having minimum nu), then the distribution of the nu you see will not be what you would expect if you tossed that *same* coin again. If you tossed that c_min again, you would expect to see a binomial distribution for the number of heads. But the distribution of nu_min is clearly not binomial.

Carrying this over to learning: if you pick a hypothesis "carefully", say the one with minimum Ein, then the Ein you get will not necessarily reflect the distribution of errors you get when you "toss that hypothesis again", i.e., test it on new data.

Quote:
 Originally Posted by giridhar1202 (Post 12031) I conducted experiment mentioned in 1.10(b), and i got following result. V1 = 90, 999, 4375, 11803, 20329, 24793, 20411, 11685, 4450, 988, 77 Vrand = 103, 1022, 4389, 11691, 20444, 24489, 20653, 11669, 4502, 941, 97 Vmin = 62376, 37622, 2, 0, 0, 0, 0, 0, 0, 0, 0 i.e; 90 times out of 100,000 times, i get 0 heads in 10 tosses of first coin 999 times out of 100,000 times, i get 1 head in 10 tosses of first coin .... 77 times out of 100,000 times, i get 10 heads in 10 tosses of first coin and 103 times out of 100,000 times, i get 0 heads in 10 tosses of coin chosen at random .... 97 times out of 100,000 times, i get 10 heads in 10 tosses of coin chosen at random and 62376 times out of 100,000 times, i get 0 heads in 10 tosses of coin for which number of heads was minimum across 1000 coins So, it is as expected that distribution of V1 and Vrand are similar. Can someone please explain how we should interpret result for Vmin? Is distribution of Vmin suggesting that one should be careful about overfitting ? Because if we have many hypothesis there will almost always be some hypothesis which fits data set exactly. So what should be one's strategy for selecting hypothesis ?

 williamshakespare 05-01-2016 08:42 AM

Re: Exercises and Problems

Quote:
 Originally Posted by tadworthington (Post 3335) Problem 1.4

WHAT I LIKED: This is an excellent problem, and a model one in my opinion for helping a student to understand material. It starts out easy, with a small data set - that, importantly, is user-generated. Having to generate a data set - even one as simple as a linearly separable data set in 2 dimensions - goes a long way to helping understand how the Perceptron works and why it wouldn't work if the data were not linearly separable. We gradually progress to bigger data sets, all along the way plotting the data so that we can see what is happening. It is not only instructive but also quite exhilarating to see the results of the PLA (plotting both the target function and the hypothesis generated by the PLA) actually graphed out in front of you on a computer screen! I also thought that the progression to 10 dimensions only after working through the more easily visualized 2 dimensional examples is perfect. Finally, the histogram approach to understanding the computational complexity of the PLA was, I thought, genius.

TIME SPENT: Overall I spent about 90 minutes on the problem, although a lot of that was spent documenting my code for future reference, and just general "prettying" of my plots and results; so I would guess that this problem can be completed comfortably in an hour, assuming knowledge of programming in a language where plotting is easy (I used Python, but Matlab/Octave would also be excellent examples for quick-and-dirty graphics programming of this type.) For someone with no programming experience, the problem would probably take much more time.

CLARITY: I think the problem is exceptionally clear.

DIFFICULTY: I thought the problem was relatively easy, but the Perceptron is a relatively easy concept so I think making it harder is neither necessary nor appropriate. For metrics purposes, on a 1-10 scale of difficulty I would give this a 3.

 williamshakespare 05-13-2016 08:34 PM

Re: Exercises and Problems

Quote:
 Originally Posted by htlin (Post 3338) Thank you so much for the valuable feedback!
Mr. Lin, I am studying the Python language in order to complete Problem 1.4.
Would you please share the answer (Python code) to provide an example for me?
Thanks.
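In the meantime, here is a minimal PLA sketch in Python/NumPy for a Problem 1.4-style setup. This is not the book's or the instructors' solution; the seed, the point counts, and the margin filter (added so convergence is quick) are my own choices, and plotting is omitted.

```python
import numpy as np

def pla(X, y, max_iters=20_000):
    """Basic perceptron learning algorithm; assumes the data are linearly
    separable and that X carries a leading column of 1s for the bias."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        mis = np.where(np.sign(X @ w) != y)[0]
        if mis.size == 0:
            break                        # converged: nothing misclassified
        w += y[mis[0]] * X[mis[0]]       # update on one misclassified point
    return w

# Problem 1.4-style setup: a random linear target function in 2D.
rng = np.random.default_rng(7)           # arbitrary seed
X = np.c_[np.ones(20), rng.uniform(-1, 1, size=(20, 2))]
w_true = rng.normal(size=3)
keep = np.abs(X @ w_true) > 0.1          # keep a margin so PLA converges fast
X, y = X[keep], np.sign(X[keep] @ w_true)
w = pla(X, y)
print(np.all(np.sign(X @ w) == y))       # True once PLA has converged
```

To follow the full problem, one would repeat this with larger N and in 10 dimensions, and plot the target line and hypothesis with matplotlib.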

 MaxvonHippel 03-12-2018 02:08 PM

Re: Exercises and Problems - 1.3

For question 1.3 we are asked to argue that the move from w(t) to w(t+1) is a move "in the right direction". I think I may be misunderstanding the question and/or Figure 1.3. My impression is that the figure shows us the case of R^2 (i.e., d = 2), but that for arbitrary R^d we are considering the case where we update w(t) to w(t+1) and then argue that the resulting change in the location of the boundary hyperplane is a move toward the point (x, y), making the boundary more likely than not to pass by (x, y), thereby putting (x, y) on the opposite (and correct) side of the classification boundary.

This is easy enough to say in English, as I just did. But unless I'm missing something, I don't think the analytic proof is necessarily so straightforward. It seems to me that I'd have to show that the plane moves closer to the point (x, y). Then I would have to argue that moving closer increases the likelihood of passing by/through the point (this part seems straightforward). And then I would argue that increasing the likelihood of passing by/through the point is logically equivalent to increasing the likelihood of correctly classifying the point.

Is my overarching logic here sound? And is a mathematical (specifically, analytic) proof of this argument what you intend as an answer? Or is the intent just for the reader to formulate an English explanation such as the one I attempted above?

Thank you very much for the help! I am really enjoying your book.

 htlin 03-17-2018 04:33 PM

Re: Exercises and Problems

Hint: the answer to (c) is meant to be formed from the answer to (b). Hope this helps.
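For reference, a sketch of the one-line algebra behind part (b), using the PLA update w(t+1) = w(t) + y(t)x(t):

```latex
y(t)\,\mathbf{w}^{\mathsf{T}}(t+1)\,\mathbf{x}(t)
  = y(t)\bigl(\mathbf{w}(t) + y(t)\,\mathbf{x}(t)\bigr)^{\mathsf{T}}\mathbf{x}(t)
  = y(t)\,\mathbf{w}^{\mathsf{T}}(t)\,\mathbf{x}(t) + \|\mathbf{x}(t)\|^{2}
  > y(t)\,\mathbf{w}^{\mathsf{T}}(t)\,\mathbf{x}(t),
```

since y(t)^2 = 1 and x(t) is not the zero vector. So each update strictly increases y(t) w^T(t) x(t), i.e., it moves the classification of x(t) in the right direction, and repeating the update on the same point gives part (c). No geometric argument about the plane "moving closer" is needed.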

 MaxvonHippel 03-18-2018 05:20 PM

Re: Exercises and Problems

Ah, that makes sense. (I was vastly overthinking this.)
Thank you!
