LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 2 (http://book.caltech.edu/bookforum/forumdisplay.php?f=131)
-   -   HW 2 Problem 6 (http://book.caltech.edu/bookforum/showthread.php?t=896)

 dbaksi@gmail.com 07-21-2012 04:55 PM

HW 2 Problem 6

How is this different from problem 5, other than N=1000 and the fact that these simulated 'out of sample' points (for E_out) are generated fresh? I may be missing something, but it seems to boil down to running the same program as in problem 5 with N=1000, 1000 times; can someone clarify please? Thanks

 yaser 07-21-2012 10:00 PM

Re: HW 2 Problem 6

Quote:
 Originally Posted by dbaksi@gmail.com (Post 3575) How is this different from problem 5 other than N=1000 and the fact that these simulated 'out of sample' points (E_out) are generated fresh ? I may be missing something but it seems to boil down to running the same program as in problem 5 with N=1000 for 1000 times; can someone clarify please ? thanks
There are indeed instances in the homeworks where the same experiment covers a number of homework problems.

Problem 5 asks about E_in while Problem 6 asks about (an estimate of) E_out. In both problems, N = 100 (N stands for the number of training examples in our notation).

 MLearning 07-22-2012 01:58 AM

Re: HW 2 Problem 6

It is my understanding that "fresh data" refers to cross-validation data. Do we then compute Eout using the weights obtained in problem 5? When I do this, Eout < Ein. When I design the weights using the fresh data, Eout is approximately equal to Ein. Does this make sense?

 yaser 07-22-2012 02:06 AM

Re: HW 2 Problem 6

Quote:
 Originally Posted by MLearning (Post 3581) It is my understanding that "fresh data" refers to cross-validation data. Do we then compute Eout using the weights obtained in problem 5?
It is simpler than cross validation (a topic that will be covered in detail in a later lecture). You just generate new data points that were not involved in training and evaluate the final hypothesis on those points.

The final hypothesis is indeed the one whose weights were determined in Problem 5, where the training took place.
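In code, that evaluation might look like the following sketch (Python with NumPy; `f` and `w` are stand-ins for the target function and trained weights from Problem 5, chosen here just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for illustration: in the homework, f comes from the random target
# line and w from the linear regression fit of Problem 5.
f = lambda X: np.sign(X[:, 0] + X[:, 1])   # stand-in target function
w = np.array([0.05, 1.0, 1.0])             # stand-in final weights [bias, w1, w2]

# Generate fresh points that were not involved in training, then compare the
# final hypothesis g(x) = sign(w . x) with the target f on those points.
X_fresh = rng.uniform(-1, 1, (1000, 2))
X_aug = np.column_stack([np.ones(len(X_fresh)), X_fresh])
g = np.sign(X_aug @ w)
e_out = np.mean(g != f(X_fresh))
print(e_out)  # fraction of fresh points where g disagrees with f
```

No retraining happens here: the weights are frozen, and only the evaluation points are new.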

 dsvav 07-22-2012 04:49 AM

Re: HW 2 Problem 6

I am confused here; I don't understand what the final hypothesis is here.

There are 1000 target functions and corresponding 1000 weight vectors/hypotheses in problem 5.

So for problem 6, 1000 times I generate 1000 out-of-sample points, and then for each weight vector and target function (from problem 5) I evaluate E_out on that out-of-sample data and finally average them. This is how I have done it.

I don't see a final hypothesis here. What am I missing? Any hint?

Could it be that in problem 5 there is supposed to be only one target function and many in-sample data sets? If so, then the final hypothesis/weights could be the one that produces the minimum in-sample error E_in.

Thanks a lot.

 yaser 07-22-2012 05:00 AM

Re: HW 2 Problem 6

Quote:
 Originally Posted by dsvav (Post 3584) I am confused here , I don't understand what is final hypothesis here. There are 1000 target function and corresponding 1000 weight vectors/hypothesis in problem 5 . So for problem 6 , 1000 times I generate 1000 out-of-sample data and then for each weight vector and target function(from problem 5) I evaluate E_out for that out-of-sample data and finally average them. This is how I have done. I don't see final hypothesis here , what I am missing , any hint Could it be that in problem 5 there is supposed to be only one target function and many in-sample data ? If so then the final hypothesis/weights could be that produces minimum in-sample error E_in . Please clarify. Thanks a lot.
There is a final hypothesis for each of the 1000 runs. The only reason we are repeating the runs is to average out statistical fluctuations, but all the notions of the learning problem, including the final hypothesis, pertain to a single run.
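For concreteness, a single run and the averaging over runs could be sketched as follows (assuming the Problem 5 setup: a random line through two points of [-1,1]^2 as target, N = 100 training points, linear regression used as a classifier; all names are illustrative rather than an official solution):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_target(rng):
    # Target f: the line through two random points in [-1, 1]^2,
    # written as sign(a*x1 + b*x2 + c).
    p, q = rng.uniform(-1, 1, (2, 2))
    a = q[1] - p[1]
    b = p[0] - q[0]
    c = q[0] * p[1] - p[0] * q[1]
    return lambda X: np.sign(a * X[:, 0] + b * X[:, 1] + c)

def experiment(rng, n_train=100, n_test=1000):
    # One complete run: its own target, its own training set, its own final hypothesis.
    f = random_target(rng)
    X = rng.uniform(-1, 1, (n_train, 2))
    y = f(X)
    Xa = np.column_stack([np.ones(n_train), X])  # add bias coordinate
    w = np.linalg.pinv(Xa) @ y                   # one-shot linear regression fit
    e_in = np.mean(np.sign(Xa @ w) != y)
    # Fresh out-of-sample points, evaluated on the SAME final hypothesis w.
    Xt = rng.uniform(-1, 1, (n_test, 2))
    Xta = np.column_stack([np.ones(n_test), Xt])
    e_out = np.mean(np.sign(Xta @ w) != f(Xt))
    return e_in, e_out

results = np.array([experiment(rng) for _ in range(1000)])
print(results.mean(axis=0))  # average E_in and E_out over 1000 runs
```

Each call to `experiment` is one complete learning problem with its own final hypothesis; the outer loop only averages out statistical fluctuations across runs.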

 dbaksi@gmail.com 07-22-2012 06:55 AM

Re: HW 2 Problem 6

Thanks a lot. The statements about (i) N being the number of 'in-sample' training data in both problems and (ii) the freshly generated 1000 points being disjoint from the first set clarified the confusion I had.

 dsvav 07-22-2012 07:14 AM

Re: HW 2 Problem 6

Thanks Professor yaser.

 rakhlin 07-23-2012 11:58 AM

Re: HW 2 Problem 6

When I generate new data and a new hypothesis for every single one of the 1000 runs (as the problem suggests), I get a stable out-of-sample result close to (slightly greater than) the in-sample error.
When I estimate 1000 different out-of-sample sets for one in-sample set and a single hypothesis, I get very different average error rates, with high variability from 0.01 to 0.13. Why so?

 yaser 07-23-2012 01:48 PM

Re: HW 2 Problem 6

Quote:
 Originally Posted by rakhlin (Post 3612) When I generate new data and hypothesis for every single run of 1000 (as the problem suggests) I get stable out-of-sample result close to (slightly greater than) in-sample error. When I estimate 1000 different out-of-samples for one in-sample and single hypothesis I get very different average error rates with high variability from 0.01 to 0.13 Why so?
Just to clarify. You used the in-sample points to train and arrived at a final set of weights (corresponding to the final hypothesis). Each out-of-sample point is now tested on this hypothesis and compared to the target value on the same point. Now, what exactly do you do to get the two scenarios you are describing?

 rakhlin 07-23-2012 02:41 PM

Re: HW 2 Problem 6

Quote:
 Originally Posted by yaser (Post 3614) Just to clarify. You used the in-sample points to train and arrived at a final set of weights (corresponding to the final hypothesis). Each out of-sample point is now tested on this hypothesis and compared to the target value on the same point. Now, what exactly do you do to get the two scenarios you are describing?
First (normal) scenario: I test an out-of-sample data set (100 points) against the linear model. I repeat it 1000 times: generate 100 in-sample points, do a linear fit, generate 100 out-of-sample points, test. On each iteration I accumulate the number of misclassified points, and average the errors when done. The average error is stable from run to run.

Second scenario: fit the linear model only once. Repeat 1000 times: generate 100 out-of-sample points, test. Accumulate, and average the errors when done. Here I get remarkable variation in the average error.

I'd like to understand why these scenarios differ. I believe they shouldn't.
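A sketch of that second scenario may help locate the variability (the target `f` and weights `w` below are fixed stand-ins for one training run's outcome, not an actual regression fit):

```python
import numpy as np

rng = np.random.default_rng(3)

# Scenario 2: one fixed target and one fixed hypothesis, reused for every test set.
f = lambda X: np.sign(X[:, 0] - X[:, 1])  # stand-in target
w = np.array([0.1, 1.0, -1.0])            # stand-in weights, deliberately imperfect

errors = []
for _ in range(1000):
    Xt = rng.uniform(-1, 1, (100, 2))     # a fresh out-of-sample set
    g = np.sign(np.column_stack([np.ones(100), Xt]) @ w)
    errors.append(np.mean(g != f(Xt)))

# The average converges to THIS hypothesis's true E_out. A different training
# run would give different weights and hence a different limit, which is one
# source of the run-to-run spread in the averages.
print(round(float(np.mean(errors)), 3))
```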

 yaser 07-23-2012 04:04 PM

Re: HW 2 Problem 6

Quote:
 Originally Posted by rakhlin (Post 3616) 2-nd scenario: fit linear model only once. Repeat 1000 times: generate 100 out-of-sample points, test. Accumulate and average errors when done. Here I get remarkable variation in average error.
Does variation in the average error mean that you repeat the entire experiment you described (including the target, training set, and resulting linear fit) and look at the different averages you get?

 rakhlin 07-23-2012 05:59 PM

Re: HW 2 Problem 6

Quote:
 Originally Posted by yaser (Post 3617) Does variation in the average error mean that you repeat the entire experiment you described (including the target, training set, and resulting linear fit) and look at the different averages you get?
Not quite so. When I repeat the entire experiment (including the target, training set, and resulting linear fit) I always get approximately the same averages. I get different averages (0.01 ... 0.13) when I use one target and average over 1000 out-of-sample sets (for a single target).

Now that I have plotted both lines (target and hypothesis) per your advice, I am beginning to think this may be what we should expect. Linear regression does not always fit well. Usually it looks good, giving a small in-sample error, but sometimes the disagreement is visually large (>0.1 in-sample error). This is the root of the variation in the average error when I use the same "bad" regression for all 1000 iterations. I hope this type of experiment isn't what the problem implies; otherwise it has no certain answer - at least 2 answers match.

So there is another question. Is a >0.1 in-sample error and a visually non-optimal fit still a valid outcome of linear regression on linearly separable data?

 ilya239 07-23-2012 08:59 PM

Re: HW 2 Problem 6

Quote:
 Originally Posted by rakhlin (Post 3612) When I generate new data and hypothesis for every single run of 1000 (as the problem suggests) I get stable out-of-sample result close to (slightly greater than) in-sample error. When I estimate 1000 different out-of-samples for one in-sample and single hypothesis I get very different average error rates with high variability from 0.01 to 0.13 Why so?
That's like problem 1: if you try enough different out-of-sample sets, inevitably there will be one on which your hypothesis does great and one on which it does badly. As an extreme case, if your out-of-sample test size were 1 instead of 1000, on some of these sets you'd get a 0% error rate and on some you'd get a 100% error rate. To get an actual estimate of the out-of-sample error rate you should pool all of these together.
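The effect of the test-set size on the spread of the estimate can be checked with a self-contained sketch (assume a fixed hypothesis whose true misclassification probability is 0.1; each test point is then a Bernoulli trial):

```python
import numpy as np

rng = np.random.default_rng(2)
true_error = 0.1  # assumed error probability of one fixed hypothesis

stds = {}
for n_test in (1, 10, 100, 1000):
    # 2000 trials: each draws a fresh test set of n_test points and records
    # the observed error fraction; its std shrinks roughly like 1/sqrt(n_test).
    estimates = rng.binomial(n_test, true_error, size=2000) / n_test
    stds[n_test] = float(estimates.std())
    print(n_test, round(stds[n_test], 4))
```

With n_test = 1 the estimate is 0 or 1 on every trial, matching the extreme case above; by n_test = 1000 the spread is around 0.01.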

 rakhlin 07-24-2012 02:02 AM

Re: HW 2 Problem 6

Quote:
 Originally Posted by ilya239 (Post 3622) That's like problem 1: if you try many enough different out-of-samples, inevitably there will be one on which your hypothesis does great, and one on which it does badly. As an extreme case, if your out-of-sample testing size was 1 instead of 1000, on some of these out-of-samples you'd get 0% error rate and on some you'd get 100% error rate. To get an actual estimate of out-of-sample error rate you should pool all these together.
This is not the case - I average 1000 out-of-sample sets anyway. It seems the reason is the large variation in in-sample error across different training samples.

 MLearning 07-24-2012 08:17 AM

Re: HW 2 Problem 6

I also observe some discrepancy while computing Eout. When I hold the target function fixed, Eout is approximately equal to Ein. When I use a different target function for each experiment, Eout is significantly higher than Ein. Is this expected?

 dsvav 07-24-2012 08:54 AM

Re: HW 2 Problem 6

@MLearning

In my opinion E_out should be near E_in when the target is not fixed (i.e. averaging over 1000 iterations).

Unfortunately the problem could be anywhere, but most probably it is in computing the error.

Could it be that when you compute the error E_out for one iteration, the number of out-of-sample points is 1000, and you forgot to change the number of sample points from 100 (from Q5) to 1000? That may be the cause of the difference (misclassified/sample_size).

I am just guessing, since I make this kind of mistake often.

 MLearning 07-24-2012 09:33 AM

Re: HW 2 Problem 6

@dsvav,

Thank you for your comments. You were right, I did forget to change the sample number (N) to 1000, but that doesn't change the result. It is possible that Eout is not the same as Ein, although that is what we want. Indeed, we are applying linear regression to random data it hasn't seen before; hence the larger deviation between Eout and Ein.

 dsvav 07-24-2012 09:52 AM

Re: HW 2 Problem 6

@MLearning

When I compute the difference between E_in and E_out I get a difference of around 0.01.

I still think the difference should not be significant; doesn't this follow from the Hoeffding Inequality?

Also, since we suppress both "very bad events" and "very good events" by taking the average over 1000 runs, E_out should track E_in.

This is my understanding; there is a good chance that I am wrong :D

By the way, what difference are you getting?

 rakhlin 07-24-2012 10:22 AM

Re: HW 2 Problem 6

Quote:
 Originally Posted by MLearning (Post 3638) I also observe some discrepancy while computing Eout. When I hold the target function fixed, Eout is approximately equal to Ein. When I use different target functions for each experiment, Eout is significantly higher than Ein. Is this expected?
For a given target and regression fit, Ein and Eout should not deviate much from each other for large N. The intuition is that the error region between the two lines is fixed, and the points used for Ein and Eout are both distributed uniformly over the same square.

 MLearning 07-24-2012 11:38 AM

Re: HW 2 Problem 6

@dsvav,
Ein is in the 0.01 range, while Eout (for random target functions) is close to 0.5.

 MLearning 07-24-2012 01:08 PM

Re: HW 2 Problem 6

I agree. When you randomize the target functions, Eout becomes very large relative to Ein. That is why, instead, I kept the target function the same while randomizing the out-of-sample data.
