HW 2 Problem 6
How is this different from Problem 5, other than N = 1000 and the fact that these simulated out-of-sample points (for E_out) are generated fresh? I may be missing something, but it seems to boil down to running the same program as in Problem 5 with N = 1000, 1000 times. Can someone clarify, please? Thanks.

Re: HW 2 Problem 6
It is my understanding that "fresh data" refers to cross-validation data. Do we then compute E_out using the weights obtained in Problem 5? When I do this, E_out < E_in. When I design the weights using the fresh data, E_out is approximately equal to E_in. Does this make sense?

Re: HW 2 Problem 6
Quote:
The final hypothesis is indeed the one whose weights were determined in Problem 5, where the training took place. 
Re: HW 2 Problem 6
I am confused here; I don't understand what the final hypothesis is.
There are 1000 target functions and 1000 corresponding weight vectors (hypotheses) in Problem 5. So for Problem 6, 1000 times I generate 1000 out-of-sample points, and then, for each weight vector and its target function (from Problem 5), I evaluate E_out on that out-of-sample data and finally average the results. This is how I have done it. I don't see a final hypothesis here; what am I missing? Any hint? Could it be that in Problem 5 there is supposed to be only one target function and many in-sample data sets? If so, the final hypothesis/weights could be the one that produces the minimum in-sample error E_in. Please clarify. Thanks a lot.

Re: HW 2 Problem 6
Thanks a lot. The statements about (i) N being the number of in-sample training points in both problems and (ii) the freshly generated 1000 points being disjoint from the first set cleared up the confusion I had.
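A minimal NumPy sketch of the procedure as clarified above. It assumes the usual setup for this exercise, which is not restated in the thread: the target is a random line through two uniform points in [-1, 1] x [-1, 1], linear regression uses the pseudo-inverse, and classification is by sign. In each run, the weights fitted on the N = 100 training points (as in Problem 5) are the final hypothesis g; that same g is then scored on 1000 freshly generated points labeled by the same target, and both errors are averaged over 1000 runs. All function names are hypothetical.

```python
import numpy as np

def random_target(rng):
    # Hypothetical helper (assumed setup): the line through two uniform points
    # in [-1, 1]^2; returns f mapping an (n, 2) array to labels in {-1, +1}.
    p, q = rng.uniform(-1, 1, size=(2, 2))
    def f(x):
        # The sign of the 2-D cross product says which side of line pq x lies on.
        side = (q[0] - p[0]) * (x[:, 1] - p[1]) - (q[1] - p[1]) * (x[:, 0] - p[0])
        return np.where(side > 0, 1.0, -1.0)
    return f

def with_bias(x):
    # Prepend the constant coordinate x0 = 1.
    return np.column_stack([np.ones(len(x)), x])

def error_rate(w, x, y):
    # Fraction of points where sign(w . x) disagrees with y (divides by len(y)).
    return np.mean(np.sign(with_bias(x) @ w) != y)

def one_run(rng, n_train=100, n_test=1000):
    f = random_target(rng)                            # one target per run
    x_train = rng.uniform(-1, 1, size=(n_train, 2))
    y_train = f(x_train)
    w = np.linalg.pinv(with_bias(x_train)) @ y_train  # linear regression, w = X^+ y
    x_test = rng.uniform(-1, 1, size=(n_test, 2))     # fresh points, not used in training
    y_test = f(x_test)                                # labeled by the SAME target f
    return error_rate(w, x_train, y_train), error_rate(w, x_test, y_test)

rng = np.random.default_rng(0)
results = np.array([one_run(rng) for _ in range(1000)])  # shape (1000, 2)
avg_e_in, avg_e_out = results.mean(axis=0)
print(f"average E_in  = {avg_e_in:.3f}")
print(f"average E_out = {avg_e_out:.3f}")
```

There is no selection step across runs: each run has its own g, and the averaging happens over runs.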

Re: HW 2 Problem 6
Thanks, Professor Yaser.

Re: HW 2 Problem 6
When I generate new data and a new hypothesis for every one of the 1000 runs (as the problem suggests), I get a stable out-of-sample result close to (slightly greater than) the in-sample error.
When I estimate 1000 different out-of-sample sets for one in-sample set and a single hypothesis, I get very different average error rates, with high variability from 0.01 to 0.13. Why so?

Re: HW 2 Problem 6
Quote:
2nd scenario: fit the linear model only once. Repeat 1000 times: generate 100 out-of-sample points, test. Accumulate and average the errors when done. Here I get remarkable variation in the average error. I'd like to understand why these scenarios differ. I believe they should not.
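A sketch comparing the two scenarios under the same assumed setup (random-line target in [-1, 1]^2, linear regression by pseudo-inverse; helper names are hypothetical). In scenario 2, averaging over many fresh test sets converges to the E_out of the one g that was fitted, so repeating scenario 2 mainly shows how much E_out differs from one trained g to another. Scenario 1 also averages over targets and training sets, so its average barely moves between repetitions.

```python
import numpy as np

def random_target(rng):
    # Hypothetical helper (assumed setup): random line through two points in [-1, 1]^2.
    p, q = rng.uniform(-1, 1, size=(2, 2))
    def f(x):
        side = (q[0] - p[0]) * (x[:, 1] - p[1]) - (q[1] - p[1]) * (x[:, 0] - p[0])
        return np.where(side > 0, 1.0, -1.0)
    return f

def fit(x, y):
    # Linear regression with bias column: w = pinv(X) y.
    return np.linalg.pinv(np.column_stack([np.ones(len(x)), x])) @ y

def error_rate(w, x, y):
    # Fraction of points where sign(w . x) disagrees with y.
    return np.mean(np.sign(np.column_stack([np.ones(len(x)), x]) @ w) != y)

def scenario_1(rng, runs=1000, n_train=100, n_test=100):
    # New target, new training set, new fit and new test set in every run.
    errs = []
    for _ in range(runs):
        f = random_target(rng)
        x = rng.uniform(-1, 1, size=(n_train, 2))
        w = fit(x, f(x))
        x_test = rng.uniform(-1, 1, size=(n_test, 2))
        errs.append(error_rate(w, x_test, f(x_test)))
    return np.mean(errs)

def scenario_2(rng, runs=1000, n_train=100, n_test=100):
    # One target and one fit; only the test set changes between runs.
    f = random_target(rng)
    x = rng.uniform(-1, 1, size=(n_train, 2))
    w = fit(x, f(x))
    errs = []
    for _ in range(runs):
        x_test = rng.uniform(-1, 1, size=(n_test, 2))
        errs.append(error_rate(w, x_test, f(x_test)))
    return np.mean(errs)

rng = np.random.default_rng(1)
s1 = np.array([scenario_1(rng) for _ in range(10)])  # 10 repetitions of scenario 1
s2 = np.array([scenario_2(rng) for _ in range(10)])  # 10 repetitions of scenario 2
print("scenario 1 averages:", np.round(s1, 3))
print("scenario 2 averages:", np.round(s2, 3))
```

Printing the two arrays side by side makes the difference in spread visible: each scenario-2 average is essentially the E_out of one particular g.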

Re: HW 2 Problem 6
Quote:
Now that I have plotted both lines (target and hypothesis) per your advice, I am beginning to think that this is maybe what we should expect. Linear regression does not always fit well. Usually it looks good and gives a small in-sample error, but sometimes the disagreement is visually large (in-sample error > 0.1). This is the root of the variation in the average error when I use the same "bad" regression for all 1000 iterations. I hope this type of experiment isn't what the problem implies; otherwise it has no definite answer, since at least 2 answers match. So there is another question: is an in-sample error > 0.1 with a visually non-optimal fit still a valid outcome of linear regression on linearly separable data?
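Linear regression minimizes squared error rather than the number of misclassified points, so even on linearly separable data (the labels here come from a line by construction) the regression line need not separate the training set. A sketch under the same assumed setup, with hypothetical helper names, that counts how often E_in exceeds 0.1 over 1000 runs and reports the class balance of the worst run; a lopsided class split is one plausible contributor, not a claim about any specific run.

```python
import numpy as np

def random_target(rng):
    # Hypothetical helper (assumed setup): random line through two points in [-1, 1]^2.
    p, q = rng.uniform(-1, 1, size=(2, 2))
    def f(x):
        side = (q[0] - p[0]) * (x[:, 1] - p[1]) - (q[1] - p[1]) * (x[:, 0] - p[0])
        return np.where(side > 0, 1.0, -1.0)
    return f

def fit(x, y):
    # Linear regression with bias column: w = pinv(X) y.
    return np.linalg.pinv(np.column_stack([np.ones(len(x)), x])) @ y

def error_rate(w, x, y):
    # Fraction of points where sign(w . x) disagrees with y.
    return np.mean(np.sign(np.column_stack([np.ones(len(x)), x]) @ w) != y)

rng = np.random.default_rng(2)
e_in, minority = [], []
for _ in range(1000):
    f = random_target(rng)
    x = rng.uniform(-1, 1, size=(100, 2))
    y = f(x)                       # separable by construction: labels come from a line
    w = fit(x, y)
    e_in.append(error_rate(w, x, y))
    minority.append(min(np.mean(y == 1), np.mean(y == -1)))  # share of the smaller class
e_in, minority = np.array(e_in), np.array(minority)

print(f"mean E_in = {e_in.mean():.3f}, max E_in = {e_in.max():.3f}")
print(f"runs with E_in > 0.1: {np.count_nonzero(e_in > 0.1)} of {len(e_in)}")
worst = e_in.argmax()
print(f"worst run: E_in = {e_in[worst]:.3f}, smaller class share = {minority[worst]:.2f}")
```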

Re: HW 2 Problem 6
I also observe a discrepancy when computing E_out. When I hold the target function fixed, E_out is approximately equal to E_in. When I use a different target function for each experiment, E_out is significantly higher than E_in. Is this expected?

Re: HW 2 Problem 6
@MLearning
In my opinion, E_out should be near E_in when the target is not fixed (i.e., when averaging over 1000 iterations). Unfortunately, the problem could be anywhere, but most probably it is in computing the error. When you compute E_out for one iteration, the number of out-of-sample points is 1000; could it be that you forgot to change the number of sample points from 100 (from Q5) to 1000 when computing misclassified/sample_size? That might be the cause of the difference. I am just guessing, since I often make this kind of mistake.
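The slip described here can be ruled out by always normalizing by the size of the array actually being scored instead of a hard-coded N. A small sketch; both function names are hypothetical, and the second one reproduces the suspected mistake for comparison.

```python
import numpy as np

def error_rate(predictions, labels):
    # Fraction of misclassified points, normalized by the size of THIS set.
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    if predictions.shape != labels.shape:
        raise ValueError("predictions and labels must have the same shape")
    return np.count_nonzero(predictions != labels) / labels.size

def error_rate_hardcoded(predictions, labels, n=100):
    # The suspected slip: dividing by the training-set size N = 100.
    return np.count_nonzero(np.asarray(predictions) != np.asarray(labels)) / n

# Example: 1000 out-of-sample points, 20 of them misclassified.
labels = np.ones(1000)
predictions = labels.copy()
predictions[:20] = -1
print(error_rate(predictions, labels))            # divides by 1000
print(error_rate_hardcoded(predictions, labels))  # divides by 100, so 10x too large
```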
Re: HW 2 Problem 6
@dsvav,
Thank you for your comments. You were right: I did forget to change the sample number (N) to 1000, but fixing it doesn't change the result. It is possible that E_out is not the same as E_in, although that is what we want. After all, we are applying linear regression to random data that it hasn't seen before; hence the larger deviation between E_out and E_in.
Re: HW 2 Problem 6
@MLearning
When I compute the difference between E_in and E_out, I get a difference of around 0.01. I still think the difference should not be significant; doesn't this follow from the Hoeffding Inequality? Also, since we are suppressing the "very bad event happening" and the "very good event happening" by taking the average over 1000 runs, E_out should track E_in. This is my understanding, but there is a good chance that I am wrong :D By the way, what difference are you getting?
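For reference, the single-hypothesis Hoeffding inequality from the course, P[|E_test - E_out| > eps] <= 2 exp(-2 eps^2 N), applies directly to the fresh-test-set estimate: g is fixed before the 1000 fresh points are drawn, so the test error is a sample frequency for one fixed hypothesis. It does not apply in this single-hypothesis form to E_in of the trained g, because g was selected using those same training points. A short sketch that evaluates the bound for N = 1000; the function names are hypothetical.

```python
import math

def hoeffding_bound(n, eps):
    # Single fixed hypothesis: P[|E_test - E_out| > eps] <= 2 exp(-2 eps^2 n).
    return 2 * math.exp(-2 * eps**2 * n)

def eps_for_confidence(n, delta):
    # Solve 2 exp(-2 eps^2 n) = delta for eps.
    return math.sqrt(math.log(2 / delta) / (2 * n))

for eps in (0.01, 0.03, 0.05):
    print(f"N=1000, eps={eps}: bound = {hoeffding_bound(1000, eps):.4f}")
print(f"eps at 95% confidence with N=1000: {eps_for_confidence(1000, 0.05):.4f}")
```

At N = 1000 the bound exceeds 1 (so says nothing) for eps = 0.01, but is small for eps = 0.05; a single run's test estimate is therefore only guaranteed to within a few hundredths, and averaging over many runs is what tightens the averaged estimate further.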

Re: HW 2 Problem 6
@dsvav,
E_in is in the 0.01 range, while E_out (for random target functions) is close to 0.5.
Re: HW 2 Problem 6
I agree. When you randomize the target functions, E_out becomes very large relative to E_in. That is why I instead kept the target function the same while randomizing the out-of-sample data.
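One way to get E_out near 0.5 is to score g against labels from a target other than the one it was trained on. Swapping the two points that define a random target flips all of its labels, and both orders are equally likely, so an unrelated random target disagrees with any fixed g on half of the points in expectation. The sketch below uses the same assumed setup and hypothetical helper names; it illustrates a possible explanation for the numbers above, not a diagnosis of anyone's code.

```python
import numpy as np

def random_target(rng):
    # Hypothetical helper (assumed setup): random line through two points in [-1, 1]^2.
    p, q = rng.uniform(-1, 1, size=(2, 2))
    def f(x):
        side = (q[0] - p[0]) * (x[:, 1] - p[1]) - (q[1] - p[1]) * (x[:, 0] - p[0])
        return np.where(side > 0, 1.0, -1.0)
    return f

def fit(x, y):
    # Linear regression with bias column: w = pinv(X) y.
    return np.linalg.pinv(np.column_stack([np.ones(len(x)), x])) @ y

def error_rate(w, x, y):
    # Fraction of points where sign(w . x) disagrees with y.
    return np.mean(np.sign(np.column_stack([np.ones(len(x)), x]) @ w) != y)

rng = np.random.default_rng(3)
same, other = [], []
for _ in range(1000):
    f = random_target(rng)
    x = rng.uniform(-1, 1, size=(100, 2))
    w = fit(x, f(x))
    x_test = rng.uniform(-1, 1, size=(1000, 2))
    same.append(error_rate(w, x_test, f(x_test)))         # labels from the training target
    f_other = random_target(rng)                           # the mistake being illustrated
    other.append(error_rate(w, x_test, f_other(x_test)))  # labels from an unrelated target

print(f"E_out vs same target:      {np.mean(same):.3f}")
print(f"E_out vs unrelated target: {np.mean(other):.3f}")
```

Randomizing the target per run is fine as long as the fresh points in that run are labeled by the same target that produced that run's training labels.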

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.