doubt in lecture 11, deterministic noise
In lecture 11, it is mentioned that deterministic noise depends on the hypothesis set, and that it decreases as the hypothesis set becomes more complex, because a more complex set can capture some of what would otherwise be deterministic noise.
But the experiment on slide 7 of lecture 11 suggests that the more complex model fails badly compared to the less complex one. These two statements seem contradictory to me, and I cannot reconcile them. Can anyone please clear up my doubt? Thank you.
Re: doubt in lecture 11, deterministic noise
This is indeed confusing, and after spending some time on this point I think I have finally understood it (I hope). Deterministic noise is essentially the bias of the hypothesis set, so the more complex model does have less deterministic noise (smaller bias). But this does not imply that it will also have a smaller Eout, because Eout also depends on the variance, and for small N the variance of the more complex model is large. So for small N the more complex model ends up with the larger Eout. With a sufficiently large sample (large N), the variance of the complex model shrinks, while its bias (i.e., deterministic noise) was small to begin with, so in that case the more complex model outperforms the simpler one. The lesson: a complex model is better than a simple model provided we have enough data; for small data sets, complex models overfit and it is better to choose a simpler model.
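A minimal simulation sketch of the argument above, assuming a setup chosen only for illustration (it is not the slide 7 experiment): a noiseless target f(x) = sin(πx) on [-1, 1], a simple set H0 of constants and a more complex set H1 of lines, both fit by least squares. The helper names (`target`, `fit_constant`, `fit_line`, `bias_variance`) are hypothetical. It estimates bias and var over data sets of a fixed size N, so you can watch the trade-off flip as N grows.

```python
import math
import random


def target(x):
    # Hypothetical noiseless target, chosen only for illustration.
    return math.sin(math.pi * x)


def fit_constant(xs, ys):
    # Simple set H0: h(x) = b. Least squares picks b = mean of the labels.
    return 0.0, sum(ys) / len(ys)  # (slope, intercept)


def fit_line(xs, ys):
    # More complex set H1: h(x) = a*x + b, ordinary least squares.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def bias_variance(fit, n_points, n_datasets=1000, n_test=200, seed=0):
    """Monte Carlo estimates of bias and var over data sets of fixed size n_points."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_datasets):
        xs = [rng.uniform(-1, 1) for _ in range(n_points)]
        models.append(fit(xs, [target(x) for x in xs]))
    # g_bar: both models are linear in (a, b), so averaging the coefficients
    # gives the average hypothesis exactly.
    a_bar = sum(a for a, _ in models) / n_datasets
    b_bar = sum(b for _, b in models) / n_datasets
    test_xs = [rng.uniform(-1, 1) for _ in range(n_test)]
    bias = sum((a_bar * x + b_bar - target(x)) ** 2 for x in test_xs) / n_test
    var = sum(
        ((a - a_bar) * x + (b - b_bar)) ** 2 for a, b in models for x in test_xs
    ) / (n_datasets * n_test)
    return bias, var


for n in (2, 20):
    for name, fit in (("H0 (constant)", fit_constant), ("H1 (line)", fit_line)):
        bias, var = bias_variance(fit, n)
        print(f"N={n:2d} {name}: bias={bias:.3f} var={var:.3f} Eout~{bias + var:.3f}")
```

With these assumed settings, at N = 2 the line comes out with the smaller bias but a much larger var, so the constant has the smaller Eout; by N = 20 the line's var has shrunk enough for it to win. The exact figures depend on the seed.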

Re: doubt in lecture 11, deterministic noise
Quote:
If you want to see the impact of deterministic noise by itself, without having another factor that affects overfitting in play, you should fix the hypothesis set and increase the complexity of the target function. 
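A small sketch of that recipe, under assumptions of my own: the hypothesis set is fixed to lines h(x) = a*x + b, the targets are a hypothetical family f_k(x) = sin(kπx) on [-1, 1] whose complexity grows with k, and deterministic noise is measured as E_x[(h*(x) - f(x))^2], where h* is the best line in H (approximated by least squares on a dense grid). The function names are hypothetical.

```python
import math


def best_line(f, grid):
    # h*: least-squares line against f on a dense uniform grid, which
    # approximates the best-in-H hypothesis for x ~ Uniform[-1, 1].
    n = len(grid)
    ys = [f(x) for x in grid]
    mx, my = sum(grid) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in grid)
    sxy = sum((x - mx) * (y - my) for x, y in zip(grid, ys))
    a = sxy / sxx
    return a, my - a * mx


def deterministic_noise(f, n_grid=20001):
    # Mean squared gap between h* and f: the part of f that H cannot capture.
    grid = [-1 + 2 * i / (n_grid - 1) for i in range(n_grid)]
    a, b = best_line(f, grid)
    return sum((a * x + b - f(x)) ** 2 for x in grid) / n_grid


def make_target(k):
    # Hypothetical family of targets whose complexity grows with k.
    return lambda x: math.sin(k * math.pi * x)


for k in (1, 2, 3, 4):
    print(k, round(deterministic_noise(make_target(k)), 4))
```

For this particular family the exact value works out to 1/2 - 3/(k^2 π^2), so the deterministic noise grows with k while H stays fixed, which is the isolated effect the quote describes.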
Re: doubt in lecture 11, deterministic noise
Quote:
Can we say that the overfit measure indicates noise?
Re: doubt in lecture 11, deterministic noise
Quote:
The two effects push overfitting in different directions. Does this mean that if we fix the target and expand H, the more complex H is better able to tackle the deterministic noise, but the complexity of H in turn leads to worse overfitting?
Re: doubt in lecture 11, deterministic noise
Is there anything interesting to be said about the relationship between overfitting and the difference between and the truly best hypothesis in ? I think I've seen others say that deterministic noise does not depend on the size of the data set, but I am wondering whether this difference is what accounts for the overfitting caused by deterministic noise. For example, with a complex enough target function, a very large data set, and a simple model (high bias/deterministic noise), we are not really overfitting, as I understand it.
Edit: I've been assuming that the definitions of bias and variance take expectations over all data sets of a particular fixed size. I don't think this was explicitly stated, but I also don't think it makes sense otherwise. In homework #4, I computed a value for that was very far from the best possible hypothesis (lowest mean squared error), because was so low in that case.
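To make the point in the Edit concrete, here is a minimal sketch under an assumed setup (not necessarily the homework's exact one): target f(x) = sin(πx), x uniform on [-1, 1], hypothesis set h(x) = a*x fit by least squares, and the average hypothesis ḡ(x) = ā*x estimated by averaging the fitted slope over many data sets of a fixed size N. It compares ā with the slope 3/π of the best hypothesis in H (the one minimizing mean squared error over the whole input distribution). The helper names are hypothetical.

```python
import math
import random


def fit_slope(xs, ys):
    # Least squares for h(x) = a*x (a line through the origin).
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def average_slope(n_points, n_datasets=2000, seed=1):
    # Coefficient of g_bar(x) = a_bar * x, averaged over data sets of FIXED size N.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_datasets):
        xs = [rng.uniform(-1, 1) for _ in range(n_points)]
        total += fit_slope(xs, [math.sin(math.pi * x) for x in xs])
    return total / n_datasets


# Best hypothesis in H for x ~ Uniform[-1, 1]: a* = E[x sin(pi x)] / E[x^2] = 3 / pi.
BEST_A = 3 / math.pi

for n in (2, 10, 200):
    a_bar = average_slope(n)
    print(f"N={n:3d}: a_bar={a_bar:.3f}  best a={BEST_A:.3f}  gap={a_bar - BEST_A:+.3f}")
```

Under these assumptions ā at N = 2 lands well above 3/π, and the gap shrinks as N grows: ḡ need not coincide with the best hypothesis in H when N is small, even though the best hypothesis itself does not depend on N.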
Re: doubt in lecture 11, deterministic noise
I think I am still a bit unclear about deterministic noise. Doesn't the amount of noise (deterministic or stochastic) depend on both the bias and the variance? For a given N, the more complex hypothesis set will have a higher variance but a lower bias. Hence, doesn't the amount of noise depend on N and on the complexity gap between the target function and the hypothesis set (the level of deterministic noise, if you will)?
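For reference, the decomposition this question turns on, assuming a noisy target y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and expectations taken over data sets 𝒟 of a fixed size N:

```latex
\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}\big(g^{(\mathcal{D})}\big)\right]
  = \underbrace{\sigma^2}_{\text{stochastic noise}}
  + \underbrace{\mathbb{E}_x\left[\big(\bar{g}(x) - f(x)\big)^2\right]}_{\text{bias}}
  + \underbrace{\mathbb{E}_x\,\mathbb{E}_{\mathcal{D}}\left[\big(g^{(\mathcal{D})}(x) - \bar{g}(x)\big)^2\right]}_{\text{var}},
\qquad
\bar{g}(x) = \mathbb{E}_{\mathcal{D}}\left[g^{(\mathcal{D})}(x)\right]
```

σ² depends on neither N nor H. The bias term is the one associated with deterministic noise; deterministic noise proper is measured against the best hypothesis h* in H and so depends only on f and H, while ḡ (and therefore the bias term) can drift away from h* when N is small. The var term is where N enters most strongly.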

Re: doubt in lecture 11, deterministic noise
In general, a more complex H implies lower "deterministic noise", but it is important to take the amount of training data (N) into consideration when discussing Eout. In the example shown in lecture 11, the target function was very complex (50th order) and the training data was noiseless. A simple hypothesis (2nd-order polynomial) gave a much better Eout than the more complex hypothesis (10th-order polynomial). In this case there was only "deterministic noise", and the more complex hypothesis performed much worse even though its "deterministic noise" was lower.
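A rough, self-contained sketch of that kind of experiment. The settings are assumptions of mine, not the lecture's exact protocol: the target is a hypothetical random combination of Legendre polynomials up to order 50, scaled so E_x[f(x)^2] = 1 for x uniform on [-1, 1]; the data are noiseless; N = 15; g2 and g10 are least-squares fits in the Legendre basis of orders 2 and 10. It reports average Ein and Eout for both fits and the overfit measure Eout(g10) - Eout(g2). All function names are hypothetical.

```python
import random


def legendre(x, degree):
    # [L_0(x), ..., L_degree(x)] via Bonnet's recurrence:
    # (k + 1) L_{k+1}(x) = (2k + 1) x L_k(x) - k L_{k-1}(x).
    vals = [1.0, x]
    for k in range(1, degree):
        vals.append(((2 * k + 1) * x * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals[: degree + 1]


def predict(w, x):
    return sum(wi * li for wi, li in zip(w, legendre(x, len(w) - 1)))


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small square system a w = b.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit(xs, ys, degree):
    # Least squares via the normal equations (Z^T Z) w = Z^T y.
    z = [legendre(x, degree) for x in xs]
    d = degree + 1
    ztz = [[sum(row[i] * row[j] for row in z) for j in range(d)] for i in range(d)]
    zty = [sum(row[i] * y for row, y in zip(z, ys)) for i in range(d)]
    return solve(ztz, zty)


def make_target(order=50, seed=0):
    # Hypothetical complex target, scaled so E_x[f^2] = 1 using E[L_k^2] = 1 / (2k + 1).
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(order + 1)]
    scale = sum(ck * ck / (2 * k + 1) for k, ck in enumerate(c)) ** -0.5
    return [ck * scale for ck in c]


def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def experiment(n_train=15, trials=100, n_test=300, seed=1):
    rng = random.Random(seed)
    f = make_target()
    test_xs = [rng.uniform(-1, 1) for _ in range(n_test)]
    test_ys = [predict(f, x) for x in test_xs]
    totals = {"ein2": 0.0, "ein10": 0.0, "eout2": 0.0, "eout10": 0.0}
    for _ in range(trials):
        xs = [rng.uniform(-1, 1) for _ in range(n_train)]
        ys = [predict(f, x) for x in xs]  # noiseless labels
        for deg in (2, 10):
            w = fit(xs, ys, deg)
            totals[f"ein{deg}"] += mse(w, xs, ys) / trials
            totals[f"eout{deg}"] += mse(w, test_xs, test_ys) / trials
    return totals


r = experiment()
print({k: round(v, 3) for k, v in r.items()})
print("overfit measure Eout(g10) - Eout(g2):", round(r["eout10"] - r["eout2"], 3))
```

Because the order-2 set is nested inside the order-10 set, g10 always has Ein no larger than g2 on the same data; whether its Eout is also larger, as in the lecture's noiseless example, is what the printed overfit measure lets you check under these assumed settings.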

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.