Quote:
Originally Posted by Tobias
Hi there.
I have some understanding of how to find g-bar(x). After many attempts, I arrived at the following solution, but I am far from sure it is valid.
g-bar(x) must be the h(x)=ax that minimizes the expected squared error over all points, i.e. the expected value of $(ax - f(x))^2$, where $f$ is the target. Since x is uniformly distributed, this is the same as minimizing $\int (ax - f(x))^2 \, dx$ over the input range, which yields $a = 3/\pi \approx 0.955$.
To this I have a few questions:
- Am I correct?
- Does g-bar depend on the size of the sample?
- Is there a general approach to find g-bar?
Close. What you have calculated is the best approximation of the target using the model, but it relies on knowing the entire target function. If you assume you know only two points at a time (the data set given in the example), then you should fit a line to the two points and average the resulting lines as the two points vary. You will get something close, but not identical, to the slope you got.
This answers your second question in the affirmative as well. Doing this exercise with two points at a time is not the same as with three points at a time, so $\bar{g}$ does depend on the size of the training set in general.
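For comparison, the slope in the quoted post (the best approximation when the whole target is known) can be checked in closed form. The thread does not restate the target, so the sketch below assumes $f(x) = \sin(\pi x)$ with $x$ uniform on $[-1, 1]$. That choice is an assumption, picked because it reproduces the quoted value $3/\pi$.

```latex
% Assumption: f(x) = \sin(\pi x), x uniform on [-1, 1] (not stated in the thread).
\begin{align*}
E(a) &= \int_{-1}^{1} \bigl(ax - \sin(\pi x)\bigr)^2 \, dx \\
\frac{dE}{da} &= 2 \int_{-1}^{1} x \bigl(ax - \sin(\pi x)\bigr) \, dx = 0
  \;\Longrightarrow\;
  a \int_{-1}^{1} x^2 \, dx = \int_{-1}^{1} x \sin(\pi x) \, dx \\
\int_{-1}^{1} x^2 \, dx &= \frac{2}{3}, \qquad
\int_{-1}^{1} x \sin(\pi x) \, dx = \frac{2}{\pi} \\
a &= \frac{2/\pi}{2/3} = \frac{3}{\pi} \approx 0.955
\end{align*}
```

The constant density $1/2$ of the uniform distribution scales $E(a)$ but does not move its minimizer, so it is omitted here.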
The general approach to finding $\bar{g}$ is to follow the definition exactly. In integral form, it will be a double integral if the data set has two points, a triple integral if it has three points, and so on, but in general it is done with Monte Carlo, so no actual integration is needed.
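The Monte Carlo procedure described above can be sketched as follows. This is a minimal illustration, not the book's code. It assumes the target is sin(pi x) with x uniform on [-1, 1], which the thread does not state, and it assumes each data set is fitted by least squares through the origin, since a line h(x)=ax cannot pass through two arbitrary points. The names `target` and `average_slope` are hypothetical.

```python
import numpy as np


def target(x):
    # Assumed target function; the thread does not restate it.
    return np.sin(np.pi * x)


def average_slope(n_points, n_datasets=100_000, seed=0):
    """Estimate the slope of g-bar for the model h(x) = a*x.

    Draws n_datasets independent data sets of n_points each, fits each one
    by least squares through the origin, and averages the fitted slopes.
    """
    rng = np.random.default_rng(seed)
    # One row per data set, one column per point.
    x = rng.uniform(-1.0, 1.0, size=(n_datasets, n_points))
    y = target(x)
    # Least-squares slope through the origin: a = sum(x*y) / sum(x^2).
    slopes = (x * y).sum(axis=1) / (x * x).sum(axis=1)
    return slopes.mean()


a_bar_2 = average_slope(2)  # data sets of two points
a_bar_3 = average_slope(3)  # data sets of three points
print(f"average slope, 2 points: {a_bar_2:.3f}")
print(f"average slope, 3 points: {a_bar_3:.3f}")
```

Running it with two and then three points per data set gives different averages, which illustrates the dependence on training set size without any explicit integration.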