#11




Re: Problem 1.10 : Expected Off Training Error
Hoeffding applies to a single hypothesis . To take a concrete case of the setting in the problem, suppose that is obtained by flipping a fair coin for every data point to obtain . The problem proves that your expected off training set error is 0.5; no surprise, since is random
Hoeffding tells you that your training error (if N is large enough) will not deviate far from Eout with very high probability. This means that with very high probability, Ein will be close to 0.5. Problem 1.10 says that if you are in such a pathalogical learning situation with a random , then no matter what you do insample, your outofsample performance will be very bad. (This is sometimes called nofreelunch, because you cannot expect to succeed unless you assume that f is not completely random). Hoeffding (and VC) says that provided your hypothesis set is not too large compared to N, you will know that you are in a bad situation because your Ein will reflect Eout and be close to 0.5. The nature of your with respect to will determine how good your Eout; if is very bad, then Eout will be close to 0.5. Hoeffding role is to ensure that Ein will tell you that you are in this tough situation. To summarize: problem 1.10 identifies one case in which you are in trouble and Eout will be 0.5. Hoeffeing tells you when you will be able to know that you are in trouble. Quote:
__________________
Have faith in probability 
#12




Re: Problem 1.10 : Expected Off Training Error
Thank you

#13




Exercise 1.5
The exercise is to determine whether each problem is more suited for the learning or design approach.
Part a "Determining the age at which a particular medical test is recommended" seems like an information retrieval problem not a learning or design problem. Doctors have already decided when to recommend a test. Perhaps it should read: "Determining the age at which a particular medical test should be recommended or performed" to make it a learning problem. 
#14




Re: Exercise 1.5
Yes, the use of "is" makes it seem like the answer is known and just needs to be retrieved; "should be" would have been a better choice of words, thanks.
Quote:
__________________
Have faith in probability 
#15




Re: Exercises and Problems
With respect to the discussion on Prob 1.10 thread here, it is noted that learning is not possible when the target function is a random function. On that note, what crosses my mind is how does one reconcile that statement with the thought that a purely random function could have more characterizable features after transformation to other domains? For eg., white noise maps to constant in spectral domain. Perhaps naively, if I had the a set of images with a reasonable proportion of fully noisy images, and choose to apply the ML apparatus in the frequency domain, I could learn a system to classify noisy vs notnoisy? Right?
It's very likely I'm picking on a point out of context here?
__________________
The whole is simpler than the sum of its parts.  Gibbs 
#16




Re: Exercises and Problems
I suppose I'm interpreting "noise" rather loosely? When I said white noise maps to constant in the frequency domain, I was thinking Gaussian noise, and in which case, we have a case where the output can be characterized as a PDF, and hence not quite "noise" since it can be probabilistically characterized?
__________________
The whole is simpler than the sum of its parts.  Gibbs 
#17




Re: Exercises and Problems
It is true that white noise in the time domain is uniform in the frequency domain. But I do not quite follow how that relates to the prediction task, which in your example can be formulated as follows:
A target function f(x) is white noise. You are given f(x_n) on some data points x_n and asked to learn how to predict f on a test data point x_test. Your best prediction is 0 and your error will be distributed according to a Gaussian, no matter what your training data were, which is by definition of white noise. This would be your prediction even before you saw any training data, so you have not improved your prediction (i.e. you have not been able to learn) from the training data. What bounds like Hoeffding tell you is that you will find out from your training data that your error will be large. The same happens in classification where the function is a random . The data does not help you to predict outside the data, but at least you will learn from your data that you cannot predict outside the data. It is better than learning nothing . Quote:
__________________
Have faith in probability 
#18




Re: Exercises and Problems
Thank you for your response and explanation. Mea culpa. I was thinking a bit too generally, perhaps loosely too, when I should have thought through an example such as the one you note.
In passing, let me restate what I had in mind in my earlier post. To begin with, I shouldn't have used the word frequency, but rather, I should have said wavelength since I used images as my example. I thought, if I had perfectly noisy set of images as input, and my end goal was to classify noisy vs notnoisy images, I would have a case where I attempt to learn the problem after a 2d Fourier transform of the images. For sake of discussion, in the transformed space, we'd see a flatish surface corresponding to various (spatial wavelengths, ie equal energy in all wavelengths since it is a Fourier transformed Gaussian), and basically a constant/flat function (glossing over a lot of practical points such as tapering to keep it bandlimited). I would use this smooth *characterization of noise* as an input to the learning algorithm, and eventually classify new images. Ie, if in the wavelength space the transformed image is flattish => it is "noisy" image, else => it is a "notnoisy" image. In my mind, I started out with a target function which was completely random, ie noisy, and I used the Fourier transformed space to turn it into a nearconstant (smooth) function which I could use for classification. And presumably, I one managed to learn using a perfectly noisy image set. This thinking prompted my question, which questioned why random functions can't be learned. Needless adding, I did not map my thinking properly enough; the target function for this binary classification isn't the 2d Fourier transform; and further, I also mapped the input space (ie image pixel location) onto a wavelength space. I like your simple example. Thank you.
__________________
The whole is simpler than the sum of its parts.  Gibbs 
#19




Re: Exercises and Problems
Good example top clarify an important issue. In your example, the target function is not random. It assigns to a signal that is white noise a 1 and to a signal that is not white noise (say, containing a real image) a +1. A random target would assign random +1,1 to the signal.
What you have described is what we call feature construction and you will study it later in the book. Specifically, given the input signal construct the fourrier transform and take (for example) the variance of this fourier signal. This variance is a good feature of the input for determining (small variance would indicate a noise signal). Good feature construction is a very important part of learning from data successfully. Quote:
__________________
Have faith in probability 
#20




Re: Exercises and Problems
Yes, I do see your point that in my example even though the input images may be completely noisy, the target function in my construct isn't random. Thank you for your explanations. I do appreciate it.
My apologies for distorting this thread by my inquiry!
__________________
The whole is simpler than the sum of its parts.  Gibbs 
Tags 
hoeffding's inequality, hoeffdinginequality 
Thread Tools  
Display Modes  

