LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 1 - The Learning Problem (http://book.caltech.edu/bookforum/forumdisplay.php?f=108)
-   -   Exercises and Problems (http://book.caltech.edu/bookforum/showthread.php?t=257)

 magdon 08-11-2012 07:15 PM

Re: Problem 1.10 : Expected Off Training Error

Hoeffding applies to a single hypothesis . To take a concrete case of the setting in the problem, suppose that is obtained by flipping a fair coin for every data point to obtain . The problem proves that your expected off training set error is 0.5; no surprise, since is random

Hoeffding tells you that your training error (if N is large enough) will not deviate far from Eout with very high probability. This means that with very high probability, Ein will be close to 0.5.

Problem 1.10 says that if you are in such a pathalogical learning situation with a random , then no matter what you do in-sample, your out-of-sample performance will be very bad. (This is sometimes called no-free-lunch, because you cannot expect to succeed unless you assume that f is not completely random).

Hoeffding (and VC) says that provided your hypothesis set is not too large compared to N, you will know that you are in a bad situation because your Ein will reflect Eout and be close to 0.5. The nature of your with respect to will determine how good your Eout; if is very bad, then Eout will be close to 0.5. Hoeffding role is to ensure that Ein will tell you that you are in this tough situation.

To summarize: problem 1.10 identifies one case in which you are in trouble and Eout will be 0.5. Hoeffeing tells you when you will be able to know that you are in trouble.

Quote:
 Originally Posted by vsthakur (Post 3952) Hi, If I got it right, in a noiseless setting, for a fixed D, if all f are equally likely, the expected off-training-error of any hypothesis h is 0.5 (part d of problem 1.10, page 37) and hence any two algorithms are the same in terms of expected off training error (part e of the same problem). My question is, does this not contradict the generalization by Hoeffding. Specifically, the following point is bothering me By Hoeffding : Ein approaches Eout for larger number of hypothesis (i.e for small epsilon) as N grows sufficiently large. Which would imply that expected(Eout) should be approximately the same as expected(Ein) and not a constant (0.5). Can you please provide some insight on this, perhaps my comparison is erroneous. Thanks.

 vsthakur 08-13-2012 03:09 AM

Re: Problem 1.10 : Expected Off Training Error

Thank you

 axelrv 09-02-2012 04:46 PM

Exercise 1.5

The exercise is to determine whether each problem is more suited for the learning or design approach.

Part a "Determining the age at which a particular medical test is recommended" seems like an information retrieval problem not a learning or design problem. Doctors have already decided when to recommend a test.

Perhaps it should read: "Determining the age at which a particular medical test should be recommended or performed" to make it a learning problem.

 magdon 09-03-2012 06:42 AM

Re: Exercise 1.5

Yes, the use of "is" makes it seem like the answer is known and just needs to be retrieved; "should be" would have been a better choice of words, thanks.

Quote:
 Originally Posted by axelrv (Post 4817) The exercise is to determine whether each problem is more suited for the learning or design approach. Part a "Determining the age at which a particular medical test is recommended" seems like an information retrieval problem not a learning or design problem. Doctors have already decided when to recommend a test. Perhaps it should read: "Determining the age at which a particular medical test should be recommended or performed" to make it a learning problem.

 cygnids 01-13-2013 10:14 AM

Re: Exercises and Problems

With respect to the discussion on Prob 1.10 thread here, it is noted that learning is not possible when the target function is a random function. On that note, what crosses my mind is how does one reconcile that statement with the thought that a purely random function could have more characterizable features after transformation to other domains? For eg., white noise maps to constant in spectral domain. Perhaps naively, if I had the a set of images with a reasonable proportion of fully noisy images, and choose to apply the ML apparatus in the frequency domain, I could learn a system to classify noisy vs not-noisy? Right?

It's very likely I'm picking on a point out of context here?

 cygnids 01-13-2013 10:40 AM

Re: Exercises and Problems

I suppose I'm interpreting "noise" rather loosely? When I said white noise maps to constant in the frequency domain, I was thinking Gaussian noise, and in which case, we have a case where the output can be characterized as a PDF, and hence not quite "noise" since it can be probabilistically characterized? :clueless:

 magdon 01-13-2013 02:15 PM

Re: Exercises and Problems

It is true that white noise in the time domain is uniform in the frequency domain. But I do not quite follow how that relates to the prediction task, which in your example can be formulated as follows:

A target function f(x) is white noise. You are given f(x_n) on some data points x_n and asked to learn how to predict f on a test data point x_test. Your best prediction is 0 and your error will be distributed according to a Gaussian, no matter what your training data were, which is by definition of white noise. This would be your prediction even before you saw any training data, so you have not improved your prediction (i.e. you have not been able to learn) from the training data.

What bounds like Hoeffding tell you is that you will find out from your training data that your error will be large.

The same happens in classification where the function is a random . The data does not help you to predict outside the data, but at least you will learn from your data that you cannot predict outside the data. It is better than learning nothing :).

Quote:
 Originally Posted by cygnids (Post 8612) With respect to the discussion on Prob 1.10 thread here, it is noted that learning is not possible when the target function is a random function. On that note, what crosses my mind is how does one reconcile that statement with the thought that a purely random function could have more characterizable features after transformation to other domains? For eg., white noise maps to constant in spectral domain. Perhaps naively, if I had the a set of images with a reasonable proportion of fully noisy images, and choose to apply the ML apparatus in the frequency domain, I could learn a system to classify noisy vs not-noisy? Right? In learning, it is important to distinguish between how well you can predict outside the data, and how well you think you can predict outside the data. We only have access to the latter and the first task of the theory is to ensure that we have not fooled ourselves, so that the latter matches the former. It's very likely I'm picking on a point out of context here?

 cygnids 01-16-2013 08:37 AM

Re: Exercises and Problems

Thank you for your response and explanation. Mea culpa. I was thinking a bit too generally, perhaps loosely too, when I should have thought through an example such as the one you note.

In passing, let me restate what I had in mind in my earlier post. To begin with, I shouldn't have used the word frequency, but rather, I should have said wavelength since I used images as my example. I thought, if I had perfectly noisy set of images as input, and my end goal was to classify noisy vs not-noisy images, I would have a case where I attempt to learn the problem after a 2d Fourier transform of the images. For sake of discussion, in the transformed space, we'd see a flatish surface corresponding to various (spatial wavelengths, ie equal energy in all wavelengths since it is a Fourier transformed Gaussian), and basically a constant/flat function (glossing over a lot of practical points such as tapering to keep it band-limited). I would use this smooth *characterization of noise* as an input to the learning algorithm, and eventually classify new images. Ie, if in the wavelength space the transformed image is flattish => it is "noisy" image, else => it is a "not-noisy" image.

In my mind, I started out with a target function which was completely random, ie noisy, and I used the Fourier transformed space to turn it into a near-constant (smooth) function which I could use for classification. And presumably, I one managed to learn using a perfectly noisy image set. This thinking prompted my question, which questioned why random functions can't be learned. Needless adding, I did not map my thinking properly enough; the target function for this binary classification isn't the 2d Fourier transform; and further, I also mapped the input space (ie image pixel location) onto a wavelength space.

I like your simple example. Thank you.

 magdon 01-16-2013 10:36 AM

Re: Exercises and Problems

Good example top clarify an important issue. In your example, the target function is not random. It assigns to a signal that is white noise a -1 and to a signal that is not white noise (say, containing a real image) a +1. A random target would assign random +1,-1 to the signal.

What you have described is what we call feature construction and you will study it later in the book. Specifically, given the input signal construct the fourrier transform and take (for example) the variance of this fourier signal. This variance is a good feature of the input for determining (small variance would indicate a noise signal). Good feature construction is a very important part of learning from data successfully.

Quote:
 Originally Posted by cygnids (Post 8722) Thank you for your response and explanation. Mea culpa. I was thinking a bit too generally, perhaps loosely too, when I should have thought through an example such as the one you note. In passing, let me restate what I had in mind in my earlier post. To begin with, I shouldn't have used the word frequency, but rather, I should have said wavelength since I used images as my example. I thought, if I had perfectly noisy set of images as input, and my end goal was to classify noisy vs not-noisy images, I would have a case where I attempt to learn the problem after a 2d Fourier transform of the images. For sake of discussion, in the transformed space, we'd see a flatish surface corresponding to various (spatial wavelengths, ie equal energy in all wavelengths since it is a Fourier transformed Gaussian), and basically a constant/flat function (glossing over a lot of practical points such as tapering to keep it band-limited). I would use this smooth *characterization of noise* as an input to the learning algorithm, and eventually classify new images. Ie, if in the wavelength space the transformed image is flattish => it is "noisy" image, else => it is a "not-noisy" image. In my mind, I started out with a target function which was completely random, ie noisy, and I used the Fourier transformed space to turn it into a near-constant (smooth) function which I could use for classification. And presumably, I one managed to learn using a perfectly noisy image set. This thinking prompted my question, which questioned why random functions can't be learned. Needless adding, I did not map my thinking properly enough; the target function for this binary classification isn't the 2d Fourier transform; and further, I also mapped the input space (ie image pixel location) onto a wavelength space. I like your simple example. Thank you.

 cygnids 01-17-2013 12:52 AM

Re: Exercises and Problems

Yes, I do see your point that in my example even though the input images may be completely noisy, the target function in my construct isn't random. Thank you for your explanations. I do appreciate it.

My apologies for distorting this thread by my inquiry!

All times are GMT -7. The time now is 06:02 AM.