LFD Book Forum  

LFD Book Forum > Book Feedback - Learning From Data > Chapter 1 - The Learning Problem

  #11  
08-11-2012, 07:15 PM
magdon
RPI
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 592
Re: Problem 1.10 : Expected Off Training Error

Hoeffding applies to a single hypothesis h. To take a concrete case of the setting in the problem, suppose that f is obtained by flipping a fair coin for every data point to obtain \pm 1. The problem shows that your expected off-training-set error is 0.5; no surprise, since f is random.

Hoeffding tells you that your training error (if N is large enough) will not deviate far from Eout with very high probability. This means that with very high probability, Ein will be close to 0.5.

Problem 1.10 says that if you are in such a pathological learning situation with a random f, then no matter what you do in-sample, your out-of-sample performance will be very bad. (This is sometimes called no-free-lunch, because you cannot expect to succeed unless you assume that f is not completely random.)

Hoeffding (and VC) says that provided your hypothesis set is not too large compared to N, you will know that you are in a bad situation, because your Ein will reflect Eout and be close to 0.5. The nature of your \cal H with respect to f will determine how good your Eout is; if f is very bad, then Eout will be close to 0.5. Hoeffding's role is to ensure that Ein will tell you that you are in this tough situation.
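(For reference, the Hoeffding bound being invoked here, for a single hypothesis h, is

\[
\mathbb{P}\big[\,|E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon\,\big] \le 2e^{-2\epsilon^2 N},
\]

and for a finite hypothesis set of M hypotheses the union bound replaces the right-hand side by 2M e^{-2\epsilon^2 N}, which is small only when M is not too large compared to N.)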

To summarize: Problem 1.10 identifies one case in which you are in trouble and Eout will be 0.5. Hoeffding tells you when you will be able to know that you are in trouble.
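As a rough illustration (my own sketch in Python, not part of the original post; the majority-vote "learning rule" below is just a placeholder for any algorithm), here is a small simulation of this setting: the target is a fair coin flip on every point of a finite input space, and both Ein and the off-training-set error come out close to 0.5, as described above.

[CODE]
# Sketch, not from the book: a target that is a fair coin flip on every point.
# For each point x in a finite input space, f(x) is an independent +/-1 coin flip.
# Whatever hypothesis the training data suggest, the off-training-set error
# averages 0.5, and for large N the in-sample error Ein is also close to 0.5,
# which is what Hoeffding guarantees it should reveal.
import numpy as np

rng = np.random.default_rng(0)
n_inputs, N, trials = 2000, 500, 200     # |X|, training-set size, repetitions

in_errors, off_errors = [], []
for _ in range(trials):
    f = rng.choice([-1, 1], size=n_inputs)            # random target values
    train = rng.choice(n_inputs, size=N, replace=False)
    test = np.setdiff1d(np.arange(n_inputs), train)   # off-training-set points
    # stand-in "learning algorithm": the constant hypothesis given by the
    # majority label on the training set (any other rule gives the same picture)
    h = 1 if f[train].sum() >= 0 else -1
    in_errors.append(np.mean(f[train] != h))
    off_errors.append(np.mean(f[test] != h))

print("average Ein        :", round(float(np.mean(in_errors)), 3))   # ~0.48
print("average off-set err:", round(float(np.mean(off_errors)), 3))  # ~0.50
[/CODE]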


Quote:
Originally Posted by vsthakur
Hi,
If I got it right, in a noiseless setting, for a fixed D, if all f are equally likely, the expected off-training-set error of any hypothesis h is 0.5 (part (d) of Problem 1.10, page 37), and hence any two algorithms are the same in terms of expected off-training-set error (part (e) of the same problem).

My question is: does this not contradict the generalization by Hoeffding? Specifically, the following point is bothering me.

By Hoeffding: Ein approaches Eout for a larger number of hypotheses (i.e., for small epsilon) as N grows sufficiently large. This would imply that expected(Eout) should be approximately the same as expected(Ein), and not a constant (0.5).

Can you please provide some insight on this; perhaps my comparison is erroneous.

Thanks.
__________________
Have faith in probability
  #12  
08-13-2012, 03:09 AM
vsthakur
Member
Join Date: Jun 2012
Posts: 14
Re: Problem 1.10 : Expected Off Training Error

Thank you
  #13  
09-02-2012, 04:46 PM
axelrv
Junior Member
Join Date: Sep 2012
Posts: 6
Exercise 1.5

The exercise is to determine whether each problem is more suited for the learning or design approach.

Part a "Determining the age at which a particular medical test is recommended" seems like an information retrieval problem not a learning or design problem. Doctors have already decided when to recommend a test.

Perhaps it should read: "Determining the age at which a particular medical test should be recommended or performed" to make it a learning problem.
  #14  
09-03-2012, 06:42 AM
magdon
RPI
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 592
Re: Exercise 1.5

Yes, the use of "is" makes it seem like the answer is known and just needs to be retrieved; "should be" would have been a better choice of words, thanks.

Quote:
Originally Posted by axelrv
The exercise is to determine whether each problem is more suited for the learning or design approach.

Part a "Determining the age at which a particular medical test is recommended" seems like an information retrieval problem not a learning or design problem. Doctors have already decided when to recommend a test.

Perhaps it should read: "Determining the age at which a particular medical test should be recommended or performed" to make it a learning problem.
__________________
Have faith in probability
  #15  
01-13-2013, 10:14 AM
cygnids
Member
Join Date: Jan 2013
Posts: 11
Re: Exercises and Problems

With respect to the discussion on the Problem 1.10 thread here, it is noted that learning is not possible when the target function is a random function. On that note, what crosses my mind is how one reconciles that statement with the thought that a purely random function could have more characterizable features after transformation to other domains. For example, white noise maps to a constant in the spectral domain. Perhaps naively: if I had a set of images with a reasonable proportion of fully noisy images, and chose to apply the ML apparatus in the frequency domain, I could learn a system to classify noisy vs. not-noisy, right?

It's very likely I'm picking on a point out of context here?
__________________
The whole is simpler than the sum of its parts. - Gibbs
  #16  
01-13-2013, 10:40 AM
cygnids
Member
Join Date: Jan 2013
Posts: 11
Re: Exercises and Problems

I suppose I'm interpreting "noise" rather loosely. When I said white noise maps to a constant in the frequency domain, I was thinking of Gaussian noise, in which case the output can be characterized by a PDF, and hence is not quite "noise" since it can be probabilistically characterized?
__________________
The whole is simpler than the sum of its parts. - Gibbs
  #17  
01-13-2013, 02:15 PM
magdon
RPI
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 592
Re: Exercises and Problems

It is true that white noise in the time domain is uniform in the frequency domain. But I do not quite follow how that relates to the prediction task, which in your example can be formulated as follows:

A target function f(x) is white noise. You are given f(x_n) on some data points x_n and asked to learn how to predict f on a test data point x_test. Your best prediction is 0, and your error will be distributed according to a Gaussian no matter what your training data were; that is the definition of white noise. This would be your prediction even before you saw any training data, so you have not improved your prediction (i.e., you have not been able to learn) from the training data.

What bounds like Hoeffding tell you is that you will find out from your training data that your error will be large.

The same happens in classification, where the function is a random \pm 1. The data does not help you to predict outside the data, but at least you will learn from your data that you cannot predict outside the data. That is better than learning nothing.
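A minimal sketch of this (my own illustration, not from the post; the nearest-neighbour rule is just one arbitrary data-driven predictor chosen for contrast): with a white-noise target, predicting 0 without looking at the data already achieves the best possible squared error, and the training data cannot improve on it.

[CODE]
# Sketch, not from the post: the white-noise regression target described above.
# f(x) is i.i.d. Gaussian noise, independent of x.  Predicting 0 without even
# looking at the data is already optimal; a data-driven rule (here a
# nearest-neighbour predictor, chosen arbitrarily for contrast) does no better.
import numpy as np

rng = np.random.default_rng(1)
N, N_test, trials = 200, 200, 500
mse_zero, mse_nn = [], []

for _ in range(trials):
    x_train = rng.uniform(0, 1, N)
    y_train = rng.normal(0, 1, N)        # white-noise "target" values
    x_test = rng.uniform(0, 1, N_test)
    y_test = rng.normal(0, 1, N_test)    # fresh noise on the test points
    # nearest-neighbour prediction: copy the training value closest in x
    nn_idx = np.abs(x_test[:, None] - x_train[None, :]).argmin(axis=1)
    y_nn = y_train[nn_idx]
    mse_zero.append(np.mean(y_test ** 2))           # always predict 0
    mse_nn.append(np.mean((y_test - y_nn) ** 2))

print("MSE, predict 0        :", round(float(np.mean(mse_zero)), 2))  # ~1.0, the noise variance
print("MSE, nearest neighbour:", round(float(np.mean(mse_nn)), 2))    # ~2.0, the data does not help
[/CODE]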

Quote:
Originally Posted by cygnids
With respect to the discussion on the Problem 1.10 thread here, it is noted that learning is not possible when the target function is a random function. On that note, what crosses my mind is how one reconciles that statement with the thought that a purely random function could have more characterizable features after transformation to other domains. For example, white noise maps to a constant in the spectral domain. Perhaps naively: if I had a set of images with a reasonable proportion of fully noisy images, and chose to apply the ML apparatus in the frequency domain, I could learn a system to classify noisy vs. not-noisy, right?

In learning, it is important to distinguish between how well you can predict outside the data, and how well you think you can predict outside the data. We only have access to the latter and the first task of the theory is to ensure that we have not fooled ourselves, so that the latter matches the former.

It's very likely I'm picking on a point out of context here?
__________________
Have faith in probability
  #18  
01-16-2013, 08:37 AM
cygnids
Member
Join Date: Jan 2013
Posts: 11
Re: Exercises and Problems

Thank you for your response and explanation. Mea culpa. I was thinking a bit too generally, perhaps loosely too, when I should have thought through an example such as the one you note.

In passing, let me restate what I had in mind in my earlier post. To begin with, I shouldn't have used the word frequency; rather, I should have said wavelength, since I used images as my example. I thought: if I had a perfectly noisy set of images as input, and my end goal was to classify noisy vs. not-noisy images, I would have a case where I attempt to learn the problem after a 2D Fourier transform of the images. For the sake of discussion, in the transformed space we'd see a flattish surface across the various spatial wavelengths (i.e., equal energy at all wavelengths, since it is a Fourier-transformed Gaussian), basically a constant/flat function (glossing over a lot of practical points such as tapering to keep it band-limited). I would use this smooth *characterization of noise* as an input to the learning algorithm, and eventually classify new images. I.e., if in the wavelength space the transformed image is flattish => it is a "noisy" image, else => it is a "not-noisy" image.

In my mind, I started out with a target function that was completely random, i.e. noisy, and I used the Fourier-transformed space to turn it into a near-constant (smooth) function which I could use for classification. And presumably, one managed to learn using a perfectly noisy image set. This thinking prompted my question, which asked why random functions can't be learned. Needless to add, I did not map my thinking properly enough: the target function for this binary classification isn't the 2D Fourier transform; and further, I also mapped the input space (i.e. image pixel location) onto a wavelength space.

I like your simple example. Thank you.
__________________
The whole is simpler than the sum of its parts. - Gibbs
  #19  
01-16-2013, 10:36 AM
magdon
RPI
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 592
Re: Exercises and Problems

Good example to clarify an important issue. In your example, the target function is not random. It assigns -1 to a signal that is white noise and +1 to a signal that is not white noise (say, one containing a real image). A random target would assign a random \pm 1 to the signal.

What you have described is what we call feature construction, and you will study it later in the book. Specifically, given the input signal, construct the Fourier transform and take (for example) the variance of this Fourier signal. This variance is a good feature of the input for determining \pm 1 (small variance would indicate a noise signal). Good feature construction is a very important part of learning from data successfully.
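A minimal sketch of this feature-construction idea (my own illustration; the particular feature, the normalisation, and the synthetic "structured" images below are assumptions, not a recipe from the book): compute the 2D FFT of each image and use the variance of its magnitude spectrum as a single feature. Noise images give a flat spectrum and hence a small value; structured images concentrate their energy in a few frequencies and give a large value, so a simple threshold separates the two.

[CODE]
# Sketch of the feature-construction idea; the specific feature (variance of the
# normalised 2-D FFT magnitude spectrum) and the synthetic images are my own
# assumptions for illustration, not a recipe from the book.
import numpy as np

rng = np.random.default_rng(2)

def spectrum_variance(img):
    """Variance of the 2-D FFT magnitude spectrum, normalised to unit mean.
    A flat spectrum (white noise) gives a small value; an image whose energy
    is concentrated in a few frequencies gives a large value."""
    mag = np.abs(np.fft.fft2(img))
    mag = mag / mag.mean()
    return mag.var()

def noise_image(n=64):
    return rng.normal(0, 1, (n, n))                     # pure Gaussian noise

def structured_image(n=64):
    x = np.arange(n)
    pattern = np.sin(2 * np.pi * x / n)                 # low-frequency stripes
    return pattern[None, :] + 0.1 * rng.normal(0, 1, (n, n))

noise_feats = [spectrum_variance(noise_image()) for _ in range(100)]
image_feats = [spectrum_variance(structured_image()) for _ in range(100)]
print("noise images     : %.2f +/- %.2f" % (np.mean(noise_feats), np.std(noise_feats)))
print("structured images: %.2f +/- %.2f" % (np.mean(image_feats), np.std(image_feats)))
# The two groups of feature values separate cleanly, so a simple threshold on
# this single feature classifies noisy vs. not-noisy images.
[/CODE]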



Quote:
Originally Posted by cygnids
Thank you for your response and explanation. Mea culpa. I was thinking a bit too generally, perhaps loosely too, when I should have thought through an example such as the one you note.

In passing, let me restate what I had in mind in my earlier post. To begin with, I shouldn't have used the word frequency; rather, I should have said wavelength, since I used images as my example. I thought: if I had a perfectly noisy set of images as input, and my end goal was to classify noisy vs. not-noisy images, I would have a case where I attempt to learn the problem after a 2D Fourier transform of the images. For the sake of discussion, in the transformed space we'd see a flattish surface across the various spatial wavelengths (i.e., equal energy at all wavelengths, since it is a Fourier-transformed Gaussian), basically a constant/flat function (glossing over a lot of practical points such as tapering to keep it band-limited). I would use this smooth *characterization of noise* as an input to the learning algorithm, and eventually classify new images. I.e., if in the wavelength space the transformed image is flattish => it is a "noisy" image, else => it is a "not-noisy" image.

In my mind, I started out with a target function that was completely random, i.e. noisy, and I used the Fourier-transformed space to turn it into a near-constant (smooth) function which I could use for classification. And presumably, one managed to learn using a perfectly noisy image set. This thinking prompted my question, which asked why random functions can't be learned. Needless to add, I did not map my thinking properly enough: the target function for this binary classification isn't the 2D Fourier transform; and further, I also mapped the input space (i.e. image pixel location) onto a wavelength space.

I like your simple example. Thank you.
__________________
Have faith in probability
  #20  
01-17-2013, 12:52 AM
cygnids
Member
Join Date: Jan 2013
Posts: 11
Re: Exercises and Problems

Yes, I do see your point that in my example even though the input images may be completely noisy, the target function in my construct isn't random. Thank you for your explanations. I do appreciate it.

My apologies for distorting this thread by my inquiry!
__________________
The whole is simpler than the sum of its parts. - Gibbs

Tags
hoeffding's inequality, hoeffding-inequality
