
LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 4 (http://book.caltech.edu/bookforum/forumdisplay.php?f=133)
-   -   Q4) h(x) = ax (http://book.caltech.edu/bookforum/showthread.php?t=959)

itooam 08-07-2012 04:57 AM

Q4) h(x) = ax
 
This question is similar to the one in the lectures, i.e.,

in the lecture, H1 is

h(x) = ax + b

Is this question different from the lecture in that we shouldn't add "b" (i.e., X0, the bias/intercept term) when applying it? Or should I treat it the same?

My confusion is that in many papers a bias/intercept is assumed even if not specified, i.e., h(x) = ax could be considered the same as h(x) = ax + b.

yaser 08-07-2012 05:24 AM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by itooam (Post 3857)
This question is similar to the one in the lectures, i.e.,

in the lecture, H1 is

h(x) = ax + b

Is this question different from the lecture in that we shouldn't add "b" (i.e., X0, the bias/intercept term) when applying it? Or should I treat it the same?

My confusion is that in many papers a bias/intercept is assumed even if not specified, i.e., h(x) = ax could be considered the same as h(x) = ax + b.

There is no bias/intercept in this problem, only the slope (one parameter which is a).

itooam 08-07-2012 05:36 AM

Re: Q4) h(x) = ax
 
Thanks for the confirmation, much appreciated :)

geekoftheweek 01-31-2013 11:16 AM

Re: Q4) h(x) = ax
 
Is there a best way to minimize the mean-squared error? I am doing gradient descent with a very low learning rate (0.00001) and my solution is diverging, not converging. Is it not feasible to do gradient descent with two points when approximating a sine?
Thanks
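
For what it's worth, gradient descent does converge here, since the error is a one-dimensional quadratic in a. A minimal sketch, with arbitrary sample points and a learning rate of my own choosing (not from the course):

Code:

import numpy as np

# Two sample points from the target f(x) = sin(pi*x); the inputs are arbitrary example values
x = np.array([0.3, -0.8])
y = np.sin(np.pi * x)

# Gradient descent on E(a) = (a*x1 - y1)^2 + (a*x2 - y2)^2
a, eta = 0.0, 0.1
for _ in range(1000):
    grad = 2.0 * np.sum(x * (a * x - y))   # dE/da
    a -= eta * grad

print(a)                                    # converges to the analytic minimizer
print(np.dot(x, y) / np.dot(x, x))          # analytic minimizer, for comparison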

geekoftheweek 01-31-2013 12:09 PM

Re: Q4) h(x) = ax
 
Never mind, I got my solution to converge, though I do not trust my answer. Oh well.

sanbt 01-31-2013 04:34 PM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by geekoftheweek (Post 9088)
Never mind, I got my solution to converge, though I do not trust my answer. Oh well.

You can use linear regression to calculate each hypothesis
(since linear regression is essentially an analytical formula for minimizing the mean squared error).

Also, you can confirm whether your g_bar from the simulation makes sense by calculating it directly (take the expectation of the fitted hypothesis over all (x1, x2) in [-1,1] x [-1,1]). This involves two integrals, but you can plug the expression into Wolfram Alpha or Mathematica.
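
A rough sketch of both steps in Python (variable names and sample sizes are my own; the closed-form slope below is just the one-parameter least-squares fit discussed later in this thread):

Code:

import numpy as np

rng = np.random.default_rng(0)

def best_slope(x1, x2):
    # Least-squares slope of h(x) = a*x fit to (x1, sin(pi*x1)) and (x2, sin(pi*x2))
    y1, y2 = np.sin(np.pi * x1), np.sin(np.pi * x2)
    return (x1 * y1 + x2 * y2) / (x1 ** 2 + x2 ** 2)

# Monte Carlo estimate of g_bar's slope: average the fitted slope over many two-point datasets
slopes = [best_slope(*rng.uniform(-1, 1, size=2)) for _ in range(100000)]
print("simulated average slope:", np.mean(slopes))

# Direct check: average the same expression over a fine grid on [-1,1] x [-1,1]
# (an even number of grid points keeps (0,0), where the formula is undefined, off the grid)
g = np.linspace(-1, 1, 2000)
X1, X2 = np.meshgrid(g, g)
print("grid-average slope:    ", best_slope(X1, X2).mean())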

melipone 02-01-2013 07:49 AM

Re: Q4) h(x) = ax
 
I thought it would simply be (y1/x1 + y2/x2)/2 to find an a that minimizes the mean square error on two points, no?

Anne Paulson 02-01-2013 11:36 AM

Re: Q4) h(x) = ax
 
So, in this procedure we:

Pick two points;
Find the best slope a for those two points, the one that minimizes the squared error for those two points;
Do this N times and average all the as

Rather than:

Pick two points;
Calculate the squared error for those two points as a function of a;
Do this N times, then find the a that minimizes the sum of all of the squared errors, as we do with linear regression

Are we doing the first thing here or the second thing? Either way there's a simple analytic solution, but I'm not sure which procedure we're doing.

yaser 02-01-2013 12:19 PM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by Anne Paulson (Post 9109)
So, in this procedure we:

Pick two points;
Find the best slope a for those two points, the one that minimizes the squared error for those two points;
Do this N times and average all the as

Rather than:

Pick two points;
Calculate the squared error for those two points as a function of a;
Do this N times, then find the a that minimizes the sum of all of the squared errors, as we do with linear regression

Are we doing the first thing here or the second thing? Either way there's a simple analytic solution, but I'm not sure which procedure we're doing.

The first method estimates a for the average hypothesis {\bar g} (which takes into consideration only two points at a time). The second method estimates a for the best approximation of the target function (which takes into consideration all the points in the input space {\cal X} at once).
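
In symbols (restating the above in the lecture notation, with the \sin(\pi x) target of this problem and x drawn from [-1, 1]):

{\bar g}(x) = \mathbb{E}_{\cal D}\left[ g^{({\cal D})}(x) \right] \qquad \text{versus} \qquad a^{*} = \arg\min_{a} \mathbb{E}_{x}\left[ \left( a x - \sin(\pi x) \right)^{2} \right]

The first is what the simulation in this problem estimates; the second is the single best line through the origin.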

Anne Paulson 02-01-2013 12:28 PM

Re: Q4) h(x) = ax
 
OK, and then the average value of \bar{g} *is* the expected value of \bar{g}.

yaser 02-01-2013 12:33 PM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by Anne Paulson (Post 9114)
OK, and then the average value of \bar{g} *is* the expected value of \bar{g}.

To be technically accurate, the average value (which is also the expected value) of g is \bar{g}, which is shorthand for saying that the average value of g({\bf x}) is {\bar{g}}({\bf x}) for all {\bf x}\in{\cal X}.

Axonymous 02-02-2013 10:17 PM

Re: Q4) h(x) = ax
 
I calculated what I think is the best approximation by setting to zero the derivative, with respect to a, of the integral of the squared difference between the sine function and the line y = ax. When I compare this to the result of my simulation, there's a difference of about 30% between the two values for a.

I realize that it's reasonable to assume that \bar{g} won't be the best result (see minute 43 in lecture 8, comparing .20 to .21). But is anyone else getting a result that differs by so much?



gah44 02-03-2013 02:50 AM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by Axonymous (Post 9140)
I calculated what I think is the best approximation by setting to zero the derivative, with respect to a, of the integral of the squared difference between the sine function and the line y = ax. When I compare this to the result of my simulation, there's a difference of about 30% between the two values for a.


Which problem is that for?

As in the lecture and the book, you find the best fit for two points (least squares), and then average over all sets of two points (but not two of the same point). Then a in this case, or (a, b) in the book's case, is the average over all such pairs of points.

I might believe that is 30% different from the one you mention.

You could also minimize the integral of the square of sin(pi*x) - ax.

Axonymous 02-03-2013 02:53 PM

Re: Q4) h(x) = ax
 
I wanted confirmation of the result that I got using the method we are supposed to implement. So I derived the slope of the "best" line, shown to us in slide 11 of lecture 8. (Which also applies in our case because it goes through the origin.) I did this by minimizing the area in yellow on that slide. (You can actually see that slope is close to 1 from the slide.)

I was surprised that the answer I got for question 4 is so different from this "perfect" approximation line that was found by minimizing the integral. It stands to reason that it should vary a little, but there is quite a difference between the two values.

sanbt 02-03-2013 05:20 PM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by Axonymous (Post 9159)
I wanted confirmation of the result that I got using the method we are supposed to implement. So I derived the slope of the "best" line, shown to us in slide 11 of lecture 8. (Which also applies in our case because it goes through the origin.) I did this by minimizing the area in yellow on that slide. (You can actually see that slope is close to 1 from the slide.)

I was surprised that the answer I got for question 4 is so different from this "perfect" approximation line that was found by minimizing the integral. It stands to reason that it should vary a little, but there is quite a difference between the two values.

So the slope of the best line is just the slope of the line passing through the 2 points you picked each time (which implies that Ein = 0). But then you need a 2D integral to average that expression over [-1, 1] x [-1, 1]. The result should be close to the simulation.

Axonymous 02-04-2013 10:21 AM

Re: Q4) h(x) = ax
 
Go to wolframalpha.com and ask for:

"derivative of integral of (sin(pi*x)-(a*x))^2 from -1 to 1 with respect to a"

Set the result equal to 0 and solve for a. This gives you the line that is the "best" approximation. (I believe.)

It is not the answer to question 4. It is the result that our simulation method hopefully gets close to.

What's interesting to me is how far from this value for a our simulation result is.
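
For reference, carrying that calculation through by hand (a sketch; this should match what Wolfram Alpha returns):

\frac{d}{da}\int_{-1}^{1}\left(\sin(\pi x) - a x\right)^{2} dx = 2a\int_{-1}^{1}x^{2}\,dx - 2\int_{-1}^{1}x\sin(\pi x)\,dx = \frac{4a}{3} - \frac{4}{\pi}

which vanishes at a = 3/\pi \approx 0.955, the slope of that best-approximation line.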

gah44 02-04-2013 04:06 PM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by Axonymous (Post 9181)
Go to wolframalpha.com and ask for:

"derivative of integral of (sin(pi*x)-(a*x))^2 from -1 to 1 with respect to a"

(snip)


What's interesting to me is how far from this value for a our simulation result is.

Don't forget that 1-(-1) is 2.

sanbt 02-04-2013 08:25 PM

Re: Q4) h(x) = ax
 
I apologize for the mistake in my previous post. The slope of the best line is the expression for a that minimizes Ein (which is not 0 in this case). Then you can do a 2D integral to find the expectation of that.

Moobb 04-28-2013 02:58 AM

Re: Q4) h(x) = ax
 
I am lost at this. The procedure I follow is as described above, but the answer doesn't seem right, and from what I've read elsewhere it's a common mistake on this question. Any hints on what I might be missing? I am picking two points, getting the best hypothesis by minimising the squared error, repeating this a number of times, and assuming the answer is the average value over these runs.

yaser 04-28-2013 03:58 AM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by Moobb (Post 10616)
I am picking two points, getting the best hypothesis by minimising the squared error, repeating this a number of times, and assuming the answer is the average value over these runs.

You are correct. This is the right procedure, where the answer you mention is {\bar g}(x) and the values you are averaging are g^{({\cal D})}(x).

Moobb 04-28-2013 04:13 PM

Re: Q4) h(x) = ax
 
Spent a good hour breaking my head trying to figure out where I was going wrong, finally decided to submit my best shot and can finally see what was going on!! :eek:
Thanks for the answer, somehow I was thick enough not to see what was right in front of me..

vsuthichai 04-29-2013 02:10 AM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by Anne Paulson (Post 9109)
So, in this procedure we:

Pick two points;
Find the best slope a for those two points, the one that minimizes the squared error for those two points;
Do this N times and average all the as

Rather than:

Pick two points;
Calculate the squared error for those two points as a function of a;
Do this N times, then find the a that minimizes the sum of all of the squared errors, as we do with linear regression

Are we doing the first thing here or the second thing? Either way there's a simple analytic solution, but I'm not sure which procedure we're doing.

How do you solve for the a that minimizes the squared error?

Ziad Hatahet 04-29-2013 02:37 AM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by vsuthichai (Post 10637)
How do you solve for the a that minimizes the squared error?

As far as I know, differentiate (ax_1-y_1)^2+(ax_2-y_2)^2 with respect to a, set it to 0, and solve for a.
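
Carrying that through (a quick sketch of the algebra):

\frac{d}{da}\left[ (a x_1 - y_1)^{2} + (a x_2 - y_2)^{2} \right] = 2 x_1 (a x_1 - y_1) + 2 x_2 (a x_2 - y_2) = 0 \quad\Longrightarrow\quad a = \frac{x_1 y_1 + x_2 y_2}{x_1^{2} + x_2^{2}}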

vsuthichai 04-29-2013 02:59 AM

Re: Q4) h(x) = ax
 
Thanks, I just figured that out.

jlevy 04-30-2013 03:55 AM

Re: Q4) h(x) = ax
 
Quote:

Originally Posted by geekoftheweek (Post 9081)
Is there a best way to minimize the mean-squared error?
Thanks

Just use w = inv(X'*X)*X'*Y
and note that X has just a single column (no column of 1's).
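
A minimal NumPy sketch of that formula (example values are my own; with a single column it reduces to sum(x*y) / sum(x*x)):

Code:

import numpy as np

x = np.array([0.3, -0.8])          # the two sampled inputs (arbitrary example values)
y = np.sin(np.pi * x)              # their target values

X = x.reshape(-1, 1)               # design matrix with a single column, no column of 1's
a = np.linalg.solve(X.T @ X, X.T @ y)[0]    # w = inv(X'X) X'y

print(a, np.sum(x * y) / np.sum(x * x))     # same value both ways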

khohi 03-04-2016 07:13 AM

Re: Q4) h(x) = ax
 
Great :)


vnator 03-13-2019 12:08 AM

Re: Q4) h(x) = ax
 
Without using any calculus, I figured that the best approximation for g bar(x) would be 0x, seeing as how the average of the slopes between all combinations of two random points would end up as 0. In that case, why is the answer to the problem [e], none of the above?

