Q4) h(x) = ax
This question is similar to the one in the lectures, where H1 is h(x) = ax + b.
Does this question differ from the lecture in that we shouldn't add "b" (i.e., x0, the bias/intercept) when applying it, or should I treat it the same way? My confusion arises because many papers assume a bias/intercept even when it is not specified, so h(x) = ax could be read as equivalent to h(x) = ax + b.
Re: Q4) h(x) = ax
Thanks for the confirmation, much appreciated :)

Re: Q4) h(x) = ax
Is there a best way to minimize the mean squared error? I am doing gradient descent with a very low learning rate (0.00001) and my solution is diverging, not converging. Is it not feasible to do gradient descent with two points when approximating a sine?
Thanks 
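For what it's worth, here is a minimal sketch of gradient descent on two points, assuming the hypothesis h(x) = ax and the target sin(pi*x) on [-1, 1]. The sample points, step size and function names are illustrative assumptions, not taken from the thread. For this one-parameter problem the mean squared error is a parabola in a, so a small positive learning rate should reduce the error at every step; divergence at 0.00001 may point to a sign error in the update, though that is only a guess.

```python
import math

def closed_form_slope(xs, ys):
    """Least-squares slope for h(x) = a*x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def gradient_descent_slope(xs, ys, eta=0.1, steps=1000):
    """Minimize the mean squared error of h(x) = a*x by gradient descent."""
    a = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/da of (1/n) * sum (a*x - y)^2  =  (2/n) * sum x * (a*x - y)
        grad = (2.0 / n) * sum(x * (a * x - y) for x, y in zip(xs, ys))
        a -= eta * grad  # step against the gradient; "+=" here would diverge
    return a

xs = [0.3, -0.8]                           # hypothetical sample points
ys = [math.sin(math.pi * x) for x in xs]   # target values
a_gd = gradient_descent_slope(xs, ys)
a_exact = closed_form_slope(xs, ys)
print(a_gd, a_exact)
```

The two printed values should agree closely, which suggests gradient descent is feasible here but unnecessary, since the closed form is available.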
Re: Q4) h(x) = ax
Never mind, I got my solution to converge, though I do not trust my answer. Oh well.

Re: Q4) h(x) = ax
Quote:
(since linear regression is basically an analytical formula for minimizing the mean squared error). Also, you can check whether the g_bar from your simulation makes sense by calculating it directly (calculate the expectation of the hypothesis from each (x1, x2) over [-1,1] x [-1,1]). This involves two integrals, but you can plug the expression into Wolfram Alpha or Mathematica.
Re: Q4) h(x) = ax
I thought it would simply be (y1/x1 + y2/x2)/2 to find an a that minimizes the mean square error on two points, no?
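A small sketch comparing the guess above with the least-squares slope, assuming the target sin(pi*x) and two arbitrarily chosen sample points (both assumptions for illustration). Setting the derivative of (a*x1 - y1)^2 + (a*x2 - y2)^2 with respect to a to zero gives a = (x1*y1 + x2*y2) / (x1^2 + x2^2), which is an average of the ratios y/x weighted by x^2 rather than a plain average.

```python
import math

def ratio_average(x1, y1, x2, y2):
    """The guess from the post: plain average of the two point-wise slopes."""
    return (y1 / x1 + y2 / x2) / 2

def least_squares_slope(x1, y1, x2, y2):
    """Slope a that minimizes (a*x1 - y1)^2 + (a*x2 - y2)^2."""
    return (x1 * y1 + x2 * y2) / (x1 * x1 + x2 * x2)

def squared_error(a, x1, y1, x2, y2):
    """Sum of squared errors of h(x) = a*x on the two points."""
    return (a * x1 - y1) ** 2 + (a * x2 - y2) ** 2

x1, x2 = 0.2, 0.9                      # hypothetical sample points
y1, y2 = math.sin(math.pi * x1), math.sin(math.pi * x2)

a_ratio = ratio_average(x1, y1, x2, y2)
a_ls = least_squares_slope(x1, y1, x2, y2)
err_ratio = squared_error(a_ratio, x1, y1, x2, y2)
err_ls = squared_error(a_ls, x1, y1, x2, y2)
print(a_ratio, err_ratio)
print(a_ls, err_ls)
```

For these points the two slopes differ, and the least-squares slope has the smaller squared error.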

Re: Q4) h(x) = ax
So, in this procedure we:
Pick two points; find the best slope a for those two points, the one that minimizes the squared error for those two points; do this N times and average all the a's.
Rather than:
Pick two points; calculate the squared error for those two points as a function of a; do this N times, then find the a that minimizes the sum of all of the squared errors, as we do with linear regression.
Are we doing the first thing here or the second thing? Either way there's a simple analytic solution, but I'm not sure which procedure we're doing.
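The two procedures described above can be sketched side by side. This is only an illustration, assuming the target sin(pi*x), x drawn uniformly from [-1, 1], and a hypothetical number of data sets and seed. The first return value averages the per-data-set best slopes; the second pools all the squared errors and minimizes them once.

```python
import math
import random

def target(x):
    return math.sin(math.pi * x)

def simulate(n_datasets=100_000, seed=0):
    rng = random.Random(seed)
    slope_sum = 0.0         # first procedure: sum of per-data-set best slopes
    sum_xy = sum_xx = 0.0   # second procedure: pooled sums over all points
    for _ in range(n_datasets):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        y1, y2 = target(x1), target(x2)
        xy = x1 * y1 + x2 * y2
        xx = x1 * x1 + x2 * x2
        slope_sum += xy / xx   # best slope for this pair of points
        sum_xy += xy
        sum_xx += xx
    average_of_slopes = slope_sum / n_datasets
    pooled_slope = sum_xy / sum_xx   # minimizes the summed squared error
    return average_of_slopes, pooled_slope

average_of_slopes, pooled_slope = simulate()
print(average_of_slopes, pooled_slope)
```

In this sketch the two numbers do not agree, so the choice of procedure matters. The pooled slope estimates the single best-fitting line, while the average of slopes is the quantity an average hypothesis over data sets would use.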
Re: Q4) h(x) = ax
I calculated what I think is the best approximation by setting to zero the derivative, with respect to a, of the integral of the squared difference between the sine function and the line y = ax. When I compare this to the result of my simulation, there's a difference of about 30% between the two values of a.
I realize that it's reasonable to assume that \bar{g} won't be the best result (see minute 43 in lecture 8, comparing .20 to .21). But is anyone else getting a result that differs by so much? 
Re: Q4) h(x) = ax
Quote:
Like the lecture and the book, you consider a best fit for two points (least squares), and then average over all sets of two points (but not two of the same point). Then a in this case, or (a,b) in the book's case, is the average over all such pairs of points. I can believe that this is 30% different from the one you mention. You could also minimize the integral of the square of sin(πx) - ax.
Re: Q4) h(x) = ax
I wanted confirmation of the result that I got using the method we are supposed to implement. So I derived the slope of the "best" line, shown to us in slide 11 of lecture 8. (Which also applies in our case because it goes through the origin.) I did this by minimizing the area in yellow on that slide. (You can actually see that slope is close to 1 from the slide.)
I was surprised that the answer I got for question 4 is so different from this "perfect" approximation line that was found by minimizing the integral. It stands to reason that it should vary a little, but there is quite a difference between the two values. 
Re: Q4) h(x) = ax
Go to wolframalpha.com and ask for:
"derivative of integral of (sin(pi*x)-(a*x))^2 from -1 to 1 with respect to a"
Set the result equal to 0 and solve for a. This gives you the line that is the "best" approximation. (I believe.) It is not the answer to question 4. It is the result that our simulation method hopefully gets close to. What's interesting to me is how far from this value for a our simulation result is.
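The same calculation can be checked without Wolfram Alpha. Setting the derivative of the integral of (sin(pi*x) - a*x)^2 over [-1, 1] to zero gives a = (integral of x*sin(pi*x)) / (integral of x^2) = (2/pi) / (2/3) = 3/pi. The sketch below confirms this with a midpoint-rule integration; the function name and the number of grid cells are assumptions for illustration.

```python
import math

def best_slope_numeric(n=100_000):
    """Midpoint-rule estimate of
    (integral of x*sin(pi*x)) / (integral of x^2) over [-1, 1]."""
    h = 2.0 / n
    num = den = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h   # midpoint of the i-th cell
        num += x * math.sin(math.pi * x) * h
        den += x * x * h
    return num / den

a_numeric = best_slope_numeric()
a_closed = 3 / math.pi
print(a_numeric, a_closed)
```

Both values should be just under 1, which matches the remark that the slope on the lecture slide looks close to 1.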
Re: Q4) h(x) = ax
I apologize for the mistake in my previous post. The slope of the best line is the expression for a that minimizes E_in (which is not 0 in this case). Then you can do a 2D integral to find the expectation of that.
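A rough sketch of that 2D integral, assuming the target sin(pi*x), x1 and x2 independent and uniform on [-1, 1], and the two-point least-squares slope a(x1, x2) = (x1*y1 + x2*y2) / (x1^2 + x2^2). The midpoint grid and its size are illustrative choices; an even grid size keeps the sample points away from (0, 0), where the expression is 0/0.

```python
import math

def best_slope(x1, x2):
    """Least-squares slope of h(x) = a*x on the points x1 and x2."""
    y1, y2 = math.sin(math.pi * x1), math.sin(math.pi * x2)
    return (x1 * y1 + x2 * y2) / (x1 * x1 + x2 * x2)

def expected_slope(n=400):
    """Midpoint-rule average of best_slope over the square [-1, 1] x [-1, 1]."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x1 = -1.0 + (i + 0.5) * h
        for j in range(n):
            x2 = -1.0 + (j + 0.5) * h
            total += best_slope(x1, x2)
    # Each cell has area h*h; the uniform density on the square is 1/4.
    return total * h * h / 4.0

g_bar_slope = expected_slope()
print(g_bar_slope)
```

The printed value can be compared with the average slope from a simulation as a sanity check.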

Re: Q4) h(x) = ax
I am lost on this. The procedure I follow is as described above, but the answer doesn't seem right, and from what I've read elsewhere it's a common mistake on this question. Any hints on what I might be missing? I am picking two points, getting the best hypothesis by minimising the squared error, repeating this a number of times, and taking the answer to be the average value over these runs.

Re: Q4) h(x) = ax
Spent a good hour breaking my head trying to figure out where I was going wrong, finally decided to submit my best shot and can finally see what was going on!! :eek:
Thanks for the answer, somehow I was thick enough not to see what was right in front of me.. 
Re: Q4) h(x) = ax
Thanks, I just figured that out.

Re: Q4) h(x) = ax
Quote:
and note that X has just a single column (no column of 1's). 
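To illustrate the single-column point, here is a hedged sketch contrasting the two hypothesis sets on the same pair of points. It assumes the target sin(pi*x); the sample points and helper names are made up for the example. With a column of 1's (h(x) = ax + b) a line passes through both points exactly, while with the single column (h(x) = ax) the normal equations reduce to a scalar ratio and the in-sample error is generally not zero.

```python
import math

def fit_through_origin(pts):
    """h(x) = a*x: X has one column, so (X^T X)^-1 X^T y is a scalar ratio."""
    sxy = sum(x * y for x, y in pts)
    sxx = sum(x * x for x, _ in pts)
    return sxy / sxx

def fit_with_intercept(p, q):
    """h(x) = a*x + b through two points with distinct x values."""
    (x1, y1), (x2, y2) = p, q
    a = (y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    return a, b

def in_sample_error(h, pts):
    """Mean squared error of hypothesis h on the given points."""
    return sum((h(x) - y) ** 2 for x, y in pts) / len(pts)

pts = [(x, math.sin(math.pi * x)) for x in (-0.4, 0.7)]  # hypothetical sample
a0 = fit_through_origin(pts)
a1, b1 = fit_with_intercept(*pts)
e_origin = in_sample_error(lambda x: a0 * x, pts)
e_line = in_sample_error(lambda x: a1 * x + b1, pts)
print(a0, e_origin)
print(a1, b1, e_line)
```

So adding "b" changes both the fitted slope and the in-sample error, and the two hypothesis sets should not be treated as interchangeable.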
Re: Q4) h(x) = ax
Without using any calculus, I figured that the best approximation for g bar(x) would be 0x, seeing as how the average of slopes between all combinations of two random points would end up as 0. In that case, why is the answer to the problem (e), "none of the above"?
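One way to probe the claim above is a quick sketch, assuming the target sin(pi*x), x uniform on [-1, 1], and the least-squares fit of h(x) = ax on each pair (the sample size and seed are arbitrary). Since x and sin(pi*x) have the same sign on [-1, 1], every term x*y in the numerator is non-negative, so no fitted slope can be negative and the average cannot cancel to 0.

```python
import math
import random

def best_slope(x1, x2):
    """Least-squares slope of h(x) = a*x on the points x1 and x2."""
    y1, y2 = math.sin(math.pi * x1), math.sin(math.pi * x2)
    return (x1 * y1 + x2 * y2) / (x1 * x1 + x2 * x2)

rng = random.Random(1)
slopes = [best_slope(rng.uniform(-1, 1), rng.uniform(-1, 1))
          for _ in range(10_000)]

min_slope = min(slopes)
mean_slope = sum(slopes) / len(slopes)
print(min_slope, mean_slope)
```

The smallest slope in the sample should be non-negative and the mean clearly above zero, which would rule out 0x as the average hypothesis under these assumptions.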

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.