Quote:
Originally Posted by Anne Paulson
So, in this procedure we:
Pick two points;
Find the best slope for those two points, the one that minimizes the squared error for those two points;
Do this N times and average all the slopes.
Rather than:
Pick two points;
Calculate the squared error for those two points as a function of the slope;
Do this N times, then find the single slope that minimizes the sum of all of the squared errors, as we do with linear regression.
Are we doing the first thing here or the second thing? Either way there's a simple analytic solution, but I'm not sure which procedure we're doing.

The first method estimates the slope of the average hypothesis (which takes into consideration only two points at a time). The second method estimates the slope of the best approximation of the target function (which takes into consideration all the points in the input space at once).
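The two procedures can be contrasted numerically. The sketch below is an assumption on my part, not from the thread: it takes the target to be sin(pi*x) on [-1, 1] and the hypothesis set to be lines through the origin h(x) = a*x, with two-point datasets. Method 1 fits each dataset separately and averages the resulting slopes; method 2 pools all the points and fits one slope to the lot. The two answers generally differ, which is the point of the distinction above.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10000  # number of two-point datasets

def sample_dataset():
    """Draw a two-point dataset from the assumed target f(x) = sin(pi*x)."""
    x = rng.uniform(-1.0, 1.0, size=2)
    y = np.sin(np.pi * x)
    return x, y

def best_slope(x, y):
    """Least-squares slope through the origin: a = (x.y) / (x.x)."""
    return (x @ y) / (x @ x)

# Method 1: fit each two-point dataset on its own, then average the slopes.
# This estimates the slope of the average hypothesis.
slopes = []
all_x, all_y = [], []
for _ in range(N):
    x, y = sample_dataset()
    slopes.append(best_slope(x, y))
    all_x.append(x)
    all_y.append(y)
a_bar = float(np.mean(slopes))

# Method 2: pool every point and minimize the total squared error at once.
# This estimates the slope of the best approximation to the target.
X = np.concatenate(all_x)
Y = np.concatenate(all_y)
a_pooled = float(best_slope(X, Y))

print(f"average of per-dataset slopes: {a_bar:.3f}")
print(f"slope of single pooled fit:    {a_pooled:.3f}")
```

For this assumed setup the pooled fit converges to the minimizer of the expected squared error, E[x*sin(pi*x)] / E[x^2], while the averaged per-dataset slopes converge to something different, since the ratio of a two-point fit does not average to the ratio of the expectations.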