So, in this procedure we:
Pick two points;
Find the best slope for those two points, i.e. the one that minimizes the squared error for those two points;
Do this N times and average all the slopes,
Rather than:
Pick two points;
Calculate the squared error for those two points as a function of the slope;
Do this N times, then find the slope that minimizes the sum of all of the squared errors, as we do with linear regression.
Are we doing the first thing here or the second thing? Either way there's a simple analytic solution, but I'm not sure which procedure we're doing.
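
To make the difference concrete, here is a rough sketch of both procedures. It assumes a no-intercept model y ≈ slope * x, which is my guess at the setup rather than something stated above (with an intercept, two points would be fit exactly and "minimizing" would be trivial). The function names and the example pairs are made up for illustration.

```python
def pair_slope(p, q):
    """Least-squares slope for two points, ASSUMING a line through the origin."""
    (x1, y1), (x2, y2) = p, q
    # Minimizes (y1 - b*x1)^2 + (y2 - b*x2)^2 over b.
    return (x1 * y1 + x2 * y2) / (x1 * x1 + x2 * x2)


def average_of_pair_slopes(pairs):
    """First procedure: fit each pair separately, then average the slopes."""
    return sum(pair_slope(p, q) for p, q in pairs) / len(pairs)


def pooled_slope(pairs):
    """Second procedure: one slope minimizing the summed squared error over all pairs."""
    sxy = sum(x * y for pair in pairs for x, y in pair)
    sxx = sum(x * x for pair in pairs for x, y in pair)
    return sxy / sxx


# Two hand-picked pairs (hypothetical data, not from the post).
pairs = [((1.0, 0.0), (1.0, 2.0)), ((3.0, 9.0), (1.0, 3.0))]
print(average_of_pair_slopes(pairs))  # 2.0
print(pooled_slope(pairs))            # 32/12, about 2.667
```

Under this assumption the two answers differ because the pooled fit implicitly weights each pair by its sum of squared x values, while the first procedure weights every pair equally.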