Quote:
Originally Posted by jjepsuomi
I watched the lecture on support vector machines and most of the lecture I did understand but the part with 2 preliminary technicalities got me totally confused, I didn't really get them. Could someone perhaps explain them to me in simple terms?
I can understand your confusion, as the advantage of these 2 steps is not evident until you go through the rest of the derivation, so they seem somewhat arbitrary when they are presented at the beginning. I'll focus here on the fact that they are "allowed" rather than "useful" since the correctness of the derivation only necessitates that these steps are allowed.
Think of the line defined by the equation

$w_1 x_1 + w_2 x_2 + b = 0$

(where $\mathbf{x} = (x_1, x_2)$ by definition). This line is the same as

$2 w_1 x_1 + 2 w_2 x_2 + 2b = 0$

since the points that satisfy the first equation are identical to those that satisfy the second, right? You can scale all three parameters in that equation up or down without changing the line that the equation represents. Therefore, if I require that you scale these coefficients in a particular way, I am allowed to do that. Don't worry about the wisdom of such a step, just its correctness. The wisdom appears later on.
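To see the scale invariance numerically, here is a minimal sketch with made-up coefficients (the line $2x_1 + 3x_2 - 6 = 0$ and the scaling factors are my own illustration, not from the lecture): scaling $(w_1, w_2, b)$ by any nonzero constant leaves the set of points on the line unchanged.

```python
import numpy as np

# Hypothetical coefficients for the line w1*x1 + w2*x2 + b = 0
w = np.array([2.0, 3.0])
b = -6.0

# A few test points: the first two lie on the line, the third does not
points = np.array([[3.0, 0.0],   # 2*3 + 3*0 - 6 = 0  -> on the line
                   [0.0, 2.0],   # 2*0 + 3*2 - 6 = 0  -> on the line
                   [1.0, 1.0]])  # 2*1 + 3*1 - 6 = -1 -> off the line

for c in [1.0, 2.0, 0.5, -3.0]:          # any nonzero scaling factor
    signals = points @ (c * w) + c * b   # evaluate the scaled equation
    print(c, np.isclose(signals, 0.0))   # same points satisfy it each time
```

Every scaled version of the equation flags exactly the same points as lying on the line, which is why requiring one particular scaling costs nothing.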
By the same token, if I decide to eliminate the notion of

$x_0$

and replace it by its value, which is the constant 1, and now call its coefficient in the above equation

$b$

rather than

$w_0$

, I have not changed anything in the essence of the problem that I am solving, so I am allowed to do it.
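The renaming can also be checked numerically. A minimal sketch (the specific weight values and test point are my own illustration): evaluating the signal with the augmented convention, where $x_0 = 1$ and $w_0$ is its coefficient, gives exactly the same number as pulling $w_0$ out and calling it $b$.

```python
import numpy as np

# Augmented convention: w0 is the coefficient of the artificial coordinate x0 = 1
w_aug = np.array([-6.0, 2.0, 3.0])   # (w0, w1, w2), values made up for illustration

# SVM convention: drop x0, pull w0 out, and rename it b
b = w_aug[0]
w = w_aug[1:]

x = np.array([1.5, 2.5])             # an arbitrary point
x_aug = np.concatenate(([1.0], x))   # same point with x0 = 1 prepended

signal_aug = w_aug @ x_aug           # w^T x under the augmented convention
signal_svm = w @ x + b               # w^T x + b under the SVM convention
print(np.isclose(signal_aug, signal_svm))  # True: identical signals
```

The two conventions are just different bookkeeping for the same quantity, which is why the switch is allowed.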