 LFD Book Forum What Quadratic Programming Package? #41
 elkka Invited Guest Join Date: Apr 2012 Posts: 57 Re: code 3 in Octave qp optimizer

I have tweaked my trick and now consider it perfectly legal. I think it might be interesting to those who had the same problem.

So, first, I modified the quadratic matrix H by adding to it a diagonal matrix with very small diagonal values; specifically, I used 10^(-15). I passed the otherwise unchanged problem to qp. The result was some vector alpha, which I immediately used as the initial value for the original optimization problem (with the original H).

The results: the first optimization converged in under 200 steps for N=10 in 100% of cases, and in 99.99% of cases for N=100. The second optimization frequently took just 1 iteration, though occasionally I saw it take more than 100; it converged in 100% of cases. My E_in became consistently 0, and my b values agreed to within about 10^(-8) in 99.99% of cases (and I don't believe the outlier influenced my averages all that much).

I believe the trick is legal because all I do is find a better initial value of alpha, and then still solve the same optimization problem.
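In Octave terms, the two-stage call looks roughly like this. This is a sketch on toy data, not elkka's actual script: the variable names are mine, and H, q, A, b, lb are the usual hard-margin SVM dual quantities.

```octave
% Sketch of the two-stage warm start on toy data (variable names mine).
n = 10;
X = unifrnd(-1, 1, 2, n);              % 2-D points, one per column
y = [ones(1, n/2), -ones(1, n/2)];     % +/-1 labels (balanced for simplicity)
H = (y' * y) .* (X' * X);              % the rank-deficient dual matrix

q  = -ones(n, 1);                      % linear term: minimize 0.5*a'*H*a - sum(a)
A  = y;  b = 0;                        % equality constraint: sum(alpha .* y) = 0
lb = zeros(n, 1);                      % alpha >= 0

% Stage 1: solve the slightly perturbed problem from a cold start.
alpha0 = qp([], H + 1e-15 * eye(n), q, A, b, lb, []);

% Stage 2: solve the ORIGINAL problem, warm-started at alpha0.
alpha = qp(alpha0, H, q, A, b, lb, []);
```

The point is that only the first call sees the perturbed H; the second call solves the unmodified problem, so the final alpha is a solution of the original dual.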
#42
 markland Member Join Date: Apr 2012 Posts: 30 Re: What Quadratic Programming Package?

Thank you! That did the trick for me.
#43
 kurts Invited Guest Join Date: Apr 2012 Location: Portland, OR Posts: 70 Re: What Quadratic Programming Package?

Thanks for the tip, elkka!

That made a big difference in the quality of my results - in every single case, my b values are perfectly aligned. Unfortunately, it also made a big difference in the answer I submitted. It's a bit annoying that a technical, package-specific implementation tweak can have such a big effect and cause one to submit the wrong answer when one has otherwise set up and formulated the problem correctly, but I suppose this is yet another example of the challenges in designing homework problems for an online audience.

The exercise was definitely worth it; I learned quite a bit from this problem, not the least of which is how to use a program like Octave - and, as my wife has to keep reminding me, I'm not actually getting a *grade* in this class...
#44
 Hillbilly Invited Guest Join Date: Mar 2012 Location: West Virginia Posts: 45 Re: What Quadratic Programming Package?

Quote:
 Originally Posted by kurts and as my wife has to keep reminding me, I'm not actually getting a *grade* in this class...
My wife is following me on the scoreboard. If it weren't for that tangible evidence of accomplishing something, she probably wouldn't put up with me spending so much time on this class. I've tried explaining machine learning to her, but her take-away is, "it's about predicting things with computers." Thank goodness for the scoreboard, or I'd be gone already!
#45
 sakumar Member Join Date: Apr 2012 Posts: 40 Re: What Quadratic Programming Package?

Dang, elkka! That was a hell of a trick!

qp/SVM now works like a champ.

Any intuition as to why this works? When I first tried your suggestion, I initially just used the normal H to get alpha0 and then fed alpha0 to qp. It made no difference. When I added the tiny diagonal matrix, it was like night and day.
#46
 elkka Invited Guest Join Date: Apr 2012 Posts: 57 Re: What Quadratic Programming Package?

Well, the matrix H seems to be degenerate in a fundamental way. Not only is its determinant 0, but its rank is 2 (for both N=10 and N=100): since H = (y'*y) .* (X'*X) = diag(y) * (X'*X) * diag(y), it has the same rank as X'*X, and the data points are 2-dimensional. A 10x10 matrix of rank 2 is usually a computationally bad thing, let alone a 100x100 matrix of rank 2. So adding just a little bit of a well-behaved matrix to it seems to help computationally. Without knowing the exact algorithm behind qp, it is hard to be more specific.
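The rank claim is easy to verify numerically; here is a quick check on toy data (a sketch, with my own throwaway variable names):

```octave
% Check that H = (y'*y) .* (X'*X) has rank at most 2 for 2-D data.
% Reason: H = diag(y) * (X'*X) * diag(y), and diag(y) is invertible
% (all entries are +/-1), so rank(H) = rank(X'*X) = rank(X) <= 2.
n = 100;
X = unifrnd(-1, 1, 2, n);              % 100 random 2-D points
y = sign(unifrnd(-1, 1, 1, n));        % random +/-1 labels
H = (y' * y) .* (X' * X);
disp(rank(H))                          % almost surely 2 for random data
```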

I am surprised, though, how little it took to improve things. And I am not at all 100% sure it is the correct approach. Let's wait for the answers.
#47
 sakumar Member Join Date: Apr 2012 Posts: 40 Re: What Quadratic Programming Package?

Shown below are two runs of Octave's qp on the same data. Figure 1 uses @elkka's initialization suggestion. Figure 2 shows the results when the initial alpha is all ones.

The support vector points are circled.

With the right initialization (Figure 1), SVM/QP nails the support vectors every time. Without it, the results are less impressive, only occasionally showing an optimal solution.

Figure 1 always has 3 points circled; Figure 2 varies between 3 and 4. Also, the runtimes with the right initialization are a lot lower, obviously because the solver converges instead of running for max-iterations.
#48
 kurts Invited Guest Join Date: Apr 2012 Location: Portland, OR Posts: 70 Re: What Quadratic Programming Package?

sakumar,
I am an Octave newbie - can you show me the commands you used to generate those plots?
#49
 sakumar Member Join Date: Apr 2012 Posts: 40 Re: What Quadratic Programming Package?

Quote:
 Originally Posted by kurts sakumar, I am an Octave newbie - can you show me the commands you used to generate those plots?
Sure, here it is. It is hacked up a bit because I cut and pasted to get Figure 1 and Figure 2. It just runs one iteration and generates the two plots.

Code:
iterations = 1;
n = 100;

test_n = 10000; % number of points used to determine Eout

svm_threshold = 10^(-5); % if alpha value is greater than this, count it as a support vector

function w = GetWFrom2Points(x, y)
w(3) = - 1.0;
w(2) = (x(2) - y(2)) / (x(1) - y(1));
w(1) = x(2) - w(2) * x(1);
end

X = zeros(3,n);
testX = zeros(3, test_n);

min_val_vec = svm_threshold; % scalar threshold, applied elementwise in the comparisons below

svm_winner_count = 0;
sv_count = 0;

for m = 1:iterations

svm_fraction_incorrect = 0.0;
perceptron_fraction_incorrect = 0.0;

do % we need to check to make sure the n points are not all on one side of the line
a = unifrnd(-1.0, 1.0, 1, 2);
b = unifrnd(-1.0, 1.0, 1, 2);

fw = GetWFrom2Points(a, b); % This is the target function
fw = fw';

% now generate n points -- these are the in-sample points
X(2:3,1:n) = unifrnd(-1.0, 1.0, 2, n);
X(1,1:n) = 1.0;

% get true value for in-sample points
ty = sign(fw'*X);
until abs(sum(ty)) != n % if all the y's added up equal to n, it means all the points are on one side of the line

% now we run the perceptron
% initialize hypothesis ("all-zero vector")
w = zeros(3,1);

while (1)
y = sign(w'*X);
comp = (y == ty);

if (comp) % if all comp values are 1, then we have converged
break;
end

% get random mismatched index
ilist = find(!comp);
m_indx = ilist(unidrnd(sum(!comp)));

w = w + ty(m_indx)*X(:, m_indx);
end

% now get probability that f(x) is not equal to g(x)
testX(2:3,1:test_n) = unifrnd(-1.0, 1.0, 2, test_n);
testX(1,1:test_n) = 1.0;

test_ty = sign(fw'*testX);
test_y = sign(w'*testX);

comp = (test_y == test_ty);

perceptron_fraction_incorrect += sum(!comp)/test_n;

% Now on to the quadratic programming. Am going to try the "qp" function in Octave

% initialize alpha0 vector to all ones
alpha0 = ones(n, 1);

% the "q" matrix
q = -ones(n, 1);

% the A matrix (actually vector)
A = ty;

% the b value
b = 0;

% the lb vector (all zeros)
lb = zeros(n, 1);

X_svm = X(2:3, :);
% build the H (called Q in lecture) matrix
H = (ty'*ty).*(X_svm'*X_svm);

tempH = H + eye(n) .* 10^-15; % use n, not a hard-coded 100
[alpha0, obj, info, lambda] = qp([], tempH, q, A, b, lb, []);

options.MaxIter = 1000;
[alpha, obj, info, lambda] = qp(alpha0, H, q, A, b, lb, [], options);

printf("qp info.info = %d\n", info.info); % 0 means a global solution was found

% count number of support vectors
comp = ge(alpha, min_val_vec);
sv_index = find(comp);
num_sv = sum(comp);

sv_count += num_sv;

svm_w = zeros(3, 1);

svm_w(2) = sum(X_svm(1, :).*(alpha'.*ty));
svm_w(3) = sum(X_svm(2, :).*(alpha'.*ty));

% get index of the maximum alpha
[maxval, maxindx] = max(alpha);

svm_w(1) = 1/ty(maxindx) - (svm_w(2:3,1))'*X_svm(:, maxindx);

negmark = lt(ty, min_val_vec);
negindex = find(negmark);

posmark = gt(ty, min_val_vec);
posindex = find(posmark);

figure(1);
plot(X(2,negindex), X(3,negindex), "@11", X(2, posindex), X(3, posindex), "@22", X(2, sv_index), X(3, sv_index), "o");

printf("NUMBER OF SUPPORT VECTORS %i\n", length(sv_index));

options.MaxIter = 1000;
[alpha, obj, info, lambda] = qp([], H, q, A, b, lb, [], options);

printf("qp info.info = %d\n", info.info); % 0 means a global solution was found

% count number of support vectors
comp = ge(alpha, min_val_vec);
sv_index = find(comp);
num_sv = sum(comp);

sv_count += num_sv;

svm_w = zeros(3, 1);

svm_w(2) = sum(X_svm(1, :).*(alpha'.*ty));
svm_w(3) = sum(X_svm(2, :).*(alpha'.*ty));

% get index of the maximum alpha
[maxval, maxindx] = max(alpha);

svm_w(1) = 1/ty(maxindx) - (svm_w(2:3,1))'*X_svm(:, maxindx);

negmark = lt(ty, min_val_vec);
negindex = find(negmark);

posmark = gt(ty, min_val_vec);
posindex = find(posmark);

figure(2);
plot(X(2,negindex), X(3,negindex), "@11", X(2, posindex), X(3, posindex), "@22", X(2, sv_index), X(3, sv_index), "o");

printf("NUMBER OF SUPPORT VECTORS %i\n", length(sv_index));

end
#50
 kurts Invited Guest Join Date: Apr 2012 Location: Portland, OR Posts: 70 Re: What Quadratic Programming Package?

Thanks!!

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.