#41
I have tweaked my trick, and now consider it perfectly legal. I think it might be interesting to those who had the same problem.

So, first, I modified the quadratic matrix H by adding to it a diagonal matrix with very small diagonal values; specifically, I used 10^(-15). I passed the otherwise unchanged problem to qp. The result was some approximate solution alpha. I then fed that alpha as the initial value to a second call to qp with the original, unmodified H.

The result: the first optimization converged in under 200 steps for N=10 in 100% of cases, and in 99.99% of cases for N=100. The second optimization frequently took just 1 iteration, though occasionally I saw it take more than 100; it converged in 100% of cases. My E_in became consistently 0, and my b's agreed to within about 10^(-8) in 99.99% of cases (and I don't believe the outlier influenced my averages all that much). I believe the trick is legal, because all I do is find a better initial value of alpha.
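In Octave terms, the two-stage call looks roughly like this (a minimal sketch rather than my exact code; it assumes the usual hard-margin dual setup, with H the N x N quadratic term, q = -ones(N,1), the equality constraint A equal to the 1 x N row of labels with b = 0, and lb = zeros(N,1)):

Code:
% Sketch of the two-stage trick (illustrative names; assumes the hard-margin
% dual pieces H, q, A, b, lb are already built as described above).
Hreg = H + 1e-15 * eye(N);               % tiny ridge on the diagonal
alpha0 = qp([], Hreg, q, A, b, lb, []);  % stage 1: solve the perturbed problem
alpha  = qp(alpha0, H, q, A, b, lb, []); % stage 2: original H, warm-started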
#42
Thank you! That did the trick for me.
#43
Thanks for the tip, elkka!
That made a big difference in the quality of my results - in every single case, my b's are perfectly aligned. Unfortunately, it also made a big difference in the answer I submitted.

It's a bit annoying that a technical, package-specific implementation tweak can have such a big effect and cause one to submit the wrong answer when one has otherwise done all the correct setup and formulation of the problem, but I suppose this is yet another example of the challenges in designing homework problems for an online audience.

The exercise was definitely worth it - I learned quite a bit from this problem, not the least of which is how to use a program like Octave. And as my wife has to keep reminding me, I'm not actually getting a *grade* in this class...
#44
My wife is following me on the scoreboard. If it weren't for that tangible evidence of accomplishing something, she probably wouldn't put up with me spending so much time on this class. I've tried explaining machine learning to her, but her take-away is, "it's about predicting things with computers." Thank goodness for the scoreboard, or I'd be gone already!
#45
Dang, elkka! That was a hell of a trick!
qp/SVM now works like a champ. Any intuition into why this works? When I first tried your suggestion, I just used the normal H to get alpha0 and then fed alpha0 to qp, and it made no difference. When I added in the tiny diagonal matrix, it was like night and day.
#46
Well, the matrix H seems to be degenerate in a fundamental way. Not only is its determinant 0, but its rank is 2 (for both N=10 and N=100). That is no accident: H_ij = y_i*y_j*(x_i . x_j), so H = Z'*Z, where Z is the 2 x N matrix whose columns are the label-scaled points y_i*x_i, and such a product can have rank at most 2. Having a 10x10 matrix of rank 2 is usually a computationally bad thing, let alone a 100x100 matrix of rank 2. So adding just a little bit of a well-behaved matrix to it seems to help computationally. Without knowing the exact algorithm behind qp it is hard to be more specific.
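To see it concretely (a quick sketch using the X_svm and ty names from the code posted later in this thread):

Code:
% Verify the degeneracy claim. X_svm is the 2 x N matrix of raw point
% coordinates and ty the 1 x N row of labels, as in the code below.
Z = X_svm .* repmat(ty, 2, 1);  % column i is y_i * x_i
H = Z' * Z;                     % H(i,j) = y_i * y_j * (x_i . x_j)
rank(H)                         % at most 2, because Z has only 2 rows
det(H)                          % 0 whenever N > 2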
I am surprised, though, how little it took to improve things. And I am not at all 100% sure it is the correct approach. Let's wait for the answers.
#47
Shown below are two runs with Octave's qp, each run twice with the same data. Figure 1 is with @elkka's initialization suggestion. Figure 2 shows the results when the initial alpha is all ones.

The support vector points are circled. With the right initialization (Figure 1), SVM/QP nails the support vectors every time. Without the initialization, it is less impressive, only occasionally finding an optimum solution. Figure 1 always has 3 points circled; Figure 2 has between 3 and 4. Also, the runtimes with the right initialization are a lot shorter -- obviously because it converges instead of running for the maximum number of iterations.
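One way to check convergence programmatically rather than by eyeballing runtimes (a sketch; if I am reading the Octave docs right, qp's status code info.info is 0 when a global solution is found and 3 when the iteration limit is reached):

Code:
% Inspect qp's status code after the call (status values per the Octave
% documentation for qp: 0 = global solution found, 3 = MaxIter reached).
[alpha, obj, info, lambda] = qp(alpha0, H, q, A, b, lb, [], options);
if (info.info == 3)
  printf("qp hit MaxIter without converging\n");
end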
#48
sakumar,
I am an Octave newbie - can you show me the commands you used to generate those plots?
#49
Code:
iterations = 1;
n = 100;
test_n = 10000;          % number of points used to determine Eout
svm_threshold = 10^(-5); % if an alpha value is greater than this, count it as a support vector

function w = GetWFrom2Points(x, y)
  % line through points x and y, expressed as weights for [1; x1; x2]
  w(3) = -1.0;
  w(2) = (x(2) - y(2)) / (x(1) - y(1));
  w(1) = x(2) - w(2) * x(1);
end

X = zeros(3, n);
testX = zeros(3, test_n);
min_val_vec = svm_threshold; % scalar threshold, compared elementwise below
svm_winner_count = 0;
sv_count = 0;

for m = 1:iterations
  svm_fraction_incorrect = 0.0;
  perceptron_fraction_incorrect = 0.0;

  do
    % make sure the n points are not all on one side of the line;
    % we are asked to discard the data set if that happens and start again
    a = unifrnd(-1.0, 1.0, 1, 2);
    b = unifrnd(-1.0, 1.0, 1, 2);
    fw = GetWFrom2Points(a, b); % this is the target function
    fw = fw';
    % now generate n points -- these are the in-sample points
    X(2:3, 1:n) = unifrnd(-1.0, 1.0, 2, n);
    X(1, 1:n) = 1.0;
    % get true values for the in-sample points
    ty = sign(fw'*X);
  until abs(sum(ty)) != n % if all the y's sum to +/-n, all points are on one side

  % now we run the perceptron, starting from the all-zero hypothesis
  w = zeros(3, 1);
  while (1)
    y = sign(w'*X);
    comp = (y == ty);
    if (comp) % true only when every entry is 1, i.e. we have converged
      break;
    end
    % pick a random mismatched index and update
    ilist = find(!comp);
    m_indx = ilist(unidrnd(sum(!comp)));
    w = w + ty(m_indx)*X(:, m_indx);
  end

  % estimate the probability that f(x) is not equal to g(x)
  testX(2:3, 1:test_n) = unifrnd(-1.0, 1.0, 2, test_n);
  testX(1, 1:test_n) = 1.0;
  test_ty = sign(fw'*testX);
  test_y = sign(w'*testX);
  comp = (test_y == test_ty);
  perceptron_fraction_incorrect += sum(!comp)/test_n;

  % Now on to the quadratic programming, using the "qp" function in Octave.
  alpha0 = ones(n, 1); % overwritten by the first qp call below
  q = ones(n, 1)*-1;   % the "q" vector
  A = ty;              % the A matrix (actually a row vector)
  b = 0;               % the b value
  lb = zeros(n, 1);    % the lb vector (all zeros)
  X_svm = X(2:3, :);
  % build the H (called Q in lecture) matrix
  H = (ty'*ty).*(X_svm'*X_svm);
  tempH = H + eye(n).*10^-15; % elkka's tiny diagonal perturbation

  % Run 1: warm-start from the perturbed problem's solution (Figure 1)
  [alpha0, obj, info, lambda] = qp([], tempH, q, A, b, lb, []);
  options.MaxIter = 1000;
  [alpha, obj, info, lambda] = qp(alpha0, H, q, A, b, lb, [], options);
  'INFO.INFO'
  info.info

  % count the support vectors
  comp = ge(alpha, min_val_vec);
  sv_index = find(comp);
  num_sv = sum(comp);
  sv_count += num_sv;
  svm_w = zeros(3, 1);
  svm_w(2) = sum(X_svm(1, :).*(alpha'.*ty));
  svm_w(3) = sum(X_svm(2, :).*(alpha'.*ty));
  % use the point with the maximum alpha to solve for the bias term
  [maxval, maxindx] = max(alpha);
  svm_w(1) = 1/ty(maxindx) - (svm_w(2:3,1))'*X_svm(:, maxindx);

  negindex = find(lt(ty, min_val_vec));
  posindex = find(gt(ty, min_val_vec));
  figure(1);
  plot(X(2,negindex), X(3,negindex), "@11", X(2,posindex), X(3,posindex), "@22", X(2,sv_index), X(3,sv_index), "o");
  printf("NUMBER OF SUPPORT VECTORS %i\n", length(sv_index));

  % Run 2: same problem without the warm start (Figure 2)
  options.MaxIter = 1000;
  [alpha, obj, info, lambda] = qp([], H, q, A, b, lb, [], options);
  'INFO.INFO'
  info.info

  % count the support vectors
  comp = ge(alpha, min_val_vec);
  sv_index = find(comp);
  num_sv = sum(comp);
  sv_count += num_sv;
  svm_w = zeros(3, 1);
  svm_w(2) = sum(X_svm(1, :).*(alpha'.*ty));
  svm_w(3) = sum(X_svm(2, :).*(alpha'.*ty));
  % use the point with the maximum alpha to solve for the bias term
  [maxval, maxindx] = max(alpha);
  svm_w(1) = 1/ty(maxindx) - (svm_w(2:3,1))'*X_svm(:, maxindx);

  negindex = find(lt(ty, min_val_vec));
  posindex = find(gt(ty, min_val_vec));
  figure(2);
  plot(X(2,negindex), X(3,negindex), "@11", X(2,posindex), X(3,posindex), "@22", X(2,sv_index), X(3,sv_index), "o");
  printf("NUMBER OF SUPPORT VECTORS %i\n", length(sv_index));
end
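As a sanity check on elkka's observation in #41 that the b's agree, one can compute b from every support vector and look at the spread (a sketch reusing sv_index, ty, svm_w, and X_svm from the code above):

Code:
% Compute b from each support vector; when qp has truly converged the
% spread should be tiny (elkka reported agreement within about 10^-8).
svs = sv_index(:)'; % row vector of support-vector indices
b_all = 1 ./ ty(svs) - (svm_w(2:3,1))' * X_svm(:, svs);
printf("b spread across support vectors: %g\n", max(b_all) - min(b_all));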
#50
Thanks!!
Tags: hw7, package, quadratic programming