 Michael Reach 05-13-2013 11:08 PM

*ANSWER* questions with linear regression & weight decay

I have been running the weight decay examples Q2-6, but haven't seen any real improvement in the out-of-sample error compared to no regularization at all. Is that just a feature of this particular problem, or should I recheck my calculations?

Unfortunately (or not), the answers I've been getting do appear as options on the multiple choices.

You should be seeing a change in the out-of-sample error when you vary lambda (for certain values of lambda at least). Are you using classification error as your error measure?

 jlaurentum 05-14-2013 06:29 AM

Are you using the correct formula for the one-step solution?

I was using -lambda*I instead of +lambda*I, and so regularization didn't make sense at all. I caught the error because I saw in another post that Professor Yaser corrected a student on the plus sign.
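
For reference, the one-step weight-decay solution that the R code later in this thread computes can be written as follows. This is only a sketch under assumed notation: Z is the matrix of transformed inputs (one row per data point), y is the vector of labels, I is the identity matrix with as many rows as Z has columns, and lambda is the regularization strength. The term added to Z^T Z carries a plus sign.

```latex
w_{\mathrm{reg}} = \left( Z^{\mathsf{T}} Z + \lambda I \right)^{-1} Z^{\mathsf{T}} y
```

Setting lambda to 0 recovers the ordinary linear regression solution, so a sign error only shows up once lambda is nonzero.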

 Michael Reach 05-14-2013 06:54 AM

 Originally Posted by Ziad Hatahet (Post 10818):
 "You should be seeing a change in the out-of-sample error when you vary lambda (for certain values of lambda at least). Are you using classification error as your error measure?"

Well, I am seeing a change, just not really a reduction. Some are the same, some are bigger. I was expecting a dramatic drop in the out-of-sample error.

And yes, I have been using classification error, but that is a good point - I started using the regression residuals and such, but that mistake at least I caught.
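
The difference between the two error measures mentioned here can be sketched as below. The notation is an assumption, not taken from the homework text: N is the number of points, z_n is the transformed input of point n, y_n is its label in {-1, +1}, and w is the weight vector. The first quantity counts sign disagreements (classification error); the second is the mean squared regression residual.

```latex
E_{\mathrm{class}}(w) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}\!\left[ \operatorname{sign}\!\left( w^{\mathsf{T}} z_n \right) \neq y_n \right]
\qquad \text{versus} \qquad
E_{\mathrm{sq}}(w) = \frac{1}{N} \sum_{n=1}^{N} \left( w^{\mathsf{T}} z_n - y_n \right)^{2}
```

A point with w^T z_n = 3 and y_n = +1 is classified correctly even though its squared residual is 4, so the two measures can disagree on individual points.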

 Michael Reach 05-14-2013 12:56 PM

As I suspected, all my answers on these were wrong. Does anyone have code (R if possible) to show, that I could use for comparison? I'm suspecting my problem was something dumb; even the original linear regression was wrong, and I compared that one with the same answer from the R lm() function.

I'm especially concerned since HW 7 uses all the same data again - so I really need to track this down.

 Elroch 05-14-2013 03:10 PM

 Originally Posted by Michael Reach (Post 10830):
 "As I suspected, all my answers on these were wrong. Does anyone have code (R if possible) to show, that I could use for comparison? I'm suspecting my problem was something dumb; even the original linear regression was wrong, and I compared that one with the same answer from the R lm() function. I'm especially concerned since HW 7 uses all the same data again - so I really need to track this down."

Are you using lambda from 0.001 to 1000? I suppose it might be possible to forget to calculate the power. If you do this, the added term in the matrix described in this thread can hardly fail to have a significant effect.

 jlaurentum 05-14-2013 04:06 PM

Michael:

Here you go:

Code:

#READ IN THE FILES.
datos1 <- read.table("in.dta")
names(datos1) <- c("X1","X2","Y")
datos2 <- read.table("out.dta")
names(datos2) <- c("X1","X2","Y")

#FOR THE FOLLOWING QUESTIONS, SET UP THE MATRIXES
Z <- with(datos1,
          cbind(rep(1,nrow(datos1)),X1,X2,
                X1^2,X2^2,X1*X2,abs(X1-X2),abs(X1+X2)) )
Z <- as.matrix(Z)
Zout <- with(datos2,
             cbind(rep(1,nrow(datos2)),X1,X2,
                   X1^2,X2^2,X1*X2,abs(X1-X2),abs(X1+X2)) )
Zout <- as.matrix(Zout)

#NOW FIT WITH WEIGHT DECAY USING LAMBDA=10^-3
lambda <- 10^(-3)
#diag(ncol(Z)) IS THE 8x8 IDENTITY MATRIX
M <- t(Z)%*%Z + diag(ncol(Z))*lambda
w <- solve(M)%*%t(Z)%*%datos1$Y
Ym <- as.numeric(sign(Z%*%w))
Ein <- mean(datos1$Y!=Ym)
Ym <- as.numeric(sign(Zout%*%w))
Eout <- mean(datos2$Y!=Ym)

 Michael Reach 05-16-2013 07:16 PM

Thanks!

Yes, Elroch, I used the full range of lambda. I think my mistake is elsewhere.

 Elroch 05-17-2013 04:28 PM

 Originally Posted by Michael Reach (Post 10857):
 "Thanks! Yes, Elroch, I used the full range of lambda. I think my mistake is elsewhere."

If you're like me, you've probably made a silly error which has nothing to do with understanding the method.

OK, I'm going to expose most of my insult to the art of programming for these questions. Don't use it as a style guide (especially that nasty bit of unvectorised code; I also suspect the as.matrix calls may be superfluous). The data format should be clear, I hope.
Code:

WeightDecayLinearRegressionSolver <- function(inputs, outputs, lambda) {
  # note inputs have bias co-ordinate
  # inputs is a matrix of 2d points (with a bias)
  # outputs is a vector providing a real valued function of those points
  if (isTRUE(all.equal(var(outputs), 0))) {
    # This is the completely degenerate case, which occurs when trying to classify data of a single class
    # (only the single constant output value is used, so the result has 3 weights)
    result <- c(outputs[1], 0, 0)
  } else {
    result <- PseudoInverse(t(as.matrix(inputs)) %*% as.matrix(inputs) + diag(rep(lambda, length(inputs[1,]))) ) %*% t(as.matrix(inputs)) %*% outputs
  }
  result
}

PseudoInverse <- function(mat) {
  tmat <- t(as.matrix(mat))
  # solve() is base R; the original called inv(), which is not available in base R
  solve(tmat %*% as.matrix(mat)) %*% tmat
}

ClassificationError <- function(actual, predicted) {
  result <- 0
  for (i in seq_along(actual)) {
    if (abs(actual[i] - predicted[i]) > 0.5) {
      result <- result + 1
    }
  }
  result/length(actual)
}

 warren 05-19-2013 07:21 AM

I am really stuck starting with problem 2 on homework 6. I want to find out where I went wrong before I start on homework 7, since I got 3/10 on homework 6. Is there anybody here who reads Clojure who can tell me where I went wrong?
Code:

(ns hw6.core
  (:require [clojure.java.io :as io]
            [clatrix.core :as m]))

(defn pseudo-inverse [M]
  (m/* (m/i (m/* (m/t M) M)) (m/t M)))

(defn read-dataset [url]
  (m/matrix (with-open [r (io/reader url)]
              (doall (map
                      (comp
                       (partial map read-string)
                       (partial re-seq #"\S+"))
                      (line-seq r))))))

(defn augment-dataset [M]
  (let [[x1s x2s] (m/cols M)
        [n] (m/size M)]
    (m/hstack (m/ones n 1)
              x1s
              x2s
              (m/mult x1s x1s)
              (m/mult x2s x2s)
              (m/mult x1s x2s)
              (m/abs (m/- x1s x2s))
              (m/abs (m/+ x1s x2s)))))

(defn ys [M]
  (let [[_ _ ys] (m/cols M)] ys))

(defn read-in-sample []
  (read-dataset "http://work.caltech.edu/data/in.dta"))

(defn read-out-of-sample []
  (read-dataset "http://work.caltech.edu/data/out.dta"))

(defn read-setup []
  (let [in (read-in-sample)
        out (read-out-of-sample)]
    {:x-ins (augment-dataset in)
     :x-outs (augment-dataset out)
     :y-ins (ys in)
     :y-outs (ys out)}))

(defn weights [probset]
  (m/* (pseudo-inverse (:x-ins probset)) (:y-ins probset)))

(defn e-in [probset]
  (let [the-diff (m/- (m/* (:x-ins probset) (weights probset)) (:y-ins probset))
        matches (count (filter (partial > 0.5) (m/mult the-diff the-diff)))
        n (count (m/rows the-diff))]
    (/ (- n matches) n)))

(defn e-out [probset]
  (let [the-diff (m/- (m/* (:x-outs probset) (weights probset)) (:y-outs probset))
        matches (count (filter (partial > 0.5) (m/mult the-diff the-diff)))
        n (count (m/rows the-diff))]
    (/ (- n matches) n)))

(defn problem-6-2-eval [x y]
  (+ (* (- 3/35 x)
        (- 3/35 x))
     (* (- 21/125 y)
        (- 21/125 y))))

;;--------------------------------------------------------------------------
hw6.core> (seq (weights (read-setup)))
(-1.6470670613492875 -0.14505926927976592 0.10154120500179364 -2.032968443227123 -1.8280437313439264 2.4815294496056963 4.158938609024668 0.31651714084678323)
hw6.core> (e-in (read-setup))
3/35
hw6.core> (e-out (read-setup))
21/125
hw6.core> (problem-6-2-eval 0.03, 0.08)
0.010848081632653063
hw6.core> (problem-6-2-eval 0.03, 0.10)
0.007728081632653061
hw6.core> (problem-6-2-eval 0.04, 0.09)
0.008173795918367349
hw6.core> (problem-6-2-eval 0.04, 0.11)
0.005453795918367348
hw6.core> (problem-6-2-eval 0.05, 0.10)
0.005899510204081633
hw6.core>

This seems to show that none of the answers for problem 2 is terribly close in Euclidean distance, with D being the closest among them. According to the answer key, the correct answer is A.

Many, many thanks in advance to whoever can straighten me out!!

