
#1




*ANSWER* q5
I got Q5 wrong and don't know where I've gone wrong. My code is below, and I believe my partial derivatives are correct. I believe eta and the starting values are correct too. I loop through, using the current weights to calculate the new weights. The same weights are used for both partial derivatives, and both weights are updated at the end of the iteration. The results are at the bottom. I end up with iteration 17 as the first iteration where the error drops below 10^-14, which is the wrong number of iterations. To calculate the error I sum the squares of the differences and then take the square root (see the print statement in the code).
So can someone point out what I've done wrong? I'm sure it's obvious to someone who got the answer right, but it isn't to me. Alternatively, if anyone has some code that works, please post it and I'll try to work out how mine is different. Thank you. Code:
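For reference, here is a worked version of the derivatives. It assumes the error surface is E(u, v) = (ue^v − 2ve^{−u})^2, which is the function that derWRTu and derWRTv in the code below differentiate. The last two lines apply one gradient step from (1, 1) with η = 0.1:

```latex
E(u,v) = \left(u e^{v} - 2 v e^{-u}\right)^2

\frac{\partial E}{\partial u} = 2\left(u e^{v} - 2 v e^{-u}\right)\left(e^{v} + 2 v e^{-u}\right)

\frac{\partial E}{\partial v} = 2\left(u e^{v} - 2 v e^{-u}\right)\left(u e^{v} - 2 e^{-u}\right)

\nabla E(1,1) = 2\left(e - 2e^{-1}\right)\left(e + 2e^{-1},\; e - 2e^{-1}\right) \approx (13.6954,\; 7.8608)

(u_1, v_1) = (1, 1) - 0.1\,\nabla E(1,1) \approx (-0.3695,\; 0.2139)
```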
```python
import math
from decimal import Decimal

def derWRTu(u, v):
    # dE/du = 2 * (e^v + 2*v*e^(-u)) * (u*e^v - 2*v*e^(-u))
    result = Decimal(2) * (Decimal.exp(v) + Decimal(2) * v * Decimal.exp(-u)) * (u * Decimal.exp(v) - Decimal(2) * v * Decimal.exp(-u))
    return result

def derWRTv(u, v):
    # dE/dv = 2 * (u*e^v - 2*e^(-u)) * (u*e^v - 2*v*e^(-u))
    result = Decimal(2) * (u * Decimal.exp(v) - Decimal(2) * Decimal.exp(-u)) * (u * Decimal.exp(v) - Decimal(2) * v * Decimal.exp(-u))
    return result

eta = Decimal(0.1)
weight1 = Decimal(1.0)
weight2 = Decimal(1.0)

for i in range(1, 20):
    # Both partials are evaluated at the same (weight1, weight2) before either is updated
    temp1 = Decimal(weight1 - eta * derWRTu(weight1, weight2))
    temp2 = Decimal(weight2 - eta * derWRTv(weight1, weight2))
    # "error" printed here is the length of the step between successive iterates
    print(i, temp1, temp2, math.sqrt((temp1 - weight1)**2 + (temp2 - weight2)**2))
    weight1 = temp1
    weight2 = temp2
```

```
iteration  newWeight1                        newWeight2                        error
1          -0.369542993196839967955794109    0.2139205536245797574025398844    1.5791038301
2          0.0305206903512627734316713927    -0.5079340454438062055203607077   0.825302982601
3          0.1075231141989984274494072585    -0.1222102555735032213170381668   0.393334737025
4          0.06564482581488226125563096705   -0.0151665598769331032201461945   0.114944090002
5          0.04784117062171890279998341002   0.01848989922674513542330801974   0.038075285654
6          0.04499946309943379128005628962   0.02349925169679327305149255062   0.00575924594121
7          0.04475601902934555265991082578   0.02392429647039781800619960613   0.000489824534736
8          0.04473774604067714316407965262   0.02395617479661382948530576517   3.67441124156e-05
9          0.04473639081750715769541853207   0.02395853892224864452997465102   2.72501740502e-06
10         0.04473629039778214055992344406   0.02395871409914178947550918280   2.01918461425e-07
11         0.04473628295735141684956930350   0.02395872707857492077823017572   1.49608052512e-08
12         0.04473628240606795227909993835   0.02395872804025939239070240562   1.10849018094e-09
13         0.04473628236522174893968490660   0.02395872811151340584997873671   8.21312776066e-11
14         0.04473628236219533441993488016   0.02395872811679282384392635644   6.08534626789e-12
15         0.04473628236197109852827296072   0.02395872811718399134381489950   4.50881079752e-13
16         0.04473628236195448423608868001   0.02395872811721297408644352622   3.34070961782e-14
17         0.04473628236195325323465204219   0.02395872811721512150250060613   2.47522933467e-15
```
#2




Re: Q5 *answer *
Why are you taking the square root of E_in?

#3




Re: *ANSWER* q5
As E is a vector, I took the error to be the difference between two iterations. Based on that, I took the differences between u and u' and between v and v', summed the squares of these values, and took the square root to get the combined length, treating that as the error.
I see that if I remove the sqrt and just take the error as the sum of the squared differences, I get the correct answer. Have I just misunderstood how to calculate the error in gradient descent? If so, could someone point out the particular slide or section of the lecture to rewatch? I would be grateful. Alternatively, if this is just something I should have known, a link to something explaining how to calculate the error would be great. Thank you.
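A minimal sketch of the distinction this thread turns on. It assumes the "error" in the question means the function value E(u, v) = (ue^v − 2ve^{−u})^2 at the current point, not the length of the last step. The helper names E, grad_E and first_iteration_below are hypothetical, and plain floats stand in for the Decimal type used in post #1. The sketch runs the same descent from (1, 1) with eta = 0.1 and records when each stopping rule first fires:

```python
import math

def E(u, v):
    """Assumed error surface: E(u, v) = (u*e^v - 2*v*e^(-u))^2."""
    return (u * math.exp(v) - 2 * v * math.exp(-u)) ** 2

def grad_E(u, v):
    """Partial derivatives of E with respect to u and v."""
    f = u * math.exp(v) - 2 * v * math.exp(-u)  # inner term shared by both partials
    dE_du = 2 * f * (math.exp(v) + 2 * v * math.exp(-u))
    dE_dv = 2 * f * (u * math.exp(v) - 2 * math.exp(-u))
    return dE_du, dE_dv

def first_iteration_below(threshold=1e-14, eta=0.1, start=(1.0, 1.0), max_iter=100):
    """Return (iteration where E(u, v) < threshold,
    iteration where step length < threshold,
    (u, v) at the first of those two events)."""
    u, v = start
    iter_by_error = iter_by_step = None
    point_at_error = None
    for i in range(1, max_iter + 1):
        du, dv = grad_E(u, v)  # both partials use the same (u, v)
        new_u, new_v = u - eta * du, v - eta * dv
        step = math.hypot(new_u - u, new_v - v)  # the quantity printed in post #1
        u, v = new_u, new_v
        if iter_by_error is None and E(u, v) < threshold:
            iter_by_error, point_at_error = i, (u, v)
        if iter_by_step is None and step < threshold:
            iter_by_step = i
        if iter_by_error is not None and iter_by_step is not None:
            break
    return iter_by_error, iter_by_step, point_at_error

by_error, by_step, (u, v) = first_iteration_below()
print("E(u, v) < 1e-14 first at iteration", by_error, "with (u, v) =", (round(u, 3), round(v, 3)))
print("step length < 1e-14 first at iteration", by_step)
```

Run as-is, it should report iteration 10 for the E(u, v) rule, with (u, v) ≈ (0.045, 0.024), and iteration 17 for the step-length rule, the same 17 the poster got. The two rules differ because near the minimum the step length is proportional to |f|, while E is f^2. Once |f| is small, E is therefore far smaller than the step and crosses the threshold much earlier.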
#4




Re: *ANSWER* q5

#5




Re: *ANSWER* q5
Of course, it's so obvious now that you've pointed it out and I've reread the question. Thank you.


