#1
From the results of running 1 vs 5, my output matched one of the possible answers, but it seems that is not correct. I cannot say anything more for now... Maybe I made a mistake, but on the other related questions I seem to have the right answers.
#2
My results also ended up pointing to only one of the hypotheses. But there could be some mistake in the code; I'll only be 100% sure after submission.
#3
The same here: my simulation points to one of the answers. For some reason, I am not able to take much comfort in that. I also ran my simulation for different values of lambda, and they all seem to point to that same answer.
#4
When I ran the simulation, I got 2 answers that match my result. Now I am really confused about how to proceed. Can somebody help me decide which one I should select? I am getting these answers repeatedly. Is there some rule on how many iterations gradient descent should be run? I see the gradient descent error keeps decreasing even after 2000 iterations, though it makes no difference in the outcome.
#5
If the error keeps dropping, I would keep going. I didn't use gradient descent, myself. I solved it two ways, getting the same answer with both methods. First, I noted that it could be solved by quadratic programming. Second, I fed it to a (non-gradient) conjugate direction set routine that I've had lying around for years.
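For anyone who doesn't have such a routine lying around: here is a minimal sketch of the second approach using scipy's Powell method, which is an off-the-shelf derivative-free conjugate-direction-set routine (not the one I used, and the data here is just a placeholder for the homework set, but the idea is the same). The objective is assumed to be the usual regularized squared error ||Zw - y||^2 + lambda*||w||^2.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder data standing in for the homework's feature matrix and targets
rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 3))
y = rng.standard_normal(100)
lam = 1.0  # regularization strength (lambda); placeholder value

def augmented_error(w):
    # Regularized squared error: ||Zw - y||^2 + lambda * ||w||^2
    return np.sum((Z @ w - y) ** 2) + lam * np.sum(w ** 2)

# Powell's method is a derivative-free conjugate-direction-set routine
result = minimize(augmented_error, x0=np.zeros(Z.shape[1]), method="Powell")
print("weights:", result.x)
```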
#6
Is it also possible to use the regularized normal equation? I'm looking at Lecture 12, Slide 11.
It seems funny to me to choose the parameters to minimize one error measure (mean squared error), yet evaluate E_in and E_out using another (binary classification error).
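For concreteness, the regularized normal equation from that slide is w_reg = (Z^T Z + lambda*I)^{-1} Z^T y. A minimal numpy sketch (with Z, y, and lam standing in for the homework data):

```python
import numpy as np

def ridge_weights(Z, y, lam):
    # Regularized normal equation: w = (Z'Z + lambda*I)^{-1} Z'y
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)
```

E_in and E_out would then be measured as the binary classification error of sign(Z @ w), which is exactly the mismatch of error measures I mean.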
#7
__________________
Where everyone thinks alike, no one thinks very much
#8
|
|||
|
|||
![]()
Thank you, professor. Let me try the analytical solution approach and see if I get different results. I am still getting 2 answers from the quiz that match.
#9
Regularised linear regression is called ridge regression in the stats literature.
You can just use whatever code you use to do linear least squares by adding dummy data (see data augmentation) in the link below: http://www-stat.stanford.edu/~owen/c...larization.pdf No need for quadratic programming, stochastic descent, etc.
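Concretely, if I have the trick right, the augmentation amounts to appending sqrt(lambda)*I as dummy rows of Z and zeros as dummy targets, then running ordinary least squares. A minimal numpy sketch (Z, y, and lam standing in for your data):

```python
import numpy as np

def ridge_via_augmentation(Z, y, lam):
    # Append sqrt(lambda)*I as dummy rows of Z, and zeros as dummy targets;
    # ordinary least squares on the augmented system then minimizes
    # ||Zw - y||^2 + lambda * ||w||^2, the ridge regression objective.
    d = Z.shape[1]
    Z_aug = np.vstack([Z, np.sqrt(lam) * np.eye(d)])
    y_aug = np.concatenate([y, np.zeros(d)])
    w, *_ = np.linalg.lstsq(Z_aug, y_aug, rcond=None)
    return w
```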
#10
__________________
Where everyone thinks alike, no one thinks very much
Tags: question 10