LFD Book Forum > Book Feedback - Learning From Data > Chapter 4 - Overfitting

#1
04-15-2016, 02:50 AM
ntvy95
Member

Join Date: Jan 2016
Posts: 37
Exercise 4.6

Hello, I have this answer for Exercise 4.6, but I'm not sure if it's right:

Because sign(w^{T}x) = sign(\alpha w^{T}x) for any \alpha > 0, very small weights are just as powerful as large weights (all that matters is the accuracy of the calculations the computer can perform). That also means a hyperplane can be represented by many hypotheses, so constraining the weights can reduce the number of hypotheses that represent the same hyperplane. Hence the soft-order constraint should be able to reduce the var component while likely not compromising the bias component.
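For instance, a quick numerical check (just a toy sketch I made up, with arbitrary numbers) illustrates that rescaling w by any positive \alpha leaves the classification unchanged:

Code:
import numpy as np

# Toy check that sign(alpha * w^T x) = sign(w^T x) for any alpha > 0.
# The weight vector and input below are arbitrary made-up numbers.
w = np.array([0.3, -1.2, 0.7])
x = np.array([1.0, 0.5, -2.0])   # first component is the bias coordinate x0 = 1

for alpha in [1e-6, 1.0, 1e6]:
    print(alpha, np.sign(w @ x), np.sign(alpha * w @ x))   # same sign every time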

----------------------------------------

Edit: I have just remembered that the growth function has already taken care of the issue of many hypotheses representing the same hyperplane (and this issue does not affect the var component anyway(?)). So in this case the answer should be the hard-order constraint...? I'm really confused right now.
#2
11-08-2016, 02:09 PM
CountVonCount
Member

Join Date: Oct 2016
Posts: 17
Re: Exercise 4.6

I have the same question. Can someone help here?

From my understanding, having small weights is not ideal for sign(s), since the signal will then often be around 0, and a small change in just one input has a high chance of flipping the sign and producing a completely different output.

So it would be better to have big weights: the signal is then always pushed into the large-number region and the sign is more stable.

But maybe I'm just wrong here.
#3
11-09-2016, 05:32 AM
magdon
RPI

Join Date: Aug 2009
Location: Troy, NY, USA
Posts: 590
Re: Exercise 4.6

Yes, the soft order constraint does not impact classification. Better to regularize with the hard order constraint, or use the soft order constraint with the "regression for classification" algorithm.
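For concreteness, here is a rough sketch (toy data and a made-up \lambda, not code from the book) of regression for classification with weight decay: fit the weights by regularized least squares, then classify with sign(w^{T}x).

Code:
import numpy as np

# Rough sketch of "regression for classification" with weight decay
# (the soft order constraint). Data, dimensions, and lambda are made up.
rng = np.random.default_rng(0)
N, d = 100, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])  # bias coordinate x0 = 1
w_target = np.array([0.5, 2.0, -1.0])                      # hypothetical target weights
y = np.sign(X @ w_target)                                  # +/-1 labels

lam = 0.1                                                  # weight-decay parameter
# Regularized least squares: w_reg = (X^T X + lambda I)^{-1} X^T y
w_reg = np.linalg.solve(X.T @ X + lam * np.eye(d + 1), X.T @ y)

# Classify by the sign of the real-valued regression output.
E_in = np.mean(np.sign(X @ w_reg) != y)
print("in-sample classification error:", E_in)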

Quote:
Originally Posted by ntvy95 View Post
Hello, I have this answer for Exercise 4.6, but I'm not sure if it's right:

Because sign(w^{T}x) = sign(\alpha w^{T}x) for any \alpha > 0, very small weights are just as powerful as large weights (all that matters is the accuracy of the calculations the computer can perform). That also means a hyperplane can be represented by many hypotheses, so constraining the weights can reduce the number of hypotheses that represent the same hyperplane. Hence the soft-order constraint should be able to reduce the var component while likely not compromising the bias component.

----------------------------------------

Edit: I have just remembered that the growth function has already taken care of the issue of many hypotheses representing the same hyperplane (and this issue does not affect the var component anyway(?)). So in this case the answer should be the hard-order constraint...? I'm really confused right now.
__________________
Have faith in probability
#4
11-09-2016, 05:35 AM
magdon
RPI

Join Date: Aug 2009
Location: Troy, NY, USA
Posts: 590
Re: Exercise 4.6

Correct again.

So let us differentiate between the theory of machine learning and its implementation on finite precision computers. In theory, if you have an infinite precision machine, then the size of the weights does not matter because it is a mathematical fact that, for positive \alpha,

sign(\alpha w^{T}x) = sign(w^{T}x)

In finite precision, you typically want the weights to be around 1 and the inputs rescaled to be around 1 too (this is called input preprocessing and you can read about it in e-Chapter 9).
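For example, here is a rough sketch (with made-up raw inputs) of one common form of such preprocessing, shifting and scaling each input so that it is roughly of order 1; e-Chapter 9 covers this properly.

Code:
import numpy as np

# Rough sketch of input preprocessing: center and scale each input so the
# rescaled values are roughly of order 1. The raw inputs below are made up.
X_raw = np.array([[1500.0, 0.002],
                  [2300.0, 0.004],
                  [1800.0, 0.001],
                  [2100.0, 0.003]])    # two features on very different scales

mean = X_raw.mean(axis=0)
std = X_raw.std(axis=0)
X = (X_raw - mean) / std               # each column now has mean 0 and std 1

print(X.mean(axis=0))                  # ~[0, 0]
print(X.std(axis=0))                   # [1, 1]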

Quote:
Originally Posted by CountVonCount View Post
I have the same question. Can someone help here?

From my understanding, having small weights is not ideal for sign(s), since the signal will then often be around 0, and a small change in just one input has a high chance of flipping the sign and producing a completely different output.

So it would be better to have big weights: the signal is then always pushed into the large-number region and the sign is more stable.

But maybe I'm just wrong here.
__________________
Have faith in probability
#5
11-09-2016, 09:00 AM
CountVonCount
Member

Join Date: Oct 2016
Posts: 17
Re: Exercise 4.6

Thanks for this clarification. It helps a lot with my understanding.
#6
11-10-2016, 05:44 AM
ntvy95
Member

Join Date: Jan 2016
Posts: 37
Re: Exercise 4.6

Thank you very much for your reply!