LFD Book Forum  

#1 | 07-23-2015, 12:20 AM
yongxien (Junior Member; Join Date: Jun 2015; Posts: 8)
Help in understanding proof for VC-dimension of perceptron.

https://work.caltech.edu/library/072.pdf

I am referring to the slides given in the link above.

I have a few questions regarding this proof:
1. Is the matrix invertible because of the way we construct it, so that it is lower triangular? If that is the case, I don't see why the same construction does not work for d+2 points, or for any other k greater than the dimension of the perceptron.
2. Why does the second part of the proof (showing d_vc <= d + 1) not work for k = d + 1, but only for k = d + 2?
3. I don't understand the statement that having more points than dimensions means we must have x_j = \sum_i a_i x_i. (By the way, d + 1 is also more points than the dimension d.)
#2 | 07-23-2015, 01:01 AM
yongxien (Junior Member; Join Date: Jun 2015; Posts: 8)
Re: Help in understanding proof for VC-dimension of perceptron.

I don't understand the second part of the proof either. Why is y_i = sign(w^T x_i) = sign(a_i)?
#3 | 07-23-2015, 02:20 AM
yaser (Caltech; Join Date: Aug 2009; Location: Pasadena, California, USA; Posts: 1,474)
Re: Help in understanding proof for VC-dimension of perceptron.

Quote:
Originally Posted by yongxien
1. Is the matrix invertible because of the way we construct it, so that it is lower triangular? If that is the case, I don't see why the same construction does not work for d+2 points, or for any other k greater than the dimension of the perceptron.
The number of columns is restricted to d+1 (the length of the input vector), so the matrix would not be square if we had d+2 rows.
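
A quick numerical sketch of that first part of the proof, assuming the lower-triangular construction mentioned in the question (numpy, just an illustration, not the slides verbatim):

Code:
import numpy as np

d = 3  # input dimension; each point has d+1 coordinates because of the zeroth coordinate x_0 = 1

# Rows: x_1 = (1,0,...,0), x_2 = (1,1,0,...,0), ..., x_{d+1} = (1,1,...,1).
# The matrix is lower triangular with ones on the diagonal, hence invertible.
X = np.tril(np.ones((d + 1, d + 1)))

y = np.array([+1, -1, -1, +1])  # an arbitrary dichotomy of the d+1 points
w = np.linalg.solve(X, y)       # X is square and invertible, so X w = y has a solution
print(np.sign(X @ w))           # [ 1. -1. -1.  1.]: the chosen dichotomy is implemented

With d+2 rows the matrix is no longer square, so this particular argument cannot be applied there.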

Quote:
2. Why does the second part of the proof (showing d_vc <= d + 1) not work for k = d + 1, but only for k = d + 2?
Because if you have only d+1 vectors, they can be linearly independent. The inevitable linear dependence of any d+2 vectors is what makes that part of the proof work.

Quote:
3. I don't understand the statement that having more points than dimensions means we must have x_j = \sum_i a_i x_i. (By the way, d + 1 is also more points than the dimension d.)
This is a consequence of the basic linear algebra result mentioned in the response to point 2. As for the other remark, the length of the vector is d+1 (because of the zeroth coordinate), so d+1 points would not be more points than dimensions.
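
For the linear algebra fact itself, here is a small check one can run (random points in general position assumed, so that the first d+1 points span the space; this is only an illustration):

Code:
import numpy as np

d = 3
rng = np.random.default_rng(0)
X = rng.standard_normal((d + 2, d + 1))  # d+2 points in a (d+1)-dimensional space
X[:, 0] = 1.0                            # zeroth coordinate fixed to 1, as for the perceptron

# Express the extra point as a linear combination of the first d+1 points:
# solve sum_i a_i x_i = x_{d+2} for the coefficients a_i.
a = np.linalg.solve(X[:d + 1].T, X[d + 1])
print(np.allclose(X[:d + 1].T @ a, X[d + 1]))  # True: x_{d+2} is linearly dependent on the others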
__________________
Where everyone thinks alike, no one thinks very much
#4 | 05-05-2016, 07:55 AM
MaciekLeks (Member; Join Date: Jan 2016; Location: Katowice, Upper Silesia, Poland; Posts: 17)
Re: Help in understanding proof for VC-dimension of perceptron.

Context:
To show that d_{vc}\leq d+1 we need to prove that we cannot shatter any set of d+2 points. In this case we have d+2 vectors \mathbf{x}_{i}, where 1\leq i\leq d+2, hence we have one more vector than the number of dimensions (our space is (d+1)-dimensional). According to the theory of vector spaces, when we have more vectors than the dimension of the space, they are linearly dependent (one of the vectors is a linear combination of the others). This means that, for instance, our extra point \mathbf{x}_{d+2} can be written (in the (d+1)-dimensional space) as \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i} (where not all a_{i} coefficients are zero).
We need to show that we cannot get all \mathbf{y}=\begin{bmatrix}\pm1\\\pm1\\\vdots\\\pm1\end{bmatrix} (that is, all 2^{d+2} dichotomies), where \mathbf{y} has d+2 entries.

Problem:

And now let me define the problem I have with understanding the video proof. Why do we tie the coefficient a_{i} to the label via y_{i}=sign(a_{i}) when the two are not correlated? On what basis can we conclude that sign(a_{i})=sign(\mathbf{w}^{T}\mathbf{x}_{i}), and therefore that \mathbf{w}^{T}\mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{w}^{T}\mathbf{x}_{i}>0, hence that we cannot generate \mathbf{y}=\begin{bmatrix}\pm1\\\pm1\\\vdots\\\pm1\\-1\end{bmatrix}?
#5 | 05-05-2016, 10:57 AM
ntvy95 (Member; Join Date: Jan 2016; Posts: 37)
Re: Help in understanding proof for VC-dimension of perceptron.

Quote:
Originally Posted by MaciekLeks
Context:
To show that d_{vc}\leq d+1 we need to prove that we cannot shatter any set of d+2 points. In this case we have d+2 vectors \mathbf{x}_{i}, where 1\leq i\leq d+2, hence we have one more vector than the number of dimensions (our space is (d+1)-dimensional). According to the theory of vector spaces, when we have more vectors than the dimension of the space, they are linearly dependent (one of the vectors is a linear combination of the others). This means that, for instance, our extra point \mathbf{x}_{d+2} can be written (in the (d+1)-dimensional space) as \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i} (where not all a_{i} coefficients are zero).
We need to show that we cannot get all \mathbf{y}=\begin{bmatrix}\pm1\\\pm1\\\vdots\\\pm1\end{bmatrix} (that is, all 2^{d+2} dichotomies), where \mathbf{y} has d+2 entries.

Problem:

And now let me define the problem I have with understanding the video proof. Why do we tie the coefficient a_{i} to the label via y_{i}=sign(a_{i}) when the two are not correlated? On what basis can we conclude that sign(a_{i})=sign(\mathbf{w}^{T}\mathbf{x}_{i}), and therefore that \mathbf{w}^{T}\mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{w}^{T}\mathbf{x}_{i}>0, hence that we cannot generate \mathbf{y}=\begin{bmatrix}\pm1\\\pm1\\\vdots\\\pm1\\-1\end{bmatrix}?
Here is my understanding:

For any data set of size d + 2 we must have linear dependence, and the question is: with this inevitable linear dependence, can we still find at least one data set on which the perceptron can implement all 2^{d+2} dichotomies? The video lecture shows that for any data set of size d + 2 there is some dichotomy (specific to that data set) that the perceptron cannot implement, hence no data set of size d + 2 can be shattered by the perceptron hypothesis set.

The proof considers a dichotomy (specific to the data set) with the following two properties:
- each x_{i} with non-zero a_{i} gets y_{i} = sign(a_{i});
- x_{j} gets y_{j} = -1.
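
A small sketch to make that targeted dichotomy concrete (random points assumed, nothing specific to the lecture's example):

Code:
import numpy as np

d = 2
rng = np.random.default_rng(1)
X = rng.standard_normal((d + 2, d + 1))  # d+2 points; zeroth coordinate set to 1 below
X[:, 0] = 1.0

# Coefficients of the linear dependence x_j = sum_i a_i x_i, with x_j taken as the last point.
a = np.linalg.solve(X[:d + 1].T, X[d + 1])

y = np.empty(d + 2)
y[:d + 1] = np.where(a != 0, np.sign(a), +1)  # y_i = sign(a_i) where a_i != 0 (label free where a_i = 0)
y[d + 1] = -1                                 # y_j = -1 on the dependent point
print(y)                                      # the dichotomy the proof claims is unreachable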

Hope this helps.
#6 | 05-06-2016, 01:40 AM
MaciekLeks (Member; Join Date: Jan 2016; Location: Katowice, Upper Silesia, Poland; Posts: 17)
Re: Help in understanding proof for VC-dimension of perceptron.

Quote:
Originally Posted by ntvy95
Here is my understanding:

For any data set of size d + 2 we must have linear dependence, and the question is: with this inevitable linear dependence, can we still find at least one data set on which the perceptron can implement all 2^{d+2} dichotomies? The video lecture shows that for any data set of size d + 2 there is some dichotomy (specific to that data set) that the perceptron cannot implement, hence no data set of size d + 2 can be shattered by the perceptron hypothesis set.
I agree. I tried to say this in the Context part of my post.

Quote:
Originally Posted by ntvy95
The proof considers a dichotomy (specific to the data set) with the following two properties:
- each x_{i} with non-zero a_{i} gets y_{i} = sign(a_{i});
- x_{j} gets y_{j} = -1.

Hope this helps.
Unfortunately it does not help. I understand the assumption, but I do not understand the source of confidence that we can correlate a_{i} with perceptron outputs.
#7 | 05-06-2016, 02:42 AM
ntvy95 (Member; Join Date: Jan 2016; Posts: 37)
Re: Help in understanding proof for VC-dimension of perceptron.

Quote:
Originally Posted by MaciekLeks
Yes, I tried to say this in the Context part of my post.



Unfortunately it does not help. I understand the assumption, but I do not understand the source of confidence that we can correlate a_{i} with perceptron outputs.
I'm not sure if I understand your point correctly:

If the perceptron hypothesis set can shatter the data set, then for the same x_{i} there exist w' and w'' such that sign(w'^{T}x_{i}) = 1 and sign(w''^{T}x_{i}) = -1. This means that if the data set is shatterable by the perceptron hypothesis set, then for any x_{i} in the data set we can choose either y_{i} = 1 or y_{i} = -1 and be sure that at least one hypothesis from the perceptron hypothesis set implements the choice we have made.

Instead of choosing y_{i} = 1 or y_{i} = -1 explicitly, the proof lets the choice of y_{i} depend on the value of a_{i}: y_{i} = sign(a_{i}). Whatever the real value of a_{i} is, sign(a_{i}) is either 1 or -1, hence y_{i} = 1 or y_{i} = -1. In my understanding, this dependence does not make the chosen dichotomy invalid.
#8 | 05-06-2016, 04:34 AM
MaciekLeks (Member; Join Date: Jan 2016; Location: Katowice, Upper Silesia, Poland; Posts: 17)
Re: Help in understanding proof for VC-dimension of perceptron.

Quote:
Originally Posted by ntvy95
Instead of choosing y_{i} = 1 or y_{i} = -1 explicitly, the proof lets the choice of y_{i} depend on the value of a_{i}: y_{i} = sign(a_{i}). Whatever the real value of a_{i} is, sign(a_{i}) is either 1 or -1, hence y_{i} = 1 or y_{i} = -1. In my understanding, this dependence does not make the chosen dichotomy invalid.
That's the point. This part is crucial: "the proof lets the choice of y_{i} depend on the value of a_{i}". The only certain relation is y_{i}=sign(\mathbf{w}^{T}\mathbf{x}_{i}). How do we know (by what definition, theorem, lemma, ...) that sign(a_{i}) works the same way as sign(\mathbf{w}^{T}\mathbf{x}_{i})? IMHO it cannot be drawn from \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i} (where not all a_{i} coefficients are zero).
#9 | 05-06-2016, 09:08 AM
ntvy95 (Member; Join Date: Jan 2016; Posts: 37)
Re: Help in understanding proof for VC-dimension of perceptron.

Quote:
Originally Posted by MaciekLeks
That's the point. This part is crucial: "the proof lets the choice of y_{i} depend on the value of a_{i}". The only certain relation is y_{i}=sign(\mathbf{w}^{T}\mathbf{x}_{i}). How do we know (by what definition, theorem, lemma, ...) that sign(a_{i}) works the same way as sign(\mathbf{w}^{T}\mathbf{x}_{i})? IMHO it cannot be drawn from \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i} (where not all a_{i} coefficients are zero).
I'm not sure if this is the point you are making, but here is my understanding:

If the perceptron hypothesis set can shatter the data set, then: if y_{i} = sign(a_{i}) for every x_{i} with a_{i} \neq 0, there exists a w satisfying sign(w^{T}x_{i}) = sign(a_{i}) for every x_{i} with a_{i} \neq 0. In other words, we first set y_{i} = sign(a_{i}) for every x_{i} with a_{i} \neq 0; then, if the perceptron hypothesis set can shatter the data set, we must be able to find a w that satisfies y_{i} = sign(a_{i}) = sign(w^{T}x_{i}).
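
One way to convince yourself numerically (a sketch with random points and randomly sampled weight vectors, only for illustration): every w that agrees with sign(a_{i}) on the points with a_{i} \neq 0 makes each term a_{i}(w^{T}x_{i}) positive, so w^{T}x_{j} = \sum_{i} a_{i}(w^{T}x_{i}) > 0 and the dependent point x_{j} is forced to +1, never -1.

Code:
import numpy as np

d = 2
rng = np.random.default_rng(2)
X = rng.standard_normal((d + 2, d + 1))
X[:, 0] = 1.0
a = np.linalg.solve(X[:d + 1].T, X[d + 1])  # x_j = sum_i a_i x_i, with x_j the last point

# Sample candidate weight vectors; keep those with sign(w^T x_i) = sign(a_i)
# wherever a_i != 0, and check what they predict on the dependent point x_j.
for _ in range(10000):
    w = rng.standard_normal(d + 1)
    s = np.sign(X[:d + 1] @ w)
    if np.all(s[a != 0] == np.sign(a[a != 0])):
        assert np.sign(X[d + 1] @ w) == 1  # always +1: the label y_j = -1 is never implemented
print("every matching w labels x_j as +1")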
#10 | 05-09-2016, 08:18 AM
MaciekLeks (Member; Join Date: Jan 2016; Location: Katowice, Upper Silesia, Poland; Posts: 17)
Re: Help in understanding proof for VC-dimension of perceptron.

Quote:
Originally Posted by ntvy95
I'm not sure if this is the point you are making, but here is my understanding:

If the perceptron hypothesis set can shatter the data set, then: if y_{i} = sign(a_{i}) for every x_{i} with a_{i} \neq 0, there exists a w satisfying sign(w^{T}x_{i}) = sign(a_{i}) for every x_{i} with a_{i} \neq 0. In other words, we first set y_{i} = sign(a_{i}) for every x_{i} with a_{i} \neq 0; then, if the perceptron hypothesis set can shatter the data set, we must be able to find a w that satisfies y_{i} = sign(a_{i}) = sign(w^{T}x_{i}).
You've built up the intuition behind the proof. I knew this, but reading what you wrote left me with less doubt. Thanks for that. I still have some doubt, but it is hard even to describe. It's high time to move on.