LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 2 - Training versus Testing (http://book.caltech.edu/bookforum/forumdisplay.php?f=109)
-   -   Help in understanding proof for VC-dimension of perceptron. (http://book.caltech.edu/bookforum/showthread.php?t=4609)

yongxien 07-23-2015 12:20 AM

Help in understanding proof for VC-dimension of perceptron.
 
https://work.caltech.edu/library/072.pdf

I am referring to the slides given in the link above.

I have a few questions regarding this proof:
1. Is the matrix invertible because of the way we construct it, i.e. because it is lower triangular? If that is the case, I don't see why the same construction does not work for d+2 points, or for any other k greater than the dimension of the perceptron.
2. Why does the second part of the proof (d + 1 >= d_vc) work only for k = d + 2 and not for k = d + 1?
3. I don't understand the statement that more points than dimensions means we must have x_j = \sum_i a_i x_i. (By the way, d+1 is also more points than the dimension d.)

yongxien 07-23-2015 01:01 AM

Re: Help in understanding proof for VC-dimension of perceptron.
 
I don't understand the second part of the proof either. Why is y_i = sign(w^T x_i) = sign(a_i)?

yaser 07-23-2015 02:20 AM

Re: Help in understanding proof for VC-dimension of perceptron.
 
Quote:

Originally Posted by yongxien (Post 11986)
1. Is the matrix invertible because of the way we construct it, i.e. because it is lower triangular? If that is the case, I don't see why the same construction does not work for d+2 points, or for any other k greater than the dimension of the perceptron.

The number of columns is restricted to d+1 (the length of the input vector), so the matrix would not be square if we had d+2 rows.
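A quick numerical sketch of why the construction works (assuming numpy; the matrix below is meant to mirror the slide's construction, with ones in the first column and on the diagonal, so it is lower triangular and invertible):

import numpy as np

d = 4                                     # any d; 4 is just for illustration
# (d+1) x (d+1) matrix: first column all ones, ones on the diagonal,
# zeros elsewhere -- lower triangular with nonzero diagonal, hence invertible.
X = np.eye(d + 1)
X[:, 0] = 1

# Pick an arbitrary dichotomy y in {-1,+1}^(d+1) and solve X w = y.
rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=d + 1)
w = np.linalg.solve(X, y)

print(np.array_equal(np.sign(X @ w), y))  # True: this dichotomy is realized

Since this works for every choice of y, the d+1 points are shattered (so d_vc >= d+1). The same trick has no chance with d+2 rows, because the matrix is then not square and cannot be inverted.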

Quote:

2. Why does the second part of the proof (d + 1 >= d_vc) work only for k = d + 2 and not for k = d + 1?
Because if you have only d+1 vectors, they can be linearly independent. The inevitable linear dependence of any d+2 vectors is what makes that part of the proof work.

Quote:

3. I don't understand the statement that more points than dimensions means we must have x_j = \sum_i a_i x_i. (By the way, d+1 is also more points than the dimension d.)
This is a consequence of the basic linear algebra result mentioned in the response to point 2. As for the other remark, the length of the vector is d+1 (because of the zeroth coordinate), so d+1 points would not be more points than dimensions.
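A numerical illustration of that linear algebra fact (a sketch with numpy and arbitrary points, not taken from the slides): d+2 rows of length d+1 cannot all be linearly independent, and the coefficients a_i of the dependence can be computed explicitly.

import numpy as np

# d+2 arbitrary points in R^(d+1), each with x_0 = 1 (the zeroth coordinate).
rng = np.random.default_rng(1)
d = 4
X = np.hstack([np.ones((d + 2, 1)), rng.standard_normal((d + 2, d))])  # (d+2) x (d+1)

# The rank is at most d+1 < d+2, so the rows must be linearly dependent.
print(np.linalg.matrix_rank(X))                 # d+1

# For generic points the first d+1 rows are independent, so we can solve for
# the coefficients expressing x_{d+2} as a combination of the others:
a = np.linalg.solve(X[:d + 1].T, X[d + 1])      # x_{d+2} = sum_i a_i x_i
print(np.allclose(X[:d + 1].T @ a, X[d + 1]))   # True

With only d+1 points there is no such guarantee: the rows can be linearly independent (and in the construction from the first part of the proof they are), which is why the argument only applies to d+2 points.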

MaciekLeks 05-05-2016 07:55 AM

Re: Help in understanding proof for VC-dimension of perceptron.
 
Context:
To show that d_{vc}\leq d+1 we need to prove that we cannot shatter any set of d+2 points. In this case we have d+2 vectors \mathbf{x}_{i}, where 1\leq i\leq d+2, hence one vector more than the dimension of the space (our space is (d+1)-dimensional). According to the theory of vector spaces, when we have more vectors than the dimension of the space, they are linearly dependent (one of the vectors is a linear combination of the others). This means that, for instance, our extra point \mathbf{x}_{d+2} can be written (in the (d+1)-dimensional space) as \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i}, where not all a_{i} coefficients are zero.
We need to show that we cannot get all \mathbf{y}=\begin{bmatrix}\pm1 & \pm1 & \cdots & \pm1\end{bmatrix}^{T} (all 2^{d+2} dichotomies), where \mathbf{y} is of size d+2.

Problem:

And now let me state the problem I have with understanding the video proof. Why do we tie the coefficient a_{i} to the label via y_{i}=sign(a_{i}) when the two are not correlated? On what basis can we claim that sign(a_{i})=sign(\mathbf{w}^{T}\mathbf{x}_{i}), and hence that \mathbf{w}^{T}\mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{w}^{T}\mathbf{x}_{i}>0, so that we cannot generate \mathbf{y}=\begin{bmatrix}\pm1 & \pm1 & \cdots & \pm1 & -1\end{bmatrix}^{T}?

ntvy95 05-05-2016 10:57 AM

Re: Help in understanding proof for VC-dimension of perceptron.
 
Quote:

Originally Posted by MaciekLeks (Post 12334)
Context:
To show that d_{vc}\leq d+1 we need to prove that we cannot shatter any set of d+2 points. In this case we have d+2 vectors \mathbf{x}_{i}, where 1\leq i\leq d+2, hence one vector more than the dimension of the space (our space is (d+1)-dimensional). According to the theory of vector spaces, when we have more vectors than the dimension of the space, they are linearly dependent (one of the vectors is a linear combination of the others). This means that, for instance, our extra point \mathbf{x}_{d+2} can be written (in the (d+1)-dimensional space) as \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i}, where not all a_{i} coefficients are zero.
We need to show that we cannot get all \mathbf{y}=\begin{bmatrix}\pm1 & \pm1 & \cdots & \pm1\end{bmatrix}^{T} (all 2^{d+2} dichotomies), where \mathbf{y} is of size d+2.

Problem:

And now let me state the problem I have with understanding the video proof. Why do we tie the coefficient a_{i} to the label via y_{i}=sign(a_{i}) when the two are not correlated? On what basis can we claim that sign(a_{i})=sign(\mathbf{w}^{T}\mathbf{x}_{i}), and hence that \mathbf{w}^{T}\mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{w}^{T}\mathbf{x}_{i}>0, so that we cannot generate \mathbf{y}=\begin{bmatrix}\pm1 & \pm1 & \cdots & \pm1 & -1\end{bmatrix}^{T}?

Here is my understanding:

For any data set of size d + 2 we must have linear dependence, and the question is: with such inevitable linear dependence, can we still find at least one specific data set on which all 2^N dichotomies can be implemented? The video lecture shows that for any data set of size d + 2 there are some dichotomies (specific to the data set) that the perceptron cannot implement, hence no data set of size d + 2 can be shattered by the perceptron hypothesis set.

The proof considers a dichotomy (specific to the data set) with the following two properties (a small numerical sketch follows the list):
- each x_{i} with non-zero a_{i} gets y_{i} = sign(a_{i}).
- x_{j} gets y_{j} = -1.
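Here is that sketch of why this particular dichotomy defeats every perceptron (arbitrary points, assuming numpy; x_{j} denotes the dependent point x_{d+2}):

import numpy as np

# d+2 arbitrary points in R^(d+1) with x_0 = 1, and the dependence coefficients.
rng = np.random.default_rng(2)
d = 4
X = np.hstack([np.ones((d + 2, 1)), rng.standard_normal((d + 2, d))])
a = np.linalg.solve(X[:d + 1].T, X[d + 1])       # x_j = sum_i a_i x_i

# Take any w that satisfies the first property, sign(w.x_i) = sign(a_i) wherever
# a_i != 0.  One convenient example: solve for w.x_i = a_i exactly.
w = np.linalg.solve(X[:d + 1], a)

# Then w.x_j = sum_i a_i (w.x_i) is a sum of non-negative terms (here sum_i a_i^2),
# so it is strictly positive and sign(w.x_j) = +1: the label y_j = -1 is impossible.
print(X[d + 1] @ w > 0)                          # True

The same conclusion holds for every w with the first property, not just this convenient one, because each term a_{i}(w^{T}x_{i}) is then non-negative and at least one of them is strictly positive.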

Hope this helps.

MaciekLeks 05-06-2016 01:40 AM

Re: Help in understanding proof for VC-dimension of perceptron.
 
Quote:

Originally Posted by ntvy95 (Post 12336)
Here is my understanding:

For any data set of size d + 2 we must have linear dependence, and the question is: with such inevitable linear dependence, can we still find at least one specific data set on which all 2^N dichotomies can be implemented? The video lecture shows that for any data set of size d + 2 there are some dichotomies (specific to the data set) that the perceptron cannot implement, hence no data set of size d + 2 can be shattered by the perceptron hypothesis set.

I agree. I tried to express this in the Context part of my post.

Quote:

Originally Posted by ntvy95 (Post 12336)
The proof considers a dichotomy (specific to the data set) with the following two properties:
- each x_{i} with non-zero a_{i} gets y_{i} = sign(a_{i}).
- x_{j} gets y_{j} = -1.

Hope this helps.

Unfortunately it does not help. I understand the assumption, but I do not understand the source of confidence that we can correlate a_{i} with perceptron outputs.

ntvy95 05-06-2016 02:42 AM

Re: Help in understanding proof for VC-dimension of perceptron.
 
Quote:

Originally Posted by MaciekLeks (Post 12339)
Yes, I tried to write it in the Context part of my post.



Unfortunately it does not help. I understand the assumption, but I do not understand the source of confidence that we can correlate a_{i} with perceptron outputs.

I'm not sure if I understand your point right:

If the perceptron hypothesis set can shatter the data set, then for the same x_{i} there exist w' and w'' such that sign(w'^{T}x_{i}) = 1 and sign(w''^{T}x_{i}) = -1. It means that if the data set is shatterable by the perceptron hypothesis set, then for any x_{i} in the data set we can choose either y_{i} = 1 or y_{i} = -1 and be sure that there exists at least one hypothesis in the perceptron hypothesis set that implements the choice we have made.

Instead of choosing y_{i} = 1 or y_{i} = -1 explicitly, the proof lets the choice of y_{i} depend on the value of a_{i}: y_{i} = sign(a_{i}). Whatever the real value of a_{i} is, sign(a_{i}) can only be 1 or -1, hence y_{i} = 1 or y_{i} = -1. In my understanding, this dependence does not make the chosen dichotomy invalid.

MaciekLeks 05-06-2016 04:34 AM

Re: Help in understanding proof for VC-dimension of perceptron.
 
Quote:

Originally Posted by ntvy95 (Post 12340)
Instead of choosing y_{i} = 1 or y_{i} = -1 explicitly, the proof lets the choice of y_{i} depend on the value of a_{i}: y_{i} = sign(a_{i}). Whatever the real value of a_{i} is, sign(a_{i}) can only be 1 or -1, hence y_{i} = 1 or y_{i} = -1. In my understanding, this dependence does not make the chosen dichotomy invalid.

That's the point. This part is crucial: "the proof lets the choice of y_{i} depend on the value of a_{i}". The only correlation we know for certain is y_{i}=sign(\mathbf{w}^{T}\mathbf{x}_{i}). How do we know (from what definition, theorem, lemma, ...) that sign(a_{i}) behaves the same as sign(\mathbf{w}^{T}\mathbf{x}_{i})? IMHO it cannot be drawn from \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i} (where not all a_{i} coefficients are zero).

ntvy95 05-06-2016 09:08 AM

Re: Help in understanding proof for VC-dimension of perceptron.
 
Quote:

Originally Posted by MaciekLeks (Post 12341)
That's the point. This part is crucial: "the proof lets the choice of y_{i} depend on the value of a_{i}". The only correlation we know for certain is y_{i}=sign(\mathbf{w}^{T}\mathbf{x}_{i}). How do we know (from what definition, theorem, lemma, ...) that sign(a_{i}) behaves the same as sign(\mathbf{w}^{T}\mathbf{x}_{i})? IMHO it cannot be drawn from \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i} (where not all a_{i} coefficients are zero).

I'm not sure if this is the point you are talking about, but here is my understanding:

If the perceptron hypothesis set can shatter the data set, then: if y_{i} = sign(a_{i}) for every x_{i} with a_{i} \neq 0, there must exist a w that satisfies sign(w^{T}x_{i}) = sign(a_{i}) for every such x_{i}. In other words, we first set y_{i} = sign(a_{i}) for every x_{i} with a_{i} \neq 0, and then, if the perceptron hypothesis set can shatter the data set, we must be able to find a w that satisfies y_{i} = sign(a_{i}) = sign(w^{T}x_{i}).
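And then the step that closes the argument (spelling out the video's reasoning as I understand it): since \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i}, any such \mathbf{w} gives \mathbf{w}^{T}\mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{w}^{T}\mathbf{x}_{i}. Every term with a_{i}\neq 0 satisfies a_{i}\mathbf{w}^{T}\mathbf{x}_{i}>0 (because sign(\mathbf{w}^{T}\mathbf{x}_{i})=sign(a_{i})), and the terms with a_{i}=0 contribute nothing, so \mathbf{w}^{T}\mathbf{x}_{d+2}>0 and y_{d+2}=sign(\mathbf{w}^{T}\mathbf{x}_{d+2})=+1. Therefore no \mathbf{w} can produce the dichotomy that also demands y_{d+2}=-1, that dichotomy is never implemented, and the d+2 points are not shattered.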

MaciekLeks 05-09-2016 08:18 AM

Re: Help in understanding proof for VC-dimension of perceptron.
 
Quote:

Originally Posted by ntvy95 (Post 12342)
I'm not sure if this is the point you are talking about, but here is my understanding:

If the perceptron hypothesis set can shatter the data set, then: if y_{i} = sign(a_{i}) for every x_{i} with a_{i} \neq 0, there must exist a w that satisfies sign(w^{T}x_{i}) = sign(a_{i}) for every such x_{i}. In other words, we first set y_{i} = sign(a_{i}) for every x_{i} with a_{i} \neq 0, and then, if the perceptron hypothesis set can shatter the data set, we must be able to find a w that satisfies y_{i} = sign(a_{i}) = sign(w^{T}x_{i}).

You've built up the intuition for the proof. I knew this, but when I read what you wrote I started to have less doubt. Thanks for that. I still have some doubt, but it's hard even to describe. It's high time to move on.

