View Single Post
  #5  
Old 05-05-2016, 10:57 AM
ntvy95 ntvy95 is offline
Member
 
Join Date: Jan 2016
Posts: 37
Default Re: Help in understanding proof for VC-dimension of perceptron.

Quote:
Originally Posted by MaciekLeks View Post
Context:
To show that d_{vc}\leq d+1 we need to prove we cannot shatter any set od d+2 points. In this case we have d+2 vectors of \mathbf{x}_{i}, where 1\leq i\leq d+2, hence we have +1 vector than the total number of dimensions (our space is d+1 -dimensional). According to the theory of vector spaces when we have more vectors than dimensions of the space then they are linearly dependent (one of the vectors is a linear combination of the others). Which means that for instance our extra point \mathbf{x}_{d+2}, can be computed by the equation (in d+1-dimensional space): \mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{x}_{i}, (where not all a_{i} coefficients are zeros).
We need to show that we can't get \mathbf{y}=\begin{bmatrix}\pm1\\\pm1\\\pm1\\\pm1\\\pm1\end{bmatrix} ( 2^{d+2}) dichotomies, where \mathbf{y} is of d+2 size.

Problem:

And now let me define my problem I have with understanding the video proof. Why do we correlate a_{i} coefficient with the y_{i}=sign(a_{i}) while they don't correlate? On what basis we can draw that sign(a_{i})=sign(\mathbf{w}^{T}\mathbf{x}_{i}), and that's why \mathbf{w}^{T}\mathbf{x}_{d+2}=\sum_{i=1}^{d+1}a_{i}\mathbf{w}^{T}\mathbf{x}_{i}>0, hence we can't generate \mathbf{y}=\begin{bmatrix}\pm1\\\pm1\\...\\\pm1\\-1\end{bmatrix}?
Here is my understanding:

For any data set of size d + 2, we must have linear dependence, and the question is: With such inevitable linear dependence, can we find at least a specific data set that can be implemented 2^N dichotomies? The video lecture shows that for any data set of size d + 2, there are some dichotomies (specific to the data set) that the perceptron cannot implement, hence there's no such a data set of size d + 2 can be shattered by the perceptron hypothesis set.

The proof tries to consider some dichotomies (specific to the data set) have two following properties:
- x_{i} with non-zero a_{i} get y_{i} = sign(a_{i}).
- x_{j} gets y_{j} = -1.

Hope this helps.
Reply With Quote