Quote:
Originally Posted by Elroch
Moobb, having rewatched the Q&A, my understanding is this. The independence that is important is that the input points are independently selected. Intuitively, they are a representative sample, rather than one which gives disproportionate importance to some region of the input space.
With regard to the features, these are a generalisation of coordinates which are used to describe the input data points (eg the value of a moving average is a feature which can be thought of as a coordinate, even though it is defined in terms of many coordinates). The independence that is preserved after a transformation is the independence between the data points, not the features: the set of points remains a representative sample of the (transformed) space of possible inputs.

Awesome explanation.
To add an example: consider a 2D Gaussian distribution with a non-diagonal covariance matrix. The features (read: axes) are clearly correlated, and so not independent. Now change the coordinate system so that the eigenvectors of the covariance matrix become the new axes. No information is lost in the transformation (it is a rotation, so the space neither shrinks nor expands), but the new orthonormal coordinates are uncorrelated, which for a Gaussian means independent. As pointed out, what is preserved is the "independence between the data points, not the features".
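
Here is a rough numerical sketch of the example above, in case it helps. It assumes NumPy; the covariance values, the sample size, and the helper name decorrelate are made up for illustration and are not from the lecture. It draws independent points from a correlated 2D Gaussian, rotates them into the eigenvector basis of the sample covariance, and compares the covariance before and after.

```python
import numpy as np

def decorrelate(points):
    """Rotate points into the eigenvector basis of their sample covariance.

    Hypothetical helper for illustration only.
    """
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices; its eigenvectors are orthonormal columns
    _, eigvecs = np.linalg.eigh(cov)
    # Each row is still one data point, now expressed in the new coordinates
    rotated = centered @ eigvecs
    return rotated, eigvecs

rng = np.random.default_rng(0)

# Illustrative non-diagonal covariance: the two features are correlated
true_cov = np.array([[2.0, 1.2],
                     [1.2, 1.0]])

# Each row is drawn independently of the others
points = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=5000)

rotated, eigvecs = decorrelate(points)

cov_before = np.cov(points, rowvar=False)
cov_after = np.cov(rotated, rowvar=False)

print("covariance before:\n", cov_before)
print("covariance after:\n", cov_after)
```

The off-diagonal entries of the second matrix should come out as zero up to floating-point error, while the number of points and their distances from the mean are unchanged. The rotation only relabels the coordinates of each point; it does nothing to how the points were sampled.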