LFD Book Forum Lecture 3 Q&A independence of parameter inputs

#1
04-10-2013, 02:55 PM
 Moobb Junior Member Join Date: Apr 2013 Posts: 9
Lecture 3 Q&A independence of parameter inputs

There is a discussion about the importance of having independent input data and how this propagates to features. Is it true that features necessarily inherit independence from the data? If they don't, how bad is that? For example, in Finance there are quite a few studies that use support vector machines with a grid defined by different moving averages, which overlap (1w, 1m, etc.). In this case the features are clearly not independent. Would this be seen as a questionable procedure?
#2
04-10-2013, 03:34 PM
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143
Re: Lecture 3 Q&A independence of parameter inputs

Quote:
 Originally Posted by Moobb There is a discussion about the importance of having independent input data and how this propagates to features. Is it true that features necessarily inherit independence from the data? If they don't, how bad is that? For example, in Finance there are quite a few studies that use support vector machines with a grid defined by different moving averages, which overlap (1w, 1m, etc.). In this case the features are clearly not independent. Would this be seen as a questionable procedure?
Could you be more precise about which place in the book or lectures you are referring to regarding independence?

With regard to the choice of features for representing financial data, it is not difficult to remove the more obvious dependencies, but it is not clear that this is crucial. As an analog, suppose you have a basis for the plane (1,0) and (1,1). There is clearly a correlation between these two axes in your sense, but a simple linear transformation to a basis of (1,0) and (0,1) gets rid of it. If you are going to use kernels, you will be permitting many transformations of this type or others. The same is true of moving averages, where you can replace them with carefully chosen differences between them if you wish, but it may not be crucial.
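The basis example above can be checked numerically. Here is a minimal NumPy sketch (the sample size and seed are my own choices): points are built from independent coefficients on the basis (1,0), (1,1), so their standard coordinates are correlated; solving for the coefficients with the change-of-basis matrix removes the correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent "true" coefficients a, b
a = rng.normal(size=10000)
b = rng.normal(size=10000)

# Express each point in the skewed basis (1,0) and (1,1):
# point = a*(1,0) + b*(1,1) = (a + b, b), so the standard
# coordinates are correlated even though a and b are not.
X = np.column_stack([a + b, b])
corr_before = np.corrcoef(X.T)[0, 1]

# Change of basis: solve for the coefficients w.r.t. (1,0), (1,1)
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # columns are the basis vectors
coeffs = np.linalg.solve(B, X.T).T  # recovers (a, b)
corr_after = np.corrcoef(coeffs.T)[0, 1]

print(round(corr_before, 2))  # strongly correlated, around 0.7
print(round(corr_after, 2))   # essentially zero
```

The transformation is invertible, so nothing about the points themselves is lost — only the description changes.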
#3
04-10-2013, 04:08 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: Lecture 3 Q&A independence of parameter inputs

Quote:
 Originally Posted by Moobb There is a discussion about the importance of having independent input data and how this propagates to features. Is it true that features necessarily inherit independence from the data? If they don't, how bad is that? For example, in Finance there are quite a few studies that use support vector machines with a grid defined by different moving averages, which overlap (1w, 1m, etc.). In this case the features are clearly not independent. Would this be seen as a questionable procedure?
Could you use the [lecture3] macro (see the "Including a lecture video segment" thread at the top) to pinpoint the part you are referring to? Thank you.
__________________
Where everyone thinks alike, no one thinks very much
#4
04-10-2013, 06:31 PM
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143
Re: Lecture 3 Q&A independence of parameter inputs

Moobb, having rewatched the Q&A, my understanding is this. The independence that is important is that the input points are independently selected. Intuitively, they are a representative sample, rather than one which gives disproportionate importance to some region of the input space.

With regard to the features, these are a generalisation of co-ordinates which are used to describe the input data points (eg the value of a moving average is a feature which can be thought of as a co-ordinate, even though it is defined in terms of many co-ordinates). The independence that is preserved after a transformation is the independence between the data points, not the features: the set of points remains a representative sample of the (transformed) space of possible inputs.
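The moving-average-as-feature idea can be sketched in a few lines (the window lengths and the random-walk "prices" below are illustrative assumptions, not from the thread). Each window produces one feature per time step, and overlapping windows give correlated — but still usable — features:

```python
import numpy as np

def moving_average_features(prices, windows=(5, 20)):
    # Each window yields one feature per time step: the trailing mean.
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    w_max = max(windows)
    feats = np.column_stack([
        [prices[t - w + 1:t + 1].mean() for t in range(w_max - 1, n)]
        for w in windows
    ])
    return feats  # shape: (n - w_max + 1, len(windows))

# A random walk standing in for a price series
prices = np.cumsum(np.random.default_rng(1).normal(size=200)) + 100
X = moving_average_features(prices)
print(X.shape)  # (181, 2)
```

Each row of `X` is one data point described by two features; the correlation between the columns is exactly the kind of feature dependence discussed above, and it does not affect the independence of the rows.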
#5
04-11-2013, 01:32 AM
 Moobb Junior Member Join Date: Apr 2013 Posts: 9
Re: Lecture 3 Q&A independence of parameter inputs

Many thanks for your answer, and sorry for not including the reference for the lecture: it is at 1:10:40 (I can't include the tag directly right now). I believe I understand it now: if the input points are not independent, then chances are the model won't generalise well to the full set of possible inputs (taking an example from digit recognition, enlarging an image of the digit 8 by a factor of two teaches you nothing new). Using the analogy to coordinate systems, if the features are not independent, then you may have less information than you suppose, but it may still be more practical than devising a feature that incorporates only new elements; in practice the algorithm will benefit only from the new information the feature carries. I guess there is a practical limit in terms of model complexity at some point? Or you may end up incorporating just noise, hence the occasional use of dimensionality reduction before constructing your features? Thanks again!
#6
04-11-2013, 01:49 AM
 Rahul Sinha Junior Member Join Date: Apr 2013 Posts: 9
Re: Lecture 3 Q&A independence of parameter inputs

Quote:
 Originally Posted by Elroch Moobb, having rewatched the Q&A, my understanding is this. The independence that is important is that the input points are independently selected. Intuitively, they are a representative sample, rather than one which gives disproportionate importance to some region of the input space. With regard to the features, these are a generalisation of co-ordinates which are used to describe the input data points (eg the value of a moving average is a feature which can be thought of as a co-ordinate, even though it is defined in terms of many co-ordinates). The independence that is preserved after a transformation is the independence between the data points, not the features: the set of points remains a representative sample of the (transformed) space of possible inputs.
Awesome explanation.

To add an example: consider a Gaussian distribution with a non-diagonal covariance matrix in 2D space. The features (read: axes) are clearly correlated, i.e. not independent. Now perform a change of coordinate system, taking the eigenvector directions as the new coordinate axes. No information is lost in the transformation (the space did not shrink or expand!), but we now have independent, orthonormal coordinates. As pointed out, what is preserved is the "independence between the data points, not the features".
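This example can be reproduced directly (a sketch; the covariance matrix is an arbitrary choice). Rotating into the eigenvector coordinates of the sample covariance makes the off-diagonal covariance vanish while leaving the points themselves untouched:

```python
import numpy as np

rng = np.random.default_rng(0)

# 2D Gaussian with a non-diagonal covariance: the two features correlate.
cov = np.array([[2.0, 1.2],
                [1.2, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=20000)

# Rotate into the eigenvector coordinate system of the sample covariance.
sample_cov = np.cov(X.T)
_, eigvecs = np.linalg.eigh(sample_cov)
Z = X @ eigvecs                      # same points, new coordinates

# The off-diagonal covariance vanishes; the orthogonal map is reversible,
# so no information about the points is lost.
rotated_cov = np.cov(Z.T)
print(np.round(rotated_cov, 3))      # approximately diagonal
```

Because the data here are jointly Gaussian, the decorrelated coordinates really are independent — a point that matters later in the thread.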
#7
04-11-2013, 06:40 AM
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143
Re: Lecture 3 Q&A independence of parameter inputs

First, let's try this neat lecture tag with the time Moobb gave (converted to seconds):

Quote:
 Originally Posted by Moobb Many thanks for your answer, and sorry for not including the reference for the lecture: it is at 1:10:40 (I can't include the tag directly right now). I believe I understand it now: if the input points are not independent, then chances are the model won't generalise well to the full set of possible inputs (taking an example from digit recognition, enlarging an image of the digit 8 by a factor of two teaches you nothing new). Using the analogy to coordinate systems, if the features are not independent, then you may have less information than you suppose, but it may still be more practical than devising a feature that incorporates only new elements; in practice the algorithm will benefit only from the new information the feature carries. I guess there is a practical limit in terms of model complexity at some point? Or you may end up incorporating just noise, hence the occasional use of dimensionality reduction before constructing your features? Thanks again!
Yes, non-independence of input data damages generalisation. But reversible transformations of any type don't reduce (or increase) the information content. In principle, there can never be a disadvantage in having extra features any more than there is a disadvantage in having more input points (you can just ignore some of them if you like), but some methods may not perform so well if you use more features.

Regarding financial time series, the assumptions of machine learning are not strictly true. It is recognised empirically that there is non-stationary behaviour (P(x) and P(y | x) change somewhat over time). [This may apply more to P(x) than P(y | x). For example, if you used 1990s data to create a model of the stock market and then used it in 2000, you might experience unexpected behaviour because you had never seen such market conditions before: the input data points could be a long way from any you had seen before. This is essentially the same reason that traders using any methods may lose when market conditions change a lot.]

As well as this, there is the non-deterministic component of the behavior. This effectively reduces the size of the input data set for the purpose of predicting the deterministic component (which is the main aim). This may be an issue, since the total amount of data available is rather limited. [If you have a stationary process and as much data as you want, I believe you can get as close to perfect knowledge of the deterministic component as you wish. But of course the non-deterministic component remains here as well].
Quote:
 Originally Posted by Rahul Sinha Awesome explanation. To add an example: consider a Gaussian distribution with a non-diagonal covariance matrix in 2D space. The features (read: axes) are clearly correlated, i.e. not independent. Now perform a change of coordinate system, taking the eigenvector directions as the new coordinate axes. No information is lost in the transformation (the space did not shrink or expand!), but we now have independent, orthonormal coordinates. As pointed out, what is preserved is the "independence between the data points, not the features".

For more see:
http://en.wikipedia.org/wiki/Multiva...l_distribution
https://en.wikipedia.org/wiki/Princi...onent_analysis
https://www.cs.princeton.edu/courses...notes/0419.pdf
http://www.cs.unm.edu/~williams/cs530/kl3.pdf
Bottom line is that it is always theoretically possible to remove the linear correlations between features using the transformation given by principal component analysis, but they can only be made independent in the case where they are jointly normally distributed (this means that any linear combination of them is normal).
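The caveat about joint normality matters: decorrelated features need not be independent. A standard counterexample (my own choice of distributions, not from the thread): with x symmetric around zero, y = x² is completely determined by x, yet their linear correlation is zero, so PCA has nothing to remove.

```python
import numpy as np

rng = np.random.default_rng(0)

# x is symmetric around 0, and y = x**2 is a deterministic function of x,
# so the pair is as dependent as it gets...
x = rng.uniform(-1, 1, size=100000)
y = x ** 2

# ...yet the linear correlation is essentially zero. Zero correlation
# implies independence only in the jointly Gaussian case.
corr = float(np.corrcoef(x, y)[0, 1])
print(round(corr, 2))  # ~ 0.0

# Dependence shows up as soon as we condition on x:
print(round(float(y.mean()), 2))           # ~ 0.33 overall
print(round(float(y[x > 0.5].mean()), 2))  # ~ 0.58 given x > 0.5
```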

Moreover, this is always a reversible process preserving the independence of your input data points.

As a footnote it is worth mentioning that a key reason market prediction is not entirely hopeless is that markets do not exhibit perfectly Gaussian behaviour. It seems that the non-stationary behaviour is more important than the Hurst exponent being permanently greater than (or less than) 0.5 (it would be exactly 0.5 for simple Brownian motion). e.g. see http://www.optimaltrader.net/old/hur...ictability.pdf
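For readers who want to play with the Hurst exponent mentioned above, here is a rough sketch of one common estimator — the scaling of lagged differences, std(x[t+lag] − x[t]) ∝ lag^H. The estimator choice and parameters are mine, not the linked paper's:

```python
import numpy as np

def hurst_exponent(series, lags=range(2, 50)):
    # Estimate H from the scaling law std(x[t+lag] - x[t]) ~ lag**H
    # via a log-log regression over the given lags.
    series = np.asarray(series, dtype=float)
    tau = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    slope = np.polyfit(np.log(list(lags)), np.log(tau), 1)[0]
    return slope

# For a simple random walk (discrete Brownian motion), H should be
# close to 0.5; persistent series give H > 0.5, mean-reverting H < 0.5.
walk = np.cumsum(np.random.default_rng(0).normal(size=20000))
h = hurst_exponent(walk)
print(round(h, 2))  # close to 0.5 for a simple random walk
```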
#8
04-11-2013, 08:34 PM
 Moobb Junior Member Join Date: Apr 2013 Posts: 9
Re: Lecture 3 Q&A independence of parameter inputs

Elroch, thank you so much for your help. Regarding your point about non-stationarity and the difficulty it introduces for financial forecasting, do you see that as necessarily invalidating any attempt at machine learning forecasting in Finance? Could it be that the time series itself is non-stationary, but some specific patterns within it (which people try to capture with technical indicators, for example) are stable? Those technical indicators would then be your features, and maybe when we condition on them the time series becomes more stationary? I think another main use in Finance is classification, which can then be used for portfolio allocation, for example.
Many thanks again!!
