LFD Book Forum > General Discussion of Machine Learning

#1 | 05-15-2013, 05:18 PM
jlaurentum (Member; Join Date: Apr 2013; Location: Venezuela; Posts: 41)
Machine Learning and census models

Hello All:

I have the data for the 14 Venezuelan censuses that were carried out between 1873 and 2011. That is, I have N = 14 data points.

I am fitting these data to several different models (P is the population as a function of time t):
  1. The Malthus model, with one parameter r: \frac{dP}{dt}=rP. This model implies unlimited "runaway" exponential growth.
  2. The Verhulst (logistic) model, with 2 parameters, the growth rate r and the carrying capacity K: \frac{dP}{dt}=rP\left(1-\frac{P}{K}\right). For the Venezuelan data this model behaves much like the Malthus model, because the fit yields a ridiculously high carrying capacity.
  3. To overcome the restriction that the inflection point of the sigmoidal Verhulst curve is fixed at the mid-population K/2, there are other models such as the Richards, Blumberg and Generalized Logistic growth models. Each of these has more parameters: 3 for the Richards model, 4 for the Blumberg model, and 5 for the Generalized Logistic growth model, which encompasses the previous two and looks like this: \frac{dP}{dt}=rP^\alpha\left(1-\left(\frac{P}{K}\right)^\beta\right)^\gamma.
  4. There is another model I tried out: a 2-compartment model in which each compartment behaves like a Verhulst model, so this model has 4 parameters overall. It is the model that gave the best fit: almost no in-sample error.

So I have 14 data points and I'm estimating 2, 3, 4 and even 5 parameters... Is this a ridiculously hopeless situation? By the way, I fit these models by "running" them (dynamic simulation) with numerical differential-equation solvers and estimating the parameters by nonlinear optimization. I don't know whether this can be considered a learning-from-data situation: in some cases, like the Generalized Logistic model, the search space is very complex and there are many local minima where the optimization algorithms settle; in other cases the optimization converges to a single minimum-error point. At any rate, I have a few data points and different models with different VC dimensions, hence the analogy with learning from data...
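In rough terms, the fitting procedure is something like the following sketch (using SciPy, with synthetic placeholder values rather than my actual data or code; the function names are made up for illustration):

Code:
# Minimal sketch of the "simulate and fit" approach for the Verhulst model.
# t_obs/P_obs below are synthetic placeholders so the snippet runs on its own;
# they are NOT the census figures.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def verhulst_rhs(t, P, r, K):
    # dP/dt = r * P * (1 - P/K)
    return r * P * (1.0 - P / K)

def simulate(params, t_obs, P0):
    # Integrate the ODE from the first census year and return P at the census years.
    r, K = params
    sol = solve_ivp(verhulst_rhs, (t_obs[0], t_obs[-1]), [P0],
                    args=(r, K), t_eval=t_obs)
    return sol.y[0]

def residuals(params, t_obs, P_obs):
    # In-sample errors between simulated and observed populations.
    return simulate(params, t_obs, P_obs[0]) - P_obs

# Synthetic stand-in for 14 census points (year, population in millions).
t_obs = np.linspace(1873.0, 2011.0, 14)
rng = np.random.default_rng(0)
P_obs = simulate((0.03, 40.0), t_obs, 1.8) * (1.0 + 0.02 * rng.standard_normal(14))

fit = least_squares(residuals, x0=(0.02, 50.0), args=(t_obs, P_obs),
                    bounds=([1e-4, 1.0], [1.0, 1e3]))
r_hat, K_hat = fit.x
print(f"estimated r = {r_hat:.4f}, K = {K_hat:.1f}")

The other models differ only in the right-hand side of the ODE and in the number of parameters, so the same wrapper applies to all of them.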

What I'd like to do is generalize and predict with a certain confidence level once I have a "good" model: what carrying capacity does the model predict? When is the inflection point reached? What are the implications for the demographic transition stages the country goes through? And so on. I realize there is no way to predict this given the possibility of wars, famines and epidemics that can drastically alter population dynamics and cannot be accounted for by any model. However, under the assumption that whatever population-dynamics mechanisms a model implies have been at work in the past and will continue to operate in the future, can the tools of learning from data provide any way to validate such models? Or is the situation hopeless given the small N and the large VC dimensions?
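(For the plain Verhulst model those first two quantities follow directly from the fitted parameters: the solution of \frac{dP}{dt}=rP\left(1-\frac{P}{K}\right) with P(t_0)=P_0 is P(t)=\frac{K}{1+Ae^{-r(t-t_0)}} with A=\frac{K-P_0}{P_0}, so the fitted K is itself the predicted carrying capacity and the inflection point, where P=K/2, is reached at t^\ast=t_0+\frac{\ln A}{r}, assuming P_0<K/2 so that the inflection lies in the future of t_0. The generalized models require the analogous calculation with their own parameters.)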
#2 | 05-17-2013, 02:32 AM
htlin (NTU; Join Date: Aug 2009; Location: Taipei, Taiwan; Posts: 601)
Re: Machine Learning and census models

My opinion is: without enough data, it would be difficult for Machine Learning/Data Mining, or even Human Intelligence, to reach any conclusive statement about the process behind the data. Hope this helps.
__________________
When one teaches, two learn.
#3 | 05-17-2013, 07:17 PM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: Machine Learning and census models

How about this approach:

1. Don't look at the data! If you have looked at the data, find a machine learning expert who has not looked at the data and ask him to do it for you. [Unless you have some method of forgetting what you have seen, that is.]

2. Pick a learning method suited to the size of the data set and use leave-one-out cross-validation to find the optimal hypothesis.

The interesting question is which learning method to use in step 2: something pretty general and regularized (see the sketch after this list).

3. Bear in mind that extrapolation of non-stationary processes is not necessarily possible (cross-validation has an easy time of it, because most of the held-out points are interior points rather than at the ends).
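For instance, a rough sketch of step 2 with scikit-learn, using regularized polynomial fits of log-population as a stand-in for "something pretty general and regularized" (placeholder data, not the census figures; the same leave-one-out loop works with the ODE models if their fitting routine is wrapped as an estimator):

Code:
# Leave-one-out cross-validation over a few regularized model classes.
# `years` and `pop` are placeholders, not the actual census data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
years = np.linspace(1873.0, 2011.0, 14).reshape(-1, 1)
pop = 1.8 * np.exp(0.022 * (years.ravel() - 1873.0)) * (1.0 + 0.02 * rng.standard_normal(14))
y = np.log(pop)  # fit log-population so exponential growth is roughly linear

loo = LeaveOneOut()
for degree in (1, 2, 3):
    model = make_pipeline(StandardScaler(), PolynomialFeatures(degree), Ridge(alpha=1.0))
    errors = []
    for train_idx, test_idx in loo.split(years):
        model.fit(years[train_idx], y[train_idx])
        pred = model.predict(years[test_idx])
        errors.append((pred[0] - y[test_idx][0]) ** 2)
    print(f"degree {degree}: LOO mean squared error = {np.mean(errors):.6f}")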
#4 | 05-18-2013, 03:39 AM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: Machine Learning and census models

On reflection, I suspect that if the aim is extrapolation into the future, a more principled alternative to function approximation with leave-one-out cross-validation might be a variation where the in-sample data for each run consist only of the data preceding the out-of-sample point. (The reason is that fitting intermediate points is easier, so the standard procedure could easily lead to overfitting.)
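A minimal sketch of that variant on a short series like the 14 census points: for each k, fit only on the first k points and score the prediction of point k+1. The data and the toy model (a straight-line fit of log-population) are placeholders.

Code:
# "Train only on the past" validation: fit on points 1..k, predict point k+1.
# Placeholder data and a toy model for illustration only.
import numpy as np

t = np.linspace(1873.0, 2011.0, 14)
rng = np.random.default_rng(0)
logp = np.log(1.8) + 0.022 * (t - 1873.0) + 0.02 * rng.standard_normal(14)

min_train = 5  # need a few points before the first prediction
errors = []
for k in range(min_train, len(t)):
    coeffs = np.polyfit(t[:k], logp[:k], deg=1)
    pred = np.polyval(coeffs, t[k])
    errors.append((pred - logp[k]) ** 2)

print(f"forward-chaining validation MSE: {np.mean(errors):.6f}")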

Has anyone any views on this rather important general situation?
#5 | 05-19-2013, 02:35 AM
yaser (Caltech; Join Date: Aug 2009; Location: Pasadena, California, USA; Posts: 1,476)
Re: Machine Learning and census models

Quote:
Originally Posted by Elroch
On reflection, I suspect that if the aim is extrapolation into the future, a more principled alternative to function approximation with leave-one-out cross-validation might be a variation where the in-sample data for each run consist only of the data preceding the out-of-sample point.
In financial time series, the data set is often constructed along these lines: a sliding window of, say, T points is used to construct the training examples, with the first T-1 points taken as input and the T-th point taken as output.
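A small sketch of that construction (the helper name and the placeholder series are made up for illustration):

Code:
# Build training examples from a sliding window of T points:
# the first T-1 points are the input, the T-th point is the output.
import numpy as np

def sliding_window_examples(series, T):
    X, y = [], []
    for start in range(len(series) - T + 1):
        window = series[start:start + T]
        X.append(window[:-1])
        y.append(window[-1])
    return np.array(X), np.array(y)

series = np.arange(20.0)  # placeholder series
X, y = sliding_window_examples(series, T=5)
print(X.shape, y.shape)   # (16, 4) (16,)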
__________________
Where everyone thinks alike, no one thinks very much
#6 | 05-19-2013, 05:18 AM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: Machine Learning and census models

Quote:
Originally Posted by yaser
In financial time series, the data set is often constructed along these lines: a sliding window of, say, T points is used to construct the training examples, with the first T-1 points taken as input and the T-th point taken as output.
Indeed: I have seen some papers use this natural approach effectively. Since a predictive method is never going to have anything but past data to work with, this at least must be part of the design.

However, with much more data than the 14 points described here, methods of the type I have researched in the past use two levels of design (similar to those discussed in your lectures and examples about cross-validation) to select the class of model (though generally via hyperparameters rather than the order of a polynomial approximation). The input for each example may be T consecutive past values or some other derived data, and the output some single value in the future. Thousands of such examples, each with T inputs and one output, are constructed from a single time series.

The aim is still to predict the output from some new set of T inputs. It was in designing a method to do this that my point arose (and I noticed that jlaurentum's example is analogous).

One approach treats the several thousand examples as a sample from a distribution and applies cross-validation followed by training on all the data to generate a hypothesis for that distribution. This seems fine, but it carries an implicit assumption of stationarity. When intermediate points are used for validation, future points are being used to build the model; yet the behaviour of the system in the future may depend on how it behaved at those intermediate points (this is what people base their decisions on, and what their programs use as data). Hence there is a subtle cheat going on here related to non-stationarity. Whether it is harmful or not, I am not yet sure.

This is how I arrived at the alternative approach of only using validation data that lies in the future of the data used to select hyperparameters and to generate the final model. It is something rather like cross-validation, but different (as the data used in training is restricted). The choice of the training and validation window sizes is interesting: in both cases there is a compromise between wanting lots of points and not wanting the window to extend over too much time (due to non-stationarity). The approach has some similarity to a method used by technical traders called walk-forward optimisation, which uses future out-of-sample errors to validate a method.
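A minimal sketch of such a scheme using scikit-learn's TimeSeriesSplit, an expanding-window split whose validation fold always lies in the future of its training data (the series, the window length and the ridge model below are placeholders, not the setup described above):

Code:
# "Future-only" validation of a hyperparameter (here, the ridge penalty alpha).
# Placeholder series and model; validation folds always follow the training data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
t = np.arange(200.0)
series = np.sin(0.05 * t) + 0.1 * rng.standard_normal(t.size)

T = 6  # T-1 past values predict the next one
X = np.array([series[i:i + T - 1] for i in range(len(series) - T + 1)])
y = np.array([series[i + T - 1] for i in range(len(series) - T + 1)])

tscv = TimeSeriesSplit(n_splits=5)
for alpha in (0.1, 1.0, 10.0):
    errs = []
    for train_idx, val_idx in tscv.split(X):
        model = Ridge(alpha=alpha).fit(X[train_idx], y[train_idx])
        errs.append(np.mean((model.predict(X[val_idx]) - y[val_idx]) ** 2))
    print(f"alpha = {alpha}: walk-forward validation MSE = {np.mean(errs):.5f}")

Because consecutive windows overlap, neighbouring training and validation examples share raw series values; leaving a small gap between the folds (recent versions of TimeSeriesSplit take a gap argument) is one way to keep them more independent.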

This modified approach is not necessarily completely immune to non-stationarity. It relies on the models being sophisticated enough to capture the changing behaviour of the system as a whole, so it may fail if that ceases to be true at some point.

I am not aware of much published work on the role non-stationarity plays in this scenario.

One thing that strikes me is the way that selecting hyperparameters through cross-validation (or the variant above) and then generating a predictive hypothesis breaks the learning process into two separate parts in quite a surprising way (albeit a very neat and effective one). I wonder whether there are other ways to organise the overall learning process that might give good results?