05-15-2013, 05:18 PM
jlaurentum
Join Date: Apr 2013
Location: Venezuela
Posts: 41
Machine Learning and census models

Hello All:

I have the data for 14 Venezuelan censuses conducted between 1873 and 2011. That is, I have N=14 data points.

I am fitting this data to several different models (P is the population function):
  1. The Malthus model, with one parameter, the growth rate r: \frac{dP}{dt}=rP. This model implies unlimited "runaway" exponential growth.
  2. The Verhulst (logistic) model, with 2 parameters, the growth rate r and the carrying capacity K: \frac{dP}{dt}=rP\left(1-\frac{P}{K}\right). For the Venezuelan data, this model behaves pretty much like the Malthus model because the carrying capacity it predicts is ridiculously high.
  3. In order to overcome the limitation that the inflection point of the sigmoidal Verhulst curve is fixed at half the carrying capacity, K/2, there are other models such as the Richards, Blumberg and Generalized Logistic Growth models. Each of these has more parameters: 3 for the Richards model, 4 for the Blumberg model and 5 for the Generalized Logistic Growth model, which encompasses the previous 2 and looks like this: \frac{dP}{dt}=rP^\alpha\left(1-\left(\frac{P}{K}\right)^\beta\right)^\gamma.
  4. There is another model I tried out: a 2-compartment model where each compartment behaves like a Verhulst model, so this 2-compartment model has 4 parameters overall. This is the model that gave the best fit: almost no in-sample error.
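
To make the relation between these models concrete, here is how the special cases and the inflection point follow from the Generalized Logistic equation in item 3. This is only a sketch: it assumes \alpha, \beta, \gamma > 0 and 0 < P < K, and the assignment of exponents to the Richards and Blumberg models follows the usual convention, which is an assumption on my part rather than something stated above. The inflection point comes from setting the derivative of the right-hand side with respect to P to zero.

```latex
% Special cases of dP/dt = r P^\alpha (1 - (P/K)^\beta)^\gamma
\begin{align*}
\alpha=\beta=\gamma=1 &: \quad \frac{dP}{dt}=rP\left(1-\frac{P}{K}\right)
  && \text{(Verhulst, 2 parameters)}\\
\alpha=\gamma=1 &: \quad \frac{dP}{dt}=rP\left(1-\left(\frac{P}{K}\right)^\beta\right)
  && \text{(Richards, 3 parameters)}\\
\beta=1 &: \quad \frac{dP}{dt}=rP^\alpha\left(1-\frac{P}{K}\right)^\gamma
  && \text{(Blumberg, 4 parameters)}
\end{align*}

% Inflection point: d/dP of the right-hand side equals zero.
% With u = (P/K)^\beta, dividing out the common factor
% r P^{\alpha-1} (1-u)^{\gamma-1} leaves a linear equation in u.
\[
\alpha\,(1-u)-\beta\gamma\,u=0,\qquad u=\left(\frac{P}{K}\right)^\beta
\;\Longrightarrow\;
P_{\mathrm{inf}}=K\left(\frac{\alpha}{\alpha+\beta\gamma}\right)^{1/\beta}
\]
```

For \alpha=\beta=\gamma=1 this gives P_{\mathrm{inf}}=K/2, which is the fixed midpoint of the Verhulst curve; the extra exponents are what let the inflection point move away from K/2.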

So I have 14 data points and I'm estimating 2, 3, 4 and even 5 parameters... Is this a ridiculously hopeless situation? By the way, I fit these models by "running" them (dynamic simulation) with numerical differential equation solvers and estimating the parameters by nonlinear optimization. I don't know if this can be considered a learning-from-data situation. In some cases, as with the Generalized Logistic model, the search space is very complex and there are many, many local minima where the optimization algorithms settle; in other cases the optimization algorithm converges to a single minimum-error point. At any rate, the situation is that I have some data points and different models with different VC dimensions, hence the analogy with learning from data...
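
The fitting procedure described above (simulate the ODE numerically, then tune the parameters by nonlinear optimization) can be sketched roughly as follows for the Verhulst model. This is only an illustration under assumptions: it uses SciPy's `solve_ivp` and `least_squares`, pins the initial population to the first observation, and runs on synthetic noisy data generated from a known Verhulst curve, not on the actual census figures. The names `logistic_rhs`, `simulate`, `residuals` and every numeric value are made up for the example.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares


def logistic_rhs(t, P, r, K):
    """Right-hand side of the Verhulst model dP/dt = r P (1 - P/K)."""
    return r * P * (1.0 - P / K)


def simulate(params, t_obs, P0):
    """Integrate the model from t_obs[0] and return P at the observation times."""
    r, K = params
    sol = solve_ivp(
        logistic_rhs,
        (t_obs[0], t_obs[-1]),
        [P0],
        t_eval=t_obs,
        args=(r, K),
        rtol=1e-10,
        atol=1e-12,
    )
    return sol.y[0]


def residuals(params, t_obs, P_obs):
    """Model minus data; the initial condition is pinned to the first observation."""
    return simulate(params, t_obs, P_obs[0]) - P_obs


# Synthetic stand-in data (NOT the census figures): 14 points over 138 "years",
# generated from known parameters with 1% multiplicative noise.
rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 138.0, 14)
true_r, true_K = 0.05, 40.0
P_obs = simulate((true_r, true_K), t_obs, 1.0)
P_obs = P_obs * (1.0 + 0.01 * rng.standard_normal(t_obs.size))

# Bounded nonlinear least squares; diff_step is enlarged a little so the
# finite-difference Jacobian is not dominated by ODE solver error.
fit = least_squares(
    residuals,
    x0=[0.03, 60.0],
    args=(t_obs, P_obs),
    bounds=([1e-4, 1.0], [1.0, 1e3]),
    diff_step=1e-6,
)
r_hat, K_hat = fit.x
rmse = np.sqrt(np.mean(fit.fun ** 2))  # in-sample error of the fitted model
print(f"r = {r_hat:.4f}, K = {K_hat:.2f}, in-sample RMSE = {rmse:.3f}")
```

For the models with more parameters, the same structure applies with a different right-hand side, but the result may depend on the starting point `x0`, so restarting from several initial guesses is one way to see how many distinct local minima the optimizer lands in.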

What I'd like to do is to generalize and predict with a certain confidence level once I have a "good" model: What population carrying capacity does the model predict? When is the inflection point reached? What are the implications for the demographic transition stages the country goes through? And so on. I realize that there is no way to predict this with certainty, given that wars, famines and epidemics can drastically alter population dynamics and cannot be accounted for by any model. However, under the assumption that whatever population-dynamics mechanisms a model implies have been at work in the past will continue to operate in the future, can the learning-from-data tools provide any way to validate such models? Or is the situation hopeless given the small N and large VC dimensions?
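
To make the validation question concrete, one check I can imagine is a simple hold-out test: fit on the earlier censuses only and see how well the model predicts the later ones. The sketch below is only an illustration under assumptions. It uses the closed-form solution of the Verhulst equation instead of an ODE solver to stay short, it runs on synthetic noisy data rather than the census figures, and the helper names `logistic`, `fit_logistic` and the choice of 10 training points are made up for the example. With N=14 the out-of-sample estimate rests on very few points, so it is at best a rough indication.

```python
import numpy as np
from scipy.optimize import least_squares


def logistic(t, r, K, t0, P0):
    """Closed-form solution of dP/dt = r P (1 - P/K) with P(t0) = P0."""
    return K / (1.0 + (K / P0 - 1.0) * np.exp(-r * (t - t0)))


def fit_logistic(t, P):
    """Least-squares fit of (r, K), with P(t[0]) pinned to the first observation."""
    t0, P0 = t[0], P[0]
    res = least_squares(
        lambda p: logistic(t, p[0], p[1], t0, P0) - P,
        x0=[0.03, 2.0 * P.max()],
        bounds=([1e-4, P.max()], [1.0, 1e3 * P.max()]),
    )
    return res.x


# Synthetic stand-in data (NOT the census figures).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 138.0, 14)
P = logistic(t, 0.05, 40.0, 0.0, 1.0) * (1.0 + 0.01 * rng.standard_normal(t.size))

# Fit on the first 10 points, predict the last 4.
n_train = 10
r_hat, K_hat = fit_logistic(t[:n_train], P[:n_train])

fitted = logistic(t[:n_train], r_hat, K_hat, t[0], P[0])
pred = logistic(t[n_train:], r_hat, K_hat, t[0], P[0])

in_rmse = np.sqrt(np.mean((fitted - P[:n_train]) ** 2))
out_rmse = np.sqrt(np.mean((pred - P[n_train:]) ** 2))
print(f"in-sample RMSE = {in_rmse:.3f}, hold-out RMSE = {out_rmse:.3f}")
```

A large gap between the in-sample and hold-out errors would suggest the model is fitting the training points rather than the underlying trend, which is the concern with the 4- and 5-parameter models.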