Old 05-20-2013, 08:55 AM
jlaurentum
Join Date: Apr 2013
Location: Venezuela
Posts: 41
Default Re: Machine Learning and census models

Thank you Professor Lin, Professor Yasser and Elroch.

Upon reading Elroch's response about stationarity, it occurs to me that this situation (population growth models) is perhaps non-stationary by nature: the way a population grows is not independent of time. Moreover, I don't think the issue in this context is to predict a future population level from T past observations, because the future level would depend on the time variable itself. Of course, I'm saying all this intuitively, and at present I'm not able to pin it down mathematically.

This "intuition" is based on some ideas I've picked up about demographic transition theory. To summarize: a profound economic, technological or ecological development upsets the social structure of a population and its reproductive behavior, raising (or lowering) the carrying capacity parameter (K). The population level then transitions smoothly toward this point of stability (K), as described by Adolphe Quetelet and Pierre Verhulst about 200 years ago. The newer population models are just "tweaks" elaborating on this idea: shifting the point in time where the population growth curve reaches its inflection point, and so on. With the advent of the industrial revolution and modernization in almost all countries, population exploded due to better sanitation and nutrition. But then the cost of having children grew (children require more years of schooling, clothing and food, and are basically unproductive for the first 40 years of their lives). Consequently, population growth eventually slows down and settles at a stable level.
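To make the Verhulst idea concrete, here is a minimal sketch of the logistic curve in Python. The parameter values (carrying capacity K, growth rate r, inflection year t0) are made up for illustration and are not fitted to any census data:

```python
import numpy as np

def logistic_population(t, K, r, t0):
    """Verhulst logistic curve.

    t  : time in years (scalar or array)
    K  : carrying capacity, the stable level the population approaches
    r  : intrinsic growth rate
    t0 : year of the inflection point, where the population equals K/2
    """
    return K / (1.0 + np.exp(-r * (np.asarray(t, dtype=float) - t0)))

# Illustrative (made-up) parameters: K = 30 million, inflection in 1990.
years = np.arange(1900, 2051, 10)
pop = logistic_population(years, K=30e6, r=0.05, t0=1990)
```

The "tweaks" in newer models mostly amount to replacing this symmetric S-curve with one whose inflection point can sit elsewhere than at K/2.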

My belief - and from what I gather this was implicit in Quetelet's ideas, Positivist thinker that he was - is that once a major technological/economic/natural event sets a growth mechanism in motion, population levels change according to that "law" until a new event upsets it. In the case of Venezuela, there has been more than one inflection point in the population curve, suggesting that several transcendental events have affected population growth. In fact, during the first two decades of the 20th century there were several famines, malaria epidemics and adverse economic conditions (low prices for coffee, which was the country's staple export). You can observe that at this time population growth slowed to almost zero. But then, in the 1930s, with oil exploitation, population growth picked up speed. Currently it seems to be slowing down again and the population is aging. According to my models, growth will stop by 2030. Has there been a new transcendental event to change these dynamics (for example, the chavista "socialist" revolution that has been going on for 15 years now)? Impossible to say.

From what you have all posted, I would think that if anything can be done at all by way of validation, it would have to be as Professor Yasser wrote: sliding a window of T observations, training on the first T-1 observations and validating on the last one. This I gather from Elroch's comment that cross-validation is always better when interpolating. How big would this window have to be? Just big enough to have as many samples as model parameters? Big enough to cover the inflection points in the first part of the 20th century? If so, wouldn't that be data snooping?
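The sliding-window scheme I have in mind could be sketched like this in Python. The `fit`/`predict` pair is hypothetical and supplied by the caller (it would be the logistic-curve fitter in my case); here a toy linear fit stands in for it:

```python
import numpy as np

def sliding_window_validation(t, y, fit, predict, window):
    """Slide a window of `window` observations along the series.

    At each position, train on the first window-1 points and validate
    on the last one. Returns the list of squared prediction errors.
    `fit(t_train, y_train)` returns parameters; `predict(params, t_next)`
    returns the predicted value. Both are caller-supplied.
    """
    errors = []
    for start in range(len(y) - window + 1):
        t_win, y_win = t[start:start + window], y[start:start + window]
        params = fit(t_win[:-1], y_win[:-1])   # train on window-1 points
        y_hat = predict(params, t_win[-1])     # validate on the last one
        errors.append((y_hat - y_win[-1]) ** 2)
    return errors

# Toy usage: an exactly linear series, fitted with a degree-1 polynomial.
t = np.arange(10, dtype=float)
y = 2.0 * t + 1.0
errs = sliding_window_validation(
    t, y,
    fit=lambda ts, ys: np.polyfit(ts, ys, 1),
    predict=lambda p, tn: np.polyval(p, tn),
    window=4,
)
```

Note that the window size trades off against exactly the worry above: a window barely larger than the parameter count gives noisy fits, while one wide enough to span the historical inflection points starts to look like snooping.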

I can post the census figures and some R code for estimating the models if you guys are interested.