  #2  
03-10-2013, 09:30 AM
htlin
NTU
 
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 610
Re: General question on sampling bias

Quote:
Originally Posted by palmipede
Greetings!

Does one speak of sampling bias when the training data points come from the same distribution, e.g. normal, uniform, but are statistically dependent?

Say for example that a questionnaire is passed around via friendship links on Facebook. There is a chance that everyone might see the questionnaire over a long enough period of time but that period might be longer than the time allotted to the machine learning project.

While the situation you describe can cause a bias that hurts generalization, I've never seen this kind of bias called "sampling bias". That term is commonly reserved for a mismatch between the training and test distributions.
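
To make the distinction concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: an AR(1) chain stands in for a questionnaire spreading along friendship links, and the names `dependent_sample`, `rho`, `n` and `trials` are hypothetical. Every point is still drawn from the same standard normal distribution, so the training and test distributions match, yet the dependence makes an estimate from a finite sample much less reliable.

```python
import math
import random
import statistics


def iid_sample(n, rng):
    """n independent draws from N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def dependent_sample(n, rho, rng):
    """n draws from a stationary AR(1) chain.

    Each point is marginally N(0, 1), but consecutive points are
    correlated with coefficient rho (an assumed stand-in for data
    collected along friendship links).
    """
    x = rng.gauss(0.0, 1.0)
    out = [x]
    scale = math.sqrt(1.0 - rho ** 2)  # keeps the marginal variance at 1
    for _ in range(n - 1):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def spread_of_sample_means(draw, trials):
    """Variance of the sample mean across repeated data sets."""
    return statistics.pvariance([statistics.fmean(draw()) for _ in range(trials)])


rng = random.Random(0)
n, rho, trials = 200, 0.9, 1000
var_iid = spread_of_sample_means(lambda: iid_sample(n, rng), trials)
var_dep = spread_of_sample_means(lambda: dependent_sample(n, rho, rng), trials)
print("variance of the sample mean, i.i.d. data:  ", var_iid)
print("variance of the sample mean, dependent data:", var_dep)
```

In both cases the sample mean is centered on the true mean of 0, but with dependent data it fluctuates far more from one data set to the next. The 200 dependent points carry much less information than 200 independent ones, which is why bounds that assume i.i.d. data no longer apply directly.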

There is an ongoing research topic called "Learning from non-IID data", which partly aims at making learning possible in the situation you describe. For instance, see

http://www-connex.lip6.fr/~amini/ecml-wk-lniid.html

Hope this helps.
__________________
When one teaches, two learn.