LFD Book Forum


palmipede 03-07-2013 10:42 PM

General question on sampling bias
 
Greetings!

Does one speak of sampling bias when the training data points come from the same distribution (e.g. normal or uniform) but are statistically dependent?

Say, for example, that a questionnaire is passed around via friendship links on Facebook. Given a long enough period, everyone might eventually see the questionnaire, but that period might be longer than the time allotted to the machine learning project.
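The situation in the question, where every point has the same marginal distribution yet the points are not independent, can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not a model of how a questionnaire spreads on Facebook: it uses a first-order autoregressive (AR(1)) sequence as a hypothetical stand-in for "friends tend to answer alike", where each point is marginally N(0, 1) but neighbouring points are correlated with coefficient rho. The helper names `ar1_sample` and `sd_of_sample_mean` are made up for illustration. The point it shows is that dependence leaves the distribution of each point unchanged but makes the sample mean much noisier than the 1/sqrt(n) you would expect from n independent points.

```python
import math
import random
import statistics


def ar1_sample(n, rho, rng):
    """Draw n points from a stationary AR(1) sequence (illustrative assumption).

    Every point is marginally N(0, 1), i.e. all points share the same
    distribution, but neighbouring points have correlation rho, so the
    points are statistically dependent whenever rho != 0.
    """
    x = rng.gauss(0.0, 1.0)  # start in the stationary N(0, 1) distribution
    noise_sd = math.sqrt(1.0 - rho * rho)  # keeps Var(x) = 1 at every step
    points = [x]
    for _ in range(n - 1):
        x = rho * x + noise_sd * rng.gauss(0.0, 1.0)
        points.append(x)
    return points


def sd_of_sample_mean(n, rho, trials, seed=0):
    """Standard deviation of the sample mean across many simulated datasets."""
    rng = random.Random(seed)
    means = [statistics.fmean(ar1_sample(n, rho, rng)) for _ in range(trials)]
    return statistics.stdev(means)


n = 100
iid = sd_of_sample_mean(n, rho=0.0, trials=2000)
dep = sd_of_sample_mean(n, rho=0.9, trials=2000)
print(f"independent (rho=0.0): sd of mean = {iid:.3f}, theory 1/sqrt(n) = {1 / math.sqrt(n):.3f}")
print(f"dependent   (rho=0.9): sd of mean = {dep:.3f}")
# Large-n approximation of the effective sample size of an AR(1) sequence
print(f"effective sample size at rho=0.9: about {n * (1 - 0.9) / (1 + 0.9):.1f}")
```

With rho = 0.9, the 100 dependent points carry roughly the information of about 5 independent ones, which is one way dependence can hurt generalization even when there is no mismatch between distributions.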

htlin 03-10-2013 09:30 AM

Re: General question on sampling bias
 
Quote:

Originally Posted by palmipede (Post 9816)
Does one speak of sampling bias when the training data points come from the same distribution, e.g. normal, uniform, but are statistically dependent?

While the situation you describe can introduce a bias that hurts generalization, I've never seen this kind of bias called "sampling bias"; that term is usually reserved for a mismatch between the training and test distributions.
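To make the distinction concrete, here is a minimal sketch of sampling bias in the sense described above: the training inputs and the test inputs come from different distributions. Everything in it is an illustrative assumption (the target f(x) = x^2, the uniform input ranges, and the hypothetical helpers `target`, `fit_line` and `mse`); it is not taken from the book.

```python
import random
import statistics


def target(x):
    # Hypothetical unknown target function f(x) = x^2 (illustrative assumption)
    return x * x


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = w0 + w1 * x, in closed form."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w1 = sxy / sxx
    return my - w1 * mx, w1


def mse(w, xs):
    """Mean squared error of the fitted line against the target on inputs xs."""
    w0, w1 = w
    return statistics.fmean([(w0 + w1 * x - target(x)) ** 2 for x in xs])


rng = random.Random(0)
train_x = [rng.uniform(0.0, 1.0) for _ in range(500)]  # training inputs ~ U(0, 1)
w = fit_line(train_x, [target(x) for x in train_x])

same_dist_test = [rng.uniform(0.0, 1.0) for _ in range(500)]  # test inputs ~ U(0, 1)
shifted_test = [rng.uniform(1.0, 2.0) for _ in range(500)]    # test inputs ~ U(1, 2)

err_same = mse(w, same_dist_test)
err_shifted = mse(w, shifted_test)
print(f"test error, matching distribution: {err_same:.4f}")
print(f"test error, shifted distribution:  {err_shifted:.4f}")
```

Both test sets are drawn independently, so the large gap comes only from the mismatch between the training and test input distributions.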

There is an ongoing line of research called "Learning from non-IID data" that aims, in part, to make learning possible in situations like the one you describe. For instance:

http://www-connex.lip6.fr/~amini/ecml-wk-lniid.html

Hope this helps.


All times are GMT -7.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.