LFD Book Forum


student322 04-13-2012 10:42 PM

Coping with errors in training set
 
In Lecture 3, it was mentioned that when a human classifies hand-written digits, the error rate is around 2%. It seems then that training sets will also have errors (perhaps even introduced deliberately by mischievous individuals), and that the learning algorithm will thus be able to do no better than the error rate in the training set. Is this true? Are there any methods to efficiently detect and/or correct errors in the training set, aside from reviewing the whole set numerous times manually?

yaser 04-13-2012 11:34 PM

Re: Coping with errors in training set
 
Quote:

Originally Posted by student322 (Post 1273)
Are there any methods to efficiently detect and/or correct errors in the training set, aside from reviewing the whole set numerous times manually?

Data pruning is one such technique (for eliminating bad training examples). There were two PhD theses at Caltech that studied the subject.
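
To make the idea of pruning concrete, here is a minimal sketch of one simple heuristic: flag any example whose label disagrees with the majority label of its k nearest neighbors (similar in spirit to edited nearest-neighbor rules). This is only an illustration and is not claimed to be the method studied in the theses. The function names `flag_suspect_labels` and `make_noisy_data`, the two-cluster toy data, and the choices `k=9` and four flipped labels are all assumptions made for the example.

```python
import random
from collections import Counter


def flag_suspect_labels(points, labels, k=9):
    """Return indices whose label disagrees with the majority of their k nearest neighbors."""
    suspects = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Squared Euclidean distance to every other point (the point itself is left out).
        others = [
            (sum((a - b) ** 2 for a, b in zip(p, q)), labels[j])
            for j, q in enumerate(points)
            if j != i
        ]
        others.sort(key=lambda t: t[0])
        votes = Counter(label for _, label in others[:k])
        majority_label, _ = votes.most_common(1)[0]
        if majority_label != y:
            suspects.append(i)
    return suspects


def make_noisy_data(n_per_class=50, n_flips=4, seed=0):
    """Toy data (hypothetical): two well-separated Gaussian clusters with a few flipped labels."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in ((-1, (-5.0, -5.0)), (+1, (5.0, 5.0))):
        for _ in range(n_per_class):
            points.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            labels.append(label)
    # Simulate labeling errors by flipping the sign of a few labels.
    flipped = sorted(rng.sample(range(len(points)), n_flips))
    for i in flipped:
        labels[i] = -labels[i]
    return points, labels, flipped


points, labels, flipped = make_noisy_data()
suspects = flag_suspect_labels(points, labels, k=9)
print("deliberately flipped:", flipped)
print("flagged as suspect:  ", suspects)
```

The flagged indices can then be reviewed by hand or dropped before training. On real data with overlapping classes, a heuristic like this will also flag correctly labeled examples that sit near the decision boundary, so pruning everything it flags may throw away useful points.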

htlin 04-16-2012 02:07 PM

Re: Coping with errors in training set
 
Quote:

Originally Posted by student322 (Post 1273)
In Lecture 3, it was mentioned that when a human classifies hand-written digits, the error rate is around 2%. It seems then that training sets will also have errors (perhaps even introduced deliberately by mischievous individuals), and that the learning algorithm will thus be able to do no better than the error rate in the training set. Is this true? Are there any methods to efficiently detect and/or correct errors in the training set, aside from reviewing the whole set numerous times manually?

In addition to the PhD theses, the following paper is on the issue and may be of interest to you.

Ling Li, Amrit Pratap, Hsuan-Tien Lin and Yaser S. Abu-Mostafa. Improving Generalization by Data Categorization. In A. Jorge et al., eds., Knowledge Discovery in Databases: PKDD '05, vol. 3721 of Lecture Notes in Artificial Intelligence, 157-168, Springer-Verlag, 2005.

http://www.csie.ntu.edu.tw/~htlin/pa...ingerprint.pdf

The methods "automatically" review the whole set.

