LFD Book Forum  

LFD Book Forum > Book Feedback - Learning From Data > Chapter 3 - The Linear Model

  #1  
Old 04-13-2012, 11:42 PM
student322
Junior Member
 
Join Date: Apr 2012
Posts: 2
Coping with errors in training set

In Lecture 3, it was mentioned that when a human classifies handwritten digits, the error rate is around 2%. It seems, then, that training sets will also contain errors (perhaps even introduced deliberately by mischievous individuals), and that the learning algorithm will therefore be unable to do better than the error rate of the training set. Is this true? Are there any methods to efficiently detect and/or correct errors in the training set, other than reviewing the whole set manually numerous times?
  #2  
Old 04-14-2012, 12:34 AM
yaser
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Coping with errors in training set

Quote:
Originally Posted by student322
Are there any methods to efficiently detect and/or correct errors in the training set, other than reviewing the whole set manually numerous times?
Data pruning is one such technique (for eliminating bad training examples). There were two PhD theses at Caltech that studied the subject.
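Those theses develop the technique properly; purely as an illustration of the general idea (not the specific method from the theses), the sketch below flags training examples that out-of-sample models keep misclassifying and drops them before the final fit. It assumes scikit-learn, and X and y are hypothetical feature/label arrays.

Code:
# Illustrative data-pruning sketch (NOT the specific method from the theses).
# Assumes scikit-learn; X and y are hypothetical feature/label numpy arrays.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict

def prune_suspect_examples(X, y, n_repeats=5):
    """Return a boolean mask that keeps the examples not flagged as suspect."""
    misclassified = np.zeros(len(y))
    for seed in range(n_repeats):
        folds = KFold(n_splits=5, shuffle=True, random_state=seed)
        model = LogisticRegression(max_iter=1000)
        # Out-of-fold predictions: each example is predicted by a model that
        # never saw it, so its own (possibly wrong) label cannot be memorized.
        y_hat = cross_val_predict(model, X, y, cv=folds)
        misclassified += (y_hat != y)
    # Examples misclassified in most of the repeats are likely label errors.
    return misclassified < (n_repeats / 2.0)

The final hypothesis is then trained only on X[mask], y[mask]. The threshold of n_repeats / 2 is arbitrary; too aggressive a cut will also discard hard but correctly labeled points, which is presumably the kind of trade-off such methods have to manage.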
__________________
Where everyone thinks alike, no one thinks very much
  #3  
Old 04-16-2012, 03:07 PM
htlin
NTU
 
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 601
Re: Coping with errors in training set

Quote:
Originally Posted by student322
In Lecture 3, it was mentioned that when a human classifies handwritten digits, the error rate is around 2%. It seems, then, that training sets will also contain errors (perhaps even introduced deliberately by mischievous individuals), and that the learning algorithm will therefore be unable to do better than the error rate of the training set. Is this true? Are there any methods to efficiently detect and/or correct errors in the training set, other than reviewing the whole set manually numerous times?
In addition to the PhD theses, the following paper addresses the issue and may be of interest to you.

Ling Li, Amrit Pratap, Hsuan-Tien Lin and Yaser S. Abu-Mostafa. Improving Generalization by Data Categorization. In A. Jorge et al., eds., Knowledge Discovery in Databases: PKDD '05, vol. 3721 of Lecture Notes in Artificial Intelligence, 157-168, Springer-Verlag, 2005.

http://www.csie.ntu.edu.tw/~htlin/pa...ingerprint.pdf

The methods "automatically" review the whole set.
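To get a feel for what automatic review can buy, here is a toy experiment (again only a sketch, not the method of the paper above): flip 10% of the labels on synthetic data and compare the test error with and without pruning, reusing the hypothetical prune_suspect_examples helper sketched in the earlier post. It assumes scikit-learn.

Code:
# Toy experiment (illustrative only; not the method of the paper above).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Simulate annotation errors: flip 10% of the training labels.
noisy = y_tr.copy()
flip = rng.random(len(noisy)) < 0.10
noisy[flip] = 1 - noisy[flip]

def test_error(X_fit, y_fit):
    model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    return 1.0 - model.score(X_te, y_te)  # error on the clean test set

mask = prune_suspect_examples(X_tr, noisy)  # hypothetical helper from above
print("without pruning:", test_error(X_tr, noisy))
print("with pruning   :", test_error(X_tr[mask], noisy[mask]))

Whether pruning actually helps depends on the noise rate and the learning model; the point is only that the review of the training set can be automated.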
__________________
When one teaches, two learn.

Tags
errors, supervised learning, training set
