#1
Hi,
This is an excerpt from chapter 14 of "Mahout in Action", on building a classifier:

"Preliminary analysis of data is critical to successful classification. It’s sometimes fun because the analysis often turns up Easter eggs like the Moon-Phase header line in table 14.2. These surprises can also be important in building a classifier, because they can uncover problems in the data or give you a key insight that simplifies the classification problem. Visualize early and visualize often."

Sean Owen, Robin Anil, Ted Dunning, Ellen Friedman (2012-01-16 18:35:04.792000-06:00). Mahout in Action (Kindle Locations 6297-6300). Manning Publications. Kindle Edition.

Would this be considered "data snooping"?

Thanks in advance.
#2
You probably want an answer from an expert, but speaking as just another student, I'd say yes, it is snooping. It is nonetheless a good idea to do it, because the learning apparatus between one's ears is likely to be superior in some respects to any published learning model. The problem with snooping is not that it's reprehensible; it's that it's wrong to pretend it hasn't taken place when estimating the out-of-sample error.
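To see why ignoring snooping distorts the out-of-sample estimate, here is a small simulation sketch. Everything in it (the function names, the sizes, the single-feature sign rules) is an illustrative assumption, not something from the thread. The labels are pure coin flips, so no rule can truly beat 50% accuracy. Choosing the best rule after looking at all the points, test points included, makes the test accuracy look better than chance; choosing it from the training points alone does not.

```python
import random

def make_data(rng, n_points, n_features):
    """Random +/-1 labels and random +/-1 features: there is nothing real to learn."""
    y = [rng.choice((-1, 1)) for _ in range(n_points)]
    X = [[rng.choice((-1, 1)) for _ in range(n_features)] for _ in range(n_points)]
    return X, y

def best_rule(X, y, idx):
    """Return the (feature, sign) rule that agrees with y most often on rows idx."""
    best, best_score = (0, 1), -1
    for k in range(len(X[0])):
        agree = sum(1 for i in idx if X[i][k] == y[i])
        # sign=+1 predicts X[i][k]; sign=-1 predicts -X[i][k]
        for sign, score in ((1, agree), (-1, len(idx) - agree)):
            if score > best_score:
                best, best_score = (k, sign), score
    return best

def accuracy(rule, X, y, idx):
    """Fraction of rows in idx where the rule's prediction matches the label."""
    k, sign = rule
    return sum(1 for i in idx if sign * X[i][k] == y[i]) / len(idx)

def simulate(trials=100, n_points=100, n_test=30, n_features=100, seed=0):
    """Average test accuracy of a snooped selection vs. an honest one."""
    rng = random.Random(seed)
    train = range(n_points - n_test)
    test = range(n_points - n_test, n_points)
    everything = range(n_points)
    snooped, honest = [], []
    for _ in range(trials):
        X, y = make_data(rng, n_points, n_features)
        # Snooping: the rule is picked after looking at every point, test set included.
        snooped.append(accuracy(best_rule(X, y, everything), X, y, test))
        # Honest: the rule is picked from the training points only.
        honest.append(accuracy(best_rule(X, y, train), X, y, test))
    return sum(snooped) / trials, sum(honest) / trials

if __name__ == "__main__":
    print(simulate())
```

With these sizes the snooped estimate should come out clearly above 0.5 while the honest one stays near 0.5; the exact values depend on the seed. The snooped number is not wrong because looking was wrong, but because it is reported as if no looking had happened.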
#3
There is a fine line between data analysis (including visualization) and risky data snooping, though. In practical data mining applications (such as the KDD Cups that National Taiwan University has won in previous years), "careful data analysis" is important for reaching the best solution. For instance, in last year's KDD Cup, without analysis/snooping we could never have known that the music ratings generated by the Yahoo! system went through several phase changes because the system was upgraded.

Being not only a machine learning researcher but also a data mining practitioner, my advice is:

(0) Never snoop the test set, because it is a point of no return.
(1) Be careful when analyzing/snooping the training set, and account for the complexity of every analysis/snooping step.

Hope this helps.
__________________
When one teaches, two learn.
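One way to make "account for the complexity of every analysis/snooping step" concrete, under the simplifying assumption (an illustration, not necessarily what was meant above) that everything tried while exploring counts toward a finite set of M candidate hypotheses: the Hoeffding inequality plus the union bound give, with probability at least 1 - δ, E_out ≤ E_in + sqrt(ln(2M/δ) / (2N)). Each extra hypothesis examined, including ones "tried in your head" after looking at the data, increases M. The function name `generalization_bar` is hypothetical.

```python
import math

def generalization_bar(n_samples, n_hypotheses, delta=0.05):
    """Hoeffding + union bound for a finite hypothesis set with 0/1 error:
    with probability >= 1 - delta, |E_in - E_out| <= this value
    simultaneously for all n_hypotheses candidates."""
    return math.sqrt(math.log(2 * n_hypotheses / delta) / (2 * n_samples))

if __name__ == "__main__":
    for m in (1, 10, 1000):
        print(m, round(generalization_bar(1000, m), 4))
```

With N = 1000 and δ = 0.05, the bar is about 0.043 for M = 1 and about 0.073 for M = 1000. The bound is loose, but it shows the direction of the effect: the more choices the data influenced, the wider the honest error bar.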
Tags: classifiers, data snooping