LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 4 (http://book.caltech.edu/bookforum/forumdisplay.php?f=133)

 samirbajaj 08-04-2012 11:43 AM

Conflicting Lessons?

Dear Professor:

(I have asked stupid questions before, but I have no shame, so in that spirit, here's another one :) )

1) Lecture 8 - slide 16 - "Lesson Learned" - Match model complexity to data resources

2) Lecture 9 - slide 8 - "Lesson Learned" - Looking at data before choosing your model can be hazardous

-Samir

 Mayson Lancaster 08-06-2012 09:31 PM

Re: Conflicting Lessons?

Actually, it seems that the two do not quite conflict.

The first lesson, match model complexity to data resources, means that you vary the complexity of your model based on how much data you have [perhaps employing a heuristic such as N = 10*dvc, or equivalently dvc = N/10].
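As a rough illustration of that heuristic, here is a minimal Python sketch. The function names and the `factor` parameter are made up for this example, and it treats the rule of thumb as an inequality (N at least 10 times dvc), which is an assumption about how the heuristic is meant to be applied rather than anything stated in the slides quoted here.

```python
def max_vc_dimension(n_points: int, factor: int = 10) -> int:
    """Largest d_vc that still satisfies n_points >= factor * d_vc."""
    return n_points // factor


def min_points(d_vc: int, factor: int = 10) -> int:
    """Smallest N that satisfies N >= factor * d_vc."""
    return factor * d_vc


# Example: a perceptron in d dimensions has d_vc = d + 1.
d = 2
print("points suggested for a 2-D perceptron:", min_points(d + 1))
print("largest d_vc suggested for N = 500:", max_vc_dimension(500))
```

Note that both functions depend only on N, the amount of data, and never on the data values themselves.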

The second says that you should not look at the data itself (that is, not merely the amount of data, but its fine structure, distribution, features, etc.) before choosing your model.
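To see why this can be hazardous, here is a small simulation sketch in Python. Everything in it is hypothetical (the function name, the sizes, the coin-flip data); it is not taken from the lecture. The labels and all candidate features are independent coin flips, so there is nothing to learn, yet picking the best-looking feature after inspecting the sample gives an in-sample agreement well above 50%.

```python
import random


def snooping_demo(n_points=50, n_features=200, n_fresh=5000, seed=0):
    """Return (in_sample, out_of_sample) agreement for a snooped feature."""
    rng = random.Random(seed)

    def coin_flips(n):
        return [rng.choice((-1, 1)) for _ in range(n)]

    def agreement(feature, labels):
        return sum(f == l for f, l in zip(feature, labels)) / len(labels)

    # Labels and features are pure noise, independent of each other.
    y = coin_flips(n_points)
    features = [coin_flips(n_points) for _ in range(n_features)]

    # "Looking at the data": keep the feature whose agreement with y is
    # farthest from 0.5, flipping its sign if it mostly disagrees.
    best = max(features, key=lambda f: abs(agreement(f, y) - 0.5))
    sign = 1 if agreement(best, y) >= 0.5 else -1
    in_sample = agreement([sign * v for v in best], y)

    # Fresh draws: the chosen feature is still independent of the labels.
    fresh_y = coin_flips(n_fresh)
    fresh_feature = [sign * v for v in coin_flips(n_fresh)]
    out_of_sample = agreement(fresh_feature, fresh_y)
    return in_sample, out_of_sample


in_sample, out_of_sample = snooping_demo()
print(f"in-sample agreement:     {in_sample:.2f}")
print(f"out-of-sample agreement: {out_of_sample:.2f}")
```

The selection step effectively searches over 200 hypotheses (400 with sign flips), so the complexity that matters is that of the whole search, not of the single feature that was kept.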

(I'd be glad to get feedback from course staff as to how well my attempted explanation gets at the meaning of the two sayings.)

 yaser 08-06-2012 10:43 PM

Re: Conflicting Lessons?

The feedback from the course staff is positive. :)

 gah44 08-07-2012 12:02 PM

Re: Conflicting Lessons?

I presume in this case "look" means a computer looked at the data.

It is probably worse if a person looks at the data. People are very good at finding patterns, even where there are none, and, even worse, at remembering patterns from previous (unrelated) data sets and applying them.

 All times are GMT -7.