Hiding Initial Hypotheses from the Data
Suppose I have a set of hypotheses, but I also want to look at the data to refine my choice of which hypotheses to test.
So I randomly select 1/3 of the data points, see how my initial hypotheses perform on them, refine them, and pick a new set H.
If I then test the new H on the remaining 2/3 of the data, can I disregard the first 1/3 of the data and the initial hypotheses I tested, and therefore use a smaller H in Hoeffding's bound? Or do I still have to account for all of the hypotheses tested so far?
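For reference, the form of the bound in question, assuming 0/1 error, N i.i.d. test points, and a union bound over a finite set H, is P[max over h in H of |E_in(h) - E_out(h)| > eps] <= 2|H| exp(-2 eps^2 N). Below is a minimal Python sketch (the function names are hypothetical) that computes the bound and the eps it implies for a given confidence delta, so the two candidate values of |H| can be compared side by side.

```python
import math

def hoeffding_union_bound(m: int, n: int, eps: float) -> float:
    """Upper bound on P[max_h |E_in(h) - E_out(h)| > eps] over m hypotheses,
    each evaluated on the same n i.i.d. points with a loss bounded in [0, 1]."""
    return 2 * m * math.exp(-2 * eps ** 2 * n)

def hoeffding_epsilon(m: int, n: int, delta: float) -> float:
    """Smallest eps for which the union bound above equals delta."""
    return math.sqrt(math.log(2 * m / delta) / (2 * n))

if __name__ == "__main__":
    n_test = 600   # illustrative: the held-out 2/3 of 900 points
    delta = 0.05
    for m in (10, 30):  # illustrative: refined set only vs. every hypothesis tried
        print(m, hoeffding_epsilon(m, n_test, delta))
```

The numbers are illustrative only. The sketch shows how much eps grows with |H|; it does not settle which |H| is the right one to plug in, which is the question here.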