Quote:
Originally Posted by nyc_ds
On page 152 (4.16b) E_cv is minimized at #features = 5 and 7. You say "the cross validation error is minimized between 5--7 feature dimensions; we take 6 feature dimensions..."
Why 6 and not 5, especially given the discussion about Occam's Razor that follows? 5 features would have the same E_in, a lower E_cv (and a lower E_out, though you could not know that in practice), and would be the simpler model. I know it would make little practical difference, but as a general principle shouldn't Occam's Razor be favored?
Good point. I guess we were trying to show how good the CV decision by itself will be, so we took a neutral view instead of resorting to additional tie-breakers like Occam's Razor.
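
For anyone who does want the Occam's Razor tie-breaker, here is a minimal sketch of what it might look like in Python. The function name `pick_num_features`, the `tol` parameter, and the error values are all hypothetical and made up for illustration; they are not the numbers behind the book's figure.

```python
def pick_num_features(cv_errors, tol=0.0):
    """Return the smallest number of features whose CV error is
    within `tol` of the minimum CV error.

    cv_errors: dict mapping number of features -> cross-validation error.
    tol: how far above the minimum an error may be and still count as tied.
    """
    best = min(cv_errors.values())
    # Every model whose CV error is (nearly) as good as the best one.
    candidates = [k for k, err in cv_errors.items() if err <= best + tol]
    # Occam's Razor as a tie-breaker: prefer the simplest candidate.
    return min(candidates)


# Hypothetical CV errors with a tie at 5 and 7 features.
cv_errors = {3: 0.040, 4: 0.035, 5: 0.030, 6: 0.031, 7: 0.030, 8: 0.036}
print(pick_num_features(cv_errors))
```

With `tol=0.0` only exact ties are broken in favor of the simpler model. A positive `tol` would also accept a slightly worse CV error in exchange for fewer features, which is a stronger preference than CV alone expresses.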