Re: Restricted Learner's Rule of Thumb (Lecture 11)
On a specific point raised by jlaurentum, I have come to the intuitive conclusion that noise effectively reduces the size of the data set: the more noise there is, the more data points are needed to achieve the same results.
This is related to the much simpler idea that if you want to estimate a mean from noisy (i.e. nonzero variance) data, the error of the estimate is inversely proportional to the square root of the number of data points. Likewise, I would conjecture that a noisy machine learning problem might be reduced to a near noiseless one by using a very large number of data points (although the quantitative details of this are less clear).
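To make the mean-estimation point concrete, here is a small Python sketch. The function names and the Gaussian noise model are my own assumptions, not anything from the lecture. It checks empirically that the spread of the sample mean is about sigma/sqrt(n), and inverts that relation to get the number of points needed for a given error, which grows with sigma squared.

```python
import math
import random
import statistics

def standard_error_of_mean(sigma, n, trials=2000, seed=0):
    """Empirical spread of the sample mean of n readings with Gaussian noise (assumed model)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

def points_needed(sigma, target_error):
    """Smallest n for which the predicted error sigma / sqrt(n) is <= target_error."""
    return math.ceil((sigma / target_error) ** 2)

# Compare the simulated spread with the sigma / sqrt(n) prediction.
for sigma in (0.5, 1.0, 2.0):
    n = points_needed(sigma, 0.25)
    print(sigma, n, round(standard_error_of_mean(sigma, n), 3),
          round(sigma / math.sqrt(n), 3))
```

For example, points_needed(1.0, 0.25) is 16 while points_needed(2.0, 0.25) is 64: doubling the noise level quadruples the data needed, which is one way to read "noise effectively shrinks the data set".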
In machine learning there is the complication that this intuition applies only to genuine (stochastic) noise. Because "deterministic noise" does not vary from one data set to the next, adding more data does not reduce it (though it does reduce the variance). On reflection, I feel the term "deterministic noise" can be a little misleading: it is a form of error that merely mimics noise to an observer while lacking one of noise's defining properties, randomness. To use an analogy with physical measurement, it is more like a calibration error than an uncertainty in measurement.
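A rough simulation of this, again only a sketch under assumptions of my own: a noiseless sinusoid target, a hypothesis set of straight lines fitted by least squares, and inputs drawn uniformly from [-1, 1], in the spirit of the course's bias-variance example. The helper names are hypothetical. Since the labels carry no stochastic noise, the bias term here reflects the part of the target that no line can capture (deterministic noise), so it should stay roughly fixed as N grows while the variance falls.

```python
import math
import random
import statistics

def target(x):
    # Assumed noiseless target; no straight line can represent it exactly.
    return math.sin(math.pi * x)

def fit_line(xs, ys):
    """Least-squares line y = a*x + b through the sample points."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0  # guard against coincident x values
    return a, my - a * mx

def bias_variance(n, datasets=2000, grid=201, seed=0):
    """Estimate bias and variance of line fits to n noiseless points (hypothetical helper)."""
    rng = random.Random(seed)
    fits = []
    for _ in range(datasets):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        fits.append(fit_line(xs, [target(x) for x in xs]))  # no stochastic noise added
    # Average hypothesis g_bar: for lines, averaging coefficients averages the functions.
    a_bar = statistics.fmean(a for a, _ in fits)
    b_bar = statistics.fmean(b for _, b in fits)
    # Approximate the expectation over x with an evenly spaced grid on [-1, 1].
    test_xs = [-1.0 + 2.0 * i / (grid - 1) for i in range(grid)]
    bias = statistics.fmean((a_bar * x + b_bar - target(x)) ** 2 for x in test_xs)
    variance = statistics.fmean(
        statistics.fmean(((a - a_bar) * x + (b - b_bar)) ** 2 for a, b in fits)
        for x in test_xs
    )
    return bias, variance

for n in (2, 10, 50):
    bias, variance = bias_variance(n)
    print(f"N={n:3d}  bias={bias:.3f}  variance={variance:.3f}")
```

Comparing N = 2 with N = 50, the variance should shrink substantially while the bias stays close to 0.5 - 3/pi^2 (about 0.2), which is the error of the best-fitting line for this particular target: more data does not remove it.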
