I just recently attained a clear understanding of it myself (at least, I hope I did

. The setting for it is a biased coin, with p the probability of Heads, and (1-p) the probability of Tails. A Bernoulli trial of length N with this coin is just N tosses of the coins. The Bernoulli outcomes (i.e., specific Heads-Tails sequences) of length N have the binomial distribution.

For each Bernoulli outcome of length N, we can ask: what is the "relative frequency" of Heads? (relative frequency = (# of Heads / N) ). So, the relative frequency of Heads can be regarded as a *random variable* on the probability space of Bernoulli trials of length N.

The Hoeffding inequality serves to answer the following question: Given an epsilon > 0, what is an upper bound on the probability that the above random variable deviates from p by at least epsilon?

For the purposes of this class, p is the probability that a hypothesis returns +1 ("Heads"

. This probability is more often denoted mu.