Further study has led me to what looks like a nice analogy between quantum computing and machine learning. It arises from a field of machine learning which we didn't look at much in the course: Bayesian learning.

With the assumption of a prior distribution, this provides us, in principle, with the following.

We have a parametrised hypothesis set, each hypothesis providing outputs (or a probability distribution over outputs), with some prior distribution over these hypotheses (the part Yaser described as "robbing a bank" in his lecture).

We are now given a set of inputs. For any individual hypothesis we can work out the probability distribution of possible outputs from a particular input (in some cases the probability of an output will be either 1 or 0, in others it will be a general probability). Then we apply Bayes' rule to the prior and get a probability distribution over the hypotheses that could give the observed output. Turning the handle gives us a probability distribution for the outputs.
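To make the steps above concrete, here is a minimal sketch in Python. Everything specific in it is an assumption chosen for illustration: the hypothesis set is a small grid of threshold classifiers, the prior is uniform, and the names `thresholds`, `posterior_over_hypotheses` and `predictive_prob_positive` are hypothetical. It is the "1 or 0" case, where each hypothesis either reproduces the observed labels or does not.

```python
import numpy as np

# Hypothetical hypothesis set: thresholds t, with h_t(x) = +1 if x > t, else -1.
thresholds = np.linspace(0.0, 1.0, 11)  # t = 0.0, 0.1, ..., 1.0
# Uniform prior over the hypotheses (an assumption; any prior would do).
prior = np.full(len(thresholds), 1.0 / len(thresholds))

def predict(t, x):
    """Output of the hypothesis with threshold t on input(s) x."""
    return np.where(x > t, 1, -1)

def posterior_over_hypotheses(xs, ys):
    """Bayes' rule: posterior is proportional to prior times likelihood.

    The likelihood here is 1 if a hypothesis reproduces every label
    and 0 otherwise. Assumes at least one hypothesis fits the data.
    """
    likelihood = np.array(
        [float(np.all(predict(t, xs) == ys)) for t in thresholds]
    )
    unnormalised = prior * likelihood
    return unnormalised / unnormalised.sum()

def predictive_prob_positive(posterior, x_new):
    """P(output = +1 at x_new), averaging over all hypotheses."""
    votes = np.array([float(predict(t, x_new) == 1) for t in thresholds])
    return float(np.dot(posterior, votes))

# Two labelled points, then a query at a new input.
xs = np.array([0.15, 0.85])
ys = np.array([-1, 1])
posterior = posterior_over_hypotheses(xs, ys)
p_positive = predictive_prob_positive(posterior, 0.55)
```

No single threshold is ever selected: the answer at the new input is a weighted vote of every hypothesis that survives the data.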

The relevance to this discussion is that we are effectively applying every possible hypothesis in parallel. The distribution over hypotheses describes the "state" of the hypothesis, but we never "collapse" it into a single state.

Using this approach you can do things like use degree-10 polynomials as a hypothesis set, get 3 data points, and return a probability distribution for the value of the function at any other input. Or you could do the same while incorporating some knowledge of the uncertainty in the three data points.
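A rough sketch of the polynomial example follows. It rests on two assumptions that are not part of the description above: an independent Gaussian prior on each coefficient (standard deviation `prior_std`) and Gaussian noise on the observed outputs (standard deviation `noise_std`), which is one way of encoding uncertainty in the three data points. Under those assumptions the posterior over the 11 coefficients has a closed form. The data values and function names are made up for illustration.

```python
import numpy as np

def features(x, degree=10):
    """Polynomial features 1, x, x^2, ..., x^degree, one row per input."""
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)

def coefficient_posterior(x, y, degree=10, prior_std=1.0, noise_std=0.1):
    """Gaussian posterior (mean, covariance) over the coefficients.

    Assumes coefficients ~ N(0, prior_std^2 I) and
    y = polynomial(x) + N(0, noise_std^2) noise.
    """
    Phi = features(x, degree)
    precision = np.eye(degree + 1) / prior_std**2 + Phi.T @ Phi / noise_std**2
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / noise_std**2
    return mean, cov

def predictive(x_new, mean, cov, degree=10, noise_std=0.1):
    """Mean and standard deviation of the output at each new input."""
    Phi = features(x_new, degree)
    pred_mean = Phi @ mean
    # Noise variance plus the variance from uncertainty in the coefficients.
    pred_var = noise_std**2 + np.sum((Phi @ cov) * Phi, axis=1)
    return pred_mean, np.sqrt(pred_var)

# Three made-up data points, far fewer than the 11 coefficients.
x = np.array([-0.5, 0.0, 0.6])
y = np.array([0.2, 0.0, 0.5])
mean, cov = coefficient_posterior(x, y)

# Query at an observed input (0.0) and at an unobserved one (0.9).
pred_mean, pred_std = predictive(np.array([0.0, 0.9]), mean, cov)
```

The predictive standard deviation is small at an observed input and noticeably larger away from the data, which is the "distribution of the value of the function" rather than a single fitted curve.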

I'll make the observation that although there is the huge philosophical problem of having to come up with a prior distribution, this seems to parallel the practical usefulness of regularisation in the machine learning we used. We did it, but what were we doing? What does the magnitude of a coefficient have to do with the error function? On top of that, we implicitly use maximum likelihood estimation, which is a philosophical leap as well.
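One standard way to see the parallel, sketched here under assumptions of my own choosing: take Gaussian noise of variance $\sigma^2$ on the outputs and a zero-mean Gaussian prior of variance $\alpha^2$ on each coefficient. The symbols $\mathbf{w}$, $h_{\mathbf{w}}$, $\sigma$, $\alpha$ and $\lambda$ are notation introduced for this sketch. Picking the single most probable hypothesis under the posterior then amounts to minimising a squared error plus a penalty on coefficient magnitude:

```latex
\begin{align*}
-\ln P(\mathbf{w} \mid \mathcal{D})
  &= -\ln P(\mathcal{D} \mid \mathbf{w}) - \ln P(\mathbf{w}) + \text{const} \\
  &= \frac{1}{2\sigma^2} \sum_{n=1}^{N} \bigl(h_{\mathbf{w}}(x_n) - y_n\bigr)^2
     + \frac{1}{2\alpha^2}\, \mathbf{w}^{\mathsf{T}} \mathbf{w} + \text{const}.
\end{align*}
% Multiplying by 2\sigma^2 / N does not change the minimiser:
\begin{equation*}
\frac{1}{N} \sum_{n=1}^{N} \bigl(h_{\mathbf{w}}(x_n) - y_n\bigr)^2
  + \frac{\lambda}{N}\, \mathbf{w}^{\mathsf{T}} \mathbf{w},
\qquad \lambda = \frac{\sigma^2}{\alpha^2}.
\end{equation*}
```

Read this way, a weight-decay penalty behaves like a Gaussian prior in disguise, with the difference that the regularised fit keeps only the minimiser whereas the Bayesian treatment keeps the whole distribution.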

To me the contrast:

- use maximum likelihood and usually use regularisation to prefer simpler models
- use Bayes and usually make complex models less likely using a prior

is fascinating. In an applied field, it must come down to choosing a tool that is practical and useful: the former approach has had more success in practice so far, but things may change.