My long-term background is in research on functional analysis, so I find the Hilbert space formulation of quantum mechanics a satisfying one, and I have enjoyed partial glimpses of the harder theories that lead from relativistic quantum mechanics to things like symmetry breaking. Quantum computing fits much more naturally into this abstract formulation than into ones that can conflict with intuition.

I too have been mulling over the analogy between probabilistic inference in machine learning and the uncertainty of quantum mechanics, and I think an interesting illustration can be found in an example used in the lectures.

Suppose you are presented with two data points, samples of an unknown function of one real variable. Having done this course, we know that fitting a straight line through both points would not be a great idea, as it is likely to overfit. Given no other information, we have two plausible choices of what to do, and unless we want to be convicted of data snooping, we had better decide which one to use before looking at the data points.

Machine A uses the hypothesis set of constant functions and fits the two points with their average.

Machine B uses the hypothesis set of lines through the origin and fits the two points using least squares regression on a single parameter, the slope.
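To make the two machines concrete, here is a minimal sketch of what each one computes, assuming plain least squares on the two (x, y) points. The function names and the sample points are hypothetical, chosen only for illustration. Minimising (b - y1)^2 + (b - y2)^2 gives the average for Machine A; minimising (a*x1 - y1)^2 + (a*x2 - y2)^2 gives a = (x1*y1 + x2*y2) / (x1^2 + x2^2) for Machine B.

```python
# Hypothetical helper names; a sketch of the two "machines", each
# returning the fitted hypothesis h(x) for a list of (x, y) points.

def machine_a(points):
    """Hypothesis set: constant functions h(x) = b.
    Least squares on b gives the average of the y-values."""
    b = sum(y for _, y in points) / len(points)
    return lambda x: b

def machine_b(points):
    """Hypothesis set: lines through the origin h(x) = a*x.
    Minimising sum (a*x_i - y_i)^2 over a gives a = sum(x*y) / sum(x*x)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    a = sxy / sxx  # assumes the x-values are not all zero
    return lambda x: a * x

# Illustrative (made-up) data points.
data = [(-0.5, -1.0), (1.0, 1.0)]
h_a = machine_a(data)
h_b = machine_b(data)
print(h_a(0.0), h_b(2.0))  # prints 0.0 2.4
```

Each machine returns a single best hypothesis from its own set; nothing in either output tells you how to merge the constant fit with the slope fit.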

Having used either machine, we have a model and can argue that it is the best model in our hypothesis set, but it's not possible to combine these two pieces of knowledge to arrive at something better.

This is a (perhaps poor) analogy for the concept of incompatible observations in quantum mechanics, where we can make observations of different types and draw inferences from them, but not simultaneously. It's as if our window onto the object is very small (two points of a function | one spin axis for angular momentum), and we must choose what to look at through it (the mean or the slope | spin about just one axis).

It is a weakness of the analogy that an observation in quantum mechanics destroys information (strictly speaking, the information moves from the observed subsystem into entanglement between the measurement device and the subsystem, closely related to the "spooky" correlations central to quantum computers), whereas applying a machine learning algorithm to choose a hypothesis doesn't seem to destroy anything. Perhaps a closer analogy to the "information destruction" of an observation is the way data snooping pollutes objectivity. The observer may be obliged to be part of the experiment.

[EDIT: the crudeness of my analogy is made clear by the fact that the full linear hypothesis set, lines h(x) = ax + b, *may* be the appropriate one *even if* we are only given two data points. This is most obvious if the target function is some unknown linear function and our observations are noiseless, but it is also true in other cases that approximate this. Still, I think the idea of independent, mutually incompatible inferences could be made precise with a little careful construction. An interesting challenge would be to construct an example where two incompatible hypothesis sets achieve identical out-of-sample performance; in the lectures, the out-of-sample errors differed between hypothesis sets, so there was a single best one.]
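One way to test a candidate construction for that challenge is to estimate the expected out-of-sample error of each hypothesis set by simulation. The sketch below assumes (these are not taken from the post) inputs drawn uniformly from [-1, 1], noiseless labels, and two stand-in targets: sin(πx) and the linear function 2x + 0.5. All function names are hypothetical. It does not solve the challenge; it only shows how one would check whether two sets tie.

```python
import math
import random

# Hypothetical helper names; each returns the least-squares hypothesis
# h(x) fitted to a list of (x, y) training points.

def fit_constant(pts):
    # Machine A: h(x) = b, with b the mean of the y-values.
    b = sum(y for _, y in pts) / len(pts)
    return lambda x: b

def fit_through_origin(pts):
    # Machine B: h(x) = a*x, with a = sum(x*y) / sum(x*x).
    a = sum(x * y for x, y in pts) / sum(x * x for x, _ in pts)
    return lambda x: a * x

def fit_full_line(pts):
    # Full linear set: two distinct points determine h(x) = a*x + b exactly.
    (x1, y1), (x2, y2) = pts
    a = (y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    return lambda x: a * x + b

def expected_e_out(fit, target, trials=1000, test_points=100, seed=0):
    """Monte Carlo estimate of the expected out-of-sample squared error
    over random two-point training sets, with x uniform on [-1, 1]
    (an assumed input distribution) and noiseless labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        train_x = [rng.uniform(-1, 1) for _ in range(2)]
        h = fit([(x, target(x)) for x in train_x])
        test_x = [rng.uniform(-1, 1) for _ in range(test_points)]
        total += sum((h(x) - target(x)) ** 2 for x in test_x) / test_points
    return total / trials

if __name__ == "__main__":
    # Assumed stand-in targets: a nonlinear one and a noiseless linear one.
    targets = {"sin(pi x)": lambda x: math.sin(math.pi * x),
               "2x + 0.5": lambda x: 2 * x + 0.5}
    for name, f in targets.items():
        for fit in (fit_constant, fit_through_origin, fit_full_line):
            print(name, fit.__name__, round(expected_e_out(fit, f), 3))
```

With the noiseless linear target, the full linear set recovers the target exactly from two points, so its estimated error is essentially zero, which is the case the edit describes. A tie between two incompatible sets would show up as equal estimates (within Monte Carlo noise) for some target and input distribution.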