LFD Book Forum  

05-27-2013, 06:26 PM
Michael Reach
Senior Member
Join Date: Apr 2013
Location: Baltimore, Maryland, USA
Posts: 71
Kernel methods for SVM and quantum computing

I'm posting this even though I don't have an intelligent question to ask. Only this: I recently took a MOOC on quantum computing (Vazirani at Berkeley, on Coursera) and then this course, and I'm a little struck by the similarity between the two subjects. If I learned anything in that course, it was that in quantum computing you effectively have an exponentially large number of parallel processors available for your calculations, but unfortunately no way to read out all their results. What you can do is compress all those calculations into a single number (or set of numbers), such as the Fourier transform of all those wave functions sampled at a particular frequency, or something similar. Then, if you're very lucky, that sampled value answers some important question. People have found compressions that work to factor large numbers, search N unsorted boxes in about √N steps, and perform a number of other interesting calculations that would take huge computing power any other way.
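
To make the search example concrete, here is a minimal classical simulation of Grover's search algorithm with NumPy, a sketch only (the function name is my own). It tracks one amplitude per box, alternates the "oracle" sign flip on the marked box with a reflection of all amplitudes about their mean, and after roughly (π/4)√N rounds reports the probability that a single measurement lands on the marked box. That final measurement is the "compression" step: all N amplitudes were updated in parallel, but only one box can be read out.

```python
import numpy as np


def grover_success_probability(n_items: int, marked: int) -> float:
    """Simulate Grover search over n_items boxes and return the probability
    that measuring after ~(pi/4)*sqrt(N) rounds yields the marked box."""
    # Start in the uniform superposition: every box has amplitude 1/sqrt(N).
    amps = np.full(n_items, 1.0 / np.sqrt(n_items))
    n_rounds = int(np.floor(np.pi / 4 * np.sqrt(n_items)))
    for _ in range(n_rounds):
        # Oracle: flip the sign of the marked box's amplitude.
        amps[marked] = -amps[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        amps = 2 * amps.mean() - amps
    # A measurement reveals a single box; this is the chance it is the marked one.
    return float(amps[marked] ** 2)


prob = grover_success_probability(1024, marked=123)
print(f"P(marked box) = {prob:.4f}")
```

With N = 1024 the loop runs 25 times, versus about 512 looks on average for a classical search, and the returned probability should be very close to 1. The classical simulation itself still touches all N amplitudes every round, so it illustrates the amplitude bookkeeping rather than the speedup.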

_Anyhow_, I was struck by the professor's explanation of kernel methods, which sounded exactly the same. There's an infinite-dimensional vector space out there and we're searching it, but we don't need to go there: we just use a simple calculation of the kernel (the dot product in that space), which gives us the essential information we need from that space...
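
A minimal NumPy sketch of this "don't need to go there" idea, using the second-order polynomial kernel K(x, z) = (1 + x·z)^2 on 2-D inputs (function names are my own). Evaluating K directly in the input space gives exactly the dot product of explicit 6-dimensional feature vectors that we never actually need to build. For the Gaussian RBF kernel from the lectures the feature space is infinite-dimensional, so only the kernel side of this comparison can be computed at all.

```python
import numpy as np


def poly2_kernel(x: np.ndarray, z: np.ndarray) -> float:
    """Second-order polynomial kernel K(x, z) = (1 + x.z)^2, computed in input space."""
    return float((1.0 + x @ z) ** 2)


def poly2_features(x: np.ndarray) -> np.ndarray:
    """Explicit feature map for 2-D inputs whose dot product reproduces poly2_kernel."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1**2, x2**2, r2 * x1 * x2])


rng = np.random.default_rng(0)
x, z = rng.normal(size=2), rng.normal(size=2)
# Same number computed two ways: 2-D kernel evaluation vs. 6-D dot product.
print(poly2_kernel(x, z), poly2_features(x) @ poly2_features(z))
```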

Here I ought to ask a question, but I don't know what it should be. Maybe, can SVMs be a method of gathering information back from the QC multi-universes?

05-29-2013, 10:49 AM
Elroch
Invited Guest
Join Date: Mar 2013
Posts: 143
Re: Kernel methods for SVM and quantum computing

My long-term background is in functional analysis research, and I find the Hilbert space formulation of quantum mechanics a satisfying one, as are the partial glimpses I've had of the harder theories that lead from relativistic quantum mechanics to things like symmetry breaking. Quantum computing fits far more naturally into this abstract formulation than into formulations that can conflict with intuition.

I too have been mulling over the analogy between probabilistic inference in machine learning and the uncertainty of quantum mechanics, and I think an interesting illustration can be found in an example used in the lectures.

Suppose you are presented with 2 data points as samples of a function f: \mathbb R \rightarrow \mathbb R. Having done this course, we know that fitting a general straight line through the points would not be a great idea, as it is likely to overfit. Given no other information, we have two plausible choices of what to do, and unless we want to be convicted of data snooping, we had better decide which one to use before looking at the data points.

Machine A uses the hypothesis set of constant functions and fits the two points with their average.

Machine B uses the hypothesis set of lines through the origin and fits the two points using least squares regression on a single parameter, the slope.
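
Both machines have one-line least-squares solutions. A minimal NumPy sketch (function names are my own): for the constant model h(x) = b, minimising the squared error gives the mean of the y values; for the line through the origin h(x) = ax, setting the derivative of sum((a x_n - y_n)^2) to zero gives a = sum(x_n y_n) / sum(x_n^2).

```python
import numpy as np


def machine_a(x: np.ndarray, y: np.ndarray) -> float:
    """Constant hypothesis h(x) = b fitted by least squares: b is the mean of y.
    (x is unused; kept so both machines share one signature.)"""
    return float(np.mean(y))


def machine_b(x: np.ndarray, y: np.ndarray) -> float:
    """Line through the origin h(x) = a*x fitted by least squares:
    minimising sum (a*x_n - y_n)^2 gives a = sum(x_n*y_n) / sum(x_n^2)."""
    return float(np.dot(x, y) / np.dot(x, x))


x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])
print(machine_a(x, y))  # 3.0, i.e. the model h(x) = 3
print(machine_b(x, y))  # 2.0, i.e. the model h(x) = 2x
```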

Having used either of these machines, we have a model and can argue that it is the best model in its hypothesis set, but it's not possible to combine these two pieces of knowledge to arrive at something better.

This is a (perhaps poor) analogy to the concept of incompatible observations in quantum mechanics, where we can make observations of different types and draw inferences from them, but not simultaneously. It's as if our window on the object is very small (two points of a function | one spin axis for angular momentum), and we have a choice of what we look at through it (the mean or the slope | spin about just one axis).

It is a weakness of the analogy that an observation in quantum mechanics destroys information (strictly speaking, the information moves from the subsystem being observed into entanglement between the measurement device and the subsystem, in a way closely related to the "spooky" correlations central to quantum computers), whereas applying a machine learning algorithm to choose a hypothesis doesn't seem to destroy anything. Perhaps a closer analogy to the "information destruction" of an observation is the loss of objectivity caused by data snooping: the observer may be obliged to be part of the experiment.

[EDIT: the crudeness of my analogy is made clear by the fact that the full linear hypothesis set may be the appropriate one even if we are only given two data points. This is most obvious if the target function is some unknown linear function and our observations are noiseless, but it is also true in other cases that approximate this. Still, I think the idea of independent, mutually incompatible inferences could be made precise with a little careful construction. An interesting challenge would be to construct an example where two incompatible hypothesis sets achieve identical out-of-sample performance; in the lectures, out-of-sample errors varied between hypothesis sets, so there was a single best one.]
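
To see out-of-sample errors differing between the two machines, here is a Monte Carlo sketch under assumptions of my own choosing, borrowed from the lectures' bias-variance example: target f(x) = sin(πx), inputs uniform on [-1, 1], noiseless samples. Each trial draws a fresh two-point data set, fits both machines, and measures squared error against the target on a fixed set of test points.

```python
import numpy as np


def expected_e_out(n_trials: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimate of expected out-of-sample squared error for
    Machine A (constant) and Machine B (line through origin), each trained on
    two noiseless samples of the assumed target f(x) = sin(pi x), x ~ U[-1, 1]."""
    rng = np.random.default_rng(seed)
    x_test = rng.uniform(-1, 1, size=2_000)     # points used to estimate E_out
    f_test = np.sin(np.pi * x_test)
    err_a, err_b = 0.0, 0.0
    for _ in range(n_trials):
        x = rng.uniform(-1, 1, size=2)           # a fresh two-point data set
        y = np.sin(np.pi * x)
        b = y.mean()                              # Machine A: h(x) = b
        a = np.dot(x, y) / np.dot(x, x)           # Machine B: h(x) = a*x
        err_a += np.mean((b - f_test) ** 2)
        err_b += np.mean((a * x_test - f_test) ** 2)
    return err_a / n_trials, err_b / n_trials


e_a, e_b = expected_e_out()
print(f"E_out  A (constant): {e_a:.3f}   B (slope only): {e_b:.3f}")
```

For this odd-symmetric target Machine B should come out clearly ahead (the constant model's expected error is 0.75 analytically). Adding a large constant offset to the target would reverse the ranking, which is one way to see that the "single best" hypothesis set depends on the target.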

06-03-2013, 07:06 AM
Elroch
Invited Guest
Join Date: Mar 2013
Posts: 143
Re: Kernel methods for SVM and quantum computing

Further study has led me to what looks like a nice analogy between quantum computing and machine learning. It arises from a field of machine learning that we didn't look at much in the course: Bayesian learning.

Given an assumed prior distribution, this provides us, in principle, with the following.

We have a parametrised hypothesis set, each member of which provides an output, or a probability distribution over outputs:

\{H(\alpha_i)\}_{\alpha_i \in A}

with some prior distribution \Psi(A) for these hypotheses (the part Yaser described as "robbing a bank" in his lecture).

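We are now given a set of data points: inputs x together with their observed outputs. For any individual hypothesis we can work out the probability distribution of possible outputs at a particular x (in some cases this will be either 1 or 0, in others a general probability), and hence the probability it assigns to what was actually observed. We then apply Bayes' rule to the prior and get a posterior probability distribution over the hypotheses that could have produced those outputs. Turning the handle (averaging every hypothesis's predictions, weighted by this posterior) gives us a probability distribution for the output at any new input.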
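
Written out, this is the standard Bayesian recipe. A sketch in my own notation, assuming the N data points \mathcal{D} = \{(x_n, y_n)\} are independent given the hypothesis, with \Psi(\alpha) the prior density at \alpha (replace the integrals by sums if A is discrete):

```latex
\begin{aligned}
P(\alpha \mid \mathcal{D}) &= \frac{P(\mathcal{D} \mid \alpha)\,\Psi(\alpha)}{\int_A P(\mathcal{D} \mid \alpha')\,\Psi(\alpha')\,d\alpha'},
\qquad P(\mathcal{D} \mid \alpha) = \prod_{n=1}^{N} P(y_n \mid x_n, \alpha) \\
P(y \mid x, \mathcal{D}) &= \int_A P(y \mid x, \alpha)\,P(\alpha \mid \mathcal{D})\,d\alpha
\end{aligned}
```

The first line is Bayes' rule, giving the posterior over hypotheses; the second is the "turning the handle" step, a posterior-weighted average of every hypothesis's predictive distribution.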

The relevance to this discussion is that we are effectively applying every possible hypothesis in parallel. The \alpha_i describe the "state" of the hypothesis, but we never "collapse" it into a single state.
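
A minimal NumPy sketch of this "every hypothesis in parallel" view. The setup is hypothetical and not from the post: hypotheses are lines through the origin, h_alpha(x) = alpha * x, on a grid of slopes, with a flat prior over the grid and Gaussian observation noise of known standard deviation. Every slope gets a posterior weight, and the prediction at a new input is the posterior-weighted vote of all of them rather than the output of one chosen slope.

```python
import numpy as np

# Hypothetical setup: hypotheses h_alpha(x) = alpha * x on a grid of slopes,
# a flat prior over the grid, and Gaussian observation noise with known std.
alphas = np.linspace(-5.0, 5.0, 1001)
prior = np.full(alphas.size, 1.0 / alphas.size)
noise_std = 0.5

x_data = np.array([0.5, 1.0, 1.5])   # made-up observations
y_data = np.array([1.1, 1.9, 3.2])


def posterior(slopes, prior, x, y, noise_std):
    """Bayes' rule applied to every hypothesis at once (one entry per slope)."""
    resid = y[None, :] - slopes[:, None] * x[None, :]          # (n_slopes, n_data)
    log_lik = -0.5 * np.sum((resid / noise_std) ** 2, axis=1)  # Gaussian log-likelihood
    log_post = log_lik + np.log(prior)
    log_post -= log_post.max()                                 # guard against underflow
    post = np.exp(log_post)
    return post / post.sum()                                   # normalise


post = posterior(alphas, prior, x_data, y_data, noise_std)

# Predict at a new input by letting every hypothesis vote, weighted by its
# posterior probability; the distribution over alpha is never collapsed.
x_new = 2.0
pred_mean = np.sum(post * alphas * x_new)
# Spread due to uncertainty about alpha only (observation noise not added here).
pred_std = np.sqrt(np.sum(post * (alphas * x_new) ** 2) - pred_mean**2)
print(f"prediction at x={x_new}: {pred_mean:.2f} +/- {pred_std:.2f}")
```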

Using this approach you can, for example, use 10th-degree polynomials as a hypothesis set, get 3 data points, and return a probability distribution for the value of the function at any other input. Or you could do the same while incorporating some knowledge of the uncertainty in the three data points.
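
A minimal NumPy sketch of the 10th-degree polynomial case, under assumptions that are my choices rather than the post's: a Gaussian prior w ~ N(0, TAU^2 I) on the 11 coefficients and Gaussian noise of known standard deviation SIGMA on each observed y (SIGMA is where knowledge of the uncertainty in the data points enters). With those assumptions the posterior over coefficients, and the predictive distribution at any input, are Gaussian and available in closed form; this is standard Bayesian linear regression on polynomial features. The data values and constants are made up.

```python
import numpy as np

DEGREE = 10      # 10th-degree polynomial hypothesis set
TAU = 1.0        # assumed prior std of each coefficient: w ~ N(0, TAU^2 I)
SIGMA = 0.05     # assumed std of the noise on each observed y


def features(x):
    """Map inputs to rows (1, x, x^2, ..., x^DEGREE)."""
    return np.vander(np.atleast_1d(x), DEGREE + 1, increasing=True)


def fit_posterior(x, y):
    """Gaussian posterior (mean, covariance) over the 11 coefficients."""
    Z = features(x)
    precision = Z.T @ Z / SIGMA**2 + np.eye(DEGREE + 1) / TAU**2
    cov = np.linalg.inv(precision)
    mean = cov @ Z.T @ y / SIGMA**2
    return mean, cov


def predict(x_new, mean, cov):
    """Predictive mean and std of y at each x_new (coefficient + noise uncertainty)."""
    Z = features(x_new)
    mu = Z @ mean
    var = np.sum((Z @ cov) * Z, axis=1) + SIGMA**2
    return mu, np.sqrt(var)


x_data = np.array([-0.5, 0.1, 0.7])    # three made-up data points
y_data = np.array([0.3, -0.2, 0.8])
mean, cov = fit_posterior(x_data, y_data)

x_query = np.array([0.1, 0.4, 0.95])   # 0.1 is a training input; 0.95 lies outside the data
mu, std = predict(x_query, mean, cov)
for x_star, m, s in zip(x_query, mu, std):
    print(f"x = {x_star:+.2f}: y ~ {m:+.2f} +/- {s:.2f}")
```

Near the training inputs the predictive spread shrinks to roughly the noise level, while away from them it grows, even though 11 coefficients could not be pinned down by an ordinary least-squares fit to 3 points.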

I'll observe that, although there is the huge philosophical problem of having to come up with a prior distribution, this seems to parallel the practical usefulness of regularisation in the machine learning we used. We did it, but what were we doing? What does the magnitude of a coefficient have to do with the error function? Beyond that, we implicitly use maximum likelihood estimation, which is a philosophical leap as well.

To me the contrast:
  1. use maximum likelihood and usually use regularisation to prefer simpler models
  2. use Bayes and usually make complex models less likely using a prior
is fascinating. In an applied field, it must come down to choosing a tool that is practical and useful: the former approach has had more success in practice so far, but things may change.
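
One concrete place where the two approaches meet, offered as an illustration of a special case rather than a general equivalence: with squared error (which is maximum likelihood under Gaussian noise) and weight decay, the regularised solution is exactly the MAP estimate, and here also the posterior mean, under a Gaussian prior on the weights, provided lambda = sigma^2 / tau^2. The penalty term is then, up to scaling and constants, the negative log of the prior. The NumPy sketch below checks this on a made-up random data set with assumed sigma and tau. Other pairings differ (an L1 penalty corresponds to a Laplace prior, for instance).

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 20, 6
Z = rng.normal(size=(N, D))                      # hypothetical design matrix
y = Z @ rng.normal(size=D) + 0.3 * rng.normal(size=N)

sigma, tau = 0.3, 0.5                            # assumed noise std and prior std

# Approach 1: least squares (maximum likelihood) plus weight decay,
# w_reg = argmin ||Z w - y||^2 + lam * ||w||^2, with lam = sigma^2 / tau^2.
lam = sigma**2 / tau**2
w_reg = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

# Approach 2: Bayes with noise N(0, sigma^2) and prior w ~ N(0, tau^2 I).
# The posterior is Gaussian; its mean (equal to its mode, the MAP estimate)
# solves (Z^T Z / sigma^2 + I / tau^2) w = Z^T y / sigma^2.
w_map = np.linalg.solve(Z.T @ Z / sigma**2 + np.eye(D) / tau**2, Z.T @ y / sigma**2)

print(np.allclose(w_reg, w_map))  # True: same weights reached from two philosophies
```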

All times are GMT -7.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.