LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 3 - The Linear Model (http://book.caltech.edu/bookforum/forumdisplay.php?f=110)
-   -   Gradient Descent on complex parameters (weights) (http://book.caltech.edu/bookforum/showthread.php?t=4320)

Kais_M 05-29-2013 12:03 PM

Gradient Descent on complex parameters (weights)
 
Is it possible to use gradient descent to optimize a complex parameter vector? The model is still linear, with a mean-squared-error measure, but the parameters are complex rather than real. I did some derivation, but I am not sure my derivatives with respect to complex numbers are correct. I would like to hear from people here how they dealt (or would deal) with this problem.

many thanks,

yaser 05-29-2013 12:53 PM

Re: Gradient Descent on complex parameters (weights)
 
Quote:

Originally Posted by Kais_M (Post 10980)
Is it possible to use gradient descent to optimize a complex parameter vector? The model is still linear, with a mean-squared-error measure, but the parameters are complex rather than real. I did some derivation, but I am not sure my derivatives with respect to complex numbers are correct. I would like to hear from people here how they dealt (or would deal) with this problem.

many thanks,

If the error function itself is real-valued, then this can be done by considering every complex parameter as two parameters (real and imaginary parts) and carrying out GD with respect to these (twice as many) real parameters. If the error function is complex, then there needs to be a definition of what the objective is since a "minimum complex number" is not a well-defined notion.
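
For concreteness, here is a minimal sketch of that recipe on synthetic data (the linear target, step size, and iteration count are illustrative assumptions, not anything from the book):

Code:

import numpy as np

# Synthetic linear target (assumed): y_n = w_true . x_n with complex X, w.
rng = np.random.default_rng(0)
N, d = 100, 3
X = rng.normal(size=(N, d)) + 1j * rng.normal(size=(N, d))
w_true = rng.normal(size=d) + 1j * rng.normal(size=d)
y = X @ w_true

# Each complex weight w_j = u_j + i v_j becomes two real parameters.
u, v = np.zeros(d), np.zeros(d)
eta = 1e-4  # step size (assumed)

for _ in range(2000):
    r = X @ (u + 1j * v) - y    # complex residuals; E = sum_n |r_n|^2
    g = X.T @ np.conj(r)        # g_j = sum_n conj(r_n) x_nj
    u -= eta * 2 * np.real(g)   # dE/du_j =  2 Re(g_j)
    v += eta * 2 * np.imag(g)   # dE/dv_j = -2 Im(g_j)

print(np.allclose(u + 1j * v, w_true, atol=1e-3))  # True

The data keeps its complex structure throughout; only the parameters are unpacked into real and imaginary parts for the update.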

Elroch 05-29-2013 03:26 PM

Re: Gradient Descent on complex parameters (weights)
 
Using a quality function to compare options, as with an error function, requires at the very least a total order on its range. It would be reasonable for this order to be merely topological rather than explicitly metric: that would do no harm to the notions of minima or of comparisons between errors. [One indication of this was in the lectures, where an error function was replaced by its logarithm, in the knowledge that this preserves the total order.]

But as Yaser indicates, complex numbers lack such an order (at least a natural one), so they cannot serve as the range of an error function.

Kais_M 05-30-2013 07:35 AM

Re: Gradient Descent on complex parameters (weights)
 
Thank you for the quick reply. I am using a real error measure, the sum of squared errors, but it is a function of complex parameters. When deriving the equations for the error and the gradient descent update rule, you hit a point (unless I am making the same mistake every time) where you have to compute the derivative with respect to a complex parameter. I do not have any intuition for that. It seems Dr. Yaser is saying to look at each complex parameter as a 2D vector of real numbers and take the derivative with respect to that vector, which is why the number of parameters doubles. Is this an "engineering" solution, or is it really mathematically correct? There seems to be more to this than meets the eye.

Elroch 05-30-2013 08:25 AM

Re: Gradient Descent on complex parameters (weights)
 
Quote:

Originally Posted by Kais_M (Post 10988)
Thank you for the quick reply. I am using a real error measure, the sum of squared errors, but it is a function of complex parameters. When deriving the equations for the error and the gradient descent update rule, you hit a point (unless I am making the same mistake every time) where you have to compute the derivative with respect to a complex parameter. I do not have any intuition for that. It seems Dr. Yaser is saying to look at each complex parameter as a 2D vector of real numbers and take the derivative with respect to that vector, which is why the number of parameters doubles. Is this an "engineering" solution, or is it really mathematically correct? There seems to be more to this than meets the eye.

Don't worry, it's just as simple as it appears. For this purpose, a complex parameter is simply two real parameters, since there is no multiplication by complex numbers involved.

z = x + iy \implies dz = dx + i\,dy
\frac{\partial}{\partial z} = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y} \right)

Kais_M 05-30-2013 08:39 AM

Re: Gradient Descent on complex parameters (weights)
 
Quote:

Originally Posted by Elroch (Post 10989)
z = x + iy \implies dz = dx + i\,dy
\frac{\partial}{\partial z} = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y} \right)

Actually, there is multiplication of complex numbers: one complex number is the parameter we are trying to optimize, the other is the data. The data is represented in the Fourier domain, which is why it is complex. When taking the derivative with respect to the complex parameter and propagating it inside the sum-of-squared-errors formula, you eventually have to differentiate the product of the complex parameter and the complex data with respect to the complex parameter. For example, the complex parameters could be the values of a transfer function, and the complex data the Fourier transform of a real signal.

yaser 05-30-2013 11:02 AM

Re: Gradient Descent on complex parameters (weights)
 
Quote:

Originally Posted by Kais_M (Post 10988)
look at each complex parameter as a 2D vector of real numbers and take the derivative with respect to that vector, which is why the number of parameters doubles. Is this an "engineering" solution, or is it really mathematically correct?

Say you apply the same principle of GD: you move in parameter space by a fixed-length step, in the direction that gives the biggest change in your objective function under linear approximation. If you take the size of a complex step to be the Euclidean size (the magnitude of a complex number: the square root of the sum of its squared real and imaginary parts), then the approach quoted above is the principled implementation of GD.
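
As a quick numerical check of this (a toy one-parameter example with assumed data, not from the lectures): the split real/imaginary update and the compact complex update w \leftarrow w - 2\eta\,(wx - y)\bar{x} take identical steps.

Code:

import numpy as np

# Toy example (assumed): E(w) = |w x - y|^2 with one complex parameter w.
x, y = 1.0 + 2.0j, 3.0 - 1.0j
eta = 0.05

# (a) GD on the two real parameters u = Re(w), v = Im(w).
u, v = 0.0, 0.0
for _ in range(100):
    r = (u + 1j * v) * x - y
    u -= eta * 2 * np.real(np.conj(r) * x)    # dE/du
    v -= eta * 2 * -np.imag(np.conj(r) * x)   # dE/dv

# (b) The same steps written directly in complex arithmetic.
w = 0.0 + 0.0j
for _ in range(100):
    w -= eta * 2 * (w * x - y) * np.conj(x)

print(np.isclose(u + 1j * v, w), np.isclose(w, y / x))  # True True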

Elroch 05-30-2013 11:34 AM

Re: Gradient Descent on complex parameters (weights)
 
Quote:

Originally Posted by Kais_M (Post 10990)
Actually, there is multiplication of complex numbers: one complex number is the parameter we are trying to optimize, the other is the data. The data is represented in the Fourier domain, which is why it is complex. When taking the derivative with respect to the complex parameter and propagating it inside the sum-of-squared-errors formula, you eventually have to differentiate the product of the complex parameter and the complex data with respect to the complex parameter. For example, the complex parameters could be the values of a transfer function, and the complex data the Fourier transform of a real signal.

Sorry, I think my last post just confused the issue. :o

If you have f: (x, y) \rightarrow \mathbb R, you know everything about the function regardless of whether you think of (x, y) as a complex number or not.

Specifically, you know the relative value at one point to another and the minimum. So you can choose to forget it ever was a complex function, think of it as a real function and do the optimisation you want. This is enough, right?
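
For the product term raised earlier, the real-function view makes this concrete. For a single data point, writing w = u + iv (notation assumed for illustration):

E = |wx - y|^2 = (wx - y)\,\overline{(wx - y)}

\frac{\partial E}{\partial u} = 2\,\mathrm{Re}\!\left(\overline{(wx - y)}\,x\right), \qquad \frac{\partial E}{\partial v} = -2\,\mathrm{Im}\!\left(\overline{(wx - y)}\,x\right)

so the two real partials package back into the single complex number

\frac{\partial E}{\partial u} + i\,\frac{\partial E}{\partial v} = 2\,(wx - y)\,\bar{x}

and the multiplication by complex data causes no trouble.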

