LFD Book Forum  

Book Feedback - Learning From Data > Chapter 3 - The Linear Model
#1  05-29-2013, 01:03 PM
Kais_M (Junior Member)
Gradient Descent on complex parameters (weights)

Is it possible to use gradient descent to optimize a complex parameter vector? It is still a linear model with a mean squared error measure, but the parameters are complex rather than real. I did some derivation, but I am not sure my derivatives with respect to complex numbers are correct. I would like to hear how people here dealt (or would deal) with this problem.

many thanks,
#2  05-29-2013, 01:53 PM
yaser (Caltech)
Re: Gradient Descent on complex parameters (weights)

Quote:
Originally Posted by Kais_M
Is it possible to use gradient descent to optimize a complex parameter vector? It is still a linear model with a mean squared error measure, but the parameters are complex rather than real. I did some derivation, but I am not sure my derivatives with respect to complex numbers are correct. I would like to hear how people here dealt (or would deal) with this problem.

many thanks,
If the error function itself is real-valued, then this can be done by considering every complex parameter as two parameters (real and imaginary parts) and carrying out GD with respect to these (twice as many) real parameters. If the error function is complex, then there needs to be a definition of what the objective is since a "minimum complex number" is not a well-defined notion.
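For concreteness, here is a minimal sketch of this in Python with numpy (the data, model, and step size are illustrative assumptions, not anything from the book):

Code:
import numpy as np

# Illustrative setup: fit one complex weight w in y = w*x, with complex
# data x, y and the real-valued error E(w) = sum |w*x - y|^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(50) + 1j * rng.standard_normal(50)
w_true = 2.0 - 0.5j
y = w_true * x

# Split w = u + i*v and differentiate E with respect to the two real
# parameters u and v (expand |r|^2 = Re(r)^2 + Im(r)^2 and apply the
# chain rule to each square):
def gradient(u, v):
    r = (u + 1j * v) * x - y                 # complex residuals
    dE_du = 2 * np.sum(r.real * x.real + r.imag * x.imag)
    dE_dv = 2 * np.sum(r.imag * x.real - r.real * x.imag)
    return dE_du, dE_dv

u, v = 0.0, 0.0
eta = 0.5 / np.sum(np.abs(x) ** 2)           # conservative step size
for _ in range(100):
    du, dv = gradient(u, v)
    u, v = u - eta * du, v - eta * dv

print(u + 1j * v)                            # converges to w_true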
__________________
Where everyone thinks alike, no one thinks very much
Reply With Quote
#3  05-29-2013, 04:26 PM
Elroch (Invited Guest)
Re: Gradient Descent on complex parameters (weights)

Using a quality function to compare options, such as an error function, requires at the very least a total order on its range. It would be reasonable for this order to be merely topological rather than explicitly metric: that would do no harm to the notion of a minimum or to comparisons between errors. [One indication of this was in the lectures, where an error function was replaced by its logarithm, in the knowledge that this preserves the total order.]

But as Yaser indicates, the complex numbers lack such an order (at least a natural one), so they cannot be the range of an error function.
#4  05-30-2013, 08:35 AM
Kais_M (Junior Member)
Re: Gradient Descent on complex parameters (weights)

Thank you for the quick reply. I am using a real error measure, the sum of squared errors, but it is a function of complex parameters. When deriving the equations for the error and the gradient descent update rule, you hit a point (unless I am making the same mistake every time) where you have to compute the derivative with respect to a complex parameter. I do not have any intuition for that. It seems Dr. Yaser is saying that you should look at the complex parameter as a 2D vector of real numbers and compute the derivative with respect to that vector, which is why the number of parameters doubles. Is this an "engineering" solution, or is it really mathematically correct? There seems to be much more to this than meets the eye.
#5  05-30-2013, 09:25 AM
Elroch (Invited Guest)
Re: Gradient Descent on complex parameters (weights)

Quote:
Originally Posted by Kais_M
Thank you for the quick reply. I am using a real error measure, the sum of squared errors, but it is a function of complex parameters. When deriving the equations for the error and the gradient descent update rule, you hit a point (unless I am making the same mistake every time) where you have to compute the derivative with respect to a complex parameter. I do not have any intuition for that. It seems Dr. Yaser is saying that you should look at the complex parameter as a 2D vector of real numbers and compute the derivative with respect to that vector, which is why the number of parameters doubles. Is this an "engineering" solution, or is it really mathematically correct? There seems to be much more to this than meets the eye.
Don't worry, it's just as simple as it appears. For this purpose, a complex parameter is simply two real parameters, since there is no multiplication by complex numbers involved.

z = x + iy \implies dz = dx + i\,dy

\frac{\partial}{\partial z} = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y} \right)
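If in doubt, the pair of partials can be checked against finite differences. A throwaway check in Python, assuming the sum-of-squared-errors measure discussed above:

Code:
import numpy as np

# Check that the (u, v) partials of E(w) = sum |w*x - y|^2 behave like
# an ordinary 2D real gradient (the data here is arbitrary).
rng = np.random.default_rng(2)
x = rng.standard_normal(10) + 1j * rng.standard_normal(10)
y = rng.standard_normal(10) + 1j * rng.standard_normal(10)

def E(u, v):
    return np.sum(np.abs((u + 1j * v) * x - y) ** 2)

u0, v0, h = 0.3, -0.7, 1e-6
r = (u0 + 1j * v0) * x - y
analytic = (2 * np.sum(r.real * x.real + r.imag * x.imag),
            2 * np.sum(r.imag * x.real - r.real * x.imag))
numeric = ((E(u0 + h, v0) - E(u0 - h, v0)) / (2 * h),
           (E(u0, v0 + h) - E(u0, v0 - h)) / (2 * h))
print(analytic, numeric)                     # agree to floating-point error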
#6  05-30-2013, 09:39 AM
Kais_M (Junior Member)
Re: Gradient Descent on complex parameters (weights)

Quote:
Originally Posted by Elroch
z = x + iy \implies dz = dx + i\,dy

\frac{\partial}{\partial z} = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y} \right)
Actually, there is multiplication of complex numbers: one complex number is the parameter we are trying to optimize, and the other is the data. The data is represented in the Fourier domain, which is why it is complex. When taking the derivative with respect to the complex parameter and propagating it inside the formula for the sum of squared errors, you eventually have to take the derivative of the complex parameter multiplied by the complex data, with respect to the complex parameter. For example, the complex parameters could be the values of a transfer function, and the complex data the Fourier transform of a real signal.
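To make the term I am stuck on concrete (one frequency bin, in my own notation): with parameter w = u + iv and data X, Y, the squared error expands to

|wX - Y|^2 = (u X_r - v X_i - Y_r)^2 + (u X_i + v X_r - Y_i)^2

where the subscripts r and i denote real and imaginary parts, so every quantity in the expansion is real.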
#7  05-30-2013, 12:02 PM
yaser (Caltech)
Re: Gradient Descent on complex parameters (weights)

Quote:
Originally Posted by Kais_M
look at the complex parameter as a 2D vector of real numbers and compute the derivative with respect to that vector, which is why the number of parameters doubles. Is this an "engineering" solution, or is it really mathematically correct?
Say you apply the same principle of GD: you move in the parameter space by a fixed-length step, in the direction that gives you the biggest change in your objective function under a linear approximation. If you take the size of a complex step to be its Euclidean size (the magnitude of a complex number, measured as the square root of the sum of its squared real and imaginary parts), then the approach quoted above is the principled implementation of GD.
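In symbols, with \eta the learning rate and each parameter written as w = u + iv:

\nabla E = \left( \frac{\partial E}{\partial u}, \frac{\partial E}{\partial v} \right), \qquad (u, v) \leftarrow (u, v) - \eta \, \nabla E

or, repacked as a single complex number,

w \leftarrow w - \eta \left( \frac{\partial E}{\partial u} + i \, \frac{\partial E}{\partial v} \right)

where the step length \eta \, \| \nabla E \| is measured in exactly this Euclidean sense.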
__________________
Where everyone thinks alike, no one thinks very much
#8  05-30-2013, 12:34 PM
Elroch (Invited Guest)
Re: Gradient Descent on complex parameters (weights)

Quote:
Originally Posted by Kais_M
Actually, there is multiplication of complex numbers: one complex number is the parameter we are trying to optimize, and the other is the data. The data is represented in the Fourier domain, which is why it is complex. When taking the derivative with respect to the complex parameter and propagating it inside the formula for the sum of squared errors, you eventually have to take the derivative of the complex parameter multiplied by the complex data, with respect to the complex parameter. For example, the complex parameters could be the values of a transfer function, and the complex data the Fourier transform of a real signal.
Sorry, I think my last post just confused the issue.

If you have f : (x, y) \rightarrow \mathbb{R}, you know everything about the function whether or not you think of (x, y) as a complex number.

Specifically, you know the value of the function at any point relative to any other, and you know the minimum. So you can choose to forget it was ever a complex function, think of it as a real function, and do the optimisation you want. This is enough, right?
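For instance, here is a quick sketch of your transfer-function example treated purely as a real optimisation, with the two real partials repacked as one complex number per frequency bin (all data made up for illustration; Python with numpy assumed):

Code:
import numpy as np

# Hypothetical per-frequency fit: estimate a transfer function H from
# input/output spectra X, Y by minimising E(H) = sum_k |H_k X_k - Y_k|^2.
rng = np.random.default_rng(1)
n = 8
X = rng.standard_normal(n) + 1j * rng.standard_normal(n)
H_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
Y = H_true * X

# The (u, v) partials of each term, repacked as one complex number per
# bin, come out to 2 * (H*X - Y) * conj(X): the complex product in the
# model does not break the split into two real parameters.
def grad(H):
    return 2.0 * (H * X - Y) * np.conj(X)

H = np.zeros(n, dtype=complex)
eta = 0.5 / np.max(np.abs(X) ** 2)           # stable step for every bin
for _ in range(2000):
    H = H - eta * grad(H)

print(np.max(np.abs(H - H_true)))            # approximately zero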