LFD Book Forum  

03-16-2013, 06:11 PM
rk1000
Join Date: Jul 2012
Posts: 10
Neural Network predictions converging to one value

P.S.: Cross-posting my post with minor edits from one of the LFD course sub-forums. But this probably belongs here. And all are welcome to comment.

Professor Abu-Mostafa:

Sorry about the length of this post, but I would appreciate your advice on a problem I am facing with a neural network that I am trying to implement for regression.

The problem is that the predicted values eventually become the same for all inputs. After some reading, it seemed to me that this may be a consequence of saturation of the hidden layer. The network has one hidden layer and one linear output neuron. The hidden layer is non-linear, and I have tried several activation functions there: first tanh and the logistic function, and then, after reading some papers on how these can saturate, rectified linear units, i.e. max(0, x). However, after an amount of training time that varies with the parameters, the output values again all become equal, even with the rectified linear units.

I am using gradient descent. I have tried mini-batches of several thousand examples, iterating over each batch to decrease its cost, and I have also tried learning from just one example at a time. I have tried randomly permuting the inputs. I started with the cross-entropy error and am now working with mean squared error. I have checked the gradient calculation numerically for some weights by perturbing them slightly. I have also tried training with and without regularization.

With tanh, learning seemed quick even with one example at a time, but it ran into this stuck behavior very early. With rectified linear units, learning is slower and seems to sit on a big plateau, and it took longer to reach this saturated, or saturated-like, state.
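For anyone who wants to reproduce the numerical gradient check mentioned above, here is a minimal sketch. It assumes a tanh hidden layer, a linear output, and the cost 0.5 * mean((prediction - target)^2); all function names (forward, mse, backprop, numeric_grad) and the layer sizes are hypothetical and chosen for illustration, not taken from my actual code.

```python
import numpy as np

def forward(params, X):
    W1, b1, w2, b2 = params
    H = np.tanh(X @ W1 + b1)        # hidden activations, shape (N, n_hidden)
    return H, H @ w2 + b2           # linear output, shape (N,)

def mse(params, X, y):
    _, pred = forward(params, X)
    return 0.5 * np.mean((pred - y) ** 2)

def backprop(params, X, y):
    W1, b1, w2, b2 = params
    H, pred = forward(params, X)
    d_out = (pred - y) / X.shape[0]              # dE/d(prediction)
    d_hid = np.outer(d_out, w2) * (1 - H ** 2)   # tanh'(s) = 1 - tanh(s)^2
    return [X.T @ d_hid, d_hid.sum(axis=0), H.T @ d_out, np.array([d_out.sum()])]

def numeric_grad(params, X, y, eps=1e-5):
    # Central differences: perturb one weight at a time, then restore it.
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            e_plus = mse(params, X, y)
            p[idx] = old - eps
            e_minus = mse(params, X, y)
            p[idx] = old
            g[idx] = (e_plus - e_minus) / (2 * eps)
        grads.append(g)
    return grads

# Small random problem (assumed sizes: 3 inputs, 4 hidden units, 20 examples).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
params = [rng.normal(size=(3, 4)), np.zeros(4), rng.normal(size=4), np.zeros(1)]

analytic = backprop(params, X, y)
numeric = numeric_grad(params, X, y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print("largest |analytic - numeric| gradient difference:", max_diff)
```

A difference many orders of magnitude smaller than the gradients themselves suggests the back-propagation code matches the cost function. It says nothing about whether the optimization will avoid the collapsed state.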

I think I am training on a sufficient number of examples. Their number is about 10 times the number of weights, as per my understanding of the VC analysis in your lectures.

I noticed in earlier runs that the predicted values tended to converge to the mean of the target values of the last mini-batch the network had been trained on. It seems to minimize the cost by finding the mean of the target outputs and then using that mean as the prediction for every input, and that is the local minimum it appears to move to. However, this does not happen right away, so I don't think I have accidentally coded anything into the cost function or the back-propagation that specifically asks for this. Does the math of back-propagation encourage this specific kind of local minimum (predicted value tending to the mean of the mini-batch targets)?
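As a reference point for the question above, here is a short derivation under one assumption: the hidden activations have become identical for every input, so the network can only output a single constant c. The symbols c, N and y_n are introduced here for illustration, with N the number of examples in the mini-batch and y_n their targets.

```latex
\begin{align}
E(c) &= \frac{1}{N}\sum_{n=1}^{N} (c - y_n)^2, \\
\frac{dE}{dc} &= \frac{2}{N}\sum_{n=1}^{N} (c - y_n) = 0
\;\Longrightarrow\;
c^{*} = \frac{1}{N}\sum_{n=1}^{N} y_n = \bar{y}.
\end{align}
```

So once the output no longer depends on the input, the mean squared error on a mini-batch is minimized by the mean of that mini-batch's targets, which is consistent with what I observed. This does not explain why the network reaches such a state in the first place.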

While it is certainly possible there is a bug in my code, is this kind of behavior common? If so, what measures would you recommend to address it? Specifically, if I iterate over the same examples for a much longer duration, can the neural network move out of this state?

In fact, since this happens even with rectified linear units, could it be that this is not really saturation of the hidden layer to fixed activations, but some other behavior related to an overall tendency of the outputs towards certain local minima? Or are they really the same thing?
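One way to separate the two possibilities would be to look directly at the hidden activations over a set of inputs. The sketch below is only an illustration: the function name hidden_diagnostics, the 0.99 saturation threshold and the tolerance tol are all assumptions of mine, not standard definitions.

```python
import numpy as np

def hidden_diagnostics(H, kind="tanh", tol=1e-3):
    """Summarize hidden activations H of shape (N, n_hidden) over N inputs."""
    per_unit_std = H.std(axis=0)
    # Fraction of units whose activation barely changes across inputs.
    report = {"constant_units": float(np.mean(per_unit_std < tol))}
    if kind == "tanh":
        # Fraction of activations close to the +/-1 asymptotes (assumed threshold).
        report["saturated_fraction"] = float(np.mean(np.abs(H) > 0.99))
    elif kind == "relu":
        # Fraction of units that output zero for every input.
        report["dead_units"] = float(np.mean((H <= 0).all(axis=0)))
    return report

# Illustration with made-up weights: a large negative bias switches every ReLU off.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W1 = rng.normal(size=(3, 4))
H_relu = np.maximum(0.0, X @ W1 - 1000.0)
print(hidden_diagnostics(H_relu, kind="relu"))
```

If constant_units is close to 1, the output layer sees the same hidden vector for every input and can only produce one value, whether the cause is tanh units pinned at their asymptotes or rectified linear units that are off for all inputs.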

Is it possible to avoid such a situation by trying out many random combinations of initial weight values?
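A rough sketch of how such an experiment could be set up: train the same small network from several random seeds and compare each final error with the error of always predicting the mean target. Everything here is assumed for illustration (the toy target sin(x), the helper name train_once, 8 tanh hidden units, the learning rate and the number of epochs); it is not my actual setup.

```python
import numpy as np

def train_once(X, y, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """Full-batch gradient descent on 0.5 * mean squared error; returns final MSE."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=n_hidden)
    b2 = 0.0
    N = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        d_out = (H @ w2 + b2 - y) / N               # dE/d(prediction)
        d_hid = np.outer(d_out, w2) * (1 - H ** 2)  # back-propagated through tanh
        w2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    pred = np.tanh(X @ W1 + b1) @ w2 + b2
    return float(np.mean((pred - y) ** 2))

# Toy regression problem (assumed): y = sin(x) on [-2, 2].
X = np.linspace(-2.0, 2.0, 100).reshape(-1, 1)
y = np.sin(X[:, 0])

baseline = float(np.mean((y - y.mean()) ** 2))   # MSE of always predicting the mean
results = [train_once(X, y, seed=s) for s in range(5)]
print("mean-predictor MSE:", baseline)
for s, e in enumerate(results):
    print("seed", s, "final MSE:", e)
```

A run whose final error sits at the mean-predictor baseline has collapsed to a constant output. If some seeds collapse and others do not, initialization is at least part of the story.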

It seems to me that this is not really a generalization problem, and that regularization may not cure it; by changing the cost, regularization may only lead to a different minimum at which the network still saturates. Is this intuition correct?

Thanks again for a wonderful course.

All times are GMT -7.
