
#1




Neural Network predictions converging to one value
P.S.: Cross-posting my post, with minor edits, from one of the LFD course subforums, but it probably belongs here. All are welcome to comment.
Professor Abu-Mostafa: Sorry about the length of this post, but I would appreciate your advice on a problem I am facing with a neural network that I am trying to implement for regression. The problem is that the predicted values eventually turn out to be the same for all inputs. After doing some reading, it seemed to me that this is possibly a consequence of saturation of the hidden layer.

This is a network with one hidden layer and one linear output neuron. The hidden layer is nonlinear, and I have tried various sigmoid functions here: tanh and the logistic function. After reading some papers on how these can lead to saturation, I also tried rectified linear units, i.e. max(0, x). However, after some amount of time that varies with the parameters, the output values again become equal, even with the rectified linear units.

I am using gradient descent. I have tried minibatches of several thousand examples, iterating over them to decrease the cost in each batch, and I have also tried learning from just one example at a time. I have tried randomly permuting the inputs. I started with the cross-entropy error and am now working with the mean squared error. I have checked the gradient calculation by numerically checking some values after perturbing the weights slightly. I have also tried it with and without regularization.

With tanh, the learning seemed to be quick even with one example at a time, but it ran into this stuck behavior very early. With rectified linear units, it learns more slowly, seems to sit on a big plateau, and takes some time to get into this saturated or saturated-like state. I think I am training on a sufficient number of examples: about 10 times the number of weights, as per my understanding of the VC analysis in your lectures.

I noticed in earlier runs that the predicted values tended to converge to the mean of the target outputs of the last minibatch it had been trained on. It seems to somehow minimize the cost by finding the mean of the target outputs and then using this mean as the prediction, and that is the local minimum it moves to. However, this does not happen right away, so I don't think I have accidentally coded anything into the cost function or the backpropagation that specifically asks it to do this.

Does the math of backpropagation encourage this specific kind of local minimum (predicted value tending to the mean of the outputs of the minibatch)? While it is certainly possible there is a bug in my code, is this kind of behavior common? If so, what measures would you recommend to address it? Specifically, if I iterate over the same examples for a much longer duration, can the neural network move out of this state? Since this happens even with rectified linear units, is it really a phenomenon of saturation to fixed hidden-layer activations, or some other behavior related to an overall tendency of the outputs toward certain local minima? Or are the two really the same thing? Is it possible to avoid such a situation by trying many random combinations of initial weight values? It seems to me that this is not really a generalization problem, so regularization may not cure it, though it may find a different minimum where it saturates, by changing the cost. Is this intuition correct?

Thanks again for a wonderful course.
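For reference, here is a minimal sketch of the kind of gradient check I mean (not my actual code; the data, layer sizes, weight scales, and perturbation size are just placeholders): a one-hidden-layer tanh network with a linear output and mean squared error, with a finite-difference check of a few backpropagated gradients.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for the real training set.
N, d = 200, 5
X = rng.normal(size=(N, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

# One hidden layer (tanh) feeding one linear output neuron.
H = 10
W1 = rng.normal(scale=0.1, size=(d, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=H);      b2 = 0.0

def forward(X, W1, b1, W2, b2):
    A = np.tanh(X @ W1 + b1)      # hidden activations
    yhat = A @ W2 + b2            # linear output
    return A, yhat

def mse(yhat, y):
    return np.mean((yhat - y) ** 2)

def backprop(X, y, W1, b1, W2, b2):
    A, yhat = forward(X, W1, b1, W2, b2)
    d_out = 2.0 * (yhat - y) / len(y)                # dE/dyhat for MSE
    gW2 = A.T @ d_out
    gb2 = d_out.sum()
    d_hid = np.outer(d_out, W2) * (1.0 - A ** 2)     # tanh'(z) = 1 - tanh(z)^2
    gW1 = X.T @ d_hid
    gb1 = d_hid.sum(axis=0)
    return gW1, gb1, gW2, gb2

# Finite-difference check of a few entries of dE/dW1: perturb one weight
# slightly in each direction and compare the numerical slope with backprop.
gW1, gb1, gW2, gb2 = backprop(X, y, W1, b1, W2, b2)
eps = 1e-5
for i, j in [(0, 0), (2, 3), (4, 7)]:
    Wp = W1.copy(); Wp[i, j] += eps
    Wm = W1.copy(); Wm[i, j] -= eps
    num = (mse(forward(X, Wp, b1, W2, b2)[1], y)
           - mse(forward(X, Wm, b1, W2, b2)[1], y)) / (2 * eps)
    print(f"dE/dW1[{i},{j}]: backprop {gW1[i, j]:+.6e}  numerical {num:+.6e}")

The two columns agree to several significant figures in my runs, which is why I believe the backpropagation itself is correct.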
#2




Re: Neural Network predictions converging to one value
Let me take this step by step. I assume that your training data points have different outputs and that your training gets the network to predict these outputs with reasonable approximation so that the network outputs are not the same for the training set. Is this correct?
__________________
Where everyone thinks alike, no one thinks very much 
#3




Re: Neural Network predictions converging to one value
Professor,
Professor,

The training data points do have different target outputs. After training on a minibatch of the training set and adjusting the weights, when I predict the outputs for that same minibatch, the network produces identical (incorrect) outputs once it has undergone some training. At the beginning of training the predictions are very far wrong and not all identical, though many coincide; eventually, after some training, all the predictions on the training set start to converge, and they get closer to the mean of the target outputs of the latest training minibatch. I get the same output value if I then run the network on a validation set. When I continue training on the next minibatch and then test on that minibatch and on a validation set, the behavior is the same, but the predicted value changes after each training minibatch. I hope I have answered your question.

Initially, I had conjectured that maybe the network is completely biased and is just providing a fixed output from the weight of the bias term with no contribution from the other weights, so I removed the bias term altogether. But the behavior stayed the same, and I brought the bias term back.
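To make my conjecture about the mean concrete: if the hidden activations have saturated to fixed values, so that the output is effectively a constant c independent of the input, then on a minibatch with targets y_1, ..., y_N the squared error depends only on c and is minimized exactly at the mean of those targets:

E(c) \;=\; \frac{1}{N}\sum_{n=1}^{N} (c - y_n)^2,
\qquad
\frac{dE}{dc} \;=\; \frac{2}{N}\sum_{n=1}^{N} (c - y_n) \;=\; 0
\quad\Longrightarrow\quad
c \;=\; \frac{1}{N}\sum_{n=1}^{N} y_n.

So a network whose hidden layer has stopped responding to the input would be pushed by gradient descent toward predicting the mean of whatever targets it is currently being trained on, which matches what I am seeing.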
#4




Re: Neural Network predictions converging to one value
Try training repeatedly on just 2 examples that have different target outputs, and see whether the network can eventually predict both of them correctly.
__________________
Where everyone thinks alike, no one thinks very much 
#5




Re: Neural Network predictions converging to one value
Sure, I just tried the training that you suggested. With repeated training over the same 2 examples that have different target outputs, it learns correctly, and eventually predicts them both with no training error.

#6




Re: Neural Network predictions converging to one value
There are two possibilities:

1. Computational: not enough epochs, or a bad local minimum.
2. Inherent: the target function is almost impossible to capture given the size of the network.

Let's deal with 1 first. Try a very long run, with say 100 times the epochs, and see if the result is better. Also try 100 different runs (initial random weights seeded differently) with the smaller number of epochs, and see how good the best result is.
__________________
Where everyone thinks alike, no one thinks very much 
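A sketch of what the suggested experiment could look like in code, assuming a small stand-in data set and a plain full-batch gradient-descent training routine (the data, network size, learning rate, and epoch counts here are placeholders, not a definitive setup):

import numpy as np

def train(seed, n_epochs, X, y, n_hidden=10, lr=0.01):
    # Train a tanh-hidden / linear-output regression network with plain
    # gradient descent from seeded random initial weights, and return the
    # final mean squared error on (X, y).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=n_hidden);      b2 = 0.0
    for _ in range(n_epochs):
        A = np.tanh(X @ W1 + b1)
        yhat = A @ W2 + b2
        d_out = 2.0 * (yhat - y) / len(y)
        d_hid = np.outer(d_out, W2) * (1.0 - A ** 2)
        W2 -= lr * (A.T @ d_out); b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hid); b1 -= lr * d_hid.sum(axis=0)
    A = np.tanh(X @ W1 + b1)
    return np.mean((A @ W2 + b2 - y) ** 2)

# Stand-in data; the real training set would go here.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

# 1. Computational check: one seed, a run 100 times longer than usual.
print("long run, final error:", train(seed=0, n_epochs=100 * 1000, X=X, y=y))

# 2. Many restarts: 100 different seeds with the usual number of epochs;
#    keep the best final error.
errors = [train(seed=s, n_epochs=1000, X=X, y=y) for s in range(100)]
print("best of 100 restarts:", min(errors))

If the long run keeps improving, the earlier plateau was a computational issue; if many restarts give one clearly better result, a bad local minimum was the culprit; if neither helps, the inherent explanation (a network too small for the target) becomes more likely.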

