LFD Book Forum  

03-31-2013, 12:54 PM
Elroch (Invited Guest)
Join Date: Mar 2013
Posts: 143
Re: Neural Network predictions converging to one value

I have a couple of points, based on not dissimilar experiences of my own.

First, are you watching the errors calculated on your out-of-sample data as you train the neural network? It is hard to draw conclusions from in-sample errors (unless your data set is very large compared to the complexity of the neural network). I am not sure what software you are using, but in JNNS, for example, you can see a graph of OOS errors as you are training.

Second, as a simple test, you could repeat the training with the input data replaced by entirely random data (keeping the same output data) and compare the results.
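To make the control test concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the data are synthetic (y = 2x + noise is made up for the example), and a least-squares line stands in for the neural network so the example stays short. The helper names (`fit_line`, `oos_rmse`) are hypothetical, not part of any package.

```python
import math
import random

def rmse(preds, ys):
    """Root-mean-square error between predictions and targets."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (stand-in for a network)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def oos_rmse(x_train, y_train, x_test, y_test):
    """Train on one set and report RMSE on the held-out set."""
    slope, intercept = fit_line(x_train, y_train)
    return rmse([slope * x + intercept for x in x_test], y_test)

rng = random.Random(0)
n = 200
# Assumed synthetic data with a real input/output relationship: y = 2x + noise.
x_train = [rng.uniform(-1, 1) for _ in range(n)]
y_train = [2 * x + rng.gauss(0, 0.5) for x in x_train]
x_test = [rng.uniform(-1, 1) for _ in range(n)]
y_test = [2 * x + rng.gauss(0, 0.5) for x in x_test]

# Control run: same outputs, but inputs replaced by entirely random values.
x_train_rand = [rng.uniform(-1, 1) for _ in range(n)]
x_test_rand = [rng.uniform(-1, 1) for _ in range(n)]

rmse_real = oos_rmse(x_train, y_train, x_test, y_test)
rmse_control = oos_rmse(x_train_rand, y_train, x_test_rand, y_test)
print(f"real inputs:   OOS RMSE = {rmse_real:.3f}")
print(f"random inputs: OOS RMSE = {rmse_control:.3f}")
```

If the run with your real inputs does not clearly beat the run with random inputs, that suggests the model is not extracting much from the inputs as currently represented.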

04-03-2013, 07:41 PM
rk1000
Join Date: Jul 2012
Posts: 10
Re: Neural Network predictions converging to one value

Once it gets into that situation, the results are the same both in-sample and out-of-sample. It sounds like an interesting idea to try the training with random data, to see if there is any issue with the input. Thanks.

04-04-2013, 02:42 AM
Elroch (Invited Guest)
Join Date: Mar 2013
Posts: 143
Re: Neural Network predictions converging to one value

Originally Posted by rk1000:
Thanks for your comment. If I understand your post correctly, you're saying that maybe the inputs are not far from random? Well, I'm hoping that's not the case, but it's certainly possible that my representation of the actual input has some issues, as I was trying out what, to my knowledge, may be a non-standard way to represent this input.

And just to complete the picture about the training runs ... I did run it for 100 epochs with the same initial values, and I completed 50 runs with different random initial settings. I had to stop it mid-way through the 100 random runs, as it was beginning to sort of take over my computer. :-) I ran these with no regularization, and none of them showed the “identical predictions” problem, though they do not show good learning behavior. But with regularization, I do see the problem.

The most important thing is the statistical relationship between inputs and outputs, P(y | x), not the distribution of the inputs in isolation, P(x). In principle, reversible transformations of the inputs only disguise this relationship; they do not change the information the inputs carry about the outputs. Of course, a transformation can still affect the behaviour of a tool such as a neural network, which is why transformations are used.

The relationship P(y | x) comprises a deterministic part plus noise. The noise can arise in different ways, but it has the same effect on the statistics.

In my experience, if you provide inputs that are explicitly independent of the outputs (so that P(y | x) = P(y), and the outputs are pure noise as far as the inputs are concerned), a neural network will generally converge to a constant function whose value is the average of the outputs. The reason is that, when the inputs carry no information, no function of them can achieve a lower expected RMSE than this constant. If a neural network converges to anything else in this case, it must be fitting the noise. This is unlikely to happen unless the number of input points is small compared to the complexity of the neural network.
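A quick numerical check of the claim about the average, as a sketch only: the outputs below are made-up Gaussian noise (the mean of 2.0 and the candidate offsets are arbitrary assumptions), and the code compares the RMSE of a few constant predictions.

```python
import math
import random

def rmse_of_constant(ys, c):
    """RMSE obtained by predicting the same constant c for every point."""
    return math.sqrt(sum((y - c) ** 2 for y in ys) / len(ys))

rng = random.Random(0)
# Assumed outputs: pure noise around an arbitrary level, unrelated to any input.
ys = [rng.gauss(2.0, 1.0) for _ in range(1000)]
mean_y = sum(ys) / len(ys)

# Try the mean itself and a few constants shifted away from it.
candidates = [mean_y + shift for shift in (-1.0, -0.1, 0.0, 0.1, 1.0)]
errors = {c: rmse_of_constant(ys, c) for c in candidates}
best = min(errors, key=errors.get)

for c in candidates:
    print(f"constant {c:.3f}: RMSE = {errors[c]:.4f}")
print("best constant is the mean:", best == mean_y)
```

Any shift away from the mean adds the square of the shift to the mean squared error, which is why the unshifted candidate comes out lowest.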

I should make clear that my understanding of the above is empirical with a core of simple probability theory. The detailed behaviour of neural networks is very obscure, and I am glossing over issues such as local minima, merely because I haven't seen this confusing the issue and suspect it generally won't.

As for the technical details, it is useful to monitor the RMSE on out-of-sample data while a neural network is being trained, because this helps distinguish the useful effect of training (generalisation) from the harmful one (overfitting). This applies whether there is a deterministic relationship between inputs and outputs, a noisy relationship, or none at all (when they are totally independent, a network can first model the average, but may then learn the random noise in a way that increases out-of-sample RMSE).
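Here is a rough sketch of that kind of monitoring, assuming nothing about your software. The tiny network (`TinyNet`), the data sizes, the learning rate and the number of epochs are all hypothetical choices for illustration; the inputs and outputs are drawn independently, so any structure the network learns beyond the average is noise.

```python
import math
import random

def rmse(preds, ys):
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))

class TinyNet:
    """One hidden tanh layer, scalar input and output (illustrative only)."""

    def __init__(self, hidden, rng):
        self.w = [rng.uniform(-1, 1) for _ in range(hidden)]      # input -> hidden
        self.b = [rng.uniform(-1, 1) for _ in range(hidden)]      # hidden biases
        self.v = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]  # hidden -> output
        self.c = 0.0                                              # output bias

    def predict(self, x):
        return self.c + sum(v * math.tanh(w * x + b)
                            for w, b, v in zip(self.w, self.b, self.v))

    def train_epoch(self, xs, ys, lr):
        """One full-batch gradient step on 0.5 * mean squared error."""
        n, hidden = len(xs), len(self.w)
        gw, gb, gv, gc = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            hs = [math.tanh(w * x + b) for w, b in zip(self.w, self.b)]
            err = self.c + sum(v * h for v, h in zip(self.v, hs)) - y
            gc += err / n
            for j, h in enumerate(hs):
                gv[j] += err * h / n
                back = err * self.v[j] * (1.0 - h * h) / n
                gw[j] += back * x
                gb[j] += back
        self.c -= lr * gc
        for j in range(hidden):
            self.w[j] -= lr * gw[j]
            self.b[j] -= lr * gb[j]
            self.v[j] -= lr * gv[j]

def independent_data(n, rng):
    """Inputs and outputs drawn independently of each other."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [rng.gauss(3.0, 1.0) for _ in range(n)]
    return xs, ys

def monitor_training(epochs=300, hidden=8, lr=0.05, seed=0):
    """Return a list of (in-sample RMSE, out-of-sample RMSE), one pair per epoch."""
    rng = random.Random(seed)
    x_in, y_in = independent_data(20, rng)     # small training set
    x_out, y_out = independent_data(200, rng)  # held-out data
    net = TinyNet(hidden, rng)
    history = []
    for _ in range(epochs):
        net.train_epoch(x_in, y_in, lr)
        history.append((rmse([net.predict(x) for x in x_in], y_in),
                        rmse([net.predict(x) for x in x_out], y_out)))
    return history

history = monitor_training()
for epoch in range(0, len(history), 50):
    e_in, e_out = history[epoch]
    print(f"epoch {epoch:3d}  in-sample RMSE {e_in:.3f}  out-of-sample RMSE {e_out:.3f}")
```

The thing to watch is the gap between the two columns: in-sample RMSE that keeps falling while out-of-sample RMSE stalls or rises is the overfitting signature. How pronounced it is will depend on the network size, the amount of data and how long you train.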

Can you describe the nature of your data? Is it financial time series data, perhaps?

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.