Quote:
Originally Posted by rk1000
Sure, I just tried the training that you suggested. With repeated training over the same 2 examples that have different target outputs, it learns correctly, and eventually predicts them both with no training error.
OK. Now, this narrows it down to an inability to learn properly on the full data set. There are two possibilities:
1. Computational: Not enough epochs, or a bad local minimum.
2. Inherent: The target function is almost impossible to capture given the size of the network.
Let's deal with 1 first. Try a very long run with, say, 100 times the number of epochs and see if the result is better. Also try 100 different runs (each with differently seeded initial random weights) at the original, smaller number of epochs and see how good the best result is.
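
For concreteness, here is a rough sketch of the two experiments in Python. It is not your setup: the XOR data, the 2-2-1 sigmoid network, the learning rate, and the base epoch count (BASE_EPOCHS) are all placeholder assumptions, so swap in your own training routine and data. The point is only the structure: one long run versus many short runs that differ in nothing but the seed.

```python
import math
import random

# Toy data set (XOR), standing in for the real training set (assumption).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
BASE_EPOCHS = 200  # hypothetical "smaller number of epochs"


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_h, w_o, x1, x2):
    # Each hidden unit holds [w1, w2, bias]; w_o holds hidden weights, then bias.
    h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_h]
    y = sigmoid(sum(wo * hj for wo, hj in zip(w_o, h)) + w_o[-1])
    return h, y


def train(seed, epochs, hidden=2, lr=0.5):
    """Train with plain SGD from seeded random weights; return the final MSE."""
    rng = random.Random(seed)  # the seed controls only the initial weights
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for (x1, x2), t in DATA:
            h, y = forward(w_h, w_o, x1, x2)
            d_o = (y - t) * y * (1 - y)  # error signal at the output unit
            for j in range(hidden):
                # Hidden error signal, computed before w_o[j] is changed.
                d_h = d_o * w_o[j] * h[j] * (1 - h[j])
                w_o[j] -= lr * d_o * h[j]
                w_h[j][0] -= lr * d_h * x1
                w_h[j][1] -= lr * d_h * x2
                w_h[j][2] -= lr * d_h
            w_o[-1] -= lr * d_o
    return sum((forward(w_h, w_o, a, b)[1] - t) ** 2
               for (a, b), t in DATA) / len(DATA)


# Experiment A: a single run with 100 times the epochs.
long_run = train(seed=0, epochs=100 * BASE_EPOCHS)

# Experiment B: 100 runs at the base epoch count, each seeded differently.
restarts = [train(seed=s, epochs=BASE_EPOCHS) for s in range(100)]
best = min(restarts)

print("long run error:    ", long_run)
print("best of 100 errors:", best)
print("worst of 100 errors:", max(restarts))
```

If the long run ends up much better than any of the short ones, the problem is probably too few epochs. If the short runs vary a lot from seed to seed, that points to bad local minima. If nothing helps in either experiment, possibility 2 becomes the more likely explanation.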