I'd like to share a very interesting problem that I've had a lot of fun exploring, and from which I've learnt a few things about the behaviour of neural networks and how to train them efficiently. It can be studied with any general-purpose neural network software, of which there are many.

The data set simply consists of identical inputs and outputs, covering all combinations of N bits. For example, with N = 4, a typical input/output pair would be 0110 → 0110. So there are only 2^N points in the data set (16 for N = 4). One nice thing is that this results in very fast training, which permits a lot of experimentation with little waiting. Typical problems with thousands of data points can involve a lot more waiting, unless you have access to a cloud or a rack of GPUs!
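
As a minimal sketch, the whole data set can be generated in a few lines. This assumes NumPy; the function name `make_bit_dataset` and its parameter `n_bits` are hypothetical labels chosen for illustration, not part of any particular neural network package.

```python
import itertools
import numpy as np

def make_bit_dataset(n_bits):
    """Return every n_bits-long binary pattern; the targets equal the inputs."""
    patterns = np.array(list(itertools.product([0, 1], repeat=n_bits)), dtype=float)
    # Inputs and outputs are identical, so the targets are just a copy.
    return patterns, patterns.copy()

X, Y = make_bit_dataset(4)
print(X.shape)  # one row per pattern, one column per bit
```

Because the whole set fits in one tiny array, full-batch training is practical and there is no need for mini-batches.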

With that data set, the problem would be trivial without some constraint on the design. The constraint is that one of the layers must contain only a single neuron! Any number of other hidden layers, of any size, are permitted between the inputs and the single neuron, and between the single neuron and the outputs.

The first issue is how to design the network. It clearly has N input neurons and N output neurons (one per bit), a single neuron in some hidden layer, and one or more hidden layers between the inputs and the single neuron, and between the single neuron and the outputs.
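
To make the shape of such a network concrete, here is a rough sketch of one possible layout and its forward pass in NumPy. Everything specific in it is an assumption made for illustration: a single hidden layer of 8 neurons on each side of the bottleneck, tanh hidden activations, a sigmoid output layer, and the helper names `init_network` and `forward`. It is deliberately symmetric and is not meant to suggest the best design.

```python
import numpy as np

def init_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Return the activations of every layer, starting with the input itself."""
    activations = [x]
    for i, (W, b) in enumerate(params):
        z = activations[-1] @ W + b
        if i == len(params) - 1:
            # Sigmoid output, written via tanh for numerical stability.
            activations.append(0.5 * (1.0 + np.tanh(0.5 * z)))
        else:
            activations.append(np.tanh(z))
    return activations

n_bits, hidden = 4, 8  # hidden width is an arbitrary choice
layer_sizes = [n_bits, hidden, 1, hidden, n_bits]  # the 1 is the single-neuron layer
params = init_network(layer_sizes)

x = np.array([[0.0, 1.0, 1.0, 0.0]])
acts = forward(params, x)
print([a.shape for a in acts])
```

The third entry of `acts` is the output of the single neuron: all the information about the input pattern has to pass through that one number.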

Without giving away too much, I will point out some of the things that I found interesting with the 4-bit version:

- One side of the network can be implemented much more simply than the other. You can find out which.
- Local minima can be a problem. Seeing very clear examples of this was very enlightening. A local minimum in this problem means that some sets of bits are getting corrupted in the network. Some choices of training algorithm get stuck permanently at different local minima, depending on the initial weights.
- This problem led me to explore various training algorithms. Suffice it to say, back propagation does not work well here, and the superior algorithm I have been mainly using up to now was not the best either. This was also very interesting and will affect my future use of neural networks.
- The choice of training algorithm may come close to removing the problem of local minima here. This is a very exciting fact, especially as the local minima are so starkly visible in this problem.
- Convergence time varies a lot with the choice of random weights. Some training algorithms can get almost stuck at local minima, but eventually escape, rather than just getting stuck permanently.
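
One way to see the dependence on initial weights for yourself is to train the same network from several random seeds and count how many patterns come out corrupted. The sketch below does this with plain full-batch gradient descent (back propagation) in NumPy. The architecture, activations, cross-entropy loss, learning rate and epoch count are all arbitrary assumptions, and `train_once` is a hypothetical helper name. No results are claimed here: the counts you get may differ from seed to seed, which is the point of the experiment.

```python
import itertools
import numpy as np

def train_once(n_bits=4, hidden=8, seed=0, epochs=2000, lr=0.5):
    """Train an n_bits -> hidden -> 1 -> hidden -> n_bits network with plain
    gradient descent and return how many patterns are reproduced wrongly."""
    rng = np.random.default_rng(seed)
    X = np.array(list(itertools.product([0.0, 1.0], repeat=n_bits)))
    sizes = [n_bits, hidden, 1, hidden, n_bits]
    Ws = [rng.normal(0.0, 0.5, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(b) for b in sizes[1:]]

    def forward(x):
        # tanh on hidden layers, sigmoid (via tanh) on the output layer.
        acts = [x]
        for i, (W, b) in enumerate(zip(Ws, bs)):
            z = acts[-1] @ W + b
            last = i == len(Ws) - 1
            acts.append(0.5 * (1.0 + np.tanh(0.5 * z)) if last else np.tanh(z))
        return acts

    for _ in range(epochs):
        acts = forward(X)
        # Cross-entropy loss with a sigmoid output: the gradient with respect
        # to the output pre-activation is simply (output - target).
        delta = (acts[-1] - X) / len(X)
        for i in reversed(range(len(Ws))):
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Propagate through the old weights and the tanh derivative.
                delta = (delta @ Ws[i].T) * (1.0 - acts[i] ** 2)
            Ws[i] -= lr * grad_W
            bs[i] -= lr * grad_b

    # A pattern counts as corrupted if any rounded output bit is wrong.
    outputs = forward(X)[-1]
    return int(np.any(np.round(outputs) != X, axis=1).sum())

for seed in range(5):
    print(f"seed {seed}: {train_once(seed=seed)} of 16 patterns corrupted")
```

Swapping the update rule inside the loop for a different training algorithm, while keeping the seeds fixed, gives a like-for-like comparison of how each algorithm copes with the same starting points.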

Have fun! Seriously, I think it's well worth the time.