Associative Memory and Energy Models Intro


Probably the most famous one-liner in neuroscience is: "neurons that fire together wire together". This is generally taken to refer to Hebb's rule (1949) for associative memory (i.e. the type of memory that allows us to recognize objects despite minor modifications and noise). The idea remained largely theoretical for decades until John Hopfield (1982) popularized it with a model of fully connected binary (active/inactive) neurons, now called a Hopfield network. While there are many ways of learning the synaptic weights of a Hopfield network, Hebb's rule remains the simplest to interpret and is shown below: $$w_{ij} = \frac{1}{n}\sum_{\mu = 1}^{n}\epsilon_{i}^{\mu}\epsilon_{j}^{\mu}$$ where \(w_{ij}\) is the synaptic weight between neurons i and j, \(n\) is the number of patterns to learn, and \(\epsilon_{i}^{\mu}\) is the activity of neuron i in pattern \(\mu\). If we then provide the network with a new pattern of activation, the individual neuron activations will evolve to match a previously learned pattern.

A beautiful analogy can be drawn between this model and the Ising model in thermodynamics. In both, we can express the energy of the system at any time with a function that depends on the interactions between the system's parts. Both models' energy equation takes the form: $$E(\epsilon) = - \sum_{i,j}w_{ij}\epsilon_i\epsilon_j + \sum_{i}\epsilon_i\theta_i$$ where \(E(\epsilon)\) is the energy of state \(\epsilon\), and \(\epsilon\) is a vector of neuron activations (for Hopfield) or of atomic spins (for Ising). \(w_{ij}\) is the interaction mediated by synapses (for Hopfield) or by magnetic coupling (for Ising). Lastly, \(\theta_i\) corresponds to the neural activation threshold (for Hopfield) or to an external magnetic field (for Ising). In both models, the system evolves over time to reduce this energy: each update flips the parts whose current state contributes "high energy" interactions, moving the system downhill.
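To make this concrete, here is a minimal sketch of a Hopfield network in Python (numpy), using the common ±1 activation convention. The class and function names are my own, not from any particular library, and the recall loop is the standard asynchronous update rule:

```python
# A minimal sketch of a Hopfield network trained with Hebb's rule,
# assuming the +/-1 activation convention.
import numpy as np

class HopfieldNetwork:
    def __init__(self, n_neurons):
        self.n = n_neurons
        self.w = np.zeros((n_neurons, n_neurons))

    def train(self, patterns):
        # Hebb's rule: w_ij = (1/n) * sum_mu e_i^mu * e_j^mu
        for p in patterns:
            self.w += np.outer(p, p)
        self.w /= len(patterns)
        np.fill_diagonal(self.w, 0)  # no self-connections

    def energy(self, state, theta=0.0):
        # E = -sum_ij w_ij e_i e_j + sum_i e_i theta_i
        return -state @ self.w @ state + np.sum(theta * state)

    def recall(self, state, n_steps=1000):
        # Asynchronous updates: one random neuron at a time moves
        # to whichever state lowers the energy.
        state = state.copy()
        for _ in range(n_steps):
            i = np.random.randint(self.n)
            state[i] = 1 if self.w[i] @ state >= 0 else -1
        return state

# Usage: store three random patterns, then recover one from a noisy copy.
patterns = [np.random.choice([-1, 1], size=64) for _ in range(3)]
net = HopfieldNetwork(64)
net.train(patterns)
noisy = patterns[0].copy()
noisy[:8] *= -1  # flip a few bits as "noise"
print(np.mean(net.recall(noisy) == patterns[0]))  # fraction of bits recovered
```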

Did I mention that analogies are a form of associative memory?

A demo is provided below. To use it, enter three patterns at the top (e.g. an X, a T, and an O touching the border), and press "Train Network". You can then enter a copy of one of these patterns at the bottom, add a bit of noise, and press "Start Running".




When does this model fail?


If you played around with the demo, you may have noticed that training on similar (i.e. correlated) patterns leads to errors. In fact, almost any set of visual motifs, such as letters, will contain a great deal of correlation across patterns. However, if the patterns are uncorrelated (something we might expect from neurons in higher-level cortical areas, due to representational changes and compression), the number of patterns that can be learned by a network scales linearly with the number of neurons in that network: we can learn approximately \(0.15n\) patterns, where n is the number of neurons. That seems like a reasonable number for practical purposes. A rough numerical check of this capacity is sketched below.
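The \(0.15n\) figure is easy to probe numerically. Here is a rough, self-contained experiment (the pattern counts, noise level, and step budget are arbitrary choices of mine) that stores random patterns and checks whether a noisy probe still settles back onto the original:

```python
# A quick probe of the ~0.15 * n capacity claim with random
# (uncorrelated) patterns.
import numpy as np

rng = np.random.default_rng(0)
n = 100  # number of neurons

def recall_ok(n_patterns, n_flips=5, n_steps=2000):
    patterns = rng.choice([-1, 1], size=(n_patterns, n))
    w = (patterns.T @ patterns) / n_patterns  # Hebb's rule
    np.fill_diagonal(w, 0)
    probe = patterns[0].copy()
    probe[rng.choice(n, n_flips, replace=False)] *= -1  # add noise
    for _ in range(n_steps):  # asynchronous downhill updates
        i = rng.integers(n)
        probe[i] = 1 if w[i] @ probe >= 0 else -1
    return np.array_equal(probe, patterns[0])

# Recall should be reliable below ~0.15 * n patterns and degrade above it.
for m in (5, 10, 15, 20, 30):
    acc = np.mean([recall_ok(m) for _ in range(20)])
    print(f"{m} patterns ({m/n:.2f} * n): recall success {acc:.2f}")
```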

Interestingly, the network activity is attracted not only to the learned patterns but also to their negatives (try flipping black and white above). This follows directly from the energy function: with zero thresholds, flipping every activation leaves each product \(\epsilon_i\epsilon_j\) unchanged, so \(E(\epsilon) = E(-\epsilon)\) and every minimum has a mirror image. To me, this suggests something elegant about our cognitive processes. If the neurons above represent ideas, then learning through the Hebbian rule leads to a kind of duality (yin and yang) in the associative conceptual world. We cannot have the concept of good without the concept of bad, or big without small, or love without ... you get what I mean. Please take this paragraph with a grain of salt: this is not a view I have encountered elsewhere, and it applies only to this form of associative memory, which is not the only thing our brains are capable of.

Another possible problem with this model is that it is deterministic (i.e. the evolution of the neural activations contains no random fluctuations). This leads to the possibility of the network activations getting stuck in a local minimum. Here, it helps to be familiar with the idea of an energy landscape, which can be thought of as an altitude map used in hiking. If we always go downhill, we may never reach the bottom of the mountain/landscape, because we can get stuck in a bowl (a local minimum). To increase the chance of reaching the lowest point in the landscape, we sometimes need to walk uphill, and to do that we need energy. In the microscopic/thermodynamic world we are discussing, energy takes the form of heat, or random movement. We would therefore like to add randomness to the process of associating a new pattern with the learned ones. Enter the Boltzmann machine, a well-named generalization of Hopfield networks.



Boltzmann Machine, energy, and RBMs


The energy function of Boltzmann machines is identical to that of the Hopfield network. The only difference is in the evolution of the neural activations. Neurons whose activity we know are held fixed (clamped), while the others evolve according to the equation: $$p(\mbox{unit i is active}) = \frac{1}{1+e^{-G/T}}$$ where G is the difference in energy between the inactive and active states of neuron i, and T is a parameter defining the temperature of the system. Notice that when \(T \rightarrow 0\), only the sign of G counts, and the neuron is certain to reduce its energy level; this recovers the Hopfield case. The Boltzmann machine therefore provides a distribution over matches to the learned templates.
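For concreteness, here is a sketch of one sweep of this stochastic update in Python, for binary (0/1) units. The variable names are mine, and the energy-gap expression assumes the usual convention where each pair of neurons is counted once:

```python
# A sketch of one sweep of the stochastic Boltzmann update.
import numpy as np

rng = np.random.default_rng(0)

def boltzmann_step(state, w, theta, T, clamped=()):
    """One sweep: units in 'clamped' are held fixed, the rest are sampled."""
    for i in range(len(state)):
        if i in clamped:
            continue
        # G = E(unit i inactive) - E(unit i active)
        G = w[i] @ state - theta[i]
        # As T -> 0 this becomes a step function of G, i.e. the
        # deterministic (Hopfield-style) update.
        p_active = 1.0 / (1.0 + np.exp(-G / T))
        state[i] = 1 if rng.random() < p_active else 0
    return state
```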

Small tangent: annealing is a process in metallurgy whereby you improve the physical properties of a metal through a sequence of heating and slow cooling, so that it recrystallizes. Essentially, the crystallized structure corresponds to a global minimum of the metal's energy function. The same process, called simulated analogy-ing (actually called simulated annealing), can be used with Boltzmann machines to find the global minimum of their energy function.
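A minimal sketch of that cooling schedule, reusing the boltzmann_step function from the sketch above (the geometric schedule and its constants are arbitrary assumptions of mine):

```python
# Simulated annealing: start hot (near-random flips), cool slowly so the
# state can escape shallow minima before settling.
def anneal(state, w, theta, T_start=10.0, T_end=0.1, cooling=0.95, sweeps_per_T=10):
    T = T_start
    while T > T_end:
        for _ in range(sweeps_per_T):
            state = boltzmann_step(state, w, theta, T)
        T *= cooling  # cool down a little after each batch of sweeps
    return state
```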

While an implementation of the Boltzmann machine is easy at a small scale, such as in the demo above, training this system and performing simulated annealing can become very expensive with higher-dimensional data. One way around this issue is to use a Restricted Boltzmann Machine (RBM). RBMs divide the neurons into separate sets of visible and hidden neurons, and "restrict" connectivity by allowing connections between the sets but not within them, which allows for much quicker training.
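Here is a sketch of why that restriction helps: with bipartite connectivity, all hidden units are conditionally independent given the visible units (and vice versa), so each side of the network can be sampled in one parallel step. The shapes, names, and initialization below are my own assumptions:

```python
# Alternating (block Gibbs) updates in an RBM.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b_h):
    # p(h_j = 1 | v) depends only on the visible layer.
    p = sigmoid(v @ W + b_h)
    return (rng.random(p.shape) < p).astype(float)

def sample_visible(h, W, b_v):
    # p(v_i = 1 | h) depends only on the hidden layer.
    p = sigmoid(h @ W.T + b_v)
    return (rng.random(p.shape) < p).astype(float)

# One alternating step v -> h -> v', the core move in training schemes
# such as contrastive divergence.
n_v, n_h = 16, 8
W = rng.normal(0.0, 0.1, size=(n_v, n_h))
b_v, b_h = np.zeros(n_v), np.zeros(n_h)
v = rng.integers(0, 2, size=n_v).astype(float)
h = sample_hidden(v, W, b_h)
v_prime = sample_visible(h, W, b_v)
```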



Considerations


The RBM can be further extended into the Deep Belief Network, which is essentially a stack of RBMs with multiple layers of hidden neurons. This class of neural networks has many training methods, depending on the purpose, which I will not cover here. However, it provides a direct link to other interesting concepts. For example, much of machine learning now depends on universal function approximation through deep learning, representation learning, and latent-variable learning, topics which I hope to cover in a future post.

At this point, many experts would agree that we are far outside the domain of neuroscience and deep into "learning deeply" about ML. However, we must not become so engrossed by our fascination with specific methods that we lose the big picture. A strong argument can be made that many of the current views on computational tools in machine learning and neuroscience are missing a key ingredient. Is it possible to write and read generic, non-associational variables in neural network connections (in a biologically plausible way)? If yes, how much could this advance AI? If no, how do our brains learn?


Contact Info


I'm open to discussing anything, including research, projects, or work opportunities!

Just send me an email:

francis44carter@gmail.com