17 Memory and Attractor Dynamics

17.2 Hopfield Model

The Hopfield model (226) consists of a network of $N$ neurons, labeled by a lower index $i$, with $1\leq i\leq N$. Similar to some earlier models (335; 304; 549), neurons in the Hopfield model have only two states. A neuron $i$ is ‘ON’ if its state variable takes the value $S_i=+1$ and ‘OFF’ (silent) if $S_i=-1$. The dynamics evolve in discrete time with time steps $\Delta t$. There is no refractoriness and the duration of a time step is typically not specified. If we take $\Delta t=1\,$ms, we can interpret $S_i(t)=+1$ as an action potential of neuron $i$ at time $t$. If we take $\Delta t=500\,$ms, $S_i(t)=+1$ should rather be interpreted as an episode of high firing rate.

Neurons interact with each other with weights $w_{ij}$. The input potential of neuron $i$, influenced by the activity of other neurons, is

h_{i}(t)=\sum_{j}w_{ij}\,S_{j}(t)\,.   (17.2)

The input potential at time $t$ influences the probabilistic update of the state variable $S_i$ in the next time step:

{\rm Prob}\{S_{i}(t+\Delta t)=+1|h_{i}(t)\}=g(h_{i}(t))=g\left(\sum_{j}w_{ij}\,S_{j}(t)\right)   (17.3)

where $g$ is a monotonically increasing gain function with values between zero and one. A common choice is $g(h)=0.5[1+\tanh(\beta h)]$ with a parameter $\beta$. For $\beta\to\infty$, we have $g(h)=1$ for $h>0$ and zero otherwise. The dynamics are therefore deterministic and summarized by the update rule

S_{i}(t+\Delta t)=\operatorname{sgn}[h_{i}(t)]\,.   (17.4)

For finite $\beta$ the dynamics are stochastic. In the following we assume that in each time step all neurons are updated synchronously (parallel dynamics), but an update scheme where only one neuron is updated per time step is also possible.
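For readers who want to experiment, the update rule of Eqs. (17.2) and (17.3) fits in a few lines of code. The sketch below uses NumPy; the function name, the random placeholder weights and all parameter values are our own illustrative choices, not part of the model.

    import numpy as np

    def synchronous_update(S, w, beta, rng):
        """One parallel update step of Eqs. (17.2)-(17.3).

        S : vector of +/-1 states; w : weight matrix; beta : inverse 'temperature'.
        """
        h = w @ S                                  # input potentials, Eq. (17.2)
        prob_on = 0.5 * (1.0 + np.tanh(beta * h))  # g(h), Eq. (17.3)
        return np.where(rng.random(S.shape) < prob_on, 1, -1)

    rng = np.random.default_rng(0)
    N = 100
    w = rng.standard_normal((N, N)) / N            # placeholder weights, just to run the sketch
    S = rng.choice([-1, 1], size=N)
    S = synchronous_update(S, w, beta=4.0, rng=rng)

For $\beta\to\infty$ the probabilistic rule degenerates to the deterministic update of Eq. (17.4).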

The aim of this section is to show that, with a suitable choice of the coupling matrix $w_{ij}$, memory items can be retrieved by the collective dynamics defined in Eq. (17.3), applied to all $N$ neurons of the network. In order to illustrate how collective dynamics can lead to meaningful results, we start, in Section 17.2.1, with a detour through the physics of magnetic systems. In Section 17.2.2, the insights from magnetic systems are applied to the case at hand, i.e., memory recall.

Fig. 17.5: Physics of ferromagnets. A. Magnetic materials consist of atoms, each with a small magnetic moment, here visualized as an arrow, a symbol for a magnetic needle. At low temperature, all magnetic needles are aligned. Inset: Field lines around one of the magnetic needles. B. At high temperature, some of the needles are misaligned (dashed circles). Cooling the magnet leads to a spontaneous alignment and reforms a pure magnet; schematic figure.

17.2.1 Detour: Magnetic analogy

Magnetic material contains atoms which carry a so-called spin. The spin generates a magnetic moment at the microscopic level, visualized graphically as an arrow (Fig. 17.5A). At high temperature, the magnetic moments of individual atoms point in all possible directions. Below a critical temperature, however, the magnetic moments of all atoms spontaneously align with each other. As a result, the microscopic effects of all atomic magnetic moments add up and the material exhibits the macroscopic properties of a ferromagnet.

In order to understand how a spontaneous alignment can arise, let us study Eqs. (17.2) and (17.3) in the analogy of magnetic materials. We assume that $w_{ij}=w_0>0$ between all pairs of neurons $i\neq j$, and that self-interaction vanishes, $w_{ii}=0$.

Each atom is characterized by a spin variable $S_i=\pm 1$, where $S_i=+1$ indicates that the magnetic moment of atom $i$ points ‘upward’. Suppose that, at time $t=0$, all spins take a positive value ($S_j=+1$ for $j\neq i$), except that of atom $i$, which has a value $S_i(0)=-1$ (Fig. 17.5A). We calculate the probability that, at time step $t=\Delta t$, the spin of neuron $i$ will switch to $S_i=+1$. According to Eq. (17.3), this probability is

{\rm Prob}\{S_{i}(t+\Delta t)=+1|h_{i}(t)\}=g(h_{i}(t))=g\left(\sum_{j=1}^{N}w_{ij}\,S_{j}(t)\right)=g(w_{0}\,(N-1))   (17.5)

where we have used our assumptions. With $g(h)=0.5[1+\tanh(\beta h)]$ and $w_0=\beta=1$, we find that for any network of more than three atoms, the probability that the magnetic moments of all atoms would align is extremely high. In physical systems, $\beta$ plays the role of an inverse temperature. If $\beta$ becomes small (high temperature), the magnetic moments no longer align and the material loses its spontaneous magnetization.
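A quick numerical evaluation of Eq. (17.5) illustrates this claim. The parameter values $w_0=\beta=1$ follow the text; the list of network sizes is an arbitrary illustrative choice.

    import numpy as np

    w0, beta = 1.0, 1.0
    for N in [2, 4, 10, 100]:
        p_align = 0.5 * (1.0 + np.tanh(beta * w0 * (N - 1)))   # Eq. (17.5)
        print(N, p_align)   # already for N = 4 the alignment probability exceeds 0.99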

According to Eq. (17.5), the probability of alignment increases with the network size. This is an artifact of our model with all-to-all interaction between all atoms. Physical interactions, however, rapidly decrease with distance, so that the sum over $j$ in Eq. (17.5) should be restricted to the nearest neighbors of neuron $i$, e.g., about 4 to 20 atoms depending on the configuration of the atomic arrangement and the range of the interaction. Interestingly, neurons, in contrast to atoms, are capable of making long-range interactions because of their far-reaching axonal cables and dendritic trees. Therefore, the number of topological neighbors of a given neuron is in the range of thousands.

An arrangement of perfectly aligned magnetic elements looks rather boring, but physics offers more interesting examples as well. In some materials, typically consisting of two different types of atoms, say A and B, an anti-ferromagnetic ordering is possible (Fig. 17.6). While one layer of magnetic moments points upward, the next one points downward, so that the macroscopic magnetization is zero. Nevertheless, a highly ordered structure is present. Examples of anti-ferromagnets are some metallic oxides and alloys.

To model an anti-ferromagnet, we choose interactions $w_{ij}=+1$ if $i$ and $j$ belong to the same class (e.g., both are in a layer of type A or both in a layer of type B), and $w_{ij}=-1$ if one of the two atoms belongs to type A and the other to type B. A simple repetition of the calculation in Eq. (17.5) shows that an anti-ferromagnetic organization of the spins emerges spontaneously at low temperature.

Fig. 17.6: Storing patterns. A. Physical anti-ferromagnets consist of layers of atoms A and B. All magnetic moments are aligned within a layer of identical atoms, but exhibit different orientations between layers. A model where interactions between atoms of the same type are positive (solid lines) and between atoms of different type are negative (dashed lines) can explain the spontaneous order in the arrangement of magnetic moments. The interaction scheme for two atoms with their 10 nearest neighbors is indicated. B. If we replace magnetic moments by black and white pixels (squares), represented by active and inactive neurons, respectively, the neuronal network can store a pattern, such as a ‘T’. Interactions are positive (solid lines) between pixels of the same color (black-to-black or white-to-white) and negative otherwise. Only a few representative interactions are shown. Schematic figure.

The same idea of positive and negative interactions $w_{ij}$ can be used to embed an arbitrary pattern into a network of neurons. Let us draw a pattern of black and white pixels corresponding to active ($p_i=+1$) and inactive ($p_i=-1$) neurons, respectively. The rule extracted from the anti-ferromagnet implies that pixels of opposite color are connected by negative weights, while pixels of the same color have connections with positive weight. This rule can be formalized as

w_{ij}=p_{i}\,p_{j}\,.   (17.6)

This rule forms the basis of the Hopfield model.

17.2.2 Patterns in the Hopfield model

The Hopfield model consists of a network of $N$ binary neurons. A neuron $i$ is characterized by its state $S_i=\pm 1$. The state variable is updated according to the dynamics defined in Eq. (17.3).

The task of the network is to store and recall $M$ different patterns. Patterns are labeled by the index $\mu$ with $1\leq\mu\leq M$. Each pattern $\mu$ is defined as a desired configuration $\{p_i^{\mu}=\pm 1;\,1\leq i\leq N\}$. The network of $N$ neurons is said to correctly represent pattern $\mu$ if the state of all neurons $1\leq i\leq N$ is $S_i(t)=S_i(t+\Delta t)=p_i^{\mu}$. In other words, patterns must be fixed points of the dynamics (17.3).

For us as human observers, a meaningful pattern could, for example, be a configuration in the form of a ‘T’, such as depicted in Fig. 17.6B. However, visually attractive patterns have large correlations with each other. Moreover, areas in the brain related to memory recall are situated far from the retinal input stage. Since the configuration of neurons in memory-related brain areas is probably very different from that at the retina, patterns in the Hopfield model are chosen as fixed random patterns; cf. Fig. 17.7.

During the set-up phase of the Hopfield network, a random number generator generates, for each pattern $\mu$, a string of $N$ independent binary numbers $\{p_i^{\mu}=\pm 1;\,1\leq i\leq N\}$ with expectation value $\langle p_i^{\mu}\rangle=0$. Strings of different patterns are independent. The weights are chosen as

w_{ij}=c\sum_{\mu=1}^{M}p_{i}^{\mu}\,p_{j}^{\mu}   (17.7)

with a positive constant $c>0$. The network has full connectivity. Note that for a single pattern and $c=1$, Eq. (17.7) is identical to the connections of the anti-ferromagnet, Eq. (17.6). For reasons that become clear later on, the standard choice of the constant $c$ is $c=1/N$.
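As an illustration, the weight matrix of Eq. (17.7) with $c=1/N$ can be built directly from the pattern vectors. The random $\pm 1$ pattern statistics follow the set-up phase described above; the network size and the number of patterns are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)
    N, M = 100, 5
    patterns = rng.choice([-1, 1], size=(M, N))   # rows are the patterns p^mu
    w = patterns.T @ patterns / N                 # Eq. (17.7) with c = 1/N
    assert np.allclose(w, w.T)                    # the Hebbian weights are symmetric
    # the diagonal entries equal c*M = M/N; they are negligible for M << N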

Fig. 17.7: Hopfield model. A. Top: Three random patterns $\mu=1,2,3$ in a network of $N=8$ neurons. Black squares ($p_i^{\mu}=+1$) and white squares ($p_i^{\mu}=-1$) are arranged in random order. Bottom: The overlap $m^{1}=(1/N)\sum_i p_i^{1}\,S_i(t)$ measures the similarity between the current state $S(t)=\{S_i(t);\,1\leq i\leq N\}$ and the first pattern. Here only a single neuron exhibits a mismatch (dotted line). The desired value in the pattern is shown as black and white squares, while the current state is indicated as black and white circles; schematic figure. B. Orthogonal patterns have a mutual overlap of zero, so that correlations are $C^{\mu\nu}=(1/N)\sum_i p_i^{\mu}\,p_i^{\nu}=\delta^{\mu\nu}$ (top), whereas random patterns exhibit a small residual overlap for $\mu\neq\nu$ (bottom).

17.2.3 Pattern retrieval

In many memory retrieval experiments, a cue with partial information is given at the beginning of a recall trial. The retrieval of a memory item is verified by the completion of the missing information.

In order to mimic memory retrieval in the Hopfield model, an input is given by initializing the network in a state $S(t_0)=\{S_i(t_0);\,1\leq i\leq N\}$. After initialization, the network evolves freely under the dynamics (17.3). Ideally the dynamics should converge to a fixed point corresponding to the pattern $\mu$ which is most similar to the initial state.

In order to measure the similarity between the current state $S(t)=\{S_i(t);\,1\leq i\leq N\}$ and a pattern $\mu$, we introduce the overlap (Fig. 17.7B)

m^{\mu}(t)={1\over N}\sum_{i}p_{i}^{\mu}\,S_{i}(t)\,.   (17.8)

The overlap takes its maximum value of 1 if $S_i(t)=p_i^{\mu}$ for all $i$, i.e., if the pattern is retrieved. It is close to zero if the current state has no correlation with pattern $\mu$. The minimum value $m^{\mu}(t)=-1$ is achieved if each neuron takes the opposite value to that desired in pattern $\mu$.
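A short sketch of Eq. (17.8) illustrates the first two cases: the overlap of a pattern with itself is one, while two unrelated random patterns have a residual overlap of order $1/\sqrt{N}$. The network size and the random seed are arbitrary.

    import numpy as np

    def overlap(S, pattern):
        """Overlap m^mu of Eq. (17.8)."""
        return np.mean(pattern * S)

    rng = np.random.default_rng(2)
    N = 1000
    p1, p2 = rng.choice([-1, 1], size=(2, N))
    print(overlap(p1, p1))   # exactly 1: the pattern itself
    print(overlap(p2, p1))   # close to 0: residual overlap of order 1/sqrt(N)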

The overlap plays an important role in the analysis of the network dynamics. In fact, using Eq. (17.2), the input potential $h_i$ of a neuron $i$ is

h_{i}(t)=\sum_{j}w_{ij}\,S_{j}(t)=c\sum_{j=1}^{N}\sum_{\mu=1}^{M}p_{i}^{\mu}\,p_{j}^{\mu}\,S_{j}(t)=c\,N\sum_{\mu=1}^{M}p_{i}^{\mu}\,m^{\mu}(t)   (17.9)

where we have used Eqs. (17.7) and (17.8). In order to make the results of the calculation independent of the size of the network, it is standard to choose the factor $c=1/N$, as mentioned above. In the following we always take $c=1/N$ unless indicated otherwise. For an in-depth discussion, see the scaling arguments in Chapter 12.

In order to close the argument, we now use the input potential in the dynamics Eq. (17.3) and find

{\rm Prob}\{S_{i}(t+\Delta t)=+1|h_{i}(t)\}=g\left[\sum_{\mu=1}^{M}p_{i}^{\mu}\,m^{\mu}(t)\right]\,.   (17.10)

Eq. (17.10) highlights that the $M$ macroscopic similarity values $m^{\mu}$ with $1\leq\mu\leq M$ completely determine the dynamics of the network.

Example: Memory retrieval

Let us suppose that the initial state has a significant similarity with pattern $\mu=3$, e.g., an overlap of $m^{3}(t_0)=0.4$, and no overlap with the other patterns, $m^{\nu}(t_0)=0$ for $\nu\neq 3$.

In the noiseless case Eq. (17.10) simplifies to

S_{i}(t_{0}+\Delta t)=\operatorname{sgn}\left[\sum_{\mu=1}^{M}p_{i}^{\mu}\,m^{\mu}\right]=\operatorname{sgn}\left[p_{i}^{3}\,m^{3}(t_{0})\right]=p_{i}^{3}\quad\text{for all }i\,.   (17.11)

Hence, each neuron takes, after a single time step, the desired state corresponding to the pattern. In other words, the pattern with the strongest similarity to the input is retrieved, as it should be.
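The one-step retrieval of this example can be reproduced in a small simulation. The network size, the number of patterns and the 30% corruption of the cue (which yields an initial overlap of 0.4) are our own illustrative choices.

    import numpy as np

    rng = np.random.default_rng(5)
    N, M = 200, 3
    patterns = rng.choice([-1, 1], size=(M, N))
    w = patterns.T @ patterns / N                # Eq. (17.7) with c = 1/N

    S = patterns[2].copy()                       # start from 'pattern 3' ...
    flip = rng.choice(N, size=int(0.3 * N), replace=False)
    S[flip] *= -1                                # ... with 30% of the entries flipped
    print(np.mean(patterns[2] * S))              # initial overlap m^3 = 0.4

    h = w @ S                                    # input potentials, Eq. (17.9)
    S = np.where(h >= 0, 1, -1)                  # deterministic update, Eq. (17.4)
    print(np.mean(patterns[2] * S))              # close to 1: the pattern is completed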

For stochastic neurons we find

{\rm Prob}\{S_{i}(t_{0}+\Delta t)=+1|h_{i}(t)\}=g[p_{i}^{3}\,m^{3}(t_{0})]\,.   (17.12)

We note that, given the overlap $m^{3}(t_0)$, the right-hand side of Eq. (17.12) can only take two different values, corresponding to $p_i^{3}=+1$ and $p_i^{3}=-1$. Thus, all neurons that should be active in pattern 3 share the same probabilistic update rule

{\rm Prob}\{S_{i}(t_{0}+\Delta t)=+1|h_{i}(t)\}=g[m^{3}(t_{0})]\quad\text{for all }i\text{ with }p_{i}^{3}=+1\,.   (17.13)

Similarly all those that should be inactive share another rule

{\rm Prob}\{S_{i}(t_{0}+\Delta t)=+1|h_{i}(t)\}=g[-m^{3}(t_{0})]\quad\text{for all }i\text{ with }p_{i}^{3}=-1\,.   (17.14)

Thus, despite the fact that there are $N$ neurons and $M$ different patterns, during recall the network breaks up into two macroscopic populations: those neurons that should be active and those that should be inactive. This is the reason why we can expect to arrive at macroscopic population equations, similar to those encountered in part III of the book.

Let us use this insight for the calculation of the overlap at time $t_0+\Delta t$. We denote the size of the two populations by $N_{+}^{3}$ and $N_{-}^{3}$, respectively, and find

m^{3}(t_{0}+\Delta t)={1\over N}\sum_{i}p_{i}^{3}\,S_{i}(t_{0}+\Delta t)   (17.15)
={N_{+}^{3}\over N}\left[{1\over N_{+}^{3}}\sum_{i\text{ with }p_{i}^{3}=+1}S_{i}(t_{0}+\Delta t)\right]-{N_{-}^{3}\over N}\left[{1\over N_{-}^{3}}\sum_{i\text{ with }p_{i}^{3}=-1}S_{i}(t_{0}+\Delta t)\right]\,.

We can interpret the two terms enclosed by the square brackets as the average activity of those neurons that should, or should not, be active, respectively. In the limit of a large network ($N\to\infty$) both groups are very large and of equal size, $N_{+}^{3}=N_{-}^{3}=N/2$. Therefore, the averages inside the square brackets approach their expectation values. The technical term, used in the physics literature, is that the network dynamics are ‘self-averaging’. Hence, we can evaluate the square brackets with the probabilities introduced in Eqs. (17.13) and (17.14). With ${\rm Prob}\{S_{i}(t_{0}+\Delta t)=-1|h_{i}(t)\}=1-{\rm Prob}\{S_{i}(t_{0}+\Delta t)=+1|h_{i}(t)\}$, we find

m^{3}(t_{0}+\Delta t)={1\over 2}\left\{2\,g[m^{3}(t_{0})]-1\right\}-{1\over 2}\left\{2\,g[-m^{3}(t_{0})]-1\right\}\,.   (17.16)

In the special case that $g(h)=0.5[1+\tanh(\beta h)]$, Eq. (17.16) simplifies to an update law

m^{3}(t+\Delta t)=\tanh[\beta\,m^{3}(t)]   (17.17)

where we have replaced t0t_{0} by tt, in order to highlight that updates should be iterated over several time steps.
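Iterating Eq. (17.17) numerically shows the convergence toward a fixed point with large overlap; the value of $\beta$ and the initial overlap are arbitrary illustrative choices.

    import numpy as np

    beta, m = 2.0, 0.4            # beta > 1: a stable fixed point with large overlap exists
    for _ in range(20):
        m = np.tanh(beta * m)     # Eq. (17.17)
    print(m)                      # approx. 0.957, the stable fixed point of tanh(2m)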

Fig. 17.8: Memory retrieval in the Hopfield model. A. The overlap $m^{\nu}(t+\Delta t)$ with a specific pattern $\nu$ is given as a function of the overlap with the same pattern, $m^{\nu}(t)$, in the previous time step (solid line); cf. Eq. (17.16). The overlap with the $M-1$ other patterns is supposed to vanish. The iterative update can be visualized as a path (arrow) between the overlap curve and the diagonal (dashed line). The dynamics approach a fixed point (circle) with high overlap, corresponding to the retrieval of the pattern. B. The probability $P_{\rm error}$ that during retrieval an erroneous state-flip occurs corresponds to the shaded area under the curve; cf. Eq. (17.20). The width $\sigma$ of the curve grows with the pattern load $M/N$; schematic figure.

We close with three remarks. First, the dynamics of $N$ neurons has been replaced, in a mathematically precise limit, by the iterative update of one single macroscopic variable, i.e., the overlap with one of the patterns. The result is reminiscent of the analysis of the macroscopic population dynamics performed in part III of the book. Indeed, the basic mathematical principles used for the equations of the population activity $A(t)$ are the same as the ones used here for the update of the overlap variable $m^{\mu}(t)$.

Second, if $\beta>1$, the dynamics converge from an initially small overlap to a fixed point with a large overlap, close to one. The graphical solution of the update of pattern $\nu=3$ (for which a nonzero overlap existed in the initial state) is shown in Fig. 17.8. Because the network dynamics are ‘attracted’ toward a stable fixed point characterized by a large overlap with one of the memorized patterns (Fig. 17.9A), the Hopfield model and variants of it are also called ‘attractor’ networks or ‘attractor memories’ (24; 40).

Finally, the assumption that, apart from pattern 3, all other patterns have an initial overlap exactly equal to zero is artificial. For random patterns, we expect a small overlap between arbitrary pairs of patterns. Thus, if the network is exactly in pattern 3 so that $m^{3}=1$, the other patterns have a small but finite overlap $|m^{\mu}|\neq 0$, because of spurious correlations $C^{\mu\nu}=(1/N)\sum_{i}p_{i}^{\mu}\,p_{i}^{\nu}$ between any two random patterns $\mu$ and $\nu$; cf. Fig. 17.7B. If the number of patterns is large, the spurious correlations between the patterns can generate problems during memory retrieval, as we will see now.

17.2.4 Memory capacity

How many random patterns can be stored in a network of $N$ neurons? Memory retrieval implies pattern completion, starting from a partial cue. An absolutely minimal condition for pattern completion is that the dynamics should at least not move away from the pattern if the initial cue is identical to the complete pattern (215). In other words, we require that a network with initial state $S_i(t_0)=p_i^{\nu}$ for $1\leq i\leq N$ stays in pattern $\nu$. Therefore pattern $\nu$ must be a fixed point under the dynamics.

We study a Hopfield network at zero temperature ($\beta=\infty$). We start the calculation as in Eq. (17.9) and insert $S_j(t_0)=p_j^{\nu}$. This yields

S_{i}(t_{0}+\Delta t)=\operatorname{sgn}\left[{1\over N}\sum_{j=1}^{N}\sum_{\mu=1}^{M}p_{i}^{\mu}\,p_{j}^{\mu}\,p_{j}^{\nu}\right]   (17.18)
=\operatorname{sgn}\left[p_{i}^{\nu}\left({1\over N}\sum_{j=1}^{N}p_{j}^{\nu}\,p_{j}^{\nu}\right)+{1\over N}\sum_{\mu\neq\nu}\sum_{j}p_{i}^{\mu}\,p_{j}^{\mu}\,p_{j}^{\nu}\right]\,,

where we have separated the pattern $\nu$ from the other patterns. The factor in parentheses on the right-hand side adds up to one and can therefore be dropped. We now multiply the second term on the right-hand side by a factor $1=p_i^{\nu}\,p_i^{\nu}$. Finally, because $p_i^{\nu}=\pm 1$, a factor $p_i^{\nu}$ can be pulled out of the argument of the sign function:

S_{i}(t_{0}+\Delta t)=p_{i}^{\nu}\,\operatorname{sgn}\left[1+{1\over N}\sum_{j}\sum_{\mu\neq\nu}p_{i}^{\mu}\,p_{i}^{\nu}\,p_{j}^{\mu}\,p_{j}^{\nu}\right]=p_{i}^{\nu}\,\operatorname{sgn}[1-a_{i\nu}]\,.   (17.19)

The desired fixed point exists only if $1>a_{i\nu}=-{1\over N}\sum_{j}\sum_{\mu\neq\nu}p_{i}^{\mu}\,p_{i}^{\nu}\,p_{j}^{\mu}\,p_{j}^{\nu}$ for all neurons $i$. In other words, even if the network is initialized in perfect agreement with one of the patterns, it can happen that one or a few neurons flip their sign. The probability to move away from the pattern is equal to the probability of finding a value $a_{i\nu}>1$ for one of the neurons $i$.

Because patterns are generated from independent random numbers $p_i^{\mu}=\pm 1$ with zero mean, the product $p_i^{\mu}\,p_i^{\nu}\,p_j^{\mu}\,p_j^{\nu}=\pm 1$ is also a binary random number with zero mean. Since the values $p_i^{\mu}$ are chosen independently for each neuron $i$ and each pattern $\mu$, the term $a_{i\nu}$ can be visualized as a random walk of $N(M-1)$ steps with step size $1/N$. For a large number of steps, the positive or negative walking distance can be approximated by a Gaussian distribution with zero mean and standard deviation $\sigma=\sqrt{(M-1)/N}\approx\sqrt{M/N}$ for $M\gg 1$. The probability that the activity state of neuron $i$ erroneously flips is therefore approximately

P_{\rm error}={1\over\sqrt{2\pi}\,\sigma}\int_{1}^{\infty}e^{-x^{2}/2\sigma^{2}}\,{\rm d}x\approx{1\over 2}\left[1-{\rm erf}\left(\sqrt{N\over 2M}\right)\right]   (17.20)

where we have introduced the error function

{\rm erf}(x)={2\over\sqrt{\pi}}\int_{0}^{x}e^{-y^{2}}\,{\rm d}y\,.   (17.21)

The most important insight is that the probability of an erroneous state-flip increases with the ratio $M/N$. Formally, we can define the storage capacity $C_{\rm store}$ of a network as the maximal number $M^{\rm max}$ of patterns that a network of $N$ neurons can retrieve:

C_{\rm store}={M^{\rm max}\over N}={M^{\rm max}\,N\over N^{2}}\,.   (17.22)

For the second equality sign, we have multiplied both numerator and denominator by a common factor $N$, which gives rise to the following interpretation. Since each pattern consists of $N$ neurons (i.e., $N$ binary numbers), the total number of bits that need to be stored at maximum capacity is $M^{\rm max}\,N$. In the Hopfield model, patterns are stored by an appropriate choice of the synaptic connections. The number of available synapses in a fully connected network is $N^{2}$. Therefore, the storage capacity measures the number of bits stored per synapse.

Example: Erroneous bits

We can evaluate Eq. (17.20) for various choices of $P_{\rm error}$. For example, if we accept an error probability of $P_{\rm error}=0.001$, we find a storage capacity of $C_{\rm store}=0.105$.

Hence, a network of 10'000 neurons is able to store about 1'000 patterns with $P_{\rm error}=0.001$. Thus, in each of the patterns, we expect that about 10 neurons exhibit erroneous activity. We emphasize that the above calculation focuses on the first iteration step only. If we start in the pattern, then about 10 neurons will flip their state in the first iteration. But these flips could in principle cause further neurons to flip in the second iteration and eventually initiate an avalanche of many other changes.
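These numbers can be checked by evaluating Eq. (17.20) directly; the snippet uses the standard error function from Python's math module and the values quoted above (the exact result comes out slightly below $10^{-3}$).

    import math

    def p_error(M, N):
        """Eq. (17.20): probability of an erroneous state-flip in the first step."""
        return 0.5 * (1.0 - math.erf(math.sqrt(N / (2.0 * M))))

    N, M = 10_000, 1_000
    print(p_error(M, N))       # roughly 1e-3
    print(N * p_error(M, N))   # roughly 10 erroneous neurons per pattern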

A more precise calculation shows that such an avalanche does not occur if the number of stored patterns stays below a limit such that $C_{\rm store}=0.138$ (18; 22).

17.2.5 The energy picture

Fig. 17.9: Attractor picture and energy landscape. A. The dynamics are attracted toward fixed points corresponding to memory states (overlap $m^{\nu}=1$). Four attractor states are indicated. The dashed lines show the boundaries of the basin of attraction of each memory. B. The Hopfield model has multiple equivalent energy minima, each one corresponding to the retrieval (overlap $m^{\nu}=1$) of one pattern. Between the main minima, additional local minima (corresponding to mixtures of several patterns) may also exist.

The Hopfield model has symmetric interactions $w_{ij}=w_{ji}=c\sum_{\mu=1}^{M}p_{i}^{\mu}\,p_{j}^{\mu}$. We now show that, in any network with symmetric interactions and asynchronous deterministic dynamics

S_{i}(t+\Delta t)=\operatorname{sgn}[h_{i}(t)]=\operatorname{sgn}\left[\sum_{j}w_{ij}\,S_{j}(t)\right]   (17.23)

the energy

E=-\sum_{i}\sum_{j}w_{ij}\,S_{i}\,S_{j}   (17.24)

decreases with each state flip of a single neuron (226).

In each time step only one neuron is updated (asynchronous dynamics). Let us assume that after application of Eq. (17.23) neuron $k$ has changed its value from $S_k$ at time $t$ to $S^{\prime}_k=-S_k$, while all other neurons keep their value, $S^{\prime}_j=S_j$ for $j\neq k$. The prime indicates values evaluated at time $t+\Delta t$. The change in energy caused by the state flip of neuron $k$ is

E^{\prime}-E=-\sum_{i}w_{ik}\,S_{i}\,(S^{\prime}_{k}-S_{k})-\sum_{j}w_{kj}\,S_{j}\,(S^{\prime}_{k}-S_{k})\,.   (17.25)

First, because of the update of neuron $k$, we have $S^{\prime}_k-S_k=2S^{\prime}_k$. Second, because of the symmetry $w_{ij}=w_{ji}$, the two terms on the right-hand side are identical, and $\sum_{i}w_{ik}S_{i}=\sum_{i}w_{ki}S_{i}=h_{k}$. Third, because of Eq. (17.23), the sign of $h_k$ determines the new value $S^{\prime}_k$ of neuron $k$. Therefore the change in energy is $E^{\prime}-E=-4h_{k}\operatorname{sgn}(h_{k})<0$. In other words, the energy $E$ is a Liapunov function of the deterministic Hopfield network.
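The Liapunov property is easy to verify numerically: under asynchronous deterministic updates with symmetric weights, the energy of Eq. (17.24) never increases. The sketch below uses Hebbian weights with the diagonal set to zero (no self-interaction, as in Section 17.2.1); all sizes and seeds are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    N, M = 100, 5
    patterns = rng.choice([-1, 1], size=(M, N))
    w = patterns.T @ patterns / N               # symmetric weights, Eq. (17.7)
    np.fill_diagonal(w, 0.0)                    # no self-interaction

    def energy(S):
        return -S @ w @ S                       # Eq. (17.24)

    S = rng.choice([-1, 1], size=N)
    E = energy(S)
    for _ in range(10 * N):                     # asynchronous dynamics, Eq. (17.23)
        k = rng.integers(N)
        S[k] = 1 if w[k] @ S >= 0 else -1
        E_new = energy(S)
        assert E_new <= E + 1e-12               # the energy never increases
        E = E_new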

Since the dynamics leads to a decrease of the energy, we may wonder whether we can say something about the global or local minimum of the energy. If we insert the definition of the connection weights into the energy function (17.24), we find

E=-\sum_{i}\sum_{j}\left(c\sum_{\mu}p_{i}^{\mu}\,p_{j}^{\mu}\right)S_{i}\,S_{j}=-c\,N^{2}\sum_{\mu}(m^{\mu})^{2}\,,   (17.26)

where we have used the definition of the overlap; cf. Eq. (17.8).

The maximum value of the overlap with a fixed pattern $\nu$ is $m^{\nu}=1$. Moreover, for random patterns, the correlations between patterns are small. Therefore, if $m^{\nu}=1$ (i.e., recall of pattern $\nu$), the overlap with the other patterns $\mu\neq\nu$ is $m^{\mu}\approx 0$. Therefore, the energy landscape can be visualized with multiple minima of the same depth, each minimum corresponding to the retrieval of one of the patterns (Fig. 17.9B).

17.2.6 Retrieval of low-activity patterns

There are numerous aspects in which the Hopfield model is rather far from biology. One of these is that in each memory pattern, fifty percent of the neurons are active.

To characterize patterns with a lower level of activity, let us introduce random variables $\xi_i^{\mu}\in\{0,1\}$ for $1\leq i\leq N$ and $1\leq\mu\leq M$ with mean $\langle\xi_i^{\mu}\rangle=a$. For $a=0.5$ and $p_i^{\mu}=2\xi_i^{\mu}-1$ we are back to the patterns of the Hopfield model. In the following we are, however, interested in patterns with an activity $a<0.5$. To simplify some of the arguments below, we suppose that patterns are generated under the constraint $\sum_i\xi_i^{\mu}=N\,a$ for each $\mu$, so that all patterns have exactly the same target activity $a$.

The weights in the Hopfield model of Eq. (17.7) are replaced by

w_{ij}=c^{\prime}\sum_{\mu=1}^{M}(\xi_{i}^{\mu}-b)\,(\xi_{j}^{\mu}-a)   (17.27)

where $a$ is the mean activity of the stored patterns, $0\leq b\leq 1$ a constant, and $c^{\prime}=[2a(1-a)N]^{-1}$. Note that Eq. (17.7) is a special case of Eq. (17.27) with $a=b=0.5$ and $c^{\prime}=4c$.
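As a sketch, the low-activity weight matrix of Eq. (17.27) can be built as follows. The choice $b=a$ (which makes the weights symmetric, see the end of this section), the activity $a=0.1$, and all sizes are our own illustrative choices.

    import numpy as np

    rng = np.random.default_rng(4)
    N, M, a = 200, 5, 0.1
    b = a
    c_prime = 1.0 / (2 * a * (1 - a) * N)

    xi = np.zeros((M, N))                       # patterns xi in {0, 1}
    for mu in range(M):                         # exactly a*N active units per pattern
        xi[mu, rng.choice(N, size=int(a * N), replace=False)] = 1.0

    w = c_prime * (xi - b).T @ (xi - a)         # Eq. (17.27)
    assert np.allclose(w, w.T)                  # symmetric because b = a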

As before, we work with binary neurons $S_i=\pm 1$ defined by the stochastic update rule in Eqs. (17.2) and (17.3). To analyze pattern retrieval we proceed analogously to Eq. (17.10). Introducing the overlap of low-activity patterns

m^{\mu}={1\over 2a(1-a)N}\sum_{j}(\xi_{j}^{\mu}-a)\,S_{j}   (17.28)

we find

{\rm Prob}\{S_{i}(t+\Delta t)=+1|h_{i}(t)\}=g\left[\sum_{\mu=1}^{M}(\xi_{i}^{\mu}-b)\,m^{\mu}(t)\right]\,.   (17.29)

Example: Memory retrieval and attractor dynamics

Suppose that at time $t$ the overlap with one of the patterns, say pattern 3, is significantly above zero, while the overlap with all other patterns vanishes, $m^{\mu}\approx m\,\delta^{\mu 3}$, where $\delta^{nm}$ denotes the Kronecker $\delta$. The initial overlap is $0.1<m\leq 1$. Then the dynamics of the low-activity network splits up into two groups of neurons, i.e., those that should be ‘ON’ in pattern 3 ($\xi_i^{3}=1$) and those that should be ‘OFF’ ($\xi_i^{3}=0$).

The size of both groups scales with $N$: there are $a\,N$ ‘ON’ neurons and $(1-a)\,N$ ‘OFF’ neurons. For $N\to\infty$, the population activity $A^{\rm ON}$ of the ‘ON’ group (i.e., the fraction of neurons with state $S_i=+1$ in the ‘ON’ group) is therefore well described by its expectation value

A^{\rm ON}(t+\Delta t)=g[(1-b)\,m^{3}(t)]\,.   (17.30)

Similarly, the ‘OFF’ group has activity

A^{\rm OFF}(t+\Delta t)=g[-b\,m^{3}(t)]\,.   (17.31)

To close the argument we determine the overlap at time $t+\Delta t$. Exploiting the split into two groups of size $a\,N$ and $(1-a)\,N$, respectively, we have

m^{3}(t+\Delta t)={1\over 2a(1-a)N}\left[\sum_{j\text{ with }\xi_{j}^{3}=1}(1-a)\,S_{j}(t+\Delta t)+\sum_{j\text{ with }\xi_{j}^{3}=0}(-a)\,S_{j}(t+\Delta t)\right]   (17.32)
=A^{\rm ON}(t+\Delta t)-A^{\rm OFF}(t+\Delta t)\,.

Thus, the overlap with pattern 3 has changed from the initial value $m^{3}(t)=m$ to a new value $m^{3}(t+\Delta t)$. Retrieval of memories works if iteration of Eqs. (17.30)-(17.32) makes $m^{3}$ converge to a value close to unity while, at the same time, the other overlaps $m^{\nu}$ (for $\nu\neq 3$) stay close to zero.
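A minimal sketch of this iteration, using Eqs. (17.30)-(17.32) with $b=a$ and a large $\beta$ (low noise), shows the convergence of $m^{3}$ toward a value close to one; all parameter values are illustrative.

    import numpy as np

    def g(h, beta=20.0):
        return 0.5 * (1.0 + np.tanh(beta * h))

    a = b = 0.1
    m = 0.3                         # initial overlap with pattern 3
    for _ in range(20):
        A_on = g((1 - b) * m)       # Eq. (17.30)
        A_off = g(-b * m)           # Eq. (17.31)
        m = A_on - A_off            # Eq. (17.32)
    print(m)                        # close to 1 for sufficiently large beta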

We emphasize that the analysis of the network dynamics presented here does not require symmetric weights but is possible for arbitrary values of the parameter $b$. However, a standard choice is $b=a$, which leads to symmetric weights and to a high memory capacity (522).