17 Memory and Attractor Dynamics

17.2 Hopfield Model

The Hopfield model (226) consists of a network of $N$ neurons, labeled by a lower index $i$, with $1\leq i\leq N$. Similar to some earlier models (335; 304; 549), neurons in the Hopfield model have only two states. A neuron $i$ is ‘ON’ if its state variable takes the value $S_i=+1$ and ‘OFF’ (silent) if $S_i=-1$. The dynamics evolve in discrete time with time steps $\Delta t$. There is no refractoriness and the duration of a time step is typically not specified. If we take $\Delta t=1\,$ms, we can interpret $S_i(t)=+1$ as an action potential of neuron $i$ at time $t$. If we take $\Delta t=500\,$ms, $S_i(t)=+1$ should rather be interpreted as an episode of high firing rate.

Neurons interact with each other with weights $w_{ij}$. The input potential of neuron $i$, influenced by the activity of the other neurons, is

h_{i}(t)=\sum_{j}w_{ij}\,S_{j}(t)\,. (17.2)

The input potential at time $t$ influences the probabilistic update of the state variable $S_i$ in the next time step:

{\rm Prob}\{S_{i}(t+\Delta t)=+1|h_{i}(t)\}=g(h_{i}(t))=g\left(\sum_{j}w_{ij}\,S_{j}(t)\right)\, (17.3)

where $g$ is a monotonically increasing gain function with values between zero and one. A common choice is $g(h)=0.5[1+\tanh(\beta h)]$ with a parameter $\beta$. For $\beta\to\infty$, we have $g(h)=1$ for $h>0$ and zero otherwise. The dynamics are therefore deterministic and summarized by the update rule

S_{i}(t+\Delta t)=\operatorname{sgn}[h_{i}(t)]\,. (17.4)

For finite $\beta$ the dynamics are stochastic. In the following we assume that in each time step all neurons are updated synchronously (parallel dynamics), but an update scheme where only one neuron is updated per time step is also possible.
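The update rules (17.2) and (17.3) translate directly into a few lines of code. The following is a minimal sketch (not from the original text; the network size, coupling strength, and value of $\beta$ are arbitrary illustrative choices), using ferromagnetic all-to-all coupling as discussed in the next subsection:

```python
import numpy as np

def parallel_update(S, w, beta, rng):
    """One synchronous (parallel) update of all neurons, Eqs. (17.2)-(17.3)."""
    h = w @ S                               # input potentials h_i, Eq. (17.2)
    p_on = 0.5 * (1.0 + np.tanh(beta * h))  # Prob{S_i(t+dt) = +1}, Eq. (17.3)
    return np.where(rng.random(S.size) < p_on, 1, -1)

rng = np.random.default_rng(0)
N = 101
w = np.ones((N, N)) - np.eye(N)   # ferromagnetic coupling w_ij = 1, w_ii = 0
S = rng.choice([-1, 1], size=N)   # random initial state
for _ in range(20):
    S = parallel_update(S, w, beta=5.0, rng=rng)
# with strong positive coupling the states align, cf. Section 17.2.1
```

For large $\beta$ and positive coupling, iterating this update drives all state variables to a common value, which is the spontaneous alignment discussed next.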

The aim of this section is to show that, with a suitable choice of the coupling matrix $w_{ij}$, memory items can be retrieved by the collective dynamics defined in Eq. (17.3), applied to all $N$ neurons of the network. In order to illustrate how collective dynamics can lead to meaningful results, we start, in Section 17.2.1, with a detour through the physics of magnetic systems. In Section 17.2.2, the insights from magnetic systems are applied to the case at hand, i.e., memory recall.

Fig. 17.5: Physics of ferromagnets. A. Magnetic materials consist of atoms, each with a small magnetic moment, here visualized as an arrow, a symbol for a magnetic needle. At low temperature, all magnetic needles are aligned. Inset: Field lines around one of the magnetic needles. B. At high temperature, some of the needles are misaligned (dashed circles). Cooling the magnet leads to a spontaneous alignment and reforms a pure magnet; schematic figure.

17.2.1 Detour: Magnetic analogy

Magnetic material contains atoms which carry a so-called spin. The spin generates a magnetic moment at the microscopic level visualized graphically as an arrow (Fig. 17.5A). At high temperature, the magnetic moments of individual atoms point in all possible directions. Below a critical temperature, however, the magnetic moment of all atoms spontaneously align with each other. As a result, the microscopic effects of all atomic magnetic moments add up and the material exhibits the macroscopic properties of a ferromagnet.

In order to understand how a spontaneous alignment can arise, let us study Eqs. (17.2) and (17.3) in the analogy of magnetic materials. We assume that $w_{ij}=w_0>0$ between all pairs of neurons $i\neq j$, and that self-interaction vanishes, $w_{ii}=0$.

Each atom is characterized by a spin variable $S_i=\pm 1$ where $S_i=+1$ indicates that the magnetic moment of atom $i$ points ‘upward’. Suppose that, at time $t=0$, all spins take a positive value ($S_j=+1$ for $j\neq i$), except that of atom $i$, which has the value $S_i(0)=-1$ (Fig. 17.5A). We calculate the probability that, at time step $t=\Delta t$, the spin of atom $i$ will switch to $S_i=+1$. According to Eq. (17.3), this probability is

{\rm Prob}\{S_{i}(t+\Delta t)=+1|h_{i}(t)\}=g(h_{i}(t))=g\left(\sum_{j=1}^{N}w_{ij}\,S_{j}(t)\right)=g(w_{0}\,(N-1)) (17.5)

where we have used our assumptions. With $g(h)=0.5[1+\tanh(\beta h)]$ and $w_0=\beta=1$, we find that for any network of more than three atoms, the probability that the magnetic moments of all atoms align is extremely high. In physical systems, $\beta$ plays the role of an inverse temperature. If $\beta$ becomes small (high temperature), the magnetic moments no longer align and the material loses its spontaneous magnetization.
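As a quick numerical illustration (a sketch with $w_0=\beta=1$ as in the text; the network sizes are arbitrary), the flip probability of Eq. (17.5) can be evaluated for a few values of $N$:

```python
import math

def p_align(N, w0=1.0, beta=1.0):
    """Probability that the single misaligned spin flips back, Eq. (17.5)."""
    h = w0 * (N - 1)                       # input potential of the deviant spin
    return 0.5 * (1.0 + math.tanh(beta * h))

for N in (2, 4, 10):
    print(N, round(p_align(N), 4))
# the alignment probability approaches one already for small networks
```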

According to Eq. (17.5) the probability of alignment increases with the network size. This is an artifact of our model with all-to-all interaction between all atoms. Physical interactions, however, rapidly decrease with distance, so that the sum over $j$ in Eq. (17.5) should be restricted to the nearest neighbors of neuron $i$, e.g., about 4 to 20 atoms depending on the configuration of the atomic arrangement and the range of the interaction. Interestingly, neurons, in contrast to atoms, are capable of making long-range interactions because of their far-reaching axonal cables and dendritic trees. Therefore, the number of topological neighbors of a given neuron is in the range of thousands.

An arrangement of perfectly aligned magnetic elements looks rather boring, but physics offers more interesting examples as well. In some materials, typically consisting of two different types of atoms, say A and B, an anti-ferromagnetic ordering is possible (Fig. 17.6). While one layer of magnetic moments points upward, the next one points downward, so that the macroscopic magnetization is zero. Nevertheless, a highly ordered structure is present. Examples of anti-ferromagnets are some metallic oxides and alloys.

To model an anti-ferromagnet, we choose interactions $w_{ij}=+1$ if $i$ and $j$ belong to the same class (e.g., both are in a layer of type A or both in a layer of type B), and $w_{ij}=-1$ if one of the two atoms belongs to type A and the other to type B. A simple repetition of the calculation in Eq. (17.5) shows that an anti-ferromagnetic organization of the spins emerges spontaneously at low temperature.

Fig. 17.6: Storing patterns. A. Physical anti-ferromagnets consist of layers of atoms A and B. All magnetic moments are aligned within a layer of identical atoms, but exhibit different orientations between layers. A model where interactions within atoms of the same type are positive (solid lines) and between atoms of different type are negative (dashed lines) can explain the spontaneous order in the arrangement of magnetic moments. The interaction scheme for two atoms with their 10 nearest neighbors is indicated. B. If we replace magnetic moments by black and white pixels (squares), represented by active and inactive neurons, respectively, the neuronal network can store a pattern, such as the letter ‘T’. Interactions are positive (solid lines) between pixels of the same color (black-to-black or white-to-white) and negative otherwise. Only a few representative interactions are shown. Schematic figure.

The same idea of positive and negative interactions $w_{ij}$ can be used to embed an arbitrary pattern into a network of neurons. Let us draw a pattern of black and white pixels corresponding to active ($p_i=+1$) and inactive ($p_i=-1$) neurons, respectively. The rule extracted from the anti-ferromagnet implies that pixels of opposite color are connected by negative weights, while pixels of the same color have connections with positive weight. This rule can be formalized as

w_{ij}=p_{i}\,p_{j}\,. (17.6)

This rule forms the basis of the Hopfield model.
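To see the rule of Eq. (17.6) at work, here is a small numerical sketch (the network size and the number of corrupted pixels are arbitrary illustrative choices, not from the original text). The stored pattern is a fixed point of the deterministic dynamics (17.4), and a corrupted cue is repaired in a single step:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
p = rng.choice([-1, 1], size=N)       # one pattern of +/-1 pixels
w = np.outer(p, p).astype(float)      # w_ij = p_i p_j, Eq. (17.6)
np.fill_diagonal(w, 0)                # no self-interaction

# the pattern reproduces itself under S -> sgn(w S), Eq. (17.4)
S = np.sign(w @ p).astype(int)
assert np.array_equal(S, p)

# a cue with 10 flipped pixels is completed in one deterministic step
cue = p.copy()
cue[rng.choice(N, size=10, replace=False)] *= -1
recalled = np.sign(w @ cue).astype(int)
assert np.array_equal(recalled, p)
```

The repair works because $h_i = p_i \sum_j p_j S_j - S_i$, and the sum is dominated by the pixels that still agree with the pattern.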

17.2.2 Patterns in the Hopfield model

The Hopfield model consists of a network of $N$ binary neurons. A neuron $i$ is characterized by its state $S_i=\pm 1$. The state variable is updated according to the dynamics defined in Eq. (17.3).

The task of the network is to store and recall $M$ different patterns. Patterns are labeled by the index $\mu$ with $1\leq\mu\leq M$. Each pattern $\mu$ is defined as a desired configuration $\{p_i^\mu=\pm 1;\,1\leq i\leq N\}$. The network of $N$ neurons is said to correctly represent pattern $\mu$, if the state of all neurons $1\leq i\leq N$ is $S_i(t)=S_i(t+\Delta t)=p_i^\mu$. In other words, patterns must be fixed points of the dynamics (17.3).

For us as human observers, a meaningful pattern could, for example, be a configuration in the form of a ‘T’, as depicted in Fig. 17.6B. However, visually attractive patterns have large correlations between each other. Moreover, areas in the brain related to memory recall are situated far from the retinal input stage. Since the configuration of neurons in memory-related brain areas is probably very different from those at the retina, patterns in the Hopfield model are chosen as fixed random patterns; cf. Fig. 17.7.

During the set-up phase of the Hopfield network, a random number generator generates, for each pattern $\mu$, a string of $N$ independent binary numbers $\{p_i^\mu=\pm 1;\,1\leq i\leq N\}$ with expectation value $\langle p_i^\mu\rangle=0$. Strings of different patterns are independent. The weights are chosen as

w_{ij}=c\sum_{\mu=1}^{M}p_{i}^{\mu}\,p_{j}^{\mu}\, (17.7)

with a positive constant $c>0$. The network has full connectivity. Note that for a single pattern and $c=1$, Eq. (17.7) is identical to the connections of the anti-ferromagnet, Eq. (17.6). For reasons that become clear later on, the standard choice of the constant $c$ is $c=1/N$.
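The weight rule of Eq. (17.7) is a one-line matrix product. The sketch below (arbitrary sizes $N$ and $M$; a low pattern load is assumed, cf. Section 17.2.4) checks that each stored pattern is a fixed point of the deterministic dynamics:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 200, 5                              # low load M/N, cf. Section 17.2.4
patterns = rng.choice([-1, 1], size=(M, N))

# w_ij = (1/N) * sum_mu p_i^mu p_j^mu, Eq. (17.7) with c = 1/N
w = (patterns.T @ patterns) / N

# every stored pattern reproduces itself under one deterministic update
for p in patterns:
    assert np.array_equal(np.sign(w @ p).astype(int), p)
```

At this low load the crosstalk between patterns is too small to flip any neuron; what happens at higher loads is the topic of Section 17.2.4.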

Fig. 17.7: Hopfield model. A. Top: Three random patterns $\mu=1,2,3$ in a network of $N=8$ neurons. Black squares ($p_i^\mu=+1$) and white squares ($p_i^\mu=-1$) are arranged in random order. Bottom: The overlap $m^1=(1/N)\sum_i p_i^1\,S_i(t)$ measures the similarity between the current state $S(t)=\{S_i(t);\,1\leq i\leq N\}$ and the first pattern. Here only a single neuron exhibits a mismatch (dotted line). The desired value in the pattern is shown as black and white squares, while the current state is indicated as black and white circles; schematic figure. B. Orthogonal patterns have a mutual overlap of zero so that correlations are $C^{\mu\nu}=(1/N)\sum_i p_i^\mu\,p_i^\nu=\delta^{\mu\nu}$ (top) whereas random patterns exhibit a small residual overlap for $\mu\neq\nu$ (bottom).

17.2.3 Pattern retrieval

In many memory retrieval experiments, a cue with partial information is given at the beginning of a recall trial. The retrieval of a memory item is verified by the completion of the missing information.

In order to mimic memory retrieval in the Hopfield model, an input is given by initializing the network in a state $S(t_0)=\{S_i(t_0);\,1\leq i\leq N\}$. After initialization, the network evolves freely under the dynamics (17.3). Ideally the dynamics should converge to a fixed point corresponding to the pattern $\mu$ which is most similar to the initial state.

In order to measure the similarity between the current state $S(t)=\{S_i(t);\,1\leq i\leq N\}$ and a pattern $\mu$, we introduce the overlap (Fig. 17.7B)

m^{\mu}(t)={1\over N}\sum_{i}p_{i}^{\mu}\,S_{i}(t)\,. (17.8)

The overlap takes a maximum value of 1 if $S_i(t)=p_i^\mu$, i.e., if the pattern is retrieved. It is close to zero if the current state has no correlation with pattern $\mu$. The minimum value $m^\mu(t)=-1$ is achieved if each neuron takes the opposite value to that desired in pattern $\mu$.

The overlap plays an important role in the analysis of the network dynamics. In fact, using Eq. (17.2) the input potential $h_i$ of a neuron $i$ is

h_{i}(t)=\sum_{j}w_{ij}\,S_{j}(t)=c\sum_{j=1}^{N}\sum_{\mu=1}^{M}p_{i}^{\mu}\,p_{j}^{\mu}\,S_{j}(t)=c\,N\,\sum_{\mu=1}^{M}p_{i}^{\mu}\,m^{\mu}(t) (17.9)

where we have used Eqs. (17.7) and (17.8). In order to make the results of the calculation independent of the size of the network, it is standard to choose the factor $c=1/N$, as mentioned above. In the following we always take $c=1/N$ unless indicated otherwise. For an in-depth discussion, see the scaling arguments in Chapter 12.

In order to close the argument, we now use the input potential in the dynamics Eq. (17.3) and find

{\rm Prob}\{S_{i}(t+\Delta t)=+1|h_{i}(t)\}=g\left[\sum_{\mu=1}^{M}p_{i}^{\mu}\,m^{\mu}(t)\right]\,. (17.10)

Eq. (17.10) highlights that the $M$ macroscopic similarity values $m^\mu$ with $1\leq\mu\leq M$ completely determine the dynamics of the network.

Example: Memory retrieval

Let us suppose that the initial state has a significant similarity with pattern $\mu=3$, e.g., an overlap of $m^3(t_0)=0.4$, and no overlap with the other patterns, $m^\nu=0$ for $\nu\neq 3$.

In the noiseless case Eq. (17.10) simplifies to

S_{i}(t_{0}+\Delta t)=\operatorname{sgn}\left[\sum_{\mu=1}^{M}p_{i}^{\mu}\,m^{\mu}\right]=\operatorname{sgn}\left[p_{i}^{3}\,m^{3}(t_{0})\right]=p_{i}^{3}\quad\text{for all } i\,. (17.11)

Hence, each neuron takes, after a single time step, the desired state corresponding to the pattern. In other words, the pattern with the strongest similarity to the input is retrieved, as it should be.

For stochastic neurons we find

{\rm Prob}\{S_{i}(t_{0}+\Delta t)=+1|h_{i}(t)\}=g[p_{i}^{3}\,m^{3}(t_{0})]\,. (17.12)

We note that, given the overlap $m^3(t_0)$, the right-hand side of Eq. (17.12) can only take two different values, corresponding to $p_i^3=+1$ and $p_i^3=-1$. Thus, all neurons that should be active in pattern 3 share the same probabilistic update rule

{\rm Prob}\{S_{i}(t_{0}+\Delta t)=+1|h_{i}(t)\}=g[m^{3}(t_{0})]\quad\text{for all } i \text{ with } p_{i}^{3}=+1\,. (17.13)

Similarly all those that should be inactive share another rule

{\rm Prob}\{S_{i}(t_{0}+\Delta t)=+1|h_{i}(t)\}=g[-m^{3}(t_{0})]\quad\text{for all } i \text{ with } p_{i}^{3}=-1\,. (17.14)

Thus, despite the fact that there are $N$ neurons and $M$ different patterns, during recall the network breaks up into two macroscopic populations: those that should be active and those that should be inactive. This is the reason why we can expect to arrive at macroscopic population equations, similar to those encountered in part III of the book.

Let us use this insight for the calculation of the overlap at time $t_0+\Delta t$. We denote the size of the two populations by $N_+^3$ and $N_-^3$, respectively, and find

m^{3}(t_{0}+\Delta t) = {1\over N}\sum_{i}p_{i}^{3}\,S_{i}(t_{0}+\Delta t) (17.15)
= {N_{+}^{3}\over N}\left[{1\over N_{+}^{3}}\sum_{i~{\rm with}~p_{i}^{3}=+1}S_{i}(t_{0}+\Delta t)\right]-{N_{-}^{3}\over N}\left[{1\over N_{-}^{3}}\sum_{i~{\rm with}~p_{i}^{3}=-1}S_{i}(t_{0}+\Delta t)\right]\,.

We can interpret the two terms enclosed by the square brackets as the average activity of those neurons that should, or should not, be active, respectively. In the limit of a large network ($N\to\infty$) both groups are very large and of equal size $N_+^3=N_-^3=N/2$. Therefore, the averages inside the square brackets approach their expectation values. The technical term, used in the physics literature, is that the network dynamics are ‘self-averaging’. Hence, we can evaluate the square brackets with the probabilities introduced in Eqs. (17.13) and (17.14). With ${\rm Prob}\{S_i(t_0+\Delta t)=-1|h_i(t)\}=1-{\rm Prob}\{S_i(t_0+\Delta t)=+1|h_i(t)\}$, we find

m^{3}(t_{0}+\Delta t)={1\over 2}\,\{2\,g[m^{3}(t_{0})]-1\}-{1\over 2}\,\{2\,g[-m^{3}(t_{0})]-1\}\,. (17.16)

In the special case that $g(h)=0.5[1+\tanh(\beta h)]$, Eq. (17.16) simplifies to the update law

m^{3}(t+\Delta t)=\tanh[\beta\,m^{3}(t)]\, (17.17)

where we have replaced $t_0$ by $t$, in order to highlight that updates should be iterated over several time steps.
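The scalar update law (17.17) is easy to iterate numerically. The sketch below (illustrative parameter choices) shows the two regimes discussed in the following remarks: for $\beta>1$ a small initial overlap grows toward a stable fixed point close to one, while for $\beta<1$ it decays to zero:

```python
import math

def iterate_overlap(m0, beta, steps=100):
    """Iterate m(t + dt) = tanh(beta * m(t)), Eq. (17.17)."""
    m = m0
    for _ in range(steps):
        m = math.tanh(beta * m)
    return m

m_retrieval = iterate_overlap(0.1, beta=2.0)   # converges near 1: retrieval
m_loss = iterate_overlap(0.1, beta=0.5)        # decays toward 0: no retrieval
```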

A B
Fig. 17.8: Memory retrieval in the Hopfield model. A. The overlap $m^\nu(t+\Delta t)$ with a specific pattern $\nu$ is given as a function of the overlap with the same pattern, $m^\nu(t)$, in the previous time step (solid line); cf. Eq. (17.16). The overlap with the $M-1$ other patterns is supposed to vanish. The iterative update can be visualized as a path (arrow) between the overlap curve and the diagonal (dashed line). The dynamics approach a fixed point (circle) with high overlap corresponding to the retrieval of the pattern. B. The probability $P_{\rm error}$ that during retrieval an erroneous state-flip occurs corresponds to the shaded area under the curve; cf. Eq. (17.20). The width $\sigma$ of the curve increases with the pattern load $M/N$; schematic figure.

We close with three remarks. First, the dynamics of $N$ neurons have been replaced, in a mathematically precise limit, by the iterative update of one single macroscopic variable, i.e., the overlap with one of the patterns. The result is reminiscent of the analysis of the macroscopic population dynamics performed in part III of the book. Indeed, the basic mathematical principles used for the equations of the population activity $A(t)$ are the same as the ones used here for the update of the overlap variable $m^\mu(t)$.

Second, if $\beta>1$, the dynamics converge from an initially small overlap to a fixed point with a large overlap, close to one. The graphical solution of the update of pattern $\nu=3$ (for which a nonzero overlap existed in the initial state) is shown in Fig. 17.8. Because the network dynamics are ‘attracted’ toward a stable fixed point characterized by a large overlap with one of the memorized patterns (Fig. 17.9A), the Hopfield model and variants of it are also called ‘attractor’ networks or ‘attractor memories’ (24; 40).

Finally, the assumption that, apart from pattern 3, all other patterns have an initial overlap exactly equal to zero is artificial. For random patterns, we expect a small overlap between arbitrary pairs of patterns. Thus, if the network is exactly in pattern 3 so that $m^3=1$, the other patterns have a small but finite overlap $|m^\mu|\neq 0$, because of spurious correlations $C^{\mu\nu}=(1/N)\sum_i p_i^\mu p_i^\nu$ between any two random patterns $\mu$ and $\nu$; cf. Fig. 17.7B. If the number of patterns is large, the spurious correlations between the patterns can generate problems during memory retrieval, as we will see now.

17.2.4 Memory capacity

How many random patterns can be stored in a network of $N$ neurons? Memory retrieval implies pattern completion, starting from a partial cue. A minimal condition for pattern completion is that the dynamics should at least not move away from the pattern if the initial cue is identical to the complete pattern (215). In other words, we require that a network with initial state $S_i(t_0)=p_i^\nu$ for $1\leq i\leq N$ stays in pattern $\nu$. Therefore pattern $\nu$ must be a fixed point under the dynamics.

We study a Hopfield network at zero temperature ($\beta=\infty$). We start the calculation as in Eq. (17.9) and insert $S_j(t_0)=p_j^\nu$. This yields

S_{i}(t_{0}+\Delta t) = \operatorname{sgn}\left[{1\over N}\sum_{j=1}^{N}\sum_{\mu=1}^{M}p_{i}^{\mu}\,p_{j}^{\mu}\,p_{j}^{\nu}\right] (17.18)
= \operatorname{sgn}\left[p_{i}^{\nu}\left({1\over N}\sum_{j=1}^{N}p_{j}^{\nu}\,p_{j}^{\nu}\right)+{1\over N}\sum_{\mu\neq\nu}\sum_{j}p_{i}^{\mu}\,p_{j}^{\mu}\,p_{j}^{\nu}\right]\,,

where we have separated the pattern $\nu$ from the other patterns. The factor in parentheses on the right-hand side adds up to one and can therefore be dropped. We now multiply the second term on the right-hand side by a factor $1=p_i^\nu\,p_i^\nu$. Finally, because $p_i^\nu=\pm 1$, a factor $p_i^\nu$ can be pulled out of the argument of the sign-function:

S_{i}(t_{0}+\Delta t)=p_{i}^{\nu}\,\operatorname{sgn}\left[1+{1\over N}\sum_{j}\sum_{\mu\neq\nu}p_{i}^{\mu}\,p_{i}^{\nu}\,p_{j}^{\mu}\,p_{j}^{\nu}\right]=p_{i}^{\nu}\,\operatorname{sgn}[1-a_{i\nu}]\,. (17.19)

The desired fixed point exists only if $a_{i\nu}=-{1\over N}\sum_{j}\sum_{\mu\neq\nu}p_{i}^{\mu}\,p_{i}^{\nu}\,p_{j}^{\mu}\,p_{j}^{\nu}<1$ for all neurons $i$. In other words, even if the network is initialized in perfect agreement with one of the patterns, it can happen that one or a few neurons flip their sign. The probability to move away from the pattern is equal to the probability of finding a value $a_{i\nu}>1$ for one of the neurons $i$.

Because patterns are generated from independent random numbers $p_i^\mu=\pm 1$ with zero mean, the product $p_i^\mu\,p_i^\nu\,p_j^\mu\,p_j^\nu=\pm 1$ is also a binary random number with zero mean. Since the values $p_i^\mu$ are chosen independently for each neuron $i$ and each pattern $\mu$, the term $a_{i\nu}$ can be visualized as a random walk of $N(M-1)$ steps with step size $1/N$. For a large number of steps, the positive or negative walking distance can be approximated by a Gaussian distribution with zero mean and standard deviation $\sigma=\sqrt{(M-1)/N}\approx\sqrt{M/N}$ for $M\gg 1$. The probability that the activity state of neuron $i$ erroneously flips is therefore proportional to
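The Gaussian approximation for the crosstalk term can be checked by direct sampling (a sketch; the values of $N$, $M$, and the number of trials are arbitrary, and the sign convention of $a_{i\nu}$ does not affect its spread):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, trials = 100, 20, 2000

samples = []
for _ in range(trials):
    p = rng.choice([-1, 1], size=(M, N))
    # crosstalk acting on neuron i = 0 for pattern nu = 0:
    # (1/N) * sum_{mu != nu} p_i^mu p_i^nu (sum_j p_j^mu p_j^nu)
    a = np.sum(p[1:, 0] * p[0, 0] * (p[1:] @ p[0])) / N
    samples.append(a)

sigma_empirical = np.std(samples)
sigma_theory = np.sqrt((M - 1) / N)   # sigma = sqrt((M-1)/N)
```

The empirical standard deviation matches the random-walk prediction to within sampling error.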

P_{\rm error}={1\over\sqrt{2\pi}\,\sigma}\int_{1}^{\infty}e^{-{x^{2}\over 2\sigma^{2}}}\,{\rm d}x\approx{1\over 2}\left[1-{\rm erf}\left(\sqrt{{N\over 2M}}\right)\right] (17.20)

where we have introduced the error function

{\rm erf}(x)={2\over\sqrt{\pi}}\int_{0}^{x}e^{-y^{2}}\,{\rm d}y\,. (17.21)

The most important insight is that the probability of an erroneous state-flip increases with the ratio $M/N$. Formally, we can define the storage capacity $C_{\rm store}$ of a network as the maximal number $M^{\rm max}$ of patterns that a network of $N$ neurons can retrieve:

C_{\rm store}={M^{\rm max}\over N}={M^{\rm max}\,N\over N^{2}}\,. (17.22)

For the second equality sign we have multiplied both numerator and denominator by a common factor $N$, which gives rise to the following interpretation. Since each pattern consists of $N$ neurons (i.e., $N$ binary numbers), the total number of bits that need to be stored at maximum capacity is $M^{\rm max}\,N$. In the Hopfield model, patterns are stored by an appropriate choice of the synaptic connections. The number of available synapses in a fully connected network is $N^2$. Therefore, the storage capacity measures the number of bits stored per synapse.

Example: Erroneous bits

We can evaluate Eq. (17.20) for various choices of $P_{\rm error}$. For example, if we accept an error probability of $P_{\rm error}=0.001$, we find a storage capacity of $C_{\rm store}=0.105$.

Hence, a network of 10'000 neurons is able to store about 1'000 patterns with $P_{\rm error}=0.001$. Thus in each of the patterns, we expect that about 10 neurons exhibit erroneous activity. We emphasize that the above calculation focuses on the first iteration step only. If we start in the pattern, then about 10 neurons will flip their state in the first iteration. But these flips could in principle cause further neurons to flip in the second iteration and eventually initiate an avalanche of many other changes.
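The capacity value quoted above can be reproduced from Eq. (17.20) by searching for the largest load $M/N$ whose flip probability stays below the accepted error (a numerical sketch; the bisection bounds are arbitrary):

```python
import math

def p_error(load):
    """Flip probability of Eq. (17.20) as a function of the load M/N."""
    return 0.5 * (1.0 - math.erf(math.sqrt(1.0 / (2.0 * load))))

# largest load M/N with p_error below 0.001 (bisection)
lo, hi = 1e-6, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if p_error(mid) < 0.001:
        lo = mid
    else:
        hi = mid
print(round(lo, 3))   # C_store for P_error = 0.001, approximately 0.105
```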

A more precise calculation shows that such an avalanche does not occur if the number of stored patterns stays below the limit given by $C_{\rm store}=0.138$ (18; 22).

17.2.5 The energy picture

Fig. 17.9: Attractor picture and energy landscape. A. The dynamics are attracted toward fixed points corresponding to memory states (overlap $m^\nu=1$). Four attractor states are indicated. The dashed lines show the boundaries of the basin of attraction of each memory. B. The Hopfield model has multiple equivalent energy minima, each one corresponding to the retrieval (overlap $m^\nu=1$) of one pattern. Between the main minima, additional local minima (corresponding to mixtures of several patterns) may also exist.

The Hopfield model has symmetric interactions $w_{ij}=w_{ji}=c\sum_{\mu=1}^{M}p_i^\mu\,p_j^\mu$. We now show that, in any network with symmetric interactions and asynchronous deterministic dynamics

S_{i}(t+\Delta t)=\operatorname{sgn}[h_{i}(t)]=\operatorname{sgn}\left[\sum_{j}w_{ij}\,S_{j}(t)\right] (17.23)

the energy

E=-\sum_{i}\sum_{j}w_{ij}\,S_{i}\,S_{j} (17.24)

decreases with each state flip of a single neuron (226).

In each time step only one neuron is updated (asynchronous dynamics). Let us assume that after application of Eq. (17.23) neuron $k$ has changed its value from $S_k$ at time $t$ to $S'_k=-S_k$, while all other neurons keep their value, $S'_j=S_j$ for $j\neq k$. The prime indicates values evaluated at time $t+\Delta t$. The change in energy caused by the state flip of neuron $k$ is

E^{\prime}-E=-\sum_{i}w_{ik}\,S_{i}\,(S^{\prime}_{k}-S_{k})-\sum_{j}w_{kj}\,S_{j}\,(S^{\prime}_{k}-S_{k})\,. (17.25)

First, because of the update of neuron $k$, we have $S'_k-S_k=2S'_k$. Second, because of the symmetry $w_{ij}=w_{ji}$, the two terms on the right-hand side are identical, and $\sum_i w_{ik}S_i=\sum_i w_{ki}S_i=h_k$. Third, because of Eq. (17.23), the sign of $h_k$ determines the new value $S'_k$ of neuron $k$. Therefore the change in energy is $E'-E=-4h_k\operatorname{sgn}(h_k)<0$. In other words, the energy $E$ is a Liapunov function of the deterministic Hopfield network.
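The Liapunov property is easy to verify numerically. The following sketch (arbitrary network size and pattern number, not from the original text) performs random asynchronous updates and asserts that the energy of Eq. (17.24) never increases:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 50, 3
patterns = rng.choice([-1, 1], size=(M, N))
w = (patterns.T @ patterns) / N       # symmetric weights, Eq. (17.7)

def energy(S):
    return -S @ w @ S                 # E = -sum_ij w_ij S_i S_j, Eq. (17.24)

S = rng.choice([-1, 1], size=N)
E = energy(S)
for _ in range(500):
    k = rng.integers(N)               # asynchronous: one neuron per step
    h_k = w[k] @ S
    if h_k != 0:
        S[k] = 1 if h_k > 0 else -1   # deterministic update, Eq. (17.23)
    E_new = energy(S)
    assert E_new <= E + 1e-12         # the energy never increases
    E = E_new
```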

Since the dynamics lead to a decrease of the energy, we may wonder whether we can say something about the global or local minima of the energy. If we insert the definition of the connection weights into the energy function (17.24), we find

E=-\sum_{i}\sum_{j}\left(c\sum_{\mu}p_{i}^{\mu}p_{j}^{\mu}\right)\,S_{i}\,S_{j}=-c\,N^{2}\sum_{\mu}(m^{\mu})^{2}\,, (17.26)

where we have used the definition of the overlap; cf. Eq. (17.8).

The maximum value of the overlap with a fixed pattern $\nu$ is $m^\nu=1$. Moreover, for random patterns, the correlations between patterns are small. Therefore, if $m^\nu=1$ (i.e., recall of pattern $\nu$), the overlap with the other patterns $\mu\neq\nu$ is $m^\mu\approx 0$. Hence, the energy landscape can be visualized with multiple minima of the same depth, each minimum corresponding to the retrieval of one of the patterns (Fig. 17.9B).

17.2.6 Retrieval of low-activity patterns

There are numerous aspects in which the Hopfield model is rather far from biology. One of these is that in each memory pattern, fifty percent of the neurons are active.

To characterize patterns with a lower level of activity, let us introduce random variables $\xi_i^\mu\in\{0,1\}$ for $1\leq i\leq N$ and $1\leq\mu\leq M$ with mean $\langle\xi_i^\mu\rangle=a$. For $a=0.5$ and $p_i^\mu=2\xi_i^\mu-1$ we are back to the patterns in the Hopfield model. In the following we are, however, interested in patterns with an activity $a<0.5$. To simplify some of the arguments below, we suppose that patterns are generated under the constraint $\sum_i\xi_i^\mu=N\,a$ for each $\mu$, so that all patterns have exactly the same target activity $a$.

The weights in the Hopfield model of Eq. (17.7) are replaced by

w_{ij}=c^{\prime}\sum_{\mu=1}^{M}(\xi_{i}^{\mu}-b)\,(\xi_{j}^{\mu}-a)\, (17.27)

where $a$ is the mean activity of the stored patterns, $0\leq b\leq 1$ is a constant, and $c'=[2a(1-a)N]^{-1}$. Note that Eq. (17.7) is a special case of Eq. (17.27) with $a=b=0.5$ and $c'=2c$.

As before, we work with binary neurons $S_i=\pm 1$ governed by the stochastic update rule of Eqs. (17.2) and (17.3). To analyze pattern retrieval we proceed analogously to Eq. (17.10). Introducing the overlap of low-activity patterns

m^{\mu}={1\over 2a(1-a)N}\sum_{j}(\xi_{j}^{\mu}-a)\,S_{j} (17.28)

we find

{\rm Prob}\{S_{i}(t+\Delta t)=+1|h_{i}(t)\}=g\left[\sum_{\mu=1}^{M}(\xi_{i}^{\mu}-b)\,m^{\mu}(t)\right]\,. (17.29)

Example: Memory retrieval and attractor dynamics

Suppose that at time $t$ the overlap with one of the patterns, say pattern 3, is significantly above zero while the overlap with all other patterns vanishes, $m^\mu\approx m\,\delta^{\mu 3}$, where $\delta^{nm}$ denotes the Kronecker-$\delta$. The initial overlap is $0.1<m\leq 1$. Then the dynamics of the low-activity network splits up into two groups of neurons, i.e., those that should be ‘ON’ in pattern 3 ($\xi_i^3=1$) and those that should be ‘OFF’ ($\xi_i^3=0$).

The size of both groups scales with $N$: there are $a\,N$ ‘ON’ neurons and $(1-a)\,N$ ‘OFF’ neurons. For $N\to\infty$, the population activity $A^{\rm ON}$ of the ‘ON’ group (i.e., the fraction of neurons with state $S_i=+1$ in the ‘ON’ group) is therefore well described by its expectation value

A^{\rm ON}(t+\Delta t)=g[(1-b)\,m^{3}(t)]\,. (17.30)

Similarly, the ‘OFF’ group has activity

A^{\rm OFF}(t+\Delta t)=g[-b\,m^{3}(t)]\,. (17.31)

To close the argument we determine the overlap at time $t+\Delta t$. Exploiting the split into two groups of size $a\,N$ and $(1-a)\,N$, respectively, we have

m^{3}(t+\Delta t) = {1\over 2a(1-a)N}\left[\sum_{j~{\rm with}~\xi_{j}^{3}=1}(1-a)\,S_{j}(t+\Delta t)+\sum_{j~{\rm with}~\xi_{j}^{3}=0}(-a)\,S_{j}(t+\Delta t)\right] (17.32)
= A^{\rm ON}(t+\Delta t)-A^{\rm OFF}(t+\Delta t)\,.

Thus, the overlap with pattern 3 has changed from the initial value $m^3(t)=m$ to a new value $m^3(t+\Delta t)$. Retrieval of memories works if iteration of Eqs. (17.30)-(17.32) makes $m^3$ converge to a value close to unity while, at the same time, the other overlaps $m^\nu$ (for $\nu\neq 3$) stay close to zero.
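Iterating Eqs. (17.30)-(17.32) is again a scalar recursion. A sketch (with illustrative choices $a=b=0.1$, a steep gain function, and an initial overlap of 0.2; not from the original text) shows the overlap converging toward one:

```python
import math

def g(h, beta=50.0):
    """Gain function g(h) = 0.5 * [1 + tanh(beta * h)]."""
    return 0.5 * (1.0 + math.tanh(beta * h))

a = b = 0.1          # low-activity patterns, symmetric choice b = a
m = 0.2              # initial overlap with pattern 3
for _ in range(50):
    A_on = g((1 - b) * m)    # activity of the 'ON' group, Eq. (17.30)
    A_off = g(-b * m)        # activity of the 'OFF' group, Eq. (17.31)
    m = A_on - A_off         # within-group fractions combine to the overlap
# m ends close to 1: the low-activity pattern is retrieved
```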

We emphasize that the analysis of the network dynamics presented here does not require symmetric weights but is possible for arbitrary values of the parameter $b$. However, a standard choice is $b=a$, which leads to symmetric weights and to a high memory capacity (522).