19.5 Summary

The Hebb rule ( 19.2.1 ) is an example of a local unsupervised learning rule. It is local because it depends only on the pre- and postsynaptic firing rates and the present state $w_{ij}$ of the synapse, i.e., on information that is readily ‘available’ at the location of the synapse. Experiments have shown that not only the firing rates, but also the membrane voltage of the postsynaptic neuron and the relative timing of pre- and postsynaptic spikes, determine the amplitude and direction of the change in synaptic efficacy. To account for spike-timing effects, classical pair-based models of STDP are formulated with a learning window that consists of two parts: if the presynaptic spike arrives before a postsynaptic output spike, the synaptic change is positive; if the timing is reversed, the synaptic change is negative. However, classical pair-based STDP models neglect the frequency and voltage dependence of synaptic plasticity, both of which are included in modern variants of STDP models.
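As a rough illustration of the two-part learning window described above, the following Python sketch evaluates a pair-based STDP window and sums its contributions over all pre/post spike pairs. The exponential shape of the window, the function names `stdp_window` and `total_weight_change`, and all parameter values are assumptions made for illustration; they are not taken from the text.

```python
import numpy as np


def stdp_window(dt_ms, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP window W(dt), with dt = t_post - t_pre in ms.

    The exponential shape and the default parameters are illustrative
    assumptions, not values from the text.
    """
    dt_ms = np.asarray(dt_ms, dtype=float)
    # Pre-before-post (dt > 0): positive change, decaying with |dt|.
    potentiation = a_plus * np.exp(-np.abs(dt_ms) / tau_plus)
    # Post-before-pre (dt < 0): negative change, decaying with |dt|.
    depression = -a_minus * np.exp(-np.abs(dt_ms) / tau_minus)
    return np.where(dt_ms > 0, potentiation,
                    np.where(dt_ms < 0, depression, 0.0))


def total_weight_change(pre_spikes_ms, post_spikes_ms, **window_params):
    """Sum the window over all pre/post spike pairs (all-to-all pairing)."""
    pre = np.asarray(pre_spikes_ms, dtype=float)
    post = np.asarray(post_spikes_ms, dtype=float)
    dt = post[:, None] - pre[None, :]  # matrix of t_post - t_pre
    return float(stdp_window(dt, **window_params).sum())


if __name__ == "__main__":
    print(total_weight_change([10.0], [15.0]))  # pre before post
    print(total_weight_change([15.0], [10.0]))  # post before pre
```

All-to-all pairing is only one possible choice; nearest-neighbor pairing schemes would restrict the sum to the closest spike pairs.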

The synaptic weight dynamics of Hebbian learning can be studied analytically if weights are changing slowly as compared to the time scale of the neuronal activity. Weight changes are driven by correlations between pre- and postsynaptic activity. More specifically, simple Hebbian learning rules in combination with a linear neuron model find the first principal component of a normalized input data set. Generalized Hebb rules, such as Oja’s rule, keep the norm of the weight vector approximately constant during plasticity.
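The statement about the first principal component can be checked numerically. The sketch below applies an online form of Oja's rule, $\Delta\mbox{\boldmath$w$}=\gamma\,\nu\,(\mbox{\boldmath$x$}-\nu\,\mbox{\boldmath$w$})$ with a linear neuron $\nu=\mbox{\boldmath$w$}\cdot\mbox{\boldmath$x$}$, to zero-mean synthetic data. The data set, the learning rate, the number of epochs, and the helper names `make_demo_data` and `oja_first_component` are all assumptions chosen for illustration.

```python
import numpy as np


def make_demo_data(n_samples=500, seed=1):
    """Zero-mean 2D Gaussian data, elongated along a rotated axis (assumed demo)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, 2)) * np.array([2.0, 0.5])
    angle = np.pi / 6.0
    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle), np.cos(angle)]])
    x = z @ rotation.T
    return x - x.mean(axis=0)  # center the inputs


def oja_first_component(inputs, gamma=0.005, epochs=20, seed=0):
    """Online Oja rule for a linear neuron; returns the final weight vector."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=inputs.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in rng.permutation(inputs):  # shuffle the patterns each epoch
            nu = w @ x                     # linear output rate
            w += gamma * nu * (x - nu * w)  # Hebbian term minus decay term
    return w


if __name__ == "__main__":
    data = make_demo_data()
    w = oja_first_component(data)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))
    e1 = eigenvectors[:, -1]  # eigenvector with the largest eigenvalue
    print("overlap |w . e1|:", abs(w @ e1))
    print("norm of w:", np.linalg.norm(w))
```

The sign of the learned vector is arbitrary, which is why the comparison uses the absolute value of the overlap.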

The interesting aspect of STDP is that it naturally accounts for temporal correlations by means of a learning window. Explicit expressions for temporal spike-spike correlations can be obtained for certain simple types of neuron model such as the linear Poisson model. Spike-based and rate-based rules of plasticity are equivalent as long as temporal spike-spike correlations are disregarded. If firing rates vary slowly, then the integral over the learning window plays the role of the Hebbian correlation term.
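To make the last statement concrete, consider a sketch under simple assumptions: only pair-based contributions, pre- and postsynaptic spike trains that are independent Poisson processes with slowly varying rates $\nu_{j}$ and $\nu_{i}$, and an exponential learning window $W(s)$ with $s=t^{\rm post}-t^{\rm pre}$. The amplitudes $A_{\pm}$ and time constants $\tau_{\pm}$ are symbols introduced here for illustration.

```latex
\frac{\text{d}}{\text{d}t}w_{ij}
  \approx \nu_{i}\,\nu_{j}\int_{-\infty}^{\infty}W(s)\,\text{d}s
  = \nu_{i}\,\nu_{j}\,\bigl(A_{+}\tau_{+}-A_{-}\tau_{-}\bigr),
\qquad
W(s)=\begin{cases}
  A_{+}\,e^{-s/\tau_{+}} & s>0,\\
  -A_{-}\,e^{s/\tau_{-}} & s<0.
\end{cases}
```

Under these assumptions the sign of the window integral decides whether the effective rate rule is Hebbian ($A_{+}\tau_{+}>A_{-}\tau_{-}$) or anti-Hebbian.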

Hebbian learning and STDP are examples of unsupervised learning rules. Hebbian learning is considered to be a major principle of neuronal organization during development and a driving force for receptive field formation. However, Hebbian synaptic plasticity by itself is not useful for behavioral learning, since it does not take into account the success (or failure) of an action. Three-factor learning rules combine the two Hebbian factors (i.e., pre- and postsynaptic activity) with a third factor (i.e., a neuromodulator such as dopamine) which conveys information about an action’s success. Three-factor rules with an eligibility trace can be used to describe behavioral learning, in particular during conditioning experiments.
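A minimal sketch of a three-factor rule with an eligibility trace is given below. It assumes a discrete-time formulation in which Hebbian coincidences (the product of pre- and postsynaptic activity) charge a decaying trace, and the weight changes only when a reward signal is present. The update equations, the function name `three_factor_weights`, and the parameters `eta`, `tau_e` and `dt` are assumptions for illustration, not the specific model of the text.

```python
import numpy as np


def three_factor_weights(pre, post, reward, eta=0.1, tau_e=10.0, dt=1.0, w0=0.0):
    """Simulate one synapse under an assumed three-factor rule.

    Eligibility trace: de/dt = -e / tau_e + pre * post   (two Hebbian factors)
    Weight:            dw/dt = eta * reward * e          (third factor gates change)
    Returns the weight after every time step.
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    reward = np.asarray(reward, dtype=float)
    trace, weight = 0.0, w0
    weights = np.empty(len(pre))
    for t in range(len(pre)):
        # The trace decays and is charged by pre/post coincidences.
        trace += dt * (-trace / tau_e + pre[t] * post[t])
        # The weight changes only while the reward signal is nonzero.
        weight += dt * eta * reward[t] * trace
        weights[t] = weight
    return weights


if __name__ == "__main__":
    steps = 30
    pre, post, reward = np.zeros(steps), np.zeros(steps), np.zeros(steps)
    pre[5] = post[5] = 1.0   # Hebbian coincidence at step 5
    reward[15] = 1.0         # delayed reward at step 15
    print(three_factor_weights(pre, post, reward)[-1])
```

The trace bridges the delay between the coincidence and the reward, so an earlier reward meets a larger trace and produces a larger weight change.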

Literature

Correlation-based learning can be traced back to Aristoteles⁹ and has been discussed extensively by James ( 242 ), who formulated a learning principle on the level of ‘brain processes’ rather than neurons:

When two elementary brain-processes have been active together or in immediate succession, one of them, on re-occurring, tends to propagate its excitement into the other.

⁹ Aristoteles, “De memoria et reminiscentia”: There is no need to consider how we remember what is distant, but only what is neighboring, for clearly the method is the same. For the changes follow each other by habit, one after another. And thus, whenever someone wishes to recollect he will do the following: He will seek to get a starting point for a change after which will be the change in question.

A chapter of James’ book is reprinted in volume 1 of Anderson and Rosenfeld’s collection on Neurocomputing ( 25 ) . The formulation of synaptic plasticity in Hebb’s book ( 210 ), of which two interesting sections are reprinted in the collection of Anderson and Rosenfeld ( 25 ), has had a long-lasting impact on the neuroscience community. The historical context of Hebb’s postulate is discussed in the reviews of Sejnowski ( 466 ) and Markram et al. ( 326 ) .

Classical experimental studies on STDP are ( 328; 567; 117; 55; 56; 483 ) , but precursors of timing-dependent plasticity can be found even earlier ( 295 ) . Note that for some synapses, the learning window is reversed ( 43 ) . For reviews on STDP, see Abbott and Nelson ( 3 ) ; Bi and Poo ( 54 ) ; Caporale and Dan ( 89 ) ; Sjöström and Gerstner ( 482 ) .

The theory of unsupervised learning and principal component analysis is reviewed in the textbook by Hertz et al. ( 215 ) . Models of the development of receptive fields and cortical maps have a long tradition in the field of computational neuroscience; see, e.g., von der Malsburg ( 540 ) ; Willshaw and von der Malsburg ( 550 ) ; Sejnowski ( 467 ) ; Bienenstock et al. ( 58 ) ; Kohonen ( 271 ) ; Linsker ( 300 ) ; Miller et al. ( 346 ) ; MacKay and Miller ( 318 ) ; Miller ( 344 ) ; for reviews see, e.g., Erwin et al. ( 144 ) ; Wiskott and Sejnowski ( 556 ) . The essential aspects of the weight dynamics in linear networks are discussed in Oja ( 369 ) ; Miller and MacKay ( 343 ) . Articles of Grossberg ( 200 ) and Bienenstock et al. ( 58 ) or the book of Kohonen ( 271 ) illustrate the early use of the rate-based learning rules in computational neuroscience.

The early theory of STDP was developed in ( 178; 176; 256; 442; 530; 489; 448 ) , but precursors of timing-dependent plasticity can be found in earlier rate-based formulations ( 216; 488 ) . Modern theories of STDP go beyond the pair-based rules ( 468; 394 ) and consider voltage effects ( 99 ) , variations of boundary conditions ( 202 ) , or calcium-based models ( 302; 303 ) ; for reviews see Morrison et al. ( 353 ) ; Sjöström and Gerstner ( 482 ) .

Experimental support for three-factor learning rules is reviewed in ( 429; 391 ) . Model studies of reward-modulated STDP are ( 238; 294; 153; 157 ) . The consequences for behavior are discussed in ( 306; 307 ) . The classic reference for dopamine in relation to reward-based learning is Schultz et al. ( 460 ) . Modern reviews on the topic are ( 461; 462 ) .

Exercises

1.

Normalization of firing rate.

Consider a learning rule ${{\text{d}}\over{\text{d}}t}w_{ij}=\gamma\,(\nu_{i}-\nu_{\theta})\,\nu_{j}\,,$ i.e., a change of synaptic weights can only occur if the presynaptic neuron is active ( $\nu_{j}>0$ ). The direction of the change is determined by the activity of the postsynaptic neuron. The postsynaptic firing rate is given by $\nu_{i}=g(\sum_{j=1}^{N}w_{ij}\nu_{j})$ . We assume that presynaptic firing rates $\nu_{j}$ are constant.

(i) Show that $\nu_{i}$ has a fixed point at $\nu_{i}=\nu_{\theta}$ .

(ii) Discuss the stability of the fixed point. Consider the cases $\gamma>0$ and $\gamma<0$ .

(iii) Discuss whether the learning rule is Hebbian, anti-Hebbian, or non-Hebbian.
2.

Fixed point of BCM rule. Assume a single postsynaptic neuron with firing rate $\nu_{i}$ which receives constant input $\nu_{j}>0$ at all synapses $1\leq j\leq N$ .

(i) Show that the weights $w_{ij}$ have a fixed point under the BCM rule ( 19.9 ).

(ii) Show that this fixed point is unstable.
3.

Receptive field development with BCM rule. Twenty presynaptic neurons with firing rates $\nu_{j}$ connect onto the same postsynaptic neuron, which fires at a rate $\nu_{i}^{\rm post}=\sum_{j=1}^{20}w_{ij}\,\nu_{j}$ . Synaptic weights change according to the BCM rule ( 19.9 ) with a hard lower bound $0\leq w_{ij}$ and $\nu_{\theta}=10$ Hz.

The 20 inputs are organized in two groups of 10 inputs each. There are two possible input patterns $\mbox{\boldmath$\xi$}^{\mu}$ , with $\mu=1,2$ .

(i) The two possible input patterns are: $\mu=1$ : group 1 fires at 3 Hz and group 2 is quiescent; and $\mu=2$ : group 2 fires at 1 Hz and group 1 is quiescent. Inputs alternate between both patterns several times back and forth. Each pattern presentation lasts for $\Delta t$ . How do weights $w_{ij}$ evolve? Show that the postsynaptic neuron becomes specialized to one group of inputs.

(ii) Similar to (i), except that the second pattern now is $\mu=2$ : group 2 fires at 2.5 Hz and group 1 is quiescent. How do weights $w_{ij}$ evolve?

(iii) As in (ii), but you are allowed to make $\nu_{\theta}$ a function of the time-averaged firing rate $\bar{\nu}_{i}^{\rm post}$ of the postsynaptic neuron. Is $\nu_{\theta}=\bar{\nu}_{i}^{\rm post}$ a good choice? Why is $\nu_{\theta}=(\bar{\nu}_{i}^{\rm post})^{2}/(10\,{\rm Hz})$ a better choice?

Hint: Consider the time it takes to update your time-averaged firing rate in comparison to the presentation time $\Delta t$ of the patterns.
4.

Weight matrix of Hopfield model. Consider synaptic weights that change according to the following Hebbian learning rule: ${{\text{d}}\over{\text{d}}t}w_{ij}=c(\nu_{i}-\nu_{0})\,(\nu_{j}-\nu_{0}).$

(i) Identify the parameters $c$ and $\nu_{0}$ with the parameters of Eq. ( 19.2.1 ).

(ii) Assume a fully connected network of $N$ neurons. Suppose that the initial weights $w_{ij}$ vanish. During presentation of a pattern $\mu$ , the activities of all neurons $1\leq k\leq N$ are fixed to values $\nu_{k}=p_{k}^{\mu}$ , where $p_{k}^{\mu}\in\{0,1\}$ , and synapses change according to the Hebbian learning rule. Patterns are applied one after the other, each for a time $\Delta t$ . Choose an appropriate value for $\nu_{0}$ so that after application of $P$ patterns, the final weights are $w_{ij}=\gamma\sum_{\mu=1}^{P}p_{i}^{\mu}\,p_{j}^{\mu}$ . Express the parameter $\gamma$ in terms of $c,\nu_{0},\Delta t$ .

(iii) Compare your results with the weight matrix of the Hopfield model in Chapter 17 . Is the above learning procedure realistic? Can it be classified as unsupervised learning?

Hint: Consider not only the learning phase, but also the recall phase. Consider the situation where input patterns are chosen stochastically.

5.

PCA with Oja’s learning rule. In order to show that Oja’s learning rule ( 19.7 ) selects the first principal component, proceed in three steps.

(i) Show that the eigenvectors $\{\mbox{\boldmath$e$}_{1},\dots,\mbox{\boldmath$e$}_{N}\}$ of $C$ are fixed points of the dynamics.

Hint: Apply the methods of Section 19.3 to the batch version of Oja’s rule and show that

$$\Delta\mbox{\boldmath$w$}=\gamma\,C\,\mbox{\boldmath$w$}-\gamma\,\mbox{\boldmath$w$}\,[\mbox{\boldmath$w$}\cdot C\,\mbox{\boldmath$w$}]\,.\qquad(19.49)$$

The claim then follows.

(ii) Show that only the eigenvector $\mbox{\boldmath$e$}_{1}$ with the largest eigenvalue is stable.

Hint: Assume that the weight vector $\mbox{\boldmath$w$}=\mbox{\boldmath$e$}_{1}+\epsilon\mbox{\boldmath$e$}_{k}$ has a small perturbation $\epsilon\ll 1$ in one of the principal directions. Derive an equation for ${\text{d}}\epsilon/{\text{d}}t$ and show that the perturbation grows if $k\neq 1$ .

(iii) Show that the output rate represents the projection of the input onto the first principal component.

6.

Triplet STDP rule and BCM. Show that for Poisson spike arrival and output spikes generated by an independent Poisson process of rate $\nu_{i}$ , the triplet STDP model gives rise to a rate-based plasticity model identical to BCM. Identify the function $\phi$ in Eqs. ( 19.8 ) and ( 19.9 ) with the parameters of the triplet model in ( 19.15 ).

Hint: Use the methods of Section 19.3.3 . Independent Poisson output means that you can neglect the pre-before-post spike correlations.