10 Estimating Models 10.4 Closed-loop stimulus design 11 Encoding and Decoding with Stochastic Neuron models

10.5 Summary

With modern statistical methods, we have fast and computationally tractable schemes to fit models of neural encoding and decoding to experimental data. A key insight is that, for a suitable chosen model class, the likelihood of the data being generated by the model is a concave function of the model parameters, i.e., there are no local maxima. Because of this, numerical methods of gradient ascent are bound to lead to the global maximum.

Generalized Linear Models (GLM) are the representative of this model class. Importantly, a large ensemble of generalized integrate-and-fire models, in particular the SRM with escape noise, belong to the family of GLM. As we have seen in previous chapters, the SRM can account for a large body of electrophysiological data and firing patterns such as adaptation, burst firing, time-dependent firing threshold, hyperpolarizing spike-after potential etc. The link from SRM to GLM implies that there are systematic and computationally fast methods to fit biologically plausible neuron models to data.

Interestingly, once neuron models are phrased in the language of statistics, the problems of coding and stimulus design can be formulated in a single unified framework. In the following chapter we will see that the problem of decoding can also be analyzed in the same statistical framework.

Literature

An early application of maximum likelihood approaches to neuronal data can be found in Brillinger (69). The application of the framework of Generalized Linear Models to the field of neuroscience has been made popular by Truccolo et al. (521); Pillow et al. (399). A review of Generalized Linear Models can be found in Dobson and Barnett (129).

The influential book of Rieke (437) gives a broad introduction to the field of neural coding. The time-rescaling theorem was exploited in (73) to develop useful goodness-of-fit methods for spike trains. Spike train metrics were introduced in Victor and Purpura (535, 534), but comparisons of spike trains in terms of PSTHs and other features has been commonly used before (392; 393; 174; 321; 138; 169). Many other spike train distances were also proposed (260; 531; 410; 234; 458; 357) which can be cast in the general framework of a vector space as outlined in Schrauwen and Campenhout (457), Paiva et al. (376) and Naud et al. (357); see also Paiva et al. (377, 378); Park et al. (388). Non-linear functions of the spike trains can also be used to relate to different features of the spiking process such as the interval distribution or the presence of definite firing patterns (535; 410; 513; 279; 280; 133).

Exercises

1.

Concave function and non-global optima .

(i) Suppose a function $G(x)$ has a global maximum at location $x_{0}$ . Suppose that $f(y)$ is a strictly increasing function of $y$ (i.e., $df/dy>0$ )

Show that $f(G(x))$ has a maximum at $x_{0}$ . Is it possible that $f(G(x))$ has further maxima as a function of $x$ ?

(ii) A strictly concave function $G$ can be defined as a curve with negative curvature $d^{2}G/dx^{2}<0$ for all $x$ . Show that a concave function can have at most one maximum.

(iii) Give an example of a concave function which does not have a maximum. Give an example of a function $G$ which has a global maximum, but is not concave. Give an example of a function $G$ which is concave and has a global maximum.
2.

Sum of concave functions

Consider a quadratic function $f_{k}(x)=1-(x-\vartheta_{k})^{2}$ .

(i) Show that $f_{k}$ is a concave function of $x$ for any choice of parameter $\vartheta_{k}$ .

(ii) Show that $f_{1}(x)+f_{2}(x)$ is a concave function.

(iii) Show that $\sum_{k}b_{k}f_{k}(x)$ with $b_{k}>0$ is a concave function.

(iv) Repeat the steps (ii) and (iii) for a family of functions $f_{k}$ which are concave, but not necessarily quadratic.
3.

Comparing PSTHs and spike train similarity measures. Experimentally the PSTH is constructed from a set of $N_{\rm rep}$ spike trains, $S_{i}(t)$ , measured from repeated presentations of the same stimulus. The ensemble average of the recorded spike trains:

$\frac{1}{N_{\rm rep}}\sum_{i=1}^{N_{\rm rep}}S_{i}(t)$ (10.53)

is typically convolved with a Gaussian function $h_{g}(x)=(2\pi\sigma^{2})^{-1/2}\exp\left(-x^{2}/2\sigma^{2}\right)$ with $\sigma$ around 5 ms, such that $A_{1}(t)=(h_{g}\ast\frac{1}{N_{\rm rep}}\sum S_{i})(t)$ is a smoothed PSTH. Suppose that two sets of experimental spike trains were recorded in two different conditions, resulting in two smoothed PSTHs $A_{1}(t)$ and $A_{2}(t)$ .
a) Show that the sum of the squared error $(A_{1}(t)-A_{2}(t))^{2}$ can be written as a distance between sets of spike train $D^{2}$ with the kernel $K(t,t^{\prime})=h_{g}(t)h_{g}(t^{\prime})$ .
b) Recall that the correlation coefficient between datasets $x$ and $y$ is

$c={\rm cov}(x,y)/\sqrt{{\rm cov}(x,x){\rm cov}(y,y)}.$ (10.54)

Show that the correlation coefficient between the two smoothed PSTHs can be written as a angular separation between the sets of spike trains with kernel $K(t,t^{\prime})=h_{g}(t)h_{g}(t^{\prime})$ .
4.

Victor and Purpura metric. Consider the minimum cost $C$ required to transform a spike train $S_{i}$ into another spike train $S_{j}$ if the only transformations available are
- Removing a spike has a cost of one.
- Adding a spike has a cost of one.
- Shifting a spike by a distance $d$ has a cost $q\,d$ where $q$ is a parameter defining temporal precision.
The $C$ defines a metric that measures the dissimilarity between spike train $S_{i}$ and spike train $S_{j}$ . The smaller $C$ the most alike the spike trains are in terms of spike timing.
a) For $q=0$ units of cost per seconds, show that $C$ becomes the difference in number of spikes in spike trains $S_{i}$ and $S_{j}$ .
b) For $q$ greater than four times the maximum firing frequency (i.t., the inverse of the shortest observed interspike interval), show that $C$ can be written as a distance $D_{ij}^{2}$ with kernel $K(t,t^{\prime})=h_{t}(t)\delta(t^{\prime})$ and triangular function $h_{t}(t)=(1-|t|q/2)\Theta(1-|t|q/2)$ where $\delta(\cdot)$ is the Dirac delta function and $\Theta(\cdot)$ is the Heaviside function.