10 Estimating Models

10.5 Summary

With modern statistical methods, we have fast and computationally tractable schemes to fit models of neural encoding and decoding to experimental data. A key insight is that, for a suitably chosen model class, the likelihood of the data being generated by the model is a concave function of the model parameters, i.e., there are no non-global local maxima. Because of this, numerical methods of gradient ascent are guaranteed to converge to the global maximum.
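As a minimal numerical sketch of this idea (the data, model, and learning rate below are made-up toy choices, not a method prescribed in the text), consider a Poisson GLM with exponential nonlinearity. Its log-likelihood is concave in the weights, so plain gradient ascent cannot be trapped in a local maximum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: spike counts y ~ Poisson(exp(X @ w_true)).
# w_true and X are illustrative, not from the text.
w_true = np.array([0.5, -0.3])
X = rng.normal(size=(200, 2))
y = rng.poisson(np.exp(X @ w_true))

def log_likelihood(w):
    # Poisson log-likelihood up to a constant; concave in w
    eta = X @ w
    return np.sum(y * eta - np.exp(eta))

def gradient(w):
    return X.T @ (y - np.exp(X @ w))

# Plain gradient ascent: with a concave likelihood every ascent
# path ends at the unique global maximum.
w = np.zeros(2)
for _ in range(2000):
    w += 1e-3 * gradient(w)
```

After the loop, `w` sits at (numerically) the maximum-likelihood estimate; starting from any other initial condition would reach the same point.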

Generalized Linear Models (GLMs) are representative of this model class. Importantly, a large ensemble of generalized integrate-and-fire models, in particular the SRM with escape noise, belongs to the family of GLMs. As we have seen in previous chapters, the SRM can account for a large body of electrophysiological data and firing patterns such as adaptation, burst firing, a time-dependent firing threshold, a hyperpolarizing spike-afterpotential, etc. The link from the SRM to GLMs implies that there are systematic and computationally fast methods to fit biologically plausible neuron models to data.

Interestingly, once neuron models are phrased in the language of statistics, the problems of coding and stimulus design can be formulated in a single unified framework. In the following chapter we will see that the problem of decoding can also be analyzed in the same statistical framework.

Literature

An early application of maximum likelihood approaches to neuronal data can be found in Brillinger (69). The application of the framework of Generalized Linear Models to the field of neuroscience has been made popular by Truccolo et al. (521); Pillow et al. (399). A review of Generalized Linear Models can be found in Dobson and Barnett (129).

The influential book of Rieke (437) gives a broad introduction to the field of neural coding. The time-rescaling theorem was exploited in (73) to develop useful goodness-of-fit methods for spike trains. Spike train metrics were introduced in Victor and Purpura (535, 534), but comparisons of spike trains in terms of PSTHs and other features had been in common use before (392; 393; 174; 321; 138; 169). Many other spike train distances were also proposed (260; 531; 410; 234; 458; 357) which can be cast in the general framework of a vector space as outlined in Schrauwen and Campenhout (457), Paiva et al. (376) and Naud et al. (357); see also Paiva et al. (377, 378); Park et al. (388). Non-linear functions of the spike trains can also be used to capture different features of the spiking process, such as the interval distribution or the presence of definite firing patterns (535; 410; 513; 279; 280; 133).

Exercises

  1.

    Concave function and non-global optima.

    (i) Suppose a function $G(x)$ has a global maximum at location $x_{0}$. Suppose that $f(y)$ is a strictly increasing function of $y$ (i.e., $df/dy>0$).

    Show that $f(G(x))$ has a maximum at $x_{0}$. Is it possible that $f(G(x))$ has further maxima as a function of $x$?

    (ii) A strictly concave function $G$ can be defined as a curve with negative curvature, $d^{2}G/dx^{2}<0$ for all $x$. Show that a concave function can have at most one maximum.

    (iii) Give an example of a concave function which does not have a maximum. Give an example of a function $G$ which has a global maximum, but is not concave. Give an example of a function $G$ which is concave and has a global maximum.

  2.

    Sum of concave functions.

    Consider a quadratic function $f_{k}(x)=1-(x-\vartheta_{k})^{2}$.

    (i) Show that $f_{k}$ is a concave function of $x$ for any choice of parameter $\vartheta_{k}$.

    (ii) Show that $f_{1}(x)+f_{2}(x)$ is a concave function.

    (iii) Show that $\sum_{k}b_{k}f_{k}(x)$ with $b_{k}>0$ is a concave function.

    (iv) Repeat the steps (ii) and (iii) for a family of functions $f_{k}$ which are concave, but not necessarily quadratic.
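For the quadratic family of this exercise, claim (iii) can be sanity-checked numerically with a finite-difference second derivative (the weights $b_k$ and parameters $\vartheta_k$ below are arbitrary made-up values):

```python
import numpy as np

# Check that a positively weighted sum of the concave quadratics
# f_k(x) = 1 - (x - theta_k)^2 has a negative second derivative
# everywhere, hence is concave.
thetas = np.array([-1.0, 0.5, 2.0])   # illustrative theta_k
b = np.array([0.2, 1.0, 3.0])         # b_k > 0

def g(x):
    return np.sum(b * (1.0 - (x - thetas) ** 2))

x_grid = np.linspace(-5, 5, 101)
h = 1e-3
second_deriv = np.array([(g(x + h) - 2 * g(x) + g(x - h)) / h**2
                         for x in x_grid])
# For this quadratic family, g''(x) = -2 * sum(b_k) at every x.
```

The finite-difference values agree with the analytic second derivative $-2\sum_k b_k$, which is negative whenever all $b_k>0$.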

  3.

    Comparing PSTHs and spike train similarity measures. Experimentally, the PSTH is constructed from a set of $N_{\rm rep}$ spike trains, $S_{i}(t)$, measured from repeated presentations of the same stimulus. The ensemble average of the recorded spike trains:

    $$\frac{1}{N_{\rm rep}}\sum_{i=1}^{N_{\rm rep}}S_{i}(t)\qquad(10.53)$$

    is typically convolved with a Gaussian function $h_{g}(x)=(2\pi\sigma^{2})^{-1/2}\exp\left(-x^{2}/2\sigma^{2}\right)$ with $\sigma$ around 5 ms, such that $A_{1}(t)=(h_{g}\ast\frac{1}{N_{\rm rep}}\sum_{i}S_{i})(t)$ is a smoothed PSTH. Suppose that two sets of experimental spike trains were recorded in two different conditions, resulting in two smoothed PSTHs $A_{1}(t)$ and $A_{2}(t)$.
    a) Show that the sum of the squared error $(A_{1}(t)-A_{2}(t))^{2}$ can be written as a distance between sets of spike trains $D^{2}$ with the kernel $K(t,t^{\prime})=h_{g}(t)h_{g}(t^{\prime})$.
    b) Recall that the correlation coefficient between datasets $x$ and $y$ is

    $$c={\rm cov}(x,y)/\sqrt{{\rm cov}(x,x)\,{\rm cov}(y,y)}.\qquad(10.54)$$

    Show that the correlation coefficient between the two smoothed PSTHs can be written as an angular separation between the sets of spike trains with kernel $K(t,t^{\prime})=h_{g}(t)h_{g}(t^{\prime})$.
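The smoothed-PSTH construction of this exercise can be sketched numerically with synthetic data (the sinusoidal rate, trial count, and bin size below are all illustrative; the 5 ms Gaussian width follows the text). Here the "two conditions" are two independent simulations of the same stimulus, so the correlation coefficient comes out high but below one:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T, n_rep = 0.001, 2.0, 20                  # 1 ms bins, 2 s, 20 trials
t = np.arange(0, T, dt)
rate = 20.0 * (1.0 + np.sin(2 * np.pi * t))    # time-dependent rate (Hz)

def psth(rate_hz, sigma=0.005):
    # Draw n_rep binary spike trains, average them, and convolve the
    # average with a normalized Gaussian of width sigma (5 ms).
    spikes = rng.random((n_rep, t.size)) < rate_hz * dt
    mean_train = spikes.mean(axis=0) / dt      # trial-averaged rate estimate
    k = np.arange(-4 * sigma, 4 * sigma + dt, dt)
    h = np.exp(-k**2 / (2 * sigma**2))
    h /= h.sum()
    return np.convolve(mean_train, h, mode="same")

A1 = psth(rate)
A2 = psth(rate)                                # independent repetition
c = np.corrcoef(A1, A2)[0, 1]
```

Because both PSTHs estimate the same underlying rate but carry independent sampling noise, `c` is well above chance yet strictly below 1; increasing `n_rep` pushes it toward 1.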

  4.

    Victor and Purpura metric. Consider the minimum cost $C$ required to transform a spike train $S_{i}$ into another spike train $S_{j}$ if the only transformations available are
    - Removing a spike has a cost of one.
    - Adding a spike has a cost of one.
    - Shifting a spike by a distance $d$ has a cost $q\,d$ where $q$ is a parameter defining temporal precision.
    The cost $C$ defines a metric that measures the dissimilarity between spike trains $S_{i}$ and $S_{j}$: the smaller $C$, the more alike the spike trains are in terms of spike timing.
    a) For $q=0$ units of cost per second, show that $C$ becomes the difference in the number of spikes in spike trains $S_{i}$ and $S_{j}$.
    b) For $q$ greater than four times the maximum firing frequency (i.e., the inverse of the shortest observed interspike interval), show that $C$ can be written as a distance $D_{ij}^{2}$ with kernel $K(t,t^{\prime})=h_{t}(t)\delta(t^{\prime})$ and triangular function $h_{t}(t)=(1-|t|q/2)\,\Theta(1-|t|q/2)$, where $\delta(\cdot)$ is the Dirac delta function and $\Theta(\cdot)$ is the Heaviside step function.
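The minimum-cost transformation defined in this exercise can be computed with a dynamic program analogous to the classic edit distance. The sketch below (the function name and the spike times in the comments are illustrative) assumes spike times are given in increasing order:

```python
import numpy as np

def victor_purpura(s_i, s_j, q):
    """Minimum cost to transform spike train s_i into s_j.

    Allowed operations, as in the exercise: delete a spike (cost 1),
    insert a spike (cost 1), shift a spike by d seconds (cost q*d).
    Spike times in s_i and s_j must be sorted in increasing order.
    """
    n, m = len(s_i), len(s_j)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1)      # delete all remaining spikes of s_i
    D[0, :] = np.arange(m + 1)      # insert all remaining spikes of s_j
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            D[a, b] = min(
                D[a - 1, b] + 1,                                # delete
                D[a, b - 1] + 1,                                # insert
                D[a - 1, b - 1] + q * abs(s_i[a - 1] - s_j[b - 1]),  # shift
            )
    return D[n, m]
```

For $q=0$ the shift cost vanishes and the recursion reduces to the difference in spike counts, consistent with part a); for large $q$ shifting is never cheaper than deleting and re-inserting, so distant spikes are treated as unrelated events.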