Computer simulation in brain science


Edited by

RODNEY M. J. COTTERILL
Division of Molecular Biophysics
The Technical University of Denmark

The right of the University of Cambridge to print and sell all manner of books was granted by Henry VIII in 1534. The University has printed and published continuously since 1584.

CAMBRIDGE UNIVERSITY PRESS Cambridge New York New Rochelle Melbourne Sydney

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521341790

© Cambridge University Press 1988

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 1988
This digitally printed version 2008

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data
Computer simulation in brain science.
Includes index.
1. Brain - Mathematical models. 2. Neural circuitry - Mathematical models. 3. Computer simulation.
I. Cotterill, Rodney, 1933-
[DNLM: 1. Brain - physiology. 2. Computer simulation. WL 300 C738]
QP376.C634 1988  612'82'0724  87-22430

ISBN 978-0-521-34179-0 hardback
ISBN 978-0-521-06118-6 paperback

CONTENTS

List of contributors  ix
Preface  xv

Neurons and neural networks: general principles

1  Some recent developments in the theory of neural networks
   Leon N. Cooper  1
2  Representation of sensory information in self-organizing feature maps, and the relation of these maps to distributed memory networks
   Teuvo Kohonen  12
3  Excitable dendritic spine clusters: nonlinear synaptic processing
   W. Rall and I. Segev  26
4  Vistas from tensor network theory: a horizon from reductionalistic neurophilosophy to the geometry of multi-unit recordings
   András J. Pellionisz  44

Synaptic plasticity, topological and temporal features, and higher cortical processing

5  Neurons with hysteresis?
   Geoffrey W. Hoffmann  74
6  On models of short- and long-term memories
   P. Peretto  88
7  Topology, structure, and distance in quasirandom neural networks
   J. W. Clark, G. C. Littlewort and J. Rafelski  104
8  A layered network model of sensory cortex
   Bryan J. Travis  119
9  Computer simulation of networks of electrotonic neurons
   E. Niebur and P. Erdős  148
10 A possible role for coherence in neural networks
   Rodney M. J. Cotterill  164
11 Simulations of the trion model and the search for the code of higher cortical processing
   Gordon L. Shaw and Dennis J. Silverman  189
12 AND-OR logic analogue of neuron networks
   Y. Okabe, M. Fukaya and M. Kitagawa  210

Spin glass models and cellular automata

13 Neural networks: learning and forgetting
   J. P. Nadal, G. Toulouse, M. Mézard, J. P. Changeux and S. Dehaene  221
14 Learning by error corrections in spin glass models of neural networks
   S. Diederich, M. Opper, R. D. Henkel and W. Kinzel  232
15 Random complex automata: analogy with spin glasses
   H. Flyvbjerg  240
16 The evolution of data processing abilities in competing automata
   Michel Kerszberg and Aviv Bergman  249
17 The inverse problem for neural nets and cellular automata
   Eduardo R. Caianiello and Maria Marinaro  260

Cyclic phenomena and chaos in neural networks

18 A new synaptic modification algorithm and rhythmic oscillation
   Kazuyoshi Tsutsumi and Haruya Matsumoto  268
19 'Normal' and 'abnormal' dynamic behaviour during synaptic transmission
   G. Barna and P. Érdi  293
20 Computer simulation studies to deduce the structure and function of the human brain
   P. A. Anninos and G. Anogianakis  303
21 Access stability of cyclic modes in quasirandom networks of threshold neurons obeying a deterministic synchronous dynamics
   J. W. Clark, K. E. Kürten and J. Rafelski  316
22 Transition to cycling in neural networks
   G. C. Littlewort, J. W. Clark and J. Rafelski  345
23 Exemplification of chaotic activity in non-linear neural networks obeying a deterministic dynamics in continuous time
   K. E. Kürten and J. W. Clark  357

The cerebellum and the hippocampus

24 Computer simulation of the cerebellar cortex compartment with a special reference to the Purkinje cell dendrite structure
   L. M. Chajlakhian, W. L. Dunin-Barkowski, N. P. Larionova and A. Ju. Vavilina  372
25 Modeling the electrical behavior of cortical neurons - simulation of hippocampal pyramidal cells
   Lyle J. Borg-Graham  384

Olfaction, vision and cognition

26 Neural computations and neural systems
   J. J. Hopfield  405
27 Development of feature-analyzing cells and their columnar organization in a layered self-adaptive network
   Ralph Linsker  416
28 Reafferent stimulation: a mechanism for late vision and cognitive processes
   E. Harth, K. P. Unnikrishnan and A. S. Pandya  432
29 Mathematical model and computer simulation of visual recognition in retina and tectum opticum of amphibians
   Uwe an der Heiden and Gerhard Roth  455
30 Pattern recognition with modifiable neuronal interactions
   J. V. Winston  469
31 Texture description in the time domain
   H. J. Reitboeck, M. Pabst and R. Eckhorn  479

Applications to experiment, communication and control

32 Computer-aided design of neurobiological experiments
   Ingolf E. Dammasch  495
33 Simulation of the prolactin level fluctuations during pseudopregnancy in rats
   P. A. Anninos, G. Anogianakis, M. Apostolakis and S. Efstratiadis  504
34 Applications of biological intelligence to command, control and communications
   Lester Ingber  513
35 Josin's computational system for use as a research tool
   Gary Josin  534

Author index  550
Subject index  557

CONTRIBUTORS

Anninos, P. A.  Dept of Neurology, Medical School, University of Thraki, Alexandroupolis, Greece.
Anogianakis, G.  Dept of Physiology, Faculty of Medicine, University of Thessalonika, Greece.
Apostolakis, M.  Dept of Neurology, Medical School, University of Thraki, Alexandroupolis, Greece.
Barna, G.  Central Research Institute for Physics of the Hungarian Academy of Sciences, H-1525, Budapest, Hungary.
Bergman, Aviv  SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025, USA.
Borg-Graham, Lyle J.  Center for Biological Information Processing and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Caianiello, Eduardo R.  Dipartimento di Fisica Teorica e sue Metodologie per le Scienze Applicate, Università di Salerno, 84100 Salerno, Italy.
Chajlakhian, L. M.  Information Transfer Problems Institute, USSR Academy of Sciences, Moscow, USSR.
Changeux, J. P.  Institut Pasteur, 25 rue du Docteur Roux, 75724 Paris Cedex 15, France.
Clark, J. W.  Institute of Theoretical Physics and Astrophysics, University of Cape Town, Rondebosch 7700, Cape, RSA. Permanent address: Dept of Physics and McDonnell Center for the Space Sciences, Washington University, St Louis, MO 63130, USA.


Cooper, Leon N.  Center for Neural Science and Dept of Physics, Brown University, Providence, Rhode Island 02912, USA.
Cotterill, Rodney M. J.  Division of Molecular Biophysics, The Technical University of Denmark, Building 307, DK-2800 Lyngby, Denmark.
Dammasch, Ingolf E.  Zentrum Anatomie, Universität Göttingen, Kreuzbergring 36, D-3400 Göttingen.
Dehaene, S.  Institut Pasteur, 25 rue du Docteur Roux, 75724 Paris Cedex 15, France.
Diederich, S.  Institut für Theoretische Physik, Universität Giessen, Heinrich-Buff-Ring 16, 6300 Giessen, Federal Republic of Germany.
Dunin-Barkowski, W. L.  Information Transfer Problems Institute, USSR Academy of Sciences, Moscow, USSR.
Eckhorn, R.  Applied Physics and Biophysics, Philipps-University, Renthof 7, D-3550 Marburg, Federal Republic of Germany.
Efstratiadis, S.  Dept of Neurology, Medical School, University of Thraki, Alexandroupolis, Greece.
Érdi, P.  Central Research Institute for Physics of the Hungarian Academy of Sciences, H-1525, Budapest, Hungary.
Erdős, P.  Institute of Theoretical Physics, University of Lausanne, CH-1015 Lausanne, Switzerland.
Flyvbjerg, H.  The Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, DK-2100 Copenhagen Ø, Denmark.
Fukaya, M.  Dept of Electrical Engineering, University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, Japan.
Harth, E.  Dept of Physics, Syracuse University, Syracuse, NY 13244-1130, USA.
an der Heiden, Uwe  Naturwissenschaftliche Fakultät, Universität Witten/Herdecke, D-5810 Witten, Federal Republic of Germany.
Henkel, R. D.  Institut für Theoretische Physik, Universität Giessen, Heinrich-Buff-Ring 16, 6300 Giessen, Federal Republic of Germany.


Hoffmann, Geoffrey W.  Depts of Physics and Microbiology, University of British Columbia, Vancouver, B.C., Canada V6T 2A6.
Hopfield, J. J.  Divisions of Chemistry and Biology, California Institute of Technology, Pasadena, CA 91125 and AT&T Bell Laboratories, Murray Hill, NJ 07974, USA.
Ingber, Lester  Dept of Physics - Code 61IL, Naval Postgraduate School, Monterey, CA 93943, USA.
Josin, Gary  Neural Systems Incorporated, 3535 West 39th Avenue, Vancouver, British Columbia V6N 3A4, Canada.
Kerszberg, Michel  Institut für Festkörperforschung der Kernforschungsanlage Jülich, D-5170 Jülich, Federal Republic of Germany.
Kinzel, W.  Institut für Theoretische Physik, Universität Giessen, Heinrich-Buff-Ring 16, 6300 Giessen, Federal Republic of Germany.
Kitagawa, M.  Dept of Electrical Engineering, University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, Japan.
Kohonen, Teuvo  Dept of Technical Physics, Helsinki University of Technology, Rakentajanaukio 2 C, SF-02150 Espoo, Finland.
Kürten, K. E.  Institut für Theoretische Physik, Universität zu Köln, 5000 Köln 41, BRD.
Larionova, N. P.  Information Transfer Problems Institute, USSR Academy of Sciences, Moscow, USSR.
Linsker, Ralph  IBM, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA.
Littlewort, G. C.  Institute of Theoretical Physics and Astrophysics, University of Cape Town, Rondebosch 7700, Cape, RSA.
Marinaro, Maria  Dipartimento di Fisica Teorica e sue Metodologie per le Scienze Applicate, Università di Salerno, 84100 Salerno, Italy.
Matsumoto, Haruya  Dept of Instrumentation Engineering, Faculty of Engineering, Kobe University, Rokkodai, Kobe 657, Japan.
Mézard, M.  École Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France.


Nadal, J. P.  École Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France.
Niebur, E.  Institute of Theoretical Physics, University of Lausanne, CH-1015 Lausanne, Switzerland.
Okabe, Y.  Dept of Electrical Engineering, University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, Japan.
Opper, M.  Institut für Theoretische Physik, Universität Giessen, Heinrich-Buff-Ring 16, 6300 Giessen, Federal Republic of Germany.
Pabst, M.  Applied Physics and Biophysics, Philipps-University, Renthof 7, D-3550 Marburg, Federal Republic of Germany.
Pandya, A. S.  Dept of Physics, Syracuse University, Syracuse, NY 13244-1130, USA.
Pellionisz, András J.  Dept of Physiology and Biophysics, New York University Medical School, 550 First Ave, New York, NY 10016, USA.
Peretto, P.  CEN, Grenoble 85x, 38041 Grenoble Cedex, France.
Rafelski, J.  Institute of Theoretical Physics and Astrophysics, University of Cape Town, Rondebosch 7700, Cape, RSA.
Rall, W.  Mathematical Research Branch, NIDDK, National Institutes of Health, Bethesda, Maryland, USA.
Reitboeck, H. J.  Applied Physics and Biophysics, Philipps-University, Renthof 7, D-3550 Marburg, Federal Republic of Germany.
Roth, Gerhard  Naturwissenschaftliche Fakultät, Universität Witten/Herdecke, D-5810 Witten, Federal Republic of Germany.
Segev, I.  Mathematical Research Branch, NIDDK, National Institutes of Health, Bethesda, Maryland, USA.
Shaw, Gordon L.  Center for the Neurobiology of Learning and Memory, University of California, Irvine, CA 92717, USA.
Silverman, Dennis J.  Dept of Physics, University of California, Irvine, CA 92717, USA.
Toulouse, G.  École Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France.


Travis, Bryan J.  Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
Tsutsumi, Kazuyoshi  Division of System Science, The Graduate School of Science and Technology, Kobe University, Rokkodai, Kobe 657, Japan.
Unnikrishnan, K. P.  Dept of Physics, Syracuse University, Syracuse, NY 13244-1130, USA.
Vavilina, A. Ju.  Information Transfer Problems Institute, USSR Academy of Sciences, Moscow, USSR.
Winston, J. V.  206 Catalina, Fullerton, CA 92635, USA.

PREFACE

There has recently been a marked increase in research activity regarding the structure and function of the brain. Much of this has been generated by the more general advances in biology, particularly at the molecular and microscopic levels, but it is probably fair to say that the stimulation has been due at least as much to recent advances in computer simulation. To accept this view does not mean that one is equating the brain to an electronic computer, of course; far from it: those involved in brain research have long since come to appreciate the considerable differences between the cerebral cortex and traditional computational hardware. But the computer is nevertheless a useful device in brain science, because it permits one to simulate processes which are difficult to monitor experimentally, and perhaps impossible to handle by theoretical analysis.

The articles in this book are written records of talks presented at a meeting held at the Gentofte Hotel, Copenhagen, during the three days August 20-22, 1986. They have been arranged in an order that places the more general aspects of the subject towards the beginning, preceding those applications to specific facets of brain science which make up the balance of the book. The final chapters are devoted to a number of ramifications, including the design of experiments, communication and control.

The meeting could not have been held without the financial support generously donated by the Augustinus Foundation, the Carlsberg Foundation, the Mads Clausen (Danfoss) Foundation, the Danish Natural Science Research Council, the Hartmann Foundation, IBM, the Otto Mønsted Foundation, NORDITA, the NOVO Foundation, and SAS. It is a pleasure to thank these philanthropic bodies, on behalf of all those involved in the enterprise.


Thanks are also due to John Clark and Uwe an der Heiden, for their moral support in general, and specifically for their help in arranging the programme for the three days of tightly packed sessions. It is also a pleasure to acknowledge our indebtedness to Carolyn Hallinger, the conference secretary, and to Ove Broo Sørensen and Flemming Kragh for their support with a host of technical matters.

Rodney Cotterill

Some recent developments in the theory of neural networks

LEON N. COOPER

A question of great interest in neural network theory is the way such a network modifies its synaptic connections. It is in the synapses that memory is believed to be stored: the progression from input to output somehow leads to cognitive behaviour. When our work began more than ten years ago, this point of view was shared by relatively few people. Certainly, Kohonen was one of those who not only shared the attitude, but probably preceded us in advocating it. There had been some early work done on distributed memories by Pribram, Grossberg, Longuet-Higgins and Anderson.

If you consider a neural network, there are at least two things you can be concerned with. You can look at the instantaneous behaviour, at the individual spikes, and you can think of the neurons as adjusting themselves over short time periods to what is around them. This has led recently to much work related to Hopfield's model; many people are now working on such relaxation models of neural networks. But we are primarily concerned with the longer term behaviour of neural networks. To a certain extent this too can be formulated as a relaxation process, although it is a relaxation process with a much longer lifetime. We realized very early, as did many others, that if we could put the proper synaptic strengths at the different junctions, then we would have a machine which, although it might not talk and walk, would begin to do some rather interesting things. Kohonen has shown us some intriguing examples of such behaviour.

We were soon confronted with a fundamental problem. Let us assume that you can form a network which will store memory, but which requires the adjustment of very large numbers of synaptic junctions. It is perfectly obvious that this cannot be done genetically, or at least not 100% genetically, if you believe that experience has anything to do with the content of your memory. So there must be some kind of rule, or set of rules, by which the synaptic strengths change. Now, the use of the word 'synapse' may be something of a metaphor. The physiologists among us know that a synapse is a very complicated thing, and what we refer to as a synapse is really a logical grouping of large numbers of synapses, i.e. it is really a relation between inputs and outputs, and it is probable that what happens biologically is considerably more complex than any of the simple rules we write down. But the idea was that if we could write down a few rules which captured some of the qualitative properties, perhaps we would be on the right track.

Now it seemed to me that one of the things most lacking in this field, the thing required to convert it from the hand-waving stage to a field which was science as I understand it, was to construct pieces of theory that were well-defined and had a really rigorous structure - obviously highly oversimplified, but such that they could be brought into correspondence with serious experiment. Now that is not easy to do, and I am still not sure whether we have done it. Nevertheless this has been one of the dominating themes of our work for the last ten years. We chose a preparation, visual cortex, which may be the wrong starting point since it is a very complicated system, as compared to simpler systems like Aplysia. However, it is a system where interesting things seem to be happening, where experiments can be done, where there is a rich tradition of at least 20 years of experimentation, and where one has very robust effects.
I cannot discuss all of these here, but those familiar with visual cortex and the history of the Hubel-Wiesel cells - the preference of the cells for certain orientations - know that there is a long history of experimentation in this area in which one can change the input-output relations of individual neurons by changing the visual experience of the animal. One reason this seemed so intriguing to us was that it appeared to be a situation in which one could observe changes in the neuron input-output behaviour almost as well as one could in hippocampus or Aplysia. I like to call this 'learning on a single-cell level', but am immediately thrown into conflict with some psychologists, who say learning is a much more complicated thing. However, when we learn something, there must be a change in the neural network; I believe that the origin of that change is what happens in individual cells in the network. And so one should be able to relate the large-scale properties to changes in individual cells. It is those changes occurring at the single-cell level that give us learning at the network level.

Experiments have been done in visual cortex that seem to indicate that the response characteristics of visual cortical cells depend on the visual environment of the animal. For example, if an animal is normally reared, the visual cortical cells will be sharply tuned and will be responsive to edges of one orientation, but not to edges of other orientations. This famous result is due to Hubel and Wiesel. It has been known for many years that if you raise animals in deprived environments, for example in the dark, the cells are not sharply tuned, but are broadly tuned. If you raise an animal with one eye open and the other eye closed, the cells will become responsive to the open eye, and sharply tuned to that eye, and they will lose their responsiveness to the closed eye, and so on. Large numbers of repeatable experiments have been done. The question becomes: can one introduce a set of rules for synaptic change that will explain this behaviour?

We have successfully introduced such a set of synaptic rules, which involve what seem to be the variables that would provide the kind of learning we want on a network level. Let me describe this first on a single-cell level. Suppose you have a single cell with inputs from the external environment, and we introduce a modification procedure. What would be the necessary modification procedure to reproduce the experimental results? We were able to reproduce what we call the 'classical' experimental results. In addition, we were able to show that there are some rather subtle new effects. Recent experiments do show these subtle effects; if they had been seen before, they had never been explicitly pointed out or recognized.
One problem with this work is that, in addition to simplifying the synaptic junction, it made one assumption which was clearly not permissible, in that we considered the synapses from the lateral geniculate nucleus onto the cortical cells to be capable of being both positive and negative. In other words, they could be both excitatory and inhibitory. Our recent work has solved this problem. For our purposes, we can think of a set of inputs to a neuron, call them d₁, . . ., dₙ. Now (d₁, . . ., dₙ) is an n-dimensional vector; think of that n-dimensional vector as being a mapping of what is in the visual space. It is produced via the transduction cells of the retina, and so forth, so that for a particular image on the visual space you have a particular vector in this space. Now these inputs, measured in spikes per second, go through a set of synapses to the cortical cells. The cells integrate the electrical activity, so in effect the output of the cell is a non-linear monotonic function of m · d. We are mostly concerned about the roughly linear region, though some of the other papers in these proceedings will take a different point of view and will try to make it as non-linear as possible. I am not going to insist on the virtue of one rather than the other. The point we are making is that for the purposes of learning, the linearity is enough (Figs. 1.1 and 1.2).

Fig. 1.1. The inputs d₁, . . ., dₙ, arriving as incoming signals on axons via synaptic junctions m₁, . . ., mₙ, produce local depolarizations m₁d₁, . . ., mₙdₙ that are integrated in the cell body to produce an outgoing signal, the firing rate c. The actual inputs and outputs are rapidly varying functions of time (spikes have durations of approximately 10⁻³ s). Relevant time intervals for learning and memory are thought to be of the order of 0.5 s. We assume that for learning the relevant inputs and outputs are time-averaged frequencies.

To summarize, d is the input from the external world, and it changes as the image on the screen changes; m is the set of synaptic strengths that, if it changes at all, changes slowly with time. And it is m that contains the learning of the system. If one has a particular set of ms, with m thought of as a vector, a particular input that is parallel to it could give a large cell output. An input that is orthogonal to m would give zero output, so this set of ms is, in a certain sense, already a memory, because it distinguishes between one set of inputs and another. Of course, for one cell one gets a rather limited memory, but if there are networks of these cells, all kinds of interesting things could occur.

Now the issue with which we are concerned is precisely how these ms change. Supposing you wish to 'teach' a single cell to recognize a vector. You could have a set of synaptic strengths that give a large response for one input and a very small one for another input. The cell will then 'know' something. The question would then be: how do you design a rule so that the cell will learn to recognize a particular input? To correspond to the results in the visual cortex, you want the cell to learn to respond to one pattern and not to respond at all to others when in an environment with all patterns present. On the other hand, for non-patterned input, a cell's response should not prefer one pattern over another (Fig. 1.3).

Hebb originally proposed that if synaptic junctions change as a product of the output and input, they produce certain interesting correlation properties. It is now generally acknowledged that they cannot change in exactly this way. Still, one question is: is the post-synaptic variable involved? Some experimentalists say no, while every theoretician says yes. Yet there is good experimental evidence, particularly that obtained by Singer and others, which shows this variable must be implicated. If it is implicated, how? There are many other possibilities, particularly those involving what we call 'global' variables, for which there is good evidence. We introduced a form of modification in which the change in mⱼ was a product of the input dⱼ and a function φ. The properties of φ are that if, for a given input, the output is below a modification threshold, the synaptic strength decreases in proportion to the input; if the output is above the threshold, the synaptic strength increases in proportion to the input. To explain the existing experimental results we need a negative and a positive region. In addition, to give the system the proper stability properties, it is required that the threshold move back and forth and, to give it the most beautiful properties of all, that it be a non-linear function of the average output of the cell. We always chose c̄², but it could actually be c̄ to any power larger than unity (Fig. 1.4).
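The modification rule just described can be sketched in a few lines. The quadratic form of φ below - zero at the origin and at the threshold, negative in between, positive above - is an illustrative choice of such a function, not necessarily the chapter's exact one:

```python
# A sketch of the modification rule described above:
#   dm_j/dt = phi(c, theta) * d_j,  with moving threshold theta = c_bar**2.
# phi is negative when the output c is below theta and positive above it;
# this particular quadratic phi is just one function with those properties.

def phi(c, theta):
    return c * (c - theta)       # zero at c = 0 and at c = theta

def bcm_step(m, d, c_bar, lr=0.01):
    """One Euler step of the modification for a linear neuron c = m . d."""
    c = sum(mj * dj for mj, dj in zip(m, d))          # cell output
    theta = c_bar ** 2                                # moving threshold
    return [mj + lr * phi(c, theta) * dj
            for mj, dj in zip(m, d)], c

# Below threshold the active synapses weaken, above it they strengthen:
m_low,  _ = bcm_step([0.5, 0.5], [1.0, 0.0], c_bar=1.0)   # c=0.5 < theta=1.0
m_high, _ = bcm_step([0.5, 0.5], [1.0, 0.0], c_bar=0.5)   # c=0.5 > theta=0.25
print(m_low[0], m_high[0])   # first synapse decreased, then increased
```

Note how the same output c leads to weakening or strengthening depending only on where the threshold sits - this is what lets the threshold's movement stabilize the cell.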

Fig. 1.2. The cell output is a non-linear function of m · d: the output rises above a firing threshold and levels off at a saturation limit. In the linear region, we can write c = Σⱼ mⱼdⱼ = m · d, where mⱼ is a somewhat idealized 'synaptic junction'.
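In the linear region of Fig. 1.2 the cell is just an inner product, which makes the 'memory' interpretation above concrete: a weight vector m responds strongly to a parallel input and not at all to an orthogonal one. The numbers below are illustrative, not taken from the chapter:

```python
# The linear region of the model: output c = m . d. A weight vector m acts
# as a rudimentary memory, giving a large response to a parallel input and
# zero response to an orthogonal one. The vectors are illustrative values.

def neuron_output(m, d):
    """Firing rate in the linear region: c = sum_j m_j * d_j."""
    return sum(mj * dj for mj, dj in zip(m, d))

m = [1.0, 0.5]                 # synaptic strengths

d_parallel   = [1.0, 0.5]      # input parallel to m
d_orthogonal = [-0.5, 1.0]     # input orthogonal to m

print(neuron_output(m, d_parallel))    # large response: 1.25
print(neuron_output(m, d_orthogonal))  # zero response: 0.0
```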


Let me describe the basic properties of the system. Suppose you have a two-dimensional system d¹ and d², that is, two inputs, so that the space of inputs can be spanned by these two vectors. One obtains a set of non-linear coupled differential equations; the non-linearity arises because all the ms are coupled to one another through the cs. If you have only two inputs d¹ and d², you look for fixed points in the space. Now the fixed points occur when φ is zero. Recall that φ is zero when the output is zero, or when the output is at threshold. If you count them up, this gives four fixed points for two dimensions. These four fixed points have different properties. We call m¹* and m²* the selective fixed points. Why selective? Because if the synapses acquire those strengths, they give you maximum output for one of the inputs and zero output for the other. And it is very easy to see geometrically: since m¹* lies perpendicular to d², when d² comes in you get zero, and when d¹ comes in you get a response. Since m²* is perpendicular to d¹, you get zero for d¹ and a response for d². This is what Kohonen called an optimal mapping; d¹ and d² do not have to be orthogonal. In this respect it is a self-organizing optimal mapping. There is another fixed point, which is non-selective because it responds to both d¹ and d², and yet another fixed point at zero (in other words the cell does not respond at all). Now the interesting and important question is: which fixed points are stable? The answer is that the only stable fixed points are the selective fixed points, so that wherever you start in the synaptic space, if you keep putting in d¹s and d²s you eventually end up at the selective fixed points. Note that in order to get selectivity the synaptic strengths take both positive and negative values, but coming into the cortical cell from the eye one encounters only excitatory synapses. If we limit synaptic strengths to positive values, we would have partial but not complete selectivity. There are experimental results which show that if you shut off the inhibitory synapses by adding a chemical such as bicuculline, some of the selectivity is lost. This problem will be addressed later.

Fig. 1.3. Neuron learning proceeds through synaptic modification, and an important question is what determines ṁⱼ, the rate of change of mⱼ. In general we might write ṁⱼ = F(dⱼ, . . ., mⱼ; dₖ, . . ., c; c̄, . . .; X, Y, Z), where the four groups of quantities in parentheses refer to the local instantaneous, quasi-local instantaneous, time-averaged, and global contributions, respectively. In the BCM modification, this equation becomes ṁⱼ = φ(c, c̄; X, Y, Z) dⱼ − εmⱼ.

Fig. 1.4. The modification threshold, θₘ, is a non-linear function of c̄, the average output of the cell. We have used θₘ = (1/c₀)(c̄)², but the precise form is not critical.

We can now model various experimental situations. We can have inputs from the left eye and the right eye, and then the output is a summation of left-eye and right-eye contributions. We can have a normal environment, in which we have patterned input to both eyes or, in contrast, we can have monocular deprivation - patterns to one eye, noise to the other - and then we can run these simulations. We get results that correspond to the classical experimental results. For example, in normal rearing, with both eyes open, the final stable state is at the selective fixed points, in which the cells are binocular and selective, driven by the same pattern. If the eyes are closed (lid sutured or dark reared), there is no development of selectivity and the cell is binocularly driven. For one eye open, the other eye closed - the very famous paradigm of monocular deprivation - selectivity develops to the open eye, while the response to the closed eye is driven to zero. One instance in which the moving threshold is clearly necessary is the situation of reverse suture: to get recovery, the threshold must move to a very low value.

In addition, there is a rather subtle connection between ocular dominance and selectivity. How is it that, when both eyes are closed, the cell is not necessarily driven to zero, while when one eye is open and the other eye is closed, the response of the cell is driven to zero for the closed eye? From our point of view, the reason that it is not driven to zero when both eyes are closed is that the zero fixed point is unstable. But if one eye is closed and the other is open, once the open eye has become selective, the response of the cell is either close to zero or close to threshold. The φ function can be expanded close to zero and close to threshold and, depending on whether the input to the open eye is preferred or non-preferred, the appropriate expansion is at one point or the other. This expansion eventually results in a differential equation for the synapses between LGN and the closed eye that looks like this: ẋ = ±n²x, where n is the noise level and the sign depends on whether the input to the open eye is preferred or non-preferred. Non-preferred inputs can only be achieved after the eye has become selective. In other words, before selectivity occurs, there is no driving of the closed eye to zero. And this gives you a correlation between ocular dominance and selectivity. If one looks at some of the original results of Hubel and Wiesel, one can already see this correlation suggested. Recent experiments show a clear correlation between ocular dominance and selectivity, and this is precisely what the theory predicts.

This theory has been extended to include the situation in which there are many cells; in other words, where there is input from LGN to excitatory and inhibitory cortical cells, and the excitatory and inhibitory cells are connected to one another via intracortical connections. We would like to restrict m, that is to say the synapses between LGN and the cortical cells, to positive values, since these synapses are excitatory. The output of the jth cell is cⱼ, and we have an intracortical synapse Lᵢⱼ.
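The claim that only the selective fixed points are stable can be illustrated with a small simulation. The parameters, initial weights, and the running-average form of the threshold below are illustrative assumptions, not values from the chapter:

```python
# A small simulation, with assumed parameters, of the two-input system above:
# patterns d1 and d2 are presented alternately, the threshold theta tracks a
# running average of c**2 (one common variant of the moving threshold), and
# the synapses drift to a selective fixed point: a strong response to one
# pattern and a near-zero response to the other. All values are illustrative.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

d1, d2 = [1.0, 0.2], [0.2, 1.0]   # two non-orthogonal input patterns
m = [0.6, 0.4]                    # slightly asymmetric start (breaks symmetry)
theta = 0.0                       # moving modification threshold
lr, tau = 0.005, 0.05             # learning rates for weights and threshold

for step in range(40000):
    d = d1 if step % 2 == 0 else d2          # alternate the two patterns
    c = dot(m, d)                            # cell output
    m = [mj + lr * c * (c - theta) * dj      # dm_j = lr * phi(c, theta) * d_j
         for mj, dj in zip(m, d)]
    theta += tau * (c * c - theta)           # running average of c**2

r1, r2 = dot(m, d1), dot(m, d2)
print(r1, r2)   # one response large, the other near zero
```

With the asymmetric start the cell ends up preferring one pattern; a perfectly symmetric start would sit on the unstable non-selective fixed point, which is why the initial weights are chosen unequal.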
We should state that the cell firing rate involves not only being pushed through the LGN by the inputs from the right eye and left eye, but also the intracortical connections; that of course is a rather complicated problem, which was analysed by myself and a former student, Chris Scofield. We separated excitatory and inhibitory synapses and got some very nice new predictions. The analysis was complex and we had to rely on extensive computer simulation. This led us to introduce what we call a 'mean field approximation' for the cortical network. Those familiar with theories of magnetism will immediately see where this comes from. The essential idea is that the typical cell gets specific input from the LGN and also from large numbers of cortical cells. The other cortical cells are also often pushed by the LGN. It is well known that the collaterals can be fairly long, so we replace the effect of many cortical cells on the cell we are watching by an average. In doing that, we simplify things enormously, and finally we get consistency conditions.

The cell i is pushed by the LGN inputs, and then there is a kind of modulatory effect, an average field of the rest of the network. And this average field of the rest of the network is of course also pushed by the LGN inputs. Very simply stated, previously we had c = md, whereas now we have c = (m − a)d, and previously we said that m goes to certain fixed points m*. The fundamental theorem that can be proved is that in the mean field theory, m goes to fixed points that are m* + a. We can show that all the old fixed points are fixed points here, and that the stability of the fixed points is the same. Thus what was stable before is stable here. Recall that the problem previously was that in the m space one had to find a fixed point which was inaccessible because of the necessity for negative components. Now, if a is sufficiently inhibitory, this effectively translates the co-ordinates, and all of the values of m* become available. So the fixed points will be available with excitatory synapses between the LGN and the cortical cells if the average inhibition of the network is sufficient.

A question that arises in learning theory is: do all synapses modify in the same way? We do not know the answer, of course, but we have been able to get away with assuming that there is modification of the LGN-cortical synapses, and that there is no modification whatsoever among the inhibitory intracortical synapses. This makes the theory very beautiful and very easy to handle. I would like to summarize some of these ideas. The first point I have already made: the fixed points of the old theory now become available, assuming only LGN-cortical excitatory synapses, if the network is sufficiently inhibitory. We find that most learning can occur in the LGN-cortical synapses. The inhibitory GABAergic cortical-cortical synapses need not modify at all.
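The translation of the fixed points can be checked with a toy computation. The converging dynamics and all numbers below are invented purely for illustration; the only point is the relation m → m* + a:

```python
import numpy as np

# Toy illustration: suppose the original dynamics drive m to a fixed point
# m_star that has a negative (hence inaccessible) component. In the mean
# field theory the cell sees c = (m - a)d, so the same dynamics act on the
# shifted variable u = m - a, and m converges to m_star + a instead.
a = np.array([0.8, 0.8])           # assumed average inhibitory background
m_star = np.array([-0.3, 1.2])     # old fixed point, one negative component

def step(u, lr=0.1):
    # invented stable dynamics whose fixed point is m_star
    return u + lr * (m_star - u)

m = np.zeros(2)
for _ in range(200):
    m = a + step(m - a)            # the update acts on u = m - a
print(m)                           # -> m_star + a = [0.5, 2.0], all positive
```

With a sufficiently inhibitory background a, the translated fixed point has only positive components, so it is reachable with purely excitatory LGN-cortical synapses.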
An experiment was done by Bear and Ebner, at Brown, testing how these GABAergic synapses change under severe conditions of monocular deprivation. They were disappointed to find no such changes. But this was a most welcome result; the fact that we can get away without much inhibitory modification opens the wonderful possibility that the major modification is just of the excitatory LGN-cortical synapses. This makes it a much easier problem to treat mathematically. Some non-modifiable LGN-cortical synapses are required. Obvious candidates for these are the synapses onto the inhibitory cells, which go onto shafts rather than spines. It has been seen, in a preliminary experiment by Singer, that the synapses onto the shafts seem to be more resistant, as he put it, than those onto spines.


One of the interesting new results of this theory is that, in binocular deprivation, zero is still an unstable fixed point, but the zero is now constructed in the following way: the zero output of the cell comes both from the LGN-cortical synapses and from the network synapses, which we have taken to be inhibitory. Suppose we suppressed the inhibitory network synapses; then we might expect that cells we could not previously see would suddenly emerge. Such a result has in fact been obtained. Freeman has also shown that an increase in excitability causes cells to appear where they were not otherwise seen. One of the most interesting results occurs in monocular deprivation. Recall from the previous analysis that in monocular deprivation the closed-eye response goes to zero. And remember that a was previously zero, so that the closed-eye response went to zero only if the closed-eye LGN-cortical synapses went to zero. In the mean field theory, the closed-eye response also goes to zero; however, that now means that the LGN-cortical synapses do not go to zero, but rather go to a. Therefore, in this theory, we get the monocular deprivation results, but we get them without the LGN-cortical synapses going to zero. So if inhibition is suppressed, we should get some response from the closed eye. This is in agreement with a result of Sillito and others, in which they used bicuculline, which shuts off the inhibitory response, and found an increase in cells responsive to the closed eye. This could be further investigated by post-stimulus histograms, which reveal separate excitatory and inhibitory effects. This makes it possible, even in cells giving an average output of zero, to see excitatory and inhibitory effects, so one can separate the effects of excitatory and inhibitory cells. In a molecular model we need, among other things, a candidate for the modification threshold, such that the synapses increase above θ_m and decrease below θ_m.
Further, θ_m must vary with the average activity of the cell. Such ideas are being developed actively now by Bear, Ebner and myself at Brown. We are pursuing the idea of using the distinction between the NMDA receptors and the non-NMDA receptors. On the post-synaptic membrane, the NMDA receptors are the ones that allow calcium to enter. We are trying to link this with the threshold, θ_m, and are trying to determine whether that threshold varies with the previous experience of the cell. In summary, what I would like to say is that we propose that we have a theoretical account of the way a little piece of cortex can evolve with experience. There are some dramatic simplifications that would be wonderful, if true. The theory does seem to account for a large variety of the data that already exist, and one of the exciting possibilities for the future is that we can find molecular models that underlie these assumptions. If it all comes together in this happy way, you can be sure I will be talking about this subject for a long, long time.

Representation of sensory information in self-organizing feature maps, and the relation of these maps to distributed memory networks TEUVO KOHONEN

2.1 Is there enough motivation for a solid-state physics approach to the brain?

One of the salient features of the brain networks is that anatomical sections a few millimetres wide, taken from different parts of the cortex, look roughly similar in their texture. This observation might motivate a theoretical approach in which principles of solid-state physics are applied to the analysis of the collective states of neural networks. Such a step, however, should be taken with extreme care. I shall first point out a few characteristics of the neural tissue which are similar to those of non-living solids, and a few which distinguish it from them.

Similarities. There is a dense feedback connectivity between neighbouring neural units, which corresponds to the interaction forces between atoms or molecules in solids. Generation and propagation of brain waves seem to support a continuous-medium view of the tissue.

Differences. In addition to local, random feedback there exist plenty of directed connectivities, 'projections', between different neural areas. As a whole, the brain is a complex self-controlling system in which global feedback control actions often override the local 'collective' effects. Although the geometric structure of the neural tissue looks uniform, there exist plenty of specific biochemical and physiological differences between cells and connections. For instance, there exist some 40 different types of neural connection, distinguished by the particular chemical transmitter substances involved in signal transmission, and these chemicals vary from one part of the brain to another.

It is also difficult to see how the collective-state models could be applicable to the description of high-level intellectual functions or cognitive states, in which complex global experiences of the organism are reflected. For this reason I am rather suspicious about the applicability of the simplest models as such to problems in which concepts with a high level of coding and specificity, e.g., linguistic items, are involved. My personal view is that one has to look at the neural circuits as signal-processing stages, which decode and interpret concomitant signals gradually, in many steps. Most clearly such processing is visible in the early sensory stages; similar principles might be applied on higher levels, too, but their interpretation with our present formalisms is very difficult. Many experimentalists have criticized the theoretical models because they do not contain all the neural components that are known to exist in a particular area, or because high-level cognitive states are not demonstrated. This, however, is a fundamental misinterpretation of the modelling approach. For the understanding of the basic phenomena, modern science always aims at idealized experiments in which, for the study of one or a few factors at a time, the influence of the other factors ought to be eliminated. The theoretical model may also be simplified in its geometric structure or signal transformation function, if it is only regarded as a pure case example of an infinite number of its possible realizations. In the network models discussed in this paper, a compromise between biological accuracy and theoretical clarity has been made. The main purpose has been to demonstrate a phenomenon that can be made to occur under certain conditions. These conditions seem to be fulfilled in many biological systems.

2.2 Physically motivated system equations for neurons

The first choice that one has to make in modelling concerns the geometric size of the section of a network to be taken into consideration. I do not consider any collective-network model referring to an area larger than, say, 5 mm in diameter as realistic. Then, however, one may be justified in neglecting the differences in signal delays within the area under consideration. This in turn means that evoked potentials and brain waves cannot be demonstrated by such simplified models; the latter are regarded as macroscopic phenomena which arise from system properties. The observed wave-like dynamic phenomena very probably result from complex processing operations on signals during their passage between the thalamic nuclei and cortical areas, whereby they may not represent the most elementary processing functions at all.

The second requirement to be imposed on a model is a clear definition of its input and output signals; the state of the network cannot simply be 'assumed'. The only reasonable definition of signal intensity is the impulse frequency on an axon, which complies with the triggering frequency, or the 'activity', of the neural cell from which this axon emerges. Input to the network is connected through a set of afferent axons; if the size of the network is restricted to a few millimetres, it is not a poor approximation to assume that the input axons spread their signals to most, if not all, principal cells of this network. In addition to the afferent axons, and the efferent axons which transmit signals to other system parts, the network further contains internal feedback connections (axon collaterals and feedbacks through interneurons). Each output is then a function of the inputs and the other outputs; this seems to be a fundamental property of neural networks, and the transfer function of such a feedback system is not quite trivial. Because of the global control actions of the brain, it is not easy to measure, or even to define, the transformation of signals in a neural cell, i.e., its transfer function. One of the biggest difficulties seems to be that the transfer parameters are not unique; the chemical state of the network may modulate them. Long-term changes, corresponding to learning effects, are superposed on the short-term modulation. Traditionally, the neuron has been regarded as a thresholding device which somehow forms a weighted sum of its input signals and triggers an output signal. The input weights are thereby identified with the synaptic connectivities made by the afferent axons on the neuron.
The output signal has been assumed zero if the sum remains below a certain threshold value, and one (or another constant) if the sum exceeds the threshold. I think it is time to abandon this view. Although the 'all-or-none' principle may hold for the triggering of individual pulses, it is the impulse frequency which defines the neural signal value. The neural impulses usually occur in sporadic bursts or volleys, and the averaged number of pulses generated by a neuron over any interval of time is a continuous positive function of the input signals. The output could only be regarded as binary if the impulse frequency were flipped between its maximum value and zero, which is simply not true. If the output frequency, or triggering rate, η_i of neuron i is related to the input pulse trains with frequencies ξ_j, respectively, then the transfer function of the neuron should be expressed as some continuous function

η_i = η_i(ξ_1, ξ_2, . . ., ξ_n; μ_i1, μ_i2, . . ., μ_in),   (1)

where the μ_ij represent the coupling strengths of the input signals to this neuron. For the present discussion it is helpful to assume some analytical form for η_i, e.g.,

η_i = σ(Σ_j μ_ij ξ_j − θ_i),   (2)

where σ is a monotonically increasing scalar function ('sigmoid function') with saturation limits at zero and at some positive value, and θ_i is a 'threshold' value. The scalar-product form in the argument of σ is not mandatory, as long as the neuron can somehow act as a feature-sensitive unit. Assume that x = (ξ_1, ξ_2, . . ., ξ_n) is the input vector, and m_i = (μ_i1, μ_i2, . . ., μ_in) is a parametric weight vector of neuron i. Any functional form, even a nonlinear one, for which η_i is a maximum when x matches best with m_i will be acceptable for the 'transfer function' of the neuron.

Adaptation or learning in this network corresponds to changes in the μ_ij that are proportional to the occurring signals. According to the classical hypothesis of Hebb (1949), the connection strength between two cells increases if and only if the presynaptic activity (input signal) and the postsynaptic activity (output of the neuron) are simultaneously high. In the traditional modelling notation, if, e.g., Eqn (2) is taken to represent the transfer function, then

dμ_ij/dt = α η_i ξ_j,   (3)

where α is a 'plasticity parameter'. Obviously, this 'Hebb's law' needs some modification, first of all because it describes an irreversible effect: since both η_i and ξ_j are positive semidefinite, all the μ_ij will increase until the adaptable resources are exhausted. One modification of this law is to assume that the connectivity resources at a neuron are finite, and that the incoming signals compete for them. Then, in effect (for details, cf., e.g., Kohonen, 1984), the factor ξ_j in Eqn (3) would be replaced by a term of the form ξ_j − ξ_j^b, making changes in μ_ij positive or negative, i.e., reversible. In further analysis we may ignore ξ_j^b if we rescale ξ_j; then we shall remember that the redefined ξ_j may attain positive or negative values. Another modification that is both natural and useful is the inclusion of an active, nonlinear forgetting term in Eqn (3). Active forgetting means that this term is nonzero only if η_i is nonzero.
For instance, we may write

dμ_ij/dt = α η_i ξ_j − β(η_i) μ_ij,   (4)

where β(η_i) is an otherwise arbitrary function, except that the constant term in its Taylor expansion shall be equal to zero. For many choices of the analytical form of β, Eqn (4) has been found to normalize the μ_ij such that the lengths of the vectors m_i will asymptotically tend to values which depend only on α and the series coefficients of β, not on the signals. The latter result means that the comparison of x and m_i for the degree of their match can be based on the comparison of either the inner products m_i^T x or the lengths of the vectorial differences ||x − m_i||.

2.3 Basic network structure for neural models

Perhaps the simplest nontrivial network organization in which rather complicated phenomena can already occur is depicted in Fig. 2.1.

Fig. 2.1. The basic structure of neural circuits used in brain models: input matrix M, feedback matrix N, and output responses y.

It may be interesting to compare this structure with a similar schematic representation of the cortical architectonics drawn by some neuroanatomists (cf., e.g., Braitenberg, 1974). In reality, the cells should be imagined to form a two-dimensional layer; in Fig. 2.1 this layer is drawn one-dimensional for clarity. The input lines (afferent axons) have been assumed to make connections with all the cells in this piece of network; such an assumption is not necessary in principle, because similar effects are demonstrable even with randomly made input connections (e.g., Kohonen, 1982a). The regular input structure, on the other hand, represents a theoretical case

which allows a simpler mathematical treatment. Furthermore, in artificial devices (e.g., Kohonen, Mäkisara & Saramäki, 1984) such a complete connectivity matrix M has been found to yield the best signal processing accuracy, whereby it may be taken as the 'ideal case'. The structure of the feedback network, denoted by the matrix N in Fig. 2.1, may also be simplified in different ways for different purposes. We may assume, as is done in the detailed example to be discussed below, that the matrix elements of N depend only on the distance between a pair of cells.

The dynamical behaviour of the system of Fig. 2.1 may be described by Eqns (5), (6) and (7). Let x be the vector of input signals, and y the vector of output signals, respectively; alternatively, y may be regarded as the set of activity states of the cells. Then

dy/dt = f(x, y, M, N)   (5)

is a differential equation that describes the relaxation of network activity due to the feedback. If x were held constant, and if M and N were temporarily constant, too, then y should be stipulated to converge to some asymptotic state. Relaxation in neural circuits is a fast phenomenon, which will settle down in a few hundreds of milliseconds. The physical variables associated with relaxation consist of electrical potentials, ionic concentrations, and relatively simple organic chemicals such as neurotransmitters. The parametric variables M and N which describe the neural connectivities may also change due to the occurring signals (cell activities), giving rise to adaptive phenomena. The differential equations relating to the input and feedback connectivities, respectively, may be written

dM/dt = g(x, y, M)   (6)

and

dN/dt = h(y, N).   (7)

The physical changes associated with M and N correspond to changes in proteins and anatomical structures such as branching of the cells; accordingly, the time constants are much longer, on the average of the order of days or weeks. It is possible that there also exist simpler biochemical factors relating to M and N, which react to signals even faster, say, in seconds or minutes; such factors might then constitute the link between short-term and long-term memories. In this discussion, however, we shall ignore all the other adaptive phenomena except the long-term changes, and imagine that they primarily take place in the input matrix M.
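As a concrete toy instance of the adaptation law (6), one may let the weight vector of a single linear neuron follow the Hebb-type rule of Eqn (4) with β(η) = αη², one admissible choice of β; a short simulation then exhibits the normalizing effect discussed in Section 2.2. All parameter values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# One linear neuron, eta = m . x, following Eqn (4) with beta(eta) = alpha*eta**2
# (one admissible choice of beta; the inputs are zero-mean, i.e. already
#  rescaled as in Section 2.2).
alpha = 0.01
m = rng.normal(size=5)                # initial weight vector, ||m|| far from 1

for _ in range(5000):
    x = rng.normal(size=5)            # random input sample
    eta = m @ x
    m += alpha * (eta * x - eta**2 * m)   # Hebb term minus active forgetting

print(np.linalg.norm(m))   # tends towards 1, independently of the signals
```

The length of m converges to a value set by the rule itself, not by the signal statistics, which is exactly what allows matching by inner products or by vector differences interchangeably.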

2.4 Self-organizing feature maps

The brain is a collection of very different functions; these are as diverse as biological behaviour itself. Some of the neural circuits are needed for the centralized control of the energy supply, i.e., cardiovascular functions, respiration, and metabolism; many emergency functions, in the form of stimulus-response relations, are probably also stored. However, for the control of more complex behaviour it is necessary to imagine or forecast sensory phenomena or other related occurrences, and this is not possible without memory. The mental images must correspond to some organized states of the brain, although it is self-evident that memory does not store images in photographic form. One has to realize that there are two different aspects in a discussion of memory: (1) the internal representation ('coding') of sensory information in the brain networks; (2) the memory mechanism which stores and retrieves such representations. Physiological studies have indicated that the brain encodes sensory information, at least in the primary sensory areas, in various geometrically organized 'maps'. One can find isolated areas on the cortex over which, as it were, a two-dimensional coordinate system is defined. These coordinates often correspond to well-defined dimensions of sensory experiences, such as topographic coordinates of the body, pitch of audible tones, hue and saturation of colour, etc. Various feature-specific cells exist everywhere in the brain, but a clear order along specified features has only been found in the primary sensory areas. This does not exclude the possibility that ordered maps exist on higher levels, too; however, the metric of such representations which defines the order may be more complex. In this presentation I shall demonstrate that the basic network model of Fig. 2.1, with proper feedback connectivities and system equations, automatically forms such geometric representations of feature dimensions of the input signals.
This function seems to be independent of the sensory modality, so that even in artificial systems many different kinds of maps can be formed (e.g., Kohonen, 1982c). It seems that each part of the map, in a certain optimal (nonlinear) way, seeks the dominant parameters of the input signals and displays them on the map. We have mostly dealt with two-dimensional maps on account of their easy visualization, but the method is readily generalizable to arbitrary topologies.


The structure of the self-organizing network is otherwise identical with that of Fig. 2.1, except that the cell layer is two-dimensional, and the feedback connectivity matrix N is assumed constant. In accordance with Eqn (1), the output of each cell is now described by a functional law of the type

η_i = σ(Σ_j μ_ij ξ_j + Σ_{k ∈ S_i} v_ik η_k − θ_i),   (8)

where S_i is the set of cells which have feedback connections to cell i. The weights v_ik are assumed constant, with values which are often nicknamed the 'Mexican hat function': if cell k lies within a certain distance of cell i, then v_ik is a function of this distance only, in the way depicted in Fig. 2.2.

Fig. 2.2. The 'Mexican hat' function (interaction strength versus lateral distance).

Eqn (8), combined with Eqn (4), would then completely define the self-organizing process in which the set of values μ_ij will be optimally tuned to the feature coordinates. The mathematical discussion of this form, however, would be very cumbersome, and numerical computation based on these equations would also become heavy. For this reason we have significantly simplified this process, retaining only the most important functional dependencies in computation. At the same time the process became more effective! One of the most central phenomena associated with self-organization is a clustering effect which strongly enhances the activity patterns over the network. We shall first demonstrate this effect in the 'relaxation approximation', whereby the parameters μ_ij in Eqn (8) are assumed constant. Their values may be assumed random, with a slight bias such that for any


input x one may perceive a shallow maximum among the η_i. (Such a bias would be formed already during the first steps of the original adaptive process.) If, for simplicity again, we demonstrate this effect in a one-dimensional network, we can see how the particular feedback described by the 'Mexican hat function' will enhance the initial distribution of activity. With a temporarily stationary input x, an activity 'bubble' will be formed at a location where m_i^T x, the inner product of the input vector x and the weight vector of unit i, was roughly maximum (Fig. 2.3). The form of the 'bubble' is very stable, and only its location depends on x and the m_i. This fact is now used for the simplification of the process equations: we postulate the formation of a 'bubble', with fixed radius, at a location where m_i^T x is maximum. However, taking into account the discussion in connection with Eqn (4), namely, that the adaptive process tends to normalize the vectors m_i to constant length, we can further simplify the algorithm, e.g., in the following form. Let us start with arbitrary, random initial values μ_ij. Let samples of the input vector x be drawn from a defined statistical density function p(x). For each sample we apply the following two computational rules:

(1) A stationary activity 'bubble' with some fixed radius is assumed to be formed in the network at a place where ||x − m_i|| is minimum. If ||x − m_c|| = min_i {||x − m_i||}, then N_c is defined as the set of cells corresponding to the active 'bubble'.

Fig. 2.3. Formation of 'bubbles' of activity over a one-dimensional network.


(2) If the μ_ij are changing according to Eqn (4), and if the activities η_i saturate to zero or to a high limit, then Eqn (4) can be written, with proper scaling, in two separate parts (notice that η_i ∈ {0, 1} and β(η_i) ∈ {0, 1}):

dm_i/dt = α(x − m_i) for i ∈ N_c (inside the bubble),
dm_i/dt = 0 for i ∉ N_c (outside the bubble).   (9)

The above algorithm has been used in all practical simulations and self-organizing experiments. Even in this simplified form its mathematical treatment has been shown to be difficult. This is a Markov process, but due to the geometric constraints, its convergence conditions are very complicated. A few proofs for low-dimensional cases have recently been presented (e.g., Kohonen, 1982b; Kohonen & Oja, 1982; Cottrell & Fort, 1984, 1986; Ritter & Schulten, 1986).

2.5 Simulations
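The simplified algorithm of Eqn (9) can first be sketched in code. This is a minimal one-dimensional illustration with scalar inputs; the array size, bubble radius, and gain are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# One-dimensional array of 20 units with scalar weights; inputs x are
# drawn uniformly from [0, 1].
m = rng.uniform(0, 1, size=20)      # arbitrary random initial values
radius, alpha = 2, 0.05             # fixed bubble radius and plasticity gain

for _ in range(20000):
    x = rng.uniform(0, 1)
    c = int(np.argmin(np.abs(x - m)))          # winner: ||x - m_c|| minimal
    lo, hi = max(0, c - radius), min(m.size, c + radius + 1)
    m[lo:hi] += alpha * (x - m[lo:hi])         # Eqn (9) inside the bubble N_c

print(m)   # after training the weights form a monotone (ordered) sequence
```

The ordered state is absorbing: once the weights are monotone, each bubble update preserves the order, which is the essence of the convergence proofs cited above.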

We have demonstrated by a number of different experiments (cf. Kohonen, 1982c; Kohonen, 1984; Kohonen et al., 1984) that the above process is able to form two-dimensional 'maps' of the input signals x such that the most important feature dimensions that are present in the statistical distribution of x will be displayed as coordinates in the map. The following result is first used for illustration. The inner ear, as is well known, performs a frequency analysis of the acoustic waveforms and sends the result, an approximation of the spectrum, to the brain via a bundle of axons. By analogy, we collected samples of natural continuous speech, and each input vector x consisted of a 15-channel acoustic spectrum of the speech waveform, integrated over a 25.6-millisecond interval of time. The 15-dimensional vectors x were connected to a two-dimensional network model of the same type as Fig. 2.1 in parallel. The responses from the 'map' to different acoustic spectra were labelled according to the location of the centroid of the 'bubble' and the corresponding phoneme present in speech. The 'neurons' of the map were shown to learn the responses to the different phonemes in an orderly fashion (Fig. 2.4). (The 'neurons' in this demonstration were organized in a hexagonal lattice corresponding to the letters.) We have used such phoneme maps for the recognition of continuous speech, and practical microprocessor equipment has already been constructed (Torkkola & Riittinen, 1986). The purpose of the next example is to demonstrate that a map of the environment of a subject can be formed in a self-organizing process


whereby the observations can be mediated by very crude, nonlinear, and mutually dependent mechanisms such as arms and detectors of their bending angles. In this demonstration, two artificial arms, with two joints each, were used for feeling a planar surface. The geometry of the setup is illustrated in Fig. 2.5. The tips of both arms touched the same point, which during the training process was selected at random, with a uniform distribution over the framed area. At the same time, two signals, proportional to the bending angles, were obtained from each arm; these signals (ξ_1, ξ_2, ξ_3, ξ_4) were led to a self-organizing array of the earlier type, and adaptation of its parameters took place. The lattice of lines which has been drawn onto the framed area in this picture connects those points which represent virtual images of the weight vectors m_i, i.e., showing to which point on the plane each 'neuron' became most sensitive; each crossing thus corresponds to one cell of the neural network model. The lines are used to indicate which units are neighbours in the network. When a particular crossing point on the lattice is touched, the corresponding 'neuron' gives the maximum response. The resulting map was tested for both arms separately, i.e., by letting each of them touch the plane and looking at which point it had to be in order to cause the maximum response at a particular 'neuron'. These two maps coincided almost perfectly.
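A toy reconstruction of this feeler experiment can be made with the algorithm of Eqn (9) on a small two-dimensional array. The four 'bending angle' sensor functions below are invented stand-ins (the exact arm geometry is not reproduced here), and all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented stand-ins for the four bending-angle signals (xi_1..xi_4) produced
# when both arms touch the point p = (px, py); arm bases assumed at (-1, 0)
# and (2, 0).
def sensors(p):
    px, py = p
    return np.array([np.arctan2(py, px + 1.0), np.hypot(px + 1.0, py),
                     np.arctan2(py, px - 2.0), np.hypot(px - 2.0, py)])

grid = 8                                             # 8 x 8 array of units
m = sensors((0.5, 0.5)) + rng.normal(0, 0.1, size=(grid, grid, 4))
coords = np.dstack(np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij"))

def qerror(points):
    # mean distance from a sample's sensor vector to its best-matching unit
    return np.mean([np.linalg.norm(m - sensors(p), axis=2).min() for p in points])

probes = [rng.uniform(0, 1, 2) for _ in range(200)]
before = qerror(probes)

for _ in range(4000):
    xi = sensors(rng.uniform(0, 1, 2))               # touch a random point
    err = np.linalg.norm(m - xi, axis=2)
    c = np.array(np.unravel_index(np.argmin(err), err.shape))
    bubble = np.linalg.norm(coords - c, axis=2) <= 1.5   # fixed-radius bubble
    m[bubble] += 0.1 * (xi - m[bubble])              # Eqn (9)

after = qerror(probes)
print(before, after)   # the units spread over the touched area: error drops
```

Note that the map is formed entirely in the nonlinear sensor space; the units nevertheless end up covering the touched surface, which is the point of the demonstration.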

Fig. 2.4. Two-dimensional map of Finnish phonemes. The double labels mean mapping of different phonemes onto the same location.


2.6 Discussion The first question, naturally, concerns the occurrence of 'maps' in the brain and their real form. Actually the name 'map' is used in brain physiology in two senses. Firstly, several kinds of ordered maps have been found in the primary sensory areas, and they very closely resemble the artificial maps reported in this paper. Secondly, there exist everywhere in the brain feature-selective cells that are scattered more or less randomly over a particular area, without clear topographic organization. It might happen, however, that even in the latter case the cells have been arranged according to some ultrametric order, although very little attention has been paid to this in physiological recordings. One might ask for what purpose such maps are necessary. There is a clear answer to this: if the feature-selective functions are separated spatially and ordered according to some metrics, their responses become more logical and selective than if these functions were distributed among common neural units. Spatial separation is also advantageous for the control of motor functions which are organized in similar maps. The most important and at the same time very difficult problem in the processing of sensory information seems to be how the neural system can optimally operate on the vague stochastic signals provided by the sensory organs. Among many alternatives for analyzing systems, we have found that the self-organizing maps described in this paper yield the best accuracy in simple recognition tasks. This is perhaps due to their ability to automatically allocate an optimal number of computing resources to the different feature dimensions, and due to the accurate nonlinear determination of the classification functions. The primary sensory areas of the Fig. 2.5. Self-ordered mapping of the input plane of a feeler mechanism.



brain seem to be organized similarly, along various feature dimensions of the sensory signals. There is some doubt about the same principle being applicable to higher-level information-processing in the brain. Although it is possible to demonstrate the formation of hierarchical data structures in the artificial maps, there is still no conclusive evidence about similar high-level maps occurring in the brain. It seems as if the higher cognitive states were more dynamic and not localizable; on the other hand, localized responses to certain perceived conceptual items have been found. Perhaps it would be safer to restrict these maps, at least in the beginning, to early sensory information processing and to relate them to experimental findings. On the other hand, it might perhaps be too much to expect that small-sized models like these maps would already exhibit cognitive abilities. They should be regarded as the basic components in biological information processing, in a similar sense as logic circuits facilitate high-level abstract information processing in digital computers.

References

Braitenberg, V. (1974). 'On the representation of objects and their relations in the brain.' In Physics and Mathematics of the Nervous System (eds. Conrad, M., Güttinger, W. & Dal Cin, M.). Springer-Verlag, Lecture Notes in Biomathematics, Berlin, Heidelberg, New York, pp. 290-8.
Cottrell, M. & Fort, J. C. (1984). Étude d'un processus d'auto-organisation. Université de Paris-Sud, Report 84 T 57.
Cottrell, M. & Fort, J. C. (1986). 'A stochastic model of retinotopy: a self-organizing process.' Biological Cybernetics, 53, 405-11.
Hebb, D. (1949). Organization of Behaviour. Wiley, New York.
Kohonen, T. (1982a). 'Self-organized formation of topologically correct feature maps.' Biological Cybernetics, 43, 59-69.
Kohonen, T. (1982b). 'Analysis of a simple self-organizing process.' Biological Cybernetics, 44, 135-40.
Kohonen, T. (1982c). 'Clustering, taxonomy, and topological maps of patterns.' In Proc. 6th Int. Conference on Pattern Recognition, Munich, Germany, 19-22 October, 1982, pp. 114-28.
Kohonen, T. (1984). Self-Organization and Associative Memory. Springer Series in Information Sciences, Vol. 8. Springer-Verlag, Berlin, Heidelberg, New York, Tokyo.
Kohonen, T., Mäkisara, K. & Saramäki, T. (1984). 'Phonotopic maps - insightful representation of phonological features for speech recognition.' In Proc. 7th Int. Conference on Pattern Recognition, Montreal, Canada, 30 July-2 August, 1984, pp. 182-5.
Kohonen, T. & Oja, E. (1982). A Note on a Simple Self-Organizing Process. Helsinki University of Technology Report TKK-F-A474.


Ritter, H. & Schulten, K. (1986). 'On the stationary state of Kohonen's self-organizing sensory mapping.' Biological Cybernetics, 54, 99-106.
Torkkola, K. & Riittinen, H. (1986). A Microprocessor-Based Word Recognition System for Large Vocabularies. Helsinki University of Technology Report TKK-F-A591.
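The self-organizing map algorithm discussed in the chapter above (Kohonen, 1982a) can be sketched minimally in modern code. The one-dimensional map, the learning-rate and neighborhood schedules, and the random seed below are all illustrative choices, not values taken from the chapter:

```python
import math, random

def train_som(n_units=10, epochs=300, lr0=0.5, radius0=3.0, seed=1):
    """Minimal 1-D Kohonen map trained on scalar inputs drawn from [0, 1]."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n_units)]   # one weight per map unit
    for t in range(epochs):
        lr = lr0 * (1.0 - t / epochs)            # shrinking learning rate
        radius = max(radius0 * (1.0 - t / epochs), 0.5)  # shrinking neighborhood
        for _ in range(20):
            x = rng.random()                     # stochastic input sample
            # best-matching unit (the 'winner' in the map)
            c = min(range(n_units), key=lambda i: abs(w[i] - x))
            for i in range(n_units):
                # Gaussian neighborhood function around the winner
                h = math.exp(-((i - c) ** 2) / (2 * radius ** 2))
                w[i] += lr * h * (x - w[i])      # standard Kohonen update
    return w

w = train_som()
# weights stay within the input range; with a shrinking neighborhood the
# map typically unfolds into a topologically ordered chain of weights
assert 0.0 <= min(w) and max(w) <= 1.0
```

The shrinking neighborhood radius is what produces the topographic ordering that the chapter compares with cortical feature maps.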

Excitable dendritic spine clusters: nonlinear synaptic processing

WILFRID RALL and IDAN SEGEV

3.1 Passive cable properties of dendrites

The modeling of dendritic trees was carefully presented and discussed in earlier publications; only a few points will be summarized here. In Rall, 1962 it was shown how the partial differential equation for a passive nerve cable can represent an entire dendritic tree, and how this can be generalized from cylindrical to tapered branches and trees; this paper also showed how to incorporate synaptic conductance input into the mathematical model, and presented several computed examples. In Rall, 1964 it was shown how the same results can be obtained with compartmental modeling of dendritic trees; this paper also pointed out that such compartmental models are not restricted to the assumption of uniform membrane properties, or to the family of dendritic trees which transforms to an equivalent cylinder or an equivalent taper and, consequently, that such models can be used to represent any arbitrary amount of nonuniformity in branching pattern, in membrane properties, and in synaptic input that one chooses to specify. Recently, this compartmental approach has been applied to detailed dendritic anatomy represented as thousands of compartments (Bunow et al., 1985; Segev et al., 1985; Redman & Clements, personal communication).

Significant theoretical predictions and insights were obtained by means of computations with a simple ten-compartment model (Rall, 1964). One computation predicted different shapes for the voltage transients expected at the neuron soma when identical brief synaptic inputs are delivered to different dendritic locations; these predictions (and their elaboration, Rall, 1967) have been experimentally confirmed in many laboratories (see Jack et al., 1975; Redman, 1976; Rall, 1977). Another computation demonstrated significantly different results at the soma for


two different spatio-temporal patterns of synaptic input to the dendrites (i.e. the same amount of input produced a different output; see Fig. 7 of Rall, 1964). The distal-to-proximal sequence produced a larger voltage amplitude at the soma; given a suitable threshold, this input pattern could be discriminated. The proximal-to-distal sequence produced a longer lasting voltage of lower amplitude; this could be useful to bias the neuron to be fired by a small additional input. It may be noted that this theoretical prediction provided the basis for an interpretation of 'asymmetric' firing patterns in cochlear neurons by Erulkar et al. (1968). Other computations provided insight into the conditions for either linear or nonlinear combinations of the effects of synaptic inputs delivered to different dendritic locations, for both excitatory and inhibitory synaptic inputs (Figs. 8 and 9 of Rall, 1964). Further discussion and references can be found in Rall, 1964, 1967, 1977; Jack et al., 1975; Redman, 1976; Rall & Segev, 1985, 1987.

Here Fig. 3.1 illustrates an idealized dendritic neuron consisting of six equal dendritic trees. Current is injected at a single branch terminal designated (I); this input branch is distinguished from its sibling branch (S), and its first and second cousin branches (C-1) and (C-2). The resulting steady voltage distribution in the various branches of the input tree (shown in this diagram) was computed from the general solution of this problem (Rall & Rinzel, 1973). One noteworthy feature of these results is the contrasting decrement of voltage in the input branch and its sibling branch. Both branches have the same length and diameter in this idealized tree. However, the input branch is open to a large current flow into its parent branch; this permits a large flow of current along its cytoplasmic core, resulting in a steep voltage decrement along the length of the input branch.
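The contrast between a steep decrement along a current-carrying cable and near-isopotential behavior toward a sealed end can be illustrated with the textbook steady-state cable solution, V(X) = V0 cosh(L - X)/cosh(L) in dimensionless electrotonic units. This is a generic cable-theory sketch; the electrotonic length L = 1.5 and the number of sample points are illustrative, not values from the chapter:

```python
import math

def profile(L=1.5, n=11, v0=1.0):
    """Steady-state voltage along a sealed-end passive cable,
    current injected at X = 0: V(X) = v0 * cosh(L - X) / cosh(L)."""
    xs = [L * k / (n - 1) for k in range(n)]
    return [v0 * math.cosh(L - x) / math.cosh(L) for x in xs]

v = profile()
assert v[0] == 1.0                       # voltage at the injection site
assert v[-1] == 1.0 / math.cosh(1.5)     # attenuated value at the sealed end
# the decrement per step shrinks toward the sealed end, so the terminal
# region is nearly isopotential (as for the sibling branch in Fig. 3.1)
drops = [a - b for a, b in zip(v, v[1:])]
assert all(d1 > d2 for d1, d2 in zip(drops, drops[1:]))
```

The flattening of the profile near the sealed end is the same effect that makes a sibling branch (and, below, a spine head) almost isopotential.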
In contrast, the sealed terminal of the sibling branch allows zero current to flow out of that end; also, little current flows across the high resistance of the cylindrical branch membrane; with so little current, this branch is almost isopotential. This contrast applies also to dendritic spines, with interesting functional consequences (see below). Another feature of these results is the contrast in input resistance values when the distal input location is compared with a central input location (at the soma). In this diagram, the dashed curve shows the lower voltage values obtained when the same amount of current is injected at the soma as that previously injected at the distal branch. In this example, the distal input resistance is 16 times larger than the somatic input resistance, and still larger factors can result from additional orders of


branching (Rall & Rinzel, 1973). This contrast in input resistance and its effect on voltage amplitude (i.e. local synaptic depolarization) is very important to the attainment of threshold conditions in excitable dendritic spines located on distal dendritic branches.

3.2 Dendritic spines and spine stem resistance

The existence of dendritic spines has been known for a hundred years, since the classical studies of Ramon y Cajal; however, the demonstration of synaptic contacts on spines was accomplished much later by means of electron microscopic observations (Gray, 1959). The variety in spine size and shape was demonstrated by Jones & Powell (1969) and by Peters & Kaiserman-Abramof (1970), as shown here in Fig. 3.2. Also included in

Fig. 3.1. Diagram of idealized neuron model composed of six dendritic trees, and plot of steady state voltage values for three different cases: current input at the soma (curve with short dashes), current input to a single distal branch terminal (input branch designated I; continuous curve), and the case of input divided equally among eight related branch terminals (I, S and cousins, C-1 and C-2; curve with long dashes). Modified from Rall & Rinzel (1973), which can be consulted for the mathematical statement and solution of this problem. The voltage scale is expressed relative to the product of the injected current and a reference input resistance (i.e. that of the dendritic trunk cylinder extended to semi-infinite length).


Fig. 3.2. Neuroanatomical montage prepared as slide in 1971 to introduce dendritic spines. Upper left shows dendritic branches covered with dendritic spines, from an 1897 study of cortical neurons by Berkley (Johns Hopkins Hospital Reports, Vol. 6), following Ramon y Cajal. Upper right shows diagrammatic pyramidal cell together with enlarged drawings of spines and synapses, based upon electron microscopic observations, modified from Jones and Powell (1969). Lower left shows variety of dendritic spine shapes and sizes, modified from Peters and Kaiserman-Abramof (1970), while lower right gives our estimates (made in 1971) of the ranges of spine stem resistance values corresponding to the spines at left.

[Recoverable content of the montage: sources include Ramon y Cajal (1888), Berkley (1897), Chang (1952), and electron microscopy by Gray (1959) and Colonnier (1968); also Blackstad; Jacobson, Pappas & Purpura; Laatsch & Cowan; Scheibels; Westrum; Diamond et al.; Llinas et al.; and others. Shapes and lengths of dendritic spines (Peters & Kaiserman-Abramof, 1970), with our estimates of spine stem resistance, Rss (using Ri ~ 65 ohm cm):
1. Stubby - average length 1.0 µm, range 0.5-1.5 µm; Rss ~ 10^5 to 10^6 ohm.
2. Mushroom-shaped - average length 1.5 µm, range 0.5-2.5 µm; average stem length 0.8 µm; average bulb dimensions 1.4 x 0.5 µm; Rss ~ 10^6 to 10^7 ohm.
3. Thin = 'classical' - average length 1.7 µm, range 0.5-4.0 µm; average stem length 1.1 µm; average bulb dimension 0.6 µm; stem diameter 0.05-0.3 µm; Rss ~ 10^7 to 10^9 ohm.]


this diagram are our estimates of the electrical resistance to current flow inside the spine stem, between the spine head and the spine base. These estimates depend on spine stem geometry and on the value of the intracellular resistivity, neither of which are known accurately; also, membranous inclusions within the spine stem cytoplasm may significantly increase the spine stem resistance (Wilson et al., 1983; Miller et al., 1985; Rall & Segev, 1987). It was recognized by Chang (1952) that the high electrical resistance of the thin spine stem could be expected to attenuate the effect on the postsynaptic neuron of a synaptic input to a spine head. Because he expected significant attenuation, Chang concluded that summation of many synaptic inputs would be needed for effective excitation of such neurons.

3.3 Changing synaptic efficacy by changing spine stem resistance

The idea that changes in spine stem resistance values would change the relative weighting (synaptic efficacy) of many different synapses was introduced by Rall and Rinzel (1971a,b). At that time, they computed both steady state and transient responses for synaptic conductance input to a dendritic spine; this spine had only passive membrane, but a range of values was assumed for the amount of synaptic input and the values of spine stem resistance and other parameters (Rall, 1970, 1974, 1978; Rinzel, 1982; see also Diamond et al., 1970; Jack et al., 1975).

For steady state conditions, a simple Ohm's law argument can be used to explain the effect of spine stem resistance. This is illustrated by Fig. 3.3; the spine stem current is given by the intracellular voltage difference from spine head to spine base, divided by the spine stem resistance (this is true for both steady states and transients, because more sophisticated analysis shows that negligible current crosses the membrane of the spine stem). This spine stem current times the input resistance (of the neuron at the branch input point where the spine is attached) gives the steady state voltage at this branch input point (relative to the resting intracellular reference potential). Using physical intuition (or the algebra summarized in Fig. 3.3) one can see, for example, that the voltage at the spine base will equal exactly half that at the spine head for the special case where the spine stem resistance equals the branch input resistance; in other words, half of the total voltage drop occurs along the spine stem, while the other half occurs along the branches of the whole neuron (as illustrated in Fig. 3.1). Both the graph and the equation in Fig. 3.3 show how the voltage ratio


Fig. 3.3. Diagram summarizing steady state implications of Ohm's law for the currents, voltages and resistances of one dendritic spine and the branch to which it is attached, also prepared as slide in 1971. The symbols, I_SS and R_SS, represent spine stem current and resistance; the voltages, V_SH and V_BI, are for the spine head and the spine base, which is also the branch input point, having branch input resistance designated R_BI. The plot at bottom displays the steady state ratio dependence defined by the equation above; it suggests an 'operating range' for adjustments of synaptic efficacy by changes of R_SS relative to R_BI.

[Relations shown in the figure: spine stem current, I_SS = (V_SH - V_BI)/R_SS, for steady state conditions; V_BI = I_SS x R_BI, where R_BI represents the branch input resistance; hence the steady state ratio V_BI/V_SH = R_BI/(R_BI + R_SS).]
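The steady-state Ohm's-law relations of Fig. 3.3 can be checked numerically. This short sketch uses illustrative resistance values, not ones from the chapter, and reproduces the half-voltage special case where R_SS = R_BI:

```python
# Steady-state spine relations (Fig. 3.3), with V_BI the spine base
# (branch input point) voltage and V_SH the spine head voltage:
#   I_SS = (V_SH - V_BI) / R_SS      spine stem current (Ohm's law)
#   V_BI = I_SS * R_BI               voltage at the branch input point
# Combining these gives  V_BI / V_SH = R_BI / (R_BI + R_SS).
def base_to_head_ratio(r_ss, r_bi):
    return r_bi / (r_bi + r_ss)

# special case noted in the text: R_SS = R_BI gives exactly half the head voltage
assert base_to_head_ratio(500.0, 500.0) == 0.5
# increasing the stem resistance attenuates the voltage delivered to the branch,
# which is the basis of the 'operating range' for synaptic efficacy
assert base_to_head_ratio(1000.0, 500.0) < base_to_head_ratio(100.0, 500.0)
```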


(spine base to spine head) depends on the ratio of spine stem resistance to branch input resistance; this illustrates the idea of an 'operating range' for changes in synaptic efficacy determined by changes in spine stem resistance (relative to branch input resistance). A qualitatively similar 'operating range' was found for transient responses to brief synaptic input, provided that the synaptic conductance input was sufficient for a large depolarization of the spine head. It may be noted that this did depend on nonlinearity in the spine head (approach to voltage saturation for depolarization due to conductance input); for very small inputs (that would be of no physiological interest) these nonlinear effects are negligible, as was later recognized also by others (Koch & Poggio, 1983; Kawato & Tsukahara, 1984; Turner, 1984; Wilson, 1984).

In view of this 'operating range' for changing spine stem resistance, it was suggested that evolution could have sacrificed maximal synaptic power in exchange for adjustability of relative synaptic weights. It was also pointed out that changes in the relative synaptic weights of large numbers of such synapses might contribute to neural plasticity underlying learning and memory (Rall & Rinzel, 1971a,b; Rall, 1974, 1978; Rinzel, 1982). This suggestion that the spine stem might be an important morphological locus for changes in synaptic weights did have an impact on anatomical studies; a review of that literature is provided by Coss & Perkel (1985). Also noteworthy are the serial reconstructions of electron micrographs of dendritic branches, spines and synapses recently reported by Harris et al. (1985 and personal communication).

3.4 Excitable spines

3.4.1 Spines with excitable spine head membrane

The possibility that the membrane of dendritic spine heads might be excitable was assumed by Diamond et al. (1970). Although this possibility came up in various informal discussions at that time, it is noteworthy that Julian Jack analysed this possibility carefully and published an early and astute discussion (Jack et al., 1975); he pointed out that for an optimal range of parameter values, an action potential at the spine head could result in amplification of the synaptic effect. Not until 1983, to the best of our knowledge, did anyone carry out transient computations for the response of an excitable spine head to brief synaptic input; then two independent research groups reported preliminary results at a symposium of the Society for Neuroscience. To acknowledge this coincidence, we arranged to submit paired short papers, first to Nature and then to Science (only to be told that these results and insights


were of insufficient interest to a wide audience); paired papers were finally published in Brain Research (Miller et al., 1985; Perkel & Perkel, 1985). Since then, various functional implications of excitable dendritic spines have been explored in collaborative discussions (with John Miller, John Rinzel, and Gordon Shepherd). Miller has focused more on the conditions under which excitable dendritic spine interactions could generate bursts of spikes (Malinow & Miller, 1984). Shepherd has focused more upon the possibility of saltatory propagation in distal dendrites, from one excitable spine head to another (Shepherd et al., 1984). Our computations have focused on chain reaction effects in clusters of excitable spines (Rall & Segev, 1987).

3.4.2 Nonlinear dependence on spine stem resistance: excitable spines

Here Fig. 3.4 illustrates computed results for a single dendritic spine located on a dendrite having an input resistance of 262 megohms. A brief synaptic conductance input is delivered to the spine head, and the resulting voltage transients are shown for the spine head (upper left) and for the spine base (lower left, with different amplitude scale); the solid lines are for excitable spine head membrane, while the dashed lines are corresponding controls for passive membrane. The left side of this diagram shows results for only two values of spine stem resistance: 630 megohms for (a) and 1000 megohms for (b). Comparing the two spine head action potentials at upper left, the shorter latency and greater amplitude of (b) indicate a secure spike, while the delay and the smaller amplitude of (a) indicate an insecure spike which barely succeeded under near threshold conditions. The reason case (b) is secure is that the larger spine stem resistance results in a steeper and larger depolarization of the spine head in response to the same synaptic input; this is shown best by the dashed curves which correspond to passive spine head membrane.

The right side of Fig. 3.4 summarizes results computed for 45 different values of spine stem resistance; a very strong nonlinearity is apparent. For spine stem resistance values less than 400 megohms (for this set of parameter values) the response of the excitable spine head membrane differs negligibly from the passive controls; when this resistance is increased from 400 to 600 megohms, some nonlinear deviation from the passive controls can be seen at the spine head; this is even more apparent at the spine base (see lower right). The greatest nonlinearity is over the range from 600 to 700 megohms; this clearly corresponds to conditions just below and just above threshold for generation of an action potential in the spine head membrane.

3.4.3 Optimal range for maximal amplification

Because the voltage delivered to the spine base (see lower right of Fig. 3.4) becomes smaller as spine stem resistance values increase above 700 megohms, it follows that maximal amplification of synaptic efficacy occurs for an optimal range of values near threshold; in this case, a

Fig. 3.4. Dependence on spine stem resistance values, computed for depolarizing voltages at spine head and spine base in response to brief synaptic conductance input to the spine head; shown for excitable spine head membrane (continuous curves) and for passive spine head membrane (dotted curves). See text for description and discussion. Computations assumed a spine head area of 1.5 square microns, of which a fraction was assigned to the synaptic contact area and the remainder was either passive membrane (with parameters given below) or excitable membrane with Hodgkin & Huxley (1952) kinetics adjusted to ten times the channel density for squid axon at 22 degrees, using the computer program described in Parnas & Segev (1979); synaptic excitatory conductance had a peak value of 0.37 nanosiemens, with a reversal potential of 100 mV, and a time course proportional to t exp(-t/p), where the peak time, p = 0.035 ms (with a 1.4-ms passive membrane time constant, this corresponds to a value of 40 for the usual alpha parameter); the dendrite was simplified to a 0.63-micron diameter cylinder extending one length constant in both directions (with sealed ends) and with parameters (Rm (membrane resistance) = 1400 ohm cm2, Cm (membrane capacitance) = 1.0 microfarad per cm2, Ri (internal resistance) = 70 ohm cm) implying an input resistance of 262 megohms. [Panels: spine head voltage (upper) and spine base voltage (lower), cases (a) and (b), plotted against time (ms) at left and against spine stem resistance at right.]


maximal amplification factor of about 6 occurs over an optimal range of about 630 to 670 megohms. An intuitive explanation of the reduced amplification computed for the larger spine stem resistance values can be achieved by noting that the areas under the two action potentials, (a) and (b) at upper left, are approximately the same, and that it is this time integral of spine head voltage (less spine base voltage) that drives the spine stem current and delivers charge to the neuron (at the spine base); thus, with approximately equal voltage drive, Ohm's law implies that the current and charge delivered to the neuron must decrease as the spine stem resistance is increased. Clearly, changes in other parameters which shift threshold conditions will change the optimal range for this parameter.

3.4.4 Distal branch arbors and spine clusters

The linear density of dendritic spines on distal dendritic branches has been reported as about two spines per micrometer of length (Wilson et al., 1983); higher values can result when corrections are made for spines hidden from view (Feldman & Peters, 1979); the serial reconstructions of Harris et al. (1985 and personal communication) have yielded significantly higher densities for some neuron types. In any case, for distal branches of 25 micrometer length, it is quite conservative to allow 50 spines per branch.

To simplify our computations, we idealized the distal dendritic arbors (Fig. 3.5) and assumed that every branch has exactly 50 spines, and that exactly five of these possess excitable spine head membrane. The inset in Fig. 3.5 shows more detail for distal branches, A and B, together with their parent branch, C. The symbolic notation is meant to indicate a particular case in which synaptic input is delivered to two of the excitable spines belonging to branch A, and to three of the passive spines belonging to branch B, but to none of the spines belonging to parent branch C. This particular case is one of the five different cases presented next in Fig. 3.6.

3.4.5 Processing of different synaptic inputs

The use of symbols in Fig. 3.6 differs only slightly from Fig. 3.5. The left-hand column shows only those spines whose synapses are active in that particular trial; it is important to emphasize that the computation includes 45 passive spines and five excitable spines present on every branch for every trial. The middle column shows only those excitable spines that fired in response to that trial. The right-hand column shows only the resultant peak depolarization that reaches the neuron soma;


some of the computed voltage values in the branches will be mentioned in the text. Case 1 shows a synaptic input delivered to only one distal excitable spine; only that one spine fires, generating a local voltage peak of 8.5 mV (at the middle of branch A); this delivers 24 microvolts to the soma, being

Fig. 3.5. Diagram for focus on one distal dendritic arbor of idealized neuron with six dendritic trees (of which five are represented by their equivalent cylinders; see Rall & Rinzel, 1973). Black spine heads are excitable, while those shown as open circles are passive; see text for further description and discussion of symbols and spine clusters. This idealized neuron was used for the computations summarized in Fig. 3.6. The branching is symmetric and satisfies the 3/2 power diameter constraint for transformation to an equivalent cylinder (Rall, 1962, 1964, 1977); all branch lengths are set equal to 0.2 in dimensionless electrotonic length; here Rm was 2500 ohm cm2, implying a 2.5-ms passive membrane time constant for the usual membrane capacity value; this Rm together with Ri of 70 ohm cm implies a 180-micron length constant for distal branches of 0.36 micron diameter; then the input resistance at the soma is 7.8 megohms, and with five orders of branching, Table I of Rall & Rinzel, 1973 shows that the distal branch input resistance must be about 50 times larger (here L = 1.2, M = 5 and N = 6). Computations for this model made use of the computer program, SPICE; see Vladimirescu et al. (1980), Bunow et al. (1985) and Segev et al. (1985). Excitable spine head membrane used Hodgkin & Huxley (1952) kinetics adjusted to five times the channel density for squid axon at 22 degrees.


double the amount that would result from the same input to a distal passive spine. In this case, the excitable spine produces a small amplification of the response to the synaptic input, but there is no chain reaction between excitable spines. The same result would be expected for a single input to any one excitable spine on any of the terminal branches of this model. Case 2 shows simultaneous synaptic inputs to two excitable spines on branch A; here the local depolarization of branch A is sufficient to reach the firing threshold of the other three excitable spines on branch A; this delivers an 84-microvolt peak depolarization at the soma. This result more than triples the result of Case 1; it is not five times as great for two reasons: three spines fire with a slight delay relative to the first two; also, the

Fig. 3.6. Excitable spine cluster firing contingencies; schematic summary for the five cases described and discussed in the text. Note that branches A, B and C represent the same distal dendritic arbor shown enlarged in Fig. 3.5. The computations were based on the information summarized in the legend of that diagram. [Columns: input conditions; activated excitable spine heads; peak EPSP at soma (microvolts). Peak EPSP values for Cases 1-5: 24, 84, 140, 230 and 50 microvolts, respectively.]


driving potential for synaptic current is reduced by the depolarization of branch A, which peaks at 25.5 mV. Several fascinating insights can be derived from this case. It is clear that the local depolarization is sufficient for the chain reaction to take place in branch A, but the 11.7-mV peak in branch B and the 8.6-mV peak in branch C are not quite sufficient to trigger chain reactions in the excitable spines of those branches. Note that the depolarization in branch B is larger than that in branch C; this was expected from an understanding of the asymmetric voltage attenuations in Fig. 3.1; this effect favors distally spreading chain reactions, and limits central spread.

Cases 3, 4, and 5 show examples of the different results obtainable by adding a small amount of synaptic input to that of Case 2. Case 3 shows simultaneous synaptic input to all five excitable spines belonging to branch A; here simultaneous firing of the five excitable spines of branch A produces sufficient local depolarization in branch B to fire (with slight delay) all five of its excitable spines; however, the local depolarization in parent branch C is not sufficient to fire its excitable spines (another example of the asymmetric decrement shown in Fig. 3.1). The result at the soma is less than twice that of Case 2 and less than ten times that of Case 1, but it is more than twice the control value for passive spines.

Case 4 shows a different combination of five simultaneous synaptic inputs, with a significantly different result that can be attributed to more synchronous firing in branches A and B. The two inputs to branch A are the same as in Case 2, but here three additional simultaneous inputs are delivered to any three of the 45 passive spines of branch B; this additional input produces enough additional local depolarization in branch B to fire its five excitable spines almost simultaneously with those of branch A; the resulting depolarization in parent branch C is more than in Case 3, and here proved sufficient to fire its five excitable spines. Thus 15 spines fire, producing a 230-microvolt peak at the soma; this is nearly ten times that for Case 1, and nearly three times that for Case 2; the amplification factor is close to four, relative to passive control.

Case 5 shows one of many possible examples of how effective and specific synaptic inhibitory input can be. Synaptic inhibition delivered to only one or two excitable spines can make the difference between success or failure in the firing of a large cluster of spines. The timing of inhibitory input relative to excitatory input can also be investigated along the lines previously explored (Rall, 1964; Jack et al., 1975; Segev & Parnas, 1983). Reviewing Fig. 3.6 suggests obvious additional cases, some of which we


have already computed. If synchronous synaptic input is delivered to all five excitable spines of branch C, we find that all fifteen excitable spines on these three branches do fire; the voltage peak in A and B occurs 0.78 ms later than that in C. In this case, the amount of depolarization that spreads into the sibling arbor is significant, but not sufficient to fire those 15 excitable spines without some additional synaptic input to that arbor; with such an assist, we would fire a cluster of 30 excitable spines in this pair of sibling arbors, but again, this could be blocked by very few strategically placed inhibitory synaptic inputs.

3.4.6 Brief discussion of excitable spine clusters

From the examples above it is clear that excitable spine clusters of different size can be fired, and that success or failure to fire depends upon a number of contingencies. This could provide a basis for logical processing of different synaptic input combinations, and for the realization of various computations and integrative functions. All of this depends very nonlinearly upon the various input parameters and system parameters, such as spine stem resistance. The spine stem resistance is still an attractive locus for changing synaptic weights (the effect is now more sensitive because of the steep nonlinearity displayed in Fig. 3.4); however, we do not suggest that this parameter represents the only one of importance. Also, it may be noted that the logical possibilities provided by the contingencies for the firing of various excitable spine clusters in distal dendritic arbors represent an updated version of an idea (about information processing in dendrites) that has been noted by several investigators over the years (Lorente de No & Condouris, 1959; Arshavskii et al., 1965; Rall, 1970; Goldstein & Rall, 1974; also R. FitzHugh, personal communication, and Y. Y. Zeevi, Ph.D. thesis and personal communication).
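The contingent cluster firing described for Fig. 3.6 can be caricatured by a toy threshold cascade. All numbers below (depolarization per firing, threshold, inhibition) are invented for illustration; the chapter's actual results come from full compartmental computations, not from this rule:

```python
# Toy chain-reaction caricature: each excitable spine on a branch fires
# once the local branch depolarization reaches a threshold; every firing
# adds a fixed local depolarization, possibly recruiting the rest of the
# cluster; inhibition subtracts from the local depolarization.
def cascade(n_excitable=5, n_inputs=2, per_fire_mv=5.0, threshold_mv=8.0,
            inhibition_mv=0.0):
    fired = n_inputs                            # directly driven spines fire first
    depol = fired * per_fire_mv - inhibition_mv
    while fired < n_excitable and depol >= threshold_mv:
        fired += 1                              # one more spine recruited
        depol += per_fire_mv
    return fired

assert cascade(n_inputs=1) == 1   # a single input fails to start a chain reaction
assert cascade(n_inputs=2) == 5   # two inputs recruit the whole cluster
assert cascade(n_inputs=2, inhibition_mv=4.0) == 2   # small inhibition blocks it
```

Even this caricature reproduces the qualitative logic of the five cases: all-or-nothing cluster recruitment, and the disproportionate power of a little well-placed inhibition.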
3.4.7 Significance of distal branch properties

Although this theoretical model can be made to work for different but inter-related ranges of parameter values, the importance of several insights noted with Fig. 3.1 can now be usefully reviewed. Without the large input resistance value found at distal dendritic locations, the local membrane depolarization produced by a few spine firings would be insufficient to fire other spines. The asymmetry of voltage decrement has several interesting consequences: (i) the local voltage of the distal branch spreads with negligible decrement into the spine heads of the nearby spines that did not receive synaptic input; if this were not true (if there


were tenfold voltage decrement into these spine heads, as suggested by Diamond et al., 1970, these other spines would not reach threshold) no chain reaction could occur; (ii) without asymmetry in the arbors, the chain reaction would travel centrally, enveloping the entire neuron in an 'all or nothing' response, which would destroy the richness made possible by distal cluster firings; (iii) with the asymmetry, fractionation into distal clusters of different size occurs naturally. In addition to these points, large values of spine stem resistance (plus branch input resistance) are needed in order to reach threshold depolarization of the excitable spine head when it receives a reasonable amount of synaptic input. In other words, a rather special set of circumstances makes distal dendritic locations particularly well suited for excitable dendritic spines.

3.5 Conclusion

For all of the above reasons, we suggest that evolution has placed voltage-dependent ionic channels in the membrane of some distally located spine heads in sufficient number to make them excitable. Whether this suggestion is correct is not yet known; it is testable, in principle, by three different techniques, at least one of which can be expected to succeed in the next few years. These are (i) the use of antibodies to mark the locations of particular channels, (ii) the use of a patch clamp or a suction electrode to record from individual spines, and (iii) the use of voltage-sensitive dyes to record voltage transients and perhaps also voltage decrements in distal dendritic branches and dendritic spines.

Acknowledgement

Dr Segev was a Fogarty Fellow at NIH; his present address is Department of Neuroscience, Institute of Life Sciences, the Hebrew University, Jerusalem, Israel. Some of the same results, figures, and insights have been presented at other symposia and may also appear in resulting publications.

References

Arshavskii, Y. I., M. B. Berkinblit, S. A. Kovalev, V. V. Smolyaninov & L. M.
Chailakhyan (1965). 'The role of dendrites in the functioning of nerve cells.' Dokl. Akademii Nauk SSSR, 163, 994-7. Translation in Doklady Biophysics, Consultants Bureau, N.Y.
Bunow, B., I. Segev & J. W. Fleshman (1985). 'Modeling the electrical behavior of anatomically complex neurons using a network analysis program: excitable membrane.' Biol. Cybern., 53, 41-56.


Chang, H. T. (1952). 'Cortical neurons with particular reference to the apical dendrites.' Cold Spring Harb. Symp. Quant. Biol., 17, 189-202.
Colonnier, M. (1968). 'Synaptic patterns on different cell types in the different laminae of the cat visual cortex. An electron microscope study.' Brain Res., 9, 268-87.
Coss, R. G. & D. H. Perkel (1985). 'The function of dendritic spines: a review of theoretical issues.' Behavioural and Neural Biol., 44, 151-85.
Diamond, J., E. G. Gray & G. M. Yasargil (1970). 'The function of the dendritic spines: an hypothesis.' In Excitatory Synaptic Mechanisms, P. Andersen & J. K. S. Jansen, eds., pp. 213-22, Universitetsforlaget, Oslo.
Erulkar, S. D., R. A. Butler & G. L. Gerstein (1968). 'Excitation and inhibition in cochlear nucleus. II. Frequency-modulated tones.' J. Neurophysiol., 31, 537-48.
Feldman, M. L. & A. Peters (1979). 'A technique for estimating total spine numbers on Golgi-impregnated dendrites.' J. Comp. Neurol., 118, 527-42.
Goldstein, S. S. & W. Rall (1974). 'Changes of action potential shape and velocity for changing core conductor geometry.' Biophys. J., 14, 731-57.
Gray, E. G. (1959). 'Axo-somatic and axo-dendritic synapses of the cerebral cortex: an electron microscopic study.' J. Anat., 93, 420-33.
Harris, K. M., Trogadis, J. & Stevens, J. K. (1985). 'Three-dimensional structure of dendritic spines in the rat hippocampus (CA1) and cerebellum.' Abstracts, Soc. for Neuroscience 15th Annual Meeting, 306.
Hodgkin, A. L. & A. F. Huxley (1952). 'A quantitative description of membrane current and its application to conduction and excitation in nerve.' J. Physiol. (Lond.), 117, 500-44.
Jack, J. J. B., D. Noble & R. W. Tsien (1975). Electric Current Flow in Excitable Cells, Oxford Univ. Press, Lond.
Jones, E. G. & T. P. S. Powell (1969). 'Morphological variations in the dendritic spines of the neocortex.' J. Cell. Sci., 5, 509-19.
Kawato, M. & N. Tsukahara (1984). 'Electrical properties of dendritic spines with bulbous end terminals.' Biophys. J., 46, 155-66.
Koch, C. & T. Poggio (1983). 'A theoretical analysis of electrical properties of spines.' Proc. R. Soc. Lond. (Biol.), 218, 455-77.
Lorente de No, R. & G. A. Condouris (1959). 'Decremental conduction in peripheral nerve: integration of stimuli in the neuron.' Proc. Nat. Acad. Sci., 45, 592-617.
Malinow, R. & J. P. Miller (1984). 'Interactions between active dendritic spines could generate bursts of spikes.' Soc. Neurosci. Abstr., 10, 547.
Miller, J. P., W. Rall & J. Rinzel (1985). 'Synaptic amplification by active membrane in dendritic spines.' Brain Res., 325, 325-30.
Parnas, I. & I. Segev (1979). 'A mathematical model for conduction of action potentials along bifurcating axons.' J. Physiol. (Lond.), 295, 323-43.
Perkel, D. H. & D. J. Perkel (1985). 'Dendritic spines: role of active membrane in modulating synaptic efficacy.' Brain Res., 325, 331-5.
Peters, A. & I. R. Kaiserman-Abramof (1970). 'The small pyramidal neuron of the rat cerebral cortex. The perikaryon, dendrites and spines.' Am. J. Anat., 127, 321-56.
Rall, W. (1962). 'Theory of physiological properties of dendrites.' Ann. N.Y. Acad. Sci., 96, 1071-92.
Rall, W. (1964). 'Theoretical significance of dendritic trees for neuronal input-output


relations.' In Neural Theory and Modeling, R. Reiss, ed., pp. 13-91, Stanford Univ. Press, Stanford, CA.
Rall, W. (1967). 'Distinguishing theoretical synaptic potentials computed for different soma-dendritic distributions of input.' J. Neurophysiol., 30, 1138-68.
Rall, W. (1970). 'Cable properties of dendrites and effect of synaptic location.' In Excitatory Synaptic Mechanisms, P. Andersen & J. K. S. Jansen, eds., pp. 175-87, Universitetsforlaget, Oslo.
Rall, W. (1974). 'Dendritic spines, synaptic potency and neuronal plasticity.' In Cellular Mechanisms Subserving Changes in Neuronal Activity, C. D. Woody, K. A. Brown, T. J. Crow & J. D. Knispel, eds., Brain Info. Service Res. Report, 3, 13-21.
Rall, W. (1977). 'Core conductor theory and cable properties of neurons.' In Handbook of Physiology, Vol. 1, Pt. 1, The Nervous System, Cellular Biology of Neurons, J. M. Brookhart, V. B. Mountcastle & E. R. Kandel, eds., pp. 39-97, American Physiological Society, Bethesda, MD.
Rall, W. (1978). 'Dendritic spines and synaptic potency.' In Studies in Neurophysiology, R. Porter, ed., Cambridge University Press, N.Y.
Rall, W. & J. Rinzel (1971a). 'Dendritic spines and synaptic potency explored theoretically.' Proc. I.U.P.S. (XXV Intl. Congr.), IX, 466.
Rall, W. & J. Rinzel (1971b). 'Dendritic spine function and synaptic attenuation calculations.' Prog. and Abstr. Soc. Neurosci. First Ann. Mtg., 64.
Rall, W. & J. Rinzel (1973). 'Branch input resistance and steady attenuation for input to one branch of a dendritic neuron model.' Biophys. J., 13, 648-88.
Rall, W. & I. Segev (1985). 'Space clamp problems when voltage clamping branched neurons with intracellular microelectrodes.' In Voltage and Patch Clamping with Microelectrodes, T. G. Smith, Jr., H. Lecar, S. J. Redman & P. Gage, eds., pp. 191-215, American Physiological Society, Bethesda, MD.
Rall, W. & Segev, I. (1987). 'Functional possibilities for synapses on dendrites and on dendritic spines.' In Synaptic Function, G. M. Edelman, W. E. Gall & W. M. Cowan, eds., pp. 605-36, John Wiley, N.Y.
Rall, W., G. M. Shepherd, T. S. Reese & M. W. Brightman (1966). 'Dendro-dendritic synaptic pathway in the olfactory bulb.' Exp. Neurol., 14, 44-56.
Redman, S. J. (1976). 'A quantitative approach to the integrative function of dendrites.' In International Review of Physiology: Neurophysiology II, Vol. 10, R. Porter, ed., pp. 1-36, University Park Press, Baltimore.
Rinzel, J. (1982). 'Neuronal plasticity (learning).' In Some Mathematical Questions in Biology - Neurobiology, Vol. 15, Lectures on Mathematics in the Life Sciences, R. M. Miura, ed., pp. 7-25, American Mathematical Society, Providence, RI.
Segev, I. & I. Parnas (1983). 'Synaptic integration mechanisms. Theoretical and experimental investigation of temporal postsynaptic interaction between excitatory and inhibitory inputs.' Biophys. J., 41, 41-50.
Segev, I., J. W. Fleshman, J. P. Miller & B. Bunow (1985). 'Modeling the electrical behavior of anatomically complex neurons using a network analysis program: passive membrane.' Biol. Cybern., 53, 27-40.
Shepherd, G. M. & R. K. Brayton (1979). 'Computer simulation of a dendrodendritic synaptic circuit for self- and lateral-inhibition in the olfactory bulb.' Brain Res., 175, 377-82.


Shepherd, G. M., R. K. Brayton, A. Belanger, J. P. Miller, R. Malinow, I. Segev, J. Rinzel & W. Rall (1984). 'Interactions between active dendritic spines could augment impact of distal dendritic synapses.' Soc. Neurosci. Abstr., 10, 547.
Turner, D. A. (1984). 'Conductance transients on dendritic spines in a segmental cable model of hippocampal neurons.' Biophys. J., 46, 85-96.
Vladimirescu, A., A. R. Newton & D. O. Pederson (1980). 'SPICE version 26.0.' User's Guide, EECS Dept., University of California, Berkeley.
Wilson, C. J. (1984). 'Passive cable properties of dendritic spines and spiny neurons.' J. Neurosci., 4, 281-97.
Wilson, C. J., P. M. Groves, S. T. Kitai & I. C. Linder (1983). 'Three dimensional structure of dendritic spines in the rat neostriatum.' J. Neurosci., 3, 383-98.

Vistas from tensor network theory: a horizon from reductionalist neurophilosophy to the geometry of multi-unit recordings ANDRAS J. PELLIONISZ

4.1 The brain and the computer: a misleading metaphor in place of brain theory

Contrary to the philosophy of the natural sciences, the brain has always been understood in terms of the most complex scientific technology of man-made organisms, for the simple reason of human vanity. Before and after the computer era, the brain was paraded in the clothing of hydraulic systems (in Descartes' times), and in the modern era as radio command centers, telephone switchboards, learn-matrices or feedback control amplifiers. Presently, it is fashionable to borrow the terms of holograms, catastrophes or even spin glasses. Comparing brains to computers, however, has been by far the most important and most grossly misleading metaphor of all. Its importance has been twofold. First, the early post-war era was the first and last time in history that such an analogy paved the way both to a model of the single neuron, the flip-flop binary element (cf. McCulloch & Pitts, 1943), and to a grand mathematical theory of the function of the entire brain (i.e., information processing and control by networks implementing Boolean algebra, cf. Shannon, 1948; Wiener, 1948). Second, the classical computer, the so-called von Neumann machine, provided neuroscience not only with a metaphor, but at the same time with a powerful working tool. This made computer simulation and modeling flourish in the brain sciences as well (cf. Pellionisz, 1979). The basic misunderstanding inherent in the metaphor, nevertheless, left brain theory in an eclipse, although the creator of the computer was the first to point out (von Neumann, 1958) that these living and non-living epitomes of complex organisms appear to operate on diametrically opposite structuro-functional principles. The von Neumann-type present-day computers are serially organized systems, governed by a central clock,


working through enormous sequences of operations which span great logical depths. They are processors of information in the well-defined Shannonian probability-theory sense, performing functions of mathematical logic and control. In contrast, future non-von Neumann processors ('Neuronal Computers', cf. Eckmiller, 1988), in order to be true to their other name of brain-like machines, have to be massively parallel systems with no clock, having to do without the principle of simultaneity (cf. Pellionisz & Llinas, 1982). Their logical structure is extremely shallow, typically 3-7 steps in depth, just as in the case of the living brain. These instruments are processors of multidimensional parameters. The signals admittedly carry 'biological information', yet the mathematical definition of this term is hitherto nonexistent (cf. Pellionisz, 1983). Moreover, the core of brain function is not the exertion of logical or control operations upon the outside world but its representation by an internal model (cf. Pellionisz, 1983). In order to define general brain function, therefore, the emergence of a conceptually and formally homogeneous representational brain theory is required, one that is based on the most proper philosophy and axiomatic structure (cf. Palm & Aertsen, 1986; Pellionisz, 1986b).

4.2 Neurophilosophy: the place of reductionalist brain theory in natural science

The author's contribution to meeting the above challenges is manifested in tensor network theory, developed through the past decade (for review, see Pellionisz, 1986e, 1987a). After close to a decade of its development, this article sizes up its fundamental features and outlines the fields of its projected major applications in the future (see Fig. 4.1). In its philosophy, tensor network theory is based on the conviction (see in detail in Churchland, 1986) that brain theory becomes more a part of the natural sciences if it abandons the dogma of emulating the most advanced technology. Instead, it had better build its own theoretical structure on carefully laid axioms and utilize, of course, the most powerful mathematical approach available in the natural sciences to represent universal invariants (cf. Pellionisz, 1986b). The specific concept and formalism in tensor network theory is that brain function is implemented by neuronal network transformations that represent physical objects by dual, sensory- and motor-type multidimensional general vectors (mathematically, these are covariant and contravariant tensors, cf. Bickley & Gibson, 1962; Pellionisz & Llinas, 1980). Based on such reductionalist predilection,


tensor theory approaches the brain-mind structurofunctional entity from the viewpoint of multidimensional functional geometry, using it to build a geometrical representation theory. Thus, it is not by chance that the core of its mathematical apparatus is the one used in the unification of physical spacetime (the theory of generalized reference-frame-aspecific vector-matrix operations, i.e., tensor transformations, as used, e.g., in relativity; cf. Levi-Civita, 1926; Einstein, 1916). Therefore, tensor network theory, by elevating brain theory into the realm of the abstract natural sciences, is philosophically much more akin to the modern multidimensional superstring-theory of the universe (cf. Green, 1986) than, e.g., to the over-reductionalist brain-model offered by the quantitative descriptive apparatus of electronic gain-controlled amplifiers, which, in effect, makes brain theory a chapter in control-engineering.

4.3 Tensorial approach to brain theory: representation of invariants by geometrical transformations of intrinsic coordinates

Mathematically, tensor network theory is based on the fact that the structure of the physical geometry of the organisms determines those natural coordinate systems that are intrinsic to the expression of their function. Therefore, adoption of the concept and formalism of coordinate-system-aspecific generalized vectors and matrices (tensors) enables and liberates one to deal with any frame of reference, in fact 'letting the brain speak in its own terms' (cf. Simpson & Graf, 1985).

Fig. 4.1. Fields of research potentially benefiting from a conceptually and formally homogeneous brain theory, such as Tensor Network Theory of the Central Nervous System. [Diagram labels: NEUROPHILOSOPHY: Mind as Geometrical Representation by Networks: Reductionalism. BRAIN THEORY: Multidimensional Geometrical Representation via Spaces over Intrinsic Coordinates. NEURONAL NETWORKS: General Theory and Implementation of Realistic Neuronal Circuitries. MULTI-UNIT PHYSIOLOGY: Correlation-coefficients as Metric Tensor: Geometry of Intrinsic Spaces. SENSORIMOTOR SYSTEMS: Integration of Anatomy, Physiology with Theory & Implementation. SINGLE-UNIT PHYSIOLOGY: Exploration of Intrinsic Coordinates by Classical Electrophysiology. ANATOMICAL DATA-BANKS: Computerized Quantitative Maps of Body-Coordinates. COMPUTING BY NEURONAL NETS: Parallel, non-von Neumann Type Brain-Like Information Processing. NEUROBOTICS: Unified Geometrical Control & Information Theory of Intelligent Organisms. REHABILITATION MEDICINE: Functional Muscle Stimulation and EMG Interpretation.]

A characteristic example of coordinate systems that are specified by the physical geometry of the body is shown in Fig. 4.2, concerning a head-stabilizing neuronal apparatus, the so-called vestibulo-collic reflex (see in detail in Pellionisz & Peterson, 1988). Passively occurring head-movements are measured by the vestibular semicircular canal apparatus, and are compensated for by expressing the same movement (with opposite sign) by means of coordinated contractions of neck-muscles. It is physically obvious that the head contains built-in natural frames of reference for expressing its movements. As anatomically measured by Blanks, Curthoys & Markham (1972; see Fig. 4.2(b)), the three vestibular canals form an arrangement whose characteristic axes constitute a coordinate system that resembles the well-known Cartesian (3-axis, orthogonal) frame of reference. On the other hand, as anatomically established by Baker, Goldberg & Peterson (1985), the head-rotational axes belonging to the pulling of neck muscles comprise a 30-axis arrangement (see Fig. 4.2(c)). Indeed, this neck-motor frame is one of the clearest examples of a highly non-orthogonal system of coordinates that is vastly overcomplete (since it uses ten times as many axes as the minimum required for expressing 3-dimensional rotations of a body around a center). Thus, this scheme demonstrates the possibility and importance of describing CNS function by means of transformations within and among general coordinate systems. While in the case of a few highly specialized systems (e.g. the vestibular canals) it is tempting to fall back on the use of Cartesian vectorial expressions, in most sensory and motor systems (let alone higher CNS functions) the frames intrinsic to the neuronal expressions simply cannot be taken for granted.
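The idea of 'letting the brain speak in its own terms' reduces, for a single rotation vector, to simple linear algebra. The sketch below is our numerical illustration only (made-up axes, not the measured cat data of Blanks et al. or Baker et al.): one and the same rotation is expressed covariantly in each frame by projecting it onto every normalized axis, and for an orthonormal, canal-like frame these projections coincide with the ordinary Cartesian components:

```python
import numpy as np

# One head-rotation expressed in two intrinsic frames by projection
# (illustrative, arbitrary axes).
rotation = np.array([0.2, -0.5, 1.0])          # arbitrary rotation vector

vestibular_axes = np.eye(3)                    # orthonormal, canal-like frame

rng = np.random.default_rng(0)                 # 30 non-orthogonal unit axes:
muscle_axes = rng.standard_normal((30, 3))     # an overcomplete, muscle-like
muscle_axes /= np.linalg.norm(muscle_axes, axis=1, keepdims=True)  # frame

# Covariant components: project the rotation onto each (normalized) axis.
cov_vestibular = vestibular_axes @ rotation    # equals Cartesian components
cov_muscle = muscle_axes @ rotation            # 30 projection components
```

The 30 muscle-frame components are a redundant, overcomplete description of a 3-dimensional rotation, which is precisely the situation the following sections address.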
Thus, when the CNS represents, e.g., head movements in both the sensory and the motor manner, the question is not if the brain implements transformations of head-rotation expressed in the vestibular frame into head-rotation expressed in the neck-motor frame, but how the CNS does it by its neuronal networks. Further, the question is what group of neuroscientists is up to the challenge of making use, for their own purposes, of those potent and general concepts and formalisms that are made available for quantitatively dealing with such general coordinate-system transformations, both in sensorimotor neuronal operations and elsewhere in the CNS.

4.4 Neuronal networks: general theory of the structure and function of realistic neuronal circuits

While tensor network theory has been formulated to provide a conceptually and mathematically homogeneous abstract brain theory, it aims at

[Fig. 4.2 artwork: (a) the vestibulo-collic reflex (VCR); (b) vestibular sensory frame: rotation axes of vestibular canals; (c) neck muscle motor frame: rotation axes of neck muscles. Full caption below.]
Downloaded from Cambridge Books Online by IP 195.209.231.150 on Mon Oct 15 12:58:39 BST 2012. http://dx.doi.org/10.1017/CBO9780511983467.005 Cambridge Books Online © Cambridge University Press, 2012


never losing touch with the concrete neuroanatomical and physiological reality. By providing neuroscience with a network theory, it directly addresses those organizational properties which are inherent in and intrinsic to the physical organization of the brain. It has long been customary to mathematically represent the massively parallel neuronal networks of the CNS by matrices, which become here specific concrete implementations of the general reference-frame-aspecific tensor operators (cf. Pellionisz, 1986e). Availability of a general network theory may be significant, since it is a widely held view that neuroscience must have a powerful enough concept and formalism that can be accepted as an abstract understanding of the function of specific quantitative neuronal networks. A particular elaboration of the above general features is shown in Fig. 4.3 (cf. Pellionisz & Peterson, 1988). The scheme illustrates one of the main difficulties posed by a general coordinate-system transformation; i.e. the frames may be overcomplete. For example, the neck-motor system can produce the same movement using an infinite number of different patterns of muscle activation. The solution proposed by Pellionisz (1984) utilizes the difference between covariant and contravariant representations of the desired movement, both expressed in the motor frame as determined by the muscle geometry. The covariant representation can be uniquely established by projecting the movement vector upon each of the muscle axes. The problem is to find its unique inverse, the contravariant representation. In an overcomplete system the problem is not that this does not exist but that there are an infinite number of inverses.
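The overcompleteness problem can be made concrete numerically. In this toy sketch (ours; the three axes are arbitrary), a 2-dimensional movement is expressed on three non-orthogonal 'muscle' axes: the covariant representation (the projections) is unique, but infinitely many activation patterns reproduce the same movement, and the Moore-Penrose pseudoinverse selects the minimum-norm one among them:

```python
import numpy as np

# Overcomplete frame: three unit axes acting on a 2-D movement space
# (arbitrary, illustrative axes).
axes = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [np.cos(np.pi / 4), np.sin(np.pi / 4)]])

movement = np.array([1.0, 2.0])
covariant = axes @ movement          # unique: projections onto the 3 axes

# Infinitely many activation patterns yield this same movement, because a
# null combination of the axes exists (axes.T @ null == 0):
null = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4), -1.0])
a1 = np.array([1.0, 2.0, 0.0])       # one pattern producing the movement
a2 = a1 + 5.0 * null                 # a different pattern, same movement

# The Moore-Penrose pseudoinverse singles out the minimum-norm pattern.
contravariant = np.linalg.pinv(axes.T) @ movement
```

Any multiple of `null` can be added to an activation pattern without changing the movement; the pseudoinverse solution is the unique pattern with no component along such null directions.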
It has been proposed (Pellionisz, 1983, 1984) that the CNS chooses a unique solution, the Moore-Penrose generalized inverse of the covariant metric (Albert, 1972), which may be implemented by a network that could plausibly be constructed by developing nervous systems (Pellionisz & Llinas, 1985). Related models

Fig. 4.2. An example of implementing CNS function by transformations of general vectors among multidimensional, non-orthogonal coordinate systems. The frames of reference intrinsic to the vestibular-canal to neck-muscle sensorimotor reflex. (a) The head-stabilization sensorimotor reflex includes the vestibular semicircular canal sensory apparatus, and neck-muscle motor apparatus. (b) The vestibular apparatus is characterized by directions in the three-space belonging to A: anterior, P: posterior, H: horizontal semicircular canals (data from Blanks et al., 1972). (c) The 30 major neck-muscles are characterized by rotational-directions (data from Baker et al., 1985). (d) Motoneurons which generate rotations, expressed in the neck-muscle frame, have to receive signals transformed from the vestibular sensory frame in order to properly stabilize the head.


have been prepared for the vestibulo-ocular reflex (Simpson & Pellionisz, 1984), for the vestibulo-collic reflex (Peterson et al., 1985) and for arm movements (Gielen & van Zuylen, 1986). The solution in Fig. 4.3 is based on the three-step scheme of sensorimotor tensor transformation. The task is to change (a) the sensory frame into the motor one, (b) the measured, covariant-type vector into an executable contravariant version, and (c) to increase the dimensions from three to thirty. The central, covariant embedding tensor accomplishes both (a) and (c), simply by projecting the three sensory axes (i subscripts) upon the 30 motor axes (p subscripts), mathematically expressed as

c_ip = s_i . m_p,   (1)

where s and m are the coordinates of the (normalized) sensory and motor axes, and each matrix-element of c_ip is the inner (scalar) product of the vectors of coordinates of the ith and pth axes.

Fig. 4.3. Tensor network model of the three-step vestibulo-collic head-stabilization sensorimotor reflex (cf. Pellionisz & Peterson, 1988). Transformation from sensory coordinates to a motor frame (where the latter may be of higher dimensions) can be accomplished by a three-step tensorial scheme. The vestibular sensory metric tensor, vestibulo-collic sensorimotor tensor and contravariant neck motor tensor transformations can be expressed verbally, by abstract reference-frame-aspecific tensor-symbolism (see in text) or by matrix- (patch-) and network-diagrams. Here, the three matrices are shown, for the particular vestibular and oculomotor frames of the cat (cf. Fig. 4.2), by patch-diagrams only, and by a quantitative visualization of the corresponding neuronal networks that can accomplish such transfer. Network diagrams illustrate the massively parallel architecture of the CNS, where convergences and divergences are the rule, and separated point-to-point connections rarely, if ever, characterize the structure. [Diagram labels: Motor frame of reference: rotation-axes of the 30 neck muscles in the cat. Sensory frame. Vestibulo-collic sensorimotor tensor. Vestibular sensory metric. Contravariant neck-motor tensor: the Moore-Penrose generalized inverse of the 30 x 30 = 900-element covariant motor metric tensor. For legibility, the matrix-elements are shown by proportional, filled and empty circles (for +/- numbers).]


The reason that the c_ip covariant embedding tensor is necessary but not sufficient is that c_ip is a projective tensor. It turns a physical-type (contravariant) input vector into an output that is provided in its projection-components (covariants). However, our case is the opposite; the available sensory input is covariant, while the output required is contravariant. This is why the other two conversions are necessary: the vestibular sensory metric tensor g^pr, that converts covariant sensory reception into contravariant sensory perception, and the contravariant neck-motor metric g^ie (the large 30 x 30 matrix in Fig. 4.3), that turns covariant motor intention into contravariant motor execution. This general function of transforming covariant non-orthogonal versions into contravariant ones by a metric tensor can be accomplished for any given set of axes by a matrix of divergent-convergent neuronal connections among primary and secondary vestibular neurons and among brain-stem premotor neurons and neck-motoneurons (Baker et al., 1984). The required contravariant metric tensor g^pr is the inverse of the covariant metric tensor g_pr:

g^pr = (g_pr)^-1,   (2)

where the components g_pr are the inner (scalar) products of the vectors of coordinates of the (normalized) axes s:

g_pr = s_p . s_r.   (3)

The question of how CNS neuronal networks can arrive at a unique covariant-to-contravariant transformation led to the proposal of a metaorganization principle and procedure which utilizes the Moore-Penrose generalized inverse (Pellionisz, 1983, 1984; Pellionisz & Llinas, 1985). This solution is based on arriving at the eigenvectors of the system (those special vectors whose covariant and contravariant expressions have identical directions) by a reverberative oscillatory procedure (muscle proprioception recurring as motoneuron output, setting up stabilizing tremors). The eigenvectors would imprint a matrix of neural connections that can serve as the proper coordination-device (e.g. cerebellar neuronal circuit). The unique inverse of g_ie can be obtained from the outer (dyadic matrix) product (symbolized by ><) of the eigenvectors E_m, weighted by the inverses of the eigenvalues lambda_m (the inverse is taken as 0 if lambda_m = 0; cf. Albert, 1972):

g^ie = sum_m (1/lambda_m) (E_m >< E_m).   (4)
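Equations (1)-(4) can be verified numerically. In the sketch below (our illustration; random unit vectors stand in for the measured canal and muscle axes), the covariant embedding of eq. (1), the sensory metric of eqs. (2)-(3) and the eigenvector construction of the Moore-Penrose inverse in eq. (4) are computed and checked against the library pseudoinverse:

```python
import numpy as np

# Numerical check of eqs. (1)-(4) with stand-in axes (random unit vectors
# in place of the measured canal and muscle axes).
rng = np.random.default_rng(1)

def unit_axes(n, dim=3):
    a = rng.standard_normal((n, dim))
    return a / np.linalg.norm(a, axis=1, keepdims=True)

s = unit_axes(3)                 # sensory (canal-like) axes, rows s_i
m = unit_axes(30)                # motor (muscle-like) axes, rows m_p

c = s @ m.T                      # eq. (1): c_ip = s_i . m_p, a 3 x 30 matrix

g_cov = s @ s.T                  # eq. (3): covariant sensory metric g_pr
g_contra = np.linalg.inv(g_cov)  # eq. (2): its ordinary inverse

# Eq. (4): the 30 x 30 covariant motor metric G = m m^T has rank <= 3, so an
# ordinary inverse does not exist; the Moore-Penrose inverse is assembled
# from eigenvector outer products weighted by 1/lambda (0 where lambda = 0).
G = m @ m.T
lam, E = np.linalg.eigh(G)
inv_lam = np.where(lam > 1e-10, 1.0, 0.0) / np.where(lam > 1e-10, lam, 1.0)
G_pinv = E @ np.diag(inv_lam) @ E.T
```

Because the 30 x 30 motor metric has rank 3 at most, zeroing the reciprocals of the null eigenvalues, exactly as eq. (4) prescribes, is what makes the generalized inverse well defined.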

The tensor network model of the vestibulo-collic reflex emerges from the quantitative data of Fig. 4.2 in the form shown in Fig. 4.3. Each of the


three matrices in the model is represented by a patch-diagram in which the size and sign of each matrix element are indicated by filled (positive) and open (negative) circles. Four columns represent canal inputs (H, A, P, at the left side of the network-diagram), motor nerve outputs (LC . . ., at right side) and two intermediate neural stages. Another rendering of the tensor network model of the vestibulo-collic reflex is shown in Fig. 4.4. This presentation is basically a neuromorphological elaboration of the tensorial scheme of Fig. 4.3. First, the transformation-matrices are not represented here by visually difficult-to-comprehend complex sets of interconnections (used in the top part of Fig. 4.3), but by so-called 'tensor modules' (cf. part of Fig. 4.4, marked 'cerebellar nuclei'). In such a module, the input vector arriving by the incoming axons is transformed by the synaptic interconnection-set into an output vector (the connections are shown by patches, cf. bottom part of Fig. 4.3). A single dendritic tree of the output neurons is drawn to symbolize cells which implement the transformation. A second difference in Fig. 4.4 in comparison with Fig. 4.3 is the detailed elaboration of the cerebellar embodiment of the motor metric tensor. In Fig. 4.3, the third and last transformation is shown by a one-step network. This conversion, however, does not occur in a simple 'throughput-type' network, but is performed by the cerebellar 'add-on-type' network (cf. Pellionisz, 1984). The 'add-on' structurofunctional architecture of the cerebellum (probably a result of its character as 'an evolutionary afterthought') enables direct sensorimotor operations even without any cerebellar contribution (motor performance is retained in case of cerebellar ablation but it becomes 'dysmetric'; cf. Bloedel et al., 1985). The 'add-on' architecture is in accord with the hypothesis (Pellionisz, 1985b) that the function of the cerebellum is to turn motor intentions (which are covariant vectors, specifying the independent features of the goal, but whose components do not add up to properly make the performance) into motor executions (which are contravariant vectors, whose components actually perform the goal). Metaphorically, this cerebellar architecture is similar to a 'secretarial antechamber' that intercepts and transforms the intentional commands emanating from the boss' main office. For his 'good intentions' could directly operate the system even in the absence of the 'add-on' side-loop of secretarial executive transformation, but such absence of secretaries typically results in 'ataxic' performance of the system. This metaphor also indicates that while a good secretary never initiates anything, a 'secretarial' knowledge of what executive commands should be attached to intentional goal-specifications does require an internal model of relations existing in the external system; a representation of the external geometry. In the network model of Fig. 4.4, which conforms with the structure of well-known cerebellar nets, the geometrical model of covariant-contravariant relationships is comprised in the 'cerebellar nuclei' tensor module. The additional circuits of the model (the accessory optic system and climbing fiber system arising from the inferior olive, cf. Simpson, Soodak & Hess, 1979) serve as a corollary that monitors the misperformance of the cerebellar transformation, yielding ongoing adaptive modifications of the metric tensor (Llinas & Pellionisz, 1985; cf. also Pellionisz & Llinas, 1979; Pellionisz, 1986a).

Fig. 4.4. 'Tensor modules' in a network model of the vestibular-neck motor head-stabilization reflex, involving the cerebellum. In all, the 3-dimensional vestibular signals, expressed by covariant components, are transformed by the vestibulo-cerebellar neuronal network into 30-dimensional neck motor signals, expressed by contravariant vectorial components. This rendering of the tensor-matrices utilizes tensor-modules (see, e.g., the module marked 'cerebellar nuclei'), where the n-dimensional input vector is shown by a strip of n incoming axons, and the n-dimensional output vector by a strip of n outgoing axons. The dendritic trees of output neurons are shown only by a representative single cell, and the synaptic connectivities among input and output neurons are shown by an n x n matrix, illustrated by a patch-diagram. The basic 3-step transformation is implemented in the vestibular and cerebellar nuclei. The cerebellum serves as an add-on circuit, which turns the covariant motor intention into a contravariant motor execution. The accessory optic system and olivary system serve the role of reporting the misperformance of the head-compensation reflex; thus the climbing fibers generate an ongoing modification of the cerebellar metric tensor (Simpson et al., 1979). AOS: accessory optic system, CF: climbing fibers, CN: cerebellar nucleofugal path, GC: granule cells, ME: motor execution signals, MF: mossy fibers, PC: Purkinje cells, PF: parallel fibers. [Panel labels: VESTIBULAR NUCLEI: Covariant Sensory Input, Covariant Motor Output. CEREBELLUM: Covariant Motor Input, Contravariant Motor Output. ACCESSORY OPTIC SYSTEM: Input: Error in Visual Coordinates; Output: Error in Motor Coordinates. INFERIOR OLIVE: Updating the Motor Metric via Eigenvector-Discrepancies.]
4.5 Sensorimotor systems: proving ground of brain theory

Tensorial brain theory was first applied to sensorimotor systems. Firstly, any brain theory should prove its adequacy on simple primary CNS operations before being considered relevant for complex and high-level CNS tasks. It would be premature to forge theories of association and pattern recognition, let alone to tackle such almost purely philosophical problems as the neuronal embodiment of consciousness or 'free will', if a particular approach in brain theory could not even explain the function of such simple 3-neuron structures as the vestibulo-ocular reflex arc (Pellionisz & Graf, 1987). Secondly, since sensorimotor operations are measurable and describable by physical

means, it is possible to put forward, in the case of sensorimotor transfer, not merely an abstract brain theory but also its quantitative elaboration. This should, in turn, permit direct comparison of experimental measurements with theoretical predictions (Peterson et al., 1985; Gielen & van Zuylen, 1986). Thirdly, as will be shown below, any theoretical understanding gained from the knowledge of sensorimotor systems lends itself to direct utilization not only in the immediate fields relating to motor systems, such as kinesiology and rehabilitation medicine, but also in the biologically related fields of robotics ('neurobotics', cf. Pellionisz, 1983) and of the technology of brain-like computers ('neuronal computers', cf. Eckmiller, 1988). An eventual co-evolution of brain theory with some of the most important technological challenges of our time may prove to be of substantial benefit both for constructing intelligent robots and brain-like computers, assuming that brain theory is not an epigon of the technology but can put forward its own mathematical foundation (Pellionisz, 1987b, c, 1988a).

4.6 Data-banks for quantitative anatomy: computerized maps of body-coordinates

Looking beyond the basic challenge of creating and formulating a geometrical approach in brain theory, the first requirement towards its quantitative elaboration is the availability of morphological data that specify the coordinate systems intrinsic to the physical geometry of living organisms. Quantitative morphological specification of sensorimotor systems began about a century ago with Helmholtz's (1896) measurements of the extraocular musculature. This field has experienced rapid growth in the past decade (Blanks et al., 1972; Ezure & Graf, 1984; Simpson et al., 1986; Daunicht & Pellionisz, 1986). The data-sets should ideally be obtained in a manner that makes them compatible with one another as well as with the theoretical requirements. Also, it would be useful to make them widely and conveniently available to the research community. Therefore, it is expected that the field will soon support the establishment of data-banks to gather, hold and disseminate the findings of the rapidly emerging discipline of quantitative computerized anatomy. The data-banks could serve as nodes of a computer network. With the widespread availability of today's economical graphic work-stations, linked by telephone network-connections, quantitative sensorimotor research and its applications will no doubt experience a quantum jump in efficiency. As outlined below, such a modernized approach to revealing the

structure of living organisms is expected to have a major impact on a range of fields of research and applications.

4.7 Rehabilitation medicine and kinesiology: functional muscle stimulation and EMG-motor unit interpretation

The main contribution of the tensorial approach to the sensorimotor field may be that of providing it with a quantitative theory (e.g. by offering a general solution for the coordination of overcomplete musculature, cf. Simpson & Pellionisz, 1984; Peterson et al., 1985; Gielen & van Zuylen, 1986). Nevertheless, the potential use of its elements, e.g. the Moore-Penrose generalized inverse, can be illustrated even in such simple structures of anatomy as the skeleton of the neck-motor apparatus (Fig. 4.5(a)). The most rudimentary physical geometry underlying motor performance is the skeletal structure. Once the position and the measurements of the vertebral column of the cat's neck are established (cf. Vidal, Graf & Berthoz, 1986), it is possible to use the quantitative tensor model to predict the nature of the constraints that the Moore-Penrose generalized inverse would impose on head-movements executed with the use of this overcomplete joint-structure. The intrinsic system of coordinates for the 2-dimensional displacement of the head (specified by the displacement of the cat's eye) can be calculated from the x,y coordinates of the vertebral rotation-joints. As shown in Fig. 4.5(a), given a motor intention-vector,

Fig. 4.5. Coordination of an overcomplete skeleto-motor system. Predictions are calculated by the Moore-Penrose generalized inverse of the covariant metric tensor of the coordinates intrinsic to the 8 cervical joint, 7 neck muscle motor apparatus (programming by A. Pellionisz & J. Laczko; anatomy by F. Richmond, J. Baker, P. Vidal & W. Graf; tensor model by A. Pellionisz & B. Peterson). (a) Tensor model of the constraints of movements inherent in the overcomplete skeletal apparatus composed of 8 cervical joints. The coordinate axes of the displacement of the head are determined by the joint rotation-points. A movement intention (see arrow) is decomposed into covariant intention components and transformed into contravariants by the Moore-Penrose inverse. (b) The pulling of each of the 7 representative muscles determines a displacement of the head (in the case of multiarticular muscles, knowledge of the relative stiffness of the joints is required). (c) A movement intention similar to that in (a) will produce a head-shift, with almost all movement at C1/C2 and C7. Predicted muscle activations correspond to EMG signals, and thus could be a basis for functional stimulation and EMG interpretation. (d) The model, identical to that in (c), producing a markedly different movement-pattern for a 'look-up' motor intention. Movement is almost exclusively C1/C2 rotation, without neck-tilt (the 'EMG'-pattern is also different). Different 'motor strategies' may arise from a single model.

the Moore-Penrose generalized inverse of the covariant metric tensor of the intrinsic frame will determine a characteristic movement-pattern of the head, in which the cervical column remains a rigid body and almost all of the movement is generated by rotation around the C1/C2 and C7 joints (cf. Vidal & Berthoz, 1986). While it is important to study the fundamental physical constraints of motor performance imposed by the skeleton, movements are controlled by the CNS not in the rather low-dimensional, nonetheless overcomplete, joint-coordinate space, but in a very high-dimensional neuronal coordinate space. Between these two extremes of dimensionality is the muscle-space, spun over the coordinates intrinsic to the pulling of the individual muscles. Fig. 4.5(b-d) shows a preliminary study of the motor control of a musculoskeletal system from the muscle space. As seen, even though the apparatus is overcomplete, since 2-dimensional displacements of the eye-center are determined by eight joints and seven muscles, the tensorial approach can predict, with the use of the Moore-Penrose generalized inverse, a unique execution of movement. It is a characteristic feature of the model that the pattern of movement depends greatly on the motor intention (specified by the displacement of the eye-center). For example, Fig. 4.5(c) shows a head-movement similar to the one in Fig. 4.5(a). However, if the intention is to look up (Fig. 4.5(d)) the identical model will display a movement-pattern where practically all rotation occurs at C1/C2 and the neck will not tilt. This model suggests, therefore, that the 'different strategies' occurring in motor performance (Nashner, 1977) may be an epiphenomenon of one underlying model and might not invoke a set of different mechanisms to choose from. The model in Fig.
4.5(b-d) goes beyond the skeletal model only in the sense that, in the case of multiarticular muscles, the calculation of the intrinsic coordinate system (establishing the axes that belong to the pull of individual muscles) also necessitates an assumption about the relative stiffness of the joints. The study shown in Fig. 4.5 can predict an activation-pattern of an overcomplete number of muscles in the case of a coordinated movement. This illustrates the potential use of the tensorial approach in the fields of prosthetics (Mann, 1981) and functional neuromuscular stimulation (FNS, cf. Kralj & Grobelnik, 1973; Mauritz, 1986; Gruner, 1986 in Pellionisz, 1987b). In these applications a central problem is to arrive at a biologically realistic algorithm which can generate the unique set of the overly large number of muscle activation components that are necessary to make an intended movement. The tensorial analysis could also prove to be useful for the interpretation of large numbers of EMG and motor

unit measurements (cf. Loeb & Richmond, 1986), where the problem is, again, on what theoretical grounds to conceptually unify and interpret the multiple sets of quantitative experimental data. The problem of interpreting multi-unit EMG and motor unit signals also relates to the question discussed in the last section of this paper.

4.8 Neurobotics: unified geometrical theory of intelligent organisms

Treating motor control problems in terms of multidimensional geometry (with the use of general coordinates) may have an importance in a wider context. Namely, it could lead to a generally applicable formalism that yields the means for unifying fields as closely related to sensorimotor research as kinesiology, sports medicine and ergonomics (in both civilian and other applications), and also fields that presently seem to be beyond the realm of the biological sciences. As discussed elsewhere (Pellionisz, 1983, 1985c; Loeb, 1983), by adopting, both in robotics and neuroscience, a common language, e.g. the formalism of generalized vectorial expressions (not just those expressed in Cartesian 3-dimensional, orthogonal frames), these fields could be united. Finally, in the widest context, the question of how the CNS may exert communication, control and command operations on a most complex (living) organism, in terms of multidimensional geometry and by means of massively parallel computation, is not without interest to C3 theorists (cf. Ingber in this volume).

4.9 Computing by neuronal networks: the nature of computation and the structure of the networks

Presently, there is a rapidly growing interest in computing by neuronal networks (cf. this volume; also Eckmiller, 1988). Thus, the question arises of how the tensorial approach relates to this unfolding trend. First of all, while other approaches aim at interpreting the function of imaginary neuronal networks that lack any specific structure (characterized by a set of 'everything-to-everything' interconnections), the tensor approach deals with existing, not arbitrary, neuronal networks (such as the vestibulo-ocular, vestibulo-collicular and cerebellar networks). Further, this approach provides formal means of handling both their structure (cf. the 'tensor module' above) and their function, in terms of transformations of general vectorial expressions. Perhaps the most important difference is, however, that the tensor formalism defines the intrinsic mathematical nature of the computation: stating that the calculations performed by networks are transformations of generalized vectors that are expressed in

intrinsic coordinates (Pellionisz, 1986d). Thus, in the case of the cerebellum, for example, it is possible to state the general function of specific cerebellar circuits (e.g. in different species), i.e., that all individually different cerebellar circuits implement a general covariant-contravariant metric tensor transformation. As a matter of course, it can reasonably be expected that by adopting the axiom of general coordinates, a large part of the research done in the field of associative memories and intelligence will gain new dimensions in the not-so-distant future.

4.10 Single cell electrophysiology: exploration of intrinsic coordinates

Lowering our sights from the distant vistas to present-day possibilities and necessities, a practical and immediate question is how the inherently multidimensional theories may relate to data-procurement by classical and widely available single-cell recordings. Since it is not the actually utilized technique that determines the fundamental merits of a scientific project but the potency of the underlying scientific hypothesis, it is proposed here that by adopting a multidimensional concept even single-cell recordings may quickly gain new significance. An example of this may be the exploration of the coordinates intrinsic to neuronal function in the CNS. In the case of sensorimotor systems, sensory detectors (e.g. primary vestibular neurons) must, by definition, use the frame intrinsic to the structure of the sensory mechanism (the vestibular canals). On the other hand, motor effectors (e.g. oculo-motor or neck-motor neurons) must utilize the frame intrinsic to the musculature. Thus, when detecting direction-sensitive firings of neurons in the middle of a sensorimotor apparatus (e.g. brain-stem saccadic bursters, or neurons of the motor cortex; cf. Georgopoulos, Schwartz & Kettner, 1986), an immediate question is whether these neurons use the sensory frame, the motor frame, or some other frame. In fact, based on available data (Simpson et al., 1986) it has already been proposed that these cells may use a coordinate system that is neither the sensory nor the motor frame, but the eigenvector-frame of the extraocular muscle apparatus (Pellionisz, 1986c, 1988b). Since eigenvector-frames have been calculated for several species (Pellionisz, 1985a; Pellionisz & Graf, 1987; Pellionisz, 1986c; Daunicht & Pellionisz, 1986), quantitative predictions are already available to be tested in a comparative manner, since the predicted eigendirections differ among species.
These theoretical predictions could be verified or rejected by means of experimental investigations using only classical single-unit recordings.
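The notion of an eigenvector-frame can be made concrete with a small sketch. The metric below is a hypothetical 3-muscle covariant metric with invented cosines, not measured anatomy; since a covariant metric is symmetric, its eigenvectors form a mutually orthogonal frame even though the muscle axes themselves are not orthogonal.

```python
import numpy as np

# Hypothetical covariant metric of a 3-muscle apparatus: g_ab = cosine of the
# angle between pulling directions a and b (illustrative values only).
g = np.array([[1.0, 0.4, 0.1],
              [0.4, 1.0, 0.3],
              [0.1, 0.3, 1.0]])

# The metric is symmetric, so eigh returns real eigenvalues and an orthonormal
# set of eigenvectors: the candidate 'intermediate' frame discussed in the text.
eigenvalues, eigenvectors = np.linalg.eigh(g)

# The eigenvector frame is orthonormal: V^T V = I.
assert np.allclose(eigenvectors.T @ eigenvectors, np.eye(3))
```

Comparing the preferred directions of recorded neurons against such computed eigendirections is one way the comparative test proposed in the text could be carried out.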

4.11 Multi-unit physiology: correlation coefficients as metric tensor: exploring the geometry of functional CNS hyperspaces defined over multi-unit signals

Although, for technical reasons, classical electrophysiological methods were developed for single units, it has been evident to most workers that, given the axiom that the CNS is a massively parallel system, sooner or later experimental methods would need to be invented to access a multitude of neurons simultaneously (see the review in Llinás, 1974). Such so-called multi-unit recording techniques have, indeed, been pioneered over the past decades (cf. Freeman, 1975; Gerstein et al., 1983; Reitboeck, 1983; Bower & Llinás, 1983). Partly because establishing, mastering and honing such techniques is an exceedingly demanding endeavor, attention has only recently been focussed on the further, and equally excruciating, question of how to theoretically interpret the vast arrays of data made available by such parallel methods. At first, the mere visualization of such parallel recordings is satisfactory (Bower & Llinás, 1983), since it represents the long-awaited fulfilment of the dream of Sherrington (1906), who envisioned massively parallel brain function in the form of the dynamic flickerings of myriads of neurons as an 'enchanted loom'. The classical quantitative analysis of multi-unit data is the cross-correlation technique (cf. review in Gerstein et al., 1983). This method culminates in establishing n × n tables of cross-correlogram coefficients among n signals. One of the many advantages of this stochastic approach is the availability of software for this conventional quantitative computer analysis. The most important shortcoming inherent in correlograms is, however, that they have hitherto been the end-product of the analysis. The interpretation and evaluation of the n × n tables of cross-correlograms (in the case of n data sources) is, however, a source of frustration for the neuroscience community (cf. Krüger, 1982).
Another, more recent, fundamental concept for interpreting multi-unit recordings is the massive data-compression of n recordings along time into the movement in time of a single point in a functional n-space. This extremely powerful concept, which was pioneered by Aertsen, Gerstein & Johannesma (1986), is depicted in Fig. 4.6. In such an approach, the individual activities in the multi-unit recording represent, at every time-point, an ordered set of quantities; a mathematical vector. The coordinates are then taken as representing a point in the n-space. Although this concept would open the way to comprehensive

geometrical interpretation of multi-unit recordings, such as the calculation of distances, directions, trajectories, centers of mass, gravitational clustering and similar geometrical features, such calculations are possible only in the case where the geometry of the n-dimensional hyperspace is known. As has often been pointed out (see, e.g., the note added in proof in Pellionisz & Llinás, 1985, #2), however, a central problem of brain theory is that there is absolutely no assurance that the CNS functional hyperspaces are limited to either simple Euclidean or even to Riemannian geometry.

Fig. 4.6. The functional geometry inherent in CNS hyperspaces defined over multi-unit signals is not a matter of convenient assumption of a Euclidean metric. On the contrary, establishment of the metric tensor of the unknown geometry is the goal of multi-unit experimentation. (a) Multi-unit recording, symbolized by n = 3 signal sources. (b) 'Point in the n-space' concept (Aertsen et al., 1983) of interpreting the recorded activities (e.g., firings of neurons). (c) The convenient, but unsupported, assumption of a Euclidean 'flat' geometry in the n-space permits calculation of geometrical features, but the working hypothesis that the Kronecker delta serves as the metric tensor is untenable. (d) The concept of the proposed approach to multi-unit recordings: the functional geometry of the n-space is unknown; it is to be established by determining its metric tensor.

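The 'point in the n-space' concept of Fig. 4.6(b) can be sketched in a few lines. The spike times and the window length below are invented for illustration; each time window yields one n-dimensional point, so the whole recording becomes a trajectory of points moving through the n-space.

```python
import numpy as np

# Invented spike times (seconds) for n = 3 simultaneously recorded units.
spike_trains = [np.array([0.01, 0.12, 0.15, 0.31]),   # unit 1
                np.array([0.05, 0.11, 0.28]),         # unit 2
                np.array([0.02, 0.22, 0.25, 0.29])]   # unit 3

window = 0.1                                  # 100 ms time windows
edges = np.arange(0.0, 0.4 + window, window)  # bin edges: 0.0, 0.1, 0.2, 0.3, 0.4

# Count spikes per window. Rows are time windows, columns are units:
# each row is one point in the 3-dimensional functional space.
points = np.stack([np.histogram(t, bins=edges)[0] for t in spike_trains], axis=1)

print(points)   # a trajectory of four points in 3-space
```

Note that everything after this step (distances between points, clustering, trajectories) presupposes a metric on the space, which is exactly the issue the text takes up next.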

When postulating a functional hyperspace over the activities of multi-units, the problems with arbitrarily assumed geometries become painfully obvious. The first question is whether the space is spun over discrete or continuous variables of coordinate components. While most workers operate with the tacit assumption that neuronal activities represent 'continuous' variables (e.g., frequencies), moreover, that the manifold is differentiable, 'smooth' (Aertsen, Gerstein & Johannesma, 1986), even this working hypothesis is not universal. Assumptions of a discrete space, spun over 0, +1 (or −1, +1) binary values of neuronal activity-variables, can still be found, possibly because of the remnants of the 'Computers = Brains' McCulloch-Pitts school (where neurons were considered as flip-flop binary units, just like computer-elements). Postulation of such a discontinuous, thus non-differentiable (non-smooth), manifold is particularly questionable in the case of interpretations of multi-unit recordings from the cerebellum (Carman, Rasnow & Bower, 1986). The operations of this organ, throughout evolution, centered around vestibulo-cerebellar transformations. The vestibulo-cerebellar apparatus, however, is well known to employ frequency-coding (see, e.g., Bloedel et al., 1985), resulting in a reasonably smooth and continuous functional space. A further questionable assumption is the postulation of a highly specific structure of the CNS multidimensional functional manifold (e.g. invoking a geometry with Hamming-distances; Carman et al., 1986), since there is absolutely no guarantee that such geometry, indeed, is manifested by CNS function. In most approaches, in fact, the simplest and most parsimonious assumptions are introduced, such that the functional multidimensional hyperspace is continuous and 'smooth', moreover, that it is endowed with a position-independent 'flat' Euclidean geometry (cf. Aertsen et al., 1986).
While most workers are keenly aware of the provisional nature of such initial postulates (which only serve technical convenience), one cannot overemphasize the stopgap nature of this compromise, lest some followers be led to the mistaken belief that the geometry of CNS functional spaces is truly known. In contrast, as depicted in Fig. 4.6, the nature of the geometry of CNS functional hyperspaces is not a matter of convenient assumption, but represents the very challenge that neuroscience must, at some time, squarely face and properly meet. In fact, neuronal functional manifolds may well be endowed with complex geometries that are characterized by a metric tensor which is position-dependent (the space being curved), and the axes could be non-orthogonal, non-rectilinear (curvilinear) or even only locally linear. Further, the distinct possibility exists that some CNS

hyperspaces (e.g., cognitive neocortical spaces in infants) may not have, at an early developmental stage, an explicitly structured geometry at all. It is possible that 'learning', defined here as the structuring of the geometry of the functional space, may start with amorphous, 'chaotic' spaces with no metric tensor at all. While the above arguments are tacitly accepted in general, the assumption of a Euclidean metric, even if it is false, often presents an irresistible temptation, since it permits swift calculations of distances, directions, trajectories, etc., in the CNS manifolds. In contrast, an acknowledgement that the geometry is, indeed, unknown would keep such activities on an uncertain hold until methods for establishing the unknown metric were made available. In an attempt to break through the above impasse and to contribute to the further fruition of the seeds inherent in the above-mentioned existing techniques, an approach is proposed as in Fig. 4.7 (announced at the Soc. Neurosci. Convention, 1986), which in effect could synthesize the 'table-of-cross-correlograms' stochastic interpretation with the 'point-in-the-n-space' geometrical analysis. Such unification may open a new way to reveal features of the metric tensor of the geometry of the functional n-dimensional hyperspace. The proposal hinges on the consideration that

Fig. 4.7. Concept of the proposal that the table of cross-correlation coefficients (r) of the activity of n signal sources approaches the table of covariant metric tensor components (g). Both the coupling of covariant (projection-type) vectorial components (r) and the angle between the axes (g = cos(θ)) express the same measure: 'how close are a and b?' The figure contrasts the correlation coefficients (an n × n table for n neurons), a statistical measure of the coupling between the a, b values of individual points as judged from a number of samples, with the covariant metric tensor (an n × n table for n axes), a geometrical measure of the interdependence of the general coordinates in the n-space.
if the points in the n-space are viewed as expressed in a general, non-orthogonal frame (which, however, we may not know), then the n × n correlation coefficient table contains statistical measures of the coupling between the separate coordinates (e.g., the a, b components) belonging to the points. If the (unknown) axes were perfectly aligned, the a and b values would be identical (coupled by a coefficient of 1), whereas in the case of an orthogonal set of axes the a and b values would be independent (the coupling would be 0). Thus, the correlation coefficients, by expressing how close a and b are, are directly related to the angle between the coordinate axes; therefore correlograms may help us establish the relation of the unknown axes. The above conceptual intuition has been mathematically explored in the study shown in Fig. 4.8. For demonstration purposes, a two-axis

Fig. 4.8. Quantitative elaboration of the proposal. Comparison of the covariant metric tensor g and the cross-correlation coefficient r, calculated for four randomly selected points in a two-axis frame (the angle of the axes incremented by 5°). (a) Comparison of r and g reveals a similarity of these measures, even if only four data-points are considered. (b, c) Two-axis frame with four data points. Covariant vector components are closely coupled (close to 1) if the cosine of the angle of the axes is near 1, whereas the coupling is loose (close to 0) if the cosine is nearing 0. Formulae at bottom show the conventional method of calculating correlogram coefficients, and the covariant metric tensor g = cos(θ).

frame of reference was investigated, with a varying angle θ between the axes (see Fig. 4.8(b) and (c)). For four randomly selected points, the covariant (projection-type) a and b components were established. Visual comparison of Fig. 4.8(b) and (c) shows that the a and b components are very close to one another if the angle θ is small (Fig. 4.8(c)), while the a and b components are rather different (e.g., for points 2 and 4) if the angle θ is close to perpendicular (Fig. 4.8(b)). The visual impression is borne out by mathematical analysis, where the cross-correlogram coefficient (r) and the covariant metric (g) are calculated by the conventional formulae below (where the covariant metric yields the cosine of the angle of the axes):

r_ab = Σ_i (a_i − ā)(b_i − b̄) / √( Σ_i (a_i − ā)² · Σ_i (b_i − b̄)² )   (5)

g_ab = cos(θ_ab)   (6)

Plotting r and g = cos(θ) in Fig. 4.8(a) reveals that even in the case of only four (different) randomly selected points in each two-axis frame (where θ was changed by five-degree increments), the r and g = cos(θ) values are, indeed, close enough to warrant further studies. It has to be emphasized that the cross-correlogram method is an inherently stochastic, statistical measure, while the geometrical measure of the closeness of the axes (the cosines of the angles between them) yields a single deterministic value. Since in the case of statistical analysis the size of the statistical sample is crucial, the calculation of r = g has been implemented in Fig. 4.9 for a varying, much larger number than four randomly selected points, in order to ascertain the convergence of r to g with the increase of the number of points. Comparison of the precision of r = g in the case of 4, 10, 100 or 1000 points clearly shows that for a customary 3-5% biological precision the statistical sample need not be larger than about 100 measurement-points. Given the fact that in multi-unit recordings the firing of units can usually be obtained over protracted time (with literally thousands of unitary activities, whether in extracellular spiking, motor unit or EMG activities), the required number of samples should pose no insurmountable difficulty. The proposal for the convergence of the correlation coefficient table to the table of covariant metric tensor components appears to be a useful beginning. The road, however, is long towards synthesizing the classical statistical correlogram-analysis of multi-unit recordings with the recent, multidimensional geometrical interpretation. However, in the proposed

approach the geometry (the metric tensor) of the multidimensional functional hyperspace is not taken for granted; rather, the very purpose of the analysis is to establish the unknown metric tensor. Thus, one can foresee that with enough time and investment new types of functional geometries of the CNS will be revealed, of which we have very little knowledge (or even imagination) at the present time. In order to provide a glimpse of the future possibilities, an arbitrary example is shown in Fig. 4.10, to illuminate how one would go about conceptually and formally treating large n × n tables of cross-correlogram components. Suppose that a 30 × 30 table of cross-correlogram coefficients were experimentally established in a 30-electrode recording. It is visible in the left-side plot in Fig. 4.10 that the off-diagonals of the cross-correlogram component table are non-zero. This means that the activities of the measured units are not independent of one another, but are coupled. This could be the result of the activities arising from a

Fig. 4.9. Convergence of the correlation coefficient r to the covariant metric tensor g = cos(θ) as the size of the statistical sample increases (from n = 4 to n = 10, n = 100 and n = 1000). A sample-size in the range of 100 is deemed sufficient for a biological precision of 3-5%.
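The convergence depicted in Fig. 4.9 can be reproduced in a few lines. This sketch (with invented random points; it is not the chapter's original program) draws points in a plane, takes their covariant projections onto two unit axes separated by an angle θ, and compares the Pearson correlation coefficient r of the projections with g = cos(θ):

```python
import numpy as np

rng = np.random.default_rng(0)

def corr_vs_cos(theta_deg, n_points):
    """Correlation of covariant components of random points in a two-axis
    frame with inter-axis angle theta, compared against cos(theta)."""
    theta = np.deg2rad(theta_deg)
    e1 = np.array([1.0, 0.0])
    e2 = np.array([np.cos(theta), np.sin(theta)])
    pts = rng.standard_normal((n_points, 2))   # random points in the plane
    a = pts @ e1                               # covariant (projection) components
    b = pts @ e2
    r = np.corrcoef(a, b)[0, 1]
    return r, np.cos(theta)

# r approaches g = cos(theta) as the sample grows, echoing Fig. 4.9.
for n in (4, 10, 100, 1000):
    r, g = corr_vs_cos(60.0, n)
    print(n, round(r, 3), round(g, 3))
```

For standard-normal points the result is exact in the limit: with a = x and b = x·cos(θ) + y·sin(θ), the covariance of a and b is cos(θ) while both variances are 1, so r converges to cos(θ).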

coordinate system with non-orthogonal (non-independent) axes (in fact, in the given arbitrary example the coupled 'recordings' originated from the neck-muscle axes shown in Fig. 4.2). The two questions that an investigator may ask are as follows: (a) is it possible to reconstruct the set of coordinate axes which yielded the given table? (b) without knowledge of the axes, is it possible to understand the functional geometry inherent in such recordings? A positive answer to question (a), in many cases, is not altogether impossible. If the firings arise, e.g., from motoneurons which are connected to a set of muscles (as in the case of the exemplary demonstration shown in the left plot in Fig. 4.10), then measurement of the physical geometry of the muscular arrangement could directly reveal

Fig. 4.10. Demonstration of a table of cross-correlogram coefficients considered equal to the covariant metric tensor, and the calculation of its dual, contravariant metric tensor. The calculation uses the Moore-Penrose formula, yielding a proper inverse if the space is complete, and a generalized inverse if the space is overcomplete. With measurement of the correlograms (left), and calculation of the dual metric (right), both metric tensors are available; thus the geometry of the functional space is determined. This enables one to calculate geometrical features such as eigenvectors, distances, angles, geodesic trajectories, etc. (In this arbitrary demonstration the cross-correlograms were not taken from multi-unit measurements, but originated from the set of neck-motor axes shown in Fig. 4.2.)
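The calculation described in the Fig. 4.10 caption can be sketched as follows, under invented assumptions (three unit axes in a plane standing in for an overcomplete motor apparatus; the cosine table plays the role of the measured correlogram). The covariant metric of an overcomplete frame is singular, so its dual is obtained with the Moore-Penrose pseudoinverse rather than the ordinary inverse:

```python
import numpy as np

# Hypothetical overcomplete frame: three unit axes spanning only a 2-D plane
# (more axes than dimensions), standing in for an overcomplete motor system.
angles = np.deg2rad([0.0, 50.0, 110.0])
E = np.stack([np.cos(angles), np.sin(angles)])   # shape (2, 3)

# Covariant metric: the table of cosines between axes. Here it plays the role
# of the measured cross-correlogram table of Fig. 4.10 (left).
g_cov = E.T @ E                                  # shape (3, 3), rank 2: singular

# The ordinary inverse does not exist for a singular metric; the Moore-Penrose
# pseudoinverse yields the dual, contravariant metric (Fig. 4.10, right).
g_contra = np.linalg.pinv(g_cov)

# Dual check: g_contra . g_cov acts as the identity on the covariant
# components of any physical vector expressed in this frame.
v = np.array([0.7, -0.2])
covariant = E.T @ v
assert np.allclose(g_contra @ g_cov @ covariant, covariant)
```

With both metrics in hand, distances, angles and eigenvectors in the functional space become computable, which is the point of the proposal.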
[Figure 4.10: 'Tensorial interpretation of multi-unit data by the correlogram = covariant metric proposal'. Left panel: correlogram coefficients (r) of an array of n neurons (an n × n table); proposal: r_ab = g_ab (correlation coefficients = covariant metric).]