## Abstract

Bidirectionality, forward and backward information flow, is introduced in neural networks to produce two-way associative search for stored stimulus-response associations (*A _{i},B_{i}*). Two fields of neurons, *F _{A}* and *F _{B}*, are connected by an *n* × *p* synaptic matrix M. Passing information through M gives one direction; passing information through its transpose M^{T} gives the other. Every matrix is bidirectionally stable for bivalent and for continuous neurons. Paired data (*A _{i},B_{i}*) are encoded in M by summing bipolar correlation matrices. The bidirectional associative memory (BAM) behaves as a two-layer hierarchy of symmetrically connected neurons. When the neurons in *F _{A}* and *F _{B}* are activated, the network quickly evolves to a stable state of two-pattern reverberation, or pseudoadaptive resonance, for every connection topology M. The stable reverberation corresponds to a system energy local minimum. An adaptive BAM allows M to rapidly learn associations without supervision. Stable short-term memory reverberations across *F _{A}* and *F _{B}* gradually seep pattern information into the long-term memory connections M, allowing input associations (*A _{i},B_{i}*) to dig their own energy wells in the network state space. The BAM correlation encoding scheme is extended to a general Hebbian learning law. Then every BAM adaptively resonates in the sense that all nodes and edges quickly equilibrate in a system energy local minimum. A sampling adaptive BAM results when many more training samples are presented than there are neurons in *F _{A}* and *F _{B}*, but presented for brief pulses of learning, not allowing learning to fully or nearly converge. Learning tends to improve with sample size. Sampling adaptive BAMs can learn some simple continuous mappings and can rapidly abstract bivalent associations from several noisy gray-scale samples.

© 1987 Optical Society of America

## I. Introduction: Storing Data Pairs in Associative Memory Matrices

An *n* × *p* real matrix M can be interpreted as a matrix of synapses between two fields of neurons. The input or bottom-up field *F _{A}* consists of *n* neurons {*a*_{1}, …, *a _{n}*}. The output or top-down field *F _{B}* consists of *p* neurons {*b*_{1}, …, *b _{p}*}. The neurons *a _{i}* and *b _{j}* are the units of short-term memory (STM). For convenience, we shall use *a _{i}* and *b _{j}* to indicate both neuron names and neuron states. Matrix entry *m _{ij}* is the synaptic connection from *a _{i}* to *b _{j}*. It is the unit of long-term memory (LTM). The sign of *m _{ij}* determines the type of synaptic connection: excitatory if *m _{ij}* > 0, inhibitory if *m _{ij}* < 0. The magnitude of *m _{ij}* determines the strength of the connection. A real *n*-dimensional row vector **A** represents a state of *F _{A}*, a STM pattern of activity across the neurons *a*_{1}, …, *a _{n}*. A real *p*-dimensional row vector **B** represents a state of *F _{B}*. An associative memory is any vector space transformation *T*: *R ^{n}* → *R ^{p}*. Usually *T* is nonlinear. The matrix mapping M: *R ^{n}* → *R ^{p}* is a linear associative memory. When *F _{A}* and *F _{B}* are distinct, M is a heteroassociative memory. It stores vector data pairs (**A*** _{i}*, **B*** _{i}*). In the special case when *F _{A}* = *F _{B}*, M is an autoassociative memory. It stores data vectors **A*** _{i}*.

Recall proceeds through vector-matrix multiplication and nonlinear state transition. The *p*-vector **A**M is a fan-in vector of input sums to the neurons in *F _{B}*: **A**M = (*I*_{b1}, …, *I _{bp}*). Specifically, each neuron *a _{i}* fans out its numeric output *a _{i}* across each synaptic pathway *m _{ij}*, sending the gated product *a _{i}m_{ij}* to each neuron *b _{j}* in *F _{B}*. Each neuron *b _{j}* receives a fan-in of *n* gated products *a _{i}m_{ij}*, arriving independently and perhaps asynchronously, and sums them to compute its input *I _{bj}* = *a*_{1}*m*_{1*j*} + … + *a _{n}m_{nj}*. Neuron *b _{j}* processes input *I _{bj}* to produce the output signal *S*(*I _{bj}*). In general the signal function *S* is nonlinear, usually sigmoidal or S-shaped. The associative memory M recalls the vector of output signals [*S*(*I*_{b1}), …, *S*(*I _{bp}*)] when presented with input key **A**. In the simplest associative memories, linear associative memories, each neuron's output signal is simply its input signal: *S*(*I _{bj}*) = *I _{bj}*. Then associative recall is simply vector-matrix multiplication: **B** = **A**M.
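The fan-out/fan-in recall step can be sketched in a few lines of numpy. This is a minimal illustration, not from the paper; the 2 × 3 matrix M and the key A are hypothetical values.

```python
import numpy as np

# Hypothetical synaptic matrix M between F_A (n = 2 neurons) and F_B (p = 3 neurons).
M = np.array([[ 1.0, -2.0,  0.5],
              [-1.0,  3.0,  0.5]])

A = np.array([1.0, 1.0])        # STM state of F_A (row vector)

I_b = A @ M                     # fan-in vector of input sums (I_b1, ..., I_bp)

# Linear associative memory: each output signal equals its input sum.
B_linear = I_b

# Nonlinear recall with a sigmoidal (S-shaped) signal function S.
def S(x):
    return 1.0 / (1.0 + np.exp(-x))

B_sigmoid = S(I_b)

print(B_linear)                 # [0. 1. 1.]
print(np.round(B_sigmoid, 3))
```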

What is the simplest way to store *m* data pairs (*A*_{1},*B*_{1}), (*A*_{2},*B*_{2}), …, (*A _{m},B_{m}*) in an *n* × *p* associative memory matrix M? The simplest storage procedure is to convert each association (*A _{i},B_{i}*) into an *n* × *p* matrix M* _{i}*, then combine the association matrices M* _{i}* pointwise. The simplest pointwise combination technique is addition: M = M_{1} + … + M* _{m}*. The simplest operation for converting two row vectors **A*** _{i}* and **B*** _{i}* of dimensions *n* and *p* into an *n* × *p* matrix M* _{i}* is the vector outer product ${\mathbf{\text{A}}}_{i}^{T}{\mathbf{\text{B}}}_{i}$. So the simplest way to store the *m* pairs (**A*** _{i}*, **B*** _{i}*) is to sum outer-product or correlation matrices:

$$\text{M} = {\text{A}}_{1}^{T}{\text{B}}_{1} + {\text{A}}_{2}^{T}{\text{B}}_{2} + \dots + {\text{A}}_{m}^{T}{\text{B}}_{m}. \tag{1}$$

This is the classical correlation encoding scheme studied by Anderson *et al.*[3] If the input patterns **A**_{1}, …, **A*** _{m}* are orthonormal (${\mathbf{\text{A}}}_{i}{\mathbf{\text{A}}}_{j}^{T}=1$ if *i* = *j*, 0 if not), perfect recall of the associated output patterns {**B**_{1}, …, **B*** _{m}*} is achieved in the forward direction:

$${\text{A}}_{i}\,\text{M} = ({\text{A}}_{i}{\text{A}}_{i}^{T})\,{\text{B}}_{i} + \sum_{j \neq i}({\text{A}}_{i}{\text{A}}_{j}^{T})\,{\text{B}}_{j} = {\text{B}}_{i} + \sum_{j \neq i}({\text{A}}_{i}{\text{A}}_{j}^{T})\,{\text{B}}_{j}. \tag{2}$$

If **A**_{1}, …, **A*** _{m}* are not orthonormal, as in general they are not, the second term on the right-hand side of Eq. (2), the noise term, contributes crosstalk to the recalled pattern by additively modulating the signal term. More generally, as Kohonen[2] has shown, the least-squares optimal linear associative memory (OLAM) M is given by M = A\* B, where A is the *m* × *n* matrix whose *i*th row is A* _{i}*, B is the *m* × *p* matrix whose *i*th row is B* _{i}*, and A\* is the Moore-Penrose pseudoinverse of A. If {A_{1}, …, A* _{m}*} are orthonormal, the OLAM M = A^{T}B, which is equivalent to the memory scheme in Eq. (1).

## II. Discrete Bidirectional Associative Memory (BAM) Stability

Suppose we wish to synchronously feed back the recalled output B to an associative memory M to improve recall accuracy. The recalled output B is some nonlinear transformation *S* of the input sum A M: B = *S*(A M) = [*S*(A M^{1}), …, *S*(A M^{p})], where M* ^{j}* is the *j*th column of M. What is the simplest way to feed B back to the associative memory? Since M has dimensions *n* × *p* and **B** is a *p*-vector, **B** cannot vector multiply M, but it can multiply the matrix transpose (adjoint) M^{T}. Thus the simplest feedback scheme is to pass **B** backward through M^{T}. Any other feedback scheme requires more information in the form of a *p* × *n* matrix N different from M^{T}. Field *F _{A}* receives the top-down message B M^{T} and produces the new STM pattern ${A}^{\prime}=S(\mathbf{\text{B}}\,{\text{M}}^{T})=[S(\mathbf{\text{B}}\,{\text{M}}_{1}^{T}),\dots ,S(\mathbf{\text{B}}\,{\text{M}}_{n}^{T})]$ across *F _{A}*, where M* _{i}* is the *i*th row (column) of M (M^{T}). Carpenter[4] and Grossberg[5]–[9] interpret top-down signals as expectations in their adaptive resonance theory (ART). Intuitively, *A′* is what the field *F _{B}* expects to see when it receives bottom-up input *B*.

If *A′* is fed back through M, a new *B′* results, which can be fed back through M^{T} to produce *A″*, and so on. Ideally this back-and-forth flow of distributed information quickly equilibrates or resonates on a fixed data pair (*A _{f},B_{f}*), with *A _{f}* → M → *B _{f}* and *A _{f}* ← M^{T} ← *B _{f}*. If the feedback sequence always converges to some fixed pair (*A,B*), then M is said to be bidirectionally stable.[10],[11]

Which matrices are bidirectionally stable for which signal functions *S*? Linear associative memory matrices are obviously in general not bidirectionally stable. We shall limit our discussion to sigmoidal or S-shaped signal functions *S*, such as *S*(*x*) = (1 + *e*^{−*x*})^{−1}, or more generally, to bounded monotone increasing signal functions. Grossberg[12] long ago showed that this is not a limitation at all. He proved that, roughly speaking, a sigmoidal signal function is optimal in the sense that, in unidirectional competitive networks, it computes a quenching threshold below which neural activity is suppressed as noise and above which activity is contrast enhanced and then stored as a stable reverberation in STM. In particular, linear signal functions amplify noise as faithfully as they amplify signals. This theoretical fact reflects the evolutionary fact that real neuron firing frequency is sigmoidal.

First we consider bivalent, or McCulloch-Pitts,[13] neurons. Each neuron *a _{i}* and *b _{j}* is either on (+1) or off (0 or −1) at any time. Hence a state *A* of *F _{A}* is a point in the Boolean *n*-cube *B ^{n}* = {0,1}^{n} or {−1,1}^{n}. A state *B* of *F _{B}* is a point in *B ^{p}* = {0,1}^{p} or {−1,1}^{p}. A state of the bidirectional associative memory (BAM) (*F _{A}*, M, *F _{B}*) is a point (*A,B*) in the bivalent product space *B ^{n}* × *B ^{p}*. Topologically, a BAM can be viewed as a two-layer hierarchy of symmetrically connected fields, with M carrying signals from *F _{A}* up to *F _{B}* and M^{T} carrying them back down.

What is the simplest signal function *S* for a bivalent BAM (*F _{A}*, M, *F _{B}*)? The simplest *S* is a threshold function:

$$a_{i} = \begin{cases} 1 & \text{if}\ \mathbf{\text{B}}\,{\text{M}}_{i}^{T} > 0, \\ a_{i} & \text{if}\ \mathbf{\text{B}}\,{\text{M}}_{i}^{T} = 0, \\ 0\ (-1) & \text{if}\ \mathbf{\text{B}}\,{\text{M}}_{i}^{T} < 0, \end{cases} \tag{3}$$

$$b_{j} = \begin{cases} 1 & \text{if}\ \mathbf{\text{A}}\,{\text{M}}^{j} > 0, \\ b_{j} & \text{if}\ \mathbf{\text{A}}\,{\text{M}}^{j} = 0, \\ 0\ (-1) & \text{if}\ \mathbf{\text{A}}\,{\text{M}}^{j} < 0, \end{cases} \tag{4}$$

where M* _{i}* is the *i*th row (column) of M (M^{T}) and M* ^{j}* is the *j*th column (row) of M (M^{T}). If the input sum to a neuron equals its threshold 0, the neuron maintains its current state: it stays on if it already is on, off if already off. For simplicity, each neuron has threshold 0 and no external inputs. In general, *a _{i}* has a numeric threshold *T _{i}* and constant numeric input *I _{i}*; *b _{j}* has threshold *S _{j}* and input *J _{j}*. A bivalent BAM is then specified by the vector 7-tuple (*F _{A}*, T, I, M, S, J, *F _{B}*), and the threshold laws (3) and (4) are modified accordingly; e.g., *a _{i}* = 1 if $\mathbf{\text{B}}\,{M}_{i}^{T} + {I}_{i} > {T}_{i}$.

Which matrices M are bidirectionally stable for bivalent BAMs? All matrices. Every synaptic connection topology rapidly equilibrates, no matter how large the dimensions *n* and *p*. This surprising theorem is proved in Refs. [11] and [14] and generalizes the well-known unidirectional stability of autoassociative networks with square symmetric M, as popularized by Hopfield[15] and reviewed below. Bidirectionality, forward and backward information flow, in neural nets produces two-way associative search for the stored pair (**A*** _{i}*, **B*** _{i}*) nearest to an input key. Since every matrix is bidirectionally stable, many more matrices can be decoded than those in which information has been deliberately encoded.

_{i}When the BAM neurons are activated, the network quickly evolves to a stable state of two-pattern reverberation, or nonadaptive resonance.[4],[7] The resonance is nonadaptive because no learning occurs. The weights *m _{ij}* are fixed. This behavior approximates equilibrium behavior in a learning context since changes in the synapses (LTM traces)

*m*are invariably slower than changes in the neuron activations (STM traces)

_{ij}*a*and

_{i}*b*. Below we shall exploit this property to construct adaptive BAMs.

The stable reverberation corresponds to a system energy local minimum. Geometrically, an input pattern is placed on the BAM energy surface as a ball bearing in the bivalent product space *B ^{n}* × *B ^{p}*. In particular, the bipolar correlation encoding scheme described below sculpts the energy surface so that the data pairs (*A _{i},B_{i}*) are stored as local energy minima. The input ball bearing rolls down into the nearest basin of attraction, dissipating energy as it rolls. Frictional damping brings it to rest at the bottom of the energy well, and the pattern is classified or misclassified accordingly. Thus the BAM behaves as a programmable dissipative dynamic system.

_{i}, B_{i}For completeness we review the proof[10],[11] that every matrix is bivalently bidirectionally stable. The proof technique is to show that some system functional *E*:*B ^{n}* ×

*B*→

^{p}*R*is a Lyapunov function or bounded monotone decreasing energy function for the network. The energy function decreases if state changes occur. System stability occurs when the functional

*E*rapidly obtains its lower bound, where it stays forever. Lyapunov functionals provide a shortcut to the global analysis of nonlinear dynamic systems, sidestepping the often hopeless task of solving the many coupled nonlinear difference or differential equations. The most general Lyapunov stability result is the Cohen-Grossberg theorem[16] for symmetric undirectional autoassociators, which we extend in this and the next section to arbitrary bidirectional heteroassociators. The Lyapunov trick of the Cohen-Grossberg theorem is to substitute the neuron state-transition equations into the derivative of the appropriate energy function, and then use a sign argument to show that the derivative is always nonpositive. Hopfield[15] used the discrete version of this Lyapunov trick to show that zero-diagonal symmetric unidirectional autoassociators are stable for asynchronous or serial state changes, i.e., where at any moment at most one neuron changes state. The argument we now present subsumes this case when

*F*=

_{A}*F*and M = M

_{B}*in simple asynchronous operation. An appropriate measure of the energy of the bivalent (A,B) is the sum (average) of two energies: the energy A M B*

^{T}*of the forward pass and the energy B M*

^{T}*A*

^{T}*of the backward pass. Taking the negative of these quadratic forms gives*

^{T}*T*=

_{i}*S*= 0 and inputs

_{j}*I*=

_{i}*J*= 0, which we shall assume for simplicity. In general the appropriate energy function includes thresholds and inputs linearly:

BAM convergence is proved by showing that synchronous or asynchronous state changes decrease the energy and that the energy is bounded below, so the BAM monotonically gravitates to fixed points. *E* is trivially bounded below for all A and B:

$$E(\text{A},\text{B}) \geq -\sum_{i=1}^{n}\sum_{j=1}^{p} |m_{ij}|.$$

Synchronous vs asynchronous state changes must be clarified. Synchronous behavior occurs when all or some neurons within a field change their state at the same clock cycle. Asynchronous behavior is a special case. Simple asynchronous behavior occurs when only one neuron per field changes state per cycle. Subset asynchronous behavior occurs when some proper subset of neurons within a field changes state per cycle. These definitions of asynchrony are cross-sectional. The resultant time-series interpretation of asynchronous behavior is that each neuron in a field randomly and independently changes state, converting the BAM network into a stochastic process. In the proof below we do not assume that changes occur concurrently in the two fields *F _{A}* and *F _{B}*; otherwise, in principle the energy function might increase. Examination of the argument below shows, though, that this is very unlikely in large networks, since so many additive terms in the energy differential are always negative. In any event, the BAM model of back-and-forth information flow we have been developing implicitly assumes that state changes occur in at most one field, *F _{A}* or *F _{B}*, at a time. Further, the Lyapunov argument below shows that synchronous operation produces sums of pointwise (neuronwise) energy changes that can be large. In practice this means synchronous updates produce much faster convergence than asynchronous updates.

First we consider state changes in field *F _{A}*. A similar argument holds for changes in *F _{B}*. A field *F _{A}* change is denoted by Δ*A* = *A*_{2} − *A*_{1} = (Δ*a*_{1}, …, Δ*a _{n}*) and the energy change by Δ*E* = *E*_{2} − *E*_{1}. Hence Δ*a _{i}* = −1, 0, or +1 for a binary neuron. Then

$$\Delta E = -\Delta\text{A}\,\text{M}\,\text{B}^{T} = -\sum_{i=1}^{n} \Delta a_{i}\,\mathbf{\text{B}}\,{\text{M}}_{i}^{T}.$$

If Δ*a _{i}* > 0, the state transition law (3) above implies $\text{B}\,{M}_{i}^{T}>0$. If Δ*a _{i}* < 0, Eq. (3) implies $\text{B}\,{M}_{i}^{T}<0$. Hence each nonzero state change and its input sum agree in sign, so their product is positive: $\Delta {a}_{i}\,\text{B}\,{M}_{i}^{T}>0$. Hence Δ*E* < 0. Similarly, the sign law (4) for *b _{j}* implies Δ*E* = −*A* M ΔB^{T} < 0. Since M was an arbitrary *n* × *p* real matrix, this proves that every matrix is bivalently bidirectionally stable.
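The sign argument can be checked numerically: asynchronous updates under laws (3) and (4) never increase the energy E = −A M B^T. A small randomized sketch (the matrix and states are random, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(A, M, B):
    return -A @ M @ B           # E(A, B) = -A M B^T

n, p = 6, 4
M = rng.integers(-2, 3, size=(n, p)).astype(float)   # arbitrary real matrix
A = rng.integers(0, 2, size=n).astype(float)          # binary F_A state
B = rng.integers(0, 2, size=p).astype(float)          # binary F_B state

E0 = energy(A, M, B)
E = E0
for _ in range(50):             # simple asynchronous updates, one neuron at a time
    i = rng.integers(n)
    s = B @ M[i]                # input sum B M_i^T
    if s != 0:
        A[i] = 1.0 if s > 0 else 0.0
    j = rng.integers(p)
    s = A @ M[:, j]             # input sum A M^j
    if s != 0:
        B[j] = 1.0 if s > 0 else 0.0
    E_new = energy(A, M, B)
    assert E_new <= E           # the energy never increases
    E = E_new

print("initial energy:", E0, "final energy:", E)
```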

## III. BAM Correlation Encoding

Which BAM matrix M best encodes *m* binary pairs (*A _{i},B_{i}*)? The correlation encoding scheme in Eq. (1) suggests adding the outer-product matrices ${\text{A}}_{i}^{T}\,{\text{B}}_{i}$ pointwise, at least to facilitate forward recall. Will this work for backward recall? The linearity of the transpose operator implies that it will:

$$\text{M}^{T} = ({\text{A}}_{1}^{T}{\text{B}}_{1} + \dots + {\text{A}}_{m}^{T}{\text{B}}_{m})^{T} = {\text{B}}_{1}^{T}{\text{A}}_{1} + \dots + {\text{B}}_{m}^{T}{\text{A}}_{m}.$$

However, binary outer products trivialize the BAM dynamics. Every entry of each matrix ${\text{A}}_{i}^{T}{\text{B}}_{i}$ is 0 or 1, so every entry *m _{ij}* of M, and hence every input sum $\mathbf{\text{B}}\,{\text{M}}_{i}^{T}$ and $\mathbf{\text{A}}\,{\text{M}}^{j}$, can never be negative. So the state transition laws (3) and (4) imply that *a _{i}* = *b _{j}* = 1 once *a _{i}* and *b _{j}* turn on, which they probably will after the first update. Exceptions can occur for initial null vectors or a null matrix M, when *a _{i}* = *b _{j}* = 0.

_{j}Bipolar state vectors do not produce this problem. Suppose (*X _{i},Y_{i}*) is the bipolar version of the binary pair (

*A*), i.e., binary zeros are replaced with minus ones, i.e.,

_{i},B_{i}*X*= 2

_{i}*A*−

_{i}*I*and

*Y*= 2

_{i}*B*−

_{i}*I*, where

*I*is a unit vector of

*n*-many or

*p*-many ones. Then the

*ij*th entry of ${\text{X}}_{k}^{T}{\text{Y}}_{k}$ is excitatory (+1) if the vector elements ${x}_{i}^{k}$ and ${y}_{j}^{k}$ agree in sign, inhibitory (−1) if they disagree in sign. This is simple conjunctive or Hebbian correlation learning. Thus the sum M of bipolar outer-product matrices

*by binary or bipolar vectors produces input sums of different signs, so Eqs. (3) and (4) are not trivialized.*

Note that to encode *m* binary vectors **A**_{1}, …, **A*** _{m}* in a unidirectional autoassociative memory matrix, Eq. (8) reduces to the symmetric matrix ${\text{X}}_{1}^{T}{\text{X}}_{1}\phantom{\rule{0.2em}{0ex}}+\phantom{\rule{0.2em}{0ex}}\dots \phantom{\rule{0.2em}{0ex}}+\phantom{\rule{0.2em}{0ex}}{\text{X}}_{m}^{T}{\text{X}}_{m}$, which is the storage mechanism used by Hopfield[15] (who also zeros the main diagonal to improve recall). Note also that the pair (A* _{i}*,B* _{i}*) can be unlearned or forgotten (erased) by summing $-{\text{X}}_{i}^{T}{\text{Y}}_{i}$ or, equivalently, by encoding $({\text{A}}_{i}^{c},{\text{B}}_{i})$ or $({\text{A}}_{i},{\text{B}}_{i}^{c})$, since bipolar complements are given by ${\text{X}}_{i}^{c}=-\phantom{\rule{0em}{0ex}}{\text{X}}_{i}$ and ${\text{Y}}_{i}^{c}=-\phantom{\rule{0em}{0ex}}{\text{Y}}_{i}$. Equation (8) thus allows data to be read, written, or erased from memory. Further, ${({\text{X}}_{i}^{c})}^{T}{\text{Y}}_{i}^{c}={\text{X}}_{i}^{T}{\text{Y}}_{i}$, so storing (A* _{i}*,B* _{i}*) through Eq. (8) implies storing $({\text{A}}_{i}^{c},{\text{B}}_{i}^{c})$ as well.
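Eq. (8), the erasure property, and the complement identity are easy to demonstrate; the two binary pairs below are hypothetical:

```python
import numpy as np

def bipolar(v):
    return 2.0 * np.asarray(v, dtype=float) - 1.0   # X = 2A - I

# Two hypothetical binary pairs (A_i, B_i).
A_pairs = [np.array([1, 0, 1, 0]), np.array([1, 1, 0, 0])]
B_pairs = [np.array([1, 0, 1]),    np.array([0, 1, 1])]

# Eq. (8): sum of bipolar outer-product matrices.
M = sum(np.outer(bipolar(A), bipolar(B))
        for A, B in zip(A_pairs, B_pairs))

# Unlearning (A_1, B_1) means summing -X_1^T Y_1 ...
M_erased = M - np.outer(bipolar(A_pairs[0]), bipolar(B_pairs[0]))

# ... which leaves exactly the encoding of the remaining pair.
assert np.array_equal(M_erased,
                      np.outer(bipolar(A_pairs[1]), bipolar(B_pairs[1])))

# Complement pairs are stored automatically: (X^c)^T Y^c = X^T Y.
X1, Y1 = bipolar(A_pairs[0]), bipolar(B_pairs[0])
assert np.array_equal(np.outer(-X1, -Y1), np.outer(X1, Y1))
```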

_{i}Strictly speaking bipolar correlation learning laws such as Eq. (8) can be biologically implausible. They imply that synapses can change character from excitatory to inhibitory, or inhibitory to excitatory, with successive experience. This is seldom observed with real synapses. However, when the number of stored patterns *m* is fairly large, |*m _{ij}*| > 0 tends to hold. So the addition or deletion of relatively few patterns does not on average change the sign of

*m*.

Is it better to use binary or bipolar state vectors for recall from Eq. (8)? In Ref. [10] we prove that bipolar coding is better on average. Much of the argument can be seen from the properties of the bipolar signal–noise expansion

$${\text{X}}_{i}\,\text{M} = ({\text{X}}_{i}{\text{X}}_{i}^{T})\,{\text{Y}}_{i} + \sum_{j \neq i}({\text{X}}_{i}{\text{X}}_{j}^{T})\,{\text{Y}}_{j} = n\,{\text{Y}}_{i} + \sum_{j \neq i} c_{ij}\,{\text{Y}}_{j}. \tag{9}$$

The *c _{ij}* are correction coefficients. Ideally the *c _{ij}* will behave in sign and magnitude so as to move Y* _{j}* closer to Y* _{i}* and give Y* _{j}* more positive weight the closer Y* _{j}* is to Y* _{i}*. Then the right-hand side of Eq. (9) will tend to equal a positive multiple of Y* _{i}* and thus threshold to Y* _{i}* or B* _{i}*. When the input X is nearer X* _{i}* than all other X* _{j}*, the subsequent output Y should tend to be nearer Y* _{i}* than all other Y* _{j}*. When Y is fed back through M^{T}, the output X′ should tend to be even closer to X* _{i}* than X was, and so on. Combining this argument with the signal–noise expansion (9) and its transpose-based backward analog, we obtain an estimate of the BAM storage capacity for reliable recall: *m* < min(*n,p*). No more data pairs can be stored and accurately recalled than the lesser of the two vector dimensions used.

This analysis explains much BAM behavior without Lyapunov techniques. However, such accurate decoding implicitly assumes that if stored input patterns are close, stored output patterns are close. Specifically, we make the continuity assumption

$$\text{H}({\text{A}}_{i},{\text{A}}_{j})/n \approx \text{H}({\text{B}}_{i},{\text{B}}_{j})/p, \tag{10}$$

where H denotes Hamming distance, i.e., *l*^{1} distance. This is an implicit assumption of continuous mapping networks. When a data set substantially violates it, as in the parity mapping, which indicates whether there is an even or odd number of ones in a bit vector, supervised learning techniques such as backward error propagation[17]–[20] are preferable.

Do the correction coefficients *c _{ij}* behave as desired? They do, when (10) holds, in the sense that they naturally connect the bipolar and binary spaces:

$$c_{ij} = {\text{X}}_{i}{\text{X}}_{j}^{T} = n - 2\,\text{H}({\text{A}}_{i},{\text{A}}_{j}). \tag{11}$$

Hence if A* _{j}* is more than half the space away, so to speak, from A* _{i}*, and thus by (10) if B* _{j}* is approximately more than half the space away from B* _{i}*, the negative sign of *c _{ij}* corrects Y* _{j}* by converting it to ${\text{Y}}_{j}^{c}$, which is a better approximation of Y* _{i}* since ${\text{B}}_{j}^{c}$ is approximately less than half the space away from B* _{i}*. The magnitude of *c _{ij}* then further corrects Y* _{j}* by directly approaching the maximum signal amplification factor, *n*, as $\text{H}({\text{B}}_{i},{\text{B}}_{j}^{c})$ approaches 0. If A* _{j}* is less than half the space away from A* _{i}*, then *c _{ij}* > 0 and *c _{ij}* approaches *n* as H(B* _{i}*,B* _{j}*) approaches 0. If A* _{j}* is equidistant between A* _{i}* and ${\text{A}}_{i}^{c}$, then *c _{ij}* = 0. Finally, bipolar coding of state vectors is better on average than binary coding in the sense that on average the *c _{ij}* always correct better in magnitude than the mixed coefficients ${\text{A}}_{i}\,{\text{X}}_{j}^{T}$, and sometimes the mixed coefficients can have the wrong sign.

Consider a simple example. Suppose we wish to store the two binary pairs given by

A_{1} = (1 0 1 0 1 0), B_{1} = (1 1 0 0),
A_{2} = (1 1 1 0 0 0), B_{2} = (1 0 1 0),

which approximately obey the continuity assumption (10): (1/6)H(A_{1},A_{2}) = 1/3 ∼ 1/2 = (1/4)H(B_{1},B_{2}). Convert these binary pairs to bipolar pairs:

X_{1} = (1 −1 1 −1 1 −1), Y_{1} = (1 1 −1 −1),
X_{2} = (1 1 1 −1 −1 −1), Y_{2} = (1 −1 1 −1).

Convert the bipolar pairs (X* _{i}*,Y* _{i}*) to correlation matrices ${\text{X}}_{i}^{T}{\text{Y}}_{i}$ and sum them:

$$\text{M} = {\text{X}}_{1}^{T}{\text{Y}}_{1} + {\text{X}}_{2}^{T}{\text{Y}}_{2} = \begin{bmatrix} 2 & 0 & 0 & -2 \\ 0 & -2 & 2 & 0 \\ 2 & 0 & 0 & -2 \\ -2 & 0 & 0 & 2 \\ 0 & 2 & -2 & 0 \\ -2 & 0 & 0 & 2 \end{bmatrix}.$$

Suppose a recall key differs from A_{2} by 1 bit. In particular, suppose we present an input A = (0 1 1 0 0 0) to the BAM. Then A M = (2 −2 2 −2), which thresholds to B_{2} = (1 0 1 0), and B_{2} M^{T} = (2 2 2 −2 −2 −2), which thresholds to A_{2}. The BAM converges to the resonant pair (A_{2},B_{2}) with initial energy *E*(A,B_{2}) = −4. Now suppose an input A = (0 0 0 1 1 0) is presented to the BAM. Since H(A,A_{1}) = 3 < 5 = H(A,A_{2}), we might expect A to evoke the resonant pair (A_{1},B_{1}). In fact A M = (−2 2 −2 2), which thresholds to ${\text{B}}_{2}^{c}$: since H(A,${\text{A}}_{2}^{c}$) = 1, the BAM converges to the complement pair $({\text{A}}_{2}^{c},{\text{B}}_{2}^{c})$, which Eq. (8) stores along with each pair (A* _{i}*,B* _{i}*).
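The two-pair example can be verified numerically. This sketch assumes the stored pairs are those reconstructed above, A_1 = (101010), B_1 = (1100), A_2 = (111000), B_2 = (1010); only the two input keys come directly from the text.

```python
import numpy as np

def bipolar(v):
    return 2.0 * np.asarray(v, dtype=float) - 1.0

def threshold(x, prev):
    out = prev.copy()
    out[x > 0] = 1.0
    out[x < 0] = 0.0
    return out

# The stored binary pairs (as reconstructed in the example above).
A1, B1 = np.array([1, 0, 1, 0, 1, 0.]), np.array([1, 1, 0, 0.])
A2, B2 = np.array([1, 1, 1, 0, 0, 0.]), np.array([1, 0, 1, 0.])

# Eq. (8): bipolar correlation encoding.
M = np.outer(bipolar(A1), bipolar(B1)) + np.outer(bipolar(A2), bipolar(B2))

# A key one bit from A_2 recalls B_2, with initial energy E(A, B_2) = -4.
A = np.array([0, 1, 1, 0, 0, 0.])
B = threshold(A @ M, np.zeros(4))
assert np.array_equal(B, B2)
assert -A @ M @ B == -4.0            # E(A, B) = -A M B^T

# A key closer to A_1 than A_2 nevertheless lands nearest A_2^c (1 bit away)
# and so evokes the stored complement pair (A_2^c, B_2^c).
A = np.array([0, 0, 0, 1, 1, 0.])
B = threshold(A @ M, np.zeros(4))
assert np.array_equal(B, 1.0 - B2)   # B_2^c
```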

_{i}Figure 1 displays snapshots of asynchronous BAM recall. Approximately six neurons update between snapshots. The spatial alphabetic associations (*S,E*), (*M,V*), and (*G,N*) are stored. *F _{A}* contains

*n*= 10 × 14 = 140 neurons.

*F*contains

_{B}*p*= 9 × 12 = 108 neurons. A 40% noise corrupted version (99 bits randomly flipped) of (

*S,E*) is presented to the BAM and (

*S,E*) is perfectly recalled, illustrating the global order-from-chaos aesthetic appeal of asynchronous BAM operation.

BAMs are also natural structures for optical implementation. Perhaps the simplest all-optical implementation is a holographic resonator with M housed in a transmission hologram sandwiched between two phase-conjugate mirrors. Figures 2 and 3 display two different optical BAMs discussed in [Ref. 21]. Figure 2 displays a simple matrix–vector multiplier BAM with M represented by a 2-D grid of pixels with varying transmittances. Figure 3 displays a BAM based on a volume reflection hologram. The box labeled threshold device accepts a weak signal image on one side and produces an intensified and contrast-enhanced version of the image on its output side. The Hughes liquid crystal light valve or two-wave mixing are two ways to implement such a device. Note that the configuration requires the hologram to be read with light of two different polarizations. Hence diffraction efficiency of holograms recorded as birefringence patterns in photorefractive crystals will be somewhat compromised.

## IV. Continuous BAMs

A continuous BAM[10],[11] is specified by, for example, the additive dynamic system

$$\dot{a}_{i} = -a_{i} + \sum_{j=1}^{p} S(b_{j})\,m_{ij} + I_{i}, \tag{14}$$

$$\dot{b}_{j} = -b_{j} + \sum_{i=1}^{n} S(a_{i})\,m_{ij} + J_{j}. \tag{15}$$

The activations *a _{i}* and *b _{j}* can take on arbitrary real values. *S* is a sigmoid signal function. More generally, we shall only assume that *S* is bounded and strictly monotone increasing, so that *S′* = *dS*(*x*)/*dx* > 0. For definiteness, we assume all signals *S*(*x*) are in [0,1] or [−1,1], so that the output (observable) state of the BAM is a trajectory in the product unit hypercube *I ^{n}* × *I ^{p}*, where *I ^{n}* = [0,1]^{n} or [−1,1]^{n}. For example, in the simulations below we use the bipolar logistic sigmoid *S*(*x*) = 2(1 + *e*^{−*cx*})^{−1} − 1 for *c* > 0. *I _{i}* and *J _{j}* are constant external inputs.

The first terms on the right-hand sides of Eqs. (14) and (15) are STM passive decay terms. The second terms are the endogenous feedback terms; each sums gated bipolar signals from all neurons in the opposite field. The third terms are the exogenous inputs, which are assumed to change so slowly relative to the STM reaction times that they are constant. Of course both right-hand sides of Eqs. (14) and (15) are in general multiplied by time constants, as is each term. We omit these constants for notational convenience.
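The additive dynamics (14) and (15) can be sketched with simple Euler integration; the gain, step size, and random matrix here are my assumptions, chosen only to illustrate convergence to an equilibrium.

```python
import numpy as np

rng = np.random.default_rng(1)

def S(x, c=5.0):
    """Bipolar logistic signal function, values in (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-c * x)) - 1.0

n, p, dt = 5, 3, 0.01
M = rng.standard_normal((n, p))          # arbitrary real connection matrix
I, J = np.zeros(n), np.zeros(p)          # constant external inputs
a = rng.standard_normal(n)               # F_A activations
b = rng.standard_normal(p)               # F_B activations

for _ in range(20000):                   # Euler-integrate Eqs. (14) and (15)
    da = -a + S(b) @ M.T + I             # Eq. (14): decay + feedback + input
    db = -b + S(a) @ M + J               # Eq. (15)
    a, b = a + dt * da, b + dt * db

# At equilibrium both derivatives vanish: the BAM is bidirectionally stable.
assert np.allclose(-a + S(b) @ M.T + I, 0, atol=1e-2)
assert np.allclose(-b + S(a) @ M + J, 0, atol=1e-2)
```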

The additive model [Eqs. (14) and (15)] can be extended to a shunting[8] or multiplicative model that allows multiplicative self-excitation through the term $({\text{A}}_{i}\phantom{\rule{0.2em}{0ex}}-\phantom{\rule{0.2em}{0ex}}{a}_{i})\phantom{\rule{0.2em}{0ex}}[S({a}_{i})\phantom{\rule{0.2em}{0ex}}+\phantom{\rule{0.2em}{0ex}}{I}_{i}^{E}]$ and multiplicative cross-inhibition through a similar term, where *A _{i}* (*B _{j}*) is the positive upper bound on the activation of *a _{i}* (*b _{j}*), and ${I}_{i}^{I}({J}_{j}^{I})$ and ${I}_{i}^{E}({J}_{j}^{E})$ are the respective constant non-negative inhibitory and excitatory inputs to *a _{i}* (*b _{j}*). The shunting model can then be written

$$\dot{a}_{i} = -a_{i} + (A_{i} - a_{i})\,[S(a_{i}) + I_{i}^{E}] - a_{i}\Big[\sum_{j=1}^{p} S(b_{j})\,m_{ij} + I_{i}^{I}\Big], \tag{16}$$

$$\dot{b}_{j} = -b_{j} + (B_{j} - b_{j})\,[S(b_{j}) + J_{j}^{E}] - b_{j}\Big[\sum_{i=1}^{n} S(a_{i})\,m_{ij} + J_{j}^{I}\Big]. \tag{17}$$

The inhibitory shunting factor *a _{i}* (*b _{j}*) can be replaced with *C _{i}* + *a _{i}* (*D _{j}* + *b _{j}*), where *C _{i}* (*D _{j}*) is a non-negative constant. Then the range of *a _{i}* (*b _{j}*) is the interval [−*C _{i},A_{i}*] ([−*D _{j},B_{j}*]). The bidirectional stability of systems (16) and (17) follows from the same source of stability as the additive model: the bidirectional/heteroassociative extension of the Cohen-Grossberg theorem.[16] The thrust of this extension is to symmetrize the arbitrary rectangular connection matrix M by forming the zero-block-diagonal matrix

$$\text{N} = \begin{bmatrix} 0 & \text{M} \\ \text{M}^{T} & 0 \end{bmatrix},$$

which is symmetric: N = N^{T}. Thus the bidirectional heteroassociative procedure is converted to a large-scale unidirectional autoassociative procedure acting on the augmented state vectors **C** = [**A** | **B**], for which the Cohen-Grossberg theorem applies. The subsumption of the unidirectional version of Eqs. (16) and (17) by fixed-weight competitive networks is discussed in Ref. [16]. The Cohen-Grossberg theorem is further extended in the next section when we prove the stability of adaptive BAMs. For simplicity we shall continue to analyze only the additive model, which subsumes the symmetric unidirectional autoassociative circuit model put forth by Hopfield[22] when M = M^{T}.
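The symmetrization trick is easy to state in code. A sketch with an arbitrary random matrix, checking that one pass of the augmented state C = [A | B] through N splits into the usual forward and backward BAM passes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 3
M = rng.standard_normal((n, p))          # arbitrary rectangular matrix

# Zero-block-diagonal augmented matrix N.
N = np.block([[np.zeros((n, n)), M],
              [M.T, np.zeros((p, p))]])
assert np.array_equal(N, N.T)            # N is symmetric

# One autoassociative update of C = [A | B] through N decomposes into the
# backward pass B M^T (to F_A) and the forward pass A M (to F_B).
A, B = rng.standard_normal(n), rng.standard_normal(p)
C = np.concatenate([A, B])
out = C @ N
assert np.allclose(out[:n], B @ M.T)     # F_A receives the top-down message
assert np.allclose(out[n:], A @ M)       # F_B receives the bottom-up message
```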

As shown by Kosko,[10],[11] the appropriate bounded Lyapunov or energy function *E* for the additive BAM system [Eqs. (14) and (15)] is

$$E(\text{A},\text{B}) = -\sum_{i=1}^{n}\sum_{j=1}^{p} S(a_{i})\,S(b_{j})\,m_{ij} + \sum_{i=1}^{n}\int_{0}^{a_{i}} S^{\prime}(x)\,x\,dx + \sum_{j=1}^{p}\int_{0}^{b_{j}} S^{\prime}(y)\,y\,dy - \sum_{i=1}^{n} I_{i}\,S(a_{i}) - \sum_{j=1}^{p} J_{j}\,S(b_{j}). \tag{18}$$

The time derivative of *E* is computed term by term. The objective is to factor out *S′*(*a _{i}*) ${\dot{a}}_{i}$ from terms involving inputs to *a _{i}*, and *S′*(*b _{j}*) ${\dot{b}}_{j}$ from terms involving inputs to *b _{j}*, regroup, and then substitute in the STM Eqs. (14) and (15). The time derivative of the integrals is equivalent to the sum of the time derivatives of *F*[*a _{i}*(*t*)] for the *F _{A}* terms and of *G*[*b _{j}*(*t*)] for the *F _{B}* terms. The chain rule gives *dF/dt* = (*dF*/*da _{i}*)(*da _{i}*/*dt*) = ${S}^{\prime}({a}_{i})\,{\dot{a}}_{i}\,{a}_{i}$. The *F _{A}* input term gives *S′*(*a _{i}*) ${\dot{a}}_{i}$ *I _{i}*. The product rule of differentiation is used to compute the time derivative of the quadratic form, which gives the sum of the two endogenous feedback terms in Eqs. (14) and (15), modulated by the respective factors *S′*(*a _{i}*) ${\dot{a}}_{i}$ and *S′*(*b _{j}*) ${\dot{b}}_{j}$. Rearrangement then gives

$$\dot{E} = -\sum_{i=1}^{n} S^{\prime}(a_{i})\,{\dot{a}}_{i}\Big[-a_{i} + \sum_{j=1}^{p} S(b_{j})\,m_{ij} + I_{i}\Big] - \sum_{j=1}^{p} S^{\prime}(b_{j})\,{\dot{b}}_{j}\Big[-b_{j} + \sum_{i=1}^{n} S(a_{i})\,m_{ij} + J_{j}\Big] = -\sum_{i=1}^{n} S^{\prime}(a_{i})\,{\dot{a}}_{i}^{2} - \sum_{j=1}^{p} S^{\prime}(b_{j})\,{\dot{b}}_{j}^{2} \leq 0. \tag{19}$$

Since *S′* > 0, Eq. (19) implies that *Ė* = 0 if and only if ${\dot{a}}_{i}={\dot{b}}_{j}=0$ for all *i* and *j*. At equilibrium all activations and signals are constant. Since M was an arbitrary *n* × *p* real matrix, this proves that every matrix is continuously bidirectionally stable.

As Hopfield[22] has noted, in the high-gain case when the sigmoid signal function *S* is steep, the integral terms vanish from Eq. (18). Then the equilibria of the continuous energy *E* in Eq. (18) are the same as those of the bivalent energy *E* in Eq. (5), namely, the vertices of the product unit hypercube *I ^{n}* × *I ^{p}* or, equivalently, the binary points in *B ^{n}* × *B ^{p}*. Continuous BAM convergence then has an intuitive fuzzy set interpretation. A fuzzy set is simply a point in the unit hypercube *I ^{n}* or *I ^{p}*. Each component of the fuzzy set is a fit[14] (rather than bit) value, indicating the degree to which that element fits in or belongs to the subset. The midpoint of the unit hypercube, (1/2, 1/2, …, 1/2), has maximum fuzzy entropy,[14] and the binary vertices have minimum fuzzy entropy. In a continuous BAM the trajectory of an initial input pattern, an ambiguous or fuzzy key vector, runs from somewhere inside *I ^{n}* × *I ^{p}* to the nearest product-space binary vertex. Hence this disambiguation process is precisely the minimization of fuzzy entropy.[11],[14]

## V. Adaptive BAMs

BAM convergence is quick and robust when M is constant. Any connection topology always rapidly produces a stable contrast-enhanced STM reverberation across *F _{A}* and *F _{B}*. This stable STM reverberation is not achieved with a lateral-inhibition or competitive[12],[23] connection topology within the *F _{A}* and *F _{B}* fields, as it is in the adaptive resonance model,[4] since there are no connections within *F _{A}* and *F _{B}*. The idea behind an adaptive BAM is to gradually let some of this stable STM reverberation seep into the LTM connections M. Since the BAM rapidly converges, and since in learning the STM variables *a _{i}* and *b _{j}* invariably change faster than the LTM variables *m _{ij}*, it seems reasonable that some type of convergence should occur if the *m _{ij}* change gradually relative to *a _{i}* and *b _{j}*. Such convergence depends on the choice of learning law for *m _{ij}*.

_{ij}In this section we show that, if *m _{ij}* adapts according to a generalized Hebbian learning law, every BAM adaptively resonates in the sense that all nodes (STM traces) and edges (LTM traces) quickly equilibrate. This real-time learning result extends the Lyapunov approach to the product space

*I ^{n}* × *I ^{p}* × *R ^{n×p}*. The LTM traces *m _{ij}* tend to learn the associations (*A _{i}*,*B _{i}*) in unsupervised fashion simply by presenting *A _{i}* to the bottom-up field of nodes *F _{A}* and simultaneously presenting *B _{i}* to the top-down field of nodes *F _{B}*. Input patterns sculpt their own attractor basins in which to reverberate. In addition to simple heteroassociative storage and recall, simulation results show that a pure bivalent association (*A _{i}*,*B _{i}*) can be quickly learned, or abstracted from, noisy gray-scale samples of (*A _{i}*,*B _{i}*). Many continuous mappings, such as rotation mappings, can also be learned by sampling instantiations of the mappings, often more instantiations than permitted by the storage capacity constraint *m* < min(*n*,*p*) for simple heteroassociative storage.
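A minimal sketch of simple heteroassociative storage and bidirectional recall: bipolar correlation matrices are summed into M, and recall passes thresholded signals forward through M and backward through M ^{T}. The two pairs are illustrative and chosen mutually orthogonal so that recall is exact:

```python
import numpy as np

sign = lambda v: np.where(v >= 0, 1, -1)

# Two bipolar data pairs (A_i, B_i), chosen mutually orthogonal for illustration
A = np.array([[ 1,  1,  1,  1, -1, -1, -1, -1],
              [ 1,  1, -1, -1,  1,  1, -1, -1]])
B = np.array([[ 1, -1,  1, -1],
              [ 1,  1, -1, -1]])

# Encode: sum of bipolar correlation matrices, M = sum_i A_i^T B_i
M = sum(np.outer(a, b) for a, b in zip(A, B))   # n x p = 8 x 4

# Bidirectional recall from a noisy key (one bit of A_1 flipped)
a = np.array([ 1,  1,  1, -1, -1, -1, -1, -1])
for _ in range(3):
    b = sign(a @ M)        # forward:  F_A -> F_B through M
    a = sign(M @ b)        # backward: F_B -> F_A through M^T

print(np.array_equal(a, A[0]), np.array_equal(b, B[0]))   # True True
```

The loop settles immediately into a two-pattern reverberation: the stored pair (A _{1},B _{1}) is recovered from the corrupted key.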

How should a BAM learn? How should synapse *m _{ij}* change with time given successive experience? In the simplest case no learning occurs, so *m _{ij}* should decay to 0. Passive decay is most simply modeled with a first-order decay law,

${\dot{m}}_{\mathit{\text{ij}}}=-{m}_{\mathit{\text{ij}}}$, (20)

so that *m _{ij}*(*t*) = *m _{ij}*(0)*e ^{−t}* → 0 as time increases. This simple model contains two ubiquitous features of unsupervised real-time learning models: exponentiation and locality. The mechanism of real-time behavior is exponential modulation. Learning depends only on locally available information, in this case *m _{ij}*. These two properties facilitate hardware instantiation and increase biological plausibility.
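A quick numerical check of the passive decay model: a local forward-Euler update—the synapse needs only its own value—tracks the closed-form solution *m*(*t*) = *m*(0)*e ^{−t}*. Step size and duration are illustrative:

```python
import numpy as np

dt, T, m0 = 0.001, 3.0, 0.8
m = m0
for _ in range(int(T / dt)):      # local update: dm = -m dt
    m += dt * (-m)

closed_form = m0 * np.exp(-T)     # m(t) = m(0) e^{-t}
print(m, closed_form)             # both ~ 0.0398
```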

What other information is locally available to the synapse *m _{ij}*? Only information about *a _{i}* and *b _{j}*. What is the simplest way to additively include information about *a _{i}* and *b _{j}* into Eq. (20)? Multiply or add *a _{i}* and *b _{j}*—*a _{i}b _{j}* or *a _{i}* + *b _{j}*. Multiplicative combination is conjunctive; learning requires signals from both neurons. Additive combination is disjunctive; learning requires a signal from only one neuron. Hence associative learning favors the product *a _{i}b _{j}*. This choice is also an approximation of the correlation coding scheme (9) and produces a naive Hebbian learning law:

${\dot{m}}_{\mathit{\text{ij}}}=-{m}_{\mathit{\text{ij}}}+{a}_{i}{b}_{j}$. (21)

But *m _{ij}* can then be unbounded since *a _{i}* and *b _{j}* can, in principle, just grow and grow. This possibility is sure to occur in feedback networks. So Eq. (21) is unacceptable. Moreover, on closer examination of *m _{ij}*, which symmetrically connects the *i*th neuron in *F _{A}* with the *j*th neuron in *F _{B}*, we see that the activations *a _{i}* and *b _{j}* are not locally available to *m _{ij}*.

Only the signals *S*(*a _{i}*) and

*S*(*b _{j}*) are locally available to *m _{ij}*. In Eq. (8) the bipolar vectors can be interpreted as vectors of threshold signals. So the simplest way to include the locally available information to *m _{ij}* is to add the bounded signal correlation term *S*(*a _{i}*)*S*(*b _{j}*) to Eq. (20). We call this a signal Hebb law:

${\dot{m}}_{\mathit{\text{ij}}}=-{m}_{\mathit{\text{ij}}}+S({a}_{i})S({b}_{j})$. (22)

The equilibrium value of *m _{ij}* is found by setting the right-hand side of Eq. (22) equal to 0:

${m}_{\mathit{\text{ij}}}=S({a}_{i})S({b}_{j})$. (23)

The signal Hebb law is bounded since the signals are bounded. Suppose for definiteness that *S* is a bipolar signal function. Then the equilibrium values (23) are ±1, and we can interpret a learned bipolar association (*A _{i}*,*B _{i}*) as the conjunction IF *A _{i}* THEN *B _{i}*, and IF *B _{i}* THEN *A _{i}*. Moreover, the bipolar endpoints −1 and +1 can be expected to abound with a steep bounded *S*.

Suppose *m _{ij}* is maximally increasing due to *S*(*a _{i}*)*S*(*b _{j}*) = 1. Then Eq. (22) reduces to the simple first-order equation

${\dot{m}}_{\mathit{\text{ij}}}=-{m}_{\mathit{\text{ij}}}+1$, (24)

and *m _{ij}* approaches +1 exponentially fast independent of initial conditions. When *m _{ij}* is maximally decreasing, the right-hand side of Eq. (24) is −1 and *m _{ij}* approaches −1 exponentially fast. This agrees with Eq. (23). The signal Hebb law (22) asymptotically approaches the bipolar correlation learning scheme (8) for a single data pair. So the learning BAM for simple heteroassociative storage can still be expected to be capacity constrained by *m* < min(*n*,*p*).
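The boundedness and equilibrium claims can be checked by integrating the signal Hebb law (22) with a constant bipolar signal product; the step size and duration below are illustrative:

```python
import numpy as np

def equilibrate(m0, signal_product, dt=0.01, steps=2000):
    # forward-Euler integration of Eq. (22): dm/dt = -m + S(a_i)S(b_j)
    m = m0
    for _ in range(steps):
        m += dt * (-m + signal_product)
    return m

print(equilibrate(-0.8, +1.0))   # ~ +1.0: maximal increase, Eq. (24)
print(equilibrate( 0.9, -1.0))   # ~ -1.0: maximal decrease
```

Whatever the initial synaptic value, the trace converges exponentially to the equilibrium value (23), here ±1.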

The BAM memory medium produced by Eq. (22) is almost perfectly plastic. Scaling constants in Eq. (22) must be carefully chosen. In particular, the forget term −*m _{ij}* in Eq. (22) must be scaled with a constant less than unity. Otherwise present learning washes away past learning *m _{ij}*(0). In practice this means that a training list of associations (A _{1},B _{1}), …, (A _{m},B _{m}) should be presented to the adaptive BAM system more than once if each pair (A _{i},B _{i}) is presented for the same length of time. Alternatively, the training list can be presented once if the first pair (A _{1},B _{1}) is presented longer than (A _{2},B _{2}) is presented, (A _{2},B _{2}) longer than (A _{3},B _{3}), (A _{3},B _{3}) longer than (A _{4},B _{4}), and so on. This holds because the general integral solution to Eq. (22) is an exponentially weighted average of sampled patterns.
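The presentation-time prescription follows directly from the exponentially weighted average: each presentation of duration τ shrinks the existing memory trace by *e ^{−τ}*. In this toy scalar sketch (the durations are hand-tuned for illustration), equal presentation times let the later pattern dominate, while a longer first presentation balances the two:

```python
import numpy as np

def present(M, C, tau):
    # integral solution of Eq. (22) over a presentation of duration tau
    # with constant signal correlation C
    return np.exp(-tau) * M + (1 - np.exp(-tau)) * C

C1, C2 = 1.0, -1.0                 # toy scalar "correlations" of two pairs
M_equal = present(present(0.0, C1, 1.0), C2, 1.0)
M_balanced = present(present(0.0, C1, 2.0), C2, 0.623)  # longer first pair

print(M_equal)      # ~ -0.40: the later pair dominates
print(M_balanced)   # ~  0.00: both pairs weighted about equally
```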

In what sense does the adaptive BAM converge? We prove below that it always converges in the sense that nodes and edges rapidly equilibrate or resonate when environmentally perturbed. Recall and learning can simultaneously occur in a type of adaptive resonance.[4]–[9]

At this point it is instructive to distinguish simple adaptive BAM behavior from standard adaptive resonance theory (ART) behavior. The high-level processing behavior of the Carpenter-Grossberg[4] ART model can be sketched as follows. Only one node in *F _{B}* fires at a time, the instar[8] node *b _{j}* that won the competition for bottom-up activation when a binary input pattern was presented to *F _{A}*. The winner *b _{j}* then fans out its spatial pattern or outstar[8] to the nodes in *F _{A}*. If this fan-out pattern sufficiently matches the input pattern presented to *F _{A}*, a stable pattern of STM reverberation is set up between *F _{A}* and *F _{B}*, learning can occur (but need not), and instar *b _{j}* has recognized or categorized the input pattern. Otherwise *b _{j}* is shut off and another instar winner *b _{k}* fans out its spatial pattern, etc., until a match occurs or, if no match occurs, until the binary input pattern trains some uncommitted node *b _{u}* to be its instar. Hence each instar node *b _{j}* in the ART model recognizes or categorizes a single input pattern or set of input patterns, depending on how high a degree of match is desired. Match degree can be deliberately controlled. Direct access to a trained instar is assured only if the input matches exactly, or nearly, the pattern learned by the instar. The more novel the pattern presented to *F _{A}*, and the higher the desired degree of match, the longer the ART system tends to search its instars to classify it.

In the adaptive BAM every *F _{B}* node

*b _{j}* in parallel fans out its outstar across *F _{A}* when a STM pattern is active across *F _{A}*. The signal Hebb law (22) distributes recognition capability across all the edges of all the *b _{j}* nodes, so that most bivalent associations are unaffected by removing a particular node. The closest analog to a specifiable degree of match in a BAM is the storage-capacity relationship between pattern number and pattern dimensionality, *m* < min(*n*,*p*). The closer *m* is to the maximum reliable capacity, the greater the match, between an input pattern and a stored association (*A _{i}*,*B _{i}*), required to evoke (*A _{i}*,*B _{i}*) into a stable STM reverberation. When *m* is small relative to the maximum capacity, there tend to be few basins of attraction in the state space *I ^{n}* × *I ^{p}*, the basins tend to have wide diameters, and they tend to correspond to the stored associations (*A _{i}*,*B _{i}*). Each stored association tends to recognize or categorize a large set of input stimuli. When *m* is large, there tend to be several basins, with small diameters. When *m* is large enough, only the exact patterns *A _{i}* or *B _{i}* will evoke (*A _{i}*,*B _{i}*). Within capacity constraints, all inputs tend to fall into the basin of the nearest stored association and thus have direct access to nearest stored associations. Novel patterns are classified or misclassified as rapidly as more familiar patterns.

Learning can also occur in an adaptive BAM during the rapid recall process. Familiar patterns tend to strengthen or restrengthen the reverberating associations they elicit. Novel patterns tend to misclassify to spurious energy wells (attractor basins), which in effect recognize them, or by Eq. (22) they tend to dig their own energy wells, which thereafter recognize them. As the simulation results discussed below show, many more patterns can be stably presented to the BAM than min(*n*,*p*) if they resemble stored associations. Otherwise the forgetting effects of Eq. (22) prevail, and at any moment the adaptive BAM tends to remember no more than the most recent min(*n*,*p*)-many distinct inputs (elicited associations).
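The forgetting effect can be sketched with the discrete form of Eq. (22)'s exponentially weighted average: after repeated later presentations, an early association's trace strength decays as *e ^{−t}*, while the most recent association dominates recall. The patterns are illustrative and chosen orthogonal:

```python
import numpy as np

sign = lambda v: np.where(v >= 0, 1, -1)

A_old, B_old = np.array([1, -1, 1, -1]), np.array([1, 1, -1, -1])
A_new, B_new = np.array([1, 1, -1, -1]), np.array([-1, 1, -1, 1])

M = np.outer(A_old, B_old).astype(float)       # early learned correlation
for _ in range(10):                            # ten later unit-time presentations
    M = np.exp(-1.0) * M + (1 - np.exp(-1.0)) * np.outer(A_new, B_new)

recalled = sign(A_new @ M)                     # the recent pair is recalled
strength_old = (A_old @ M @ B_old) / 16.0      # early trace weight: e^{-10}

print(np.array_equal(recalled, B_new), strength_old)
```

The early trace's weight has shrunk to *e* ^{−10} ≈ 4.5 × 10 ^{−5}: the input patterns that keep reverberating are the ones that keep their energy wells dug.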

We now prove that the adaptive BAM converges to local energy minima. Denote the bounded energy function in Eq. (18) by *F*. Then the appropriate energy or Lyapunov function for the adaptive BAM dynamic system of Eqs. (16), (17), and (22) is simply

$E=F+\frac{1}{2}{\sum }_{i}{\sum }_{j}{m}_{\mathit{\text{ij}}}^{2}$,

which is bounded since *F* is bounded and each *m _{ij}* is bounded. When the product rule of differentiation is applied to the time-varying triple product in the quadratic form component of *F* [Eq. (18)], we get the triple sum

$-{\sum }_{i}{\sum }_{j}{\dot{m}}_{\mathit{\text{ij}}}S({a}_{i})S({b}_{j})$. (27)

Using Eq. (27) in the time derivative of *E* gives, on rearrangement,

$\dot{E}=-{\sum }_{i}{S}^{\prime }({a}_{i}){\dot{a}}_{i}^{2}-{\sum }_{j}{S}^{\prime }({b}_{j}){\dot{b}}_{j}^{2}-{\sum }_{i}{\sum }_{j}{\dot{m}}_{\mathit{\text{ij}}}^{2}\le 0$. (28)

If *Ė* = 0, Eq. (28) and *S′* > 0 imply that both edges and nodes have stabilized: ${\dot{m}}_{\mathit{\text{ij}}}={\dot{a}}_{i}={\dot{b}}_{j}=0$ for all *i* and *j*. Hence every signal Hebb BAM adaptively resonates. This result further generalizes in a straightforward way to any number of layered BAM fields that are interconnected, not necessarily contiguously, by Eq. (22).
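The equilibration claim can be sketched by discretizing the full dynamic system. Since Eqs. (16) and (17) are not reproduced in this section, an additive form of the STM dynamics is assumed below; dimensions, step size, and inputs are illustrative:

```python
import numpy as np

# Discretized adaptive BAM: assumed additive STM dynamics for Eqs. (16)-(17),
# coupled to the signal Hebb LTM law (22)
rng = np.random.default_rng(0)
n, p, dt = 6, 5, 0.01
S = np.tanh
a, b = np.zeros(n), np.zeros(p)
M = 0.1 * rng.standard_normal((n, p))
I, J = rng.choice([-1.0, 1.0], n), rng.choice([-1.0, 1.0], p)

for _ in range(20000):
    da = -a + S(b) @ M.T + I            # assumed Eq. (16): STM at F_A
    db = -b + S(a) @ M + J              # assumed Eq. (17): STM at F_B
    dM = -M + np.outer(S(a), S(b))      # Eq. (22): signal Hebb LTM
    a, b, M = a + dt * da, b + dt * db, M + dt * dM

residual = max(np.abs(da).max(), np.abs(db).max(), np.abs(dM).max())
print(residual)   # ~ 0: nodes and edges have equilibrated
```

All STM and LTM derivatives shrink toward zero, consistent with Eq. (28): the coupled system settles into an adaptive resonance.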

Can an adaptive BAM learn and recall simultaneously? In the ART model[4] a mechanism of attentional gain control [inhibition due to the sum of *F _{B}* signals *S*(*b _{j}*)] is introduced to enable neurons *a _{i}* in *F _{A}* to distinguish environmental inputs *I* from top-down feedback patterns *B*. In principle, an attentional gain control mechanism can also be added to an adaptive BAM. Short of this new mechanism, how can neuron *a _{i}* distinguish external input *I _{i}* and internal feedback input from *F _{B}*? In Eq. (14) these terms both additively affect the time change of *a _{i}*. So external and internal feedback to *a _{i}* can only differ in their patterns of magnitude and duration over some short time interval. If the magnitude and duration of inputs are indistinguishable, the inputs are indistinguishable to *a _{i}*. When they differ, *a _{i}* can in principle learn and recall simultaneously.

Suppose a randomly fluctuating, uninformative environment confronts the adaptive BAM. Then *I _{i}* tends to have zero mean in short time intervals. This allows *a _{i}* to be driven by internal feedback from *F _{B}*. If learning is permitted, familiar STM reverberations, evoked perhaps by other *a _{k}* (or *b _{j}*), can be strengthened. When *I _{i}* remains relatively constant over an interval, a new pattern can be learned, and can be learned while *F _{A}* and *F _{B}* reverberate, eventually dominating those reverberations. If the reverberations are spurious, learning is enhanced by appropriately weighting *I _{i}*. In simulations, scaling *I _{i}* by *p*, the number of neurons in *F _{B}*, has proved effective, presumably because it balances the magnitude of *I _{i}* against the magnitude of the internal *F _{B}* feedback sum in Eq. (14).

An extension of these ideas is the sampling adaptive BAM. There is a trade-off between learning time and learning samples. The standard learning model is to present relatively few samples for long lengths of learning time, typically until learning converges or is otherwise terminated, as in simple heteroassociative storage, or to present few samples over and over, as in backpropagation.[17]–[20] In what we shall call sampling learning, several samples are presented briefly—typically many more patterns than neuron dimensionality—and the underlying patterns, associations, or mappings are better learned as sample size increases. Learning is not allowed to converge. Only a brief pulse of learning occurs for each sample. When the sampling learning technique is applied to the adaptive BAM, a sampling adaptive BAM results. For example, an adaptive BAM can rapidly learn a rotation mapping, if *n* = *p*, by simply presenting a few spatial patterns at *F _{A}* and concurrently presenting the same pattern rotated some fixed degree at *F _{B}*. Thereafter any pattern presented at *F _{A}* produces the stable STM reverberation with the input pattern at *F _{A}* and its rotated version at *F _{B}*.

We note that Hecht-Nielsen[24] has developed his feedforward counterpropagation sampling learning technique for learning continuous mappings, and probability density functions that generate mappings, by applying Grossberg's outstar learning theorem[8],[9] and by applying the sampling learning technique to Grossberg's unsupervised competitive learning[2],[23]:

${\dot{m}}_{\mathit{\text{ij}}}={b}_{j}(-{m}_{\mathit{\text{ij}}}+{i}_{i})$, (29)

where (*i _{1}*, …, *i _{n}*) is a normalized input pattern or probability distribution presented to *F _{A}* and *b _{j}* provides competitive modulation, e.g., *b _{j}* = 1 if *b _{j}* wins the *F _{B}* instar competition for activation and *b _{j}* = 0 otherwise. For simple autoassociative storage the competitive instar learning law (29) is also dimension bounded for non-sampling learning. No more distributions at *F _{A}* can be recognized at *F _{B}* than, obviously, the number *p* of instar nodes at *F _{B}*. Yet Hecht-Nielsen[24] has demonstrated that sampling learning with Eq. (29) can learn a sine wave, which has minimal dimensionality, well with thirty neurons and a few hundred random samples, and almost perfectly with a few thousand random samples.

Figures 4–6 display the results of a sampling BAM experiment. *F _{A}* and *F _{B}* each contain forty-nine gray-scale neurons arranged in a 7 × 7 pixel tray. The output of the bipolar logistic signal function *S*(*x*) is discretized to six gray-scale levels, where *S*(*x*) = −1 is white and *S*(*x*) = 1 is black: *S*(*x*) = −1 if activation *x* < −51, and *S*(*x*) = 1 if *x* > 51. Forty-eight randomly generated gray-scale noise patterns are presented to the adaptive BAM. The forty-eight samples violate the storage capacity *m* ≪ min(*n*,*p*) for simple heteroassociative storage. Figure 4 displays six of these random samples. Twenty-four of the samples are noisy versions of the bipolar association (*Y,W*); twenty-four are noisy versions of (*B,Z*). Noise was created by picking numbers in [−60, 60] according to a uniform distribution, then adding them to the activation values, −52 or 52, underlying the bivalent signal values making up (*Y,W*) and (*B,Z*). Unlike in simple heteroassociative storage, no sample is presented long enough for learning to fully or nearly converge. Samples are briefly presented four at a time—four from the (*Y,W*) training set, then four from the (*B,Z*) training set, then the next four from the (*Y,W*) training set, and so on—to exploit the exponentially weighted averaging effects of the signal Hebb learning law (22).
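The experiment can be sketched at reduced scale. The two pure associations below stand in for (*Y,W*) and (*B,Z*); dimensions are illustrative (8 neurons per field rather than 7 × 7 trays), the A patterns are chosen orthogonal, and the noise is kept below the signal threshold so the sketch is deterministic:

```python
import numpy as np

rng = np.random.default_rng(7)
sign = lambda v: np.where(v >= 0, 1.0, -1.0)

# Two pure bipolar associations standing in for (Y,W) and (B,Z)
A1 = np.array([ 1,  1,  1,  1,  1,  1,  1,  1.0])
B1 = np.array([ 1, -1,  1, -1,  1, -1,  1, -1.0])
A2 = np.array([ 1, -1,  1, -1,  1, -1,  1, -1.0])
B2 = np.array([ 1,  1, -1, -1,  1,  1, -1, -1.0])

# Sampling learning: 48 noisy samples, interleaved between the two training
# sets, each given only a brief pulse of the signal Hebb law (22)
M, eps = np.zeros((8, 8)), 0.1
for k in range(48):
    A, B = (A1, B1) if k % 2 == 0 else (A2, B2)
    a = A + rng.uniform(-0.3, 0.3, 8)          # noisy gray-scale sample
    b = B + rng.uniform(-0.3, 0.3, 8)
    M += eps * (-M + np.outer(sign(a), sign(b)))

# Recall: a NEW noisy version of A1 evokes the pure, never-seen (A1, B1)
cue = A1 + rng.uniform(-0.3, 0.3, 8)
b_rec = sign(sign(cue) @ M)
a_rec = sign(M @ b_rec)
print(np.array_equal(b_rec, B1), np.array_equal(a_rec, A1))   # True True
```

No single sample is learned to convergence, yet the exponentially weighted average of brief pulses abstracts the pure bipolar association from its noisy instances.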

Figure 5 demonstrates recall and abstraction with the sampling adaptive BAM. A new noisy version of *Y* is presented to field *F _{A}*. The initial STM activation across *F _{A}* and *F _{B}* is random. The BAM converges to the pure bipolar association (*Y,W*) it has never experienced but has abstracted from the noisy training samples. As in Plato's theory of ideals—and unlike the naive empiricist denial of abstraction of Locke, Berkeley, and Hume—it is as if the BAM learns redness from red things, smoothness from smooth things, triangularity from triangles, etc., and thereafter associates new red things with redness, not with most-similar old red things.
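The rotation-mapping example described earlier for the sampling adaptive BAM can be sketched in the same spirit. Here "rotation" is modeled as a cyclic shift, the eight training patterns are the rows of a Hadamard matrix (a complete orthogonal bipolar basis), and correlation learning of the sampled instantiations yields a memory that shifts any new pattern; the details are illustrative:

```python
import numpy as np

sign = lambda v: np.where(v >= 0, 1, -1)
rot = lambda x: np.roll(x, 1)        # stand-in for a fixed rotation mapping

# Eight orthogonal bipolar training patterns: rows of a Hadamard matrix
H2 = np.array([[1, 1], [1, -1]])
H = np.kron(np.kron(H2, H2), H2)     # 8 x 8, entries +1/-1

# Sample the mapping: present each pattern at F_A with its rotation at F_B
M = sum(np.outer(x, rot(x)) for x in H)

# A new pattern, never presented, is rotated by a single pass through M
y = np.array([1, 1, 1, -1, 1, -1, -1, -1])
print(np.array_equal(sign(y @ M), rot(y)))   # True
```

Because the training patterns span the space, M internalizes the mapping itself rather than a lookup table of the samples.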

In Figure 6 the BAM is thinking about the STM reverberation (*Y,W*). A new noisy version of *Z* is presented to field *F _{B}*, superimposing it on the (*Y,W*) reverberation. The reverberating thought is soon crowded out of STM by the environmental stimulus *Z*. The BAM again converges to the unobserved pure bipolar association, this time (*B,Z*), that it abstracted from the noisy training samples.

This research was supported by the Air Force Office of Scientific Research (AFOSR F49620-86-C-0070) and the Advanced Research Projects Agency of the Department of Defense under ARPA Order 5794. The author thanks Robert Sasseen for developing all software and graphics.


## References

**1. **T. Kohonen, “Correlation Matrix Memories,” IEEE Trans. Comput. **C-21**, 353 (1972). [CrossRef]

**2. **T. Kohonen, *Self-Organization and Associative Memory* (Springer-Verlag, New York, 1984).

**3. **J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones, “Distinctive Features, Categorical Perception, and Probability Learning: Some Applications of a Neural Model,” Psychol. Rev. **84**, 413 (1977). [CrossRef]

**4. **G. A. Carpenter and S. Grossberg, “A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine,” Comput. Vision Graphics Image Process. **37**, 54 (1987). [CrossRef]

**5. **S. Grossberg, “Adaptive Pattern Classification and Universal Recoding, II: Feedback, Expectation, Olfaction, and Illusions,” Biol. Cybern. **23**, 187 (1976). [PubMed]

**6. **S. Grossberg, “A Theory of Human Memory: Self-Organization and Performance of Sensory-Motor Codes, Maps, and Plans,” Prog. Theor. Biol. **5**, 000 (1978).

**7. **S. Grossberg, “How Does a Brain Build a Cognitive Code?,” Psychol. Rev. **87**, 1 (1980). [CrossRef] [PubMed]

**8. **S. Grossberg, *Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control* (Reidel, Boston, 1982).

**9. **S. Grossberg, *The Adaptive Brain, I and II* (North-Holland, Amsterdam, 1987).

**10. **B. Kosko, “Bidirectional Associative Memories,” IEEE Trans. Syst. Man Cybern. **SMC-00**, 000 (1987).

**11. **B. Kosko, “Fuzzy Associative Memories,” in *Fuzzy Expert Systems*, A. Kandel, Ed. (Addison-Wesley, Reading, MA, 1987).

**12. **S. Grossberg, “Contour Enhancement, Short Term Memory, and Constancies in Reverberating Neural Networks,” Stud. Appl. Math. **52**, 217 (1973).

**13. **W. S. McCulloch and W. Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” Bull. Math. Biophys. **5**, 115 (1943). [CrossRef]

**14. **B. Kosko, “Fuzzy Entropy and Conditioning,” Inf. Sci. **40**, 165 (1986). [CrossRef]

**15. **J. J. Hopfield, “Neural Networks and Physical Systems with Emergent Collective Computational Abilities,” Proc. Natl. Acad. Sci. U.S.A. **79**, 2554 (1982). [CrossRef] [PubMed]

**16. **M. A. Cohen and S. Grossberg, “Absolute Stability of Global Pattern Formation and Parallel Memory Storage by Competitive Neural Networks,” IEEE Trans. Syst. Man Cybern. **SMC-13**, 815 (1983). [CrossRef]

**17. **D. B. Parker, “Learning Logic,” Invention Report S81-64, File 1, Office of Technology Licensing, Stanford U. (Oct. 1982).

**18. **D. B. Parker, “Learning Logic,” *TR-47*, Center for Computational Research in Economics and Management Science, MIT (Apr. 1985).

**19. **D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning Internal Representations by Error Propagation,” ICS Report 8506, Institute for Cognitive Science, U. California San Diego (Sept. 1985).

**20. **P. J. Werbos, “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences,” Ph.D. Dissertation in Statistics, Harvard U. (Aug. 1974).

**21. **B. Kosko and C. Guest, “Optical Bidirectional Associative Memories,” Proc. Soc. Photo-Opt. Instrum. Eng. **758** (1987).

**22. **J. J. Hopfield, “Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons,” Proc. Natl. Acad. Sci. U.S.A. **81**, 3088 (1984). [CrossRef] [PubMed]

**23. **S. Grossberg, “Adaptive Pattern Classification and Universal Recoding, I: Parallel Development and Coding of Neural Feature Detectors,” Biol. Cybern. **23**, 121 (1976). [CrossRef] [PubMed]

**24. **R. Hecht-Nielsen, “CounterPropagation Networks,” in *Proceedings, First International Conference on Neural Networks* (IEEE, New York, 1987).