
Deep photonic reservoir computer based on frequency multiplexing with fully analog connection between layers

Open Access

Abstract

Reservoir computers (RCs) are randomized recurrent neural networks well adapted to process time series, performing tasks such as nonlinear distortion compensation or prediction of chaotic dynamics. Deep reservoir computers (deep-RCs), in which the output of one reservoir is used as the input for another one, can lead to improved performance because, as in other deep artificial neural networks, the successive layers represent the data in more and more abstract ways. We present a fiber-based photonic implementation of a two-layer deep-RC based on frequency multiplexing. The two RC layers are encoded in two frequency combs propagating in the same experimental setup. The connection between the layers is fully analog and does not require any digital processing. We find that the deep-RC outperforms a traditional RC by up to two orders of magnitude on two benchmark tasks. This work paves the way towards using fully analog photonic neuromorphic computing for complex processing of time series, while avoiding costly analog-to-digital and digital-to-analog conversions.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. INTRODUCTION

Artificial intelligence is probably the most disruptive new technology to emerge during the first decades of the 21st century. Its success is based on the use of deep neural networks in which multiple layers of artificial neurons are connected in a feed-forward architecture [1,2]. Recent advances include image classification and analysis [3], game playing [4], protein structure prediction [5,6], and chatbots that simulate human conversation such as ChatGPT and Bing [7,8].

Artificial neural networks are fundamentally analog systems simulated on a digital computer. Thus, it seems highly attractive to replace the digital simulation with analog hardware, as this could result in considerable energy savings. Photonics is particularly attractive for analog implementation of neural networks due to its potential for very high speed (see, e.g., [9,10]), parallelism (see, e.g., [11,12]), possibility of implementing spiking networks (see, e.g., [13,14]), and low energy consumption per operation (see, e.g., [15]). The importance of deep neural networks for complex applications has led to several demonstrations of deep photonic networks using on-chip integrated optics [16,17], optoelectronics [18], and a 3D-printed stack of diffractive surfaces [19]. These configurations replicate the mathematical concept of artificial neural networks, i.e., they physically implement layers of linear and nonlinear transformations of the input by deploying optical and opto-electronic components.

However, in order to realize analog implementations of artificial neural networks, one should exploit as much as possible the natural dynamics of the employed physical system. Some neural network algorithms, such as extreme learning machines (ELMs) [20] or reservoir computers (RCs) [21–24], are more amenable to physical implementations because only a part of their weights is trained, while the rest can be chosen at random. These random connections can often be replaced by the inherent dynamics of the physical system without loss of performance.

Reservoir computers, which are the topic of the present work, are randomized recurrent neural networks (RNNs) in which the recurrence is provided by a (simulated or physical) high-dimensional nonlinear dynamical system called a “reservoir” [21]. RCs have been successfully implemented in analog systems including photonics, electronics, spintronics, mechanics, biology, and more (see [22] for a review). Many photonic RC implementations use a delay loop and a single dynamical node to encode multiple neurons by means of time multiplexing (as proposed in [25]); see, e.g., [26,27]. Although the time multiplexing architecture is simple to implement, it suffers from an intrinsic slowdown because the time to process an input will be given by the number of neurons multiplied by the time to process a single neuron. Alternative approaches that do not suffer from this inherent slowdown use a form of parallelism such as spatial multiplexing (in free space optics [28] or multimode fibers [29]), a hybrid temporal/spatial approach [30], or frequency multiplexing [31].

As with other types of neural networks, assembling several RCs in a deep architecture enhances the overall system performance in data processing. Deep RCs were first used in [32] and studied in more depth in [33], where it was shown that the serial connection among different RC layers enhances the system performance by enriching its dynamics. Different ways of combining (in series or in parallel) photonic reservoirs into networks are compared in [34]. Motivated by these works, the first experimental implementation of a deep-RC was reported in [35], showing significant improvement in performance when the number of layers is increased. However, in that work each reservoir was implemented using the time multiplexing architecture, which is not optimal in terms of computing speed, and, more importantly, the connection between reservoirs was implemented digitally. The latter is also the case in the related work [36]. Reference [37] proposes a deep reservoir based on a time-delay architecture with an analog connection between the layers.

Fig. 1. Left panel: standard reservoir computing scheme. Right panel: deep reservoir computing scheme. The weights in black are fixed, while the weights in red are trained.

Here, we report a deep reservoir configuration consisting of two interconnected reservoir layers with a fully analog connection that does not require data storage or processing on a digital computer. Our experiment is based on a recently reported RC in which the neuron signals are encoded in the amplitudes of a frequency comb, while the mixing between the neurons is realized by electro-optic phase modulators [31]. This architecture allows for a relatively easy-to-realize optical output layer, as the output weights can be applied to the comb lines using a programmable spectral filter, while the nonlinear summation of the weighted neurons can be executed by a photodiode. The photodiode measures the total intensity of the weighted frequency comb and introduces a quadratic nonlinearity. This technique, already employed in [31] to generate the output signals with optical weighting, allows us to use the output of a reservoir as an input to a second one without leaving the analog domain. Here, we also fully exploit the frequency degree of freedom of light by using the same hardware for implementing multiple reservoirs simultaneously, each one working in a different slice of the spectrum. In particular, we report two simultaneous RC computations and demonstrate that combining these computations in a deep fashion improves the overall performance as compared to using two independently running parallel reservoirs. We test two strategies for optimizing the interconnections between the layers in the deep configuration. In the first (simpler) approach, we only adjust the strength of the connections, whereas in the second approach, we optimize the connections using the covariance matrix adaptation evolution strategy (CMA-ES) [38]. To our surprise, we find that both approaches yield comparable results.

In Section 2, we present the algorithms, the experimental setup, and the benchmarking methods; in Section 3, we present and discuss results; finally, in Section 4, we present conclusions and outlooks for this work.

2. METHODS

A. Algorithms

1. Reservoir Computing

A reservoir computer (RC; see left panel of Fig. 1) [21] is a recurrent neural network composed of three layers: the input layer, the reservoir layer, and the output layer. Only the output weights are trained; the input and internal weights are fixed at random.

The experimental system is based on the frequency multiplexing RC scheme described in [31]. The neurons are encoded in the complex amplitudes of the lines of a frequency comb, and the neuronal interconnections are realized via frequency-domain interference that provides a power exchange between the lines. The electric field in the reservoir can, thus, be expressed as

$$E(t) = \sum\limits_k {x_k}(t)\exp ({i(\omega + k\Omega)t} ),$$
where $\omega$ is the central frequency of the comb, $\Omega$ the frequency spacing between the comb lines, and ${x_k}(t)$ are the slowly varying amplitudes of the comb lines that encode neuron information. To describe the RC application more conveniently, we focus on the $N$ most central lines of the comb, which are the ones encoding information. Moreover, we group the amplitudes of these lines in an $N$ dimensional complex vector ${{\textbf x}_n}$ that evolves in slow, discrete time $n$. The discrete timescale corresponds to the discrete evolution of the RC states.

The RC based on frequency multiplexing uses nonlinear input and output layers and a linear reservoir (which is a powerful architecture, as demonstrated in [39]). It can be described by the evolution equations

$${{\textbf x}_n} = {\textbf W} \cdot {{\textbf x}_{n - 1}} + {{\textbf W}_{{\rm in}}} \cdot {f_{{\rm in}}}({{u_n}} ),$$
$${y_n} = {\textbf W}_{{\rm out}}^T \cdot |{{\textbf x}_n}{|^2},$$
where ${u_n}$ (a real scalar) is the input signal to the reservoir at timestep $n$, ${y_n}$ (a real scalar) is the output signal of the reservoir at timestep $n$, ${{\textbf W}}$ is a complex $N \times N$ matrix representing the internal connections of the reservoir, ${{\textbf W}_{{\rm in}}}$ is a complex $N$ dimensional vector representing the input-to-reservoir connections, ${{\textbf W}_{{\rm out}}}$ is a vector of $N$ real readout weights with ${\;^T}$ denoting the transpose, and $|{{\textbf x}_n}{|^2}$ is the vector obtained by taking the norm square of ${{\textbf x}_n}$ elementwise. The output weights are optimized using ridge regression so that the output ${y_n}$ approximates the desired output as well as possible.

In our implementation, the input signal is provided through a Mach–Zehnder modulator operating in the negative quadrature point. Hence, the input nonlinearity ${f_{{\rm in}}}$ is given by the modulator transfer function

$${f_{{\rm in}}}(u) = {E_0} \cdot \sin ({\gamma \cdot u} ),$$
where ${E_0}$ represents the input radiation amplitude, and $\gamma$ is the driving strength of the electrical signal to the modulator.
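For concreteness, Eqs. (2)–(4) can be simulated in a few lines. The sketch below (in Python) uses random complex matrices as stand-ins for the physical connectivity fixed by phase modulation and dispersion in the loop; the parameter values ($N = 20$ neurons, ${E_0}$, $\gamma$, the spectral-radius scaling) are illustrative assumptions, not calibrated values from the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 20, 1000            # ~20 usable comb lines (neurons), 1000 timesteps
E0, gamma = 1.0, 0.8       # illustrative input amplitude and drive strength

# Random complex stand-ins for the physical W and W_in.
W = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale into a stable regime
W_in = rng.normal(size=N) + 1j * rng.normal(size=N)

u = rng.uniform(-1, 1, size=T)   # placeholder scalar input series

def f_in(u):                     # Eq. (4): MZM at the negative quadrature point
    return E0 * np.sin(gamma * u)

x = np.zeros((T, N), dtype=complex)
for n in range(1, T):            # Eq. (2): linear reservoir update
    x[n] = W @ x[n - 1] + W_in * f_in(u[n])

intensities = np.abs(x) ** 2     # |x_n|^2: comb-line powers seen by the photodiode
W_out = rng.normal(size=N)       # readout weights (trained by ridge regression in practice)
y = intensities @ W_out          # Eq. (3)
```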

Equation (3) can be implemented by measuring each component of $|{{\textbf x}_n}{|^2}$ and then carrying out the scalar product offline, i.e., on a digital computer. This is the method used in the present work. However, we note that the output ${y_n}$ can also be obtained directly in the analog domain using the following procedure [31,40]. The optical signal is sent to a programmable spectral filter with two outputs yielding two signals ${\textbf W}_{{\rm out}}^ + \cdot {{\textbf x}_n}$ and ${\textbf W}_{{\rm out}}^ - \cdot {{\textbf x}_n}$ (each a complex $N$ dimensional vector of comb line amplitudes), where ${\textbf W}_{{\rm out}}^ +$ and ${\textbf W}_{{\rm out}}^ -$ are $N \times N$ diagonal matrices with positive real coefficients corresponding, respectively, to the square roots of the positive elements and of the moduli of the negative elements of ${{\textbf W}_{{\rm out}}}$. These two signals are then sent to two photodiodes that measure their total powers, and the difference of the powers is computed. Accordingly, the output reads as

$${y_n} = {\left| {{\textbf W}_{{\rm out}}^ + \cdot {{\textbf x}_n}} \right|^2} - {\left| {{\textbf W}_{{\rm out}}^ - \cdot {{\textbf x}_n}} \right|^2},$$
where $| \cdot {|^2}$ denotes taking the norm square of a vector.
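Continuing the sketch above, the equivalence between the analog readout of Eq. (5) and the digital readout of Eq. (3) can be checked numerically:

```python
# Split the signed readout weights into the two positive PSF branches of Eq. (5).
w_plus = np.sqrt(np.maximum(W_out, 0))    # diagonal of W_out^+ (branch to photodiode 1)
w_minus = np.sqrt(np.maximum(-W_out, 0))  # diagonal of W_out^- (branch to photodiode 2)

x_n = x[-1]                               # a reservoir state from the sketch above
y_analog = np.sum(np.abs(w_plus * x_n) ** 2) - np.sum(np.abs(w_minus * x_n) ** 2)
y_digital = W_out @ np.abs(x_n) ** 2      # Eq. (3)
assert np.isclose(y_analog, y_digital)    # holds because |sqrt(w) x|^2 = w |x|^2 elementwise
```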

2. Deep Reservoir Computing

A deep reservoir computer (deep-RC; see right panel of Fig. 1) is a stack of RC layers connected in series. The deep-RC output signal is a linear combination of neuron values of each reservoir. The hierarchy introduced by the serial connection enhances the network performance as the different reservoirs have different dynamics, thus enriching the states of the full deep-RC.

A deep-RC composed of ${N_{{\rm layers}}}$ layers, each one comprising $N$ neurons, as implemented in our system, is described by the set of equations

$${\textbf x}_n^{{\rm (1)}} = {{\textbf W}^{{\rm (1)}}} \cdot {\textbf x}_{n - 1}^{{\rm (1)}} + {\textbf W}_{{\rm in}}^{{\rm (1)}} \cdot {f_{{\rm in}}}({{u_n}} ),$$
$${\textbf x}_n^{{\rm (i)}} = {{\textbf W}^{{\rm (i)}}} \cdot {\textbf x}_{n - 1}^{{\rm (i)}} + {\textbf W}_{{\rm in}}^{{\rm (i)}} \cdot {f_{{\rm in}}}({u_n^{(i)}} ),\quad i = 2, \ldots ,{N_{{\rm layers}}},$$
$$u_n^{(i + 1)} = {\left| {{\textbf W}_{{\rm out}}^{{\rm (i)}} \cdot {\textbf x}_n^{{\rm (i)}}} \right|^2},\quad i = 1, \ldots ,{N_{{\rm layers}}} - 1,$$
$$y_n^{({\rm A})} = {\textbf W}_{{\rm out}}^{{\rm (A)}T} \cdot |{\textbf x}_n^{{\rm (A)}}{|^2},$$
where the superscript $(i)$, $1 \le i \le {N_{{\rm layers}}}$, identifies the reservoir layer. As before, ${{\textbf W}^{{\rm (i)}}}$ is a complex $N \times N$ matrix representing the internal connections of the $i$-th reservoir layer, ${\textbf W}_{{\rm in}}^{{\rm (i)}}$ is a complex $N$ dimensional vector representing the input connections of the $i$-th reservoir layer, ${\textbf W}_{{\rm out}}^{{\rm (i)}}$ is an $N \times N$ diagonal matrix with positive real coefficients representing the output connections of the $i$-th layer, and $| \cdot {|^2}$ in Eq. (8) denotes the norm square of its vector argument. In the experiment, we use a two-layer configuration, i.e., ${N_{{\rm layers}}} = 2$, but the equations easily generalize to more layers. The first reservoir layer is driven by the input time series ${u_n}$, while the subsequent reservoir layers are driven by the outputs $u_n^{(i)}$ of the preceding layers [Eq. (8)]. Note that in our implementation, the connections between consecutive layers consist only of positive weights, contained in the diagonal of ${\textbf W}_{{\rm out}}^{{\rm (i)}}$; this is why there is only a single term on the right-hand side of Eq. (8) [as compared to Eq. (5)]. We note that these equations do not account for possible delays between consecutive RC layers introduced by the experimental setup (e.g., the length of optical or electrical connections). This corresponds to the situation in our experiment, in which these delays are compensated via digital postprocessing. Such delays can in principle always be compensated for by adding optical or electrical delay lines of appropriate length.

The deep-RC output, $y_n^{({\rm A})}$, is obtained by combining the states from all layers, i.e., by taking a linear combination of the intensities of all the comb lines with a photodiode. To express this, we have defined ${\textbf x}_n^{{\rm (A)}} = \left({{\textbf x}_n^{{\rm (1)}},{\textbf x}_n^{{\rm (2)}}, \ldots ,{\textbf x}_n^{({N_{{\rm layers}}})}}\right)$ as the complex vector of size ${N_{{\rm layers}}} \cdot N$ representing the full deep-RC state at timestep $n$, and ${\textbf W}_{{\rm out}}^{{\rm (A)}}$ as the vector of ${N_{{\rm layers}}} \cdot N$ real output weights. The output weights are optimized using ridge regression.
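Continuing the numerical sketch of Subsection 2.A.1, a two-layer deep-RC [Eqs. (6)–(9)] can be simulated as follows; the positive attenuations `w_link` standing in for the diagonal of ${\textbf W}_{{\rm out}}^{{\rm (1)}}$ are again illustrative, not values from the experiment.

```python
def random_reservoir(N, rho=0.9, seed=0):
    # Random complex stand-ins for the physical W^(i) and W_in^(i).
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.normal(size=N) + 1j * rng.normal(size=N)
    return W, W_in

W1, W_in1 = random_reservoir(N, seed=1)
W2, W_in2 = random_reservoir(N, seed=2)
w_link = rng.uniform(0, 1, size=N)        # positive diagonal of W_out^(1)

x1 = np.zeros((T, N), dtype=complex)
x2 = np.zeros((T, N), dtype=complex)
for n in range(1, T):
    x1[n] = W1 @ x1[n - 1] + W_in1 * f_in(u[n])        # Eq. (6)
    u2 = np.sum(np.abs(w_link * x1[n]) ** 2)           # Eq. (8): signal on PD 2
    x2[n] = W2 @ x2[n - 1] + W_in2 * f_in(u2)          # Eq. (7)

# Concatenated comb-line intensities |x_n^(A)|^2 used by the readout, Eq. (9).
x_A = np.hstack([np.abs(x1) ** 2, np.abs(x2) ** 2])
```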

Note that the interconnection between the consecutive layers (say layers $i$ and $i + 1$) is determined by $3N$ real parameters: the $N$ positive real elements of the diagonal matrix ${\textbf W}_{{\rm out}}^{{\rm (i)}}$, and both the real and imaginary parts of the $N$ elements of the vector ${\textbf W}_{{\rm in}}^{{\rm (i} + {1)}}$. Of these, only $N$ elements of the diagonal matrix ${\textbf W}_{{\rm out}}^{{\rm (i)}}$ can be tuned in our experimental setup. (This is to be compared with the proposal of [33], where the interconnection is given by an $N \times N$ random matrix, whose spectral radius is tuned. More advanced algorithms and topologies, such as presented in [34,35,41], aim to achieve more freedom in tuning interconnections.)

B. Experimental Setup

Experimental Setup. The experimental system is based on [31], modified such that it supports two RC computations at the same time.

Fig. 2. Experimental setup. Optical connections are in blue, electrical connections in red. MZM, lithium niobate Mach–Zehnder modulator; AWG, arbitrary waveform generator; C, fiber couplers; EDFA, erbium-doped-fiber amplifier; PM, phase modulator; RF source, radio frequency source at frequency $\Omega$; RF AMP, radio frequency amplifiers; PSF, programmable spectral filter; PD, photodiode; ES, electric switch.

Fig. 3. Normalized spectral power of the radiation as measured at the output of the fiber loop, after coupler C 3. Red markers indicate the input wavelengths ${\lambda _1} = 1550.2\,{\rm nm}$ and ${\lambda _2} = 1555.4\,{\rm nm}$.

Figure 2 shows the schematic of the experiment. All fiber connections and couplers are single-mode and polarization-maintaining. We employ two continuous-wave laser sources (CW source 1 and CW source 2) at wavelengths ${\lambda _1} = 1550.2\,\,{\rm nm}$ and ${\lambda _2} = 1555.4\,\,{\rm nm}$. The laser outputs are modulated by two Mach–Zehnder modulators (MZM 1 and MZM 2). Both MZMs are biased to operate in the negative quadrature point (bias controllers are not shown in Fig. 2). The transfer functions of MZM 1 and MZM 2 define the input nonlinearities of the two RC layers, ${f_{{\rm in}}}$ in Eqs. (4), (6), and (7). MZM 1 is driven by an arbitrary waveform generator (AWG 1) which supplies the input signal $u_n^{(1)}$. MZM 2 can be driven by a second arbitrary waveform generator (AWG 2) or the output of another photodiode (PD 2). The modulated signals are merged together in a 50/50 fiber coupler (C 1) and then injected into an erbium-doped-fiber amplifier (EDFA 1). EDFA 1 raises the total power to 9 dBm, which is equally distributed between the signals. After the amplification, both signals are injected into a phase modulator (PM 1). PM 1 is driven by a sinusoidal radio-frequency signal (frequency $\Omega \approx 17\,\,{\rm GHz}$, power ${\rm P}1 \approx 30\,\,{\rm dBm}$). The radio-frequency signal is generated by an RF clock (RF source) and amplified by an RF amplifier (RF AMP 1). The phase modulation provided by PM 1 generates two frequency combs centered at ${\lambda _1}$ and ${\lambda _2}$ (Fig. 3). The spacing of the comb lines is equal to $\Omega$, and the number of lines depends on P1. In our implementation, PM 1 provides approximately 20 usable comb lines per comb, i.e., 20 neurons. Both frequency combs constitute the input stimuli for the two reservoir networks. The amplitude of each line determines how strongly the input signal is coupled to the particular neuron encoded in that line. Hence, the distribution of (complex) amplitudes among the comb lines defines two vectors of the input-to-reservoir weights, ${\textbf W}_{{\rm in}}^{{\rm (1)}}$ and ${\textbf W}_{{\rm in}}^{{\rm (2)}}$. Both frequency combs are injected into a fiber loop through a 30/70 coupler (C 2). The fiber loop is 15 m long, corresponding to a roundtrip frequency of approximately 20 MHz. The input signals are synchronized with the roundtrip time of the loop such that each timestep of the input signals entirely fills the loop. Hence, the processing frequency of our system is fixed by the cavity length and is approximately 20 MHz. The fiber loop contains a second phase modulator (PM 2) and an optical amplifier (EDFA 2). PM 2 is driven by a signal generated by the same RF source as PM 1, but it undergoes a different amplification (RF AMP 2). Hence, it has the same frequency but a different power, ${\rm P}2 \approx 20\;{\rm dBm} $. The phase modulation provided by PM 2 creates frequency interference among the lines of the same comb, thus implementing the (complex-weighted) connection between the neurons of the same reservoir. EDFA 2 compensates for the losses in the loop. The transformation of the combs over a roundtrip, including the effects of phase modulation, amplification, and dispersion (which acts differently on each comb line/neuron), defines the matrices ${{\textbf W}^{{\rm (1)}}}$ and ${{\textbf W}^{{\rm (2)}}}$. The amplitudes of both combs at each roundtrip $n$ provide the states of the two reservoirs ${\textbf x}_n^{{\rm (1)}}$ and ${\textbf x}_n^{{\rm (2)}}$.
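As an aside, the line amplitudes of a comb generated by pure sinusoidal phase modulation follow from the Jacobi–Anger expansion, ${E_0}\exp ({im\sin \Omega t}) = {E_0}\sum\nolimits_k {J_k}(m)\exp ({ik\Omega t})$, where $m$ is the modulation index set by the RF drive power. The short sketch below uses an illustrative value of $m$ (not the calibrated value for PM 1) to estimate the number of usable lines:

```python
import numpy as np
from scipy.special import jv   # Bessel functions of the first kind

m = 5.0                        # illustrative modulation index
k = np.arange(-15, 16)         # comb line indices around the carrier
amps = jv(k, m)                # line amplitudes J_k(m) for pure phase modulation
power = amps ** 2
usable = k[power > 1e-3 * power.max()]
print(f"{usable.size} lines within 30 dB of the strongest one")
```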

A part of the circulating radiation is extracted by a 20/80 fiber coupler (C 3), amplified by EDFA 3, and directed to the readout circuit. The readout consists of a multi-channel programmable spectral filter (PSF, Coherent II-VI Waveshaper, with resolution 0.01 dB) and two photodiodes (PD 1 and PD 2), each measuring one of the two PSF outputs. The first PSF channel, connected to PD 1, is employed to measure the evolution of both reservoirs. The measurement procedure consists of selecting a single comb line at a time, by setting a band-pass filter on the PSF channel, and recording the intensity of this comb line with PD 1. At the end of the procedure, the intensities of all comb lines, i.e., the norm squares of the components of vectors ${\textbf x}_n^{{\rm (1)}}$ and ${\textbf x}_n^{{\rm (2)}}$, are recorded on a computer. Ridge regression is employed to train the output weights ${\textbf W}_{{\rm out}}^{{\rm (A)}}$ (with a regularization parameter of ${10^{- 5}}$, for neuron signals of the order of one). The output of the reservoir is then obtained by multiplying the measured line intensities by the trained output weights. Note that the training can only be realized with the support of a digital computer, while the application of the output weights can be realized optically in the analog domain [Eq. (5)].
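Continuing the earlier sketches, ridge-regression training of the readout amounts to solving a regularized normal equation; the one-step-ahead target used below is an illustrative stand-in for the benchmark targets.

```python
def train_readout(states, target, reg=1e-5):
    # Ridge regression: solve (S^T S + reg * I) w = S^T y for the weights w.
    A = states.T @ states + reg * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ target)

warmup = 500                              # discard the initial transient
target = u[1:]                            # e.g., one-step-ahead prediction
W_out_A = train_readout(x_A[warmup:-1], target[warmup:])
y = x_A @ W_out_A                         # deep-RC output, Eq. (9)
```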

Fig. 4. Three tested configurations for the two RCs. “Reservoir ${\lambda _1}$” is encoded in the frequency comb centered around ${\lambda _1}$, while “reservoir ${\lambda _2}$” is encoded in the frequency comb centered around ${\lambda _2}$. Both reservoirs are executed on the same photonic substrate. (a) Shallow-RC: one of the two reservoirs performs the benchmark task as a traditional RC, while the other reservoir processes a different time series in parallel. (b) Parallel-RC: both reservoirs process the same input time series, but their dynamics are decoupled from each other. A single output layer is trained, which combines signals from both reservoirs. (c) Deep-RC: two reservoirs constitute two layers of a deep-RC.

Operation Modes. We use two operation modes: “deep” and “independent.”

In the deep-RC mode, the second channel of the programmable spectral filter is configured to select and transmit only the comb centered at ${\lambda _1}$ after having applied an attenuation mask ${\textbf W}_{{\rm out}}^{{\rm (1)}}$. Consequently, PD 2 measures the signal $u_n^{(2)} = {| {{\textbf W}_{{\rm out}}^{{\rm (1)}} \cdot {\textbf x}_n^{{\rm (1)}}} |^2}$. The output of PD 2 drives MZM 2, and thus constitutes the input of the second RC at ${\lambda _2}$. In this configuration, the system is a two-layer deep-RC, as described in Subsection 2.A.2.

In the independent mode, both RC computations are decoupled by driving MZM 2 through a second, independent, arbitrary waveform generator AWG 2 (the second channel of PSF and PD 2 are deactivated). Thus, the computations do not interact with each other and are carried out independently.

The selection of the computation mode, deep or independent, is made by flipping an electric switch that selects whether MZM 2 is driven by PD 2 or by AWG 2, as illustrated in Fig. 2.

Stabilization. The experimental setup is sensitive to acoustic noise and thermal drift. To limit these effects, the optical loop, including PM 2 and EDFA 2, is mounted inside an insulated box on an optical table. Furthermore, two PID controllers piezo-tune the emission wavelengths of both laser sources in order to fix the operating condition to a certain point in the loop transfer function. The PID controllers are fed by the intensity of the reflection of each comb at the entrance of the loop, i.e., at the coupler C 2. This requires two auxiliary photodiodes and spectral filters (not represented in Fig. 2).

C. Benchmark Tasks

We selected two benchmark tasks: the first consists of predicting the evolution of a chaotic time series, and the second of compensating the distortion produced by a nonlinear communication channel.

The time series prediction task is based on the infrared laser dataset of the Santa Fe Time Series Competition [42]. The time series ${u_t}$ is supplied as input, and the task consists of producing ${u_{t + \tau}}$, with ${-}5 \le \tau \le + 5$. Note that when the timeshift $\tau$ is negative, the task consists of remembering the past, while when the timeshift $\tau$ is positive, the task consists of predicting the future. The accuracy is expressed in terms of the normalized mean square error (NMSE) between the target signal and the produced output. When running this benchmark, the training set is composed of 6000 timesteps, and the testing set is composed of 2500 timesteps (a standard 70%/30% split). We discard the first 500 timesteps of the reservoir output to avoid operating in a transient phase.
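A minimal sketch of the task construction and the NMSE metric is given below; the dataset filename is hypothetical.

```python
import numpy as np

def nmse(y, target):
    # Normalized mean square error used to score the timeshift task.
    return np.mean((y - target) ** 2) / np.var(target)

series = np.loadtxt("santafe_laser.txt")  # hypothetical filename for the Santa Fe data
tau = 1                                   # tau > 0: predict the future; tau < 0: recall the past
if tau >= 0:
    u_in, target = series[:len(series) - tau], series[tau:]
else:
    u_in, target = series[-tau:], series[:tau]
```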

The nonlinear channel equalization task was first used in the RC community in [21]. A random signal composed of four different symbols is propagated along a simulated channel exhibiting nonlinearity, noise, and memory of past inputs. The task consists of reconstructing the original input given the channel output. The performance is evaluated for different signal-to-noise ratios (SNRs) in the range of [8 dB, 32 dB]. The results are expressed in terms of the symbol error rate (SER), i.e., the ratio of wrongly reconstructed output symbols to the total number of transmitted symbols. When running this benchmark, the training set is composed of 14,000 timesteps, and the testing set is composed of 30,000 timesteps. We discard the first 1000 timesteps of the reservoir output to avoid operating in a transient phase. Note that, contrary to the time series benchmark, which relies on a limited dataset, the nonlinear channel dataset can easily be generated on the fly. This is why we employed a larger number of data points for the initial wash-out and the testing.
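For reference, a generator for this benchmark is sketched below, using the channel coefficients commonly reproduced in the RC literature following [21] (quoted from memory; consult [21] for the authoritative definition). Scaling the additive noise to the clean signal power is our assumption for setting the SNR.

```python
import numpy as np

def make_channel_data(T, snr_db, seed=0):
    rng = np.random.default_rng(seed)
    d = rng.choice([-3.0, -1.0, 1.0, 3.0], size=T + 9)   # 4-level symbols d(n), padded
    # Linear part with memory: taps multiply d(n+2), d(n+1), ..., d(n-7).
    taps = [0.08, -0.12, 1.0, 0.18, -0.1, 0.091, -0.05, 0.04, 0.03, 0.01]
    q = np.zeros(T)
    for i, c in enumerate(taps):
        off = 9 - i                                      # offset 9 -> d(n+2), offset 0 -> d(n-7)
        q += c * d[off:off + T]
    clean = q + 0.036 * q**2 - 0.011 * q**3              # memoryless nonlinearity
    sigma = np.sqrt(np.mean(clean**2) / 10 ** (snr_db / 10))
    return clean + rng.normal(scale=sigma, size=T), d[7:7 + T]

u_ch, d_sym = make_channel_data(T=44_000, snr_db=20)     # channel output and target symbols
```

The reservoir output is then rounded to the nearest of the four symbol levels before computing the SER.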

Every benchmark result was validated through cross-validation with 100 repetitions: the points assigned to the train and test datasets were selected at random 100 times, and the resulting scores were averaged.
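In code, this validation procedure might look as follows, reusing `train_readout()` and `nmse()` from the sketches above:

```python
def cross_validate(states, target, n_rep=100, train_frac=0.7, seed=0):
    # Random-resampling validation: average the score over n_rep random splits.
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_rep):
        idx = rng.permutation(len(target))
        cut = int(train_frac * len(target))
        tr, te = idx[:cut], idx[cut:]
        w = train_readout(states[tr], target[tr])
        scores.append(nmse(states[te] @ w, target[te]))
    return float(np.mean(scores)), float(np.std(scores))
```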

D. Tested Configurations

Our photonic system supports two RCs that operate simultaneously, either independently, i.e., in parallel, or connected in series. We evaluated the performance of three different configurations, depicted in Fig. 4. First, we consider a “shallow-RC” configuration [Fig. 4(a)] where only one of two independent RCs executes the benchmark task, constituting a “traditional” RC as described in Subsection 2.A.1. In this configuration, the second RC processes a different computation (not evaluated) with the purpose of simulating a parallel-computation scenario where two different tasks are performed at the same time. Second, we study a “parallel-RC” configuration [Fig. 4(b)] where both independent RCs execute the same task in an uncorrelated way, and a single output layer is connected to both reservoirs. This constitutes a “non-deep” way of using the full computational capabilities of the system on a single task. Third, we consider a “deep-RC” configuration [Fig. 4(c)] where the two RCs are connected in series as described in Subsection 2.A.2.

Fig. 5. Performance of the reservoir computers in shallow-RC configuration as a function of $\Omega$ on the channel equalization task (top) and the Santa Fe time series prediction task for prediction 1 timestep ahead (bottom). The complex dependence on $\Omega$ is due to the dispersion in the optical fiber. The dispersion is also the reason why the dependence on $\Omega$ is different for RC-1 and RC-2, as they use frequency combs centered on different wavelengths. (As these plots are time-consuming to obtain, a reduced number of comb lines $N = 14$ was used.)

Fig. 6. Experimental results for the three operation modes (shallow-RC, parallel-RC, and deep-RC) on the two selected benchmark tasks: (a) nonlinear channel equalization and (b) chaotic time series prediction. Deep-RC results are shown for both optimization methods presented in the text (uniform optimized attenuation $\alpha$ and CMA-ES). Error bars represent the score standard deviation measured in the cross-validation phase. Results in (a) are expressed as symbol error rate (SER) versus signal-to-noise ratio (SNR). Results in (b) are expressed as normalized mean square error (NMSE) versus the shift of the target time series with respect to the input one. When the shift is positive, the task consists of predicting the future; when the shift is zero, of reproducing the present input; when the shift is negative, of reproducing the past.

In addition, in the deep-RC configuration, we used two different methods to tune the weights ${\textbf W}_{{\rm out}}^{{\rm (1)}}$, i.e., the attenuations applied by the PSF that determine the connection from the first RC layer to the second one. In the first, simpler approach, we apply the same attenuation to all comb lines, corresponding to ${\textbf W}_{{\rm out}}^{{\rm (1)}} = {\rm diag}(\alpha)$, and we optimize the overall attenuation ${\alpha ^2}$ by sweeping it in the range of $[- 20\;{\rm dB},0\;{\rm dB}]$. In the second approach, we optimize all the coefficients of ${\textbf W}_{{\rm out}}^{{\rm (1)}}$ using the covariance matrix adaptation evolution strategy (CMA-ES) optimization algorithm [38]. CMA-ES is a standard tool for continuous black-box optimization, already used in the context of reservoir computing in [41]. The algorithm samples candidate solutions from a multivariate Gaussian distribution whose parameters (mean and covariance) are tuned based on the performance of the solutions sampled in the previous epochs. In our case, the optimization runs over six epochs, using a population of 13 sampled solutions per epoch. For each choice of weights ${\textbf W}_{{\rm out}}^{{\rm (1)}}$, we train the output weights and then use the performance of the resulting deep-RC as the fitness measure. Whichever strategy is used (sweeping of ${\alpha ^2}$ or CMA-ES), the purpose of the optimization is to find the configuration that maximizes the network performance.
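A sketch of this optimization loop, using the open-source `cma` package, is given below. The fitness function is a dummy stand-in: in the experiment it would program the PSF with the candidate attenuations, run the deep-RC, train the readout, and return the test error.

```python
import numpy as np
import cma   # pip install cma

N = 20                                    # one attenuation per comb line

def fitness(w_db):
    # Dummy quadratic objective so the sketch runs end to end; replace with
    # the experimental procedure described in the text.
    return float(np.sum((np.asarray(w_db) + 6.0) ** 2))

# Start at -10 dB with a 3 dB step size; 13 candidates per epoch, 6 epochs,
# attenuations bounded to the swept range [-20 dB, 0 dB].
es = cma.CMAEvolutionStrategy([-10.0] * N, 3.0,
                              {"popsize": 13, "maxiter": 6, "bounds": [-20.0, 0.0]})
while not es.stop():
    candidates = es.ask()                 # sample 13 attenuation masks
    es.tell(candidates, [fitness(c) for c in candidates])
best_attenuations = es.result.xbest       # best diag(W_out^(1)) found, in dB
```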

Finally, to improve the reservoir computing performance, we tune the comb line spacing $\Omega$ to the best-performing value for each task. The fiber loop constitutes a spectral interferometer and exhibits, due to dispersion in the fiber, a complex behavior strongly dependent on $\Omega$. This is illustrated in Fig. 5, where the performance of the shallow-RC configuration for two different tasks is plotted as a function of $\Omega$ for both reservoirs.

3. RESULTS

The results of the benchmark tasks are shown in Fig. 6 for the three operation modes: shallow-RC, parallel-RC, and deep-RC. In this figure, the deep-RC results are shown for both optimization techniques described in Section 2.D. The nonlinear-channel equalization results [Fig. 6(a)] show the expected decrease of the symbol error rate (SER) with increasing signal-to-noise ratio (SNR): additional noise raises the complexity of the task and eventually makes correcting the signal distortion impossible. For high SNR values, both the shallow-RC and parallel-RC SER scores saturate, while the deep-RC SER score maintains an exponential decay for increasing SNR values. For every SNR value, the deep-RC performs best, followed by the parallel-RC and finally the shallow-RC. A similar behavior is found in the results of the chaotic time series prediction task [Fig. 6(b)].

Two trends are clearly visible in Fig. 6. First, the parallel-RC systematically outperforms the shallow-RC. Indeed, since the two parallel RCs perform different computations (as is evident from Fig. 5), using both reservoirs in parallel should perform at least as well as using a single reservoir. Second, the deep-RC outperforms the parallel-RC in every test we conducted. Both configurations exploit the same number of neurons and differ only in their topology. Thus, we conclude that the serial configuration of the deep-RC genuinely boosts the overall performance.

We observe that both optimization techniques for the inter-layer connection perform comparably, with the simpler algorithm sometimes outperforming the CMA-ES algorithm. We identified two reasons for this behavior. First, the CMA-ES algorithm can get stuck in a local minimum. Second, the search for the optimal set of weights can be affected by slow drifts in the operating conditions of the deep-RC. For example, as reported in [31], the transfer function of the fiber loop constituting the reservoir is strongly sensitive to temperature changes: the thermally induced fiber-length variation affects the relative phases of the comb lines, thus changing the dynamics of the system. The complexity of the system is well captured in Fig. 5, which shows how a small change in a parameter significantly affects the performance. Improved stabilization should resolve this issue in the future.

4. CONCLUSION

We presented a fully analog photonic implementation of a two-layer deep reservoir computer. The connection between the layers is performed in the analog domain, with no processing or storing on a digital computer. The presented implementation also allows for two independent RC computations to be executed at the same time. We found that the deep-RC configuration, obtained by connecting two RCs in series, performs better than a parallel-RC configuration, where two RCs process the same input data without interacting.

The reported experiment has only two layers, but deeper schemes are in principle possible. New layers can be added to the deep-RC by using more than two lasers, provided that the generated combs do not overlap. The C band could host 10 parallel computations (considering combs with widths of 3 nm; see Fig. 3). These 10 parallel computations could be employed to constitute a single 10-layer deep-RC, or even multiple deep-RCs running in parallel, each composed of fewer layers. On the other hand, broader combs would encode more neurons in each reservoir. Thus, a balance between the number of layers and the number of neurons per layer must be sought. In any case, integrating (partially or entirely) the experiment, as proposed in [43], could be a route to scaling up the system while simplifying its stabilization.

We note that, for simplicity, in the present experiment, the final output ${y_n}$ was obtained digitally by carrying out the linear combination as described in Eq. (3). However, as described in Eq. (5) and reported in our previous works [31,40], the output can also be obtained in the analog domain without loss of performance.

Although we explored two strategies for optimizing the interconnection between the two deep-RC layers, many ideas are still to be tested (see, e.g., [33,35,36]) and could be the object of further investigation.

In summary, developing deep architectures for neuromorphic photonic computing is a highly promising avenue for increasing both the complexity of the tasks that can be solved and the system performance. However, the presence of analog-to-digital and digital-to-analog converters strongly affects the power consumption and footprint, and hence should be avoided. We have demonstrated that this is possible for photonic deep reservoir computing.

Funding

European Commission [Marie Sklodowska-Curie 860830 (POST DIGITAL)]; Fonds De La Recherche Scientifique - FNRS [Excellence of Science (EOS) 40007536].

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

REFERENCES

1. Y. Bengio, “Learning deep architectures for AI,” Foundations Trends Mach. Learn. 2, 1–127 (2009). [CrossRef]  

2. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015). [CrossRef]  

3. S. Ravindran, “Five ways deep learning has transformed image analysis,” Nature 609, 864–866 (2022). [CrossRef]  

4. D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “A general reinforcement learning algorithm that masters chess, shogi, and go through self-play,” Science 362, 1140–1144 (2018). [CrossRef]  

5. M. Baek, F. DiMaio, I. Anishchenko, et al., “Accurate prediction of protein structures and interactions using a three-track neural network,” Science 373, 871–876 (2021). [CrossRef]  

6. J. Jumper, R. Evans, A. Pritzel, et al., “Highly accurate protein structure prediction with AlphaFold,” Nature 596, 583–589 (2021). [CrossRef]  

7. OpenAI, https://openai.com/about. Accessed: March 2023.

8. Bing Chat, https://www.bing.com. Accessed: March 2023.

9. X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature 589, 44–51 (2021). [CrossRef]  

10. J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. L. Gallo, X. Fu, A. Lukashchuk, A. S. Raja, J. Liu, C. D. Wright, A. Sebastian, T. J. Kippenberg, W. H. P. Pernice, and H. Bhaskaran, “Parallel convolutional processing using an integrated photonic tensor core,” Nature 589, 52–58 (2021). [CrossRef]  

11. A. Liutkus, D. Martina, S. Popoff, G. Chardon, O. Katz, G. Lerosey, S. Gigan, L. Daudet, and I. Carron, “Imaging with nature: Compressive imaging using a multiply scattering medium,” Sci. Rep. 4, 5552 (2014). [CrossRef]  

12. A. Saade, F. Caltagirone, I. Carron, L. Daudet, A. Drémeau, S. Gigan, and F. Krzakala, “Random projections through multiple optical scattering: Approximating kernels at the speed of light,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016), pp. 6215–6219.

13. J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569, 208–214 (2019). [CrossRef]  

14. A. Jha, C. Huang, H.-T. Peng, B. Shastri, and P. R. Prucnal, “Photonic spiking neural networks and graphene-on-silicon spiking neurons,” J. Lightwave Technol. 40, 2901–2914 (2022). [CrossRef]  

15. R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, “Large-scale optical neural networks based on photoelectric multiplication,” Phys. Rev. X 9, 021032 (2019). [CrossRef]  

16. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441–446 (2017). [CrossRef]  

17. F. Ashtiani, A. J. Geers, and F. Aflatouni, “An on-chip photonic deep neural network for image classification,” Nature 606, 501–506 (2022). [CrossRef]  

18. T. Zhou, X. Lin, J. Wu, Y. Chen, H. Xie, Y. Li, J. Fan, H. Wu, L. Fang, and Q. Dai, “Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit,” Nat. Photonics 15, 367–373 (2021). [CrossRef]  

19. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361, 1004–1008 (2018). [CrossRef]  

20. G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing 70, 489–501 (2006). [CrossRef]  

21. H. Jaeger and H. Haas, “Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication,” Science 304, 78–80 (2004). [CrossRef]  

22. G. Tanaka, T. Yamane, J. B. Héroux, R. Nakane, N. Kanazawa, S. Takeda, H. Numata, D. Nakano, and A. Hirose, “Recent advances in physical reservoir computing: A review,” Neural Netw. 115, 100–123 (2019). [CrossRef]  

23. G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature 588, 39–47 (2020). [CrossRef]  

24. D. Marković, A. Mizrahi, D. Querlioz, and J. Grollier, “Physics for neuromorphic computing,” Nat. Rev. Phys. 2, 499–510 (2020). [CrossRef]  

25. L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011). [CrossRef]  

26. D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, “Parallel photonic information processing at gigabyte per second data rates using transient states,” Nat. Commun. 4, 1364 (2013). [CrossRef]  

27. L. Larger, A. Baylón-Fuentes, R. Martinenghi, V. S. Udaltsov, Y. K. Chembo, and M. Jacquot, “High-speed photonic reservoir computing using a time-delay-based architecture: Million words per second classification,” Phys. Rev. X 7, 011015 (2017). [CrossRef]  

28. M. Rafayelyan, J. Dong, Y. Tan, F. Krzakala, and S. Gigan, “Large-scale optical reservoir computing for spatiotemporal chaotic systems prediction,” Phys. Rev. X 10, 041037 (2020). [CrossRef]  

29. S. Sunada, K. Kanno, and A. Uchida, “Using multidimensional speckle dynamics for high-speed, large-scale, parallel photonic computing,” Opt. Express 28, 30349–30361 (2020). [CrossRef]  

30. M. Nakajima, K. Tanaka, and T. Hashimoto, “Scalable reservoir computing on coherent linear photonic processor,” Commun. Phys. 4, 20 (2021). [CrossRef]  

31. L. Butschek, A. Akrout, E. Dimitriadou, A. Lupo, M. Haelterman, and S. Massar, “Photonic reservoir computer based on frequency multiplexing,” Opt. Lett. 47, 782–785 (2022). [CrossRef]  

32. F. Triefenbach, A. Jalalvand, B. Schrauwen, and J.-P. Martens, “Phoneme recognition with large hierarchical reservoirs,” in Advances in Neural Information Processing Systems (2010), Vol. 23.

33. C. Gallicchio, A. Micheli, and L. Pedrelli, “Deep reservoir computing: A critical experimental analysis,” Neurocomputing 268, 87–99 (2017). [CrossRef]  

34. M. Freiberger, S. Sackesyn, C. Ma, A. Katumba, P. Bienstman, and J. Dambre, “Improving time series recognition and prediction with networks and ensembles of passive photonic reservoirs,” IEEE J. Sel. Top. Quantum Electron. 26, 7700611 (2019). [CrossRef]  

35. M. Nakajima, K. Inoue, K. Tanaka, Y. Kuniyoshi, T. Hashimoto, and K. Nakajima, “Physical deep learning with biologically inspired training method: gradient-free approach for physical hardware,” Nat. Commun. 13, 7847 (2022). [CrossRef]  

36. L. G. Wright, T. Onodera, M. M. Stein, T. Wang, D. T. Schachter, Z. Hu, and P. L. McMahon, “Deep physical neural networks trained with backpropagation,” Nature 601, 549–555 (2022). [CrossRef]  

37. B.-D. Lin, Y.-W. Shen, J.-Y. Tang, J. Yu, X. He, and C. Wang, “Deep time-delay reservoir computing with cascading injection-locked lasers,” IEEE J. Sel. Top. Quantum Electron. 29, 7600408 (2022). [CrossRef]  

38. N. Hansen, “The CMA evolution strategy: a comparing review,” in Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms (2006), pp. 75–102.

39. Q. Vinckier, F. Duport, A. Smerieri, K. Vandoorne, P. Bienstman, M. Haelterman, and S. Massar, “High-performance photonic reservoir computer based on a coherently driven passive cavity,” Optica 2, 438–446 (2015). [CrossRef]  

40. A. Lupo, L. Butschek, and S. Massar, “Photonic extreme learning machine based on frequency multiplexing,” Opt. Express 29, 28257–28276 (2021). [CrossRef]  

41. M. Freiberger, A. Katumba, P. Bienstman, and J. Dambre, “Training passive photonic reservoirs with integrated optical readout,” IEEE Trans. Neural Netw. Learn. Syst. 30, 1943–1953 (2019). [CrossRef]  

42. C.-O. Weiss, U. Hübner, N. B. Abraham, and D. Tang, “Lorenz-like chaos in NH3-FIR lasers,” Infrared Phys. Technol. 36, 489–512 (1995). [CrossRef]  

43. W. Kassa, E. Dimitriadou, M. Haelterman, S. Massar, and E. Bente, “Towards integrated parallel photonic reservoir computing based on frequency multiplexing,” Proc. SPIE 10689, 1068903 (2018). [CrossRef]  




