
PMONN: an optical neural network for photonic integrated circuits based on micro-resonator


Abstract

We propose an improved optical neural network (ONN) circuit architecture based on conventional micro-resonator ONNs, called the Phase-based Micro-resonator Optical Neural Network (PMONN). PMONN's core architecture features a Convolutions and Batch Normalization (CB) unit, comprising a phase-based (PB) convolutional layer, a Depth-Point-Wise (DPW) convolutional layer, and a reconstructed Batch Normalization (RBN) layer. The PB convolution kernel uses modulable phase shifts of Add-drop MRRs as learnable parameters and their optical transfer function as convolution weights. The DPW convolution kernel amplifies PB convolution weights by learning the amplification factors. To address the internal covariate shift during training, the RBN layer normalizes DPW outputs by reconstructing the BN layer of the electronic neural network, which is then merged with the DPW layer in the test stage. We employ the tunable DAs in the architecture to implement the merged layer. PMONN achieves 99.15% and 91.83% accuracy on MNIST and Fashion-MNIST datasets, respectively. This work presents a method for implementing an optical neural network on the improved architecture based on MRRs and increases the flexibility and reusability of the architecture. PMONN has potential applications as the backbone for future optical object detection neural networks.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The Artificial Neural Network (ANN) has found widespread applications in fields such as object detection, autonomous driving, and robot vision [1–6]. The demand for computing power in ANNs has been rising continuously with their increasing scale and volume [7,8]. Due to the slowdown of Moore's Law, the current computing power of hardware struggles to meet the growing needs of ANNs [9–11]. In this context, exploring alternative ways to enhance computing power, improve speed, achieve real-time performance, and reduce power consumption becomes crucial.

Compared to electricity, light possesses advantages such as massive parallelism, low power consumption, low latency, and high bandwidth [12–17], which make it suitable for the large-scale matrix operations in ANNs. Optical Neural Networks (ONNs) [18–22] mainly comprise diffractive deep neural networks (D2NNs) [23–26], MZI-based ONNs [27–29], and micro-resonator-based ONNs (MONNs) [30–32]. Among them, MONNs, which use micro-resonators (MRRs) as basic calculation units, offer advantages such as compact structure, small size, and easy integration [33–35].

Recent research efforts have proposed innovative architectures. Ref. [36] introduced Digital Electronic and Analog Photonic (DEAP) CNN hardware, achieving a high accuracy of 98.6% on MNIST classification. Ref. [32] implemented a micro-resonator-based optical convolution accelerator with an impressive speed of 11 TOPS and an accuracy of 88% on MNIST classification. Ref. [37] introduced a microcomb-based integrated photonic processing unit with an accuracy of 96.6% on the MNIST dataset, and Ref. [38] developed 4-bit, 5-bit, and 6-bit quantized photonic neural networks based on MRRs, trained on the MNIST dataset, with accuracy rates of 94.23%, 94.73%, and 96.11%, respectively.

While these advancements have improved the speed of convolution computation in MONNs, challenges remain: (1) Existing MONNs derive inference models from trained electronic neural networks and lack a dedicated photonic convolution algorithm for Photonic Integrated Circuits (PICs) based on MRRs. (2) Previous studies consistently used fixed amplifications to amplify the MRR convolution weights, limiting the flexibility and reusability of the PICs. (3) Some optimization algorithms of electronic neural networks, such as Batch Normalization (BN) [39], are still not utilized in MONNs.

To address these challenges, we propose the Phase-based Micro-resonator Optical Neural Network (PMONN). Key contributions include: (1) Proposing a PMONN based on MRR with a core Convolutions and Batch Normalization (CB) unit comprising a PB convolutional layer, DPW convolutional layer, and RBN layer. (2) The PB convolution kernel uses modulable phase shifts of Add-drop MRRs as learnable parameters and optical transfer function values as convolutional weights. (3) Proposing a DPW convolutional layer to learn the PB convolution amplifications, and reconstructing the BN algorithm as the RBN algorithm. (4) After training, the DPW layer and RBN layer of the inference model are merged, and the merged layer is implemented by a tunable differential amplifier (DA) array in the PICs.

2. Theory

2.1 MRR

In recent years, the MRR, an optical waveguide device, has been progressively adopted in optical neural network architectures. An MRR couples light into the ring through a directional coupler and then recombines it with the light in the bus waveguide. Changing the effective refractive index of the ring causes a phase shift of the circulating light, which interferes with the original light and modulates the output intensity. When the radius of the MRR is fixed, the effective refractive index of the device can be changed by the plasma dispersion effect or the thermo-optic effect, thereby shifting the phase of the light. The relation between the phase shift and the effective refractive index is given in Eq. (1).

$$\Delta \phi = \frac{4\pi^2 d\,\Delta n_{eff}}{\lambda}$$
where d is the radius of the MRR, $\Delta n_{eff}$ is the variation of the effective refractive index, and $\lambda$ is the resonant wavelength of the light.

The MRR has two basic structures: the All-pass MRR and the Add-drop MRR. The optical transfer function relating the output light intensity to the input light intensity of an MRR depends on the phase shift. As shown in Fig. 1, Fig. 1(a) is an All-pass MRR consisting of a directional coupler and a ring, with an Input port and a Through port. Fig. 1(b) shows the output optical intensity of the Through port versus the phase shift. The optical transfer function of the Through port is given in Eq. (2).

$${T_n}(\Delta \phi ) = \frac{{{a^2} - 2ra\cos (\Delta \phi ) + {r^2}}}{{1 - 2ra\cos (\Delta \phi ) + {{(ra)}^2}}}$$
where $\Delta \phi$ is the phase shift, r is the self-coupling coefficient, and a is the propagation loss factor of the ring and the directional coupler.

Fig. 1. The structures and curves of the All-pass MRR and the Add-drop MRR. (a) All-pass MRR structure. (b) The output optical intensity curve of the All-pass MRR. (c) Add-drop MRR structure. (d) The output optical intensity curve (Drop − Through) of the Add-drop MRR.

Fig. 1(c) shows the structure of the Add-drop MRR, which consists of two directional couplers and a ring. The Add-drop MRR includes an Input port and two output ports: the Through port and the Drop port. The optical transfer function of the Through port is shown in Eq. (3). The optical transfer function of the Drop port is demonstrated in Eq. (4). Fig. 1(d) depicts the output optical intensity of the Drop port minus the Through port. The equation is shown in Eq. (5).

$$Tp(\Delta \phi ) = \frac{{{{({r_2} \times a)}^2} - 2 \times {r_1} \times {r_2} \times a \times \cos (\Delta \phi ) + r_1^2}}{{1 - 2 \times {r_1} \times {r_2} \times a \times \cos (\Delta \phi ) + {{({r_1} \times {r_2} \times a)}^2}}}$$
$$Td(\Delta \phi ) = \frac{{(1 - r_1^2) \times (1 - r_2^2) \times a}}{{1 - 2 \times {r_1} \times {r_2} \times a \times \cos (\Delta \phi ) + {{({r_1} \times {r_2} \times a)}^2}}}$$
$$out = Td(\Delta \phi ) - Tp(\Delta \phi )$$
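For concreteness, the transfer functions in Eqs. (2)–(5) can be evaluated numerically. The following is a minimal NumPy sketch; the coupling coefficients and loss factor (r = r1 = r2 = 0.9, a = 0.99) are illustrative placeholders, not device parameters taken from the paper.

```python
import numpy as np

def t_through_allpass(dphi, r=0.9, a=0.99):
    """All-pass Through-port transfer function, Eq. (2)."""
    return (a**2 - 2*r*a*np.cos(dphi) + r**2) / \
           (1 - 2*r*a*np.cos(dphi) + (r*a)**2)

def t_through_adddrop(dphi, r1=0.9, r2=0.9, a=0.99):
    """Add-drop Through-port transfer function, Eq. (3)."""
    return ((r2*a)**2 - 2*r1*r2*a*np.cos(dphi) + r1**2) / \
           (1 - 2*r1*r2*a*np.cos(dphi) + (r1*r2*a)**2)

def t_drop_adddrop(dphi, r1=0.9, r2=0.9, a=0.99):
    """Add-drop Drop-port transfer function, Eq. (4)."""
    return ((1 - r1**2) * (1 - r2**2) * a) / \
           (1 - 2*r1*r2*a*np.cos(dphi) + (r1*r2*a)**2)

def adddrop_weight(dphi, r1=0.9, r2=0.9, a=0.99):
    """Drop minus Through, Eq. (5): the signed PB convolution weight."""
    return t_drop_adddrop(dphi, r1, r2, a) - t_through_adddrop(dphi, r1, r2, a)

dphi = np.linspace(-np.pi, np.pi, 5)
print(adddrop_weight(dphi))  # weight sweeps between roughly -1 and 1
```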

The theory of All-pass MRR is applied within the Input Data Mapping Module of the PICs. Additionally, the concept of Add-drop MRR is harnessed in the PB convolution for both PMONN and PICs.

2.2 Convolution

The convolutional neural network is mainly composed of convolutional layers. Compared with a fully connected network, it features local connections and shared weights. The convolutional layer scans the multi-channel input feature map with a set of convolution kernels and multiply-accumulates the covered values, producing an output feature map that contains higher-level semantic information. The calculation is as follows:

$${{\bf O}_{l,i,j}} = \sum\limits_{c = 1}^C {\sum\limits_{k = 1}^R {\sum\limits_{m = 1}^R {{{\bf \omega }_{l,c,k,m}}} } {{\bf I}_{c,S \times (i - 1) + k,S \times (j - 1) + m}}}$$
where ${\bf O}$ represents the output feature map. l indexes the channel of the output feature map and, equivalently, the $lth$ convolution kernel of the convolutional layer. i and j represent the coordinates in the output feature map. ${\bf \omega }$ denotes the convolution kernels of the convolutional layer. C is the channel number of a convolution kernel, which is also the channel number of the input feature map. R stands for the width and height of the convolution kernel. S is the sliding stride of the convolution kernel on the input feature map. ${\bf I}$ represents the input feature map.

When the convolution operation is completed, the height and width of the output feature map obtained are shown in Eq. (7).

$$height,width = \frac{{H - R + 2P}}{S} + 1,\frac{{W - R + 2P}}{S} + 1$$
where H and W stand for the height and width of the input feature map, R represents the height and width of the convolution kernel, S is the sliding stride of the convolution kernel, and P is the padding size. The $height$ and $width$ represent the height and width of the output feature map.
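As a concrete reference for Eqs. (6) and (7), the following NumPy sketch performs the multiply-accumulate with explicit loops; padding is omitted (P = 0) for brevity, and all shapes are illustrative.

```python
import numpy as np

def conv2d(I, w, S=1):
    """Naive multiply-accumulate convolution, Eq. (6), with no padding (P = 0).
    I: input feature map of shape (C, H, W); w: kernels of shape (L, C, R, R)."""
    L, C, R, _ = w.shape
    _, H, W = I.shape
    height = (H - R) // S + 1           # Eq. (7) with P = 0
    width = (W - R) // S + 1
    O = np.zeros((L, height, width))
    for l in range(L):
        for i in range(height):
            for j in range(width):
                patch = I[:, S*i:S*i+R, S*j:S*j+R]
                O[l, i, j] = np.sum(w[l] * patch)   # sum over c, k, m
    return O

I = np.random.rand(3, 8, 8)
w = np.random.rand(4, 3, 3, 3)
print(conv2d(I, w, S=2).shape)  # (4, 3, 3)
```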

3. Method

We propose a PMONN based on MRRs. The core architecture of the network is the CB unit. The fundamental modules and the PICs of PMONN are shown in Fig. 2.

Fig. 2. The fundamental modules and the PICs of the proposed PMONN. (a) The structure of the CB unit, which consists of a PB convolutional layer, a DPW convolutional layer, and an RBN layer. (b) The internal logic realization of the PB convolution kernel. (c) The PICs of the PMONN.

Fig. 2(a) shows the structure of the CB unit, which consists of a PB convolutional layer, a DPW convolutional layer, and an RBN layer. The details of the PB convolution algorithm are shown in Fig. 2(b). The PB convolution kernel uses the phase shifts of the Add-drop MRRs as trainable parameters; these phase shifts are learned and optimized during the PMONN training process. The weights of the PB convolution kernel are the values of the optical transfer function specified in Eq. (5), which in turn depends on the learned phase shifts. This methodology integrates dynamic phase information, enhancing the adaptability and effectiveness of the convolutional process. As shown in Fig. 1(d), the optical transfer function takes values in [-1, 1].

Fig. 2(c) is the proposed fundamental diagram of the PICs of the PMONN, which consists of three parts: the Optical Pulse Module, the Input Data Mapping Module, and the CB unit. Following the wavelength division multiplexing (WDM) principle, the PICs are designed with m channels, meaning that m convolution kernels can process input data in parallel to improve the operation speed of PMONN. Each channel has n micro-resonators, which process n different resonant wavelengths and load the weights of one convolution kernel.

3.1 Optical Pulse Module

As shown in Fig. 2(c), the Optical Pulse Module consists of an Optical Frequency Comb (OFC) and a splitter. The OFC generates optical pulses at n different wavelengths. The splitter divides the pulses into m sections, each containing n different wavelengths of optical pulses. The relative intensity of each optical pulse is set to 1. The primary role of the Optical Pulse Module is to serve as the optical source for the entire system.

3.2 Input Data Mapping Module

In Fig. 2(c), the Input Data Mapping Module is constructed from an All-pass MRR bank composed of All-pass MRR vectors, designed to transform input image data into optical signals. Specific mapping details can be found in Fig. 3. Assuming the input data is an image with dimensions H (height), H (width), and $C$ (channels), the following processing steps are applied:

Fig. 3. The input data mapping process.

Normalization: Initially, the input image undergoes normalization using the Min-Max Scaling method.

Slicing: The normalized input image is sliced into segments, each slice matching the size of the convolution kernel, whose dimensions are R (height), $R$ (width), and C (channels). The number of slices is $l = (\frac{{H - R}}{S} + 1) \times (\frac{{H - R}}{S} + 1)$, where S is the sliding stride. When $S \ge 2$, the convolution operation can replace the pooling layer, serving as a down-sampling function.

Flattening: Each slice is then flattened based on the channel dimension, resulting in a vector. The vector's size becomes $n = R \times R \times C$.

Mapping to All-pass MRR Vectors: Each All-pass MRR vector consists of n different All-pass MRRs, each responsible for modulating one optical pulse at a distinct wavelength. The relationship between the radius of an All-pass MRR and its modulated wavelength is expressed in Eq. (8).

$$\textrm{R} = \frac{{k\lambda }}{{2\pi {n_c}}}$$
where k is the resonance order, $\lambda$ is the wavelength of the optical pulse, and ${n_c}$ is the effective refractive index of silicon.

We set the intensity of each optical pulse to 1. Then, we utilize the optical transfer function of the All-pass MRRs to load the values of the flattened vector into the corresponding MRRs, one by one. The optical transfer function of an All-pass MRR is associated with its modulation phase shift $\Delta \boldsymbol{\phi}$, as given in Eq. (2). Letting ${\boldsymbol{X}_i}$ denote the $ith$ value of the flattened vector, it is encoded via Eq. (2) as ${\boldsymbol{X}_i} = {T_n}(\Delta {\boldsymbol{\phi}_i})$. The modulation phase shift is then obtained through the inverse of Eq. (2), as indicated in Eq. (9).

$$\Delta {\boldsymbol{\phi}_i} = T_n^{-1}({\boldsymbol{X}_i})$$
where the $\Delta {\boldsymbol{\phi}_i}$ of the $ith$ All-pass MRR can be modulated by the plasma dispersion effect or the thermo-optic effect.

Utilizing Eq. (2), we modulate the MRRs within the All-pass MRR bank individually, so that the intensities of the output light pulses of the bank correspond directly to the values in the flattened vector.
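The text does not give the inverse in Eq. (9) in closed form. As one possible reconstruction, Eq. (2) can be solved for cos(Δφ), taking the branch Δφ ∈ [0, π]. The sketch below does this; it is not the authors' mapping code, and the near-critically coupled values r = a = 0.9 are an illustrative assumption chosen so that intensities down to 0 are reachable.

```python
import numpy as np

def allpass_phase_from_intensity(X, r=0.9, a=0.9):
    """Invert Eq. (2): find dphi such that T_n(dphi) = X, i.e. Eq. (9).
    Solving Eq. (2) for cos(dphi) gives a closed form; arccos selects the
    branch dphi in [0, pi]. X values outside the reachable range of T_n
    would give |cos| > 1, so we clip for numerical safety; X must not
    equal 1 exactly (the formula divides by 1 - X)."""
    c = (a**2 + r**2 - X * (1 + (r*a)**2)) / (2 * r * a * (1 - X))
    return np.arccos(np.clip(c, -1.0, 1.0))

X = np.array([0.1, 0.5, 0.9])            # normalized pixel values
dphi = allpass_phase_from_intensity(X)    # phase shifts to program the MRRs
```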

Multiple Copies for Parallel Processing: The All-pass MRR bank holds m copies of the input data, enabling subsequent parallel convolution by the m convolution kernels. This parallelization leverages the WDM principle and significantly enhances the operational speed of the network. Overall, this approach efficiently transforms input image data into modulated optical signals.

3.3 CB unit

The CB unit serves as the core submodule of the PMONN, tasked with executing essential operations (convolution operation, amplification operation, and BN operation) on optical signals that represent either the input image or the output feature map of the preceding submodule. In Fig. 2(a), the CB unit's structure comprises three key elements: the PB convolutional layer, the DPW convolutional layer, and the RBN layer. Referencing Fig. 2(c), the hardware architecture of the CB unit encompasses the following components: the Add-drop MRR bank, balanced photodiodes (BPDs), and tunable differential amplifiers (DAs). Among these, the Add-drop MRR bank is employed to translate the PB convolution kernel into the optical domain. The DAs facilitate the mapping of the merged layer (the DPW layer merged with the RBN layer in the inference stage) to the PICs.

3.3.1 PB convolution

In Fig. 2(c), the PB convolution kernels are mapped to the Add-drop MRR bank. The detailed mapping process is depicted in Fig. 4(a). Assume the PB convolution kernel has C channels and a size of $R \times R$. The first step flattens the PB convolution kernel by channels, yielding a vector of size $C \times {R^2}$. These vector values are then mapped individually into the Add-drop MRR vector. The Add-drop MRR vector contains n Add-drop MRRs, where $n = C \times {R^2}$; each Add-drop MRR is responsible for modulating a single wavelength of an optical pulse.

Fig. 4. The detailed mapping process of the PB convolution to the Add-drop MRR vector. (a) The mapping process of the PB convolution kernel. (b) The algorithm of the PB convolution kernel.

The specifics of the PB convolution algorithm are illustrated in Fig. 4(b). Here ${\boldsymbol{x}_1}$ to ${\boldsymbol{x}_{C{R^2}}}$ represent the input optical pulses of the PB convolution, sourced from the Input Data Mapping Module. Correspondingly, ${\boldsymbol{w}_1}$ to ${\boldsymbol{w}_{C{R^2}}}$ are the weights of the PB convolution, i.e., the values of the optical transfer function of the Add-drop MRRs. This optical transfer function is associated with the modulable phase shift ($\Delta \phi$) of the Add-drop MRR, as given in Eq. (5). Furthermore, $\Delta \phi$ is a learnable parameter of the PB convolution kernel, whose value is obtained through the training process of the PMONN; accordingly, the $\Delta \phi$ of each Add-drop MRR is modulated through the plasma dispersion effect or the thermo-optic effect. It is worth noting that in PB convolution the weight values fall within the range [-1, 1], as depicted in Fig. 1(d). The accumulation operation of PB convolution is achieved using balanced photodiodes (BPDs), which also convert the optical signals from the Add-drop MRR bank into electric signals.
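As a numerical sketch of Fig. 4(b): one PB kernel applied to one flattened slice, with the BPD modeled simply as the difference of the summed Drop-port and Through-port photocurrents. The values of r1, r2, and a are illustrative, and all electro-optic details are abstracted away.

```python
import numpy as np

def pb_conv_output(x, dphi, r1=0.9, r2=0.9, a=0.99):
    """One PB convolution kernel on one flattened input slice.
    x: input intensities, shape (n,) with n = C*R^2; dphi: learnable phase
    shifts of the n Add-drop MRRs. Each MRR weights its wavelength by
    Td - Tp (Eq. (5)); the BPD subtracts the total Through photocurrent
    from the total Drop photocurrent, realizing signed weights and the
    accumulation in one step."""
    den = 1 - 2*r1*r2*a*np.cos(dphi) + (r1*r2*a)**2
    Tp = ((r2*a)**2 - 2*r1*r2*a*np.cos(dphi) + r1**2) / den   # Eq. (3)
    Td = (1 - r1**2) * (1 - r2**2) * a / den                  # Eq. (4)
    return np.sum(Td * x) - np.sum(Tp * x)

x = np.random.rand(9)                              # a C = 1, R = 3 slice
dphi = np.random.uniform(-0.63, -0.13, size=9)     # cf. the init in Section 4.3
print(pb_conv_output(x, dphi))
```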

As shown in Fig. 2(c), the Add-drop MRR bank comprises m Add-drop MRR vectors, which can be used to map m PB convolution kernels. The utilization of m Add-drop MRR vectors can realize parallel convolution of the input data, which significantly enhances the operational speed of PMONN. After the Add-drop MRR bank operation, the output feature map with the channel number of m can be obtained.

3.3.2 DPW convolution

To amplify the weights of the PB convolution kernel and achieve good performance, we propose the DPW convolution method in place of the fixed amplifications used in existing research. The learnable parameters of the DPW convolution kernel are the amplification values of the PB convolution weights, obtained through the training process of the PMONN. In Fig. 2(a), the layer of DPW convolution kernels immediately follows the PB convolutional layer within the CB unit.

The convolution operation of the DPW layer differs from traditional convolution; it is a channel-separable convolution. It employs a distinct convolution kernel for each channel of the input feature map, so the number of output channels matches the number of input channels. This approach draws inspiration from depthwise convolution [40] and pointwise convolution [41]. The computational process of the DPW convolutional layer is depicted in Fig. 5. In this illustration, ${\boldsymbol{x}_{i,m}}$ denotes the input feature map of the DPW convolutional layer, where i is the position within the feature map and m is the channel index. $\boldsymbol{k}$ denotes the kernels of the DPW convolutional layer, with the superscript m indexing the $mth$ DPW convolution kernel. Notably, the number of kernels in the DPW layer matches the channel count of the input feature map; in essence, each DPW convolution kernel convolves one channel of the input feature map. Each DPW convolution kernel has a size of $1 \times 1$ and a channel count of 1. The output at position i of channel m is ${\boldsymbol{y}_{i,m}} = {\boldsymbol{x}_{i,m}} \times {\boldsymbol{k}^m}$.
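In code, the DPW layer reduces to one scalar gain per channel; a minimal NumPy sketch with illustrative shapes and values:

```python
import numpy as np

def dpw_conv(x, k):
    """DPW convolution (Fig. 5): a learnable 1x1, single-channel kernel per
    input channel. x: feature map of shape (M, H, W); k: per-channel
    amplifications of shape (M,). Output channels equal input channels."""
    return x * k[:, None, None]   # y[m, i] = x[m, i] * k[m]

x = np.random.rand(4, 6, 6)             # PB-layer output with 4 channels
k = np.array([2.0, 3.5, 1.2, 4.8])      # learned amplification factors
y = dpw_conv(x, k)                      # each channel scaled by its own gain
```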

 figure: Fig. 5.

Fig. 5. The operations of the DPW convolution kernels.

Download Full Size | PDF

In the training stage, we train the DPW layer to learn the optimal amplifications. In the test stage, we merge the DPW layer with the RBN layer and implement the merged layer with the tunable DA array in the PICs.

3.3.3 RBN algorithm

To prevent gradient vanishing or exploding in PMONN during training, we propose an RBN layer and add it to the CB unit to maintain a consistent data distribution and accelerate network convergence. The RBN algorithm is derived from the BN algorithm of electronic neural networks. The mapping process of the RBN layer is shown below.

The BN algorithm of the electronic neural network is given in Eq. (10):

$$\boldsymbol{y = \gamma }\frac{{\boldsymbol{x} - \boldsymbol{u}}}{{\sqrt {{\boldsymbol{\sigma }^{\boldsymbol{2}}} + \varepsilon } }} + \boldsymbol{\beta }$$
where $\boldsymbol{\gamma }$ and $\boldsymbol{\beta }$ are the learnable parameters of the BN layer, and $\boldsymbol{u}$ and ${\boldsymbol{\sigma }^2}$ are the mean and variance of $\boldsymbol{x}$.

In the training stage, the BN layer takes one mini-batch of data as the input $\boldsymbol{x}$ at each iteration. Both $\boldsymbol{\gamma }$ and $\boldsymbol{\beta }$ are learned by the neural network itself. Note that the input data of each mini-batch differs, so the mean and variance differ as well. In the test stage, $\boldsymbol{\gamma }$, $\boldsymbol{\beta }$, $\boldsymbol{u}$, and ${\boldsymbol{\sigma }^2}$ are fixed globally from the training process. Specifically, $\boldsymbol{u}$ and ${\boldsymbol{\sigma }^2}$ are given by:

$${\boldsymbol{u}_{test}} = E({\boldsymbol{u}_{batch}})$$
where ${\boldsymbol{u}_{batch}}$ is the set of the means of all mini-batches in the training stage.
$$\boldsymbol{\sigma }_{test}^2 = \frac{m}{m - 1}E(\boldsymbol{\sigma }_{batch}^2)$$
where $\boldsymbol{\sigma }_{batch}^2$ is the set of the variances of all mini-batches in the training stage, and m is the mini-batch size; the factor $\frac{m}{m-1}$ yields the unbiased variance estimate [39].

To apply the BN layer in the PMONN, we reconstruct the BN layer of the electronic neural network; the result is called RBN. The RBN algorithm is shown below.

$$\boldsymbol{y} = \frac{\boldsymbol{\gamma}}{\sqrt{\boldsymbol{\sigma}^2 + \varepsilon}} \times \left(\boldsymbol{x} - \left(\boldsymbol{u} - \frac{\boldsymbol{\beta} \times \sqrt{\boldsymbol{\sigma}^2 + \varepsilon}}{\boldsymbol{\gamma}}\right)\right)$$

Defining $\boldsymbol{a} = \frac{\boldsymbol{\gamma}}{\sqrt{\boldsymbol{\sigma}^2 + \varepsilon}}$ and $\boldsymbol{b} = \boldsymbol{u} - \frac{\boldsymbol{\beta} \times \sqrt{\boldsymbol{\sigma}^2 + \varepsilon}}{\boldsymbol{\gamma}}$, Eq. (13) can be written as follows.

$$\boldsymbol{y} = \boldsymbol{a}(\boldsymbol{x} - \boldsymbol{b})$$

In the training stage, we train the RBN layer. In the test stage, we obtain the trained model, in which $\boldsymbol{\gamma }$, $\boldsymbol{\beta }$, ${\boldsymbol{\sigma }^2}$, and $\boldsymbol{u}$ of the RBN layer are fixed. The details of BN and the BN-to-RBN conversion can be seen in Supplement 1.
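The equivalence of Eq. (10) and Eq. (14) is easy to verify numerically. A small sketch, with an assumed ε = 1e-5:

```python
import numpy as np

def rbn_params(gamma, beta, u, var, eps=1e-5):
    """Refactor trained BN parameters into the RBN form y = a*(x - b),
    following Eqs. (13)-(14)."""
    a = gamma / np.sqrt(var + eps)
    b = u - beta * np.sqrt(var + eps) / gamma
    return a, b

gamma, beta = np.array([1.2]), np.array([0.3])
u, var = np.array([0.5]), np.array([0.04])
a, b = rbn_params(gamma, beta, u, var)
x = np.array([0.7])
y_bn = gamma * (x - u) / np.sqrt(var + 1e-5) + beta   # Eq. (10)
y_rbn = a * (x - b)                                   # Eq. (14)
assert np.allclose(y_bn, y_rbn)
```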

3.3.4 Merged layer

In the training stage, both the DPW layer and the RBN layer are retained so that each learns its own parameters. In the test stage, once the model is trained, we merge the DPW layer and the RBN layer and implement the merged layer using the tunable DA array in the PICs. This scheme eliminates the TIA amplifier array used in previous studies, improves the computing speed of neuromorphic photonic integrated circuits, and reduces time and cost. If the parameters learned by the DPW layer are $\boldsymbol{k}$ and the output of the PB convolutional layer is $\boldsymbol{x}$, Eq. (14) becomes:

$$\boldsymbol{y} = \boldsymbol{a}(\boldsymbol{k} \cdot \boldsymbol{x} - \boldsymbol{b})$$

It can also be written as follows:

$$\boldsymbol{y} = \boldsymbol{a}\boldsymbol{k}\left(\boldsymbol{x} - \frac{\boldsymbol{b}}{\boldsymbol{k}}\right)$$

The merged layer is expressed by Eq. (16) and, in the PICs, is realized by the DA array. The $\boldsymbol{a}\boldsymbol{k}$ in Eq. (16) is the amplification factor of the DA array. The $\boldsymbol{x}$ in Eq. (16) is the in-phase input voltage ${{\bf V}_\textrm{ + }}$ of the DA array, the $\frac{\boldsymbol{b}}{\boldsymbol{k}}$ is the reverse voltage ${{\bf V}_ - }$, and $\boldsymbol{y}$ is the output ${{\bf V}_{out}}$ of the DA array.
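A minimal sketch of the merge, following Eqs. (15)–(16): the DA gain is a·k and the reverse voltage V− is b/k; all values are illustrative.

```python
import numpy as np

def merged_da_params(k, a, b):
    """Fold the DPW gain k into the RBN parameters (a, b), Eq. (16):
    y = a*(k*x - b) = (a*k)*(x - b/k). Returns (DA gain, reverse voltage V-)."""
    return a * k, b / k

k, a, b = 3.0, 0.8, 0.6
gain, v_minus = merged_da_params(k, a, b)
x = 0.25                          # in-phase input voltage V+
y = gain * (x - v_minus)          # DA output V_out
assert np.isclose(y, a * (k * x - b))   # matches the unmerged Eq. (15)
```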

4. Performance evaluation

In this section, we first introduce the structure of the proposed PMONN. Then, we describe the network evaluation methods. Finally, we present the performance results.

4.1 PMONN structure

As shown in Fig. 6(a), the PMONN consists of six blocks. The first five each contain a CB unit and a ReLU activation function for automatic feature extraction: the CB unit extracts features from the input feature map, and the ReLU activation performs the nonlinear operation of filtering out negative values. The final block includes a linear layer and a SoftMax function for classification. In addition, each block converts electrical signals to optical signals through an electro-optical converter (E/O).

Fig. 6. The PMONN structure and three different types of CB units. (a) The structure of PMONN in the experiment. (b) The CB unit of the S-PMONN. (c) The CB unit of the D-PMONN. (d) The CB unit of the T-PMONN.

To validate that the proposed PB convolution can be trained successfully, a CB unit with only one PB convolutional layer was constructed, as shown in Fig. 6(b); this single-unit PMONN is named the S-PMONN. To enhance the classification performance of the S-PMONN, a DPW convolutional layer was added to the CB unit, yielding the D-PMONN shown in Fig. 6(c); this redesigned CB unit demonstrates that the proposed DPW convolution can amplify the weights of the PB convolution kernel and improve classification accuracy. Lastly, to explore whether combining the proposed RBN layer with the PICs further improves classification performance, the CB unit was modified again: an RBN layer was added to the CB unit of the D-PMONN, resulting in the T-PMONN shown in Fig. 6(d). In the test stage, the DPW layer and the RBN layer are merged, generating a merged layer that can be implemented by the tunable DA array in the PICs. Details of the structure, layers, and parameters can be seen in Supplement 1.

4.2 Evaluation methods

The confusion matrix is utilized to tally the accurately and inaccurately classified test set images across the three networks: S-PMONN, D-PMONN, and T-PMONN. Additionally, we employ the accuracy, macro-Precision (macro-P), and macro-Recall (macro-R) evaluation techniques to assess the classification performance of these networks on MNIST and Fashion-MNIST datasets.

Both the MNIST and Fashion-MNIST datasets are used in the experiments to train and evaluate the three PMONNs. Each has 10 categories, and Eqs. (17)–(21) below are used as evaluation metrics. ${\boldsymbol{d}_{ij}}, i \ne j$, denotes the number of images in the $ith$ category that are misclassified as the $jth$ category by the network, and ${\boldsymbol{d}_{ij,i = j}}$ denotes the number of images in the $ith$ category that are correctly classified.

Accuracy represents the proportion of the total samples in the dataset that are correctly classified by the network, as follows:

$$A\textrm{ = }\frac{{\sum\limits_{i = 0}^9 {{d_{ii}}} }}{{\sum\limits_{i = 0}^9 {\sum\limits_{j = 0}^9 {{d_{ij}}} } }}$$

Precision indicates the proportion of the samples predicted as a given category that truly belong to that category, as follows:

$${P_j} = \frac{{{d_{ij,i = j}}}}{{\sum\limits_{i = 0}^9 {{d_{ij}}} }}$$
where j stands for the $jth$ category of images.

Recall represents the proportion of the samples belonging to a certain category that are correctly identified, as follows:

$${R_i} = \frac{{{d_{ij,i = j}}}}{{\sum\limits_{j = 0}^9 {{d_{ij}}} }}$$
where i represents the $ith$ category of images.

To obtain the precision and recall of the network over all image categories, the per-category precision and recall values are averaged; the resulting network-level metrics are named macro-P and macro-R, respectively. The calculation formulas are shown below:

$$macro\textrm{ - }P = \frac{1}{{10}}\sum\limits_{j = 0}^9 {{P_j}}$$
$$macro\textrm{ - }R = \frac{1}{{10}}\sum\limits_{i = 0}^9 {{R_i}}$$
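Eqs. (17)–(21) reduce to simple operations on the confusion matrix; the following NumPy sketch uses a synthetic matrix for illustration:

```python
import numpy as np

def evaluate(d):
    """Accuracy, macro-P, macro-R from a 10x10 confusion matrix d,
    Eqs. (17)-(21); d[i, j] counts class-i images predicted as class j."""
    accuracy = np.trace(d) / d.sum()                 # Eq. (17)
    precision = np.diag(d) / d.sum(axis=0)           # Eq. (18), per predicted class
    recall = np.diag(d) / d.sum(axis=1)              # Eq. (19), per true class
    return accuracy, precision.mean(), recall.mean() # Eqs. (20)-(21)

d = 100 * np.eye(10, dtype=int) + np.random.randint(0, 5, size=(10, 10))
acc, macro_p, macro_r = evaluate(d)
```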

4.3 Preparation for training stage

To validate the performance of the three proposed networks, namely S-PMONN, D-PMONN, and T-PMONN, the MNIST and Fashion-MNIST datasets were used to train and test the networks. The original training set was split by stratified sampling at a ratio of 4:1 into a new training set and a validation set. Consequently, a training set of 48,000 images was used to train the networks, and a validation set of 12,000 images was used to select model hyperparameters and control the training process. The test set of 10,000 images was used to test and evaluate the resulting inference models.

The hyperparameters of the networks must be set before training. To facilitate the comparison of the inference performance of the three networks in the follow-up study, the same hyperparameters were set for all three; they were selected using the validation set and a grid search. The wavelengths of light ($\lambda$) were selected in the range of 1500–1600 nm. All MRRs have a radius (R) of 5 µm. The three networks were trained for 100 epochs, each epoch containing 1250 batches and each batch containing 32 images.

The loss function used in the paper is the categorical cross-entropy loss function, and its equation is as follows:

$$Loss ={-} \frac{\textrm{1}}{n}\sum\limits_i {{\boldsymbol{y}^{\textrm{(}i\textrm{)}}} ln{{\hat{\boldsymbol{y}}}^{\textrm{(}i\textrm{)}}} } $$
where n is the number of samples, ${\boldsymbol{y}^{\textrm{(}i\textrm{)}}}$ is the true label of the $ith$ sample, and ${\hat{\boldsymbol{y}}^{\textrm{(}i\textrm{)}}}$ is the predicted probability of the true class of the $ith$ sample.

During training, the initial learning rate (LR) of the three networks is set to 0.001. If the validation loss has not decreased after 5 consecutive epochs, the LR is reconfigured to $lr = 0.5 \times lr$. If the validation loss still has not decreased after 10 consecutive epochs, training is stopped early.
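A schematic of this schedule (not the authors' code; train_one_epoch_and_validate is a hypothetical stand-in for the actual training loop):

```python
import random

def train_one_epoch_and_validate(lr):
    """Hypothetical stub: trains one epoch and returns the validation loss."""
    return random.random()

lr, best_loss, stall = 0.001, float("inf"), 0
for epoch in range(100):
    val_loss = train_one_epoch_and_validate(lr)
    if val_loss < best_loss:
        best_loss, stall = val_loss, 0
    else:
        stall += 1
        if stall % 5 == 0:
            lr *= 0.5        # lr = 0.5 * lr after 5 stalled epochs
        if stall >= 10:
            break            # early stopping after 10 stalled epochs
```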

The Adam optimizer is employed for optimizing the parameters of S-PMONN, D-PMONN, and T-PMONN. Because initialization methods designed for electrical neural networks are unsuitable for the PB photonic convolution algorithm, we propose an initialization method for the PB convolution kernel to obtain better network performance. First, we constrain the parameter initialization range to [-0.63, -0.13]. Then, we sample initialization values randomly from a uniform distribution over this range, as sketched below. Details of the results and analysis can be seen in Supplement 1.
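A one-line sketch of this initialization (shapes are illustrative):

```python
import numpy as np

def init_pb_phase(shape, low=-0.63, high=-0.13):
    """Uniformly sample the PB kernel's learnable phase shifts in [-0.63, -0.13]."""
    return np.random.uniform(low, high, size=shape)

dphi0 = init_pb_phase((4, 1, 3, 3))   # 4 kernels, C = 1, R = 3
```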

4.4 Results

First, we evaluate the performance of S-PMONN, D-PMONN, and T-PMONN. Then, we compare networks using the DPW layer with those using fixed amplifications of the photonic convolution weights.

4.4.1 Evaluation results of S-PMONN, D-PMONN, and T-PMONN

The evaluation results of S-PMONN, D-PMONN, and T-PMONN are shown in Table 1. To verify the effectiveness of the proposed PB convolution, which is based on the microring resonance principle, the S-PMONN was trained and tested on the MNIST dataset, achieving an accuracy of 95.35%, macro-P of 95.37%, and macro-R of 95.43%. S-PMONN was also trained on the Fashion-MNIST dataset, yielding an accuracy, macro-P, and macro-R of 84.55%, 84.55%, and 85.06%, respectively.

Table 1. The evaluation results of the three PMONNs

To verify whether the proposed DPW convolution is effective in amplifying the weights of the PB convolution and improving network performance, the MNIST and Fashion-MNIST datasets were used to train and test the D-PMONN. Trained on the MNIST dataset, D-PMONN achieves an accuracy of 97.73%, macro-P of 97.71%, and macro-R of 97.73%. Trained on the Fashion-MNIST dataset, it exhibits an accuracy of 88.52%, with corresponding macro-P and macro-R values of 88.52% and 88.42%, respectively.

To demonstrate the effectiveness of adding an RBN layer to the CB unit, we trained and tested the T-PMONN on the MNIST and Fashion-MNIST datasets, respectively; in the test stage, the DPW layer is merged with the RBN layer. On the MNIST dataset, T-PMONN achieves an accuracy of 99.15%, a macro-P of 99.13%, and a macro-R of 99.14%. On the Fashion-MNIST dataset, it obtains an accuracy of 91.83%, a macro-P of 91.83%, and a macro-R of 91.80%.

4.4.2 Comparison results of D-PMONN and other networks with fixed amplifications

In previous studies, the amplification array of the PICs applied fixed amplifications to amplify the weights of MRR convolutions. Does this approach still hold for our photonic convolution algorithm designed on the microring resonator principle, namely the phase-based (PB) convolution algorithm? If not, can a trainable neural network algorithm autonomously learn the amplification factors in place of the conventional method?

To compare the performance of networks with fixed amplification factors against the network with the DPW layer, we designed a simulation experiment and analyzed the results. We replaced the DPW convolutional layer in PMONN with fixed amplifications ${\bf T}$ in $\textrm{(}{\bf T}\cdot {{\bf W}_{MRR}}\textrm{)}$. Referring to [36], we set the fixed amplification factors ${\bf T}$ to 2, 4, and 6 in place of the DPW convolutional layer. We then compared the experimental results of these three networks with the network containing the DPW layer; the performance evaluation results for the four networks are presented in Table 2.

Table 2. The evaluation results of the four networks

We conducted a comparison between the network incorporating the DPW layer and those with different fixed amplifications. Remarkably, the PMONN with the DPW layer exhibited significantly superior accuracy, macro-P, and macro-R compared to the three networks with fixed amplifications. These findings underscore the need to leverage the neural network's autonomous learning mechanism to acquire the amplifications of the photonic convolution weights.

In addition, we analyze the reasons for the poor performance of fixed amplifications in PMONN networks; the details are given in the Discussion and conclusion section.

5. Discussion and conclusion

In this paper, we propose three distinct CB units and the corresponding networks, namely S-PMONN, D-PMONN, and T-PMONN. The CB unit of the S-PMONN consists solely of a PB convolutional layer, in which the modulable phase shifts of the Add-drop MRRs serve as the learnable parameters, while the convolution weights are the values of the optical transfer function at those phase shifts. The CB unit of the D-PMONN comprises a PB convolutional layer and a DPW convolutional layer, where the DPW convolution kernel amplifies the convolution weights of the PB convolution kernel. Finally, the T-PMONN's CB unit includes a PB convolutional layer, a DPW convolutional layer, and an RBN layer. The RBN layer is adapted from the BN layer of the electronic counterpart and is merged with the DPW layer in the test stage; the merged layer is implemented by the tunable DAs in the PICs.

Prior studies mainly focused on hardware architecture design or improvement for MRR-based neuromorphic PICs; the neural network algorithms used are essentially traditional electronic neural networks. Before deploying such a network onto MRR-based PICs, a series of post-processing operations is required on the trained network. The electrical neural network takes two primary forms. One truncates the weight tensor directly during training, constraining the weight values to the range [-1, 1]. The other imposes no constraints on the network and trains the weights directly, so the weight values have no physical significance and span the entire real number domain; consequently, after obtaining the inference model, the weight data of each layer must be mapped separately into [-1, 1]. Both approaches require mapping the final weight values to the output intensities of the MRR banks. Then, using the inverse of the MRR's transmission function, the corresponding phase shifts are obtained. Finally, refractive index changes are determined from the phase shifts, identifying the external voltages to be applied. None of these steps involves the electrical neural network algorithm itself; a well-trained electrical network is simply followed by a sequence of post-processing steps.

The S-PMONN results indicate that we have successfully proposed a photonic convolution algorithm based on the microring principle, which takes the modulable phase shifts of Add-drop MRRs as the learnable parameters (the reason for using phase shifts as learnable parameters is given in Supplement 1). This algorithm is an end-to-end optical neural network that, through its own learning mechanism, directly acquires the modulation phase shift needed for each MRR. Unlike the traditional electronic neural networks used in previous research, our algorithm eliminates the need for a series of post-processing steps after training.

In previous studies, the amplification arrays in MRR-based neuromorphic PICs always used fixed amplifications to scale the weights of the microring convolution. Compared to fixed amplifications, tunable amplifications have several advantages. First, the output of each convolutional layer consists of multi-channel complex signals, and tunable amplifications make it convenient to amplify these outputs; the gain can be adjusted to meet varying signal-intensity or frequency requirements, enabling signal-processing optimization. Second, different tasks often necessitate training different ONNs with distinct parameters; tunable amplifications grant the flexibility to adjust gains to accommodate varying signal intensities, enhancing system reusability and allowing the PICs to load different ONNs tailored to different tasks. Finally, we have experimentally demonstrated that tunable amplifications improve the performance of PMONN.

To compare the performance of fixed amplifications with that of the DPW layer, we used amplifications of 2, 4, and 6 in place of the DPW layer in D-PMONN. In Section 4.4.2, the results show that the self-learned amplifications of the DPW layer perform much better than fixed amplifications. We analyze the reasons for the poor performance of fixed amplifications in PMONN as follows. The PB convolution algorithm is a photonic convolution algorithm designed on the microring resonance principle, and the convolution kernel contains the optical transfer function of the microrings. Therefore, during backpropagation, PMONN must compute the gradient of the transfer function ${\bf W}_{MRR}$ with respect to the phase shifts. The weight-update process can be expressed as follows:

$$\frac{\partial ({\bf T}\cdot {{\bf W}_{MRR}})}{\partial \Delta \boldsymbol{\phi}} = {\bf T}\cdot \frac{\partial {{\bf W}_{MRR}}}{\partial \Delta \boldsymbol{\phi}}$$
$${{\bf W}_{MRR}} = {{\bf W}_{MRR}} - \textrm{lr} \ast {\bf T}\cdot \frac{\partial {{\bf W}_{MRR}}}{\partial \Delta \boldsymbol{\phi}}$$
where $\textrm{lr}$ represents the learning rate, which denotes the step size during the backpropagation computation in the model training process.

From Fig. 1(d), it can be observed that the gradient of the ${{\bf W}_{MRR}}$ function itself is relatively steep, so even a small step may produce large weight changes. In Eq. (24), the effective step size is $\textrm{lr} \ast {\bf T}$, leading to even more pronounced variations in the weight data. Such pronounced fluctuations cause instability during neural network training, resulting in poor performance of the trained network.

Another way to use a fixed amplification array is the second form described in the second paragraph of this section. The algorithm of the photonic integrated circuit still follows the traditional electronic neural network: after training the electronic network, the weights are mapped into [-1, 1] using Eq. (25), and Eq. (26) is then used as the fixed amplification. However, this approach involves a series of complex post-processing steps that require practitioners to master the relevant optical principles of microrings when deploying neural networks onto photonic integrated circuits.

$$\boldsymbol{x} = \frac{{\boldsymbol{x} - {\boldsymbol{x}_{\textrm{mid}}}}}{{{\boldsymbol{x}_{\max }} - {\boldsymbol{x}_{\min }}}}$$
$$\boldsymbol{k} = {\boldsymbol{x}_{\max }} - {\boldsymbol{x}_{\min }}$$
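A sketch of this second-form post-processing under our reading of Eqs. (25) and (26); ${\boldsymbol{x}_{\textrm{mid}}}$ is not defined in the text, so we take it as the midpoint of $[{\boldsymbol{x}_{\min}}, {\boldsymbol{x}_{\max}}]$ (with this reading the mapped range is [-0.5, 0.5], so the exact scaling convention may differ in the authors' implementation):

```python
import numpy as np

def map_weights(w):
    """Map trained electronic weights via Eq. (25) and return the fixed
    amplification of Eq. (26); x_mid is assumed to be the midpoint."""
    x_min, x_max = w.min(), w.max()
    x_mid = (x_min + x_max) / 2
    return (w - x_mid) / (x_max - x_min), x_max - x_min

w = 4 * np.random.randn(3, 3)
w_mapped, k = map_weights(w)
# The mapping is exactly invertible: k * w_mapped + x_mid recovers w.
assert np.allclose(k * w_mapped + (w.min() + w.max()) / 2, w)
```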

The T-PMONN is evaluated on the MNIST test set. Compared with D-PMONN, the accuracy, macro-P, and macro-R of T-PMONN are improved by 1.42%, 1.42%, and 1.41%, respectively. Compared with S-PMONN, they are increased by 3.80%, 3.76%, and 3.71%, respectively. The T-PMONN, D-PMONN, and S-PMONN are also evaluated on the Fashion-MNIST test set. Compared with D-PMONN, the accuracy, macro-P, and macro-R of T-PMONN are improved by 3.31%, 3.31%, and 3.38%, respectively; compared with S-PMONN, they are improved by 7.28%, 7.28%, and 6.74%, respectively. These results prove that: (1) the proposed merged layer can be successfully added to the PMONN network and realized by the tunable DAs added to the PICs; (2) it effectively improves the classification performance of the PMONN network; and (3) compared with the MONN papers [32], [36–38], the successful addition of the RBN layer to the CB unit of the T-PMONN prevents gradient vanishing or exploding during training and reduces the internal covariate shift problem.

Despite the notable successes achieved in our work, a few limitations should be acknowledged. First, our implementation of the RBN layer adds DAs to the existing PICs, which may affect the inference time of the system; although we have made significant progress in integrating these components, further optimization may be needed to ensure optimal performance. Second, this paper does not extensively explore the optical implementation of the activation function in neural networks. While we focused on other aspects, such as the RBN layer and its effect on the PICs, the optical implementation of activation functions remains an area that warrants investigation, and some research in this area has already emerged [42–44]; understanding and optimizing this aspect could further enhance the overall performance and efficiency of neural networks in optical systems. These limitations provide opportunities for future research and improvement in neural network optimization for PICs.

Funding

National Natural Science Foundation of China (52175505, 62205029); the Program of Promoting the Development of University-Diligence Talents (5112111145).

Acknowledgments

Jingya Ding thanks the National Natural Science Foundation of China and the Program of Promoting the Development of University-Diligence Talents for help identifying collaborators for this work.

Disclosures

The authors declare no conflicts of interest.

Data availability

Code is available in [45]. Other data is available upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. C. Janiesch, P. Zschech, and K. Heinrich, “Machine learning and deep learning,” Electron. Mark. 31(3), 685–695 (2021). [CrossRef]  

2. J. D. Kelleher, Deep Learning (MIT Press, 2019).

3. S. Dong, P. Wang, and K. Abbas, “A survey on deep learning and its applications,” Comput. Sci. Rev. 40, 100379 (2021). [CrossRef]  

4. V. Sharma and R. N. Mir, “A comprehensive and systematic look up into deep learning based object detection techniques: A review,” Comput. Sci. Rev. 38, 100301 (2020). [CrossRef]  

5. E. Yurtsever, J. Lambert, A. Carballo, et al., “A survey of autonomous driving: Common practices and emerging technologies,” IEEE Access 8, 58443–58469 (2020). [CrossRef]  

6. Y. Chen, M. Lin, Z. He, et al., “Consistency-and dependence-guided knowledge distillation for object detection in remote sensing images,” Expert Syst. Appl. 229, 120519 (2023). [CrossRef]  

7. N. C. Thompson, K. Greenewald, K. Lee, et al., “The computational limits of deep learning,” arXiv, arXiv:2007.05558 (2020). [CrossRef]  

8. F. Mireshghallah, M. Taram, P. Vepakomma, et al., “Privacy in deep learning: A survey,” arXiv, arXiv:2004.12254 (2020). [CrossRef]  

9. C. E. Leiserson, N. C. Thompson, J. S. Emer, et al., “There’s plenty of room at the Top: What will drive computer performance after Moore’s law?” Science 368(6495), eaam9744 (2020). [CrossRef]  

10. J. B. Aimone, “Neural algorithms and computing beyond Moore's law,” Commun. ACM 62(4), 110 (2019). [CrossRef]  

11. C. Li, X. Zhang, J. Li, et al., “The challenges of modern computing and new opportunities for optics,” PhotoniX 2(1), 20–31 (2021). [CrossRef]  

12. Q. Zhang, H. Yu, M. Barbiero, et al., “Artificial neural networks enabled by nanophotonics,” Light: Sci. Appl. 8(1), 42 (2019). [CrossRef]  

13. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921–943 (2019). [CrossRef]  

14. K. Yao, R. Unni, and Y. Zheng, “Intelligent nanophotonics: merging photonics and artificial intelligence at the nanoscale,” Nanophotonics 8(3), 339–366 (2019). [CrossRef]  

15. J. Feldmann, N. Youngblood, C. D. Wright, et al., “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569(7755), 208–214 (2019). [CrossRef]  

16. M. Gu, X. Fang, H. Ren, et al., “Optically digitalized holography: a perspective for all-optical machine learning,” Engineering 5(3), 363–365 (2019). [CrossRef]  

17. J. Liu, Q. Wu, X. Sui, et al., “Research progress in optical neural networks: theory, applications and developments,” PhotoniX 2(1), 5 (2021). [CrossRef]  

18. R. Xu, P. Lv, F. Xu, et al., “A survey of approaches for implementing optical neural networks,” Opt. Laser Technol. 136, 106787 (2021). [CrossRef]  

19. X. Sui, Q. Wu, J. Liu, et al., “A review of optical neural networks,” IEEE Access 8, 70773–70783 (2020). [CrossRef]  

20. X. Guo, J. Xiang, Y. Zhang, et al., “Integrated neuromorphic photonics: synapses, neurons, and neural networks,” Adv. Photonics Res. 2(6), 2000212 (2021). [CrossRef]  

21. S. Xiang, Y. Han, Z. Song, et al., “A review: Photonics devices, architectures, and algorithms for optical neural computing,” J. Semicond. 42(2), 023105 (2021). [CrossRef]  

22. J. Cheng, H. Zhou, and J. Dong, “Photonic matrix computing: from fundamentals to applications,” Nanomaterials 11(7), 1683 (2021). [CrossRef]  

23. X. Lin, Y. Rivenson, N. T. Yardimci, et al., “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]  

24. H. Dou, Y. Deng, T. Yan, et al., “Residual D2NN: training diffractive deep neural networks via learnable light shortcuts,” Opt. Lett. 45(10), 2688–2691 (2020). [CrossRef]  

25. M. S. S. Rahman, J. Li, D. Mengu, et al., “Ensemble learning of diffractive optical networks,” Light: Sci. Appl. 10(1), 14 (2021). [CrossRef]  

26. Z. Duan, H. Chen, and X. Lin, “Optical multi-task learning using multi-wavelength diffractive deep neural networks,” Nanophotonics 12(5), 893–903 (2023). [CrossRef]  

27. Y. Shen, N. C. Harris, S. Skirlo, et al., “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017). [CrossRef]  

28. H. Zhou, Y. Zhao, G. Xu, et al., “Chip-scale optical matrix computation for PageRank algorithm,” IEEE J. Select. Topics Quantum Electron. 26(2), 1–10 (2020). [CrossRef]  

29. D. Pérez, I. Gasulla, and J. Capmany, “Programmable multifunctional integrated nanophotonics,” Nanophotonics 7(8), 1351–1371 (2018). [CrossRef]  

30. A. N. Tait, T. Ferreira de Lima, M. A. Nahmias, et al., “Continuous calibration of microring weights for analog optical networks,” IEEE Photon. Technol. Lett. 28(8), 887–890 (2016). [CrossRef]  

31. J. Feldmann, N. Youngblood, M. Karpov, et al., “Parallel convolutional processing using an integrated photonic tensor core,” Nature 589(7840), 52–58 (2021). [CrossRef]  

32. X. Xu, M. Tan, B. Corcoran, et al., “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature 589(7840), 44–51 (2021). [CrossRef]  

33. A. N. Tait, A. X. Wu, T. F. de Lima, et al., “Microring weight banks,” IEEE J. Select. Topics Quantum Electron. 22(6), 312–325 (2016). [CrossRef]  

34. A. N. Tait, M. A. Nahmias, B. J. Shastri, et al., “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Lightwave Technol. 32(21), 4029–4041 (2014). [CrossRef]  

35. A. N. Tait, T. Ferreira de Lima, E. Zhou, et al., “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Rep. 7(1), 7430 (2017). [CrossRef]  

36. V. Bangari, B. A. Marquez, H. Miller, et al., “Digital electronics and analog photonics for convolutional neural networks (DEAP-CNNs),” IEEE J. Select. Topics Quantum Electron. 26(1), 1–13 (2020). [CrossRef]  

37. B. Bai, Q. Yang, H. Shu, et al., “Microcomb-based integrated photonic processing unit,” Nat. Commun. 14(1), 66 (2023). [CrossRef]  

38. Y. Bai, M. Yu, L. Lu, et al., “Quantized photonic neural network modeling method based on microring modulators,” Opt. Eng. 61(06), 061409 (2022). [CrossRef]  

39. S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning (PMLR, 2015), pp. 448–456.

40. F. Chollet, “Xception: deep learning with depthwise separable convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 1251–1258.

41. B. S. Hua, M. K. Tran, and S. K. Yeung, “Pointwise convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 984–993.

42. Y. Zuo, B. Li, Y. Zhao, et al., “All-optical neural network with nonlinear activation functions,” Optica 6(9), 1132–1137 (2019). [CrossRef]  

43. G. Mourgias-Alexandris, A. Tsakyridis, N. Passalis, et al., “An all-optical neuron with sigmoid activation function,” Opt. Express 27(7), 9620–9630 (2019). [CrossRef]  

44. A. Jha, C. Huang, and P. R. Prucnal, “Reconfigurable all-optical nonlinear activation functions for neuromorphic photonics,” Opt. Lett. 45(17), 4819–4822 (2020). [CrossRef]  

45. J. Ding, L. Zhu, M. Yu, et al., “Code for Phase of Microring-based Optoelectronic Neural Network (PMONN),” GitHub (2024), https://github.com/ISCLab-Bistu/PMONN.
