Addressing the programming challenges of practical interferometric mesh based optical processors

Kaveh (Hassan) Rahbardar Mojaver; Bokun Zhao; Edward Leung; S. Mohammad Reza Safaee; Odile Liboiron-Ladouceur

doi:10.1364/OE.489493

1. Introduction

Interferometric-based programmable optical processors are promising structures for fast and energy-efficient computation in classical and quantum photonics [1,2,3]. These processors can efficiently perform analog vector-matrix multiplication by exploiting the inherent parallelism of optics [4]. Programmable optical processors can also perform multiply-accumulate (MAC) operations in computing [5] and be used as quantum gates in quantum photonics [6]. With fast-growing computational demand limiting progress in deep learning [7,8], energy-efficient computational accelerators fabricated in silicon photonic (SiPh) technology are excellent candidates to meet the computational demands of future machine learning and deep learning applications [9,10].

Ideally, programmable optical processors should be fully reconfigurable after fabrication, similar to what is offered by electronic field-programmable gate arrays (FPGAs) [11,12]. However, there is a major difference between programmable optical processors and electronic FPGAs: the former are analog whereas the latter are digital. Analog computation offers several advantages, mainly reduced computation time and complexity. The downside is that analog building blocks are more sensitive to device parameters and bias conditions. Fabrication variations and dynamic errors therefore translate into considerable computation error and inaccuracy in programmable optical processors [13,14]. Moreover, the processor performance deteriorates drastically with dynamic errors, including thermal and electrical crosstalk. One remedy for dynamic errors is in-situ programming [15,16], which uses optimization techniques to find the optimum bias point of the phase shifters for implementing a specific weight matrix. In in-situ programming, the processor is programmed in the presence of fabrication imperfections and dynamic errors; therefore, the phase shifters’ bias can be tuned to compensate for those errors. The downside is the time and energy required for programming. Each individual processor must go through a time- and energy-consuming optimization every time the weight matrix changes, and the optimization becomes drastically harder as the matrix size scales up.

The second option is ex-situ programming, i.e., the phase shifters’ bias corresponding to a specific weight matrix is externally calculated at first and then implemented on different similar chips. There are two main challenges with this approach. Firstly, hardware error correction schemes should be employed to compensate for the fabrication imperfections such as unbalanced splitting ratio [17]. Secondly, the processor should maintain an option to monitor the phase setting of phase shifters such that the phase setting can be fine-tuned to compensate for the dynamic errors. Waveguide taps or in-line transparent photodetectors may be considered to monitor the optical power and the state of numerous phase shifters on the processor [18]. However, these solutions often increase the insertion loss of the structure limiting the scalability. It would be favorable if monitoring the phase shift applied by a specific phase shifter within the optical paths is viable without modifying the bias of the other phase shifters.

In this work, for the first time, we propose an architecture, referred to as Bokun mesh, that provides direct phase monitoring while maintaining the minimum optical path depth. We also delve into a detailed exploration of the strengths and weaknesses exhibited by different meshes that were previously proposed. Additionally, we carry out a comprehensive evaluation of our recently introduced mesh in comparison to the pre-existing architectures to highlight its distinctive benefits. Bokun mesh is a topology arrangement that merges the attributes of two mesh topologies: Clements and Diamond [19,20]. Bokun, like Diamond, offers diagonal paths that pass through each individual MZI, allowing for direct phase monitoring without interference from other MZIs in the line. However, in contrast to Diamond and similar to Clements, Bokun maintains a minimum optical depth, resulting in improved scalability. Providing the monitoring option, the programming would be faster, improving the total energy efficiency of the processor. Through simulations, we find Bokun improves the total energy efficiency by 83% compared to the rectangular mesh for a 10 × 10 mesh with weight matrix changing at 2 kHz. The performance of Bokun mesh enabled by an optimal optical depth is also three times more resilient to the loss and fabrication imperfections compared to the architectures with longer depth such as Reck and Diamond for a 10 × 10 mesh used in a two-layered optical neural network for MNIST classification task [20,21].

2. MZI-based optical processor architecture

Figure 1(a) shows the building block of Mach-Zehnder interferometer (MZI)-based programmable optical processors. This MZI is composed of two couplers (also referred to as beam splitters/combiners) and two phase shifters. The linear transformation matrix of the 2 × 2 building block, for a fixed state of polarization, a 50:50 splitting ratio of the couplers, and assuming lossless optical propagation, is:

$$\begin{bmatrix} O_{top} \\ O_{bottom} \end{bmatrix} = e^{j(\theta/2)} \begin{bmatrix} e^{j\varphi}\sin\left(\frac{\theta}{2}\right) & e^{j\varphi}\cos\left(\frac{\theta}{2}\right) \\ \cos\left(\frac{\theta}{2}\right) & -\sin\left(\frac{\theta}{2}\right) \end{bmatrix} \begin{bmatrix} I_{top} \\ I_{bottom} \end{bmatrix}, \tag{1}$$

where, $\theta $ is the internal phase shift changing the output optical intensity, and $\varphi $ is the external phase shift defining the output optical phase. I_top, I_bottom, O_top, and O_bottom, are the optical electric field distribution of a plane wave at the input and output ports [22].
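As a quick numerical check of Eq. (1), the following minimal sketch builds the 2 × 2 matrix with the Python standard library, verifies that it is unitary, and confirms which output receives the light for θ = π and θ = 0. The helper names `mzi_matrix`, `apply`, and `is_unitary` are illustrative choices, not part of the original work, and the value φ = 0.3 is arbitrary.

```python
import cmath
import math

def mzi_matrix(theta, phi):
    """2 x 2 transfer matrix of Eq. (1): ideal 50:50 couplers, no loss."""
    g = cmath.exp(1j * theta / 2)   # common phase factor e^{j theta/2}
    e = cmath.exp(1j * phi)         # external phase shifter e^{j phi}
    s, c = math.sin(theta / 2), math.cos(theta / 2)
    return [[g * e * s, g * e * c],
            [g * c, -g * s]]

def apply(m, vec):
    """Multiply a 2 x 2 matrix by a length-2 field vector [I_top, I_bottom]."""
    return [m[0][0] * vec[0] + m[0][1] * vec[1],
            m[1][0] * vec[0] + m[1][1] * vec[1]]

def is_unitary(m, tol=1e-12):
    """Check M^H M = I element by element."""
    for i in range(2):
        for j in range(2):
            acc = sum(m[k][i].conjugate() * m[k][j] for k in range(2))
            if abs(acc - (1 if i == j else 0)) > tol:
                return False
    return True

# Light in the top input only; theta = pi sends it to the top output (bar state).
bar = apply(mzi_matrix(math.pi, 0.3), [1, 0])
# theta = 0 sends it to the bottom output (cross state).
cross = apply(mzi_matrix(0.0, 0.3), [1, 0])

print("unitary:", is_unitary(mzi_matrix(0.7, 1.1)))
print("bar   output powers:", [round(abs(x) ** 2, 6) for x in bar])
print("cross output powers:", [round(abs(x) ** 2, 6) for x in cross])
```

The output powers depend only on θ, while φ changes only the phase of the top output, which matches the roles assigned to the two phase shifters above.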

Fig. 1. (a) 2 × 2 building block of the processor. The 8 × 8 MZI-based optical processors as a (b) Reck, (c) Diamond, (d) Clements, and (e) Bokun (presented in this work) mesh. Reck is the primary interferometric architecture, with a triangular layout supporting a sequential calibration scheme. Diamond is the symmetrical version of Reck with 21 extra MZIs. Clements reduces the mesh depth and provides more symmetry in terms of the min/max number of MZIs in each path. Bokun is an extended version of Clements with 12 extra MZIs and 12 auxiliary optical I/Os to mitigate the calibration/programming challenges. As discussed in section 3, an MZI is independently accessible if there is a way to light up the mesh so that one input of this MZI and of all the subsequent MZIs toward an output remains dark (null). The illuminated section of the Bokun mesh highlights the monitoring of the phase setting of MZI-16 by applying light only into $I_3^{\prime}$ and detecting the light at ${O_1}$ while keeping all the other inputs dark. In this condition, the top input of MZI-16 and the secondary inputs of MZI-17 and MZI-18 remain null, supporting independent monitoring of the MZI-16 phase setting. The figure also shows why this cannot be the case for the Clements mesh. In Clements, the light entering through I₇ reaches both inputs of MZI-9 and of the MZIs thereafter, so MZI-9 in Clements is not independently accessible. This is also the case if any other input is used. Among the four meshes, Bokun provides the shortest depth (equal to that of Clements) and the most balanced number of MZIs in all paths (min = 7 and max = 8), while all its MZIs are independently accessible.


An ideal N × N multiport reconfigurable MZI-based optical processor is a unitary optical component consisting of n MZIs connected to each other according to a given mesh topology. The structure can be represented by an N × N unitary matrix obtained as the successive product of the unitary transformation matrices of its constituent MZIs, ordered according to the location of the MZIs in the mesh. Unitary matrices preserve the inner product and norm of the transformed vectors; hence, they preserve the vectors' lengths and angles. Unitary matrices have wide applications in machine learning, AI, and quantum computing. In deep learning, matrix multiplication is used to compute the activations of neurons in the neural network (NN). This involves multiplying the input data by weight matrices and adding biases to produce the output. When the optical processor performs the vector-matrix multiplication of an NN, the N input ports are used to feed in the d features of a sample (where N = d). The N output ports serve as the c-dimensional output vector that determines the single-layer NN’s predicted class of the sample. In this work, we use two sample datasets: the multivariate Gaussian dataset already introduced in [20] and the MNIST dataset of handwritten digits [23]. We compare the results obtained from the two datasets to study the effect of the dataset on hardware performance. The predicted class is determined by the index of the output with the highest optical power. This is consistent with conventional NNs, where the predicted class is designated by the output neuron with the highest value [24].

The Reck mesh topology shown in Fig. 1(b), theorized by Reck, et al. [21], consists of a triangular mesh of MZIs. It can implement an arbitrary N × N unitary matrix with n MZIs connected to each other in a mesh and N additional phase shifters at the input ports. In other words, an N × N Reck mesh is universal: it can be used to implement any N × N unitary transformation. The number of MZIs scales quadratically with the size of the matrix as follows:

$$n = \frac{N(N - 1)}{2}, \tag{2}$$

where, N is the number of optical channels from the i^th input to the j^th output (i, j ${\in} $ N) for a structure with the same number of inputs and outputs. The triangular architecture of Reck supports the sequential calibration of the mesh. In other words, the Reck architecture allows us to ensure every MZI is calibrated through a path of pre-calibrated MZIs.

The Diamond mesh shown in Fig. 1(c), proposed by Shokraneh, et al. [20], employs $({N - 1} )\; \times \; ({N - 2} )/2$ additional MZIs compared to the Reck and Clements meshes, for a total number of MZIs given by:

$$n = (N - 1)^2. \tag{3}$$

While the additional MZIs cause higher loss and greater susceptibility to phase uncertainty, the Diamond mesh copes with loss imbalance thanks to its increased symmetry compared to the Reck mesh. Furthermore, it counteracts the effects of phase uncertainty through the extra degrees of freedom provided by the additional MZIs. The unconnected output waveguides of the additional MZIs within the Diamond mesh allow for excluding the destructive portion of the interference from its outputs and optimally adjusting the optical power levels at its outputs.

The Clements mesh depicted in Fig. 1(d), proposed by Clements, et al., uses the same number of MZIs as the Reck mesh [19]. Although the total number of MZIs in the Reck and Clements meshes is equal, Clements has a shorter optical depth and, therefore, less insertion loss. In the Clements architecture, each input signal crosses its nearest neighbor at the first possible occasion, leading to a shorter optical depth. This is in contrast to the Reck mesh, where the bottom input signals must propagate for some distance before interacting with other signals. The mesh depth (number of consecutive MZIs in the longest path) for an 8 × 8 Clements topology is eight MZIs, in contrast to 13 MZIs for the Reck. The downside of the Clements topology is its complex calibration and programming. Unlike Reck and Diamond, the Clements structure is not triangular, which makes its calibration challenging. In the next section, we show how triangular structures and the presence of diagonal paths contribute to more precise calibration and programming.

In this work, we propose the novel Bokun mesh, shown in Fig. 1(e), offering a short optical depth while maintaining the triangular structure essential for accurate calibration and programming. Indeed, the Bokun mesh is a truncated Diamond mesh with the middle optical I/Os used as the main optical path. It can also be considered as an extended version of Clements mesh with extra MZIs on the top and bottom of the structure to provide a triangular shape and offer a diagonal I/O path for each individual MZI. The number of MZIs in Bokun mesh is:

$$n = \frac{N(N + N/2 - 2)}{2}. \tag{4}$$
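The MZI counts of Eqs. (2)-(4) can be tabulated with a short sketch such as the one below. The function names are illustrative, and the Bokun expression is written in integer arithmetic under the assumption of an even number of ports N.

```python
def mzi_count_reck(n_ports):
    """Eq. (2); the text states that Clements uses the same number of MZIs."""
    return n_ports * (n_ports - 1) // 2

def mzi_count_diamond(n_ports):
    """Eq. (3)."""
    return (n_ports - 1) ** 2

def mzi_count_bokun(n_ports):
    """Eq. (4) with N + N/2 - 2 written as 3N/2 - 2; assumes an even port count."""
    return n_ports * (3 * n_ports // 2 - 2) // 2

for n_ports in (4, 8, 10):
    reck = mzi_count_reck(n_ports)
    diamond = mzi_count_diamond(n_ports)
    bokun = mzi_count_bokun(n_ports)
    print(f"N = {n_ports:2d}: Reck/Clements = {reck}, "
          f"Diamond = {diamond} (+{diamond - reck}), "
          f"Bokun = {bokun} (+{bokun - reck})")
```

For N = 8 this gives 28 MZIs for Reck and Clements, 49 for Diamond, and 40 for Bokun, in agreement with the 21 and 12 extra MZIs quoted in the caption of Fig. 1.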

A unitary matrix can be implemented by the optical processor using its n MZIs through the inverse transformation matrices of the constituent MZIs, each being defined on an N-dimensional Hilbert space [21]. This allows for the matrix multiplication of an input vector representing a sample: light is injected at the input ports, and the optical fields interfere as they propagate through the components of the mesh. The output vector is the result of the multiplication of the input vector by the unitary matrix.

3. Calibration and programming the optical processors

The first step in using programmable optical processors is the calibration. Through the calibration step, we find the relation between the bias voltage of phase shifters and the applied phase shift. In theory, similar phase shifters demonstrate identical phase/voltage relation. However, in practice, the phase/voltage relation varies for similar phase shifters fabricated on the same chip due to the errors mainly coming from fabrication process variations [25]. Programmable optical processors are analog devices very sensitive to phase settings; therefore, one needs to calibrate each individual phase shifter prior to using the processor. In section 4, we will see how the phase error of a programmable optical processor degrades the classification accuracy of an optical neural network (ONN). In this section, we start discussing calibration of a single MZI as the 2 × 2 building block. Next, we extend the discussion to the system-level calibration of a larger mesh.

3.1 Calibration of a single MZI

Figure 2(a) shows the calibration process of a single MZI. We start by calibrating the phase shifter θ and apply a continuous wave (CW) light into one input; the top input, i.e., In_top is chosen in this case. We then sweep the θ bias voltage and measure the optical power at a chosen output. We choose the bar state, therefore, top output (Out_top) in this measurement. The discussion would be similar for the cross state. According to Eq. (1), one can find the optical electric field distributions from an input emitted to an output port. Focusing on the In_top to Out_top path, we write:

$$O_{top} = e^{j(\theta/2)}\left[ e^{j\varphi}\sin\left(\frac{\theta}{2}\right) I_{top} + e^{j\varphi}\cos\left(\frac{\theta}{2}\right) I_{bottom} \right]. \tag{5}$$

Fig. 2. (a) MZI calibration in the presence of an interfering signal at the bottom input. (b) Optical transmission versus $\theta $ for different cases of interfering signal at In_bottom. The calibration is ideal when In_bottom is null. (c) In the presence of a signal at In_bottom, changing ${\Delta _{in}}$ and averaging the optical transmission reduces the $\theta $ calibration error while increasing the calibration complexity.


Therefore, the transmission from the top input to the top output is:

$$\mathrm{Transmission} = \left( \frac{O_{top}}{I_{top}} \right)^2 = \left( \sin\left(\frac{\theta}{2}\right) + \cos\left(\frac{\theta}{2}\right)\frac{I_{bottom}}{I_{top}} \right)^2. \tag{6}$$

For a precise calibration, we favor blocking the bottom input port (In_bottom) so no light passes through it. Under this condition, the ${I_{bottom}}$ in Eq. (6) becomes zero and the transmission is proportional to ${\sin ^2}\left( {\frac{\theta }{2}} \right)$ with a minimum and maximum at θ=0 and θ=π, respectively. We used Ansys Lumerical Interconnect to simulate the MZI transmission under various conditions. The black solid line in Fig. 2(b) presents the ideal case when no light goes to In_bottom. Knowing the corresponding bias voltage for θ=0 and θ=π, we calibrate the θ phase shifter.

In practice, if direct access to the MZI input is not viable, a small optical interference may be present at the In_bottom. Considering the optical powers of P_top and P_bottom at the top and bottom input ports with a relative phase shift of ${\Delta _{in}}$, we rewrite Eq. (6) by plugging in ${I_{top}} = \sqrt {{P_{top}}} $ and ${I_{bottom}} = \sqrt {{P_{bottom}}} {e^{j{\Delta _{in}}}}$:

$$\mathrm{Transmission} = \left( \sin\left(\frac{\theta}{2}\right) + \cos\left(\frac{\theta}{2}\right)\sqrt{\frac{P_{bottom}}{P_{top}}}\, e^{j\Delta_{in}} \right)^2. \tag{7}$$

Eq. (7) formulates the effect of P_bottom and Δ_in on the In_top to Out_top transmission and, hence, on the θ calibration error. We define the calibration error of a phase shifter as the difference between the actual phase applied by the phase shifter and the desired phase, caused by non-ideal calibration. Based on this equation, the calibration error increases with P_bottom and is maximized for Δ_in = 0 and Δ_in = π. Figure 2(b) compares the In_top to Out_top transmission versus θ for P_top = 0 dBm and different values of P_bottom and Δ_in. For P_bottom = -20 dBm and Δ_in = 0, the error in locating the transmission minimum is 0.06π. For P_bottom = -10 dBm, this error increases to 0.18π and -0.18π for Δ_in = 0 and π, respectively. In section 4, we will show that a 0.18π error on the phase shifters drops the classification accuracy of a processor performing the MNIST task by approximately 25%. The simulation results of Fig. 2(b) are in agreement with the theory presented in Eq. (7). We should note that the presence of a -10 dBm interfering signal at the bottom port is a realistic assumption, especially considering that the previous stage of MZIs may not be calibrated [26].
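The shift of the transmission minimum predicted by Eq. (7) can be estimated with the minimal sketch below. It rests on two assumptions that are not taken from the original work: the transmission is read as the squared magnitude of the bracketed field, and the minimum is located by a plain grid search over θ. The function names are illustrative, and only the magnitude of the offset is reported because its sign depends on the phase convention.

```python
import cmath
import math

def transmission(theta, p_bottom_dbm, delta_in, p_top_dbm=0.0):
    """Top-to-top power transmission of Eq. (7), read as a squared magnitude."""
    # Field amplitude ratio sqrt(P_bottom / P_top) from the power levels in dBm.
    ratio = math.sqrt(10 ** ((p_bottom_dbm - p_top_dbm) / 10))
    field = (math.sin(theta / 2)
             + math.cos(theta / 2) * ratio * cmath.exp(1j * delta_in))
    return abs(field) ** 2

def minimum_offset(p_bottom_dbm, delta_in, steps=20001):
    """Grid-search theta in [-pi, pi] for the minimum; the ideal location is 0."""
    thetas = [-math.pi + 2 * math.pi * k / (steps - 1) for k in range(steps)]
    return min(thetas, key=lambda t: transmission(t, p_bottom_dbm, delta_in))

for p_dbm, delta in [(-20, 0.0), (-10, 0.0), (-10, math.pi)]:
    offset = minimum_offset(p_dbm, delta)
    print(f"P_bottom = {p_dbm} dBm, delta_in = {delta:.2f} rad: "
          f"|offset| = {abs(offset) / math.pi:.3f} pi")
```

Under these assumptions the idealized model gives offsets of roughly 0.06π at -20 dBm and roughly 0.2π at -10 dBm, with opposite signs for Δ_in = 0 and Δ_in = π, which is of the same order as the values read from the simulated curves of Fig. 2(b).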

To remove the effect of interfering light at I_bottom during the θ calibration, Bandyopadhyay, et al. discussed a useful method to average the transmission over 2π change of Δ_in [17]. Sweeping Δ_in can be done by using the external phase shifter of the previous MZI block. The average transmission is then equal to:

$$\left\langle \mathrm{Transmission} \right\rangle = \frac{1}{2\pi}\int_0^{2\pi} \left( \sin\left(\frac{\theta}{2}\right) + \cos\left(\frac{\theta}{2}\right)\sqrt{\frac{P_{bottom}}{P_{top}}}\, e^{j\Delta_{in}} \right)^2 d\Delta_{in} = \sin^2\left(\frac{\theta}{2}\right). \tag{8}$$

As shown in Fig. 2(c), when the average is taken over a complete 2π period of Δ_in, the transmission is minimized and maximized at θ=0 and θ=π, leading to error-free θ calibration. To ensure averaging over 2π, the previous block controlling Δ_in must be calibrated. If this is not the case, taking the average over a period slightly different from 2π leads to a calibration error: 0.053π and 0.036π for averaging over 2π - 0.4π and 2π + 0.4π, respectively. The averaging technique therefore mitigates the calibration error considerably even when it is not done exactly over a 2π shift of Δ_in. The downside of the averaging technique is the increase in time and complexity of the calibration process. A typical thermo-optic phase shifter with ${V_\pi } \approx 2\; V$ [26] requires 400 measurement points for a 2π sweep with 0.01 V resolution. Employing the averaging technique with the same resolution, the number of measurement points goes up to 400 × 400 = 160,000 for a two-dimensional sweep of θ and Δ_in.
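The effect of averaging over a full period of Δ_in can be illustrated numerically with the sketch below. As an assumption of this sketch, the transmission of Eq. (7) is read as the squared magnitude of the field, so the averaged curve need not coincide with the closed form of Eq. (8); the code only checks where the extrema of the averaged curve fall. The function names, the grid sizes, and the -10 dBm interferer level are illustrative choices.

```python
import cmath
import math

def transmission(theta, ratio, delta_in):
    """Eq. (7) read as a squared magnitude; ratio = sqrt(P_bottom / P_top)."""
    field = (math.sin(theta / 2)
             + math.cos(theta / 2) * ratio * cmath.exp(1j * delta_in))
    return abs(field) ** 2

def averaged_transmission(theta, ratio, n_delta=72):
    """Mean transmission over n_delta equally spaced delta_in values in [0, 2*pi)."""
    deltas = [2 * math.pi * k / n_delta for k in range(n_delta)]
    return sum(transmission(theta, ratio, d) for d in deltas) / n_delta

ratio = math.sqrt(10 ** (-10 / 10))   # -10 dBm interferer, 0 dBm at the top input
thetas = [-math.pi + 2 * math.pi * k / 800 for k in range(801)]

# Without averaging (delta_in fixed at 0) the minimum is displaced from theta = 0.
theta_min_fixed = min(thetas, key=lambda t: transmission(t, ratio, 0.0))
# With averaging over a full period the extrema return to theta = 0 and |theta| = pi.
theta_min_avg = min(thetas, key=lambda t: averaged_transmission(t, ratio))
theta_max_avg = max(thetas, key=lambda t: averaged_transmission(t, ratio))

print(f"fixed delta_in: minimum at {theta_min_fixed / math.pi:+.3f} pi")
print(f"averaged:       minimum at {theta_min_avg / math.pi:+.3f} pi, "
      f"maximum at |theta| = {abs(theta_max_avg) / math.pi:.3f} pi")
```

The nested loop over θ and Δ_in in this sketch mirrors the two-dimensional sweep that the averaging technique requires in a measurement.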

We now analyze the calibration error generated by the subsequent MZIs in an optical path toward its output. Figure 3(a) shows MZI-1 (under calibration) connected to a photodetector through MZI-2. Based on Eq. (7) and considering no light at In_bottom1 (port labels are shown in Fig. 3(a)), the In_top1 to Out_top2 transmission is:

$$\mathrm{Transmission} = \sin^2\left(\frac{\theta_1}{2}\right)\left( \sin\left(\frac{\theta_2}{2}\right) + \cos\left(\frac{\theta_2}{2}\right)\sqrt{\frac{{P_{bottom}}_2}{{P_{top}}_2}}\, e^{j\Delta_{in - 2}} \right)^2, \tag{9}$$

in which ${\theta _1}$ and ${\theta _2}$ are the internal phase shifts of MZI-1 and MZI-2, and P_top2 and P_bottom2 are the optical powers at the top and bottom input ports of the second MZI, with a relative phase shift of ${\Delta _{in - 2}}$. Ideally, we prefer the transmission to be proportional to ${\sin ^2}\left( {\frac{{{\theta_1}}}{2}} \right)$, resulting in the minimum and maximum transmission at θ₁ = 0 and θ₁=π, respectively. However, ${P_{top}}_2$ and ${\Delta _{in - 2}}$ in the second factor of Eq. (9) are functions of ${\theta _1}$, leading to the calibration error.

Fig. 3. (a) MZI calibration in the presence of a secondary MZI at its output. (b) Optical transmission versus ${\theta _1}$ for different conditions of interfering signal at In_bottom2. The calibration is ideal when In_bottom2 is null or ${\theta _2} = \pi $.


To mitigate this error, the first solution is to set ${\theta _2} = \pi $ (MZI-2 is in the bar state), making the cosine term in Eq. (9) equal to zero. Thus, light at In_top2 goes directly to Out_top2 without interfering with In_bottom2. This scenario is shown in Fig. 3(b) by the red dashed curve. The challenge is that any variation in ${\theta _2}$ translates into an error in the ${\theta _1}$ calibration. The second solution is to block the bottom input port of MZI-2 (${P_{bottom}}_2 = 0$). According to Eq. (9), the transmission would then be ${\sin ^2}\left( {\frac{{{\theta_1}}}{2}} \right){\sin ^2}\left( {\frac{{{\theta_2}}}{2}} \right)$, which is proportional to ${\sin ^2}\left( {\frac{{{\theta_1}}}{2}} \right)$ for a constant value of ${\theta _2}$. Note that in this case, there is no need to precisely set ${\theta _2}$ to $\pi $ or 0. As long as ${\theta _2}$ is constant, MZI-2 acts as a constant optical attenuation between In_top1 and Out_top2 by tapping out a portion of the signal to Out_bottom2. The blue dotted curve in Fig. 3(b) presents this scenario.

If ${P_{bottom}}_2 \ne 0$ and ${\theta _2} \ne \pi $, MZI-2 adds error to the calibration of MZI-1. As shown by the dash-dotted green curve in Fig. 3(b), for ${P_{bottom}}_2 ={-} 10\textrm{}dBm$, a 0.1 π error in ${\theta _2}$ translates to a 0.02 π error in θ ₁ calibration. Based on this discussion, to perform a precise calibration of an MZI, the second input port of all the MZIs in the following stages should be kept null.

The external phase shifter φ sets the output phase of the light coming out of the MZI; therefore, its calibration requires measuring the phase of the signal. This can be done by employing multiple transverse electric (TE) modes or coherent detection of light [27]. For the calibration of the phase shifter φ, the effect of an interfering signal at the input of the MZI under calibration and of all the subsequent MZIs is similar to what we discussed for the phase shifter θ. In general, having one input of the MZI and of all the following MZIs null contributes to precise calibration of both the θ and φ phase shifters. This is the principal advantage of diagonal optical paths in interferometric meshes, as will be discussed in the next section.

3.2 Calibrating a mesh of interferometers

To calibrate a mesh of MZIs as shown in Fig. 1(b-e), we need to calibrate each individual MZI. As discussed in the previous section, while calibrating an MZI in a mesh, we ideally need one input of the MZI to be null, as well as one input of every MZI in the consecutive stages. This is viable for all MZIs in the Diamond and Bokun architectures thanks to the diagonal paths going through every MZI in these meshes; however, Reck and Clements cannot provide this option.

To calibrate the Bokun mesh, we start with MZI-1, applying light to I’₅ and connecting O’₅ to a detector. When no light goes through I₇, I’₃, and I’₄, we ensure the top input of MZI-1 is null. Next, we calibrate MZI-2 by applying light to I’₅, setting MZI-1 in the cross state, and connecting a detector to O’₄. Keeping I₂, I₃, I₄, I₅, I₆, I₇, and I’₄ dark, we ensure the top input of MZI-2 is null. We continue calibrating all MZIs in a similar manner, following the sequence identified by the MZI numbers in Fig. 1(e). The diamond-shaped architecture of the Bokun mesh provides a diagonal path (southwest to northeast or northwest to southeast) from an input to an output going through each individual MZI. Taking this diagonal path while keeping all the other inputs dark ensures that one input of the MZI and of all the consecutive MZIs toward the output is null. The calibration process for MZI-16, one of the middle MZIs, is illustrated in Fig. 1(e). We illuminate I’₃, setting MZIs 12, 13, 14, 15, 17, and 18 all in their cross state. Note that due to the diagonal path, MZI-16 and the two consecutive MZIs (MZI-17 and MZI-18) each have one null input. Therefore, the diamond-shaped architecture of the Bokun mesh provides the option of calibrating every MZI independently without phase error accumulation. The calibration of the Diamond mesh follows an almost identical procedure.

The MZIs in the Reck mesh shown in Fig. 1(b) can be calibrated in the sequence as enumerated. Due to the triangular topology of the Reck mesh, one input can always be set as null during the calibration procedure of an MZI. For example, in calibrating MZI-22, by applying an input to I₄, setting MZI-19, MZI-20 and MZI-21 in their cross state, and keeping I₀, I₁, I₂, and I₃ dark, we ensure the top port of MZI-22 is null. This helps eliminate the input error in MZI calibration. However, in the Reck mesh, we cannot necessarily ensure that one input of the consecutive MZIs toward the output is null. In the example of calibrating MZI-22, whether we choose the path toward the output through MZI-18, MZI-13, and MZI-7 or any other path toward the output I/Os, all MZIs may have interfering signals at their secondary input generated from I₄. Although the triangular structure of Reck guarantees that MZI-18, MZI-13, and MZI-7 are already calibrated during the calibration of MZI-22, any error in their phase setting translates into a calibration error of MZI-22. As discussed in the previous section, the error generated by interfering light at the input of the MZI under calibration is more severe than the error caused by the consecutive MZIs. Therefore, Reck calibration is fairly robust against phase errors. The downside of the Reck is its long and unbalanced optical depth, as noted in section 2.

The calibration process of Clements is more elaborate. Clements architecture is designed in a rectangular shape so that each input signal crosses its nearest neighbor at the first possible occasion leading to minimum optical depth of the mesh. However, the short optical depth is at the price of a more involved and in some cases inaccurate calibration. Calibration of Clements is discussed in detail in [17]. The calibration starts from the last stage providing direct access to the outputs and continues toward the inputs with the sequence as enumerated in Fig. 1(d). We start from MZI-1 and we choose its top input to top output path for calibration. While calibrating MZI-1, the structure does not provide any option to shine light at the top input of this MZI while keeping the bottom input dark. The light from the mesh input reaches MZI-1 through several non-calibrated MZIs. Therefore, the technique of averaging the input phase over 2π discussed in the previous section is used in calibrating MZI-1 to mitigate the effect of interfering light at the bottom input. Once the last stage MZIs are calibrated, the calibration process continues to the preceding stages of these mesh topologies.

3.3 Programming and monitoring the state of phase shifters

Programming is the process of setting the bias of all the MZIs to achieve a desired weight matrix. Unlike calibration, which is done only once after the fabrication of the programmable optical processor, programming is done every time the weight matrix changes. Therefore, the time and energy consumed in programming should not be ignored. In in-situ programming, the MZIs’ bias is set through an optimization technique such as gradient descent [15,16]. In ex-situ programming, the bias points required for a specific weight matrix are externally calculated and implemented on different similar chips. In both techniques, the processor must be reprogrammed every time the weight matrix changes. Therefore, the programming time should be less than one over the maximum frequency of the weight matrix change. This limits the application of the optical processor to tasks with stationary or slowly varying weight matrices. Once the weight matrix is applied to the phase shifters, dynamic errors caused by thermal crosstalk between the phase shifters degrade the accuracy of the processor. The MZIs’ bias must be readjusted to compensate for the thermal crosstalk generated by the adjacent phase shifters. Programming would be faster and more accurate if the processor included waveguide taps or in-line transparent photodetectors to monitor the phase setting of each MZI and provide a closed-loop system for setting the phase shifters’ bias. However, waveguide taps and in-line photodetectors often increase the insertion loss of the structure and add to the complexity of the system [11,18]. Therefore, it would be a great advantage if the architecture could inherently provide an option for monitoring the phase shift applied by a specific phase shifter through the main optical I/Os and without changing the other phase shifters’ bias. The diagonal architecture of the Bokun mesh provides this option.

Let us return to the example of MZI-16 in the Bokun mesh presented in Fig. 1(e). We showed earlier that by applying light to I’₃ and keeping all the other inputs dark, MZI-16 and all the consecutive MZIs toward O₁ have one dark optical input. We schematically demonstrate this situation in Fig. 1(e) by lighting up the illustrated structure through applying light into I’₃. As discussed in section 3.1, keeping the bias of all MZIs except MZI-16 unchanged, the I’₃ to O₁ transmission is proportional to ${\sin ^2}\left( {\frac{{{\theta_{MZI-16}}}} {2}} \right)$. When a specific weight matrix is implemented on the Bokun mesh, we can monitor ${\theta _{MZI - 16}}$ without changing the bias of the other MZIs. Because of this feature, we call MZI-16 independently accessible. This feature is very useful when we need to monitor the phase shift applied by a specific phase shifter after implementing the weight matrix. Since we do not need to change the state of the other MZIs, we are able to monitor the state of MZI-16 in the presence of the thermal crosstalk generated by the other MZIs. This is not viable without a diagonal path from an input to an output going through MZI-16. Due to the diamond shape of Bokun, all MZIs in this structure are independently accessible.
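Because the monitored transmission is proportional to sin²(θ/2), a detector reading can in principle be converted back into a phase estimate. The sketch below is one hypothetical way to do so and is not a procedure described in the original work: it assumes that the reading at the transmission maximum (`power_max`, recorded while sweeping the MZI under test) is available to absorb the constant loss of the other MZIs on the diagonal path, and it returns θ in [0, π] only, since sin²(θ/2) alone cannot distinguish θ from 2π - θ.

```python
import math

def theta_from_transmission(power, power_max):
    """Invert T = sin^2(theta / 2) for theta in [0, pi].

    power     -- detector reading at the monitored output (hypothetical input)
    power_max -- reading at the transmission maximum of the same path
                 (hypothetical input; absorbs the constant path loss)
    """
    # Clamp the normalized transmission to [0, 1] to guard against noise.
    t = min(max(power / power_max, 0.0), 1.0)
    return 2 * math.asin(math.sqrt(t))

# A reading at 25 % of the maximum corresponds to theta = pi / 3,
# because sin^2(pi / 6) = 0.25.
theta = theta_from_transmission(0.25e-3, 1.0e-3)
print(f"theta = {theta / math.pi:.4f} pi")
```

Comparing such an estimate with the programmed value of θ is one way a closed-loop correction of the phase shifter bias could be driven without disturbing the other MZIs.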

In Clements, only the MZIs on the two diagonals are independently accessible. As an example, Fig. 1(d) demonstrates why MZI-9 as an off-diagonal MZI is not independently accessible. If we shine light into I₇ and monitor O₃, both inputs of MZI-9 and the following MZIs toward the output are illuminated. After implementing a weight matrix on Clements, if we need to monitor the exact phase shift applied by the MZI-9, we need to change the configuration of MZI-12 and set them to cross state to null the upper input of MZI-9. We also must set MZI-5 and MZI-2 in the bar state so that the I₇ to O₃ transmission path becomes equal to a sinusoidal squared function of ${\theta _{MZI - 9}}$. Similarly, in Reck, only the MZIs on the outer diagonal are independently accessible. Diamond, however, similar to Bokun has all its MZIs independently accessible.

Table 1 compares the characteristics of the four meshes shown in Fig. 1. Diamond and Bokun support easier and more accurate programming because all their MZIs are independently accessible. The main improvement of Bokun over Diamond is its shorter depth. Indeed, Bokun is a truncated Diamond, in which we use the center optical I/Os as the main optical path. We also remove the MZIs on the two sides to minimize the optical depth. The Bokun mesh can also be seen as a complementary Clements with extra MZIs on top and bottom. These MZIs are essential for providing diagonal paths and increasing the number of independently accessible MZIs. Another advantage of Bokun is the balanced number of MZIs among different optical paths (a minimum of seven and a maximum of eight in an 8 × 8 structure). A balanced number of MZIs, and hence balanced insertion loss, is an important feature of a mesh, especially in quantum applications [19]. The downside of Bokun, similar to Diamond, is the larger number of MZIs and optical I/Os. It should be noted that although a larger number of MZIs increases the footprint of the mesh, the extra MZIs in Bokun (unlike Diamond) are not in the main optical path, keeping the mesh depth minimized. As highlighted in this section, the extra MZIs in Bokun provide diagonal paths for every MZI in the mesh. Calibration and programming through these diagonal paths are faster and more accurate, decreasing the time and energy required for programming. This will be discussed in section 5.

Table 1. Architecture Characteristics of Different 8 × 8 Meshes


4. Performance of meshes in optical neural networks

To compare the performance of optical processors based on the Reck, Clements, Diamond, and Bokun meshes, we simulate their performance when used as optical neural networks. A conventional single layer digital neural network is depicted in Fig. 4. The inputs (IN) are the features composing a single sample of the dataset, fed into the neural network through nodes X. The vector X is then multiplied by the weight matrix W, resulting in the vector Z, before being sent through the nonlinear activation function $f()$, yielding a final vector $\hat{Y}$. As such, the equation for a single layer NN is:

(10)$$\hat{Y} = f(Z )= f\; ({W \cdot X} )$$

Fig. 4. Example of an N × N single layer neural network, taking in N features and returning N possible classes.


The predicted class of the sample is obtained from vector $\hat{Y}$ by taking the index of its largest element (argmax). If the network is undergoing backpropagation [28], $\hat{Y}$ can then be compared to the ground truth vector Y, which for classification purposes is generally a one-hot encoded vector [29]. The number of features is assumed to be equal to the number of classes. The comparison between the two vectors $\hat{Y}$ and Y is quantified by a loss function, L, such as a mean square error [30] or a categorical cross entropy [31]. Once the loss is calculated, its gradient is computed with respect to the weight matrix. Subsequently, gradient descent is performed on the network to optimize the weight matrix. The resulting weight matrix W can then be built using the optical processors described in the previous section.
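The forward pass of Eq. (10), the class prediction, and a mean square error loss can be sketched in a few lines. This is a minimal illustration under stated assumptions: the weights are real valued, the activation is `tanh`, and the 3 × 3 matrix and input are made-up numbers, none of which come from the simulations reported here.

```python
import math

def forward(W, x, f=math.tanh):
    """Compute Y_hat = f(W . x) for a weight matrix W given as a list of rows."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return [f(z_i) for z_i in z]

def predict_class(y_hat):
    """Index of the largest output, i.e. the predicted class."""
    return max(range(len(y_hat)), key=lambda i: y_hat[i])

def mse(y_hat, y):
    """Mean square error between the prediction and a one-hot target."""
    return sum((a - b) ** 2 for a, b in zip(y_hat, y)) / len(y)

# Hypothetical 3 x 3 example (N features, N classes).
W = [[0.9, 0.1, 0.0],
     [0.2, 0.7, 0.1],
     [0.0, 0.3, 0.8]]
x = [0.1, 0.2, 1.0]
y_true = [0.0, 0.0, 1.0]   # one-hot label for class 2

y_hat = forward(W, x)
label = predict_class(y_hat)
loss = mse(y_hat, y_true)
```

In an optical implementation the matrix-vector product is carried out by the mesh on complex field amplitudes, so the real-valued arithmetic above only mirrors the digital reference network.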

We use two datasets in this work to ensure our comparison is general enough and to ensure that one mesh is not favored due to specific characteristics of the used dataset. The first dataset used is the linearly separable Gaussian dataset presented in [20]. This dataset allows a conventional single layer NN to achieve a classification accuracy of 100% [32]. The second dataset is MNIST [23].

Neuroptica, written by the authors of [33], is a Python simulation platform for MZI-based ONNs. It provides a wide range of abstraction levels for training and simulating ONNs. The lowest-level functionality allows for the manipulation of the arrangement and properties of the phase shifters and couplers of the MZIs, while the highest-level features provide a Keras-like application programming interface (API). This library allows for the training of MZI-based ONNs through backpropagation [15,21]. It should be noted that the backpropagation algorithm used for ONNs must backpropagate all the way back to the phases of the MZIs rather than simply to the matrix weights. As such, the backpropagation algorithm uses the adjoint electric field method [22] to allow in situ optimization of the unitary transformation matrix of the mesh. The library includes both the Reck and the Clements meshes. However, it does not include the Diamond and Bokun meshes, which we added to allow the training and simulation of the meshes studied in this work. The Diamond and Bokun meshes are available in a cloned repository, Neuroptica: Towards a Practical Implementation of Photonic Neural Networks [34].

The MNIST dataset, with pixel values already normalized to [0,1], first undergoes dimensionality reduction from 784 (28 × 28) down to 10 using the principal component analysis (PCA) function from the Scikit-learn module [29]. This matches the data dimension to the number of data input ports of our ONN. Then, an 8:2 split is performed and a subset of 4,500 of the 60,000 MNIST samples is used: 4000 samples are drawn from the 80% portion, forming the training set, and 500 samples are drawn from the 20% portion, forming the test set. We test the performance of the meshes presented in Fig. 1(b-e) in the presence of two main sources of error, i.e., phase uncertainty and optical loss. Phase uncertainty in the phase shifters of each reconfigurable MZI is of great importance in the experimental programming of MZI-based optical processors. The phase uncertainty of a phase shifter impacts the optical power splitting ratio at the outputs of the corresponding MZI. Phase error in the phase shifter corrupts the relative phase at its output ports. As a result, phase uncertainties degrade the classification accuracy of the implemented ONN. In this work, the phase uncertainty is represented by a normally distributed random variable, $N(0,\sigma _\mathrm{\Theta }^2)$. The phase uncertainty is affected by multiple factors, including thermal crosstalk between phase shifters, signal noise of the bias voltages applied to the phase shifters, and waveguide dimension variations. Thus, the phase shift in a phase shifter is mathematically defined as:

(11)$${\mathrm{\Theta }_{actual}} = {\mathrm{\Theta }_{optim}} + N(0,\sigma _\mathrm{\Theta }^2)$$

where ${\mathrm{\Theta }_{optim}}$ is the optimal phase of the internal phase shifter, i.e., θ, or external phase shifter, i.e., φ, and ${\sigma _\mathrm{\Theta }}$ is the standard deviation of the error on either phase. This phase uncertainty is redrawn after each matrix multiplication to mimic the dynamic variation in noise.
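The noise model of Eq. (11) amounts to adding an independent Gaussian sample to every programmed phase, with a fresh draw for each matrix multiplication. The sketch below shows one way to do this with the Python standard library; the phase values, the value of σ, and the helper name `noisy_phases` are hypothetical and are not taken from the Neuroptica code.

```python
import random

def noisy_phases(optimal_phases, sigma, rng):
    """Return Theta_actual = Theta_optim + N(0, sigma^2) for each phase shifter."""
    return [theta + rng.gauss(0.0, sigma) for theta in optimal_phases]

rng = random.Random(0)               # fixed seed so the run is repeatable
theta_optim = [0.5, 1.2, 2.0, 3.0]   # hypothetical trained phases (rad)
sigma = 0.2                          # standard deviation of the phase error (rad)

# A fresh set of perturbed phases is drawn for every matrix multiplication.
draws = [noisy_phases(theta_optim, sigma, rng) for _ in range(2000)]
mean_first = sum(d[0] for d in draws) / len(draws)
```

Averaged over many draws, each perturbed phase is centred on its optimal value, so the model degrades accuracy through its spread rather than through a systematic offset.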

Two figures of merit (FoMs) determine the quality of each mesh topology with respect to its tolerance to phase uncertainty and loss. To obtain these FoMs, an ONN is first trained, then made to classify the validation dataset under varying experimental conditions. Each FoM is the area of the region of the simulated parameter space in which the ONN achieves a classification accuracy above 75%. If only the phase uncertainties are varied (i.e., ${\sigma _\mathrm{\theta }}$ and ${\sigma _\phi }$), the FoM is in units of radian squared (rad²). On the other hand, if the parameters varied include both the loss of the constituent MZIs and the phase uncertainty, the FoM is in units of decibel multiplied by radian (dB.rad). In this situation, we assume ${\sigma _\mathrm{\theta }}$ is equal to ${\sigma _\phi }$. The two FoMs enable the study of the ONNs’ ability to handle practical uncertainties. This analysis has stochastic components (the loss varies depending on fabrication quality and the position of the MZI on the chip, for example). Thus, every sample in the validation dataset is retested multiple times, and the average classification accuracy is taken. Figure 5(a-d) demonstrates the ability of the 10 × 10 meshes to handle the θ and φ phase uncertainties. This figure presents the simulation results for classification of the Gaussian dataset when the MZI loss is 0 dB. Details of the simulation parameters are presented in Supplement 1. In all meshes, the accuracy degrades with the phase uncertainty; however, Clements and Bokun are more robust to phase error. The shorter optical depth of Clements and Bokun is the main contributor to their improved robustness to phase uncertainties.
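One simple way to evaluate such an area-based FoM from a grid of simulated accuracies is to count the grid cells that meet the threshold and multiply by the cell area. The sketch below does this for a made-up accuracy surface; the cell-counting approximation, the uniform grid, and the linear accuracy model are all assumptions of this example and may differ from how the contours in the figures were computed.

```python
def fom_area(accuracy, x_values, y_values, threshold=0.75):
    """Approximate the area of the region where accuracy >= threshold.

    accuracy[i][j] is the accuracy at (x_values[i], y_values[j]). Both axes
    are assumed uniformly spaced, and each grid point stands for one cell.
    """
    dx = x_values[1] - x_values[0]
    dy = y_values[1] - y_values[0]
    cells = sum(1 for row in accuracy for a in row if a >= threshold)
    return cells * dx * dy

# Hypothetical accuracy surface over (sigma_theta, sigma_phi), both in rad.
sigma_theta = [round(0.1 * i, 1) for i in range(6)]
sigma_phi = [round(0.1 * j, 1) for j in range(6)]
accuracy = [[max(0.0, 1.0 - st - sp) for sp in sigma_phi] for st in sigma_theta]

fom = fom_area(accuracy, sigma_theta, sigma_phi)   # in rad^2 for this grid
```

When one axis is the MZI loss in dB and the other is the common phase standard deviation in rad, the same function returns a value in dB.rad.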

Fig. 5. Classification accuracy of 10 × 10 ONNs for the Gaussian dataset. (a)–(d) are based on the Reck, Diamond, Clements, and Bokun mesh topologies with varying θ and φ phase uncertainty for 0 dB loss per MZI. (e)–(h) show the classification accuracy for ${\sigma _\mathrm{\theta }}$ = ${\sigma _\phi }$ and resistance to MZI loss. The contour (black line) shows the FoM representing the area of above 75% classification accuracy.


Figure 5(e-h) shows the meshes’ phase uncertainty tolerance with ${\sigma _\mathrm{\theta }}$ = ${\sigma _\phi }$ and their resistance to loss of the constituent MZIs. Clements and Bokun, with shorter optical depth, demonstrate better loss tolerance than the other two meshes. Diamond, with its symmetrical structure, shows relatively better loss tolerance than Reck.

Figure 6(a-d) presents the classification accuracy of a two-layered 10 × 10 MNIST classifier based on the four presented architectures in the presence of θ and φ phase errors. Since the MNIST classification is more complex, we used a two-layered network to increase the classification accuracy. Simulation results of a single layer 10 × 10 classifier are also provided in Supplement 1. Similar to the case of the Gaussian dataset, Clements and Bokun are more robust to phase error. Figure 6(e-h) shows the classification accuracy in the presence of phase error and loss. Clements and Bokun, with minimum optical depth, provide more robustness against optical loss, with FoMs of 0.081 dB.rad and 0.049 dB.rad, respectively. However, the classification accuracy of Reck and Diamond decreases with insertion loss, leading to lower FoMs of 0.018 dB.rad and 0.019 dB.rad, respectively.

Fig. 6. Classification accuracy of two-layered 10 × 10 ONNs for the MNIST dataset. (a)–(d) are based on the Reck, Diamond, Clements, and Bokun mesh topologies with varying θ and φ phase uncertainty for 0 dB loss per MZI. (e)–(h) show the classification accuracy for ${\sigma _\mathrm{\theta }}$ = ${\sigma _\phi }$ and resistance to MZI loss. The contour shows the FoM representing the area of above 75% classification accuracy.


A phase error of 0.2 radians may not be intuitive, so we translate it into a temperature accuracy for a better physical understanding [35]:

(12)$$\Delta \mathrm{\Theta } = \frac{{2\pi L}}{{{\lambda _0}}}\frac{{dn}}{{dT}}\mathrm{\Delta }T$$

in which $\Delta \mathrm{\Theta }$ is the phase error (for the θ or φ phase shifter), L is the phase shifter length, ${\lambda _0}$ is the wavelength, dn/dT is the thermo-optic coefficient of the phase shifter, and $\mathrm{\Delta }T$ is the temperature error. According to Eq. (12), the phase shift is linear in the temperature change of the phase shifter. A phase error of 0.2 radians therefore translates into a 3.2% change (0.2/2π) in the temperature rise of a phase shifter applying a 2π phase shift. Maintaining such a level of temperature accuracy in a die with several phase shifters is a challenging task, primarily because of thermal crosstalk between these phase shifters. Moreover, fabrication variations such as waveguide sidewall roughness also add to the phase error.
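To make Eq. (12) concrete, the sketch below solves it for the temperature error and compares it with the temperature rise needed for a full 2π shift. The phase shifter length, the wavelength, and the thermo-optic coefficient (an approximate textbook value for silicon near 1550 nm) are illustrative assumptions rather than parameters of the devices studied here; the 0.2/2π ratio does not depend on them.

```python
import math

def phase_error_to_delta_t(delta_phase, length, wavelength, dn_dt):
    """Solve Eq. (12) for dT: dT = dPhase * lambda0 / (2 * pi * L * dn/dT)."""
    return delta_phase * wavelength / (2.0 * math.pi * length * dn_dt)

# Illustrative values (assumptions, not taken from the paper).
L = 100e-6          # phase shifter length (m)
lambda0 = 1550e-9   # wavelength (m)
dn_dT = 1.8e-4      # approximate thermo-optic coefficient of silicon (1/K)

dT_error = phase_error_to_delta_t(0.2, L, lambda0, dn_dT)
dT_2pi = phase_error_to_delta_t(2.0 * math.pi, L, lambda0, dn_dT)
relative_error = dT_error / dT_2pi   # equals 0.2 / (2 * pi), about 3.2%
```

Because both temperatures scale with the same prefactor, changing `L`, `lambda0`, or `dn_dT` shifts the absolute tolerance in kelvin but leaves the relative tolerance unchanged.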

5. Discussion on energy efficiency

The energy consumption (in the unit of Joule per operation) for an N × N mesh of interferometers can be calculated as [36]:

(13)$${E_{static}}({J/Op} )\textrm{ } = \textrm{}\frac{{n\; \times \; {P_{PS}}}}{{{N^2}\; VR}}\textrm{ }$$

in which n is the total number of phase shifters in the mesh, P_PS is the power dissipation in a phase shifter, and VR is the rate at which input vectors are multiplied. We consider a VR of 10 GOp/s. Although the interferometer-based optical processor performs analog vector-matrix multiplication, the input vector changes at a specific rate; hence, the computation speed is limited by the rate of incoming vectors. Figure 7(a) compares the energy consumption of four different 10 × 10 optical processor structures: Reck, Diamond, Clements, and Bokun. We considered a power dissipation of 20 mW/π for the thermo-optic phase shifters (TOPS) on the silicon-on-insulator (SOI) platform [25,36]. Assuming a uniform distribution of phases, the average power consumption of a TOPS is P_π/2, where P_π is the power required for a π phase shift. As shown in Fig. 7(a), the Reck and Clements structures, with a smaller number of MZIs, show better efficiency for a static weight matrix, while Diamond and Bokun dissipate more energy in their larger number of MZIs.

Fig. 7. Energy consumption in units of energy per operation (a) without and (b) with programming.


The energy consumption given by Eq. (13) does not include the energy dissipated during the programming phase. This equation only represents the case of a static weight matrix, in which the time and energy required for programming are negligible compared to those required for computation. If the weight matrix changes more often, however, we must include the energy dissipation of the programming phase. Let us assume the weight matrix changes with a frequency of f_w, and that we spend a time t_Prog on programming. The modified energy consumption, including the energy spent on programming, becomes:

(14)$${E_{total}} = \frac{{\frac{1}{{{f_w}}}}}{{\frac{1}{{{f_w}}}\; - \; {t_{Prog}}}}\; {E_{static}} = \frac{{{E_{static}}}}{{1 - {f_w} \cdot {t_{Prog}}}}$$

Employing faster programming methods and increasing the phase shifter speed both reduce t_Prog and hence the total energy consumption. In-situ programming, which relies on optimization techniques performed on the chip, increases t_Prog and hence E_total. Figure 7(b) compares E_total for the four meshes. For Reck and Clements, we assumed a backpropagation programming method with 200 iterations [16]. Conventional ex-situ programming (i.e., predefined weight matrix programming) leads to considerable accuracy degradation on these meshes due to the lack of monitoring options and the presence of phase errors. A two-step ex-situ training scheme employing a genetic algorithm has been proposed to tackle this issue, which also affects the computation time and energy required for training [37]. For Diamond and Bokun, we considered ex-situ programming with 10 iterations/MZI, monitoring the MZI states and readjusting their bias. The programming time is estimated based on a 2.2 µs transit time in TOPS [36]. The theoretical optical training time of each iteration needs to account for the maximum of the TOPS transit time and the electronic delay [38]. Considering the slow response of TOPS, we can neglect the electronic delays. The power consumption of the electronics is not accounted for in Eq. (14); we assumed it is negligible compared to that of the power-hungry TOPS. A more accurate estimate of E_total may include the electronic energy consumption by simply adding it to ${E_{static}}$. Adding the electronic power consumption will change the scale of Fig. 7(b), while the trend will remain almost the same. From Fig. 7, the energy consumption of Clements increases from 450 fJ/Op for a stationary weight matrix to 3750 fJ/Op for a weight matrix changing at 2 kHz, whereas for Bokun, which takes advantage of the monitoring feature provided by the architecture, the energy consumption increases only slightly, from 610 fJ/Op to 638 fJ/Op.

Based on Eq. (14), E_total increases as ${f_w}$ increases. This is because ${t_{Prog}}$ is required every time the weight matrix changes. Therefore, within a period of $1/{f_w}$, the processor actively computes for only $(1/{f_w} - {t_{Prog}})$. Reck and Clements lack diagonal paths and therefore require time-consuming optimization techniques to program the processor. This significantly increases the programming time and reduces the active computation time, resulting in a higher E_total. Diamond and Bokun have faster programming options by utilizing phase monitoring through diagonal paths, resulting in a lower E_total compared to Reck and Clements. The Diamond mesh also saves energy in training; however, as shown in Figs. 5–6, its longer optical depth deteriorates its performance in the presence of optical insertion loss and phase error. Finally, it is important to note that the energy consumption evaluation presented in Fig. 7 provides only an approximate evaluation rather than a precise estimation. The main purpose of this evaluation is to highlight the fact that phase monitoring can significantly reduce programming time, thus improving the overall energy efficiency of the system.
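The energy figures quoted from Fig. 7 can be checked by dividing E_static by the fraction of each weight-update period that is spent computing, 1 − f_w·t_Prog. The sketch below does so using only numbers stated in the text. Two points are assumptions of this example: each iteration is taken to cost one TOPS transit time, and the 10 iterations for Bokun are taken to run on all MZIs concurrently, which is the reading that reproduces the quoted values.

```python
def total_energy(e_static, f_w, t_prog):
    """Energy per operation when a fraction f_w * t_prog of time is spent programming."""
    duty = 1.0 - f_w * t_prog   # fraction of each period spent computing
    if duty <= 0.0:
        raise ValueError("programming takes longer than the weight-update period")
    return e_static / duty

T_TRANSIT = 2.2e-6   # TOPS transit time per iteration (s)
F_W = 2e3            # weight-matrix update frequency (Hz)

# Static energies (fJ/Op) and iteration counts are the values quoted in the text.
e_clements = total_energy(450.0, F_W, 200 * T_TRANSIT)
e_bokun = total_energy(610.0, F_W, 10 * T_TRANSIT)
saving = 1.0 - e_bokun / e_clements
```

With these inputs, `e_clements` and `e_bokun` come out at about 3750 fJ/Op and 638 fJ/Op, and `saving` is about 0.83, in line with the figures reported for a weight matrix changing at 2 kHz.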

6. Summary

The Bokun mesh, proposed in this work, is a topology that merges the attributes of two prior optical processor topologies, Diamond and Clements. Like Diamond, Bokun provides a diagonal path through every individual MZI, enabling phase monitoring without interference from other MZIs. With this monitoring option, Bokun’s programming is faster, improving the total energy efficiency of the processor. Using simulations, we demonstrate that, compared to the rectangular (Clements) mesh, the Bokun mesh improves total energy efficiency by 83% for a 10 × 10 mesh with a weight matrix changing at 2 kHz. The Bokun mesh, with its optimal optical depth, also exhibits three times more resilience to loss and fabrication imperfections than the Reck and Diamond architectures for a 10 × 10 mesh used in a two-layered optical neural network for the MNIST classification task.

Acknowledgment

The authors thank Simon Geoffroy-Gagnon for configuring Neuroptica in 2021 in the first phase of this project.

Disclosures

The authors declare no conflict of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [34].

Supplemental document

See Supplement 1 for supporting content.

References

1. F. Shokraneh, M. S. Nezami, and O. Liboiron-Ladouceur, “Theoretical and experimental analysis of a 4 × 4 reconfigurable MZI-based linear optical processor,” J. Lightwave Technol. 38(6), 1258–1267 (2020).

2. L. De Marinis, M. Cococcioni, O. Liboiron-Ladouceur, G. Contestabile, P. Castoldi, and N. Andriolli, “Photonic integrated reconfigurable linear processors as neural network accelerators,” Appl. Sci. 11(13), 6232 (2021).

3. R. Hamerly, A. Sludds, L. Bernstein, M. Prabhu, C. Roques-Carmes, J. Carolan, Y. Yamamoto, M. Soljacic, and D. Englund, “Towards large-scale photonic neural-network accelerators,” in 2019 IEEE International Electron Devices Meeting (IEDM), pp. 22.8.1–22.8.4, 2019.

4. D. A. B. Miller, “Self-configuring universal linear optical component,” Photonics Res. 1(1), 1–15 (2013).

5. M. A. Nahmias, T. F. de Lima, A. N. Tait, H. Peng, B. J. Shastri, and P. R. Prucnal, “Photonic multiply-accumulate operations for neural networks,” IEEE J. Select. Topics Quantum Electron. 26(1), 1–18 (2020), Art no. 7701518.

6. C. Taballione, T. A. W. Wolterink, J. Lugani, A. Eckstein, B. A. Bell, R. Grootjans, I. Visscher, J. J. Renema, D. Geskus, C. G. H. Roeloffzen, I. A. Walmsley, P. W. H. Pinkse, and K. Boller, “8 × 8 programmable quantum photonic processor based on silicon nitride waveguides,” in Frontiers in Optics/Laser Science, paper JTu3A.58, Sept. 2018.

7. N. C. Thompson, K. Greenewald, K. Lee, and G. F. Manso, “Deep learning’s diminishing returns: the cost of improvement is becoming unsustainable,” IEEE Spectr. 58(10), 50–55 (2021).

8. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017).

9. Q. Cheng, J. Kwon, M. Glick, M. Bahadori, L. P. Carloni, and K. Bergman, “Silicon photonics codesign for deep learning,” Proc. IEEE 108(8), 1261–1282 (2020).

10. D. Pérez, I. Gasulla, J. Capmany, and R. A. Soref, “Reconfigurable lattice mesh designs for programmable photonic processors,” Opt. Express 24(11), 12093–12106 (2016).

11. W. Bogaerts, D. Pérez, J. Capmany, D. A. B. Miller, J. Poon, D. Englund, F. Morichetti, and A. Melloni, “Programmable photonic circuits,” Nature 586(7828), 207–216 (2020).

12. N. C. Harris, J. Carolan, D. Bunandar, M. Prabhu, M. Hochberg, T. Baehr-Jones, M. L. Fanto, A. M. Smith, C. C. Tison, P. M. Alsing, and D. Englund, “Linear programmable nanophotonic processors,” Optica 5(12), 1623 (2018).

13. M. Nikdast, G. Nicolescu, J. Trajkovic, and O. Liboiron-Ladouceur, “Chip-scale silicon photonic interconnects: a formal study on fabrication non-uniformity,” J. Lightwave Technol. 34(16), 3682–3695 (2016).

14. S. Banerjee, M. Nikdast, and K. Chakrabarty, “Characterizing coherent integrated photonic neural networks under imperfections,” J. Lightwave Technol. 41, 1464–1469 (2022).

15. S. Pai, B. Bartlett, O. Solgaard, and D. A. B. Miller, “Matrix optimization on universal unitary photonic devices,” Phys. Rev. Appl. 11(6), 064044 (2019).

16. T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through in situ backpropagation and gradient measurement,” Optica 5(7), 864 (2018).

17. S. Bandyopadhyay, R. Hamerly, and D. Englund, “Hardware error correction for programmable photonics,” Optica 8(10), 1247–1255 (2021).

18. F. Morichetti, S. Grillanda, M. Carminati, G. Ferrari, M. Sampietro, M. J. Strain, M. Sorel, and A. Melloni, “Non-invasive on-chip light observation by contactless waveguide conductivity monitoring,” IEEE J. Select. Topics Quantum Electron. 20(4), 292–301 (2014).

19. W. R. Clements, P. C. Humphreys, B. J. Metcalf, W. S. Kolthammer, and I. A. Walmsley, “Optimal design for universal multiport interferometers,” Optica 3(12), 1460 (2016).

20. F. Shokraneh, S. Geoffroy-Gagnon, and O. Liboiron-Ladouceur, “The diamond mesh, a phase-error- and loss-tolerant field-programmable MZI-based optical processor for optical neural networks,” Opt. Express 28(16), 23495–23508 (2020).

21. M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73(1), 58–61 (1994).

22. F. Shokraneh, S. Geoffroy-Gagnon, and O. Liboiron-Ladouceur, “High-performance programmable MZI-based optical processors,” in Silicon Photonics for High-Performance Computing and Beyond (CRC Press, 2021), pp. 335–365.

23. Y. LeCun, C. Cortes, and N. S. C. J. C. Borges, “The MNIST database of handwritten digits,” NIST, 1998, http://yann.lecun.com/exdb/mnist

24. B. Yegnanarayana, Artificial Neural Networks (PHI Learning Pvt. Ltd., 2009).

25. A. Mirza, A. Shafiee, S. Banerjee, K. Chakrabarty, S. Pasricha, and M. Nikdast, “Characterization and optimization of coherent MZI-based nanophotonic neural networks under fabrication non-uniformity,” IEEE Trans. Nanotechnology 21, 763–771 (2022).

26. A. Das, H. R. Mojaver, G. Zhang, and O. Liboiron-Ladouceur, “Scalable SiPh-InP hybrid switch based on low-loss building blocks for lossless operation,” IEEE Photon. Technol. Lett. 32(21), 1401–1404 (2020).

27. K. R. Mojaver and O. Liboiron-Ladouceur, “On-chip optical phase monitoring in multi-transverse-mode integrated silicon-based optical processors,” IEEE J. Select. Topics Quantum Electron. 28(6), 1–8 (2022).

28. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86(11), 2278–2324 (1998).

29. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research 12, 2825 (2011).

30. C. J. Willmott and K. Matsuura, “Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance,” Clim. Res. 30(1), 79–82 (2005).

31. K. Hu, Z. Zhang, X. Niu, Y. Zhang, C. Cao, F. Xiao, and X. Gao, “Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function,” Neurocomputing 309, 179–191 (2018).

32. J. G. Carbonell, R. S. Michalski, and T. M. Mitchell, “An overview of machine learning,” Machine learning 1, 3–23 (1983).

33. I. A. D. Williamson, T. W. Hughes, M. Minkov, B. Bartlett, S. Pai, and S. Fan, “Reprogrammable electro-optic nonlinear activation functions for optical neural networks,” IEEE J. Select. Topics Quantum Electron. 26(1), 1–12 (2020).

34. S. Geoffroy-Gagnon, “Neuroptica: towards a practical implementation of photonic neural networks,” Github, 2020, https://github.com/Xoreus/neuroptica.

35. M. Jacques, A. Samani, E. El-Fiky, D. Patel, Z. Xing, and D. V. Plant, “Optimization of thermo-optic phase-shifter design and mitigation of thermal crosstalk on the SOI platform,” Opt. Express 27(8), 10456–10471 (2019).

36. M. A. Al-Qadasi, L. Chrostowski, B. J. Shastri, and S. Shekhar, “Scaling up silicon photonic-based accelerators: challenges and opportunities,” APL Photonics 7(2), 020902 (2022).

37. R. Shao, G. Zhang, and X. Gong, “Generalized robust training scheme using genetic algorithm for optical neural networks with imprecise components,” Photon. Res. 10(8), 1868–1876 (2022).

38. T. Zhou, L. Fang, T. Yan, J. Wu, Y. Li, J. Fan, H. Wu, X. Lin, and Q. Dai, “In situ optical backpropagation training of diffractive optical neural networks,” Photon. Res. 10(8), 1868 (2022).

Mesh	Total number of MZIs	Independently accessible MZIs	Mesh depth	Min↔Max MZIs per path
Reck [21]	28	13 (46%)	13	1↔13
Diamond [20]	49	49 (100%)	13	1↔13
Clements [19]	28	14 (50%)	8	4↔8
Bokun (this work)	40	40 (100%)	8	7↔8

Optics Express

Kaveh (Hassan) Rahbardar Mojaver	https://orcid.org/0000-0003-2654-4789
Edward Leung	https://orcid.org/0009-0007-2090-3126
S. Mohammad Reza Safaee	https://orcid.org/0000-0001-8593-2592
Odile Liboiron-Ladouceur	https://orcid.org/0000-0001-6238-5346