Abstract
The recent advent of diffractive deep neural networks, or D2NNs, has opened new avenues for the design and optimization of multi-functional optical materials. Despite the effectiveness of the D2NN approach, there is a need to make both the networks and the design algorithms more general and computationally efficient. The work demonstrated in this paper brings significant improvements to both areas by introducing an algorithm that performs inverse design on fully nonlinear diffractive deep neural networks, assisted by an adjoint sensitivity analysis, which we term (DNA)2. As the name implies, the procedure optimizes the parameters associated with the diffractive elements, including both linear and nonlinear amplitude and phase contributions as well as the spacing between planes, via adjoint sensitivity analysis. All gradients can be obtained in a single GPU-compatible step. We demonstrate the capability of this approach by designing several types of three-layer D2NNs to classify 8800 handwritten digits taken from the MNIST database. In all cases, the D2NN achieved at least 94.64% classification accuracy with 192 minutes or less of training.
© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
There are striking similarities between the mathematics and concepts governing optical wave propagation and those of artificial neural networks (ANNs) [1–4]. Perhaps the most straightforward example is illustrated in the recently reported diffractive deep neural networks (D$^{2}$NN) – a specialized dense ANN prudently constrained to mimic diffractive-scale wave propagation [5]. In the D$^{2}$NN perspective, the nodes, layers, and edges of the network directly correspond to discretized electric field values, propagation distances, and point-source wavelets, respectively. According to the Huygens-Fresnel principle, electric field values at further distances are found by superimposing secondary wavelet contributions emanating from the current plane; this concept is mirrored in D$^{2}$NNs by the repeated application of weighted sums to update node values layer-by-layer.
The benefit of the D$^{2}$NN approach is realized with the introduction of training data, for which there is no simple analog in the wave propagation perspective. By training the D$^{2}$NN on pairs of object/image plane electromagnetic field profiles serving as input/output training data, one can effectively design a cascade of passive optical elements that can negotiate multiple propagation requirements [5,6]. Equipped with this powerful capability, the design of all-optical elements has been harnessed, generalized, and improved mostly in the context of classification schemes [7–10]. Beyond the confines of classification, D$^{2}$NNs have also found utility in optics-focused applications including beam steering [11], frequency control [12], and mode differentiation [13–15]. In a short period of time, the novel material design possibilities and application spaces inspired by D$^{2}$NNs have expanded quickly – yet there still exist gaps in the original methodology. Currently, the main improvement zones fall into two categories: generalizing D$^{2}$NNs and increasing the computational accuracy of D$^{2}$NNs. Examples of the former include addressing the need to handle true complex numbers rather than independently tracking amplitudes and phases [16], as well as the proper encoding of nonlinear activation functions [17]. Optical analogues to the nonlinear activation functions and biases seen in traditional ANNs were discussed in the supplemental materials of [5], with several promising avenues of study put forward. Despite the successful experimental demonstration of a D$^{2}$NN on the MNIST classifier problem, the absence of nonlinearity in the constructed network was a source of contention [18,19]. Many authors would later go on to explore and apply nonlinearity using photorefractive-based approaches similar to those initially proposed in [5], resulting in Fourier-space networks, residual networks, and in-situ trained networks [20–22]. In each case, an increased computational accuracy compared to purely linear networks was observed.
In this work, we demonstrate significant improvements to both the generalization and the computational efficiency of D$^{2}$NNs by introducing an algorithm for optimizing fully nonlinear complex-valued diffractive deep neural networks, computationally assisted by an adjoint method referred to here as (DNA)$^{2}$. In the original procedure, the most computationally expensive step in each training iteration occurs during the gradient descent calculation. The gradient must be obtained by individually calculating the derivative of the loss function with respect to each complex-valued parameter at each node. These calculations involve repeated applications of a modified version of the forward propagation algorithm, the number of which scales linearly with the number of parameters being optimized. Instead, (DNA)$^{2}$ is derived via an adjoint approach that returns the entire gradient using a single application of a GPU-compatible modified version of the forward propagation algorithm. Simply put, (DNA)$^{2}$ removes the linear dependence on the number of parameters entirely. Our adjoint procedure is an advance over those seen previously in D$^{2}$NN [22], being a generalized complex-valued format with an arbitrary parameterized nonlinear activation function. Furthermore, it is compatible with both transmission amplitude and phase light modulators (TAPLM) and optically addressable spatial light modulators (OASLM). Optimization is both simple and efficient for any parameter associated with the linear and nonlinear response of the materials that compose each layer in a D$^{2}$NN, as well as the spacing between layers.
2. (DNA)$^{2}$ for transmissive amplitude and phase light modulators (TAPLM)
Our method applies the perturbative approach outlined in [23] to the problem of scalar diffraction theory. We begin our description of (DNA)$^{2}$ with forward propagation. In standard systems, the incident field impinges upon a TAPLM (see Fig. 1). In this regime, the behavior can be described using a discretized form of Rayleigh-Sommerfeld diffraction which models the evolution of scalar electric fields:
$$u(x_{m},y_{n};z_{p+1})=\sum_{i=1}^{N}\sum_{j=1}^{N}H_{mnij}(\Delta z_{p})\,U(x_{i},y_{j};z_{p}),\qquad U(x_{i},y_{j};z_{p})=A(x_{i},y_{j};z_{p})\,u(x_{i},y_{j};z_{p}).\tag{1}$$
Here, $u(x_{i},y_{j};z_{p})$ is the 2D array of the transverse complex electric field values impinging upon the $p^{th}$ layer of the network at a position $z_{p}$. This array has dimensions $N\times N$ where $N$ is the number of discretized nodal points along the $\{x,y\}$ directions. Each entry is separated by a constant spatial resolution $\{\Delta x,\Delta y\}$ and tracked by indices $\{i,j\}$. Eq. 1 provides the connection between the current network layer and the next hidden layer at a further position $z_{p+1}$, separated by a propagation distance $\Delta z_{p}=z_{p+1}-z_{p}$. The field impinging upon the next hidden layer, $u(x_{m},y_{n};z_{p+1})$, has the same dimensions and spatial resolution as before, but is tracked with a different set of indices $\{m,n\}$. Each impinging field encounters an adjustable complex aperture function, $A(x_{i},y_{j};z_{p})$; the field immediately after this interaction is labeled $U(x_{i},y_{j};z_{p})$. Connecting any two consecutive layers is the fixed Rayleigh-Sommerfeld diffraction kernel, $H_{mnij}(\Delta z_{p})$. The latter depends on the radial spacing between subsequent layer nodes, $r_{mnij}=[(x_{i}-x_{m})^{2}+(y_{j}-y_{n})^{2}+\Delta z_{p}^{2}]^{1/2}$.
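The propagation step above can be sketched numerically as a direct superposition of secondary wavelets. The snippet below is a minimal NumPy stand-in for the paper's GPU routines, not the authors' code: the kernel prefactor is assumed from the standard first Rayleigh-Sommerfeld solution (as in the supplement of [5]), and the THz-scale wavelength and pixel pitch are illustrative values.

```python
import numpy as np

def rs_propagate(u, A, dz, wavelength, dx):
    """Apply aperture A to field u (N x N), then propagate a distance dz
    by direct summation of Rayleigh-Sommerfeld secondary wavelets.
    Kernel prefactor assumed from the standard first RS solution."""
    N = u.shape[0]
    k = 2 * np.pi / wavelength
    U = A * u                                # field just after the mask
    coords = (np.arange(N) - N // 2) * dx
    X, Y = np.meshgrid(coords, coords, indexing="ij")
    # r_mnij for every (output m,n) x (input i,j) pair: shape (N, N, N, N)
    r = np.sqrt((X[:, :, None, None] - X[None, None, :, :]) ** 2 +
                (Y[:, :, None, None] - Y[None, None, :, :]) ** 2 + dz ** 2)
    H = (dz / r ** 2) * (1 / (2 * np.pi * r) - 1j * k / (2 * np.pi)) \
        * np.exp(1j * k * r) * dx * dx       # RS diffraction kernel
    return np.einsum("mnij,ij->mn", H, U)    # superpose all wavelets

# toy example: 16 x 16 plane-wave input through a uniform mask
u0 = np.ones((16, 16), dtype=complex)
A0 = np.ones((16, 16), dtype=complex)
u1 = rs_propagate(u0, A0, dz=0.01, wavelength=0.75e-3, dx=400e-6)
```

The quadruple-index kernel makes the $N^2 \times N^2$ connectivity explicit; for the 400-pixel planes used later in the paper one would instead use the FFT-based method of Section 6.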
Imposing a convention, we define $p=1$ as the input layer at a distance $z_{1}=0$ and $p=\mathscr{P}$ as the output layer at a distance $z_{\mathscr{P}}$. Eq. 1 is then used to progress through each hidden layer, resulting in a network depth of $\mathscr{P}-1$ layers with adjustable aperture functions at distances $\{z_{1},z_{2},z_{3},\ldots,z_{\mathscr{P}-1}\}$. Having defined the network architecture, we can introduce training data. The training set size is given by $Q$, where each $q\in Q$ contains an input field condition, $u_{q}(x_{i},y_{j};z_{1})$, and a desired target output field condition, $\bar {u}_{q}(x_{i},y_{j};z_{\mathscr{P}})$. In this notation, the overhead bar conveys that this is the desired or supervised target value. Alternatively, we may also wish to optimize the output to a desired phase. This can be done holographically by interfering $u_{q}(x_{i},y_{j};z_{\mathscr{P}})$ with a reference field $u_{ref}(x_{m},y_{n};z_{\mathscr{P}})$. The interferogram resulting from the superposition of the two fields is labeled $u_{int,q}(x_{m},y_{n};z_{\mathscr{P}})$:
Regardless of the figure of merit (FOM) in the D$^{2}$NN context, the computational bottleneck lies in retrieving the gradient of the FOM with respect to the parameters we wish to optimize, such as phase delay and absorption. We denote this gradient function, keeping in mind that it was derived perturbatively, as ${\Delta FOM}\left/{\Delta \varrho }\right.$. The variable $\varrho$ simply denotes the parameter we are trying to optimize; changes to it via our gradient descent algorithm must be sufficiently small for the perturbation analysis to be valid [23]. In previous diffractive neural network approaches, the gradient function is cast with respect to the phase or amplitude modulation of each pixel in the previous $z_{p-1}$ plane via the chain rule. The adjustments are then made via gradient descent. The standard approach for the backpropagation of errors in the context of D$^{2}$NN is discussed in [5,11], which we summarize here.
The standard gradient descent algorithm is physically equivalent to a point-by-point scan of a pixel-sized pinhole, coupled with a ${\pi }\left/{2}\right.$ phase shift at the opening, over the pixel elements of a D$^{2}$NN (see Eq. 7 in [11]). The phase-shifted pinhole is applied to the pixel for which we wish to obtain gradients. To this system, we apply the algorithm used in the forward propagation (i.e., solve Eq. 1) to calculate the corresponding field at the output plane $\mathscr{P}$ for a given input. This new field is equivalent to the gradient of the output field with respect to the pixel's phase delay. Similarly, one can obtain field gradients with respect to the amplitude modulation of a given pixel by omitting the ${\pi}\left/{2}\right.$ phase shift on the pinhole. The output field gradient, along with the output field itself, is then used to calculate the gradient of the FOM with respect to that pixel (see Eq. 6 in [11]). In order to obtain the FOM gradients for every pixel in the D$^{2}$NN system, we must slide the pinhole over each pixel one by one and repeatedly use the forward propagation algorithm to calculate their respective field gradients. Additionally, this process must be repeated for every input image used to calculate the FOM. Thus, in order to obtain all FOM gradients for every pixel ($N^{2}$) on every layer $(\mathscr{P}-1)$ for every input image used ($Q$), the forward algorithm must be invoked $N^{2}\times Q\times (\mathscr{P}-1)$ times. For large D$^{2}$NN this may require a large amount of computational resources and be very time consuming even for the most efficient of forward propagation routines.
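The per-pixel scan described above can be mimicked numerically by a finite-difference loop: perturb one pixel's parameter, re-run the forward model, and difference the FOM. The `forward` and `fom` callables below are hypothetical placeholders, not the paper's routines; the point is that the cost scales with the number of pixels.

```python
import numpy as np

def fd_phase_gradient(forward, fom, phases, eps=1e-6):
    """Finite-difference stand-in for the pinhole scan: one forward
    run per perturbed pixel. `forward` maps a phase mask to an output
    field; `fom` scores that output (both are hypothetical)."""
    base = fom(forward(phases))
    grad = np.zeros_like(phases)
    for idx in np.ndindex(phases.shape):      # one forward run per pixel
        p = phases.copy()
        p[idx] += eps                         # perturb a single pixel
        grad[idx] = (fom(forward(p)) - base) / eps
    return grad

# toy check: identity "forward model" with a quadratic "FOM";
# the gradient of sum(p^2) at p = 1 is 2 for every pixel
g = fd_phase_gradient(lambda p: p, lambda u: float(np.sum(u ** 2)),
                      np.ones((4, 4)))
```

Even this toy mask needs 16 forward runs for one gradient; a 400 x 400 layer would need 160,000 per input image.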
Our procedure seeks to avoid this computational bottleneck while also being sufficiently general to incorporate nonlinearity. Adjoint methods excel in the electromagnetic inverse design space because for every input they need only a single invocation of the forward algorithm to obtain the field gradients with respect to all design parameters, regardless of the number being optimized [23–25]. This is done by backpropagating an adjoint field through the system which, as in the standard approach, is combined with the fields calculated during the forward propagation process to obtain FOM gradients. By using this adjoint approach, the number of times the forward routine must be invoked decreases from $N^{2}\times Q\times (\mathscr{P}-1)$ to $Q$.
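Using the simulation sizes reported later in this paper ($N=400$, three trainable layers, mini-batch $Q=20$), the savings can be tallied directly:

```python
# forward-solver invocations needed per gradient evaluation
N, layers, Q = 400, 3, 20
standard = N**2 * Q * layers   # pinhole scan: one run per pixel, image, layer
adjoint = Q                    # adjoint method: one run per input image
print(standard, adjoint)       # → 9600000 20
```

A reduction from 9.6 million forward runs to 20 per mini-batch, independent of how many parameters are being optimized.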
In the following, we drop the q-subscripts used in the fields and intensity terms with the implicit understanding that the network is only completely trained after it has been repeated for each $q\in Q$. The (DNA)$^{2}$ formulation allows for both intensity dependent and independent amplitudes and phases at each hidden layer. This is encoded into the complex aperture function by:
The formulation relating the FOM to the parameters of each layer can now be expressed as an adjoint aperture function $\epsilon (x_{i},y_{j};z_{p})$ defined as:
This has implications for real-world attempts to perform all-optical in-situ training. In principle, the adjoint field can be “simulated” using a scalar electric field, so long as it has the same phase and amplitude described in Eq. 1. This synthetic adjoint field, however, cannot propagate through the nonlinear components of the network in the manner required of the “real” adjoint field. In order to enforce the proper behavior, additional measures are required to manage the discrepancy. For example, in [22,29] the synthetic adjoint field is allowed to propagate through the linear components of the network only. When a nonlinear component is encountered, the synthetic adjoint field is calculated electronically on a computer using data collected from both the forward and backward propagation. The calculated synthetic adjoint field is then injected from an external optical source backward into the next linear component. The need for these awkward electronic calculations can be attributed to the direct transmission of the electromagnetic field between layers. Ideally, we would like to avoid such cumbersome electronic operations or replace them with passive optical analogues. In the next section we discuss (DNA)$^{2}$ for optically addressable spatial light modulators, where the optical field is not transmitted directly between layers. As we will show, the formulation of the adjoint field is greatly simplified, serving as a possible alternative to published in-situ methods.
3. (DNA)$^{2}$ for optically addressable spatial light modulators (OASLM)
In OASLM systems [30–32], the incident field is not transmitted through the mask (see Fig. 1). Instead, the incident field (“write beam”) is used to modulate the phase and/or amplitude of a separate reference field (“read beam”) that is propagated forward to a desired position. In this regime we use a modified form of the discretized Rayleigh-Sommerfeld diffraction equation:
4. (DNA)$^{2}$ optimization of spacing between planes
The adjoint fields in either the TAPLM or OASLM geometries can also be used to optimize the spacing between planes, $\Delta z_{p}$. This adjoint field, like those mentioned previously, backpropagates information about the error with respect to said spacing. When propagating to a prior plane, it is defined as $\hat {u}_{\Delta z}(x_{m},y_{n};z_{p-1})$ and is obtained using the following formula:
5. Adjoint aperture functions and various derivatives
As observed in the previous sections, the adjoint aperture functions $\epsilon (x_{i},y_{j};z_{p})$ exist for both the TAPLM and OASLM regimes and are used to propagate information about the error backwards through the system. At the output plane $\epsilon (x_{i},y_{j};z_{\mathscr{P}})$ is given by the derivative of the figure of merit $\text {FOM}(x_{i},y_{j};z_{\mathscr{P}})$ with respect to the field intensity $I(x_{i},y_{j};z_{\mathscr{P}})$. A list of aperture functions associated with the various FOMs is given in Table 2. Note that in the case of PD we must substitute ${u(x_{i},y_{j};z_{\mathscr{P}})}^{*}\rightarrow u_{int,q}(x_{i},y_{j};z_{\mathscr{P}})^{*}$ in Eq. 9 and Eq. 14. We note here that as the FOM becomes optimized with each iteration, the aperture function becomes more opaque resulting in ever weaker adjoint fields. This is to be expected, as the adjoint fields are directly related to the gradients (see Eqs. 11,14, and 18). In a perfectly optimized system ${\Delta FOM}\left/{\Delta \varrho }\right.=0$ and thus for all adjoint fields $\hat {U}(x_{i},y_{j};z_{\mathscr{P}})=0$.
At the intermediate planes where the elements of the TAPLM/OASLM lie, $\epsilon (x_{i},y_{j};z_{p})$ is related to the derivative of the complex aperture function $A(x_{i},y_{j};z_{p})$ with respect to $I(x_{i},y_{j};z_{p})$:
The gradients with respect to the weight and bias parameters that we seek to calculate (Eq. 11 and Eq. 15) contain derivatives defined as:
6. Example simulations and comparisons
In this section we demonstrate the (DNA)$^{2}$ technique for designing TAPLM- and OASLM-based D$^{2}$NN by applying them to the classifier problem for handwritten digits. Image data were drawn from the MNIST database [35] and the physical parameters of the D$^{2}$NN were taken from the supplemental material of [5]. Simulations for TAPLM and OASLM were performed optimizing both linear and nonlinear phase and amplitude parameters as well as the spacing between planes (see Fig. 2). All simulation parameters described in this section are given in Table 3.
Additionally, simulations for a purely linear phase-mask TAPLM system were performed for comparison. In the purely linear TAPLM case, referred to as TAPLM1, only the linear phase parameters $b_{1}(x_{i},y_{j};z_{p})$ are optimized. The linear amplitude parameters of each layer are set to unity ($b_{2}(x_{i},y_{j};z_{p})=1$) to allow for full transmission of the field, while the nonlinear phase and amplitude parameters are set to zero ($w_{1}(x_{i},y_{j};z_{p})=w_{2}(x_{i},y_{j};z_{p})=0$). In the transmission mask simulations, referred to as TAPLM2, all parameters on each mask are optimized. The simulation parameters of the OASLM are likewise optimized. In all three cases, all parameters, including those related to spacing, were randomly initialized. The spacing between planes was constrained to a minimum value $\Delta z_{min}$ of 1 cm and a maximum value $\Delta z_{max}$ of 30 cm. The pixel resolutions $\Delta x$ and $\Delta y$ were set to $400~\mu$m, with the input images up-sampled to 400 x 400 pixels ($N\times N$). The resulting images were normalized so that the maximum pixel value is always unity and the minimum zero. The amplitude of each input field $\left (\left |u_{q}(x_{i},y_{j};z_{1})\right |\right )$ is set equal to the input image, while the phase of the field $\left (\angle u_{q}(x_{i},y_{j};z_{1})\right )$ is set to $0$ everywhere. The total numbers of images in the training set, validation set, and testing set were 50000, 4000, and 8800, respectively. These numbers were chosen to maximize the MNIST data used while also ensuring that each of the 10 digit categories (0-9) contributes an equal number of images to the total in each set. The target regions are $21.3~{\mu}{\rm m}$ x $21.3~{\mu}{\rm m}$ squares assigned to each digit as shown in Fig. 2.
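The input-field preparation above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the up-sampling is done by nearest-neighbor repetition and the digit is centered in the 400-pixel window, neither of which is specified in the paper.

```python
import numpy as np

def make_input_field(image, size=400):
    """Turn a grayscale digit into a D2NN input field: up-sample by
    pixel repetition (assumed scheme), normalize to [0, 1], and use the
    result as the field amplitude with zero phase everywhere."""
    reps = size // image.shape[0]                         # 28 -> repeated 14x = 392
    big = np.kron(image, np.ones((reps, reps)))
    big = (big - big.min()) / (big.max() - big.min())     # normalize to [0, 1]
    pad = size - big.shape[0]
    big = np.pad(big, ((pad // 2, pad - pad // 2),) * 2)  # center in window (assumed)
    return big.astype(complex)                            # amplitude = image, phase = 0

# a 28 x 28 MNIST-sized dummy digit
field = make_input_field(np.random.rand(28, 28))
```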
The method used to perform forward and backward propagation is a variant of the FFT-AS method described in [34]. This method zero-pads the simulation window to $M\times M$ and apodizes the plane-wave spectrum in the frequency domain to suppress aliasing. The simulation window used was 800 x 800. The region surrounding the 400 x 400 area containing the TAPLM/OASLM was assumed to be an opaque aperture where any incident light is fully absorbed. All parameters were optimized using a combination of mini-batch gradient descent and the Adaptive Moment Estimation (ADAM [36]) algorithm. The training set was subdivided into five batches of 1000 randomly selected images (100 from each category). A smaller mini-batch ($Q=20$) is randomly selected from the larger batch, and a forward and backward simulation is performed and used to calculate the gradients. For both the TAPLM and OASLM cases, categorical cross-entropy was used to calculate the loss. The gradients are averaged over the mini-batch and fed into the ADAM algorithm, where they are used to update the simulation parameters. The learning rate and the decay rates for the momentum and velocity in the ADAM algorithm were $10^{-3}$ and $0.9$, respectively, for all parameters. This process is repeated for 100 iterations for each batch, with the loss and accuracy of the D$^{2}$NN system calculated against the 4000 images in the validation set ($Q=4000$). Accuracy is determined by taking the ratio of correct predictions to total predictions for a given dataset ($Q$). In our simulations, the predicted digit value ($\nu$) for a given input field ($q$) is determined by which target region has the highest optical power (i.e. $\underset {\nu }{\max }\ P_{q,\nu }\left (z_{\mathscr{P}}\right )$).
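The FFT-AS propagation step can be sketched in the spirit of the band-limited angular spectrum method of [34]. This is a minimal NumPy stand-in, not the authors' implementation: it uses a hard band-limit mask where the paper apodizes the spectrum, and the padding factor and window sizes are illustrative.

```python
import numpy as np

def blas_propagate(u, dz, wavelength, dx, pad=2):
    """Band-limited angular spectrum propagation (after [34], sketch):
    zero-pad to M x M, drop evanescent and aliasing-prone frequencies,
    apply the free-space transfer function, and crop back."""
    N = u.shape[0]
    M = pad * N
    up = np.zeros((M, M), dtype=complex)
    s = (M - N) // 2
    up[s:s + N, s:s + N] = u                       # zero-pad to M x M
    fx = np.fft.fftfreq(M, d=dx)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    under = 1.0 / wavelength**2 - FX**2 - FY**2    # > 0 for propagating waves
    kz = 2 * np.pi * np.sqrt(np.maximum(under, 0.0))
    # band limit of [34]: suppress frequencies that alias over distance dz
    f_limit = 1.0 / (wavelength * np.sqrt((2 * dz / (M * dx))**2 + 1))
    mask = (np.abs(FX) < f_limit) & (np.abs(FY) < f_limit) & (under > 0)
    Uf = np.fft.fft2(up) * np.exp(1j * kz * dz) * mask
    return np.fft.ifft2(Uf)[s:s + N, s:s + N]      # crop back to N x N

# toy example: 64 x 64 plane wave propagated 5 cm at THz-scale parameters
u_out = blas_propagate(np.ones((64, 64), dtype=complex),
                       dz=0.05, wavelength=0.75e-3, dx=400e-6)
```

Unlike the direct-sum kernel, this costs $O(M^2 \log M)$ per plane, which is what makes the 800 x 800 padded windows tractable on a GPU.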
The validation set calculation defines the end of 1 epoch. A new batch of training data is generated and the aforementioned optimization process begins again. For our simulations the optimization process consisted of 75 epochs, with the final D$^{2}$NN system selected from the epoch with the highest accuracy value when measured against the validation set. At the end of optimization, the loss and accuracy of each D$^{2}$NN system are calculated against the 8800 images in the testing set ($Q=8800$). The code used in our simulations was written in Python and implemented on an AMD Ryzen 9 3900X 12-core CPU system with 64 GB of RAM and an NVidia RTX-2070 Super GPU with 8 GB of RAM. Simulation code was written in house using CuPy [37] to perform forward and backward simulations on the GPU.
The evolution of the loss and accuracy against the validation set versus the simulation epochs of TAPLM1, TAPLM2, and OASLM is shown in Fig. 3. Also included are the total training and validation times of each scenario. All three simulations converged quickly to a solution, with the loss and accuracy being roughly inversely proportional. The parameters that yielded the best accuracy results were selected for the final testing phase. The best accuracy values were 94.2% for TAPLM1, 95.7% for TAPLM2, and 95.1% for OASLM. The TAPLM1 case required the least amount of time (129 minutes), as it only required optimizing the 3 spacing parameters and the linear phase components (3 layers' bias components). The TAPLM2 (192 minutes) and OASLM (187 minutes) simulations, on the other hand, needed to optimize 3 spacing parameters and 12 weight and bias components across all layers. Even with a roughly 4-fold increase in the number of parameters being optimized, the increase in simulation times for both TAPLM2 and OASLM was no more than 1.5 times that of the TAPLM1 case. This modest increase in simulation time despite the significant increase in design parameters highlights the main benefit of adjoint methods. Another benefit that contributes to efficiency is the natural recursion present in the (DNA)$^{2}$ algorithm, a feature common to most adjoint methods. Many terms, such as the complex aperture functions and intensities, are used in both the forward and backward propagation routines and thus only need to be calculated once. Additionally, terms that are unique to the backward calculations, such as ${\partial A}\left/{\partial \varrho }\right.$ and ${\partial A}\left/{\partial I}\right.$, are relatively fast to calculate due to the simple form of the activation functions $g_{1}$ and $g_{2}$. As a final note, the OASLM was observed to slightly outperform the TAPLM2 in terms of simulation time, which we attribute to the simpler nature of the OASLM calculations. As mentioned previously, when comparing the equations for the adjoint fields (Eq. 9 vs. Eq. 14) and the gradients (Eq. 11 and Eq. 15), the OASLM requires fewer operations and thus should take less time.
After optimization, the D$^{2}$NN systems were evaluated against the testing set, resulting in accuracy values of 94.64% for TAPLM1, 95.82% for TAPLM2, and 94.85% for OASLM. Fig. 2(c) shows a sample result from a TAPLM2-optimized system that correctly predicted the value of the input and is typical of what was observed in all the D$^{2}$NN simulations. Although all target regions are illuminated, we can clearly see that the brightest region corresponds to the intended target region. The confusion matrices for each simulation, shown in Fig. 4, indicate that the accuracy of each D$^{2}$NN system is generally high across all digits. The TAPLM1 and OASLM cases do struggle to classify certain digits such as 8, though both maintained accuracies of at least 88.98% and 88.64%, respectively. Overall, the performance of each case was found to be very close, with the nonlinear systems having a slight edge in accuracy when classifying each digit. This was especially clear in the TAPLM2 case, with accuracy never falling below 92.73% for any digit. This of course comes at the expense of an increased optimization time. It is possible that these differences may decline with an increase in the number of epochs, layers, and/or pixels.
7. Conclusion
We have demonstrated significant improvements to both the generalization and computational efficiency of D$^{2}$NN inverse design algorithms by simulating a fully optimizable nonlinear complex-valued diffractive deep neural network utilizing the (DNA)$^{2}$ optimization routine. The technique was used to develop a D$^{2}$NN-based digit classifier using linear transmission phase masks (TAPLM1), fully nonlinear amplitude/phase-modulating transmission masks (TAPLM2), and fully nonlinear amplitude/phase-modulating optically addressable spatial light modulator (OASLM) architectures. Using data taken from the MNIST database, the weight and bias parameters associated with the linear/nonlinear amplitude/phase response of each element, as well as the spacing between planes, were optimized for each of the aforementioned D$^{2}$NN systems. After 192 minutes or less of training and validation, each system was able to categorize a test set of 8800 never-before-seen test digits with an accuracy of 94.64% (TAPLM1), 95.82% (TAPLM2), and 94.85% (OASLM). While all systems achieved at least 94% accuracy, the linear TAPLM1 architecture, which was fastest to train (129 minutes), was the least able to classify certain digits, with accuracy falling as low as 88.98%. This pattern was followed by the OASLM (187 min), which also struggled with certain digits, its lowest accuracy being 88.64%. The nonlinear TAPLM2 case (192 min), which took the longest to train, performed the best, with accuracy never falling below 92.73% for any digit. We note, however, that the average accuracy values are still very close (roughly 95%) and may improve with increases in the number of pixels/layers and training epochs.
We anticipate that the adoption of (DNA)$^{2}$ in the space of diffractive optics and holography will facilitate the design of more advanced systems. Such systems may require balancing the needs of multiple wavelengths using dispersion-engineered materials (meta-atoms) or multiple objectives imposed by an imaging system. In these instances, the (DNA)$^{2}$ algorithm stands as an efficient alternative to the standard approach that could significantly accelerate the inverse design process for gradient-based approaches. Beyond diffractive optics, (DNA)$^{2}$ may also find use as a flexible phase/amplitude retrieval algorithm in coherent optical imaging systems. Being based on the Rayleigh-Sommerfeld algorithm, our approach operates seamlessly in near-field or far-field conditions.
Funding
Air Force Research Laboratory under the MaRSS contract task order (13 FA8650-16-D-5404-0013).
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon request.
References
1. X. Sui, Q. Wu, J. Liu, Q. Chen, and G. Gu, “A review of optical neural networks,” IEEE Access 8, 70773–70783 (2020). [CrossRef]
2. D. Psaltis and N. Farhat, “Optical information processing based on an associative-memory model of neural nets with thresholding and feedback,” Opt. Lett. 10(2), 98 (1985). [CrossRef]
3. D. Psaltis, D. Brady, X. G. Gu, and S. Lin, “Holography in artificial neural networks,” Nature 343(6256), 325–330 (1990). [CrossRef]
4. R. Xu, P. Lv, F. Xu, and Y. Shi, “A survey of approaches for implementing optical neural networks,” Opt. Laser Technol. 136, 106787 (2021). [CrossRef]
5. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]
6. D. Mengu, Y. Luo, Y. Rivenson, and A. Ozcan, “Analysis of diffractive optical neural networks and their integration with electronic neural networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–14 (2020). [CrossRef]
7. Y. Luo, D. Mengu, N. T. Yardimci, Y. Rivenson, M. Veli, M. Jarrahi, and A. Ozcan, “Broadband diffractive neural networks,” in Optics InfoBase Conference Papers, (2020).
8. Y. Chen and J. Zhu, “An optical diffractive deep neural network with multiple frequency-channels,” (2019).
9. J. Shi, M. Chen, D. Wei, C. Hu, J. Luo, H. Wang, X. Zhang, and C. Xie, “Anti-noise diffractive neural network for constructing an intelligent imaging detector array,” Opt. Express 28(25), 37686–37699 (2020). [CrossRef]
10. J. Li, D. Mengu, Y. Luo, Y. Rivenson, and A. Ozcan, “Class-specific differential detection in diffractive optical neural networks improves inference accuracy,” Adv. Photon. 1(04), 1 (2019). [CrossRef]
11. I. U. Idehenre and M. S. Mills, “Multi-directional beam steering using diffractive neural networks,” Opt. Express 28(18), 25915–25934 (2020). [CrossRef]
12. Y. Luo, D. Mengu, N. T. Yardimci, Y. Rivenson, M. Veli, M. Jarrahi, and A. Ozcan, “Design of task-specific optical systems using broadband diffractive neural networks,” Light: Sci. Appl. 8(1), 112 (2019). [CrossRef]
13. S. Zheng, X. Zeng, L. Zha, H. Shangguan, S. Xu, and D. Fan, “Orthogonality of diffractive deep neural networks,” (2018).
14. Q. Zhao, S. Hao, Y. Wang, L. Wang, and C. Xu, “Orbital angular momentum detection based on diffractive deep neural network,” Opt. Commun. 29, 36936-36952 (2019). [CrossRef]
15. Z. Huang, P. Wang, J. Liu, W. Xiong, Y. He, J. Xiao, H. Ye, Y. Li, S. Chen, and D. Fan, “All-optical signal processing of vortex beams with diffractive deep neural networks,” Phys. Rev. Appl. 15(1), 014037 (2021). [CrossRef]
16. Y. L. Xiao, S. Li, G. Situ, and Z. You, “Unitary learning for diffractive deep neural network,” Opt Lasers Eng. 139, 106499 (2021). [CrossRef]
17. Y. Sun, M. Dong, M. Yu, J. Xia, X. Zhang, Y. Bai, L. Lu, and L. Zhu, “Nonlinear all-optical diffractive deep neural network with 10.6 μm wavelength for image classification,” Int. J. Opt. 2021, 1–16 (2021). [CrossRef]
18. H. Wei, G. Huang, X. Wei, Y. Sun, and H. Wang, “Comment on all-optical machine learning using diffractive deep neural networks,” (2018).
19. D. Mengu, Y. Luo, Y. Rivenson, X. Lin, M. Veli, and A. Ozcan, “Response to comment on ‘all-optical machine learning using diffractive deep neural networks’,” (2018).
20. T. Yan, J. Wu, T. Zhou, H. Xie, F. Xu, J. Fan, L. Fang, X. Lin, and Q. Dai, “Fourier-space diffractive deep neural network,” Phys. Rev. Lett. 123(2), 023901 (2019). [CrossRef]
21. H. Dou, Y. Deng, T. Yan, H. Wu, X. Lin, and Q. Dai, “Residual D$^{2}$NN: training diffractive deep neural networks via learnable light shortcuts,” Opt. Lett. 45(10), 2688–2691 (2020). [CrossRef]
22. T. Zhou, L. Fang, T. Yan, J. Wu, Y. Li, J. Fan, H. Wu, X. Lin, and Q. Dai, “In situ optical backpropagation training of diffractive optical neural networks,” Photonics Res. 8(6), 940–953 (2020). [CrossRef]
23. C. M. Lalau-Keraly, S. Bhargava, O. D. Miller, and E. Yablonovitch, “Adjoint shape optimization applied to electromagnetic design,” Opt. Express 21(18), 21693–21701 (2013). [CrossRef]
24. G. Veronis, R. W. Dutton, and S. Fan, “A new method for sensitivity analysis of photonic crystal devices,” in Photonic Crystal Materials and Devices III, vol. 5733 (International Society for Optics and Photonics, 2005), pp. 348–355.
25. A. Michaels and E. Yablonovitch, “Leveraging continuous material averaging for inverse electromagnetic design,” Opt. Express 26(24), 31717–31737 (2018). [CrossRef]
26. N. Kukhtarev, V. Markov, S. Odulov, M. Soskin, and V. Vinetskii, “Holographic storage in electrooptic crystals.: I. steady state,” in Landmark Papers On Photorefractive Nonlinear Optics, (World Scientific, 1995), pp. 37–48.
27. P. A. Blanche, A. Bablumian, R. Voorakaranam, C. Christenson, W. Lin, T. Gu, D. Flores, P. Wang, W. Y. Hsieh, M. Kathaperumal, B. Rachwal, O. Siddiqui, J. Thomas, R. A. Norwood, M. Yamamoto, and N. Peyghambarian, “Holographic three-dimensional telepresence using large-area photorefractive polymer,” Nature 468(7320), 80–83 (2010). [CrossRef]
28. H. Bouas-Laurent and H. Dürr, “Organic photochromism (IUPAC technical report),” Pure Appl. Chem. 73(4), 639–665 (2001). [CrossRef]
29. T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through in situ backpropagation and gradient measurement,” Optica 5(7), 864–871 (2018). [CrossRef]
30. J. Grinberg, A. Jacobson, W. Bleha, L. Miller, L. Fraas, D. Boswell, and G. Myer, “A new real-time non-coherent to coherent light image converter the hybrid field effect liquid crystal light valve,” Opt. Eng. 14(3), 143217 (1975). [CrossRef]
31. P. Aubourg, J.-P. Huignard, M. Hareng, and R. Mullen, “Liquid crystal light valve using bulk monocrystalline Bi12SiO20 as the photoconductive material,” Appl. Opt. 21(20), 3706–3712 (1982). [CrossRef]
32. U. Efron, J. Grinberg, P. Braatz, M. Little, P. Reif, and R. Schwartz, “The silicon liquid-crystal light valve,” J. Appl. Phys. 57(4), 1356–1368 (1985). [CrossRef]
33. M. Cronin-Golomb, B. Fischer, J. White, and A. Yariv, “Theory and applications of four-wave mixing in photorefractive media,” IEEE J. Quantum Electron. 20(1), 12–30 (1984). [CrossRef]
34. K. Matsushima and T. Shimobaba, “Band-limited angular spectrum method for numerical simulation of free-space propagation in far and near fields,” Opt. Express 17(22), 19662 (2009). [CrossRef]
35. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86(11), 2278–2324 (1998). [CrossRef]
36. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization”, arXiv preprint arXiv:1412.6980 (2014).
37. R. Nishino and S. H. C. Loomis, “CuPy: A NumPy-compatible library for NVIDIA GPU calculations,” in 31st Conference on Neural Information Processing Systems (2017).