
High accuracy single-layer free-space diffractive neuromorphic classifiers for spatially incoherent light

Open Access

Abstract

Free-space all-optical diffractive systems have shown promise for neuromorphic classification of objects without converting light to the electronic domain. While the factors that govern these systems have been studied for coherent light, the fundamental properties for incoherent light have not been addressed, despite the importance for many applications. Here we use a co-design approach to show that optimized systems for spatially incoherent light can achieve performance on par with the best linear electronic classifiers even with a single layer containing few diffractive features. This performance is limited by the inherently linear nature of incoherent optical detection. We circumvent this limit by using a differential detection scheme that achieves greater than 94% classification accuracy on the MNIST dataset and greater than 85% classification accuracy on Fashion-MNIST, using a single-layer metamaterial.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Identifying features in a scene from their emitted or reflected light is at the core of many applications such as autonomous driving, feature recognition, and imaging. Usually this is accomplished by analyzing digital images captured by photodetectors with high-end algorithms running on electronic computers. An alternative approach has recently emerged whereby passive optical materials that diffract free-space optical fields act as neural networks and perform neuromorphic inference [1-3] without relying on computation in the electronic domain. This all-optical approach has potential for significant improvements in speed and energy consumption by taking advantage of the physics of light propagation. Hybrid approaches have also been explored where free-space optical front ends are integrated with electronic neural networks [4,5] such that the optical material replaces some part of the neural network, most commonly the convolution layer.

Previous research on all-optical approaches has focused on coherent light fields [1-3,6-10]. While coherent light fields are relevant to some situations (e.g. illumination of scenes with lasers), expanding to incoherent light would significantly broaden the applicability. Hybrid approaches [5,11,12] have been considered for incoherent light, but these still rely on 4f systems and significant additional processing by electronic neural networks. An all-optical neural network system for incoherent light could open up a new avenue for compact, fast neuromorphic inference on more common light fields.

When developing such a system, it is important to not only determine the optimal material structure and overall system architecture, but also to develop a fundamental understanding of the factors that lead to high performance. For example, in the case of coherent light, recent work has established the fundamental properties of the system and demonstrated how the light coherence leads to nonlinear light-matter interaction effects that lead to high accuracy, even for a single layer of diffractive material [7]. However, the same fundamental understanding does not currently exist for incoherent light. This is due in part to the significantly larger computational resources needed to study the all-optical incoherent case.

Here we present extensive large-scale simulations for a single layer linear diffractive metamaterial with an incoherent light field as the input representing the MNIST and Fashion-MNIST datasets. We show that once optimized, such a system can achieve performance on par with conventional electronic linear classifiers, including neural networks, but it requires co-design of the structure of the material, the architecture of the system, and the training algorithm. We demonstrate a fundamental scaling law for the performance of the system with number of apertures and show that only a few thousand apertures in a metamaterial are needed to achieve high performance. Furthermore, we show that the limitations of the system due to the inherently linear photodetection process for incoherent light can be overcome with a differential detection scheme that boosts the performance beyond that of linear classifiers.

2. Results and discussion

The optical classification system is illustrated in Fig. 1. An incoming spatially incoherent and temporally coherent light field of wavelength λ impinges on a single metamaterial of size $L \times L$ consisting of N apertures arranged in a square grid (aperture spacing d, so that $L = \sqrt{N}\,d$). Light transmission occurs only through the apertures, labelled $k = 1, \ldots, N$ and located at $\vec{r}_k = (x_k, y_k, z = 0)$. Each aperture k is filled with a material of refractive index n and thickness $t_k$, so that the transmitted light acquires a phase $\phi_k = 2\pi n t_k / \lambda$ as it passes through the aperture. The source plane is located at a distance $H_{in}$ from the material plane, while the detector plane is separated from the material plane by $H_{out}$. In this work we consider aperture diameters less than the wavelength, such that the input light intensity is uniform across each aperture.
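
To make the geometry concrete, the following is a minimal sketch of this setup in Python/NumPy (the authors' implementation is in Fortran MPI; the variable names, the refractive index value, and the random initialization are illustrative assumptions):

```python
import numpy as np

# Illustrative parameters following the paper's examples
lam = 10e-6        # wavelength lambda = 10 um
d = 10e-6          # aperture spacing
N = 784            # number of apertures on a square grid (28 x 28)
n_ref = 1.5        # assumed refractive index of the dielectric fill

side = int(np.sqrt(N))                        # material side L = sqrt(N) * d
coords = (np.arange(side) - (side - 1) / 2) * d
xk, yk = np.meshgrid(coords, coords, indexing="ij")
r_k = np.stack([xk.ravel(), yk.ravel(), np.zeros(N)], axis=1)  # apertures at z = 0

# Each aperture imparts a phase phi_k = 2*pi*n*t_k/lambda set by its thickness t_k
t_k = np.random.uniform(0.0, lam / n_ref, N)  # random starting thicknesses
phi_k = 2 * np.pi * n_ref * t_k / lam         # phases in [0, 2*pi)
```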

Fig. 1. Geometry of the free-space passive classification system for incoherent light. (Left) The source light field propagates to a material consisting of subwavelength apertures. Each aperture is partially filled with a dielectric material, as shown in cross-section. Secondary waves are emitted from each aperture, with a phase $\phi_k$ determined by the thickness of the dielectric material in aperture k. (Right) The source light is constructed from the 28 × 28 pixels of the dataset, as illustrated with the handwritten number 5. The material is of side length L and aperture spacing d on a square array. The different dielectric thicknesses for each aperture are illustrated with colorized circles to facilitate visualization. The detector plane has ten detectors, with one detector in the middle and the other nine arranged in a circle of radius R.

The input light field is constructed from the individual digital images of the MNIST or Fashion-MNIST dataset (see Figs. 2 and 4 for examples). Each digital image consists of $N_{in} \times N_{in} = 28 \times 28$ greyscale pixels, which we label with the indices m,n and positions $\vec{r}_{mn} = (ma, na, -H_{in})$, where a is the width of the pixel (see Fig. 1). Throughout this manuscript we set $N_{in} a = L$, i.e. the input light fills the material fully. We construct a continuous monochromatic light field of wavelength λ from the digital images by considering each pixel of the digital image as a point source of intensity $I_{mn}$ and propagating the light from each pixel as a spherical wave to the material plane.

The amplitude of the input electric field from any given pixel m,n at aperture k is $E_{in}^{mn}(\vec{r}_k) = \sqrt{I_{mn}}\,[(x_k - ma)^2 + (y_k - na)^2 + H_{in}^2]^{-1/2}$ and the amplitude of the output field is given by

$$E_{out}^{k,mn}(\vec{r}) = E_{in}^{mn}(\vec{r}_k)\,\frac{z}{|\vec{r} - \vec{r}_k|^2}\left(\frac{1}{2\pi |\vec{r} - \vec{r}_k|} - \frac{i}{\lambda}\right) e^{i\left(\frac{2\pi}{\lambda}|\vec{r} - \vec{r}_k| + \phi_k\right)}. \tag{1}$$

This far-field equation is appropriate when the detector plane is several wavelengths from the material plane. (As discussed below, we also use it at closer separations to illustrate the general behavior of the system.) In addition, Eq. (1) assumes that the input light field $E_{in}^{mn}(\vec{r}_k)$ is uniform across the aperture; this requires that the aperture diameter be less than the wavelength.
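
As a concrete illustration, a direct NumPy transcription of these two expressions might look as follows (a sketch, not the authors' Fortran MPI code; the function names and array shapes are our choices):

```python
import numpy as np

def input_amplitude(r_k, r_src, I_src):
    """Spherical-wave amplitude sqrt(I_mn) / |r_k - r_mn| of one source pixel
    at every aperture; r_k is (N, 3), r_src is the pixel position (3,)."""
    return np.sqrt(I_src) / np.linalg.norm(r_k - r_src[None, :], axis=1)

def field_at_detector(r_det, r_k, phi_k, E_in_k, lam):
    """Eq. (1): secondary-wave field of every aperture k at one detector point.
    Returns a complex array of shape (N,); summing it gives the total field."""
    diff = r_det[None, :] - r_k                  # r - r_k for each aperture
    dist = np.linalg.norm(diff, axis=1)          # |r - r_k|
    z = r_det[2]                                 # apertures sit at z = 0
    prefac = z / dist**2 * (1.0 / (2 * np.pi * dist) - 1j / lam)
    return E_in_k * prefac * np.exp(1j * (2 * np.pi * dist / lam + phi_k))
```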

Since the light is spatially incoherent and temporally coherent, the total time-averaged output intensity at detector p is $I_p^{incoh} = \sum_{mn} \left| \sum_k E_{out}^{k,mn}(\vec{r}_p) \right|^2$. This can be compared to the case of spatially and temporally coherent light, $I_p^{coh} = \left| \sum_k \sum_{mn} E_{out}^{k,mn}(\vec{r}_p) \right|^2$, revealing three key differences. First, the summation over the source pixels only needs to be performed once for coherent light, while for incoherent light it has to be performed every time the phases are updated in the training process, as described below; this makes the study of the incoherent case more computationally intensive by a factor equal to the number of non-zero pixels in the source images (several hundred for the datasets considered here). Second, while coherence is maintained between apertures in both cases, only the coherent case shows coherence effects between source pixels. Third, the detector intensity is linear in the input light intensity for the incoherent case, while it is nonlinear for coherent light [7]. Indeed, expansion of $I_p^{coh}$ shows that cross-terms between input pixels are present, while they are absent in $I_p^{incoh}$. In the case of spatially and temporally incoherent light, the expression becomes $\sum_{mn} \sum_k \left| E_{out}^{k,mn}(\vec{r}_p) \right|^2$, which is also linear in the input light intensity; in that case the source bandwidth would have to be considered, which may require a more complex aperture pattern with different aperture spacings to accommodate the different wavelengths.
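
The difference between the three cases reduces to where the modulus squared sits relative to the pixel sum; a minimal sketch (array layout is our assumption):

```python
import numpy as np

def detector_intensity(E_out, spatially_coherent=False):
    """Time-averaged intensity at one detector from the per-pixel, per-aperture
    fields E_out (complex, shape (n_pixels, N)); entry (mn, k) is E_out^{k,mn}.
    Apertures always add coherently; the placement of |.|^2 relative to the
    pixel sum encodes the spatial coherence of the source."""
    if spatially_coherent:
        # I^coh = |sum_mn sum_k E|^2 : pixel cross-terms survive (nonlinear)
        return np.abs(E_out.sum()) ** 2
    # I^incoh = sum_mn |sum_k E|^2 : linear in the input pixel intensities
    return (np.abs(E_out.sum(axis=1)) ** 2).sum()
    # Fully (spatially and temporally) incoherent: (np.abs(E_out)**2).sum()
```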

To identify objects from the light field, we first consider the case of M output photodetectors, each corresponding to one of the M classes (a different approach will be discussed below). The detectors are distributed in a circular pattern on a plane, as illustrated in Fig. 1, with $\vec{r}_p$ denoting the position of detector p ($p = 0, \ldots, 9$). We chose the digit “0” (item 1 for Fashion) for the center detector and evenly distributed the remaining nine detectors around the circle. Other arrangements are also possible; for example, previous work has considered square distributions [1,10,11], and it might be possible to reduce the number of detectors using a combinatorial detection scheme. Further work is needed to determine the optimal detector configuration. The detector pattern radius chosen here was previously shown to be optimal for coherent light [7].

The approach is to learn the phases ${\phi _k}$ such that the intensity on detector p is maximal when the incoming light field contains an object corresponding to class p. This is done by minimizing the cross-entropy cost function

$$C = -\frac{1}{N}\sum_{images} \log\left(\frac{e^{\tilde{I}_t}}{\sum_{p=1}^{M} e^{\tilde{I}_p}}\right) \tag{2}$$
where the sum is over the N training images and the subscript t indicates the target detector. This standard function is chosen because it provides good accuracy and convergence for classification tasks on MNIST and Fashion-MNIST when used with electronic neural networks [13]. We normalized the intensities [2] as
$$\tilde{I}_p = f\,\frac{I_p}{\max\{I\}} \tag{3}$$
where $\max\{I\}$ denotes the maximum value of the intensity over the M detectors. As discussed below, we find that optimizing f is essential to obtain high performance.
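
In code, Eqs. (2)-(3) amount to a log-softmax over rescaled intensities; a minimal sketch (function and argument names are ours):

```python
import numpy as np

def cost(I, targets, f=40.0):
    """Cross-entropy cost of Eqs. (2)-(3) over a batch of detector intensities.
    I: (n_images, M) raw intensities; targets: (n_images,) class indices;
    f: normalization hyperparameter (the paper finds f ~ 40 optimal)."""
    I_tilde = f * I / I.max(axis=1, keepdims=True)            # Eq. (3)
    # log-softmax evaluated at the target detector (I_tilde <= f, so exp is safe)
    logp = I_tilde - np.log(np.exp(I_tilde).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()     # Eq. (2)
```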

Datasets and training. We use a gradient descent approach with the Adam algorithm [14], with the learning rate halved if the training accuracy decreased between two successive epochs (an epoch being one pass through the full training dataset). The phases were updated during training using the analytical expressions for $\partial C / \partial \phi_k$ (see Supplement 1). The process was implemented in Fortran MPI and distributed on up to 100 parallel processors (2.6 GHz, 64 GB of RAM). The training time and memory requirements are discussed in more detail in Supplement 1.

The MNIST [15] and Fashion [16] datasets were used in their original format and order. Both contain M = 10 classes: for MNIST these correspond to the digits 0 to 9 while for Fashion these are ten different types of clothing. We trained on the first 50,000 images and tested on 10,000 images. The phases were optimized using mini-batches of 50 images. The inference performance was obtained by calculating the optical intensity on each of the output detectors, with a successful classification when the output light intensity was maximal on the target detector. An example training convergence is shown in Supplement 1.
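
The training loop described above can be sketched as follows; the analytical gradient $\partial C / \partial \phi_k$ from Supplement 1 is abstracted behind a grad_fn callable, and everything beyond the Adam update and the learning-rate schedule stated in the text is an assumption:

```python
import numpy as np

def train(phi, grad_fn, acc_fn, batches, epochs=50, lr=1e-2,
          beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam optimization of the aperture phases with the learning-rate
    schedule described in the text. grad_fn(phi, batch) must return
    dC/dphi (the paper uses analytical expressions, Supplement 1);
    acc_fn(phi) returns the training accuracy; batches is a list of
    mini-batches (the paper uses 50 images per batch)."""
    m = np.zeros_like(phi)
    v = np.zeros_like(phi)
    t, prev_acc = 0, -np.inf
    for _ in range(epochs):
        for batch in batches:
            t += 1
            g = grad_fn(phi, batch)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g**2
            phi -= lr * (m / (1 - beta1**t)) / (np.sqrt(v / (1 - beta2**t)) + eps)
        acc = acc_fn(phi)
        if acc < prev_acc:          # halve the rate if training accuracy drops
            lr *= 0.5
        prev_acc = acc
    return phi
```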

Fig. 2. Example of the classification process. The light field for the digit 7 in panel (a) impinges on the metamaterial in panel (b), creating an output field on the detector plane in panels (c) and (d). In panel (b), the phases $\phi_k$ between 0 and 2π are plotted in a white-to-blue color contour plot on the material to facilitate visualization. The maximum intensity on one of ten detectors determines the class, as shown in the graph in (d). Here λ = 10 µm, N = 12544, d = 10 µm, $H_{in}$ = 100 µm, and $H_{out}$ = 1300 µm.

Having defined the system elements, we explore its fundamental properties through extensive simulations. We present results for a wavelength λ = 10 µm, but the results apply to any wavelength by appropriately rescaling the length dimensions. Figure 2 shows an example of the classification process, whereby the light field for the handwritten digit 7 impinges on the trained material, leading to an output light field on the detector plane. This light field tends to be concentrated near the ten detectors but is not perfectly focused on the target detector. Rather, the system uses small variations in the positions of the focused spots in close proximity to each of the detectors to achieve maximum intensity on the target detector. We also note that the phase distribution does not show obvious features, in contrast to the coherent case, where Fresnel-like lenses were apparent [7].

3. Impact of training algorithm, material structure, and system architecture

The performance of the system depends critically on the co-optimization of the training algorithm, the material structure, and the system architecture. Figure 3(a) shows the testing accuracy on MNIST for two system sizes containing 784 and 12544 apertures, as a function of the hyperparameter f in Eq. (3), which acts as a weighting factor between the intensity on the detector of interest and the intensities on the rest of the detectors. In both cases, we find that the accuracy depends significantly on f, with a clear maximum at $f \approx 40$. As shown in Supplement 1, this can be understood from the dependence of the cost function on the intensity difference between detectors; as illustrated in Fig. 3(b), when a digit impinges on the material with trained phases, the system trained with smaller values of f gives larger intensity differences between detectors. We verified this concept by calculating the distribution of differences between the highest and lowest intensities for all 10,000 MNIST inference digits, as shown in Fig. 3(c). We find that the system trained with f = 20 has an average minimum intensity about 40% lower than the maximum intensity, while for the system trained with f = 100 the average minimum intensity is less than 20% lower than the maximum. This trade-off between performance and intensity difference will need to be considered when selecting photodetectors, to ensure that the signal-to-noise ratio is large enough to resolve these differences. The ability of the system trained with larger f to find quality solutions despite small intensity differences also comes with reduced training gradients (see Supplement 1), which might make it more difficult for the system to find other optimal solutions; this may explain why intermediate values of f are optimal.

Fig. 3. Impact of training algorithm parameter on MNIST inference performance. (a) Dependence of the testing accuracy on the hyperparameter f for two material sizes. (b) Normalized intensity on the ten detectors when the digit 7 impinges on the material with N = 784 apertures. The curly brackets indicate the range of normalized intensities for each value of f. (c) Distribution of differences in intensities for the 10,000 MNIST inference digits for different values of the hyperparameter f.

Because the optical classifier is a physical system, optimizing the training algorithm is not sufficient to achieve high performance; it is also critical to co-design the material structure and the system architecture. We first studied the performance of the system on MNIST as a function of the distance between the source plane and the material plane. As shown in Supplement 1, we find a generally mild dependence over a broad range of distances, and therefore set $H_{in} = 100\,\mu$m for the rest of the calculations. A more interesting behavior is observed for the distance to the detector plane, as shown in Fig. 4. There, the system achieves high performance for MNIST and Fashion over a range of values of $H_{out}$, with the performance declining if the detector plane is too close to or too far from the material. This behavior is similar to that observed for coherent light [7], except that the range of $H_{out}$ is larger here by an order of magnitude. While Eq. (1) is appropriate for the far field, we use it at smaller $H_{out}$ to illustrate the general behavior of the system. The reduction in performance there arises from the poor mixing of intensities from the different input pixels due to the large angles between input pixels and apertures, and is expected to persist even if near-field expressions for the transmitted electric field were used.

Fig. 4. Dependence of the testing accuracy on the distance between the material and the output plane for (a) MNIST dataset, and (b) Fashion dataset. The greyscale images illustrate the ten classes of objects for each dataset. Filled (open) symbols are for the regular (differential) detection scheme. Here $H_{in} = 100\,\mu$m and $d = 10\,\mu$m, except for N = 196, where d = 20 µm.

The results of Fig. 4 also show that the structure of the material plays an important role in determining the performance. Indeed, for the same aperture spacing of 10 µm, increasing the lateral size of the material to accommodate more apertures has a significant impact on testing accuracy. For example, a system of size L = 280 µm containing 196 apertures achieves ∼82% accuracy on MNIST, while a system with L = 1120 µm and 12544 apertures reaches ∼92% on MNIST. This trend, plotted in Fig. 5, is one of the main results of this work: it shows that the system performance plateaus at ∼92% for MNIST and ∼84% for Fashion, values that can be obtained with a few thousand apertures. These values are on par with the best performance achieved by linear electronic classifiers (∼92% for MNIST and ∼84% for Fashion) [7], indicating that the optimized incoherent optical classifier can achieve such performance, but is limited by its inherent linearity. The results can also be compared with Ref. [11] where a hybrid incoherent 4f system to implement convolution was considered and achieved ∼92% inference accuracy on a reduced MNIST dataset (9 classes instead of 10).

Fig. 5. Dependence of the testing accuracy and testing error on the number of apertures for (a) MNIST and (b) Fashion. Filled (open) symbols correspond to regular (differential) detection.

The system also displays a scaling trend whereby the testing error scales inversely with the number of apertures (Fig. 5, insets). While the large-N asymptotic behavior is difficult to access due to the computational requirements, the error appears to approach a finite value in that limit, and therefore the system is close to its optimal performance. Identifying this scaling trend is important because it establishes three points: first, the trend is only obtained after careful co-design of the system for each of the data points; second, increasing the size of the layer leads only to marginal additional improvements in performance; third, since the system has already reached the optimal performance for a linear system, adding more layers would not improve the performance.

4. Differential detection

The results above indicate that the all-optical incoherent system can achieve good performance when considering the speed and energy benefits of the approach [7], but that it is limited by its inherent linearity. One question is how to overcome this limitation without introducing optical nonlinearities in the material itself, which tend to be weak at everyday light intensities. For coherent light, the nonlinearity is provided by the detection process, which is inherently nonlinear [7]. Introducing a nonlinearity at the detector level is therefore one option, but for the incoherent system this cannot be achieved with just any nonlinearity. For example, one could consider a saturable detector such that the measured intensity is $I_p \sim \sigma\left(\sum_{nm} g_{nm} I_{nm}\right) = \sigma(g_{11}I_{11} + g_{12}I_{12} + g_{13}I_{13} + \ldots)$, where $\sigma(x)$ is the sigmoid function, $I_{nm}$ are the intensities from each pixel, and $g_{nm}$ are transfer coefficients determined by the light propagation. Unfortunately, for a single layer, monotonic functions of the form $\sigma\left(\sum_i f_i(x_i)\right)$, where the $f_i(x_i)$ can be any nonlinear functions of the inputs $x_i$, are unable to reproduce even basic nonlinear functions such as XOR. The solution is therefore to create a non-monotonic nonlinear function at the detector plane. One such approach is differential detection, where two detectors are used for each class: a positive detector with intensity $I_p^+$ and a negative detector with intensity $I_p^-$. The output intensity for each class is then defined as

$$I_p^{diff} = \frac{I_p^+ - I_p^-}{I_p^+ + I_p^-} \tag{4}$$
which is nonlinear and non-monotonic. This approach has recently been explored in the case of coherent light [6], where it was shown to somewhat improve performance. However, the coherent system already achieves high performance and does not suffer from the fundamental limitations of the incoherent case. Here we show that the differential approach overcomes the fundamental limitation of the incoherent system, leading to performance better than that of electronic linear classifiers. (Supplement 1 shows how the XOR function can be reproduced using this approach.)
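
Supplement 1 contains the XOR demonstration; as an independent toy check (with hand-picked nonnegative transfer coefficients that are ours, not the paper's), the following script verifies that an argmax over two class scores of the form of Eq. (4) reproduces XOR even though every detector intensity is strictly linear in the inputs:

```python
import numpy as np

# Inputs (x1, x2) plus a constant background source; each detector intensity is
# a nonnegative linear combination of these, as in incoherent detection.
def intensities(x1, x2):
    src = np.array([x1, x2, 1.0])
    I1p = np.dot([5.0, 5.0, 0.0], src)   # positive detector, class "XOR = 1"
    I1m = np.dot([1.0, 1.0, 1.0], src)   # negative detector, class "XOR = 1"
    I0p = np.dot([1.0, 1.0, 1.4], src)   # positive detector, class "XOR = 0"
    I0m = np.dot([0.0, 0.0, 1.0], src)   # negative detector, class "XOR = 0"
    return (I1p, I1m), (I0p, I0m)

def diff_score(Ip, Im):                   # Eq. (4): nonlinear and non-monotonic
    return (Ip - Im) / (Ip + Im)

for x1 in (0, 1):
    for x2 in (0, 1):
        (I1p, I1m), (I0p, I0m) = intensities(x1, x2)
        scores = [diff_score(I0p, I0m), diff_score(I1p, I1m)]
        pred = int(np.argmax(scores))     # argmax over the two class scores
        assert pred == (x1 ^ x2)          # reproduces XOR on all four inputs
```

No argmax over the raw linear intensities can realize this truth table; the ratio in Eq. (4) turns the decision boundary between two class scores into a quadratic curve, which is what separates the XOR corners.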

We implement the same training approach as before, again optimizing the value of f to normalize the output intensities from Eq. (4) (see Supplement 1). We consider the case where the input light is split by a beam splitter between two separate optical systems with the same architecture and material structure. The phases of the two materials are different but are trained simultaneously using Eq. (4). An example of a trained system and the output for a specific digit is shown in Fig. 6.

Fig. 6. Illustration of the differential detection scheme. The light field from the digit 0 is split equally to two jointly trained materials (each with 12544 apertures), whose phases are shown on a color scale from 0 to 2π. The transmitted light from each material is directed to the same output plane. Applying the differential scheme to the whole output plane gives the two-dimensional distribution shown on the right. The differential detection class score from Eq. (4) is shown at the detector locations on the far right.

Figure 4 shows MNIST and Fashion-MNIST results with differential detection for materials with varying numbers of apertures, and for a few distances between the material plane and the detector plane. For the case of 784 apertures, the performance on MNIST is improved from ∼90% to ∼92%. More importantly, for 12544 apertures the differential detection performance attains ∼94%, clearly surpassing linear classifiers. The dependence on the number of apertures with differential detection is similar to that of the basic system (Fig. 5) but asymptotically reaches a larger value, while the testing error again scales roughly inversely with the number of apertures. The differential detection scheme is also effective for the Fashion dataset. Indeed, for that dataset the best linear classifiers and regular detection achieve ∼84% testing accuracy, while the differential optical system with 3136 apertures already achieves 85.24% testing accuracy. We anticipate that slightly better values could be obtained with additional apertures, based on the scaling of the testing error.

5. Discussion

Diffractive neural networks such as the one presented here are susceptible to fabrication errors. For example, misalignment between layers in multi-layer systems has been studied in the coherent case, where it was recently shown that approaches can be employed to make the optical system more resilient to such errors [17]. Similarly, the impact of thickness variations during fabrication of the apertures was considered for the coherent case [7], where it was demonstrated that systems with more apertures are more resilient to such imperfections. We anticipate that the incoherent system considered here would also be sensitive to such imperfections, and that the system could be trained to make it more resilient. Another factor that limits the application speed and energy consumption is photodetector noise. As previously discussed for the coherent case [7], detector noise determines the minimum light intensity that can be detected for a given integration time, leading to a tradeoff between operation speed and detectable intensity. Since the basic photodetector principles are the same for coherent or incoherent light, the same tradeoff is expected for incoherent light.

One aspect that has not yet been discussed is the impact of wavelength variations on the performance of an already trained system. To address this point, we calculated the inference accuracy as a function of the wavelength of the input light field for some of the higher-performing phase configurations for direct detection. Figure 7(a) shows the case for MNIST, where we compare systems with 784 and 12544 apertures. For the smaller system we find a width at half maximum of 6% of the wavelength, while it is 1.6% for the larger system. This reduction in accuracy with wavelength shifts arises from the change in the total phase $2\pi|\vec{r} - \vec{r}_k|/\lambda + \phi_k$ in Eq. (1) upon a change in λ at fixed $\phi_k$. We also performed similar calculations for the Fashion dataset, this time investigating the impact of the material-detector separation for a fixed number of apertures N = 3136 (Fig. 7(b)). In that case we find a width at half maximum of 3.5%, similar for the two cases, even though the material-detector separations differ by more than an order of magnitude (100 µm and 1300 µm). Therefore, in these optimally co-designed systems, the sensitivity to variations in input wavelength comes mostly from the lateral size of the material. The bandwidth could be improved in two ways: first, random wavelength variations can be included during the training process; such an approach has been shown to reduce sensitivity to misalignment in coherent systems [17]. A second approach would be to train the system over multiple wavelengths while allowing the aperture positions to vary.
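
The role of the lateral size can be estimated from the path-length difference between edge and center apertures, which grows with L; a back-of-the-envelope sketch (the geometry values follow the text, while the estimate itself is our illustration, not the paper's calculation):

```python
import numpy as np

def relative_phase_error(half_width, H_out, lam0, frac_shift):
    """Shift (in cycles) of the edge-vs-center aperture phase difference when
    the wavelength changes by frac_shift with the trained phi_k held fixed."""
    path_diff = np.hypot(half_width, H_out) - H_out   # extra path from the edge
    lam = lam0 * (1 + frac_shift)
    return path_diff / lam - path_diff / lam0         # in units of 2*pi

lam0, H_out = 10e-6, 1300e-6
for L in (280e-6, 1120e-6):                           # N = 784 vs N = 12544
    err = relative_phase_error(L / 2, H_out, lam0, 0.01)
    print(f"L = {L * 1e6:4.0f} um: {err:+.3f} cycles")  # -0.007 vs -0.114
```

A 1% wavelength shift thus perturbs the relative phases of the larger material by more than a tenth of a cycle, an order of magnitude more than for the smaller one, consistent with the narrower bandwidth observed in Fig. 7(a).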

Fig. 7. Impact of wavelength variations on the performance of already trained systems. (a) Testing accuracy for MNIST as a function of the deviation from the wavelength at which the system was trained. The curves correspond to two different system sizes consisting of 784 and 12544 apertures. (b) Testing accuracy for Fashion for two systems that were optimized at different $H_{out}$ during training.

The system performance could also be improved by considering multilayer systems. While in principle a single-layer neural network is a universal approximator, such a system suffers from the curse of dimensionality due to the slow scaling of the error with the number of nodes in the layer. The optical system studied here behaves in a similar way, as indicated by the scaling in Fig. 5. In electronic neural networks this is overcome by using multiple layers; expanding the optical system to multiple layers could therefore also serve as an approach to improve the performance. Future work will be needed to investigate the efficacy of this approach, since the system considered here has a nonlinearity at the detection plane rather than at each layer, as is the case in electronic neural networks. Still, the system might benefit from depth in that it could at least achieve the same performance with fewer features. The light at the pixels in the detector plane can be expressed as the product of the input light field (the $N_{in} \times N_{in}$ pixel intensities, flattened to a vector) and a light propagation matrix $A$ of size $N_{in}^2 \times M$. Such a matrix $A$ can be expressed as a product of smaller matrices whose total number of elements is less than that of $A$ (exactly so when its rank is sufficiently low, and approximately otherwise). This might be one reason why a multilayer system could be beneficial. The simulation approach presented here can be generalized to multilayer systems consisting of several material planes, each with trainable phases; in that case the output field from Eq. (1) would serve as the input for the apertures of the next layer, and so on. We anticipate that co-design will also be critical for multilayer systems but will be more complex due to the larger number of degrees of freedom in the architecture.
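
For the sizes considered here the element count behind this factorization argument works out as follows; a minimal sketch of the bookkeeping (the rank values are illustrative assumptions):

```python
# An n x m map of rank r stored as an (n x r)(r x m) product needs fewer
# numbers whenever r < n*m / (n + m); sizes follow the text (784 pixels,
# 10 classes), the ranks are illustrative.
n, m = 28 * 28, 10
for r in (4, 8):
    print(f"rank {r}: {n * r + r * m} vs {n * m} elements")  # 3176, 6352 vs 7840
```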

Several approaches are available to fabricate the proposed material structures, depending on the wavelength of interest. For longer wavelengths (e.g., THz), similar materials have already been realized using 3D printing of polymeric materials [1]. In the infrared and visible range, advanced lithography with multiple masks can be used to create a digitized version of the phase pattern. Another approach would be focused ion beam fabrication, which has been employed to create subwavelength features in optical materials [18].

The system performance in terms of detectable power, inferences/s, and energy/inference depends on the aperture size and density, the properties of the photodetectors, and the properties of the small electronic network needed to monitor the photodetectors. In the case of coherent light [7], basic assumptions for the aperture transmission and the photodetector response time and noise-equivalent power suggest that optical inputs on the order of nanowatts can be classified at 100 MHz using 100 nJ/inference. Most of the energy comes from the comparators needed to identify which detector has the largest intensity. For the basic incoherent system, similar performance is expected since the monitoring scheme is the same. For the differential detection scheme, there are additional computational steps (subtraction, division) required to calculate the expression in Eq. (4), but the energy and time cost of these is low compared to the comparators, and we therefore also expect similar performance.

6. Conclusions

We provide a path toward achieving low-power and fast classification on spatially incoherent sources of light. Through extensive numerical simulations of free-space all-optical diffractive classifiers with spatially incoherent light, we demonstrate that optimized systems can achieve performance on par with the best electronic linear classifiers on the MNIST and Fashion-MNIST datasets. This requires co-design of the training algorithm, the system architecture, and the material. Furthermore, we show that the fundamental limitations of linear systems can be overcome with nonlinear and non-monotonic detection, and we show that the specific case of differential detection leads to performance superior to linear classifiers. More generally, our work opens up a number of additional fundamental scientific lines of research, such as the role of multiple layer systems, broadband incoherent fields, applications to multicolor classification, and other material structures such as phase arrays.

Funding

Sandia National Laboratories; Defense Advanced Research Projects Agency.

Acknowledgments

Work supported by the DARPA VIP program and the Laboratory Directed Research and Development Program at Sandia National Laboratories, a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under Contract No. DE-NA-0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the U.S. Government.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018).

2. D. Mengu, Y. Luo, Y. Rivenson, and A. Ozcan, “Analysis of diffractive optical neural networks and their integration with electronic neural networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–14 (2020).

3. Z. Wu, M. Zhou, E. Khoram, B. Liu, and Z. Yu, “Neuromorphic metasurface,” Photonics Res. 8(1), 46–50 (2020).

4. S. Colburn, Y. Chu, E. Shilzerman, and A. Majumdar, “Optical frontend for a convolutional neural network,” Appl. Opt. 58(12), 3179–3186 (2019).

5. C. M. V. Burgos, T. Yang, Y. Zhu, and A. N. Vamivakas, “Design framework for metasurface optics-based convolutional neural networks,” Appl. Opt. 60(15), 4356–4365 (2021).

6. J. Li, D. Mengu, Y. Luo, Y. Rivenson, and A. Ozcan, “Class-specific differential detection in diffractive optical neural networks improves inference accuracy,” Adv. Photonics 1(04), 1 (2019).

7. F. Léonard, A. S. Backer, E. J. Fuller, C. Teeter, and C. M. Vineyard, “Co-design of free-space metasurface optical neuromorphic classifiers for high performance,” ACS Photonics 8(7), 2103–2111 (2021).

8. T. Yan, J. Wu, T. Zhou, H. Xie, F. Xu, J. Fan, L. Fang, X. Lin, and Q. Dai, “Fourier-space diffractive deep neural network,” Phys. Rev. Lett. 123(2), 023901 (2019).

9. A. Ryou, J. Whitehead, M. Zhelyeznyakov, P. Anderson, C. Keskin, M. Bajcsy, and A. Majumdar, “Free-space optical neural network based on thermal atomic nonlinearity,” Photonics Res. 9(4), B128–B134 (2021).

10. T. Zhou, X. Lin, J. Wu, Y. Chen, H. Xie, Y. Li, J. Fan, H. Wu, L. Fang, and Q. Dai, “Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit,” Nat. Photonics 15(5), 367–373 (2021).

11. J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Rep. 8(1), 12324 (2018).

12. S. Jiao, J. Feng, Y. Gao, T. Lei, Z. Xie, and X. Yuan, “Optical machine learning with incoherent light and a single-pixel detector,” Opt. Lett. 44(21), 5186–5189 (2019).

13. K. Janocha and W. M. Czarnecki, “On loss functions for deep neural networks in classification,” Schedae Inform. 25, 49–59 (2016).

14. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv:1412.6980 [cs.LG] (2014).

15. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86(11), 2278–2324 (1998).

16. H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms,” arXiv:1708.07747 [cs.LG] (2017).

17. D. Mengu, Y. Zhao, N. T. Yardimci, Y. Rivenson, M. Jarrahi, and A. Ozcan, “Misalignment resilient diffractive optical networks,” Nanophotonics 9(13), 4207–4219 (2020).

18. V. Garg, R. G. Mote, and J. Fu, “Focused ion beam direct fabrication of subwavelength nanostructures on silicon for multicolor generation,” Adv. Mater. Technol. 3(8), 1800100 (2018).

