Abstract
Modern design of complex optical systems relies heavily on computational tools. These frequently use geometrical optics as well as Fourier optics. Fourier optics is typically used for designing thin diffractive elements, placed in the system’s aperture, generating a shift-invariant Point Spread Function (PSF). A major bottleneck in applying Fourier optics to many cases of interest, e.g. when dealing with multiple or out-of-aperture elements, stems from numerical complexity. In this work, we propose and implement an efficient and differentiable propagation model based on the Collins integral, which enables the optimization of diffractive optical systems with unprecedented design freedom using backpropagation. We demonstrate the applicability of our method, numerically and experimentally, by engineering shift-variant PSFs via thin plate elements placed in arbitrary planes inside complex imaging systems, performing cascaded optimization of multiple planes, and designing optimal machine-vision systems by deep learning.
© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Designing optical imaging systems can be a challenging task, addressable by various approaches. Geometrical optics (ray tracing), while commonly used, is a crude approximation of wave optics. Still, it is useful for designing thick elements and complex stacks of lenses. Today, a prevalent approach [1] is to use a hybrid model, where first ray-tracing is employed to efficiently compute the Optical-Path Difference (OPD) from the object space to the exit pupil. Next, a diffraction model, based on Fourier optics, is used to compute the 2D/3D Point-Spread Function (PSF) on the detector plane. Existing optical engineering tools (e.g. Zemax [2], CodeV [3], etc.) can then be used to optimize the optical components by numerically minimizing a hand-crafted merit function, usually with a limited number of degrees of freedom, as ray tracing software typically models smooth surface profiles (e.g. Zernike polynomials). This approach can take into account the thickness of optical elements, spatially variant aberrations, dispersion effects and more. Furthermore, the introduction of freeform optics [4] can increase the design space. Geometrical optics is not suitable, however, for applications such as phase retrieval [5–8], adaptive optics [9,10], or PSF engineering [11–14], i.e. any application that requires correctly modelling the diffraction-limited PSF, designing systems without light focusing such as diffractive networks [15], or employing optical elements that manipulate the phase, amplitude, or polarization of light in intermediate planes. These applications require wave optics [16], usually limited to the thin-element and paraxial approximations.
Fourier optics is exceptionally useful in the field of computational imaging, where optical systems are co-designed alongside analysis algorithms to yield powerful imaging systems with non-conventional capabilities [17]; applications range from High Dynamic Range imaging [18,19] and color sensing [20,21] to end-to-end wavefront shaping [22–24] and more. Due to their reliance on Fourier optics, these applications have traditionally been limited to the optimization of a single optical element, typically placed in the system’s aperture. Recently, alternative end-to-end computational photography methods based on ray tracing have been proposed [25–27], showing promise in computer vision applications. These methods optimize lens stacks and can design polynomial surfaces. Some limitations of these methods can be directly addressed using wave optics, e.g. modelling non-ideal point sources (such as electrical dipoles), modelling sources that are not placed at infinity, incorporating edge diffraction effects, multiple apertures, Fresnel transfer coefficients and more. Other techniques which use ray tracing-based Monte Carlo methods [28] exist and can be suitable for specific challenging cases (such as very short propagation distances) but are not suitable for direct optimization of optical systems.
One applicable approach for analyzing and designing optical systems with simple components is using the Collins integral [29,30], a parabolic phase case of the Fractional Fourier transform [31,32], which facilitates an angular-spectrum-based method to compute the diffraction integral using the ray transfer matrix. Recently, a cascaded diffraction model based on the Collins integral was implemented to model the diffraction of a series of diaphragms in complex optical systems [33,34] to simulate spatially variant edge-diffraction effects. The effects of spatially-variant PSFs are usually digitally corrected (post-acquisition), as they can degrade the performance of algorithms that assume a single PSF throughout the field of view [35,36]; however, spatially-variant PSFs (in Fourier optics) can also be useful, for example: micro-lens arrays with varying focal lengths [37], local color sensing [21], metasurface-optics design [38] and conjugate adaptive optics [39].
In this work, we propose and implement a differentiable propagation model based on backpropagation, enabling inverse design of optical systems, relying on Fourier optics, with unprecedented design freedom. The main contribution of our method is in providing a computationally efficient platform to directly optimize and perform phase retrieval for challenging systems. This allows us to engineer PSFs using thin plate elements placed in any plane in an optical system, design spatially-variant PSFs, perform cascaded optimization of multiple planes and optimize element placement jointly with the element phase functions. These capabilities are all made possible by implementing a differentiable version of the Collins integral using Pytorch [40]. This approach, while still within the boundaries of the thin-element and paraxial approximations, offers several advantages: (1) it is computationally efficient, alleviating the strict sampling requirements of Fourier optics and requiring only a few FFT operations, (2) it allows the optimization of the positions and functions (i.e. phase/amplitude profiles) of many elements in a cascaded optical system, (3) it enables the optimization of diffraction-limited systems, and (4) a hybrid approach can be readily superimposed on our method, where thick-element and spatial aberrations (e.g. from ray tracing software) can be accounted for. We apply our method numerically and experimentally to several design challenges focusing on shift-variant PSF engineering in microscopy and machine vision.
2. Cascaded diffraction model
2.1 Fresnel propagation in free space
Computing free space propagation under the Fresnel approximation requires evaluation of the Fresnel integral [16] in each transition between elements:
$$E_{2}\left (x_{2},y_{2}\right )=\frac {e^{ikz}}{i\lambda z}\iint E_{1}\left (x_{1},y_{1}\right )\exp {\left [\frac {ik}{2z}\left (\left (x_{2}-x_{1}\right )^{2}+\left (y_{2}-y_{1}\right )^{2}\right )\right ]}\,dx_{1}dy_{1}. \tag{1}$$
Numerically evaluating Eq. (1) is challenging, as the Nyquist frequency of the chirp functions ($\propto \exp {\left (i\cdot x^{2}\right )}$) increases radially, i.e. larger apertures require finer sampling. Efficient methods to evaluate Eq. (1) have been extensively studied [41].
2.2 Fresnel propagation in a complex optical system
In this work we focus mainly on an angular-spectrum method termed the Collins integral [29]. This formulation relates geometrical optics ray tracing to the Huygens-Fresnel wave propagation theory. Specifically, we use a scaled version, similar to the approach by G. A. Tyler and D. L. Fried [42], which allows for some flexibility in choosing the sampling frequency and grid size. At the heart of our method is the calculation of diffraction between planes of interest (i.e. apertures, phase or amplitude elements) in an optical system. We now derive an expression for the propagation in segment j between two planes in the system. In geometrical optics, the well-known ABCD matrix [43] defines the transformation of a light ray (defined by lateral position and angle) between two planes. We write the ABCD matrix as a 2×2 matrix, assuming rotational symmetry:
$$\mathcal {M}_{j}=\begin {pmatrix} A_{j} & B_{j}\\ C_{j} & D_{j} \end {pmatrix}. \tag{2}$$
Using the ABCD matrix, the scaled Collins integral defines the wave propagation (see Appendix 1 for the full derivation).

In our implementation, we treat the real and imaginary parts separately by splitting the field $E=A \exp {\left (i\Phi \right )}$ into $\left (u,v\right ) = \left (\Re {\left (E\right )},\Im {\left (E\right )}\right )$ channels. Then, we utilize automatic differentiation [45] (i.e. back-propagation) of a real-valued cost function $L$ w.r.t. a complex intermediate variable $z=u+iv$ by taking the gradient as $\nabla _L(z) = \frac {\partial L}{\partial u}+i\frac {\partial L}{\partial v}$. This approach also allows using the generalized chain rule [46] to compute the composition of functions in the cascaded propagation.
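As an illustration, the real/imaginary-channel treatment can be reproduced in a few lines of PyTorch; the quadratic cost below is a hypothetical stand-in for the actual cost functions used in this work:

```python
import torch

# Split the complex field E into two real channels (u, v) = (Re E, Im E)
# so that standard real-valued autodiff applies.
u = torch.randn(64, 64, requires_grad=True)
v = torch.randn(64, 64, requires_grad=True)

E = torch.complex(u, v)             # intermediate complex variable z = u + iv
intensity = E.real**2 + E.imag**2   # |E|^2, the quantity a camera measures
L = intensity.sum()                 # real-valued scalar cost (toy example)
L.backward()

# Gradient w.r.t. the complex variable, as defined in the text;
# for L = sum(|E|^2) it equals 2E.
grad_z = u.grad + 1j * v.grad
```

Recent PyTorch versions also support complex tensors natively in autograd; the two-channel formulation remains convenient when composing with purely real-valued operations.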
To produce a differentiable diffraction model from an input plane to an off-axis position on an output plane, we use the matrix $\mathcal {M}_{j}$ (as defined in Eq. (2)) to shift the coordinate system $r_{2}$ [33] by tracing the geometrical chief ray. Then we evaluate Eq. (4) to get the diffraction pattern. The shifting operator can be implemented either by an FFT operation (for sub-pixel resolution) or via a shifted crop operator (for computational speed and efficiency). This approach removes linear phases from the propagation computation, allowing for smaller sampling grids and avoiding aliasing in large FOV simulations. A minor projection correction (for small angles) could be considered here; however, we neglect this effect to conserve memory. Finally, if the output plane is not the final image plane, an additional step of multiplying with the shifted element at the output plane and re-centering is performed to enable the cascaded propagation to the following sections in the system.
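A minimal NumPy sketch of the FFT-based sub-pixel shifting operator (the function name and the periodic boundary handling are illustrative choices, not the paper's exact implementation):

```python
import numpy as np

def fft_shift_field(field, dx_pix, dy_pix):
    """Shift a 2-D complex field by (possibly fractional) pixel amounts
    by applying a linear phase ramp in Fourier space (periodic wrap-around)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx)  # cycles per pixel along x (last axis)
    fy = np.fft.fftfreq(ny)  # cycles per pixel along y (first axis)
    ramp = np.exp(-2j * np.pi * (fy[:, None] * dy_pix + fx[None, :] * dx_pix))
    return np.fft.ifft2(np.fft.fft2(field) * ramp)
```

The cheaper alternative mentioned in the text, a shifted crop, is a single slicing operation with integer-pixel resolution; both are differentiable.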
In Appendix 3 we demonstrate the numerical optimization of two phase elements to facilitate a multi-focus RGB PSF, i.e. mapping a point in 3D object space to a point in 2D image space, where the lateral position in image space depends on the axial object-space position; and, importantly, different colors are to be mapped identically. This approach can be used to readily optimize multiple elements in a complex optical system for on-axis propagation, e.g. in holography [47] or laser mode manipulations [48]. This is not a suitable approach for imaging systems (as it considers only on-axis PSF in a shift-variant system), which are the focus of the rest of this paper. The intermediate conclusion is that addressing and optimizing the shift-variant PSF (by sampling PSFs off-axis) is crucial for making this method viable, thus, this is the approach we take in the remainder of this work.
3. Experimental demonstrations
In this section we detail three experiments performed to validate the capabilities of the method to design thin elements, focusing on phase masks. We used an epi-fluorescence microscope (40X magnification and 0.75 NA) with an extended 4f system (two achromatic lenses with focal lengths of 10 and 20 cm), as shown in Fig. 1(a).
A Liquid-Crystal Spatial Light Modulator (LC-SLM, Pluto VIS-130 Holoeye) is placed 19.6 cm after the first lens, namely, downstream from the Fourier plane, behind a linear polarizer (LP) and an aperture (to filter stray light) matching the back focal plane size of the objective lens (3.75 mm). The LC-SLM plane is sampled on a 16 $\mu m$ grid (the physical pixel size is 8 $\mu m$). As our approach optimizes the phase pixel-wise, the number of degrees of freedom (pixels used) in each section depends on the specific FOV and is on the scale of 150,000-200,000 pixels.
Our calibration process has two steps; the first is calibrating the microscope aberrations. Our system does not exhibit noticeable FOV-dependent aberrations and is dominated mainly by shift-invariant spherical and astigmatism aberrations. We calibrate our system using a method designed for high-NA objectives [49]. This approach produces a pupil phase correction based on an accurate pixel-wise vectorial diffraction model. The second step is registration of the theoretical and experimental ray tracing of the chief ray from the object plane to the LC-SLM plane; for further details see Appendix 5.
In all the experimental sections, the optimization is performed using the Adam optimizer [50] until the cost function stagnates. This optimization takes $\sim$ two minutes on a laptop with Intel i7-8750H and NVIDIA 8Gb 2070RTX GPU.
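The optimization loop has the generic shape sketched below; the cost function here is a hypothetical placeholder (in practice it is built on the cascaded Collins propagation and the per-experiment costs of Sections 3.1-3.3):

```python
import math
import torch

phi = torch.zeros(32, 32, requires_grad=True)   # phase mask pixels
opt = torch.optim.Adam([phi], lr=1e-2)

def cost(phi):
    # placeholder cost for illustration: drive the mean phase towards pi
    return (phi.mean() - math.pi) ** 2

prev = float("inf")
for _ in range(5000):
    opt.zero_grad()
    L = cost(phi)
    L.backward()
    opt.step()
    if abs(prev - L.item()) < 1e-9:             # stop when the cost stagnates
        break
    prev = L.item()
```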
3.1 Out-of-aperture phase retrieval
The first experiment validates our out-of-aperture phase retrieval method by designing spatially variant PSFs at a specific defocus position. We use fluorescent beads (Invitrogen, 625/645, 200 nm) adhered to a glass coverslip as our sparse point-source sample (see Appendix 5 for details about the sample preparation). The cost function for phase mask $\Phi$ is defined as:
3.2 Analog refocusing
The second experiment validates the estimation of positions and defocus correction by spatially refocusing single emitters in a sparse 3D object space. The challenge here is to produce a single ’in-focus’ image of several objects in the field-of-view which are at different axial positions. Specifically, in our case, the depth of field of the system is $\approx 1.5$ $\mu m$ and we were able to image objects spanning a 9 $\mu m$ axial range. We used a sparse bead sample embedded in gel (see Appendix 5 for sample preparation details) to create a 3D object. First, a focal stack was acquired by shifting the objective lens over the axial range. The focal stack was used to estimate the 3D positions of the emitters (see Appendix 5). Then, we chose some of the emitters spanning different z positions (e.g. 5 in the case presented in Fig. 2) and optimized a phase mask where the cost function was aimed at correcting the defocus phase per emitter, i.e. comparing the theoretical defocused PSF to the in-focus PSF (clear aperture) at $z= \text {AFP}$, where AFP is the actual focal plane for an objective focused $3.7$ $\mu m$ above the coverslip (accounting for the refractive index mismatch between the gel and air):
In this case we chose to bin the camera pixels (to an effective size of $6.9 \mu m$) to increase the signal in the ground-truth estimation and to reduce computation time. After the acquisition of the ground-truth positions, the optimization ran for two minutes and the phase mask was then immediately applied, to minimize drift. As the emitters' beams overlap in the phase mask plane, some contrast is lost due to induced aberrations. In both images in Fig. 2(c), the objective lens is focused close to the axial plane of emitter $\#3$ (at z = 3.7 $\mu m$). Note especially beads $\#1,2,5$, whose visibility improves significantly after refocusing. We note that emitter $\#4$ in the refocused image (Fig. 2(c), right) is not the emitter visible in the left image (Fig. 2(c), left), but a defocused one which cannot be seen without the phase mask - both emitters have a very similar lateral position, as can be seen in the full z-stack shown in Appendix 5 and Fig. 10. Compared to the in-focus emitter $\#3$, it lost 30% of the maximal intensity. The intensity loss ranges from 10% to 65%, depending on the acquired aberration. We chose to use our approach even though this specific case could have been solved semi-analytically by calculating the appropriate defocus phases from Eq. (3) and superimposing them (in the correct lateral positions) on the LC-SLM. The results from this section show the applicability of our method for designing Micro Lens Arrays (MLAs) with different focal planes and spatial positions [37,52] and dynamic analog refocus capabilities that surpass a single Extended Depth Of Field (EDOF) phase mask [53] for sparse samples in terms of photon efficiency, axial range and aberration corrections (which can be applied to each emitter separately).
3.3 Local aberration correction
Correcting the defocus of a sparse sample is a simple case of aberration correction. To show that our approach can apply to other forms of aberrations, we used our system to correct an aberration induced by a thin dielectric Tetrapod phase mask [13], placed before the first lens as shown in Fig. 3(a). This setup induces a phase-only spatially variant aberration at a plane which is not conjugate to our LC-SLM; thus, a full-FOV correction is not possible [39] and we chose to correct the optical path of a select group of emitters, similar to the approach in Fig. 2. This challenge can be approached using our cascaded propagation method directly by estimating the wavefront error per chosen FOV point at the correction plane. Other methods such as local Zernike estimation [35,54] are limited to smooth aberrations, require either stitching or interpolating the aberrations over the FOV, require an extra propagation step to estimate the required correction at the LC-SLM plane, and are slower by two orders of magnitude than a direct approach, as we have previously shown in [49]. For this experiment, we switched the camera to a higher-sensitivity sCMOS (Kinetix, Photometrics). For simplicity, we captured a z-stack (by shifting the objective lens with a step size of 0.15 $\mu m$ over a range of $\left (-3,3\right )\mu m$) of sparse beads on a coverslip sample and then applied our iterative approach to estimate the phase mask $\Phi$ (placed in the LC-SLM plane) which can create such an aberration, using the cost function:
4. End-to-end design
In this section we use our approach to perform end-to-end learning in a computational imaging task. As a benchmark, we apply our model to the task of semantic segmentation on the Berkeley Deep Drive database (BDD100K) [55]. The database contains 100K RGB images, 720x1280 pixels, recorded from the dashboard of a driving car. Each image in the database is segmented into 19 different classes (road, building, car, bicycle, etc.). Previous work [26] showed that optimizing a lens stack with an end-to-end learned geometrical optics design which considers the spatial PSF can improve the results in object detection tasks in computer vision. Our rationale for choosing this application is that a shift-variant PSF could potentially exploit the typical structure in such scenes (the sky is at the top of the image, the road is at the bottom, etc.).
We model image acquisition as an incoherent system with a Fused Quartz diffractive optical element (DOE) instead of an LC-SLM. The considered DOE is simulated with a maximal height of $h_{max} = 1.5$ $\mu m$, having the phase delay function (when placed in air)
$$\Phi \left (x,y;\lambda \right )=\frac {2\pi }{\lambda }\left (n\left (\lambda \right )-1\right )h\left (x,y\right ),$$
where $n\left (\lambda \right )$ is the refractive index of the DOE material. The optimization is performed with a continuous height map, and a softmax function with decreasing temperature is applied to the height map such that at the final iterations the mask is sampled on a feasible fabrication grid with an axial step size of $100$ $nm$. Calculating the spatially variant PSF per pixel and performing a convolution with a full frame is computationally infeasible, thus we adopt a linear PSF interpolation approach [56]. Under this model, the PSFs are calculated on a grid, convolved with a weighted image, and summed to generate the modulated image $I_{mod}\left (n,m;C_{l}\right )$.

Our end-to-end model is composed of two main blocks (see Fig. 4(a)): the optical block and a 24-layer ResNet (see Fig. 6). The optical block models the formation of the final image on the camera; the image is then passed through the ResNet, which generates pixel-wise class predictions. Our loss function consists of three terms:
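The linear PSF interpolation model can be sketched as follows (a NumPy sketch under the stated model; the function names and the FFT-based "same"-size convolution helper are illustrative):

```python
import numpy as np

def _conv_same(img, psf):
    """'Same'-size linear convolution via zero-padded FFTs."""
    H, W = img.shape
    kh, kw = psf.shape
    FH, FW = H + kh - 1, W + kw - 1
    full = np.fft.irfft2(np.fft.rfft2(img, (FH, FW)) *
                         np.fft.rfft2(psf, (FH, FW)), (FH, FW))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + H, left:left + W]

def spatially_variant_image(img, psfs, weights):
    """Approximate a shift-variant blur: each PSF from the coarse grid is
    convolved with a weighted copy of the image and the results are summed.
    `weights` are per-pixel interpolation maps that sum to 1 at every pixel."""
    out = np.zeros_like(img, dtype=float)
    for psf, w in zip(psfs, weights):
        out += _conv_same(img * w, psf)
    return out
```

With a single delta-function PSF and uniform weights, the model reduces to an identity, which is a convenient sanity check of the implementation.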
To test our network we compared the performance of three architectures: a ResNet with standard imaging conditions, namely, blurring the original image; a ResNet with a learnable phase mask in the Fourier plane (yielding a shift-invariant PSF); and a ResNet with a learnable phase mask position and function. In each case, the output of the optical block is the observed image with either an ideal RGB camera (i.e. three color channels, each with a single wavelength and unity response) or a monochromatic camera, which is then passed to the neural network that predicts the pixel-wise segmentation of the image.
For simplicity, our model assumes augmentation of a standard imaging system with a 4f system, with our phase mask placed at some plane in between, with the axial position optimized by the net (Fig. 4(a)). We report the average accuracy achieved on 1000 test images by each architecture (Fig. 4(b)). The standard-PSF architecture predicted 55.8% of the pixels correctly with the RGB camera. When the phase mask was learned but its position was fixed (in the Fourier plane, i.e. a spatially-invariant PSF), the percentage of correctly classified pixels was 57.0%, namely, achieving some improvement over not learning a phase mask at all. Finally, in the spatially-variant design, the network correctly classified 60.4% of the pixels - an improvement of 4.7% (8.2% relative improvement) compared to standard imaging. Similar behavior can be observed with the monochromatic camera (see Fig. 4(b)). The phase mask position was optimized by the net to be 5.1 cm after the first lens in the 4f system, where the focal distances of the first and second lenses are 15 cm and 5 cm, respectively. Thus, we conclude that a shift-variant PSF that can exploit the structure of the data outperforms both standard imaging and shift-invariant PSF engineering. In practice, such a system will require the fabrication of a larger DOE (approximately 8.5 mm in the long axis, compared to a circle with a diameter of 2.5 mm in the aperture), which might increase the costs of fabrication when considering lithography, although recently developed techniques show promising potential for simplified fabrication of large masks [59,60]. We note that this specific scenario (effectively 2D imaging and 3 color channels) could have been addressed without an optical block, by increasing the number of parameters in the neural network and having the network perform digitally the physical operation of the optical setup.
Nevertheless, with the optical block adding 250,000 learnable parameters (pixels in the phase mask) to a ResNet with approximately 20 million parameters, we see an efficient improvement in the results. Interestingly, the output of the optical block can serve as an interpretable intermediate result - or, alternatively, it can guide the algorithmic design process by suggesting an initial image-processing operation that improves the reconstruction net for images obtained by a standard imaging system.
We note that our discussion in this section is purely simulated, and future work on real-world implementation will face various challenges such as calibration, manufacturing errors, modelling inaccuracies and more. The challenges of calibration and modelling can be addressed in a similar manner to the experimental sections of this paper or other wave optics methods [21], while specific computational imaging implementation challenges were addressed in previous work, in the context of geometrical optics design for vehicle cameras [26].
5. Discussion
In this work we demonstrated a flexible and differentiable method to design complex optical systems containing diffractive elements. Our approach allows for simple and direct wave-optics propagation and optimization of multiple elements in a cascaded system, using backpropagation. We validated the applicability of our method to perform out-of-aperture phase retrieval for incoherent sources, enabling the design of spatially-variant PSFs with substantial freedom, e.g. optimization of the element placement, and demonstrated a potential application for autonomous vehicles, performing end-to-end optimization of a computational imaging system for object segmentation, where the position of a phase mask is learned in conjunction with the spatially variant PSF (Section 4).
Three experimental demonstrations were shown in Section 3; first, we designed alphabetical-letter PSFs to demonstrate the pixel-wise optimization capabilities and design freedom. Second, we performed spatially-variant refocusing, suggesting a possibly useful application involving multi-focus imaging. Third, we performed FOV-dependent aberration correction. Our method is based on using paraxial ray-tracing to define the wave propagation via Huygens-Fresnel diffraction. This framework allows computation of cascaded diffraction with spatial information, while supporting fast computations and flexibility in the sampling rate, enabling the optimization of complex optical systems. Possible future applications include online correction of spatially-variant PSFs (e.g. spatial adaptive optics), end-to-end learning for tasks with structured data, source separation [61], coherent system design, and more. In the future, our approach can be combined with geometrical ray tracing, which would allow the consideration of thick-element aberrations, fabrication imperfections, tolerances and other constraints that are straightforward to include in ray tracing. Ultimately, these two approaches can be combined into a hybrid model, exhibiting the best of both worlds.
Appendix 1: Optical simulation model
Propagation model
A common way to perform ray tracing is to define an ABCD matrix which transforms a light ray $\left (x,\theta \right )$ between two planes. We write the ABCD matrix as a 2×2 matrix, which assumes azimuthal symmetry of the ray propagation in the lateral directions; more generally, the ABCD matrix can be written as a 4×4 matrix. For a given section of a system (in air) with a geometrical optics matrix:
In this work, we neglect this effect to conserve memory by using a uniformly sampled grid in the output plane for all incidence angles. Finally, an additional step of multiplying with the shifted element at the output plane and re-centering is performed to enable the cascaded propagation to the following sections in the system. The shifting operation can be performed with sub-pixel precision via FFT or by simply using a shifted cropping operation (faster, but with single-pixel resolution); both are differentiable operations and can be used to back-propagate the gradients.
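For concreteness, the two elementary ray-transfer matrices (free space and a thin lens, both standard results) compose by matrix multiplication in the order the ray encounters the elements; e.g. a unit-magnification 4f relay reduces to $-\mathcal{I}$:

```python
import numpy as np

def free_space(d):
    """ABCD matrix for paraxial propagation over a distance d (in air)."""
    return np.array([[1.0, d], [0.0, 1.0]])

def thin_lens(f):
    """ABCD matrix for a thin lens of focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# 4f relay with equal focal lengths: matrices compose right-to-left
# in the order the ray encounters the elements.
f = 0.1
M = free_space(f) @ thin_lens(f) @ free_space(2 * f) @ thin_lens(f) @ free_space(f)
# M = -I: an inverted image with unit magnification (and B = 0)
```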
Modelling the input point source
Modelling the input source correctly is essential in creating an accurate propagation simulator. One simple choice would be using a 2D Gaussian at the input plane; however, this is an unsatisfactory choice, as the Fourier transform will lack the high-frequency content of a point source. Under the paraxial approximation, the emission of a point source is given by a parabolic phase:
$$E\left (x,y\right )\propto \exp {\left (\frac {ik\left (x^{2}+y^{2}\right )}{2z_{0}}\right )},$$
where $z_{0}$ is the distance between the source and the input plane.
If there is no requirement to propagate the gradient before the Fourier plane (or entrance pupil), then the optimization can directly start from this position, reducing the need to compute the diffraction up to that point, as in the example provided in Fig. 1. In other cases, where elements exist before the Fourier plane or the Fourier plane is entirely skipped in the calculation, we back-propagate the scalar field to the input plane by noting that Eq. (14) can be computed in the reverse direction by using the inverse matrix $\mathcal {M}_{j}^{-1}$.
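A sketch of the paraxial point-source field sampled at the input plane (the function name and grid conventions are illustrative; $z_0$ denotes the source-to-plane distance):

```python
import numpy as np

def paraxial_point_source(N, dx, wavelength, z0):
    """Parabolic-phase approximation of a point source located a distance
    z0 before an N x N input plane sampled with pitch dx."""
    k = 2 * np.pi / wavelength
    x = (np.arange(N) - N // 2) * dx
    X, Y = np.meshgrid(x, x)
    return np.exp(1j * k * (X**2 + Y**2) / (2 * z0))
```

The field is unit-magnitude everywhere, so all the source information is carried by the quadratic phase, which must be sampled finely enough near the grid edges.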
Incoherent image formation
In this section we show a simple optimization of the full image model with spatially variant PSFs (to show the capabilities of the optical approach without the learning stage) that was used to simulate the results in Section 4. We chose a similar optical system and optimized two thin phase elements $\left (\Phi _{1},\Phi _{2}\right )$, placed 1 $cm$ after the first lens and 1.5 $cm$ before the second lens. In Fig. 5 we show a simple optimization case where the input is a 3×3 point grid. The PSFs are sampled on the grid and a phase retrieval optimization routine (minimizing $\mathcal {L}_{\text {A}}$) is run such that the final image ($I_{mod}$) resembles $I_{A}$, an image of the letter "A", i.e.:
Comparison with other propagation methods
The key advantage of using the Collins integral is that it enables skipping entire lens stacks to calculate the diffraction between thin-planes of interest - all it requires is the relatively simple-to-obtain ABCD matrix, while most other methods require expensive computations between each element. To further justify using Collins-integral over other common methods, we examine several numerical issues. First, in principle, Fresnel diffraction can be evaluated as a convolution integral in a single step. However, in cascaded systems, this imposes two main challenges: (1) the sampling of the output is fixed to $\delta _{2} = \frac {\lambda \cdot z}{N\cdot \delta _{1}}$, requiring different zero padding for a set of wavelengths and interpolations to match given restrictions (e.g. camera pixel size or desired sampling in a Diffractive Optical Element (DOE)), (2) correct sampling of the chirp function inside the integral imposes a limit on the maximal propagation distance $z\geqslant \hat {z} = \frac {N \cdot \delta _{1}^{2}}{\lambda }$; at the same time, in a cascaded scenario (where a correct complex field is important, and not just the amplitude), the outer chirp can also induce aliasing. This creates a severely limiting situation where the calculation is correct only for $z = \hat {z}$.
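The two sampling constraints above are simple to evaluate numerically (the helper name is illustrative):

```python
def single_step_fresnel_sampling(N, dx1, wavelength, z):
    """Output pitch of a single-FFT Fresnel step, dx2 = lambda*z/(N*dx1),
    and the shortest alias-free propagation distance z_hat = N*dx1**2/lambda."""
    dx2 = wavelength * z / (N * dx1)
    z_hat = N * dx1**2 / wavelength
    return dx2, z_hat

# e.g. a 512-point grid with 16 um pitch at 550 nm gives z_hat ~ 0.24 m,
# so shorter propagation distances alias in a single-step evaluation.
dx2, z_hat = single_step_fresnel_sampling(512, 16e-6, 550e-9, 0.5)
```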
For a simple example, we can look at the system in Fig. 5. In the scaled Collins formulation, we have chosen to sample the grid with 512 points, using only three propagation operators. With free-space propagation and multiplication by the phase of a lens, this simple system would require at least five propagation operators and considerably inflate the memory in the back-propagation (requiring larger GPUs for optimizing such systems). Using scaled FFT for the Fresnel free-space propagation operators increases the computation time from 0.5 s per iteration to 5 s, and increases the GPU memory use from six FFT operations on a 512-point grid to ten FFT operations on a 2048-point grid. Moreover, using backpropagation with an unscaled Fresnel propagation (i.e. Eq. (1)) required a grid size of 4096, increasing the computation time to 45 s and further inflating the memory. This is a very simple example which requires only three steps, and the savings are still worthwhile - an order of magnitude in time and, as the intermediate calculations are saved for backpropagation, a factor of 25 in memory. In larger systems, such as the lithographic lens in [34], the improvement is much more substantial.
Other existing approaches for calculating Fresnel propagation involve a two-step calculation. Solving Fresnel propagation as a convolution integral can be done by sampling the transfer function in Fourier space (usually for short propagation distances) or in real space (usually for long propagation distances). One way to mitigate the sampling constraints is by performing two single-step propagations [62,63]. All these methods and their variations have different sampling requirements and often require drastic zero-padding to reduce aliasing [41]. When simulating free-space propagation these methods can be viable; however, the Collins approach allows for fewer propagation calculations and for choosing planes with minimal field support to reduce the sampling constraints.
Appendix 2: Neural net and training details
Net architecture
The basic architecture of our neural network, presented in Fig. 6, is similar to the ResNet [64] architecture. First, we apply a conv. layer with a kernel size of 7x7 pixels to capture features of different sizes and to rapidly increase the receptive field of the subsequent layers. Then we concatenate 7 residual blocks of increasing sizes. Each residual block consists of a skip connection between the input and an additional path through 3 conv. layers (kernel size = 3 and padding = 1 for all conv. layers) separated by leaky ReLU and batch normalization layers. The final residual block output is passed through two more conv. layers and a log-softmax layer that outputs the log of the predicted class distribution in each pixel.
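One residual block of the described architecture might look like the following PyTorch sketch; the channel width `c` and the placement of the normalization/activation after the last conv. layer are assumptions, as the text only specifies the layer counts and kernel sizes:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Skip connection around three 3x3 conv layers with batch normalization
    and leaky ReLU, per the description in the text (exact ordering assumed)."""
    def __init__(self, c):
        super().__init__()
        layers = []
        for _ in range(3):
            layers += [nn.Conv2d(c, c, kernel_size=3, padding=1),
                       nn.BatchNorm2d(c),
                       nn.LeakyReLU()]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # identity skip connection: output keeps the input's shape
        return x + self.body(x)
```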
Dataset details
The training set consists of 7000 images and the validation set of 2000 images, both chosen randomly from a subset of the BDD100K database. For the focal loss term we chose $\gamma = 2$, and $\alpha _t$ was set according to the class frequencies in the training set. We used the Adam optimizer with a reduce-on-plateau learning-rate schedule (patience 3, decrease factor 0.1). We trained the neural network on two Titan RTX GPUs for 48 hours. The network was tested on 1000 images (inference on a single image takes $\approx$ 10 ms), and the reported accuracy is the fraction of correctly classified pixels out of the total number of pixels per image.
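The per-pixel focal loss of [58] applied to the network's log-softmax output can be sketched as follows (a hedged PyTorch illustration; the mean reduction and the exact normalization of $\alpha _t$ are assumptions):

```python
import torch

def focal_loss(log_probs, target, alpha, gamma=2.0):
    """Per-pixel focal loss [58] on log-softmax network output.

    log_probs : (N, C, H, W) log class probabilities
    target    : (N, H, W) integer class labels
    alpha     : (C,) per-class weights (assumed derived from the
                training-set class frequencies)
    """
    # log p_t at the true class of each pixel
    logpt = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)  # (N, H, W)
    pt = logpt.exp()
    at = alpha[target]
    # focal term (1 - p_t)^gamma down-weights easy pixels
    return (-at * (1 - pt) ** gamma * logpt).mean()
```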
Results with ResNet-9
To evaluate the contribution of the optical block, we reduced the size of the deep learning architecture from ResNet-20 ($\sim$8 million parameters) to ResNet-9 ($\sim$3.7 million parameters) with the monochromatic camera model. The results and reconstruction quality were degraded, although still good: without PSF engineering (i.e. a standard PSF) the score dropped from 53.5% to 47%, with a mask in the Fourier plane from 54.7% to 49.3%, and with the spatially variant PSF from 57.9% to 47.4%. We conclude that network size is important, and that the optical block contributed more in the smaller architecture than in the larger one. Interestingly, the smaller net was unable to successfully utilize shift-variant PSF engineering.
Loss function
Our loss function consists of three terms:
$\mathcal {L}_{\text {TV}}$ is a Total Variation loss on the network output and $\mathcal {L}_{\text {PSF}}$ is a regularization on the PSF defined as:
Appendix 3: Additional results
3D phase retrieval
In addition to performing phase retrieval for a single axial plane in Fig. 2, we implemented a similar cost function for two axial planes (with eight letters). The result can be seen in Fig. 7. The optimization converged to a less visually satisfying result due to having more constraints to optimize. Nevertheless, this shows the validity of our method for simulating and designing spatially-variant PSFs in 3D.
Numerical example: multi-focus RGB PSF design
In this section we consider the case of optimizing a multi-focus RGB PSF via phase retrieval; the simulation is presented in Fig. 8. The challenge is to design an optical system that maps a point in 3D object space to a point in 2D image space, where the lateral position in image space depends on the axial object-space position (identically for every color).
In [65], a chromatic correction element was placed in conjunction with a binary phase mask to address the large spectral width exhibited by fluorescent molecules. In this section, we show that our model can jointly optimize non-binary phase modulation and chromatic corrections on-axis. A simple 4-f system is presented in Fig. 8(a). Three on-axis RGB (discrete wavelengths of 460, 550, 640 $nm$) point sources are considered (in the figure, the two plotted points indicate a change in axial position of a multicolored source). The first lens has a focal length of 5 cm, the second lens has a focal length of 10 cm, and the aperture diameter is 2.5 mm, which is typical for a Diffractive Optical Element (DOE). The unmodulated RGB PSF of this system is plotted in Fig. 8(b). The considered DOEs are simulated with a maximal height of $h_{max} = 1.5$ $\mu m$, having the phase delay function (when placed in air)
$$\phi \left (x,y;\lambda \right ) = \frac {2\pi }{\lambda }\left [n\left (\lambda \right )-1\right ]h\left (x,y\right ),$$
where $n\left (\lambda \right )$ is the refractive index of the DOE material (Fused Quartz in this case). A first DOE $h_{1}$ is placed at the aperture (Fourier plane), while an optional second DOE $h_{2}$ is placed at a distance $d = 7.5$ $cm$ after the aperture. We used Eq. (14) to calculate the diffraction in all sections of Fig. 8(a). To achieve a multi-focus PSF, we minimize the cost function:
Appendix 4: Sampling requirements
A main consideration in this work is the computational burden of FFT computations on large matrices. As we consider spatially variant PSFs, our matrices have five dimensions $\left (N_{xy},N_{z},N_{c},H,W \right )$: the lateral position, axial position, wavelength, and image height and width, respectively. The size of such matrices can be limited by memory constraints, and thus we chose our sampling grids to minimize $\left (H,W\right )$.
As chirp functions are not band-limited, the main constraint comes from avoiding aliasing inside the limiting apertures at the input and output planes. We note that this consideration must be carefully examined for each sub-system separately, as aliasing can occur if the input field contains high enough frequencies at the edges of the apertures; i.e., reducing the computational complexity comes at the cost that each diffraction calculation is correct only inside the apertures and w.r.t. input fields with a pre-defined bandwidth (limiting the axial range and admissible aberrations). For example, in the simulation provided in Fig. 8 we used $(H,W) = (512,512)$ sampling points, but a larger defocus in the input plane or high-frequency components (e.g. gratings) could necessitate a drastic increase in $\left (H,W\right )$.
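As a rough 1-D, paraxial, free-space illustration of this aliasing constraint (the general multi-section conditions follow [41,62]), Nyquist-sampling the Fresnel chirp at the aperture edge already dictates the minimal grid size:

```python
import numpy as np

def min_grid_for_chirp(aperture, wl, z):
    """Back-of-the-envelope grid size for sampling the Fresnel chirp
    exp(i*pi*x^2/(wl*z)) inside an aperture (1-D, paraxial).

    The chirp's local frequency at position x is x/(wl*z); requiring
    Nyquist sampling at the aperture edge x = aperture/2 gives a pitch
    dx <= wl*z/aperture, i.e. at least ~aperture**2/(wl*z) points.
    Extra input-field bandwidth or defocus tightens this further.
    """
    f_max = aperture / (2 * wl * z)   # highest local chirp frequency
    dx = 1 / (2 * f_max)              # Nyquist sampling pitch
    return int(np.ceil(aperture / dx))
```

For example, a 2.5 mm aperture at $\lambda = 0.5$ $\mu m$ and $z = 8$ cm already requires on the order of 150 points per axis before any safety margin.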
The derivation of the numerical sampling requirements is based on previous work [41,62]. We denote by $P_{in}$ and $P_{out}$ the aperture sizes of the input and output planes, sampled with pitches $\delta _{in}$ and $\delta _{out}$, respectively. We also denote by $\mathcal {M}_{j}$ the ray-optics ABCD matrix of the given section. The first two conditions are general for Fourier optics; they ensure (from geometrical considerations in the paraxial regime) that the input-plane sampling is fine enough to capture the spatial bandwidth and that rays will not wrap around the grid:
Appendix 5: Experimental implementation
This section provides details about our sample preparation, the optical system (2) used, and the acquisition of ground truth for the 3D experiment.
Optical system
The imaging system consists of a Nikon Eclipse-Ti2 inverted fluorescence microscope with a 40X/0.75 NA Nikon objective. An achromatic doublet lens (f=10 cm) images the back focal plane onto an iris. A linear polarizer followed by an LC-SLM (Pluto-VIS130, Holoeye) is placed 10 cm (9.6 cm after calibration; see Section 5) after the iris. Another achromatic doublet lens (f=20 cm) is placed to form a 4f system. Finally, an sCMOS camera (PixeLINK PL-D7512MU-T) with a pixel size of $3.45$ $\mu m$ is placed in the image plane.
Sample preparations
For the beads-on-a-coverslip samples, we used a sparse sample consisting of a glass coverslip (170 $\mu m$, Fisher Scientific) with 200 nm fluorescent beads (Invitrogen, 0.2 $\mu l$ Crimson 625/645 diluted 1:500) adhered to the surface with 1% Polyvinyl alcohol (PVA). For beads immersed in gel (Fig. 3), we used a Mini-PROTEAN gel casting stand (BioRad) with 0.75 mm short glass to prepare acrylamide gel with beads by vertical solidification, allowing the beads to spread sparsely in 3 dimensions. The gel solution with beads was prepared by gently mixing 1.2 $ml$ double-distilled water, 750 $\mu l$ 40% acrylamide/bis-acrylamide solution (Sigma, A7802), 80 $\mu l$ 50% glycerol, 3 $\mu l$ Fluorospheres carboxylate bead solution (Crimson 625/645, Invitrogen), 1.5 $\mu l$ TEMED, and 10 $\mu l$ 10% ammonium persulfate. The gel solution was immediately pipetted into the gel casting stand and gently topped with about 1 $ml$ of double-distilled water to level the gel. Once solidified, the gel was kept in water in the dark at 4°C.
Beads in gel ground truth estimation
In the refocusing experiment (Fig. 3), the experimental ground-truth 3D positions were approximated via a focal scan (Fig. 10). The scan was performed by moving the objective lens in 60 steps of 150 nm each. First, we used ThunderSTORM [66] to detect and localize (in 2D) the emitters from the maximum-intensity projection (Fig. 10(a)). Next, the focus position of each emitter was estimated by fitting a second-order polynomial to its mean intensity across the focal slices (Fig. 10(c)). The mean intensity was estimated by averaging a small $7\times 7$-pixel ROI around each detected emitter. Finally, the axial position was obtained by correcting for the refractive index mismatch between the gel and air.
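The per-emitter focus fit described above can be sketched as follows (a numpy illustration; the ROI extraction and refractive-index correction steps are omitted, and the function name is our own):

```python
import numpy as np

def focus_from_stack(z, intensity):
    """Estimate the in-focus axial position of one emitter by fitting a
    second-order polynomial to its mean ROI intensity across focal
    slices and returning the parabola's vertex.

    z         : axial positions of the focal slices
    intensity : mean ROI intensity of the emitter at each slice
    """
    a, b, _ = np.polyfit(z, intensity, 2)
    return -b / (2 * a)   # vertex of a*z^2 + b*z + c
```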
Camera to phase mask registration
The main spatial calibration that we performed was to calibrate the LC-SLM axial position and the affine transform between the theoretical and experimental ray tracing of the chief ray from the object plane to the LC-SLM plane. The LC-SLM we used is reflective; thus an angle is introduced, which can induce rotations and shearing between the planes. We chose to apply (on the LC-SLM) a phase mask termed the Tetrapod [13]. Many other masks could be used here, but we chose the Tetrapod because a lateral misalignment between the phase mask and the input beam is easily spotted as wobble and aberration of the PSF in the z-stack. A sparse bead sample with a single bead in the FOV was used to create a single point source in the image plane. After finding the optical axis on the camera plane, we shifted the mask by 45 LC-SLM pixels on a square grid of 9 points and found the corresponding shifts in the image plane. Finally, an affine transformation was computed between the shifted phase-mask grid and the theoretically ray-traced positions from the camera plane to the SLM plane. This transformation allowed us not only to engineer the desired PSF at different FOV positions but also to estimate a corrected axial position of the LC-SLM. The latter was achieved using simple ray tracing for a 4-f system (as plotted in Fig. 2) with an LC-SLM placed a distance d behind the aperture, resulting in the chief-ray position $\left (x_{in},y_{in}\right )$ in the intermediate image plane:
Funding
Horizon 2020 Framework Programme (802567); Israel Science Foundation (450/18).
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
References
1. M. Born and E. Wolf, Principles of optics: electromagnetic theory of propagation, interference and diffraction of light (Elsevier, 2013).
2. Zemax, “Zemax opticstudio,” https://www.zemax.com.
3. I. Synopsys, “Code v,” https://www.synopsys.com/.
4. J. P. Rolland, M. A. Davies, T. J. Suleski, C. Evans, A. Bauer, J. C. Lambropoulos, and K. Falaggis, “Freeform optics for imaging,” Optica 8(2), 161–176 (2021). [CrossRef]
5. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]
6. R. P. Millane, “Phase retrieval in crystallography and optics,” J. Opt. Soc. Am. A 7(3), 394–411 (1990). [CrossRef]
7. Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase retrieval with application to optical imaging: a contemporary overview,” IEEE Signal Process. Mag. 32(3), 87–109 (2015). [CrossRef]
8. L.-H. Yeh, J. Dong, J. Zhong, L. Tian, M. Chen, G. Tang, M. Soltanolkotabi, and L. Waller, “Experimental robustness of fourier ptychography phase retrieval algorithms,” Opt. Express 23(26), 33214–33240 (2015). [CrossRef]
9. N. Ji, “Adaptive optical fluorescence microscopy,” Nat. Methods 14(4), 374–380 (2017). [CrossRef]
10. K. M. Hampson, R. Turcotte, D. T. Miller, K. Kurokawa, J. R. Males, N. Ji, and M. J. Booth, “Adaptive optics for high-resolution imaging,” Nat. Rev. Methods Primers 1(1), 68 (2021). [CrossRef]
11. B. Huang, W. Wang, M. Bates, and X. Zhuang, “Three-dimensional super-resolution imaging by stochastic optical reconstruction microscopy,” Science 319(5864), 810–813 (2008). [CrossRef]
12. A. Jesacher, S. Bernet, and M. Ritsch-Marte, “Colored point spread function engineering for parallel confocal microscopy,” Opt. Express 24(24), 27395–27402 (2016). [CrossRef]
13. Y. Shechtman, S. J. Sahl, A. S. Backer, and W. Moerner, “Optimal point spread function design for 3d imaging,” Phys. Rev. Lett. 113(13), 133902 (2014). [CrossRef]
14. S. Elmalem, R. Giryes, and E. Marom, “Motion deblurring using spatiotemporal phase aperture coding,” Optica 7(10), 1332–1340 (2020). [CrossRef]
15. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]
16. J. W. Goodman, Introduction to Fourier optics (Roberts and Company Publishers, 2005).
17. T. Klinghoffer, S. Somasundaram, K. Tiwary, and R. Raskar, “Physics vs. learned priors: Rethinking camera and algorithm design for task-specific imaging,” arXiv preprint arXiv:2204.09871 (2022).
18. M. Alghamdi, Q. Fu, A. Thabet, and W. Heidrich, “Transfer deep learning for reconfigurable snapshot hdr imaging using coded masks,” in Computer Graphics Forum, vol. 40 (Wiley Online Library, 2021), pp. 90–103.
19. Q. Sun, E. Tseng, Q. Fu, W. Heidrich, and F. Heide, “Learning rank-1 diffractive optics for single-shot high dynamic range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 1386–1396.
20. S.-H. Baek, H. Ikoma, D. S. Jeon, Y. Li, W. Heidrich, G. Wetzstein, and M. H. Kim, “Single-shot hyperspectral-depth imaging with learned diffractive optics,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), pp. 2651–2660.
21. H. Arguello, S. Pinilla, Y. Peng, H. Ikoma, J. Bacca, and G. Wetzstein, “Shift-variant color-coded diffractive spectral imaging system,” Optica 8(11), 1424–1434 (2021). [CrossRef]
22. S. Tan, Y. Wu, S.-I. Yu, and A. Veeraraghavan, “Codedstereo: Learned phase masks for large depth-of-field stereo,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 7170–7179.
23. E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. E. Weiss, O. Alalouf, T. Naor, R. Orange, T. Michaeli, and Y. Shechtman, “Deepstorm3d: dense 3d localization microscopy and psf design by deep learning,” Nat. Methods 17(7), 734–740 (2020). [CrossRef]
24. Y. Wu, V. Boominathan, X. Zhao, J. T. Robinson, H. Kawasaki, A. Sankaranarayanan, and A. Veeraraghavan, “Freecam3d: Snapshot structured light 3d with freely-moving cameras,” in European Conference on Computer Vision, (Springer, 2020), pp. 309–325.
25. Z. Li, Q. Hou, Z. Wang, F. Tan, J. Liu, and W. Zhang, “End-to-end learned single lens design using fast differentiable ray tracing,” Opt. Lett. 46(21), 5453–5456 (2021). [CrossRef]
26. E. Tseng, A. Mosleh, F. Mannan, K. St-Arnaud, A. Sharma, Y. Peng, A. Braun, D. Nowrouzezahrai, J.-F. Lalonde, and F. Heide, “Differentiable compound optics and processing pipeline optimization for end-to-end camera design,” ACM Trans. Graph. 40(2), 1–19 (2021). [CrossRef]
27. Q. Sun, C. Wang, F. Qiang, D. Xiong, and H. Wolfgang, “End-to-end complex lens design with differentiable ray tracing,” ACM Trans. Graph 40(4), 1–13 (2021). [CrossRef]
28. J. Arasa, S. Bosch, J. Ferre-Borrull, and C. Pizarro, “Comparison of flux-tracing-based and diffraction-based strategies for optical system evaluation,” in Optical Design and Engineering, vol. 5249 (International Society for Optics and Photonics, 2004), pp. 34–41.
29. S. A. Collins, “Lens-system diffraction integral written in terms of matrix optics,” J. Opt. Soc. Am. 60(9), 1168–1177 (1970). [CrossRef]
30. J. Li and C. Li, “Algorithm study of collins formula and inverse collins formula,” Appl. Opt. 47(4), A97–A102 (2008). [CrossRef]
31. D. Mendlovic and H. M. Ozaktas, “Fractional fourier transforms and their optical implementation: I,” J. Opt. Soc. Am. A 10(9), 1875–1881 (1993). [CrossRef]
32. X. Su, R. Tao, and X. Kang, “Analysis and comparison of discrete fractional fourier transforms,” Signal Processing 160, 284–298 (2019). [CrossRef]
33. H. Gross, “Cascaded diffraction in optical systems. part i: simulation model,” J. Opt. Soc. Am. A 37(2), 240–249 (2020). [CrossRef]
34. H. Gross, “Cascaded diffraction in optical systems. part ii: example calculations,” J. Opt. Soc. Am. A 37(2), 250–256 (2020). [CrossRef]
35. A. Shajkofci and M. Liebling, “Spatially-variant cnn-based point spread function estimation for blind deconvolution and depth estimation in optical microscopy,” IEEE Trans. on Image Process. 29, 5848–5861 (2020). [CrossRef]
36. V. Debarnot and P. Weiss, “Deepblur: Blind identification of space variant psf,” in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), (IEEE, 2021), pp. 1544–1547.
37. K. Yanny, N. Antipa, W. Liberti, S. Dehaeck, K. Monakhova, F. L. Liu, K. Shen, R. Ng, and L. Waller, “Miniscope3d: optimized single-shot miniature 3d fluorescence microscopy,” Light: Sci. Appl. 9(1), 171 (2020). [CrossRef]
38. E. Tseng, S. Colburn, J. Whitehead, L. Huang, S.-H. Baek, A. Majumdar, and F. Heide, “Neural nano-optics for high-quality thin lens imaging,” Nat. Commun. 12(1), 6493 (2021). [CrossRef]
39. J. Mertz, H. Paudel, and T. G. Bifano, “Field of view advantage of conjugate adaptive optics in microscopy applications,” Appl. Opt. 54(11), 3498–3506 (2015). [CrossRef]
40. A. Paszke, S. Gross, F. Massa, et al., “Pytorch: An imperative style, high-performance deep learning library,” Adv. neural information processing systems 32, 8026–8037 (2019).
41. W. Zhang, H. Zhang, C. J. Sheppard, and G. Jin, “Analysis of numerical diffraction calculation methods: from the perspective of phase space optics and the sampling theorem,” J. Opt. Soc. Am. A 37(11), 1748–1766 (2020). [CrossRef]
42. G. Tyler and D. Fried, “A wave optics propagation algorithm,” Tech. Rep. TR-451, The Optical Sciences Company (1982).
43. W. Brouwer, Matrix Methods in Optical Instrument Design (W. A. Benjamin, 1964).
44. J. D. Schmidt, Numerical simulation of optical wave propagation: With examples in matlab (SPIE, 2010).
45. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” NIPS Autodiff Workshop (2017).
46. R. Remmert, Theory of complex functions, vol. 122 (Springer Science & Business Media, 1991).
47. B. Javidi, A. Carnicer, A. Anand, et al., “Roadmap on digital holography,” Opt. Express 29(22), 35078–35118 (2021). [CrossRef]
48. D. Maluenda, I. Juvells, R. Martínez-Herrero, and A. Carnicer, “Reconfigurable beams with arbitrary polarization and shape distributions at a given plane,” Opt. Express 21(5), 5432–5439 (2013). [CrossRef]
49. B. Ferdman, E. Nehme, L. E. Weiss, R. Orange, O. Alalouf, and Y. Shechtman, “Vipr: Vectorial implementation of phase retrieval for fast and accurate microscopic pixel-wise pupil estimation,” Opt. Express 28(7), 10179–10198 (2020). [CrossRef]
50. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).
51. S. Moser, M. Ritsch-Marte, and G. Thalhammer, “Model-based compensation of pixel crosstalk in liquid crystal spatial light modulators,” Opt. Express 27(18), 25046–25063 (2019). [CrossRef]
52. Y. Zhang, B. Xiong, Y. Zhang, Z. Lu, J. Wu, and Q. Dai, “Dilfm: an artifact-suppressed and noise-robust light-field microscopy through dictionary learning,” Light: Sci. Appl. 10(1), 152 (2021). [CrossRef]
53. E. Nehme, B. Ferdman, L. E. Weiss, T. Naor, D. Freedman, T. Michaeli, and Y. Shechtman, “Learning optimal wavefront shaping for multi-channel imaging,” IEEE Trans. Pattern Anal. Mach. Intell. 43(7), 2179–2192 (2021). [CrossRef]
54. R. V. Shack and K. Thompson, “Influence of alignment errors of a telescope system on its aberration field,” in Optical Alignment I, vol. 251 (SPIE, 1980), pp. 146–153.
55. F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “Bdd100k: A diverse driving dataset for heterogeneous multitask learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2020), pp. 2636–2645.
56. M. Hirsch, S. Sra, B. Schölkopf, and S. Harmeling, “Efficient filter flow for space-variant multiframe blind deconvolution,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (IEEE, 2010), pp. 607–614.
57. L. Denis, E. Thiébaut, F. Soulez, J.-M. Becker, and R. Mourya, “Fast approximations of shift-variant blur,” Int. J Comput. Vis. 115(3), 253–278 (2015). [CrossRef]
58. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, (2017), pp. 2980–2988.
59. R. Orange-Kedem, E. Nehme, L. E. Weiss, B. Ferdman, O. Alalouf, N. Opatovski, and Y. Shechtman, “3d printable diffractive optical elements by liquid immersion,” Nat. Commun. 12(1), 3067 (2021). [CrossRef]
60. Q. Fu, H. Amata, and W. Heidrich, “Etch-free additive lithographic fabrication methods for reflective and transmissive micro-optics,” Opt. Express 29(22), 36886–36899 (2021). [CrossRef]
61. C. A. Metzler and G. Wetzstein, “Deep S3PR: Simultaneous source separation and phase retrieval using deep generative models,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2021), pp. 1370–1374.
62. S. Coy, “Choosing mesh spacings and mesh dimensions for wave optics simulation,” in Advanced Wavefront Control: Methods, Devices, and Applications III, vol. 5894 (International Society for Optics and Photonics, 2005), p. 589405.
63. F. Zhang, I. Yamaguchi, and L. Yaroslavsky, “Algorithm for reconstruction of digital holograms with adjustable magnification,” Opt. Lett. 29(14), 1668–1670 (2004). [CrossRef]
64. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2016), pp. 770–778.
65. S. Abrahamsson, J. Chen, B. Hajj, S. Stallinga, A. Y. Katsov, J. Wisniewski, G. Mizuguchi, P. Soule, F. Mueller, C. D. Darzacq, X. Darzacq, C. Wu, C. I. Bargmann, D. A. Agard, M. Dahan, and M. G. L. Gustafsson, “Fast multicolor 3d imaging using aberration-corrected multifocus microscopy,” Nat. Methods 10(1), 60–63 (2013). [CrossRef]
66. M. Ovesnỳ, P. Křížek, J. Borkovec, Z. Švindrych, and G. M. Hagen, “Thunderstorm: a comprehensive imagej plug-in for palm and storm data analysis and super-resolution imaging,” Bioinformatics 30(16), 2389–2390 (2014). [CrossRef]