Abstract
Modern design of complex optical systems relies heavily on computational tools. These frequently use geometrical optics as well as Fourier optics. Fourier optics is typically used for designing thin diffractive elements, placed in the system’s aperture, generating a shift-invariant Point Spread Function (PSF). A major bottleneck in applying Fourier optics to many cases of interest, e.g. when dealing with multiple or out-of-aperture elements, stems from numerical complexity. In this work, we propose and implement an efficient and differentiable propagation model based on the Collins integral, which enables the optimization of diffractive optical systems with unprecedented design freedom using backpropagation. We demonstrate the applicability of our method, numerically and experimentally, by engineering shift-variant PSFs via thin plate elements placed in arbitrary planes inside complex imaging systems, performing cascaded optimization of multiple planes, and designing optimal machine-vision systems by deep learning.
© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Designing optical imaging systems can be a challenging task, addressable by various approaches. Geometrical optics (ray tracing), while commonly used, is a crude approximation of wave optics. Still, it is useful for designing thick elements and complex stacks of lenses. Today, a prevalent approach [1] is to use a hybrid model, where first ray-tracing is employed to efficiently compute the Optical-Path Difference (OPD) from the object space to the exit pupil. Next, a diffraction model, based on Fourier optics, is used to compute the 2D/3D Point-Spread Function (PSF) on the detector plane. Existing optical engineering tools (e.g. Zemax [2], CodeV [3], etc.) can then be used to optimize the optical components by numerically minimizing a hand-crafted merit function, usually with a limited number of degrees of freedom, as ray tracing software typically models smooth surface profiles (e.g. Zernike polynomials). This approach can take into account the thickness of optical elements, spatially variant aberrations, dispersion effects and more. Furthermore, the introduction of freeform optics [4] can increase the design space. Geometrical optics is not suitable, however, for applications such as phase retrieval [5–8], adaptive optics [9,10], or PSF engineering [11–14], i.e. any application that requires correctly modelling the diffraction-limited PSF, designing systems without light focusing such as diffractive networks [15], or employing optical elements that manipulate the phase, amplitude, or polarization of light in intermediate planes. These applications require wave optics [16], usually limited to the thin-element and paraxial approximations.
Fourier optics is exceptionally useful in the field of computational imaging, where optical systems are co-designed alongside analysis algorithms to yield powerful imaging systems with non-conventional capabilities [17]; applications range from High Dynamic Range imaging [18,19] and color sensing [20,21] to end-to-end wavefront shaping [22–24] and more. Due to their reliance on Fourier optics, these applications have traditionally been limited to the optimization of a single optical element, typically placed in the system’s aperture. Recently, alternative end-to-end computational photography methods based on ray tracing have been proposed [25–27], showing promise in computer vision applications. These methods optimize lens stacks and can design polynomial surfaces. Some limitations of these methods can be directly addressed using wave optics, e.g. modelling non-ideal point sources (such as electrical dipoles), modelling sources that are not placed at infinity, incorporating edge diffraction effects, multiple apertures, Fresnel transfer coefficients and more. Other techniques which use ray tracing-based Monte Carlo methods [28] exist and can be suitable for specific challenging cases (such as very short propagation distances) but are not suitable for direct optimization of optical systems.
One applicable approach for analyzing and designing optical systems with simple components is using the Collins integral [29,30], a parabolic phase case of the Fractional Fourier transform [31,32], which facilitates an angular-spectrum-based method to compute the diffraction integral using the ray transfer matrix. Recently, a cascaded diffraction model based on the Collins integral was implemented to model the diffraction of a series of diaphragms in complex optical systems [33,34] to simulate spatially variant edge-diffraction effects. The effects of spatially-variant PSFs are usually digitally corrected (post-acquisition), as they can degrade the performance of algorithms that assume a single PSF throughout the field of view [35,36]; however, spatially-variant PSFs (in Fourier optics) can also be useful, for example: micro-lens arrays with varying focal lengths [37], local color sensing [21], metasurface-optics design [38] and conjugate adaptive optics [39].
In this work, we propose and implement a differentiable propagation model based on backpropagation, enabling inverse design of optical systems, relying on Fourier optics, with unprecedented design freedom. The main contribution of our method is in providing a computationally efficient platform to directly optimize and perform phase retrieval for challenging systems. This allows us to engineer PSFs using thin plate elements placed in any plane in an optical system, design spatially-variant PSFs, perform cascaded optimization of multiple planes and optimize element placement jointly with the element phase functions. These capabilities are all made possible by implementing a differentiable version of the Collins integral using Pytorch [40]. This approach, while still within the boundaries of the thin-element and paraxial approximations, offers several advantages: (1) it is computationally efficient, alleviating the strict sampling requirements of Fourier optics and requiring only a few FFT operations, (2) it allows the optimization of the positions and functions (i.e. phase/amplitude profiles) of many elements in a cascaded optical system, (3) it enables the optimization of diffraction-limited systems, and (4) a hybrid approach can be readily superimposed on our method, where thick-element and spatial aberrations (e.g. from ray tracing software) can be accounted for. We apply our method numerically and experimentally to several design challenges focusing on shift-variant PSF engineering in microscopy and machine vision.
2. Cascaded diffraction model
2.1 Fresnel propagation in free space
Computing free space propagation under the Fresnel approximation requires evaluation of the Fresnel integral [16] in each transition between elements:
$$E_{2}\left (x_{2},y_{2}\right )=\frac {e^{ikz}}{i\lambda z}\iint E_{1}\left (x_{1},y_{1}\right )\exp {\left [\frac {ik}{2z}\left (\left (x_{2}-x_{1}\right )^{2}+\left (y_{2}-y_{1}\right )^{2}\right )\right ]}\,dx_{1}dy_{1}. \tag{1}$$
Numerically evaluating Eq. (1) is challenging, as the Nyquist frequency of the chirp functions ($\propto \exp {\left (i\cdot x^{2}\right )}$) increases radially, i.e. larger apertures require finer sampling. Efficient methods to evaluate Eq. (1) have been extensively studied [41].
2.2 Fresnel propagation in a complex optical system
In this work we focus mainly on an angular-spectrum method termed the Collins integral [29]. This formulation relates geometrical optics ray tracing to the Huygens-Fresnel wave propagation theory. Specifically, we use a scaled version, similar to the approach by G. A. Tyler and D. L. Fried [42], which allows for some flexibility in choosing the sampling frequency and grid size. At the heart of our method is the calculation of diffraction between planes of interest (i.e. apertures, phase or amplitude elements) in an optical system. We now derive an expression for the propagation in segment j between two planes in the system. In geometrical optics, the well-known ABCD matrix [43] defines the transformation of a light ray (defined by lateral position and angle) between two planes. We write the ABCD matrix as a 2×2 matrix, assuming rotational symmetry:
$$\mathcal {M}_{j}=\begin {pmatrix} A_{j} & B_{j}\\ C_{j} & D_{j} \end {pmatrix}. \tag{2}$$
Using the ABCD matrix, the scaled Collins integral defines the wave propagation (see Appendix 1 for the full derivation).

In our implementation, we treat the real and imaginary parts separately by splitting the field $E=A \exp {\left (i\Phi \right )}$ into $\left (u,v\right ) = \left (\Re {\left (E\right )},\Im {\left (E\right )}\right )$ channels. Then, we utilize automatic differentiation [45] (i.e. back-propagation) of a real-valued cost function $L$ w.r.t. a complex intermediate variable $z=u+iv$ by taking the gradient as $\nabla _L(z) = \frac {\partial L}{\partial u}+i\frac {\partial L}{\partial v}$. This approach also allows using the generalized chain rule [46] to compute the composition of functions in the cascaded propagation.
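As an illustration, the real/imaginary-channel treatment can be reproduced in a few lines of PyTorch; the quadratic cost below is a hypothetical stand-in for the actual cost functions used in this work:

```python
import torch

# Split the complex field E into two real channels (u, v) = (Re E, Im E)
# so that standard real-valued autodiff applies.
u = torch.randn(64, 64, requires_grad=True)
v = torch.randn(64, 64, requires_grad=True)

E = torch.complex(u, v)             # intermediate complex variable z = u + iv
intensity = E.real**2 + E.imag**2   # |E|^2, the quantity a camera measures
L = intensity.sum()                 # real-valued scalar cost (toy example)
L.backward()

# Gradient w.r.t. the complex variable, as defined in the text;
# for L = sum(|E|^2) it equals 2E.
grad_z = u.grad + 1j * v.grad
```

Recent PyTorch versions also support complex tensors natively in autograd; the two-channel formulation remains convenient when composing with purely real-valued operations.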
To produce a differentiable diffraction model from an input plane to an off-axis position on an output plane, we use the matrix $\mathcal {M}_{j}$ (as defined in Eq. (2)) to shift the coordinate system $r_{2}$ [33] by tracing the geometrical chief ray. Then we evaluate Eq. (4) to get the diffraction pattern. The shifting operator can be implemented either by an FFT operation (for sub-pixel resolution) or via a shifted crop operator (for computational speed and efficiency). This approach removes linear phases from the propagation computation, allowing for smaller sampling grids and avoiding aliasing in large FOV simulations. A minor projection correction (for small angles) could be considered here; however, we neglect this effect to conserve memory. Finally, if the output plane is not the final image plane, an additional step of multiplying with the shifted element at the output plane and re-centering is performed to enable the cascaded propagation to the following sections in the system.
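A minimal NumPy sketch of the FFT-based sub-pixel shifting operator (the function name and the periodic boundary handling are illustrative choices, not the paper's exact implementation):

```python
import numpy as np

def fft_shift_field(field, dx_pix, dy_pix):
    """Shift a 2-D complex field by (possibly fractional) pixel amounts
    by applying a linear phase ramp in Fourier space (periodic wrap-around)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx)  # cycles per pixel along x (last axis)
    fy = np.fft.fftfreq(ny)  # cycles per pixel along y (first axis)
    ramp = np.exp(-2j * np.pi * (fy[:, None] * dy_pix + fx[None, :] * dx_pix))
    return np.fft.ifft2(np.fft.fft2(field) * ramp)
```

The cheaper alternative mentioned in the text, a shifted crop, is a single slicing operation with integer-pixel resolution; both are differentiable.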
In Appendix 3 we demonstrate the numerical optimization of two phase elements to facilitate a multi-focus RGB PSF, i.e. mapping a point in 3D object space to a point in 2D image space, where the lateral position in image space depends on the axial object-space position; and, importantly, different colors are to be mapped identically. This approach can be used to readily optimize multiple elements in a complex optical system for on-axis propagation, e.g. in holography [47] or laser mode manipulations [48]. This is not a suitable approach for imaging systems (as it considers only on-axis PSF in a shift-variant system), which are the focus of the rest of this paper. The intermediate conclusion is that addressing and optimizing the shift-variant PSF (by sampling PSFs off-axis) is crucial for making this method viable, thus, this is the approach we take in the remainder of this work.
3. Experimental demonstrations
In this section we detail three experiments performed to validate the capabilities of the method to design thin elements, focusing on phase masks. We used an epi-fluorescence microscope (40X magnification and 0.75 NA) with an extended 4f system (two achromatic lenses with focal lengths of 10 and 20 cm), as shown in Fig. 1(a).
A Liquid-Crystal Spatial Light Modulator (LC-SLM, Pluto VIS-130 Holoeye) is placed 19.6 cm after the first lens, namely, downstream from the Fourier plane, behind a linear polarizer (LP) and an aperture (to filter stray light) matching the back focal plane size of the objective lens (3.75 mm). The LC-SLM plane is sampled on a 16 $\mu m$ grid (the physical pixel size is 8 $\mu m$). As our approach optimizes the phase pixel-wise, the number of degrees of freedom (pixels used) in each section depends on the specific FOV and is on the scale of 150,000-200,000 pixels.
Our calibration process has two steps; the first is calibrating the microscope aberrations. Our system does not exhibit noticeable FOV-dependent aberrations and is dominated mainly by shift-invariant spherical and astigmatism aberrations. We calibrate our system using a method designed for high-NA objectives [49]. This approach produces a pupil phase correction based on an accurate pixel-wise vectorial diffraction model. The second step is registration of the theoretical and experimental ray tracing of the chief ray from the object plane to the LC-SLM plane; for further details see Appendix 5.
In all the experimental sections, the optimization is performed using the Adam optimizer [50] until the cost function stagnates. This optimization takes $\sim$ two minutes on a laptop with Intel i7-8750H and NVIDIA 8Gb 2070RTX GPU.
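The optimization loop has the generic shape sketched below; the cost function here is a hypothetical placeholder (in practice it is built on the cascaded Collins propagation and the per-experiment costs of Sections 3.1-3.3):

```python
import math
import torch

phi = torch.zeros(32, 32, requires_grad=True)   # phase mask pixels
opt = torch.optim.Adam([phi], lr=1e-2)

def cost(phi):
    # placeholder cost for illustration: drive the mean phase towards pi
    return (phi.mean() - math.pi) ** 2

prev = float("inf")
for _ in range(5000):
    opt.zero_grad()
    L = cost(phi)
    L.backward()
    opt.step()
    if abs(prev - L.item()) < 1e-9:             # stop when the cost stagnates
        break
    prev = L.item()
```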
3.1 Out-of-aperture phase retrieval
The first experiment validates our out-of-aperture phase retrieval method by designing spatially variant PSFs at a specific defocus position. We use fluorescent beads (Invitrogen, 625/645, 200 nm) adhered to a glass coverslip as our sparse point-source sample (see Appendix 5 for details about the sample preparation). The cost function for phase mask $\Phi$ is defined as:
3.2 Analog refocusing
The second experiment validates the estimation of positions and defocus correction by spatially refocusing single emitters in a sparse 3D object space. The challenge here is to produce a single ’in-focus’ image of several objects in the field-of-view which are at different axial positions. Specifically, in our case, the depth of field of the system is $\approx 1.5$ $\mu m$ and we were able to image objects spanning a 9 $\mu m$ axial range. We used a sparse bead sample embedded in gel (see Appendix 5 for sample preparation details) to create a 3D object. First, a focal stack was acquired by shifting the objective lens over the axial range. The focal stack was used to estimate the 3D positions of the emitters (see Appendix 5). Then, we chose some of the emitters spanning different z positions (e.g. 5 in the case presented in Fig. 2) and optimized a phase mask where the cost function was aimed at correcting the defocus phase per emitter, i.e. comparing the theoretical defocused PSF to the in-focus PSF (clear aperture) at $z= \text {AFP}$, where AFP is the actual focal plane for an objective focused $3.7$ $\mu m$ above the coverslip (accounting for the refractive index mismatch between the gel and air):
In this case we chose to bin the camera pixels (to an effective size of $6.9 \mu m$) to increase the signal in the ground-truth estimation and to reduce computation time. After the acquisition of the ground-truth positions, the optimization ran for two minutes and the phase mask was then immediately applied, to minimize drift. As the emitters' beams overlap in the phase mask plane, some contrast is lost due to induced aberrations. In both images in Fig. 2(c), the objective lens is focused close to the axial plane of emitter $\#3$ (at z = 3.7 $\mu m$). Note especially beads $\#1,2,5$, whose visibility improves significantly after refocusing. We note that emitter $\#4$ in the refocused image (Fig. 2(c), right) is not the emitter visible in the left image (Fig. 2(c), left), but a defocused one which cannot be seen without the phase mask - both emitters have a very similar lateral position, as can be seen in the full z-stack shown in Appendix 5 and Fig. 10. Compared to the in-focus emitter $\#3$, it lost 30% of the maximal intensity. The intensity loss ranges from 10% to 65%, depending on the acquired aberration. We chose to use our approach even though this specific case could have been solved semi-analytically by calculating the appropriate defocus phases from Eq. (3) and superimposing them (in the correct lateral positions) on the LC-SLM. The results from this section show the applicability of our method for designing Micro Lens Arrays (MLAs) with different focal planes and spatial positions [37,52] and dynamic analog refocus capabilities that surpass a single Extended Depth Of Field (EDOF) phase mask [53] for sparse samples in terms of photon efficiency, axial range and aberration corrections (which can be applied to each emitter separately).
3.3 Local aberration correction
Correcting the defocus of a sparse sample is a simple case of aberration correction. To show that our approach can apply to other forms of aberrations, we used our system to correct an aberration induced by a thin dielectric Tetrapod phase mask [13], placed before the first lens as shown in Fig. 3(a). This setup induces a phase-only spatially variant aberration at a plane which is not conjugate to our LC-SLM; thus, a full-FOV correction is not possible [39] and we chose to correct the optical path of a select group of emitters, similar to the approach in Fig. 2. This challenge can be approached using our cascaded propagation method directly by estimating the wavefront error per chosen FOV point at the correction plane. Other methods such as local Zernike estimation [35,54] are limited to smooth aberrations, require either stitching or interpolating the aberrations over the FOV, require an extra propagation step to estimate the required correction at the LC-SLM plane, and are slower by two orders of magnitude than a direct approach, as we have previously shown in [49]. For this experiment, we switched the camera to a higher-sensitivity sCMOS (Kinetix, Photometrics). For simplicity, we captured a z-stack (by shifting the objective lens with a step size of 0.15 $\mu m$ over a range of $\left (-3,3\right )\mu m$) of sparse beads on a coverslip sample and then applied our iterative approach to estimate the phase mask $\Phi$ (placed in the LC-SLM plane) which can create such an aberration, using the cost function:
4. End-to-end design
In this section we use our approach to perform end-to-end learning in a computational imaging task. As a benchmark, we apply our model to the task of semantic segmentation on the Berkeley Deep Drive database (BDD100K) [55]. The database contains 100K RGB images, 720x1280 pixels, recorded from the dashboard of a driving car. Each image in the database is segmented into 19 different classes (road, building, car, bicycle, etc.). Previous work [26] showed that optimizing a lens stack with an end-to-end learned geometrical optics design which considers the spatial PSF can improve the results in object detection tasks in computer vision. Our rationale for choosing this application is that a shift-variant PSF could potentially exploit the typical structure in such scenes (the sky is at the top of the image, the road is at the bottom, etc.).
We model image acquisition as an incoherent system with a Fused Quartz diffractive optical element (DOE) instead of an LC-SLM. The considered DOE is simulated with a maximal height of $h_{max} = 1.5$ $\mu m$, having the phase delay function (when placed in air)
$$\Phi \left (x,y;\lambda \right )=\frac {2\pi }{\lambda }\left (n\left (\lambda \right )-1\right )h\left (x,y\right ),$$
where $n\left (\lambda \right )$ is the refractive index of the DOE material. The optimization is performed with a continuous height map, and a softmax function with decreasing temperature is applied to the height map such that at the final iterations the mask is sampled on a feasible fabrication grid with an axial step size of $100$ $nm$. Calculating the spatially variant PSF per pixel and performing a convolution with a full frame is computationally infeasible, thus we adopt a linear PSF interpolation approach [56]. Under this model, the PSFs are calculated on a grid, convolved with a weighted image, and summed to generate the modulated image $I_{mod}\left (n,m;C_{l}\right )$.

Our end-to-end model is composed of two main blocks (see Fig. 4(a)): the optical block and a 24-layer ResNet (see Fig. 6). The optical block models the formation of the final image on the camera; the image is then passed through the ResNet, which generates pixel-wise class predictions. Our loss function consists of three terms:
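The linear PSF interpolation model can be sketched as follows (a NumPy sketch under the stated model; the function names and the FFT-based "same"-size convolution helper are illustrative):

```python
import numpy as np

def _conv_same(img, psf):
    """'Same'-size linear convolution via zero-padded FFTs."""
    H, W = img.shape
    kh, kw = psf.shape
    FH, FW = H + kh - 1, W + kw - 1
    full = np.fft.irfft2(np.fft.rfft2(img, (FH, FW)) *
                         np.fft.rfft2(psf, (FH, FW)), (FH, FW))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + H, left:left + W]

def spatially_variant_image(img, psfs, weights):
    """Approximate a shift-variant blur: each PSF from the coarse grid is
    convolved with a weighted copy of the image and the results are summed.
    `weights` are per-pixel interpolation maps that sum to 1 at every pixel."""
    out = np.zeros_like(img, dtype=float)
    for psf, w in zip(psfs, weights):
        out += _conv_same(img * w, psf)
    return out
```

With a single delta-function PSF and uniform weights, the model reduces to an identity, which is a convenient sanity check of the implementation.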
To test our network we compared the performance of three architectures: a ResNet with standard imaging conditions, namely, blurring the original image; a ResNet with a learnable phase mask in the Fourier plane (yielding a shift-invariant PSF); and a ResNet with a learnable phase mask position and function. In each case, the output of the optical block is the observed image with either an ideal RGB camera (i.e. three color channels, each with a single wavelength and unity response) or a monochromatic camera, which is then passed to the neural network that predicts the pixel-wise segmentation of the image.
For simplicity, our model assumes augmentation of a standard imaging system with a 4f system, with our phase mask placed at some plane in between, with the axial position optimized by the net (Fig. 4(a)). We report the average accuracy achieved on 1000 test images by each architecture (Fig. 4(b)). The standard-PSF architecture predicted 55.8% of the pixels correctly with the RGB camera. When the phase mask was learned but its position was fixed (in the Fourier plane, i.e. a spatially-invariant PSF), the percentage of correctly classified pixels was 57.0%, namely, achieving some improvement over not learning a phase mask at all. Finally, in the spatially-variant design, the network correctly classified 60.4% of the pixels - an improvement of 4.7% (8.2% relative improvement) compared to standard imaging. Similar behavior can be observed with the monochromatic camera (see Fig. 4(b)). The phase mask position was optimized by the net to be 5.1 cm after the first lens in the 4f system, where the focal distances of the first and second lenses are 15 cm and 5 cm, respectively. Thus, we conclude that a shift-variant PSF that can exploit the structure of the data outperforms both standard imaging and shift-invariant PSF engineering. In practice, such a system will require the fabrication of a larger DOE (approximately 8.5 mm in the long axis, compared to a circle with a diameter of 2.5 mm in the aperture), which might increase the costs of fabrication when considering lithography, although recently developed techniques show promising potential for simplified fabrication of large masks [59,60]. We note that this specific scenario (effectively 2D imaging and 3 color channels) could have been addressed without an optical block, by increasing the number of parameters in the neural network and having the network perform digitally the physical operation of the optical setup.
Nevertheless, with the optical block adding 250,000 learnable parameters (pixels in the phase mask) to a ResNet with approximately 20 million parameters, we see an efficient improvement in the results. Interestingly, the output of the optical block can serve as an interpretable intermediate result - or, alternatively, it can guide the algorithmic design process by suggesting an initial image-processing operation that improves the reconstruction net for images obtained by a standard imaging system.
We note that our discussion in this section is purely simulated, and future work on real-world implementation will face various challenges such as calibration, manufacturing errors, modelling inaccuracies and more. The challenges of calibration and modelling can be addressed in a similar manner to the experimental sections of this paper or other wave optics methods [21], while specific computational imaging implementation challenges were addressed in previous work, in the context of geometrical optics design for vehicle cameras [26].
5. Discussion
In this work we demonstrated a flexible and differentiable method to design complex optical systems containing diffractive elements. Our approach allows for simple and direct wave-optics propagation and optimization of multiple elements in a cascaded system, using backpropagation. We validated the applicability of our method to perform out-of-aperture phase retrieval for incoherent sources, enabling the design of spatially-variant PSFs with substantial freedom, e.g. optimization of the element placement, and demonstrated a potential application for autonomous vehicles, performing end-to-end optimization of a computational imaging system for object segmentation, where the position of a phase mask is learned in conjunction with the spatially variant PSF (Section 4).
Three experimental demonstrations were shown in Section 3; first, we designed alphabetical-letter PSFs to demonstrate the pixel-wise optimization capabilities and design freedom. Second, we performed spatially-variant refocusing, suggesting a possibly useful application involving multi-focus imaging. Third, we performed FOV-dependent aberration correction. Our method is based on using paraxial ray-tracing to define the wave propagation via Huygens-Fresnel diffraction. This framework allows computation of cascaded diffraction with spatial information, while supporting fast computations and flexibility in the sampling rate, enabling the optimization of complex optical systems. Possible future applications include online correction of spatially-variant PSFs (e.g. spatial adaptive optics), end-to-end learning for tasks with structured data, source separation [61], coherent system design, and more. In the future, our approach can be combined with geometrical ray tracing, which would allow the consideration of thick-element aberrations, fabrication imperfections, tolerances and other constraints that are straightforward to include in ray tracing. Ultimately, these two approaches can be combined into a hybrid model, exhibiting the best of both worlds.
Appendix 1: Optical simulation model
Propagation model
A common way to perform ray tracing is to define an ABCD matrix which transforms a light ray $\left (x,\theta \right )$ between two planes. We write the ABCD matrix as a 2×2 matrix, which assumes azimuthal symmetry of the ray propagation in the lateral directions; more generally, the ABCD matrix can be written as a 4×4 matrix. For a given section of a system (in air) with a geometrical optics matrix:
In this work, we neglect this effect to conserve memory by using a uniformly sampled grid in the output plane for all incidence angles. Finally, an additional step of multiplying with the shifted element at the output plane and re-centering is performed to enable the cascaded propagation to the following sections in the system. The shifting operation can be performed with sub-pixel precision via FFT or by simply using a shifted cropping operation (faster, but with single-pixel resolution); both are differentiable operations and can be used to back-propagate the gradients.
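For concreteness, the two elementary ray-transfer matrices (free space and a thin lens, both standard results) compose by matrix multiplication in the order the ray encounters the elements; e.g. a unit-magnification 4f relay reduces to $-\mathcal{I}$:

```python
import numpy as np

def free_space(d):
    """ABCD matrix for paraxial propagation over a distance d (in air)."""
    return np.array([[1.0, d], [0.0, 1.0]])

def thin_lens(f):
    """ABCD matrix for a thin lens of focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# 4f relay with equal focal lengths: matrices compose right-to-left
# in the order the ray encounters the elements.
f = 0.1
M = free_space(f) @ thin_lens(f) @ free_space(2 * f) @ thin_lens(f) @ free_space(f)
# M = -I: an inverted image with unit magnification (and B = 0)
```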
Modelling the input point source
Modelling the input source correctly is essential in creating an accurate propagation simulator. One simple choice would be using a 2D Gaussian at the input plane; however, this is an unsatisfactory choice, as the Fourier transform will lack the high-frequency content of a point source. Under the paraxial approximation, the emission of a point source is given by a parabolic phase:
$$E\left (x,y\right )\propto \exp {\left (\frac {ik\left (x^{2}+y^{2}\right )}{2z_{0}}\right )},$$
where $z_{0}$ is the distance between the source and the input plane.
If there is no requirement to propagate the gradient before the Fourier plane (or entrance pupil), then the optimization can directly start from this position, reducing the need to compute the diffraction up to that point, as in the example provided in Fig. 1. In other cases, where elements exist before the Fourier plane or the Fourier plane is entirely skipped in the calculation, we back-propagate the scalar field to the input plane by noting that Eq. (14) can be computed in the reverse direction by using the inverse matrix $\mathcal {M}_{j}^{-1}$.
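A sketch of the paraxial point-source field sampled at the input plane (the function name and grid conventions are illustrative; $z_0$ denotes the source-to-plane distance):

```python
import numpy as np

def paraxial_point_source(N, dx, wavelength, z0):
    """Parabolic-phase approximation of a point source located a distance
    z0 before an N x N input plane sampled with pitch dx."""
    k = 2 * np.pi / wavelength
    x = (np.arange(N) - N // 2) * dx
    X, Y = np.meshgrid(x, x)
    return np.exp(1j * k * (X**2 + Y**2) / (2 * z0))
```

The field is unit-magnitude everywhere, so all the source information is carried by the quadratic phase, which must be sampled finely enough near the grid edges.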
Incoherent image formation
In this section we show a simple optimization of the full image model with spatially variant PSFs (to show the capabilities of the optical approach without the learning stage) that was used to simulate the results in Section 4. We chose a similar optical system and optimized two thin phase elements $\left (\Phi _{1},\Phi _{2}\right )$, placed 1 $cm$ after the first lens and 1.5 $cm$ before the second lens. In Fig. 5 we show a simple optimization case where the input is a 3×3 point grid. The PSFs are sampled on the grid and a phase retrieval optimization routine (minimizing $\mathcal {L}_{\text {A}}$) is run such that the final image ($I_{mod}$) resembles $I_{A}$, an image of the letter "A", i.e.:
Comparison with other propagation methods
The key advantage of using the Collins integral is that it enables skipping entire lens stacks to calculate the diffraction between thin-planes of interest - all it requires is the relatively simple-to-obtain ABCD matrix, while most other methods require expensive computations between each element. To further justify using Collins-integral over other common methods, we examine several numerical issues. First, in principle, Fresnel diffraction can be evaluated as a convolution integral in a single step. However, in cascaded systems, this imposes two main challenges: (1) the sampling of the output is fixed to $\delta _{2} = \frac {\lambda \cdot z}{N\cdot \delta _{1}}$, requiring different zero padding for a set of wavelengths and interpolations to match given restrictions (e.g. camera pixel size or desired sampling in a Diffractive Optical Element (DOE)), (2) correct sampling of the chirp function inside the integral imposes a limit on the maximal propagation distance $z\geqslant \hat {z} = \frac {N \cdot \delta _{1}^{2}}{\lambda }$; at the same time, in a cascaded scenario (where a correct complex field is important, and not just the amplitude), the outer chirp can also induce aliasing. This creates a severely limiting situation where the calculation is correct only for $z = \hat {z}$.
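The two sampling constraints above are simple to evaluate numerically (the helper name is illustrative):

```python
def single_step_fresnel_sampling(N, dx1, wavelength, z):
    """Output pitch of a single-FFT Fresnel step, dx2 = lambda*z/(N*dx1),
    and the shortest alias-free propagation distance z_hat = N*dx1**2/lambda."""
    dx2 = wavelength * z / (N * dx1)
    z_hat = N * dx1**2 / wavelength
    return dx2, z_hat

# e.g. a 512-point grid with 16 um pitch at 550 nm gives z_hat ~ 0.24 m,
# so shorter propagation distances alias in a single-step evaluation.
dx2, z_hat = single_step_fresnel_sampling(512, 16e-6, 550e-9, 0.5)
```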
For a simple example, we can look at the system in Fig. 5. In the scaled Collins formulation, we have chosen to sample the grid with 512 points, using only three propagation operators. With free-space propagation and multiplication by the phase of a lens, this simple system would require at least five propagation operators and considerably inflate the memory in the back-propagation (requiring larger GPUs for optimizing such systems). Using scaled FFT for the Fresnel free-space propagation operators increases the computation time from 0.5 s per iteration to 5 s, and increases the GPU memory use from six FFT operations on a 512-point grid to ten FFT operations on a 2048-point grid. Moreover, using backpropagation with an unscaled Fresnel propagation (i.e. Eq. (1)) required a grid size of 4096, increasing the computation time to 45 s and further inflating the memory. This is a very simple example which requires only three steps, and the savings are still worthwhile - an order of magnitude in time and, as the intermediate calculations are saved for backpropagation, a factor of 25 in memory. In larger systems, such as the lithographic lens in [34], the improvement is much more substantial.
Other existing approaches for calculating Fresnel propagation involve a two-step calculation. Solving Fresnel propagation as a convolution integral can be done by sampling the transfer function in Fourier space (usually for short propagation distances) or in real space (usually for long propagation distances). One way to mitigate the sampling constraints is by performing two single-step propagations [62,63]. All these methods and their variations have different sampling requirements and often require drastic zero-padding to reduce aliasing [41]. When simulating free-space propagation these methods can be viable; however, the Collins approach allows for fewer propagation calculations and for choosing planes with minimal field support to reduce the sampling constraints.
Appendix 2: Neural net and training details
Net architecture
The basic architecture of our neural network, presented in Fig. 6, is similar to the ResNet [64] architecture. First, we apply a conv. layer with a kernel size of 7x7 pixels to capture features of different sizes and to rapidly increase the receptive field of the subsequent layers. Then we concatenate 7 residual blocks of increasing sizes. Each residual block consists of a skip connection between the input and an additional path through 3 conv. layers (kernel size = 3 and padding = 1 for all conv. layers) separated by leaky ReLU and batch normalization layers. The final residual block output is passed through two more conv. layers and a log-softmax layer that outputs the log of the predicted class distribution in each pixel.
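One residual block of the described architecture might look like the following PyTorch sketch; the channel width `c` and the placement of the normalization/activation after the last conv. layer are assumptions, as the text only specifies the layer counts and kernel sizes:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Skip connection around three 3x3 conv layers with batch normalization
    and leaky ReLU, per the description in the text (exact ordering assumed)."""
    def __init__(self, c):
        super().__init__()
        layers = []
        for _ in range(3):
            layers += [nn.Conv2d(c, c, kernel_size=3, padding=1),
                       nn.BatchNorm2d(c),
                       nn.LeakyReLU()]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # identity skip connection: output keeps the input's shape
        return x + self.body(x)
```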
Dataset details
The training set consists of 7000 images and the validation set of 2000 images, both chosen randomly from a subset of the BDD100K database. For the focal loss term we chose $\gamma = 2$, and $\alpha _t$ was set according to the class frequencies in the training set. We used the Adam optimizer with a reduce-on-plateau learning-rate schedule (patience 3, decrease factor 0.1). We trained the neural network on two Titan RTX GPUs for 48 hours. The network was tested on 1000 images (inference on a single image takes $\approx$ 10 ms), and the reported accuracy is the fraction of correctly classified pixels out of the total number of pixels per image.
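The per-pixel focal loss of [58] applied to the network's log-softmax output can be sketched as follows (a hedged PyTorch illustration; the mean reduction and the exact normalization of $\alpha _t$ are assumptions):

```python
import torch

def focal_loss(log_probs, target, alpha, gamma=2.0):
    """Per-pixel focal loss [58] on log-softmax network output.

    log_probs : (N, C, H, W) log class probabilities
    target    : (N, H, W) integer class labels
    alpha     : (C,) per-class weights (assumed derived from the
                training-set class frequencies)
    """
    # log p_t at the true class of each pixel
    logpt = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)  # (N, H, W)
    pt = logpt.exp()
    at = alpha[target]
    # focal term (1 - p_t)^gamma down-weights easy pixels
    return (-at * (1 - pt) ** gamma * logpt).mean()
```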
Results with ResNet-9
To evaluate the contribution of the optical block, we reduced the size of the deep learning architecture from ResNet-20 ($\sim$8 million parameters) to ResNet-9 ($\sim$3.7 million parameters) with the monochromatic camera model. The results and reconstruction quality were degraded, although still good: without PSF engineering (i.e. a standard PSF) the score dropped from 53.5% to 47%, with a mask in the Fourier plane from 54.7% to 49.3%, and with the spatially variant PSF from 57.9% to 47.4%. We conclude that network size is important, and that the optical block contributed more in the smaller architecture than in the larger one. Interestingly, the smaller net was unable to successfully utilize shift-variant PSF engineering.
Loss function
Our loss function consists of three terms:
$\mathcal {L}_{\text {TV}}$ is a Total Variation loss on the network output and $\mathcal {L}_{\text {PSF}}$ is a regularization on the PSF defined as:
Appendix 3: Additional results
3D phase retrieval
In addition to performing phase retrieval for a single axial plane in Fig. 2, we implemented a similar cost function for two axial planes (with eight letters). The result can be seen in Fig. 7. The optimization converged to a less visually satisfying result due to having more constraints to optimize. Nevertheless, this shows the validity of our method for simulating and designing spatially-variant PSFs in 3D.
Numerical example: multi-focus RGB PSF design
In this section we consider the case of optimizing a multi-focus RGB PSF via phase retrieval; the simulation is presented in Fig. 8. The challenge is to design an optical system that maps a point in 3D object space to a point in 2D image space, where the lateral position in image space depends on the axial object-space position (identically for every color).
In [65], a chromatic correction element was placed in conjunction with a binary phase mask to address the large spectral width exhibited by fluorescent molecules. In this section, we show that our model can jointly optimize non-binary phase modulation and chromatic corrections on-axis. A simple 4-f system is presented in Fig. 8(a). Three on-axis RGB (discrete wavelengths of 460, 550, 640 $nm$) point sources are considered (in the figure, the two plotted points indicate a change in axial position of a multicolored source). The first lens has a focal length of 5 cm, the second lens has a focal length of 10 cm, and the aperture diameter is 2.5 mm, which is typical for a Diffractive Optical Element (DOE). The unmodulated RGB PSF of this system is plotted in Fig. 8(b). The considered DOEs are simulated with a maximal height of $h_{max} = 1.5$ $\mu m$, having the phase delay function (when placed in air)
$$\phi \left (x,y;\lambda \right ) = \frac {2\pi }{\lambda }\left [n\left (\lambda \right )-1\right ]h\left (x,y\right ),$$
where $n\left (\lambda \right )$ is the refractive index of the DOE material (Fused Quartz in this case). A first DOE $h_{1}$ is placed at the aperture (Fourier plane), while an optional second DOE $h_{2}$ is placed at a distance $d = 7.5$ $cm$ after the aperture. We used Eq. (14) to calculate the diffraction in all sections of Fig. 8(a). To achieve a multi-focus PSF, we minimize the cost function:
Appendix 4: Sampling requirements
A main consideration in this work is the computational burden of FFT computations on large matrices. As we consider spatially variant PSFs, our matrices have five dimensions $\left (N_{xy},N_{z},N_{c},H,W \right )$: the lateral position, axial position, wavelength, and image height and width, respectively. The size of such matrices can be limited by memory constraints, and thus we chose our sampling grids to minimize $\left (H,W\right )$.
As chirp functions are not band-limited, the main constraint comes from avoiding aliasing inside the limiting apertures at the input and output planes. We note that this consideration must be carefully examined for each sub-system separately, as aliasing can occur if the input field contains high enough frequencies at the edges of the apertures; i.e., reducing the computational complexity comes at the cost that each diffraction calculation is correct only inside the apertures and w.r.t. input fields with a pre-defined bandwidth (limiting the axial range and admissible aberrations). For example, in the simulation provided in Fig. 8 we used $(H,W) = (512,512)$ sampling points, but a larger defocus in the input plane or high-frequency components (e.g. gratings) could necessitate a drastic increase in $\left (H,W\right )$.
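As a rough 1-D, paraxial, free-space illustration of this aliasing constraint (the general multi-section conditions follow [41,62]), Nyquist-sampling the Fresnel chirp at the aperture edge already dictates the minimal grid size:

```python
import numpy as np

def min_grid_for_chirp(aperture, wl, z):
    """Back-of-the-envelope grid size for sampling the Fresnel chirp
    exp(i*pi*x^2/(wl*z)) inside an aperture (1-D, paraxial).

    The chirp's local frequency at position x is x/(wl*z); requiring
    Nyquist sampling at the aperture edge x = aperture/2 gives a pitch
    dx <= wl*z/aperture, i.e. at least ~aperture**2/(wl*z) points.
    Extra input-field bandwidth or defocus tightens this further.
    """
    f_max = aperture / (2 * wl * z)   # highest local chirp frequency
    dx = 1 / (2 * f_max)              # Nyquist sampling pitch
    return int(np.ceil(aperture / dx))
```

For example, a 2.5 mm aperture at $\lambda = 0.5$ $\mu m$ and $z = 8$ cm already requires on the order of 150 points per axis before any safety margin.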
The derivation of the numerical sampling requirements is based on previous work [41,62]. We denote by $P_{in}$ and $P_{out}$ the aperture sizes of the input and output planes, sampled with pitches $\delta _{in}$ and $\delta _{out}$, respectively. We also denote by $\mathcal {M}_{j}$ the ray-optics ABCD matrix of the given section. The first two conditions are general for Fourier optics; they ensure (from geometrical considerations in the paraxial regime) that the input-plane sampling is fine enough to capture the spatial bandwidth and that rays will not wrap around the grid:
Appendix 5: Experimental implementation
This section provides details about our sample preparation, the optical system (2) used, and the acquisition of ground truth for the 3D experiment.
Optical system
The imaging system consists of a Nikon Eclipse-Ti2 inverted fluorescence microscope with a 40X/0.75 NA Nikon objective. An achromatic doublet lens (f=10 cm) images the back focal plane onto an iris. A linear polarizer followed by an LC-SLM (Pluto-VIS130, Holoeye) is placed 10 cm (9.6 cm after calibration; see Section 5) after the iris. Another achromatic doublet lens (f=20 cm) is placed to form a 4f system. Finally, an sCMOS camera (PixeLINK PL-D7512MU-T) with a pixel size of $3.45$ $\mu m$ is placed in the image plane.
Sample preparations
For the beads-on-a-coverslip samples, we used a sparse sample consisting of a glass coverslip (170 $\mu m$, Fisher Scientific) with 200 nm fluorescent beads (Invitrogen, 0.2 $\mu l$ Crimson 625/645 diluted 1:500) adhered to the surface with 1% Polyvinyl alcohol (PVA). For beads immersed in gel (Fig. 3), we used a Mini-PROTEAN gel casting stand (BioRad) with 0.75 mm short glass to prepare acrylamide gel with beads by vertical solidification, allowing the beads to spread sparsely in 3 dimensions. The gel solution with beads was prepared by gently mixing 1.2 $ml$ double-distilled water, 750 $\mu l$ 40% acrylamide/bis-acrylamide solution (Sigma, A7802), 80 $\mu l$ 50% glycerol, 3 $\mu l$ Fluorospheres carboxylate bead solution (Crimson 625/645, Invitrogen), 1.5 $\mu l$ TEMED, and 10 $\mu l$ 10% ammonium persulfate. The gel solution was immediately pipetted into the gel casting stand and gently topped with about 1 $ml$ of double-distilled water to level the gel. Once solidified, the gel was kept in water in the dark at 4°C.
Beads in gel ground truth estimation
In the refocusing experiment (Fig. 3), the experimental ground-truth 3D positions were approximated via a focal scan (Fig. 10). The scan was performed by moving the objective lens in 60 steps of 150 nm each. First, we used ThunderSTORM [66] to detect and localize (in 2D) the emitters from the maximum-intensity projection (Fig. 10(a)). Next, the focus position of each emitter was estimated by fitting a second-order polynomial to its mean intensity across the focal slices (Fig. 10(c)). The mean intensity was estimated by averaging a small $7\times 7$-pixel ROI around each detected emitter. Finally, the axial position was obtained by correcting for the refractive index mismatch between the gel and air.
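The per-emitter focus fit described above can be sketched as follows (a numpy illustration; the ROI extraction and refractive-index correction steps are omitted, and the function name is our own):

```python
import numpy as np

def focus_from_stack(z, intensity):
    """Estimate the in-focus axial position of one emitter by fitting a
    second-order polynomial to its mean ROI intensity across focal
    slices and returning the parabola's vertex.

    z         : axial positions of the focal slices
    intensity : mean ROI intensity of the emitter at each slice
    """
    a, b, _ = np.polyfit(z, intensity, 2)
    return -b / (2 * a)   # vertex of a*z^2 + b*z + c
```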
Camera to phase mask registration
The main spatial calibration that we performed was to calibrate the LC-SLM axial position and the affine transform between the theoretical and experimental ray tracing of the chief ray from the object plane to the LC-SLM plane. The LC-SLM we used is reflective; thus an angle is introduced, which can induce rotations and shearing between the planes. We chose to apply (on the LC-SLM) a phase mask termed the Tetrapod [13]. Many other masks could be used here, but we chose the Tetrapod because a lateral misalignment between the phase mask and the input beam is easily spotted as wobble and aberration of the PSF in the z-stack. A sparse bead sample with a single bead in the FOV was used to create a single point source in the image plane. After finding the optical axis on the camera plane, we shifted the mask by 45 LC-SLM pixels on a square grid of 9 points and found the corresponding shifts in the image plane. Finally, an affine transformation was computed between the shifted phase-mask grid and the theoretically ray-traced positions from the camera plane to the SLM plane. This transformation allowed us not only to engineer the desired PSF at different FOV positions but also to estimate a corrected axial position of the LC-SLM. The latter was achieved using simple ray tracing for a 4-f system (as plotted in Fig. 2) with an LC-SLM placed a distance d behind the aperture, resulting in the chief-ray position $\left (x_{in},y_{in}\right )$ in the intermediate image plane:
Funding
Horizon 2020 Framework Programme (802567); Israel Science Foundation (450/18).
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
References
1. M. Born and E. Wolf, Principles of optics: electromagnetic theory of propagation, interference and diffraction of light (Elsevier, 2013).
2. Zemax, “Zemax opticstudio,” https://www.zemax.com.
3. I. Synopsys, “Code v,” https://www.synopsys.com/.
4. J. P. Rolland, M. A. Davies, T. J. Suleski, C. Evans, A. Bauer, J. C. Lambropoulos, and K. Falaggis, “Freeform optics for imaging,” Optica 8(2), 161–176 (2021). [CrossRef]
5. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]
6. R. P. Millane, “Phase retrieval in crystallography and optics,” J. Opt. Soc. Am. A 7(3), 394–411 (1990). [CrossRef]
7. Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase retrieval with application to optical imaging: a contemporary overview,” IEEE Signal Process. Mag. 32(3), 87–109 (2015). [CrossRef]
8. L.-H. Yeh, J. Dong, J. Zhong, L. Tian, M. Chen, G. Tang, M. Soltanolkotabi, and L. Waller, “Experimental robustness of fourier ptychography phase retrieval algorithms,” Opt. Express 23(26), 33214–33240 (2015). [CrossRef]
9. N. Ji, “Adaptive optical fluorescence microscopy,” Nat. Methods 14(4), 374–380 (2017). [CrossRef]
10. K. M. Hampson, R. Turcotte, D. T. Miller, K. Kurokawa, J. R. Males, N. Ji, and M. J. Booth, “Adaptive optics for high-resolution imaging,” Nat. Rev. Methods Primers 1(1), 68 (2021). [CrossRef]
11. B. Huang, W. Wang, M. Bates, and X. Zhuang, “Three-dimensional super-resolution imaging by stochastic optical reconstruction microscopy,” Science 319(5864), 810–813 (2008). [CrossRef]
12. A. Jesacher, S. Bernet, and M. Ritsch-Marte, “Colored point spread function engineering for parallel confocal microscopy,” Opt. Express 24(24), 27395–27402 (2016). [CrossRef]
13. Y. Shechtman, S. J. Sahl, A. S. Backer, and W. Moerner, “Optimal point spread function design for 3d imaging,” Phys. Rev. Lett. 113(13), 133902 (2014). [CrossRef]
14. S. Elmalem, R. Giryes, and E. Marom, “Motion deblurring using spatiotemporal phase aperture coding,” Optica 7(10), 1332–1340 (2020). [CrossRef]
15. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]
16. J. W. Goodman, Introduction to Fourier optics (Roberts and Company Publishers, 2005).
17. T. Klinghoffer, S. Somasundaram, K. Tiwary, and R. Raskar, “Physics vs. learned priors: Rethinking camera and algorithm design for task-specific imaging,” arXiv preprint arXiv:2204.09871 (2022).
18. M. Alghamdi, Q. Fu, A. Thabet, and W. Heidrich, “Transfer deep learning for reconfigurable snapshot hdr imaging using coded masks,” in Computer Graphics Forum, vol. 40 (Wiley Online Library, 2021), pp. 90–103.
19. Q. Sun, E. Tseng, Q. Fu, W. Heidrich, and F. Heide, “Learning rank-1 diffractive optics for single-shot high dynamic range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 1386–1396.
20. S.-H. Baek, H. Ikoma, D. S. Jeon, Y. Li, W. Heidrich, G. Wetzstein, and M. H. Kim, “Single-shot hyperspectral-depth imaging with learned diffractive optics,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), pp. 2651–2660.
21. H. Arguello, S. Pinilla, Y. Peng, H. Ikoma, J. Bacca, and G. Wetzstein, “Shift-variant color-coded diffractive spectral imaging system,” Optica 8(11), 1424–1434 (2021). [CrossRef]
22. S. Tan, Y. Wu, S.-I. Yu, and A. Veeraraghavan, “Codedstereo: Learned phase masks for large depth-of-field stereo,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 7170–7179.
23. E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. E. Weiss, O. Alalouf, T. Naor, R. Orange, T. Michaeli, and Y. Shechtman, “Deepstorm3d: dense 3d localization microscopy and psf design by deep learning,” Nat. Methods 17(7), 734–740 (2020). [CrossRef]
24. Y. Wu, V. Boominathan, X. Zhao, J. T. Robinson, H. Kawasaki, A. Sankaranarayanan, and A. Veeraraghavan, “Freecam3d: Snapshot structured light 3d with freely-moving cameras,” in European Conference on Computer Vision, (Springer, 2020), pp. 309–325.
25. Z. Li, Q. Hou, Z. Wang, F. Tan, J. Liu, and W. Zhang, “End-to-end learned single lens design using fast differentiable ray tracing,” Opt. Lett. 46(21), 5453–5456 (2021). [CrossRef]
26. E. Tseng, A. Mosleh, F. Mannan, K. St-Arnaud, A. Sharma, Y. Peng, A. Braun, D. Nowrouzezahrai, J.-F. Lalonde, and F. Heide, “Differentiable compound optics and processing pipeline optimization for end-to-end camera design,” ACM Trans. Graph. 40(2), 1–19 (2021). [CrossRef]
27. Q. Sun, C. Wang, F. Qiang, D. Xiong, and H. Wolfgang, “End-to-end complex lens design with differentiable ray tracing,” ACM Trans. Graph 40(4), 1–13 (2021). [CrossRef]
28. J. Arasa, S. Bosch, J. Ferre-Borrull, and C. Pizarro, “Comparison of flux-tracing-based and diffraction-based strategies for optical system evaluation,” in Optical Design and Engineering, vol. 5249 (International Society for Optics and Photonics, 2004), pp. 34–41.
29. S. A. Collins, “Lens-system diffraction integral written in terms of matrix optics,” J. Opt. Soc. Am. 60(9), 1168–1177 (1970). [CrossRef]
30. J. Li and C. Li, “Algorithm study of collins formula and inverse collins formula,” Appl. Opt. 47(4), A97–A102 (2008). [CrossRef]
31. D. Mendlovic and H. M. Ozaktas, “Fractional fourier transforms and their optical implementation: I,” J. Opt. Soc. Am. A 10(9), 1875–1881 (1993). [CrossRef]
32. X. Su, R. Tao, and X. Kang, “Analysis and comparison of discrete fractional fourier transforms,” Signal Processing 160, 284–298 (2019). [CrossRef]
33. H. Gross, “Cascaded diffraction in optical systems. part i: simulation model,” J. Opt. Soc. Am. A 37(2), 240–249 (2020). [CrossRef]
34. H. Gross, “Cascaded diffraction in optical systems. part ii: example calculations,” J. Opt. Soc. Am. A 37(2), 250–256 (2020). [CrossRef]
35. A. Shajkofci and M. Liebling, “Spatially-variant cnn-based point spread function estimation for blind deconvolution and depth estimation in optical microscopy,” IEEE Trans. on Image Process. 29, 5848–5861 (2020). [CrossRef]
36. V. Debarnot and P. Weiss, “Deepblur: Blind identification of space variant psf,” in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), (IEEE, 2021), pp. 1544–1547.
37. K. Yanny, N. Antipa, W. Liberti, S. Dehaeck, K. Monakhova, F. L. Liu, K. Shen, R. Ng, and L. Waller, “Miniscope3d: optimized single-shot miniature 3d fluorescence microscopy,” Light: Sci. Appl. 9(1), 171 (2020). [CrossRef]
38. E. Tseng, S. Colburn, J. Whitehead, L. Huang, S.-H. Baek, A. Majumdar, and F. Heide, “Neural nano-optics for high-quality thin lens imaging,” Nat. Commun. 12(1), 6493 (2021). [CrossRef]
39. J. Mertz, H. Paudel, and T. G. Bifano, “Field of view advantage of conjugate adaptive optics in microscopy applications,” Appl. Opt. 54(11), 3498–3506 (2015). [CrossRef]
40. A. Paszke, S. Gross, F. Massa, et al., “Pytorch: An imperative style, high-performance deep learning library,” Adv. neural information processing systems 32, 8026–8037 (2019).
41. W. Zhang, H. Zhang, C. J. Sheppard, and G. Jin, “Analysis of numerical diffraction calculation methods: from the perspective of phase space optics and the sampling theorem,” J. Opt. Soc. Am. A 37(11), 1748–1766 (2020). [CrossRef]
42. G. Tyler and D. Fried, “A wave optics propagation algorithm,” Tech. Rep. TR-451, The Optical Sciences Company (1982).
43. W. Brouwer, Matrix Methods in Optical Instrument Design (W. A. Benjamin, 1964).
44. J. D. Schmidt, Numerical simulation of optical wave propagation: With examples in matlab (SPIE, 2010).
45. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” NIPS Autodiff Workshop (2017).
46. R. Remmert, Theory of complex functions, vol. 122 (Springer Science & Business Media, 1991).
47. B. Javidi, A. Carnicer, A. Anand, et al., “Roadmap on digital holography,” Opt. Express 29(22), 35078–35118 (2021). [CrossRef]
48. D. Maluenda, I. Juvells, R. Martínez-Herrero, and A. Carnicer, “Reconfigurable beams with arbitrary polarization and shape distributions at a given plane,” Opt. Express 21(5), 5432–5439 (2013). [CrossRef]
49. B. Ferdman, E. Nehme, L. E. Weiss, R. Orange, O. Alalouf, and Y. Shechtman, “Vipr: Vectorial implementation of phase retrieval for fast and accurate microscopic pixel-wise pupil estimation,” Opt. Express 28(7), 10179–10198 (2020). [CrossRef]
50. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).
51. S. Moser, M. Ritsch-Marte, and G. Thalhammer, “Model-based compensation of pixel crosstalk in liquid crystal spatial light modulators,” Opt. Express 27(18), 25046–25063 (2019). [CrossRef]
52. Y. Zhang, B. Xiong, Y. Zhang, Z. Lu, J. Wu, and Q. Dai, “Dilfm: an artifact-suppressed and noise-robust light-field microscopy through dictionary learning,” Light: Sci. Appl. 10(1), 152 (2021). [CrossRef]
53. E. Nehme, B. Ferdman, L. E. Weiss, T. Naor, D. Freedman, T. Michaeli, and Y. Shechtman, “Learning optimal wavefront shaping for multi-channel imaging,” IEEE Trans. Pattern Anal. Mach. Intell. 43(7), 2179–2192 (2021). [CrossRef]
54. R. V. Shack and K. Thompson, “Influence of alignment errors of a telescope system on its aberration field,” in Optical Alignment I, vol. 251 (SPIE, 1980), pp. 146–153.
55. F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “Bdd100k: A diverse driving dataset for heterogeneous multitask learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2020), pp. 2636–2645.
56. M. Hirsch, S. Sra, B. Schölkopf, and S. Harmeling, “Efficient filter flow for space-variant multiframe blind deconvolution,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (IEEE, 2010), pp. 607–614.
57. L. Denis, E. Thiébaut, F. Soulez, J.-M. Becker, and R. Mourya, “Fast approximations of shift-variant blur,” Int. J Comput. Vis. 115(3), 253–278 (2015). [CrossRef]
58. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, (2017), pp. 2980–2988.
59. R. Orange-Kedem, E. Nehme, L. E. Weiss, B. Ferdman, O. Alalouf, N. Opatovski, and Y. Shechtman, “3d printable diffractive optical elements by liquid immersion,” Nat. Commun. 12(1), 3067 (2021). [CrossRef]
60. Q. Fu, H. Amata, and W. Heidrich, “Etch-free additive lithographic fabrication methods for reflective and transmissive micro-optics,” Opt. Express 29(22), 36886–36899 (2021). [CrossRef]
61. C. A. Metzler and G. Wetzstein, “Deep S3PR: Simultaneous source separation and phase retrieval using deep generative models,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2021), pp. 1370–1374.
62. S. Coy, “Choosing mesh spacings and mesh dimensions for wave optics simulation,” in Advanced Wavefront Control: Methods, Devices, and Applications III, vol. 5894 (International Society for Optics and Photonics, 2005), p. 589405.
63. F. Zhang, I. Yamaguchi, and L. Yaroslavsky, “Algorithm for reconstruction of digital holograms with adjustable magnification,” Opt. Lett. 29(14), 1668–1670 (2004). [CrossRef]
64. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2016), pp. 770–778.
65. S. Abrahamsson, J. Chen, B. Hajj, S. Stallinga, A. Y. Katsov, J. Wisniewski, G. Mizuguchi, P. Soule, F. Mueller, C. D. Darzacq, X. Darzacq, C. Wu, C. I. Bargmann, D. A. Agard, M. Dahan, and M. G. L. Gustafsson, “Fast multicolor 3d imaging using aberration-corrected multifocus microscopy,” Nat. Methods 10(1), 60–63 (2013). [CrossRef]
66. M. Ovesnỳ, P. Křížek, J. Borkovec, Z. Švindrych, and G. M. Hagen, “Thunderstorm: a comprehensive imagej plug-in for palm and storm data analysis and super-resolution imaging,” Bioinformatics 30(16), 2389–2390 (2014). [CrossRef]