Fast wavefront sensing method based on diffraction basis vectors for tightly focused optical systems

Yuejia Li; Huiwen Liu; Yingzhe Yang; Jian Bai

doi:10.1364/OE.504059

1. Introduction

Phase retrieval (PR) [1] wavefront sensing technology can accurately deduce wavefront complex amplitude information by analyzing the energy distribution of defocused diffraction patterns. PR technology can achieve high accuracy and large dynamic range with a simple device, making it widely used in various fields such as microscopy, X-ray phase contrast imaging, and optical measurement. Traditional PR theory primarily caters to low numerical aperture (NA) optical systems that comply with the paraxial approximation condition [2], where the diffraction field can be directly described by the Fourier transform [3], inadvertently overlooking polarization effects. However, with the fixed sampling relationship, the applicable range of NA is restricted by system parameters such as pixel size and wavelength, thereby hindering the advancement of wavefront sensing in high-NA systems. Recently, due to the significant applications in the fields of semiconductor lithography [4], microscope imaging [5] and super-resolution imaging [6], PR theories suitable for tightly focused systems [7] have gained considerable attention. For high-NA optical systems, apparent diffraction sidelobes will appear as the longitudinal polarization component of the diffraction field increases [8], necessitating vector correction for diffraction field calculations. Currently, the widely accepted vector diffraction theories include vector Rayleigh-Sommerfeld (VRS) [9] and extended Nijboer-Zernike (ENZ) algorithms [10]. Nonetheless, the VRS theory underestimates the diffraction effect and vectoriality of light waves, resulting in smaller calculated spot sizes in high NA optical systems. While the ENZ theory is valid for systems with NA greater than 0.6, its use of the Bessel function for calculating the diffraction field induces an extremely high time complexity, presenting challenges for applications with complex structures and large-scale data.

Establishing the diffraction model serves as the cornerstone of the PR theory, and the optimization algorithm crucially influences the retrieval performance. Classic iterative optimization algorithms, such as Gerchberg–Saxton [11] and hybrid input-output [12] algorithms, optimize the complex amplitude wavefront by alternately implementing forward and inverse diffraction propagation between input and detection patterns and applying constraints accordingly. With the development of computer science, nonlinear numerical optimization algorithms [13] are gradually used to achieve wavefront reconstruction by finding the optimal solution of complex least squares equation. Here, the calculation of the gradient related to the search direction is a crucial issue. Fienup et al. proposed a method for calculating first-order gradients based on diffraction calculation, capable of achieving high-resolution wavefront detection [14]. However, as the diffraction operator is executed in each iteration, the optimization efficiency is seriously diminished. In addition, for first-order optimization algorithms such as gradient descent methods [15], search routes may become jagged resulting in a greater number of iterations. Second-order optimization algorithms such as Newton methods [16,17] can effectively reduce the number of iterations, but given the size of the cubic of the number of pixels, the Hessian matrix is impractical to compute and store directly.

In this paper, we present a fast wavefront sensing algorithm suitable for high-NA optical systems. The proposed method involves an efficient propagation model based on Debye diffraction theory [18], employing transmission matric to trace polarized beam imaging. In particular, we perform the chirp Z transform (CZT) algorithm to reduce the computational complexity and eliminate the lateral error [19,20]. The numerically orthogonal polynomials are input to the established vector model to generate diffraction basis vectors, and the complex amplitude coefficients are modal optimized in the frequency domain. To improve optimization efficiency, we employ a modified Newton-gradient (MNG) numerical optimization algorithm, calculating the Jacobian and Hessian matrices using ${\mathbb C}{\mathbb R}$ calculus theory [21] to define the search direction. The iterative search continues cyclically until an optimal solution is achieved. To validate the effectiveness of our proposed approach, we conduct numerical simulations closely aligned with the physical model. Results demonstrate that our method achieves high accuracy in reconstructing wavefront in multiple polarization directions while ensuring high computational efficiency and robustness.

2. Basis vectors based on Debye diffraction

In different polarization directions, the wavefront of the entrance pupil is decomposed into numerically orthogonal polynomials, which then undergo Debye diffraction calculation via the CZT algorithm to generate diffraction basis vectors at each defocused position. Compared to widely used semi-analytic theory for generating focal, diffraction operators can effectively improve computational speed.

2.1 Vector diffraction model

Debye diffraction integration is used to achieve vector correction in non-paraxial and near-field regions, which can accurately simulate the polarization with the significant longitudinal component in high-NA optical systems. The diffraction model schematic is as shown in Fig. 1. In Eq. (1), $\lambda$ is the wavelength, z is the propagation distance, n₁ is the refractive index of the image space, θ and φ represent the polar and azimuth angles in the spherical coordinate system, respectively. When the input field is $\overrightarrow {{E_i}} $, the output field $\overrightarrow E $ can be written as:

(1)$$\overrightarrow E ({X,Y,z} )={-} \frac{{iC}}{\lambda }\int\!\!\!\int_\Omega {\left[ {{\mathbf M} \times \overrightarrow {{E_i}} \times {{\exp ({i{k_z}z} )} / {\sqrt {\cos \theta } }}} \right] \times \exp [{ - i({{k_x}X + {k_y}Y} )} ]d} {k_x}d{k_y},$$

where C is the constant factor, M denotes the polarization transformation matrix and the $\overrightarrow {{k_r}} :{({k_x},{k_y},{k_z})^T}$ is the wave vector pointed in the direction of wave propagation:

(2)$${\mathbf M} = \left[ {\begin{array}{{ccc}} {1 + (\cos \theta - 1){{\cos }^2}\varphi }&{(\cos \theta - 1)\cos \varphi \sin \varphi }&{ - \sin \theta \cos \varphi }\\ {(\cos \theta - 1)\cos \varphi \sin \varphi }&{1 + (\cos \theta - 1){{\sin }^2}\varphi }&{ - \sin \theta \sin \varphi }\\ {\sin \theta \cos \varphi }&{\sin \theta \sin \varphi }&{\cos \theta } \end{array}} \right],$$

(3)$$\overrightarrow {{k_r}} = k{n_1}{[ - \sin \theta \cos \varphi , - \sin \theta \sin \varphi ,\cos \theta ]^T},\textrm{ }k = 2\pi /\lambda .$$

Fig. 1. Vector diffraction model schematic.

Download Full Size | PDF

Equation (3) needs to be discretization for calculation, and then it can be written in the form of Fourier integral with $X = 2\pi {f_x},Y = 2\pi {f_y}$, which can be rapidly calculated using the CZT algorithm:

(4)$$\overrightarrow E ({X,Y,z} )={-} \frac{{iC}}{\lambda }{\boldsymbol{ZT}}\left[ {{\mathbf M} \times \overrightarrow {{E_i}} \times {{\exp ({i{k_z}z} )} / {\sqrt {\cos \theta } }}} \right] = {\boldsymbol{P}}\textrm{(}\overrightarrow {{E_i}} ).$$

Here, ZT indicates the execution of the CZT operator. This operator facilitates the flexible configuration of the desired bandwidth and resolution in the frequency domain, precisely matching the pixel size and quantity of the detector. The forward Debye diffraction propagation model in Eq. (4) is represented by the symbol ${\boldsymbol{P}}\textrm{(}\overrightarrow {{E_i}} )$. Importantly, in the established vector diffraction model, the restriction of the sampling relationship in the entrance pupil is effectively overcome, and precise model matching is possible, which will be analyzed in Section 2.3.

2.2 Calculation of diffraction basis vector

As shown in Fig. 2, due to the linear shift-invariant nature of incoherent imaging systems, if the polynomially decomposed wavefront passes through a linear optical system, its focal spot can be fitted by diffraction basis vectors which is calculated by serving the polynomial as the input for the aforementioned propagation model. And the basis vectors in the frequency domain and the polynomial in the spatial domain retain the same coefficients, serving as the optimization variables.

Fig. 2. The calculation model of diffraction basis vectors. The basis vectors are complex amplitude matrix columns, six components are generated at each out-of-focus position.

Download Full Size | PDF

With numerical orthogonality in the unit circle domain, Zernike polynomials are typically selected to fit the wavefront in circular symmetry optical systems. Given that the classical phase retrieval device uses plane wave for illumination, the input field lacks a longitudinal component, hence only the wavefront information in the x and y directions requires polynomial decomposition. Polynomials are represented by the symbol Z, and the wavefront information ${\boldsymbol W}:({{W_x},{W_y}} )$ can be written as:

(5)$${W_x}({\rho ,\theta } )= \sum\limits_j {{\beta _{x,j}}{{\boldsymbol Z}_j}} , \quad {W_y}({\rho ,\theta } )= \sum\limits_{n,m} {{\beta _{y,j}}{{\boldsymbol Z}_j}} ,$$

where $\boldsymbol{\beta}:{({\beta _{x,j}},{\beta _{y,j}})^T}$ is the polynomial coefficients, j is the retrieval symbol for terms of polynomials with a total of J. With time complexity of O(J), the diffraction basis vectors ${{\mathbf C}_{x,k}};{{\mathbf C}_{y,k}};{{\mathbf C}_{z,k}}$ in the three Cartesian coordinate directions can be calculated as:

(6)$$\begin{array}{c} {({{\mathbf C}_{xx,k}},{{\mathbf C}_{yx,k}},{{\mathbf C}_{zx,k}})^T} = {\boldsymbol P}\textrm{[}{({\boldsymbol Z},0,0)^T}];\quad {({{\mathbf C}_{xy,k}},{{\mathbf C}_{yy,k}},{{\mathbf C}_{zy,k}})^T} = {\boldsymbol P}\textrm{[}{(0,{\boldsymbol Z},0)^T}];\\ {{\mathbf C}_{x,k}} = [{{{\mathbf C}_{xx,k}},{{\mathbf C}_{xy,k}}} ];\quad {{\mathbf C}_{y,k}} = [{{{\mathbf C}_{yx,k}},{{\mathbf C}_{yy,k}}} ];\quad {{\mathbf C}_{z,k}} = [{{{\mathbf C}_{zx,k}},{{\mathbf C}_{zy,k}}} ], \end{array}$$

where k is the retrieval coordinate of the defocused positions. The basis vectors can be dimensionally reduced when calculating, and its size is determined by the number of detector pixels, polynomials and defocused images. The calculated light intensity value ${\boldsymbol I}_k^{cal}$ at each defocused position can be written as:

(7)$${\mathbf I}_k^{cal} = {|{{{\mathbf C}_{x,k}}{\mathbf \beta }} |^2} + {|{{{\mathbf C}_{y,k}}{\mathbf \beta }} |^2} + {|{{{\mathbf C}_{z,k}}{\mathbf \beta }} |^2}.$$

2.3 Numerical sampling analysis

In the traditional PR model based on Fresnel diffraction, the sampling grid of the entrance pupil is uniformly divided by the spatial position [22]. And the fixed sampling relationship between the sample spacing in fast Fourier transform (FFT) is:

(8)$${d_\xi } = \lambda z/{N_F}{d_x},$$

where ${d_x}$ and ${d_\xi }$ are the sampling interval of the object wavefront and the pixel size of the detector respectively. And N_F is sample numbers in FFT. Obviously, the diameter D of the optical system needs to meet: $D \le {N_F}{d_x}.$ Substituting into Eq. (8), we get:

(9)$${d_\xi } \le \lambda z/D,$$

in geometric optics relations, $NA \le D/2z,$ therefore, NA has an upper limit:

(10)$$NA \le \lambda /2{d_\xi },$$

when NA exceeds this range, the calculation is undersampled, resulting in the failure of optimization. However, the proposed vector model can effectively break through this constraint by sampling in the two-dimensional wave vector.

Combined with the Eq. (3), when the sampling subscript of ${k_x}$ and ${k_y}$ is set to be m = -M/2, …, M/2-1, n = N/2, …, N/2-1, θ and φ can be rewritten as:

(11)$$\begin{array}{l} \theta = \arcsin \left( {\sqrt {k_x^2 + k_y^2} /kn} \right) = \arcsin \left( {{d_k}\sqrt {{m^2} + {n^2}} /kn} \right),\\ \varphi = \arctan ({k_y}/{k_x}) = \arctan ({n/m} ). \end{array}$$

Therefore, the sampling interval ${d_k}$ is:

(12)$${d_k} = \frac{{kn\sin {\theta _{\max }}}}{{M/2 - 1}}.$$

In the vector model, with ${d_\xi } = 2\pi {d_{{f_x}}},$ the highest analysis frequency is ${N_d}{d_\xi }/4\pi ,$ where ${N_d}$ is the number of pixels of the detector. According to the Nyquist theorem, we get:

(13)$$\frac{1}{{{d_k}}} \ge \frac{{{N_d}{d_\xi }}}{{2\pi }}.$$

What’s more, in the Eq. (4), we consider that the sampling is mainly affected by the factor $\exp ({i{k_z}z} )$. We can analyze in the x dimension, then $\varphi ({{k_x}} )= {k_z}z = z\sqrt {{{({kn} )}^2} - k_x^2}$, and it can be concluded that:

(14)$${\left|{\frac{1}{{2\pi }}\frac{{\partial \varphi ({{k_x}} )}}{{\partial {k_x}}}} \right|_{\max }} \le \frac{1}{{2d{k_x}}}.$$

Combining the Eq. (13) and Eq. (14), the number of samples in the entrance pupil should satisfy:

(15)$$M \ge \max ({N_d}kNA{d_\xi }/\pi + 2,\quad 4NA\tan {\theta _{\max }} \times \max (z)/\lambda + 2),$$

the max function is used to select the maximum of all factors. In addition, in the FFT algorithm:

(16)$${d_\xi } = 2\pi {d_{{f_x}}} = 2\pi \frac{1}{{M{d_k}}} = \frac{{M/2 - 1}}{{Mk \times NA}},$$

where ${d_{{f_x}}}$ is the frequency interval. Obviously, the range of pixel size is not continuous since M is an integer, making it difficult to match detector parameters. Therefore, we choose to use the CZT algorithm to achieve an accurate simulation model by arbitrarily setting the interval between the space domain and the frequency domain, effectively eliminating lateral errors.

In conclusion, the vector forward model proposed in this paper effectively represents the longitudinal components of the diffraction field. In addition, by mapping the sampling target to the wave vector in spherical coordinates, it overcomes the constraints of grid sampling, thereby significantly expanding the applicable range of numerical apertures (NA). Furthermore, to achieve precise model matching, we have introduced the CZT algorithm, allowing arbitrary pixel sizes of the detector.

3. Numerical optimization algorithms

Based on the established diffraction model, the complex coefficients of the polynomial are selected as optimization variables. Compared with directly optimizing the complex amplitude of each pixel, the number of variables is greatly reduced, which will effectively alleviate the search burden, yielding improved optimization performance. In this section, we introduce the MNG second-order algorithm applied to multi-images PR optimization model. Essentially, the optimization evaluation process is an unconstrained optimization problem, which can be formulated as a nonlinear least-squares error equation:

(17)$$\begin{array}{l} \mathop {\min }\limits_{\beta } F = \frac{1}{2}\sum\limits_{k = 1}^K {\left\|{\sqrt {{{\mathbf W}_k}} \cdot ({\hat{{\mathbf I}}_k^{sen} - \hat{{\mathbf I}}_k^{cal}} )} \right\|_2^2} \\ = \frac{1}{2}\sum\limits_{k = 1}^K {\left\|{\sqrt {{{\mathbf W}_k}} \cdot \left( {\hat{{\mathbf I}}_k^{sen} - \frac{{{{|{{{\mathbf C}_{x,k}}{\mathbf \beta }} |}^2} + {{|{{{\mathbf C}_{y,k}}{\mathbf \beta }} |}^2} + {{|{{{\mathbf C}_{z,k}}{\mathbf \beta }} |}^2}}}{{\sum\limits_{s = 1}^S {I_{ks}^{cal}} }}} \right)} \right\|_2^2} , \end{array}$$

where $\hat{{\mathbf I}}_k^{sen}$ and $\hat{{\mathbf I}}_k^{cal}$ represent the measured and calculated values of the normalized light intensity at each defocused position, respectively. s is the retrieval coordinate of the pixel, with a total of S, and $\sqrt {{{\mathbf W}_k}}$ is set to be the weight function used to remove pixels with low signal-to-noise ratio.

Due to the error function with the complex variable $\boldsymbol{\mathrm{\beta}}$ is non-analytic in the complex domain, the ${\mathbb C}{\mathbb R}$ calculus theory is employed to take derivatives. According the ${\mathbb C}{\mathbb R}$ calculus theory, the complex variable $\boldsymbol{\mathrm{\beta}}$ and its conjugated values $\overline{\boldsymbol{\mathrm{\beta}}}$ are treated as independent variables. In the following discussion, we define:

(18)$${{\mathbf f}_k}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )= \sqrt {{{\mathbf W}_k}} \cdot \left( {\hat{{\mathbf I}}_k^{sen} - \frac{{{{|{{{\mathbf C}_{x,k}}\boldsymbol{\mathrm{\beta}}} |}^2} + {{|{{{\mathbf C}_{y,k}}\boldsymbol{\mathrm{\beta}}} |}^2} + {{|{{{\mathbf C}_{z,k}}\boldsymbol{\mathrm{\beta}}} |}^2}}}{{\sum\limits_{s = 1}^S {I_{ks}^{cal}} }}} \right),$$

(19)$${F_k}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )= \frac{1}{2}{{\mathbf f}_k}{({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )^H}{{\mathbf f}_k}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )= \frac{1}{2}\sum\limits_{s = 1}^S {{\textrm{f}_{ks}}({{\beta },\bar{\boldsymbol{\mathrm{\beta}}}} ){{\mathrm{\bar{f}}}_{ks}}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )}.$$

Thus, the error function in Eq. (17) is rewritten as:

(20)$$\mathop {\min }\limits_{\beta } F = \sum\limits_{k = 1}^K {{F_k}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )}.$$

The gradient $\nabla {F_k}$ and the Hessian matrix ${\nabla ^2}{F_k}$ of the error function can be established about Jacobian matrix J_k, such that:

(21)$$\nabla {F_k}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )= {{\mathbf J}_k}{({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )^H}{{f}_k}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} ),{\nabla ^2}{F_k}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )\approx {{J}_k}{({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )^H}{{\mathbf J}_k}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} ),$$

with,

(22)$${{\mathbf J}_k}({\boldsymbol{\mathrm{\beta}},\bar{\boldsymbol{\mathrm{\beta}}}} )= \left[ {\begin{array}{{cc}} {\frac{{\partial {{\mathbf f}_k}}}{{\partial \boldsymbol{\mathrm{\beta}}}}}&{\frac{{\partial {{\mathbf f}_k}}}{{\partial \bar{\boldsymbol{\mathrm{\beta}}}}}} \end{array}} \right].$$

Further derivation according to the Eq. (18), we get:

(23)$$\begin{array}{l} \frac{{\partial {{\mathbf f}_k}}}{{\partial \boldsymbol{\mathrm{\beta}}}} ={-} \textrm{diag}\left( {\sqrt {{{\mathbf W}_k}} } \right)({{\mathbf {eI}}_k^{cal}} )({\textrm{diag}({{{\bar{{\mathbf C}}}_{x,k}}\bar{\boldsymbol{\mathrm{\beta}}}} ){{\mathbf C}_{x,k}} + \textrm{diag}({{{\bar{{\mathbf C}}}_{y,k}}\bar{\boldsymbol{\mathrm{\beta}}}} ){{\mathbf C}_{y,k}} + \textrm{diag}({{{\bar{{\mathbf C}}}_{z,k}}\bar{\boldsymbol{\mathrm{\beta}}}} ){{\mathbf C}_{z,k}}} )\\ - \textrm{diag}({{\mathbf I}_k^{cal}} ){\mathbf E}({\textrm{diag}({{{\bar{{\mathbf C}}}_{x,k}}\bar{\boldsymbol{\mathrm{\beta}}}} ){{\mathbf C}_{x,k}} + \textrm{diag}({{{\bar{{\mathbf C}}}_{y,k}}\bar{\boldsymbol{\mathrm{\beta}}}} ){{\mathbf C}_{y,k}} + \textrm{diag}({{{\bar{{\mathbf C}}}_{z,k}}\bar{\boldsymbol{\mathrm{\beta}}}} ){{\mathbf C}_{z,k}}} )/{({{\mathbf{eI}}_k^{cal}} )^2}; \end{array}$$

(24)$$\begin{array}{l} \frac{{\partial {{\mathbf f}_k}}}{{\partial \bar{\boldsymbol{\mathrm{\beta}}}}} ={-} \textrm{diag}\left( {\sqrt {{{\mathbf W}_k}} } \right)({{\mathbf{eI}}_k^{cal}} )({\textrm{diag}({{{\mathbf C}_{x,k}}\boldsymbol{\mathrm{\beta}}} ){{\bar{{\mathbf C}}}_{x,k}} + \textrm{diag}({{{\mathbf C}_{y,k}}\boldsymbol{\mathrm{\beta}}} ){{\bar{{\mathbf C}}}_{y,k}} + \textrm{diag}({{{\mathbf C}_{z,k}}\boldsymbol{\mathrm{\beta}}} ){{\bar{{\mathbf C}}}_{z,k}}} )\\ - \textrm{diag}({{\mathbf I}_k^{cal}} ){\mathbf E}({\textrm{diag}({{{\mathbf C}_{x,k}}\boldsymbol{\mathrm{\beta}}} ){{\bar{{\mathbf C}}}_{x,k}} + \textrm{diag}({{{\mathbf C}_{y,k}}\boldsymbol{\mathrm{\beta}}} ){{\bar{{\mathbf C}}}_{y,k}} + \textrm{diag}({{{\mathbf C}_{z,k}}\boldsymbol{\mathrm{\beta}}} ){{\bar{{\mathbf C}}}_{z,k}}} )/{({{\mathbf{eI}}_k^{cal}} )^2}, \end{array}$$

where e is a row vector of length S, and E is a two-dimensional matrix with the size of $S \times S$, where all elements in the vector and matrix are 1. Therefore, the time complexity with respect to the number of sampled pixels and the number of polynomial terms of the gradient solution is O(1) and O(J), respectively.

The search direction ${{\mathbf d}_t}$ is solved using the existing conjugate-gradient method:

(25)$${\nabla ^2}F({{\boldsymbol{\mathrm{\beta}}_t},{{\bar{\boldsymbol{\mathrm{\beta}}}}_t}} )[_{\overline {{{\mathbf d}_t}} }^{{{\mathbf d}_t}}] ={-} \nabla F({{\boldsymbol{\mathrm{\beta}}_t},{{\bar{\boldsymbol{\mathrm{\beta}}}}_t}} ).$$

In particular, if $\nabla F({{\boldsymbol{\mathrm{\beta}}_t},{{\bar{\boldsymbol{\mathrm{\beta}}}}_t}} ){{\mathbf d}_t} \ge 0$, then it should make ${{\mathbf d}_t} ={-} \nabla F({{\boldsymbol{\mathrm{\beta}}_t},{{\bar{\boldsymbol{\mathrm{\beta}}}}_t}} )$, which ensures that ${{\mathbf d}_t}$ is the descent direction of $F({{\boldsymbol{\mathrm{\beta}}_t},{{\bar{\boldsymbol{\mathrm{\beta}}}}_t}} )$ at $\boldsymbol{\mathrm{\beta}}$.

Next, we update the optimization value: ${\boldsymbol{\mathrm{\beta}}_{t + 1}} = {\boldsymbol{\mathrm{\beta}}_t} + {\alpha _t}{{\mathbf d}_t}$, where ${\alpha _t}$ is step size factor calculated using the clue search method based on Armijo criterion, and iteration is continued until the set termination condition is met.

The above optimization process is shown in Fig. 3, combined with the established vector model, the first-order gradient and second-order gradient of the error function with respect to the optimization variables can be analytically expressed based on the basis vectors, thereby realizing the non-diffraction iterative process, which greatly reduces the computational complexity and realizes high-precision wavefront sensing.

Fig. 3. The flowchart of the MNG optimization algorithm.

Download Full Size | PDF

4. Simulation results and analysis

4.1 Effectiveness of vector PR model

In this section, we conduct extensive numerical simulations to verify the accuracy and efficiency of the proposed method. First, we generate the diffraction focal spots of different types of polarized light in Table 1 and analyze the significant vector polarization characteristics of each field distribution. As shown in Fig. 4, when the polarized light passes through the optical system with an NA of 0.95, the significant diffraction sidelobes greatly affects the total intensity distribution. For the linearly polarized (LP) on the x-axis, circularly polarized (CP) light and angularly polarized (AP) light, the transverse fields still dominate. However, due to the influence of longitudinal components, the total intensity distribution becomes elliptical, a larger dispersed circular and ring shape, respectively. In contrast, for radial polarized (RP) light, the longitudinal component is stronger and its half-power beam width is much smaller than that of the circular total intensity. The above results are completely consistent with the diffraction rules, which cannot be explained by scalar diffraction theory, and further proves the necessity and accuracy of vector model.

Fig. 4. The normalized transverse, longitudinal, and total light intensity of the diffraction fields in the system with NA of 0.95.

Download Full Size | PDF

Table 1. Components in different directions of polarized beams

View Table | View all tables in this article

Next we input each polynomial term into the vector model to generate diffraction basis vectors, and then we reconstruct the complex amplitude wavefront in the transverse polarization direction using the MNG method. In numerical simulations, the CP illumination wavelength is 632.8 nm, the NA of the optical system is equal to 0.65. We use a CCD with a pixel size of 4.4 µm and a resolution of 128 × 128. Due to the large pixel size, the imaging spot can only occupy a few pixel units at the best focus position, so we choose to measure the imaging spot at a defocus distance Δz = [0.15, 0.2, 0.25]mm to obtain more information. We fit a wavefront using the randomly generated 66 terms of Zernike coefficients, the Fig. 5 shows the true object wavefront and the reconstruction results.

Fig. 5. The true wavefront and reconstruction results in the x and y directions.

Download Full Size | PDF

Root Mean Square Error (RMSE) is a commonly used metric to measure the average magnitude of the errors between predicted and actual values, which is calculated by taking the square root of the average of the squared differences. Therefore, we visually evaluated the reconstruction results with RMSE values, and recorded its trend with the number of iterations, as shown in Fig. 6. When iterating 70 times, the results gradually converge, and the RMSE of phase reconstruction can reach 3.7155 × 10⁻⁴$\lambda$ and 8.2606 × 10⁻⁴$\lambda$ in the x and y directions, and the RMSE of amplitude reconstruction can achieve 4.4 × 10⁻³$\lambda$ and 4.1 × 10⁻³$\lambda$, which demonstrate the effectiveness and high accuracy of the proposed method.

Fig. 6. The RMSE values with the iteration number in the x and y directions.

Download Full Size | PDF

In order to better meet application requirements, especially when higher-order aberrations are present, we need to consider a larger number of polynomials. Therefore, we used an increasing number of polynomial coefficients to generate and reconstruct wavefront, and each case included 10 sets of randomly generated datasets. For more intuitive analysis, we recorded the number of iterations required to achieve an RMSE below 0.001$\lambda$. As shown in Fig. 7, there is a gentle increase in the number of iterations corresponding to the growing count of polynomial coefficients, but the growth rate is exceedingly gradual, reflecting that the number of iterations is highly dependent on the characteristics of the samples. Overall, when the number of polynomials is less than 300, the number of iterations can generally be kept within 100.

Fig. 7. Number of iterations required to achieve an RMSE below 0.001$\lambda$.

Download Full Size | PDF

In optical measurements, achieving precise alignment of detectors is a formidable challenge. Consequently, we further explored the sensitivity of the reconstruction accuracy to the defocus and lateral displacement of the detector. We conducted simulations to capture the light intensity data when the detector's position was adjusted both in the transverse (x-y plane) and z-direction, followed by an assessment of the RMSE for the reconstruction results, as shown in Table 2. It can be concluded that the detector is more sensitive to positional variations in the z-direction, and in an ideal experimental environment, the algorithm demonstrates the capability to accommodate position errors at the micron-level.

Table 2. Effect of position error on algorithm calculation accuracy

View Table | View all tables in this article

4.2 Anti-noise performance analysis

In this section, we add the Gaussian noise and analyze the noise robustness performance of the optimization algorithm when signal-to-noise ratio (SNR) ranging from 25 to 75 dB. Since the phase contains more information, we take the phase reconstruction results as examples for analysis. The reconstructed phase results and residuals of 100 times iteration without noise are shown in Fig. 8. The RMSE values of the reconstructed wavefront in the x and y directions are 1.415 × 10⁻⁴$\lambda$ and 1.2548 × 10⁻⁴$\lambda$, respectively.

Fig. 8. The reconstruction wavefront and residual without noise.

Download Full Size | PDF

For each noise environment, we randomly generated 60 simulated datasets, and the RMSE values of the reconstructed results are shown in the Fig. 9. As expected, the RMSE is barely affected with high SNR. When the signal-to-noise ratio is 75 dB, the average value of RMSE is 3.7155 × 10⁻⁴$\lambda$ and 8.2606 × 10⁻⁴$\lambda$ in the x and y directions, respectively, and the error increases gradually as the signal-to-noise ratio decreases. When the signal-to-noise ratio is 25 dB, the value of RMSE is 0.0289$\lambda$ and 0.0327$\lambda$.

Fig. 9. RMSE values of the 60 reconstruct results in each group when SNR ranging from 25 to 75 dB. (a) and (b) show the box plots about RMSE in the x direction. (c) and (d) show the box plots about RMSE in the y direction. (e) shows the average of the RMSE.

Download Full Size | PDF

4.3 Comparison of calculation time

To verify the high computational efficiency, we contrast the proposed method with the classic first-order conjugate-gradient (CG) descent and second-order Levenberg–Marquardt (LM) algorithm. The optimization results of the three methods on the wavefront fitted with 16 Zernike coefficients are shown in Fig. 10. As expected, the MNG algorithm achieves higher optimization accuracy with fewer iterations. When the CG algorithm iterates 50000 times, the RMSE of the reconstruction result can reach 0.0055$\lambda$, which takes 302.28 seconds. The LM algorithm demonstrates excellent optimization efficiency. However, due to the penalty factor introduced to ensure the full rank of the Jacobian matrix, the convergence speed is affected. When the LM algorithm iterates 500 times, the RMSE of the reconstructed wavefront is 0.0048λ. In contrast, the MNG algorithm only requires dozens of iterations to improve the reconstruction results by several orders of magnitude, such as the computational time is reduced to 1.87 seconds when the iterations is 20, and the RMSE can be increased to 9.7142 × 10⁻⁵$\lambda$.

Fig. 10. The reconstruction results and residuals optimized using the two methods.

Download Full Size | PDF

In addition, we test the convergence speed of the RMSE and error function, as shown in Fig. 11. Obviously, The MNG algorithm exhibits excellent performance for more complex vector optimization, achieving faster square convergence than CG algorithm with linear convergence.

Fig. 11. The plots of RMSE and error functions versus number of iterations.

Download Full Size | PDF

5. Conclusion

In this paper, we propose a fast wavefront sensing method suitable for tightly focused systems. Firstly, we establish a vector Debye diffraction model based on the CZT algorithm to simulate imaging, effectively eliminating lateral error and overcoming the constraints of undersampling. Furthermore, we calculate the imaging distribution of each polynomial fitting the wavefront, generating the diffraction basis vector. Due to the linear invariance of incoherent optical systems, the polynomial coefficients of the object wavefront align with the basis vector coefficients of the imaging patterns. Consequently, we construct a modal PR method since the wavefront can be reconstructed by predicting the complex amplitude coefficients of the polynomials, instead of optimizing pixel values individually. Specifically, we propose a modified Gauss-Newton optimization algorithm, applying ${\mathbb C}{\mathbb R}$ calculus theory to calculate the gradient and utilizing the conjugate-gradient method to define the search direction, enabling swift and accurate optimization variable solving. Lastly, we validate the method's effectiveness and analyze the noise immunity performance and speed advantage. Experimental results reveal that the RMSE of the reconstruction result can achieve sub-nanometer precision at the very least, and it presents robust noise immunity. Compared to traditional optimization algorithms, our proposed method can significantly reduce the number of iterations, decrease computation time, and improve retrieval accuracy. In conclusion, our method demonstrates the feasibility of aberration retrieval based on full vector diffraction, providing valuable theoretical guidance for in-situ detection for high-NA optical systems.

Funding

National Natural Science Foundation of China (62175211).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. Fienup, “Phase Retrieval Algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]

2. W. Zhang, H. Zhang, C. Sheppard, and G. Jin, “Analysis of numerical diffraction calculation methods: from the perspective of phase space optics and the sampling theorem,” J. Opt. Soc. Am. A 37(11), 1748–1766 (2020). [CrossRef]

3. Y. Xiao, X. Tang, Y. Qin, H. Peng, W. Wang, and L. Zhong, “Nonuniform fast Fourier transform method for numerical diffraction simulation on tilted planes,” J. Opt. Soc. Am. A 33(10), 2027–2033 (2016). [CrossRef]

4. Y. Pan and X. Ma, “Informatics-based computational lithography for phase-shifting mask optimization,” Opt. Express 30(12), 21282–21294 (2022). [CrossRef]

5. J. Krüger, D. Bergmann, R. Köning, B. Bodermann, and E. Manske, “In situ, back-focal-plane-based determination of the numerical apertures in optical microscopes,” Appl. Opt. 62(3), 756–763 (2023). [CrossRef]

6. S. Park, C. Min, S. Han, E. Choi, K. Cho, H. Jang, and M. Kim, “Super-resolution Microscopy with Adaptive Optics for Volumetric Imaging,” Curr. Opt. Photon. 6, 550–564 (2022).

7. S. N. Khonina and S. A. Degtyarev, “Analysis of the formation of a longitudinally polarized optical needle by a lens and axicon under tightly focused conditions,” J. Opt. Technol. 83(4), 197–205 (2016). [CrossRef]

8. G. Gillen, C. Seck, and S. Guha, “Analytical beam propagation model for clipped focused-Gaussian beams using vector diffraction theory,” Opt. Express 18(5), 4023–4040 (2010). [CrossRef]

9. V. Kotlyar and A. Kovalev, “Nonparaxial propagation of a Gaussian optical vortex with initial radial polarization,” J. Opt. Soc. Am. A 27(3), 372–380 (2010). [CrossRef]

10. J. Braat, P. Dirksen, and A. Janssen, “Assessment of an extended Nijboer–Zernike approach for the computation of optical point-spread functions,” J. Opt. Soc. Am. A 19(5), 858–870 (2002). [CrossRef]

11. G. Yang, B. Dong, B. Gu, J. Zhuang, and O. Ersoy, “Gerchberg–Saxton and Yang–Gu algorithms for phase retrieval in a nonunitary transform system: a comparison,” Appl. Opt. 33(2), 209–218 (1994). [CrossRef]

12. H. Takajo, T. Takahashi, R. Ueda, and M. Taninaka, “Study on the convergence property of the hybrid input–output algorithm used for phase retrieval,” J. Opt. Soc. Am. A 15(11), 2849–2861 (1998). [CrossRef]

13. C. Gloster and Paul, “Optimization Algorithms in Physics,” International Journal of Nonlinear Sciences & Numerical Simulation 11(6), 387–388 (2010). [CrossRef]

14. J. Fienup, “Phase-retrieval algorithms for a complicated optical system,” Appl. Opt. 32(10), 1737–1746 (1993). [CrossRef]

15. Q. Hu, L. Zhen, Y. Mao, S. Zhu, X. Zhou, and G. Zhou, “Adaptive stochastic parallel gradient descent approach for efficient fiber coupling,” Opt. Express 28(9), 13141–13154 (2020). [CrossRef]

16. J. Yuan and Z. Hu, “High-Order Statistical Blind Deconvolution of Spectroscopic Data with a Gauss–Newton Algorithm,” Appl. Spectrosc. 60(6), 692–697 (2006). [CrossRef]

17. L. Jin, S. Kasuga, E. Kondoh, and B. Gelloz, “Correction of large retardation window effect for ellipsometry measurements using quasi-Newton method,” Appl. Opt. 54(10), 2991–2998 (2015). [CrossRef]

18. R. Pielaszek, “Nanopowder Diffraction Theory - Line Profile for Polydispersive Powders,” Diffusion and Defect Data. Solid State Data, Part B. Solid State Phenomena 114, 303–312 (2006). [CrossRef]

19. N. Ngo, “Optical Chirp Z-Transform Processor: Design and Application,” J. Lightwave Technol. 33(11), 2213–2221 (2015). [CrossRef]

20. A. Jurling, M. Bergkoetter, and J. Fienup, “Techniques for arbitrary sampling in two-dimensional Fourier transforms,” J. Opt. Soc. Am. A 35(11), 1784–1796 (2018). [CrossRef]

21. Ken Kreutz-Delgado “The Complex Gradient Operator and the CR-Calculus.” Mathematics (2009).

22. G. Brady and J. Fienup, “Measurement range of phase retrieval in optical surface and wavefront metrology,” Appl. Opt. 48(3), 442–449 (2009). [CrossRef]

Lateral error (x-y plane)/pixels	RMSE/λ	Defocusing error (z-axis)/µm	RMSE/ $λ$
0.25	0.0066	1	0.0197
0.5	0.0146	3	0.0436
1	0.0337	5	0.0447
3	0.0449	10	0.0488

Lateral error (x-y plane)/pixels	RMSE/λ	Defocusing error (z-axis)/µm	RMSE/ $λ$
0.25	0.0066	1	0.0197
0.5	0.0146	3	0.0436
1	0.0337	5	0.0447
3	0.0449	10	0.0488

Fast wavefront sensing method based on diffraction basis vectors for tightly focused optical systems

Abstract

1. Introduction

2. Basis vectors based on Debye diffraction

2.1 Vector diffraction model

2.2 Calculation of diffraction basis vector

2.3 Numerical sampling analysis

3. Numerical optimization algorithms

4. Simulation results and analysis

4.1 Effectiveness of vector PR model

4.2 Anti-noise performance analysis

4.3 Comparison of calculation time

5. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (11)

Tables (2)

Equations (25)

Optics Express