Source and mask optimizing with a defocus antagonism for process window enhancement

Fei Peng; Yiduo Xu; Yi Song; Chengqun Gui; Yan Zhao

doi:10.1364/OE.469275

1. Introduction

As one of the core manufacturing technologies for integrated circuits (IC), optical lithography systems are constantly evolving to keep pace with Moore's Law [1]. Figure 1(a) illustrates the schematic diagram of the imaging system for deep ultraviolet (DUV) lithography with 193 nm illumination [2,3]. First, the mask carrying the integrated circuit layout is uniformly illuminated and an optical signal is generated. Second, the optical signal is transferred to the photoresist-covered wafer through an optical projection system, inducing a photochemical reaction in the photoresist. Finally, the wafer pattern is printed after a series of development processes. However, the fidelity of the printed wafer pattern is degraded by the optical proximity effect (OPE) and other factors [4–6]. Thus, a series of inverse lithography technology (ILT) schemes, such as mask optimization (MO) [7–10] and source mask optimization (SMO) [11–14], have been proposed and have become essential for compensating the systematic distortions of lithographic images. As shown in Fig. 1(b), SMO improves pattern fidelity by planning the intensity distribution of the source, pre-distorting the mask pattern, and inserting sub-resolution assist features (SRAF) around the main features (MF).

Fig. 1. (a) Schematic of a DUV lithography system, in which a 193 nm annular partially coherent light source is used and the IC layout on the mask is transferred to the wafer surface by a projection exposure and development process. (b) Source and mask optimization (SMO) method based on ILT, where the pixelated light source and mask are numerically solved through an inverse imaging process to improve the pattern fidelity.

With the continuous reduction of the critical dimension (CD), optical lithography is pushed into the low-k1 regime, and the printed wafer image becomes more sensitive to process variations [15,16]. Meanwhile, owing to wavefront aberrations [5], thermal aberrations [17], the thermal mask effect [18], the thick mask effect [19], flare, and the unevenness of the wafer and photoresist surfaces [20], the image formed by the lithography system is inevitably shifted away from the focal plane, which poses further reliability challenges and stringent requirements for ILT. Thus, various computational strategies have been proposed and applied to SMO to ensure process manufacturability. SMO at an assigned defocus plane was studied by Peng et al. [21] and further explored by Ma et al. [22], but the pattern fidelity at other defocus planes remains difficult to ensure. Recently, Jia et al. proposed statistical multivariable-based optical proximity correction (OPC) methods [23–25], which combine random defocus with a variety of evaluation indicators while searching for the light source and mask, and greatly improve the pattern fidelity at different focal planes. Subsequently, following the advent of the ASML wavefront manipulator, Sears et al. proposed pupil wavefront optimization (PWO) approaches [26–28] to compensate for thick mask induced aberration (TMIA), and a series of optimization methods based on Zernike polynomials [29,30] were proposed and used to improve process manufacturability [31,32]. Meanwhile, Gao et al. developed differentiable expressions of the edge placement error (EPE) and the process variation band (PVB) that incorporate dose and defocus, and proposed an MO method with process window (PW) awareness [33]. Shen et al. summarized previous approaches and combined them with a reinforcement learning algorithm to optimize EPE and PVB in the vector lithography model [34], which further improved the robustness of SMO. Later, Li et al. proposed the defocus robust SMO (DRSMO) method [35], which combines defocus disturbance with the mini-batch gradient descent (MBGD) algorithm and further reduces the sensitivity of lithographic imaging to defocus. In addition, compressive sensing methods focus on the manufacturability of optimized sources and masks [36,37] while also improving the PW.

Unfortunately, the above methods improve the depth of focus (DOF) only indirectly: the control of the DOF is passive, and neither the corresponding optimization scheme nor quantifiable optimization indicators are clearly given. To effectively eliminate the change and uncertainty of the pattern error (PAE) caused by the inherent defocus, it is necessary to develop a method that directly plans and controls the DOF in SMO, thereby reducing the sensitivity of the optimized source and mask to defocus variation. Reducing this sensitivity while simultaneously expanding the defocus is therefore an effective strategy, based on which we propose a defocus generative and adversarial SMO (DGASMO) method. In the generative adversarial model, the optimized variables include not only the source and mask but also the defocus disturbance. Specifically, the cost function in the DGASMO method includes the PAE at the focal plane, the PAE at the defocus plane, and the defocus generation term. Here, the defocus generation term, which plays the role of the penalty term in convex optimization, constantly generates defocus conditions and disturbs the imaging system. On the other hand, the convergence of the PAE at the focal plane and the defocus plane drives the optimization towards the reduction of defocus. As a result, defocus reduction and defocus generation constantly confront each other, and the search arrives at a robust solution for the source and mask. Meanwhile, we analyze the relationship between the normalized image log slope (NILS) and the exposure latitude (EL) and represent the PW by an equivalent Defocus-NILS window, which shows the superiority of the proposed DGASMO method in improving process manufacturability. It is worth noting that we adopt the Adam optimization algorithm to improve the optimization efficiency.

2. Forward imaging model

As depicted in Fig. 1(a), the transition from mask patterns to wafer patterns can be decomposed into two numerical models: the optical projection (aerial image formation), based on Fourier optics and the Abbe method, and the resist effect (wafer image formation), described by the sigmoid function. Considering a polarized electric field emitted by a point source at $({{\alpha_s},{\beta_s}} )$, the analytic expression of the aerial image ${{\bf I}_a}$ at the focal plane can be described as [5,38,39]

(1)$${{\bf I}_a} = \frac{1}{{{J_{sum}}}}\sum\limits_{({\alpha_s},{\beta_s})} \sum\limits_{p = x,y,z} {{\bf J}^{{\alpha _s}{\beta _s}}} \cdot {|{{\bf E}_p^{wafer}}|^2}$$

where ${J_{sum}}\textrm{ = }\sum\nolimits_{({{\alpha_s},{\beta_s}} )} {{{\bf J}^{{\alpha _s}{\beta _s}}}}$ is the sum of source intensities, ${\bf J} \in {{\cal R}^{{N_S} \times {N_S}}}$ represents the intensity distribution of the pixelated source, and ${\bf E}_p^{wafer}$ denotes the electric field components (x-, y-, and z-directions) on the wafer contributed by ${{\bf J}^{{\alpha _s}{\beta _s}}}$, which can be expressed as [40,41]

(2)$${\bf E}_p^{wafer} = {\bf H}_p^{{\alpha _s}{\beta _s}} \otimes ({{\bf M} \odot {{\bf B}^{{\alpha_s}{\beta_s}}}} )$$

in which ${\otimes} $ is the convolution operation, ${\odot} $ is the entry-by-entry multiplication operation, ${\bf M} \in {{\cal R}^{{N_M} \times {N_M}}}$ represents the intensity distribution of the pixelated mask, and ${\bf M} \odot {{\bf B}^{{\alpha _s}{\beta _s}}}$ represents the mask near field, in which the constant scattering coefficient assumption (CSCA) is used to approximate the oblique incidence effect of the light rays [42]. The entry of ${{\bf B}^{{\alpha _s}{\beta _s}}}$ is defined as

(3)$${{\bf B}^{{\alpha _s}{\beta _s}}}({m,n} )= {\textrm{e}^{\frac{{\textrm{j} \times 2\pi \times {\alpha _s} \times m}}{{{N_M}}}}} \times {\textrm{e}^{\frac{{\textrm{j} \times 2\pi \times {\beta _s} \times n}}{{{N_M}}}}},m,n = 0,1, \cdots {N_M} - 1.$$

${\bf H}_p^{{\alpha _s}{\beta _s}}$ with p = x, y, z refers to the equivalent filters of the x, y and z components. It is computed as

(4)$${\bf H}_p^{{\alpha _s}{\beta _s}} = {\mathrm{{\cal F}}^{\textrm{ - }1}}\left\{ {\frac{{{n_w}}}{R}\sqrt {\frac{{{n_w}\gamma }}{{\gamma^{\prime}}}} {{\bf E}_p} \odot {\bf H}\textrm{(}\alpha^{\prime}\textrm{,}\beta^{\prime}\textrm{)} \odot {\bf V}\textrm{(}\alpha^{\prime}\textrm{,}\beta^{\prime}\textrm{,}\gamma^{\prime}\textrm{)}} \right\},\;\;p\;=\;x,y,z,$$

where ${\mathrm{{\cal F}}^{\textrm{ - }1}}\{{\; \cdot \;} \}$ denotes the inverse Fourier transform, ${n_w}$ is the refractive index on the wafer side, R denotes the transverse magnification, $({\alpha^{\prime},\beta^{\prime},\gamma^{\prime}} )$ and $({\alpha ,\beta ,\gamma } )$ are the direction cosines of light propagation on the wafer side and the mask side for a point source $({{\alpha_s},{\beta_s}} )$, ${{\bf E}_p}$ is the electric field emitted by ${{\bf J}^{{\alpha _s}{\beta _s}}}$ [12], ${\bf H} \in {{\cal R}^{{N_M} \times {N_M}}}$ is the pupil function of the projector, which filters out the light with $\sqrt {{{\alpha ^{\prime}}^2} + {{\beta ^{\prime}}^2}} > NA$, and ${\bf V}({\alpha^{\prime},\beta^{\prime},\gamma^{\prime}} )$ characterizes the rotation factor in the vector imaging system [5].

It should be noted that the aerial image ${{\bf I}_a}$ at a defocus plane can be described as the effect of an aberration on imaging [35]. The defocus ${\bf Def}(\delta )$ can be thought of as a phase change applied to the equivalent filters ${\bf H}_p^{{\alpha _s}{\beta _s}}$, and it is computed by

(5)$${\bf Def}(\delta )\textrm{ = }{\mathrm{{\cal F}}^{\textrm{ - }1}}\left\{ {\textrm{exp} \left[ {j\frac{{2\pi {n_w}\delta (1 - {\gamma^{\prime}})}}{\lambda }} \right]} \right\}$$

where $\delta > 0$ is the defocus value. Thus, the aerial image at defocus $\delta $ can be expressed as

(6)$${\bf I}_a^\delta = \frac{1}{{{J_{sum}}}}\sum\limits_{({\alpha_s},{\beta_s})} {{\bf J}^{{\alpha _s}{\beta _s}}} \sum\limits_{p = x,y,z} {|{{\bf Def}(\delta )\otimes {\bf E}_p^{wafer}}|^2}$$
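As a minimal sketch of the pupil-plane factor inside the inverse Fourier transform of Eq. (5), the phase applied to a single ray can be evaluated as follows. The function name and the sample value $\gamma^{\prime} = 0.5$ are illustrative assumptions; the wavelength and refractive index are the simulation values quoted in Section 5.

```python
import cmath
import math

def defocus_pupil_factor(gamma_prime, delta_nm, wavelength_nm=193.0, n_w=1.44):
    """Factor exp[j*2*pi*n_w*delta*(1 - gamma')/lambda] from Eq. (5).

    gamma_prime is the wafer-side direction cosine along the optical axis;
    it is taken as a direct input so that no scaling convention is assumed.
    """
    phase = 2.0 * math.pi * n_w * delta_nm * (1.0 - gamma_prime) / wavelength_nm
    return cmath.exp(1j * phase)

# An on-axis ray (gamma' = 1) acquires no defocus phase.
on_axis = defocus_pupil_factor(1.0, 100.0)
# An oblique ray; gamma' = 0.5 is an arbitrary illustrative value.
oblique = defocus_pupil_factor(0.5, 100.0)
```

The factor has unit modulus for every ray, so defocus redistributes phase across the pupil without changing the transmitted energy.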

The photochemical reaction of the photoresist is described by the differentiable sigmoid model $sig(x) = {1 / {({1 + {e^{ - a(x - {t_r})}}} )}}$, where a and ${t_r}$ are the steepness and threshold of the sigmoid function, respectively. Together with the optical projection, the wafer image can be uniformly calculated as

(7)$${\bf I}_w^\delta = \mathrm{{\cal T}}\{ {{\bf J},{\bf M},\delta } \} = sig({{\bf I}_a^\delta } )$$
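The resist step of Eq. (7) can be sketched in a few lines. This is only an illustration of the sigmoid model, using the steepness $a = 80$ and threshold ${t_r} = 0.2$ quoted in Section 5 as defaults; the function names and the nested-list image layout are assumptions.

```python
import math

def resist_sigmoid(intensity, a=80.0, t_r=0.2):
    """Sigmoid resist model sig(x) = 1 / (1 + exp(-a * (x - t_r)))."""
    return 1.0 / (1.0 + math.exp(-a * (intensity - t_r)))

def wafer_image(aerial_image, a=80.0, t_r=0.2):
    """Apply the resist model pixel by pixel to a 2D aerial image (list of rows)."""
    return [[resist_sigmoid(value, a, t_r) for value in row] for row in aerial_image]

# Intensities below, at, and above the threshold.
printed = wafer_image([[0.0, 0.2, 0.4]])
```

With $a = 80$ the transition is steep, so pixels only slightly away from the threshold are already close to 0 or 1.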

3. Defocus generative and adversarial SMO

As illustrated in Fig. 2, the forward calculation of the DGASMO framework computes the wafer images at the focal plane ${\bf I}_w^{\delta \textrm{ = }0}$ and at the defocus plane ${\bf I}_w^\delta$. Combined with the desired target pattern and the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}$, the inverse optimization updates the source, mask, and defocus parameters by minimizing the “score” $({{\mathrm{{\cal S}}_L}} )$. In this work, the Lagrangian-based score ${\mathrm{{\cal S}}_L}$ is defined as ${\mathrm{{\cal S}}_L} = {\lambda _{pe}}\mathrm{{\cal S}}_{pe}^{\delta = 0} + \;{\lambda _{ad}}\mathrm{{\cal S}}_\delta ^{ad} - {\lambda _{gen}}\mathrm{{\cal S}}_\delta ^{gen}$, in which $\mathrm{{\cal S}}_{pe}^{\delta = 0}$ and $\mathrm{{\cal S}}_\delta ^{ad}$ are the PAEs at the focal plane and the defocus plane, respectively. The weights ${\lambda _{pe}} = {1 / {|{\mathrm{{\cal S}}_{pe}^{\delta = 0}} |}}$, ${\lambda _{ad}} = {1 / {|{\mathrm{{\cal S}}_\delta^{ad}} |}}$, and ${\lambda _{gen}} = {1 / {|{ - \mathrm{{\cal S}}_\delta^{gen}} |}}$ normalize the individual cost terms, where $| \cdot |$ indicates that the enclosed value is treated as a constant and is not differentiated. $\mathrm{{\cal S}}_\delta ^{gen}$ is derived from the Karush-Kuhn-Tucker (KKT) conditions, which are used to ensure that the optimization problem has a solution. For brevity in the subsequent derivation, we define ${\bf Def}(\delta )\textrm{ = }{\mathrm{{\cal F}}^{\textrm{ - }1}}\{{\textrm{exp} [{j\delta \cdot {{\bf \Gamma }_{def}}} ]} \}$; details of ${{\bf \Gamma }_{def}}$ are given in Appendix A. For the source and the binary mask, whose pixel values lie between 0 and 1, we apply the parametric transformation

(8)$${\bf J}\;\textrm{ = }\;{{(1 + \cos \theta )} / 2};\;\;{\bf M}\;\textrm{ = }\;{{(1 + \cos \omega )} / 2}$$
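The parametric transformation of Eq. (8) maps an unconstrained parameter onto the interval [0, 1], and its derivative is the factor that the chain rule contributes when gradients are taken with respect to $\theta$ or $\omega$. A minimal sketch, with hypothetical function names:

```python
import math

def to_pixel(param):
    """Map an unconstrained parameter to a pixel value in [0, 1], as in Eq. (8)."""
    return (1.0 + math.cos(param)) / 2.0

def pixel_grad(param):
    """Derivative of to_pixel with respect to the parameter: -sin(param) / 2."""
    return -math.sin(param) / 2.0

# Any real parameter value yields an admissible pixel value.
samples = [to_pixel(p) for p in (-10.0, -1.0, 0.0, 2.5, 7.0)]
```

Because the bound is built into the parameterization, the optimizer can update $\theta$ and $\omega$ freely without a separate clipping step.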

Fig. 2. Forward calculation and inverse optimization process for defocus generative adversarial SMO (DGASMO).

3.1 DGASMO inverse optimization framework

In detail, given a desired target pattern ${{\bf I}_0} \in {{\cal R}^{N \times N}}$, the goal of DGASMO is to search for the optimal source $\hat{{\bf J}} \in {{\cal R}^{{N_S} \times {N_S}}}$ and the optimal mask $\hat{{\bf M}} \in {{\cal R}^{N \times N}}$ that minimize the mismatch between ${{\bf I}_0}$ and $\mathrm{{\cal T}}\{{\; \cdot \;} \}$ through inverse imaging and numerical optimization algorithms while maximizing the defocus $\delta $. In this work, the measured dissimilarity is defined as the “score” $(\mathrm{{\cal S}} )$, and $\mathrm{{\cal S}}_{pe}^{\delta \ne 0}$ represents the pattern error (PAE) at a defocus plane. Thus, the uniform expression of the PAE at the focal plane and the defocus plane can be calculated as

(9)$$\mathrm{{\cal S}}_{pe}^\delta \;\textrm{ = }\;\frac{1}{2}||{{\bf I}_w^\delta \;\textrm{ - }\;{{\bf I}_0}} ||_2^2$$

which measures half the sum of the squared mismatches between ${\bf I}_w^\delta$ and the desired one ${{\bf I}_0}$ over all locations. Notably, lithographic imaging systems achieve their best imaging fidelity at the focal plane, so minimizing $\mathrm{{\cal S}}_{pe}^\delta$ drives $\delta $ towards 0, and ${{\partial \mathrm{{\cal S}}_{pe}^\delta } / {\partial \delta }} \ge 0$ for $\delta \ge 0$. To this end, the PAE at the defocus plane is defined as $\mathrm{{\cal S}}_\delta ^{ad}\textrm{ = }\mathrm{{\cal S}}_{pe}^{\delta \ne \textrm{0}}$ in the adversarial problem. To improve the imaging robustness of $\hat{{\bf J}}$ and $\hat{{\bf M}}$ at both the focal plane and the defocus plane, it is necessary to reasonably penalize the defocus during SMO. Thus, a simple defocus generation term is defined as $\mathrm{{\cal S}}_\delta ^{gen} \ge 0$. Combining $\mathrm{{\cal S}}_\delta ^{ad}$ and $\mathrm{{\cal S}}_{pe}^{\delta \textrm{ = 0}}$ with the weight parameters ${\lambda _{pe}} > 0$ and ${\lambda _{ad}} > 0$, the generative and adversarial problem can be formulated as

(10)$$\begin{array}{{c}} {\textrm{minimize }\;\;\;\;{\lambda _{pe}}\mathrm{{\cal S}}_{pe}^{\delta = 0} + \;{\lambda _{ad}}\mathrm{{\cal S}}_\delta ^{ad}}\\ {\textrm{subject}\;\textrm{to}\;\;\;\;\;\;\;\; - \mathrm{{\cal S}}_\delta ^{gen} \le 0\;\;\;\;\;\;} \end{array}$$

where $F = {\lambda _{pe}}\mathrm{{\cal S}}_{pe}^{\delta = 0} + \;{\lambda _{ad}}\mathrm{{\cal S}}_\delta ^{ad}$ is the cost function in the convex optimization and $- \mathrm{{\cal S}}_\delta ^{gen} \le 0$ is the boundary condition on the defocus value $\delta $. The Lagrange equation is given by

(11)$${\mathrm{{\cal S}}_L}(\theta ,\omega ,\delta ) = {\lambda _{pe}}\mathrm{{\cal S}}_{pe}^{\delta = 0} + \;{\lambda _{ad}}\mathrm{{\cal S}}_\delta ^{ad} + {\lambda _{gen}} \cdot ( - \mathrm{{\cal S}}_\delta ^{gen})$$

where ${\lambda _{gen}}$ is the Lagrange multiplier. A time-dependent scheme of the Euler-Lagrange equation is computed as

(12)$${{{\partial _t}{\mathrm{{\cal S}}_L}} / {{\partial _t}\phi }} = {\lambda _{pe}} \cdot {{\partial \mathrm{{\cal S}}_{pe}^{\delta = 0}} / {\partial \phi }}\textrm{ + }{\lambda _{ad}} \cdot {{\partial \mathrm{{\cal S}}_\delta ^{ad}} / {\partial \phi }} - {\lambda _{gen}} \cdot {{\partial \mathrm{{\cal S}}_\delta ^{gen}} / {\partial \phi }}$$

in which t is the time step and $\phi = \theta ,\omega \;or\;\delta $. Equation (12) can be solved with finite-difference schemes by the steepest gradient descent (SGD) method with step size $\eta $

(13)$${\phi ^{t + 1}} = {\phi ^t} - \eta \cdot {{\partial \mathrm{{\cal S}}_L^t} / {\partial {\phi ^t}}}$$

Since $\mathrm{{\cal S}}_{pe}^{\delta = 0}$ and $\mathrm{{\cal S}}_\delta ^{ad}\textrm{ = }\mathrm{{\cal S}}_{pe}^{\delta \ne \textrm{0}}$ are uniformly expressed by Eq. (9), ${{\partial \mathrm{{\cal S}}_{pe}^{\delta = 0}} / {\partial \phi }}$ and ${{\partial \mathrm{{\cal S}}_\delta ^{ad}} / {\partial \phi }} = {{\partial \mathrm{{\cal S}}_{pe}^{\delta \ne 0}} / {\partial \phi }}$ can be uniformly expressed by ${{\partial \mathrm{{\cal S}}_{pe}^\delta } / {\partial \phi }}$. The details of the derivation of ${{\partial \mathrm{{\cal S}}_{pe}^\delta } / {\partial \phi }}$ are given in Appendix A.

Traditional SMO methods improve the pattern fidelity at the focal plane by minimizing $F = \mathrm{{\cal S}}_{pe}^{\delta = 0}$, but the image fidelity at an uncertain defocus plane is difficult to control, and the optimized source and mask remain relatively sensitive to defocus. To effectively eliminate the effect of defocus, it is necessary to introduce a defocus term in the penalty function. Thus, Peng et al. [21] set the defocus to 100 nm and minimized $F = {\lambda _{pe}}\mathrm{{\cal S}}_{pe}^{\delta = 0} + \;{\lambda _{ad}}\mathrm{{\cal S}}_\delta ^{ad}$, while Jia et al. [23] introduced random defocus in $F = {\lambda _{pe}}\mathrm{{\cal S}}_{pe}^{\delta = 0} + \;{\lambda _{ad}}\mathrm{{\cal S}}_\delta ^{ad}$. In contrast, we pay more attention to the generative and adversarial problem of defocus, namely, the optimization problem with the boundary condition given by Eq. (10), in which the defocus $\delta $ is itself updated according to Eq. (12). In Section 3.2, a defocus generation term $\mathrm{{\cal S}}_\delta ^{gen} ={-} ({{\delta_0}/\delta } )$ that depends on the defocus value $\delta $ is designed. The update direction ${{\partial \mathrm{{\cal S}}_\delta ^{gen}} / {\partial \phi }}$ is derived in Appendix A.
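As a concrete reading of Eq. (9), the pattern error is half the sum of squared pixel differences between a wafer image and the target. The sketch below assumes images stored as nested lists of equal shape; the function name and the small 2 x 2 example are illustrative only.

```python
def pattern_error(wafer, target):
    """PAE of Eq. (9): 0.5 * squared L2 norm of (wafer - target)."""
    return 0.5 * sum(
        (w - t) ** 2
        for wafer_row, target_row in zip(wafer, target)
        for w, t in zip(wafer_row, target_row)
    )

# Two pixels are off by 0.1 each, so the PAE is 0.5 * (0.01 + 0.01).
target = [[0.0, 1.0], [1.0, 0.0]]
wafer = [[0.1, 0.9], [1.0, 0.0]]
example_pae = pattern_error(wafer, target)
```

The same function serves for the focal-plane term and for the adversarial term; only the wafer image passed in differs.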

3.2 Regularization of defocus generation term

As mentioned previously, $\mathrm{{\cal S}}_\delta ^{gen}$ generates and plans defocus $\delta $ during SMO to ensure the robustness of optimized source $\hat{{\bf J}}$ and mask $\hat{{\bf M}}$. Focusing on the defocus generative and adversarial problem, the Euler-Lagrange equation corresponding to $\delta $ is computed as:

(14)$${{{\lambda _{ad}}\partial \mathrm{{\cal S}}_\delta ^{ad}} / {\partial \delta }} - {\lambda _{gen}} \cdot {{\partial \mathrm{{\cal S}}_\delta ^{gen}} / {\partial \delta }} = 0$$

Combining this with the KKT conditions gives

(15)$${\hat{\lambda }_{gen}}\textrm{ = }{\lambda _{ad}} \cdot [{{{\partial \mathrm{{\cal S}}_\delta^{ad}({\hat{\delta }} )} / {\partial \hat{\delta }}}} ]\cdot {[{{{\partial \mathrm{{\cal S}}_\delta^{gen}({\hat{\delta }} )} / {\partial \hat{\delta }}}} ]^{\textrm{ - }1}} \ge 0$$

in which the optimal ${\hat{\lambda }_{gen}}$ is designed to satisfy dual feasibility, and it is practical to set ${\lambda _{gen}}\textrm{ = 1} \cdot {|{\mathrm{{\cal S}}_\delta^{gen}(\delta )} |^{ - 1}}$ during SMO. To ensure a solution, a conservative setting ${{\partial \mathrm{{\cal S}}_\delta ^{gen}(\delta )} / {\partial \delta }} \ge 0$ for all defocus values $\delta$ is adopted in this paper. On the other hand, the primal feasibility and complementary slackness of the KKT conditions require

(16)$$\left\{ \begin{array}{l} \mathrm{{\cal S}}_\delta^{gen}(\hat{\delta }) \ge 0\\ {{\hat{\lambda }}_{gen}} \cdot \mathrm{{\cal S}}_\delta^{gen}({\hat{\delta }} )= 0 \end{array} \right.$$

Thus, ${{\partial \mathrm{{\cal S}}_\delta ^{gen}} / {\partial \delta }}$ is designed to be a conservative function with a lower bound. The defocus generation term is intended to generate defocus near the focal plane and to converge to its infimum when the defocus becomes too large. Accordingly, a monotonic, lower-bounded function ${{\partial \mathrm{{\cal S}}_\delta ^{gen}} / {\partial \delta }}\textrm{ = }{1 / {{\delta ^2}}}$ is devised to guarantee that the formula has a solution (other functions can be readily applied), so that the defocus generation term can be expressed as:

(17)$$\mathrm{{\cal S}}_\delta ^{gen}\textrm{ = }\int {{{\;1} / {{\delta ^2}}}\;\textrm{d}\delta } \; = \;\;\textrm{ - }{1 / \delta }$$

where $\mathrm{{\cal S}}_\delta ^{gen}$ is designed to generate defocus, while the convergence of $\mathrm{{\cal S}}_\delta ^{ad}$ pushes the defocus value towards $\delta \textrm{ = }0$, forming a generative adversarial process during SMO. To keep the score function ${\mathrm{{\cal S}}_L}$ dimensionless, the defocus generation term is modified to $\mathrm{{\cal S}}_\delta ^{gen} ={-} ({{\delta_0}/\delta } )$, where ${\delta _0}$ is the defocus scale. Notably, the convergence ability of the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}$ determines the limit to which the defocus value can be pushed towards $\delta \gg 0$, and the more aggressive defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{({{\delta_0}/\delta } )^4}$ and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{({{\delta_0}/\delta } )^9}$ are adopted to compare the adversarial effect.
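The balance between the two terms in Eq. (14) can be illustrated with a one-dimensional toy model. The sketch below is not the lithography model: it assumes a quadratic adversarial term $c{x^2}$ in the normalized defocus $x = \delta /{\delta _0}$, unit weights ${\lambda _{ad}} = {\lambda _{gen}} = 1$, and plain gradient descent as in Eq. (13); all of these choices, and the function name, are assumptions made for illustration. Under them the stationary point has the closed form $x = {(k/2c)^{1/(k + 2)}}$ for the generation term $- {(1/x)^k}$.

```python
def equilibrium_defocus(k, c=1.0, x0=1.0, eta=0.01, steps=5000):
    """Gradient descent on the toy score c*x**2 - S_gen with S_gen = -(1/x)**k.

    x is the defocus normalized by the defocus scale delta_0 (assumption).
    """
    x = x0
    for _ in range(steps):
        grad_ad = 2.0 * c * x            # derivative of the toy adversarial term
        grad_gen = k * x ** (-(k + 1))   # derivative of S_gen = -(1/x)**k
        x -= eta * (grad_ad - grad_gen)  # residual of Eq. (14) with unit weights
    return x

# Exponents 1, 4 and 9 mirror the three generation terms compared in the text.
results = {k: equilibrium_defocus(k) for k in (1, 4, 9)}
```

In this toy setting a larger exponent settles at a larger normalized defocus, which is qualitatively the behavior expected from the more aggressive generation terms; the actual equilibrium values in SMO depend on the full imaging model.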

3.3 Adam optimizer

Although SGD provides a feasible scheme for minimizing ${\mathrm{{\cal S}}_L}$, the optimization is sensitive to the step size $\eta $, a problem that is especially serious in multi-task optimization (optimization with constraints), as we studied before. Thus, the Adam optimizer [43] is employed in this paper, whose update rule at time step t+1 is

(18)$${\phi ^{t + 1}} = {\phi ^t} - \eta \cdot {{{{\hat{m}}_t}} / {\left( {\sqrt {{{\hat{\nu }}_t}} + \varepsilon } \right)}}$$

in which $\eta \;\textrm{ = }\;0.01$ is the learning rate, $\varepsilon \textrm{ = }{10^{\textrm{ - }8}}$ is a small constant that keeps the denominator nonzero, and ${\hat{m}_t}\textrm{ = }{{{m_t}} / {({1 - \beta_1^t} )}}$ and ${\hat{\nu }_t}\textrm{ = }{{{\nu _t}} / {({1 - \beta_2^t} )}}$ are the bias-corrected estimates of the first and second moments ${m_t}\textrm{ = }{\beta _1} \cdot {m_{t\textrm{ - }1}}\;\textrm{ + }\;({1\;\textrm{ - }\;{\beta_1}} )\cdot {g_t}$ and ${\nu _t}\textrm{ = }{\beta _2} \cdot {\nu _{t\textrm{ - }1}}\;\textrm{ + }\;({1\;\textrm{ - }\;{\beta_2}} )\cdot g_t^2$, with ${g_t} = {{\partial {\mathrm{{\cal S}}_L}} / {\partial \phi }}$. The decay rates are set to ${\beta _1}\textrm{ = }0.99$ and ${\beta _2}\textrm{ = }0.999$.
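The update rule of Eq. (18) can be written out for a single scalar parameter as follows. This is a minimal sketch using the hyperparameters quoted above as defaults; the quadratic test function and the names `adam_minimize` and `toy_grad` are assumptions and stand in for the gradient of ${\mathrm{{\cal S}}_L}$.

```python
import math

def adam_minimize(grad_fn, phi0, steps, eta=0.01, beta1=0.99, beta2=0.999, eps=1e-8):
    """Scalar Adam iteration following Eq. (18) with bias-corrected moments."""
    phi, m, v = phi0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(phi)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        phi -= eta * m_hat / (math.sqrt(v_hat) + eps)
    return phi

def toy_grad(phi):
    """Gradient of the toy objective (phi - 3)**2 (assumption, not S_L)."""
    return 2.0 * (phi - 3.0)

first_step = adam_minimize(toy_grad, 0.0, steps=1)
later = adam_minimize(toy_grad, 0.0, steps=3000)
```

After bias correction the first update has magnitude close to $\eta $ regardless of the gradient scale, which is the property that reduces the step-size sensitivity mentioned above.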

4. Defocus-NILS window

The robustness of the lithography process is usually described by the focus-exposure process window (PW), which includes all pairs of DOF and EL that satisfy the CD error condition $\varDelta \textrm{CD} \le 10{\%}$. Without loss of generality, Fig. 3(a) gives cross-sections of the aerial image exposed at the nominal dose and at a second dose (dose’), for which the CDs of the wafer image after the photoresist effect are CD1 and CD2, respectively. Figure 3(b) shows the slope of the aerial image on the image contour, from which the NILS is defined as:

(19)$$NILS = {\left. {\frac{{CD}}{{{{\bf I}_{\textrm{con}}}}} \times \frac{{d{\bf I}}}{{dx}}} \right|_{{{\bf I}_{\textrm{con}}}}}$$

in which ${{\bf I}_{\textrm{con}}}$ is the aerial image intensity on the contour and ${\left. {\frac{{d{\bf I}}}{{dx}}} \right|_{{{\bf I}_{\textrm{con}}}}}$ is the derivative of the intensity with respect to the distance perpendicular to the contour. A larger NILS indicates a smaller change in CD caused by the exposure dose; that is, NILS and EL are positively correlated. Mathematically, the relationship between exposure latitude (EL), CD, and NILS is [43]:

(20)$$\frac{{\partial \textrm{l}\textrm{n}EL}}{{\partial \textrm{l}\textrm{n}CD}} = \frac{1}{2}NILS$$

By taking the CD error condition $\varDelta \textrm{CD} \le 10{\%}$ into consideration, the EL can be related to NILS by:

(21)$$\%EL \approx {\alpha _{\textrm{EL}}}({NILS\;\textrm{ - }\;{\beta_{\textrm{EL}}}} )$$

where ${\alpha _{\textrm{EL}}}$ is the added exposure latitude (EL) for each unit increase in NILS above the lower limit ${\beta _{\textrm{EL}}}$. Since it is practical to measure NILS at different defocus planes, the Defocus-NILS window is used in this paper.
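Equations (19) and (21) can be evaluated numerically on a one-dimensional intensity profile. The sketch below takes the magnitude of the slope, estimates it by a central difference, and uses an ideal sinusoidal aerial image whose NILS at the line edge is known analytically to be $\pi $. The profile, the pitch, and the sample values of ${\alpha _{\textrm{EL}}}$ and ${\beta _{\textrm{EL}}}$ are illustrative assumptions, not values from this paper.

```python
import math

def nils(intensity, x_edge, cd, h=1e-3):
    """NILS of Eq. (19): (CD / I_con) * |dI/dx| evaluated at the contour x_edge."""
    i_con = intensity(x_edge)
    slope = (intensity(x_edge + h) - intensity(x_edge - h)) / (2.0 * h)
    return cd / i_con * abs(slope)

def exposure_latitude_percent(nils_value, alpha_el, beta_el):
    """Linear estimate of Eq. (21): %EL ~ alpha_EL * (NILS - beta_EL)."""
    return alpha_el * (nils_value - beta_el)

pitch = 170.0      # nm, illustrative
cd = pitch / 2.0   # equal lines and spaces

def aerial(x):
    """Ideal two-beam image with unit contrast (assumption)."""
    return 0.5 + 0.5 * math.cos(2.0 * math.pi * x / pitch)

edge_nils = nils(aerial, pitch / 4.0, cd)
# alpha_EL = 10 and beta_EL = 0.5 are placeholder values for illustration.
el_estimate = exposure_latitude_percent(2.0, 10.0, 0.5)
```

Repeating the NILS evaluation on aerial images computed at several defocus values yields the points of a Defocus-NILS curve.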

Fig. 3. Relationship between exposure latitude (EL) and the NILS. (a) The cross sections of aerial images with different dose and (b) the cross sections based on NILS.

5. Numerical results

Numerical simulations are performed on a DUV lithographic imaging system at the 85 nm and 55 nm technology nodes. The parameters used for the simulations are: $\lambda = 193\textrm{nm}$, $\textrm{NA} = 1.35$, resolution $\Delta x = \Delta y = 4\textrm{nm/pixel}$, ${N_S} = 29$, ${N_M} = 256$, transverse magnification R = 4, refractive index ${n_w} = 1.44$, and the steepness and threshold of the sigmoid function are $a = 80$ and ${t_r} = 0.2$. The system is illuminated by a partially coherent annular source ${\bf J}$ with an inner radius ${\delta _{inner}} = 0.6$ and an outer radius ${\delta _{outer}} = 0.9$. ${\bf J}$ and the two desired target patterns ${{\bf I}_{01}}$ (CD = 85 nm) and ${{\bf I}_{02}}$ (CD = 55 nm) are given in Fig. 4.
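To relate these settings to the low-k1 regime mentioned in the Introduction, the Rayleigh relation $\textrm{CD} = {k_1}\lambda /\textrm{NA}$ can be inverted for the two target CDs. This is a simple arithmetic sketch that takes the stated CD as the resolved feature size, which is an assumption; the variable names are illustrative.

```python
wavelength_nm = 193.0
numerical_aperture = 1.35

def k1_factor(cd_nm):
    """Rayleigh k1 factor for a feature of size cd_nm: k1 = CD * NA / lambda."""
    return cd_nm * numerical_aperture / wavelength_nm

k1_85 = k1_factor(85.0)   # roughly 0.59
k1_55 = k1_factor(55.0)   # roughly 0.38
```

The 55 nm target therefore sits noticeably deeper in the low-k1 regime than the 85 nm target.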

Fig. 4. (a) The annular source ${\bf J}$. (b) The desired target pattern ${{\bf I}_{01}}$ and (c) The desired target pattern ${{\bf I}_{02}}$. The averaged-NILS and PW are computed based on the critical locations marked by the yellow lines.

Optimization results for the desired target pattern ${{\bf I}_{01}}$ are given in Fig. 5. The columns of Fig. 5 show, from left to right, the illumination source, the mask, and the wafer images at $\delta \;$= 0 (focal plane), 100 and 200 nm. In row (a), the optimized source and mask are derived using the traditional SMO; the PAEs are 290, 723, and 6268 at $\delta \;\textrm{ = }$ 0, 100 and 200 nm, and the NILS decreases from 1.207 to 0.929 and 0.264 as the defocus increases from 0 to 100 and 200 nm. Row (b) shows the imaging results of the SMO-Adam method, with improved PAEs of 230, 584, and 4271 and NILS values of 1.407, 1.136, and 0.449 at $\delta \;\textrm{ = }$ 0, 100 and 200 nm, respectively. Row (c) gives the DGASMO imaging results using the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$, with PAEs of 215, 464, and 3621 at $\delta \;\textrm{ = }$ 0, 100 and 200 nm, so the proposed method achieves lower PAEs. Similarly, compared with the traditional SMO algorithm, the NILS at $\delta \;\textrm{ = }$ 0, 100 and 200 nm increases to 1.428, 1.176 and 0.511, a performance similar to that of the method proposed by Jia [23]. Row (d) shows the imaging results with the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / {{\delta ^4}}}$: the PAE is 227 at $\delta \;\textrm{ = }$ 0 and improves to 357 and 2561 at $\delta \;\textrm{ = }$ 100 and 200 nm, with NILS values of 1.389, 1.186 and 0.638. Row (e) is optimized using the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / {{\delta ^9}}}$, which gives the best imaging results at the defocus planes: the PAEs are 244, 281, and 1378 at $\delta \;\textrm{ = }$ 0, 100 and 200 nm, and the NILS decreases from 1.399 to 1.2267 and 0.873 as the defocus increases from 0 to 100 and 200 nm.
In particular, the assist features of the mask (the area in the red box) become increasingly pronounced as $\mathrm{{\cal S}}_\delta ^{gen}$ is set more aggressively. An aggressive $\mathrm{{\cal S}}_\delta ^{gen}$ generates a higher defocus value to be included in SMO, while the defocus adversarial term $\mathrm{{\cal S}}_\delta ^{ad}$ continuously reduces the influence of defocus on SMO, so the two terms form a dynamic confrontation. Thus, the negative impact of defocus on imaging can be effectively compensated, and the optimization process remains robust thanks to the modulation by the KKT conditions. Although the generative and adversarial problem introduces interference into the optimization of the source and mask, DGASMO achieves better global performance with the same number of iterations, which is attributed to the adaptive adjustment ability of the Adam algorithm.

Fig. 5. Simulation results with ${{\bf I}_{01}}$ as target pattern. Columns from left to right: the optimized source pattern $\hat{{\bf J}}$, the optimized mask pattern $\hat{{\bf M}}$, and the printed wafer image ${\bf I}_w^\delta $ at 0 nm, 100 nm and 200 nm defocus. Rows (a-e): the traditional SMO method, the SMO-Adam method, the DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$, $\textrm{ - }{{1 / \delta }^4}$ and $\textrm{ - }{{1 / \delta }^9}$, respectively.

Figure 6(a) depicts the PAE versus defocus for the desired target pattern ${{\bf I}_{01}}$ after optimization, with the defocus ranging from −200 nm to 200 nm. The black dotted, black, green, blue, and red curves correspond to the imaging results of rows (a-e) in Fig. 5, respectively. The PAEs decrease greatly from the traditional SMO (black dotted curve) to the SMO-Adam method (black curve), and decrease further in the order of DGASMO with the defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). Compared with the defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve), the most aggressive defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve) greatly reduces the sensitivity of imaging to defocus, and the printed wafer image ${\bf I}_w^\delta $ within the defocus range of ${\pm} 100$ nm is almost uniform, which provides a better defocus tolerance for lithography. The process robustness benefit can thus be tuned by setting the degree of aggressiveness of the defocus generation term. Figure 6(b) gives the convergence of the defocus value $\delta$ for DGASMO with the defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve).
At the initial stage of the confrontation process (iterations < 50), the source and mask are also optimized under low $\delta$. Unlike Peng's method [21], we do not set the defocus value $\delta$ directly; instead, $\delta$ is continuously adjusted by the generation term and the adversarial term until a steady state is reached. At the middle and late stages of the confrontation, the defocus converges to ${\pm} 37$ nm (green curve), ${\pm} 67$ nm (blue curve), and ${\pm} 103$ nm (red curve), respectively, and the robustness of the optimized source and mask is continuously improved. Thus, the PAE in Fig. 6(a) shows a uniform error magnitude within the defocus ranges of 50 nm (green curve), 70 nm (blue curve) and 90 nm (red curve), respectively. Compared with Li’s method [35], the proposed method explicitly gives the maximum defocus value $\delta$ that SMO can tolerate and can determine the improvement of the DOF without tedious calculations.

Fig. 6. (a) The optimized defocus-pattern error (PAE) of desired target pattern ${{\bf I}_{01}}$ for traditional SMO (black dotted curve), the SMO-Adam method (black curve), the DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). (b) The confrontation process of defocus value using desired target pattern ${{\bf I}_{01}}$ for DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve).

To compare the similarity between the Defocus-NILS window and the PW, the averaged NILS at different defocus values $\delta$ is given in Fig. 7(a), and Fig. 7(b) shows the rectangle-based PW. Figure 7(a) shows the optimized Defocus-NILS window of the desired target pattern ${{\bf I}_{01}}$ for traditional SMO (black dotted curve), the SMO-Adam method (black curve), and the DGASMO methods with the defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve), over the defocus range from −200 nm to 200 nm. The NILS of the proposed DGASMO method with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) is larger than that of the traditional SMO method (black curve) when the defocus value $\delta \ge 70$ nm, and the NILS optimized with the most aggressive defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ performs better within the defocus range of ${\pm} 200$ nm. Figure 7(b) shows the PWs of the desired target pattern ${{\bf I}_{01}}$ for the traditional SMO method (black dotted curve), the SMO-Adam method (black curve), and the DGASMO methods with the defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). Owing to the optimization ability of the Adam method, the SMO-Adam method increases the PW and DOF (EL = 15%) by 33.63% and 38.81%, respectively.
Compared with the SMO-Adam method, the proposed DGASMO method increases the PW by 11.15%, 21.4% and 29.12%, and the DOF (EL = 15%) by 19.35%, 32.26% and 44.09%, respectively, for the three defocus generation terms. The PW optimized by DGASMO with the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve) performs best in both DOF and EL. It should be noted that the PWs and the Defocus-NILS windows show similar trends in DOF and EL.
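As a point of reference for the Defocus-NILS window, the following minimal sketch evaluates the textbook normalized image log-slope, NILS = w·|dI/dx|/I at a nominal edge, on a one-dimensional aerial-image cut. It is an illustration only: the function name `nils_1d`, the cosine test fringe, and the finite-difference slope are assumptions made here, and the averaging over edges and defocus values used for Fig. 7(a) follows Section 4 and is not reproduced.

```python
import numpy as np

def nils_1d(intensity, x, edge_x, width):
    """Textbook NILS = width * |dI/dx| / I, evaluated at a nominal edge position."""
    slope = np.gradient(intensity, x)          # finite-difference dI/dx on the grid
    i_edge = np.interp(edge_x, x, intensity)   # intensity at the edge
    s_edge = np.interp(edge_x, x, slope)       # slope at the edge
    return width * abs(s_edge) / i_edge

# Hypothetical test image: a two-beam cosine fringe with pitch 170 nm,
# i.e. a line of nominal width pitch/2 whose edge sits at pitch/4.
pitch = 170.0
x = np.linspace(0.0, pitch, 3401)
image = 0.5 * (1.0 + np.cos(2.0 * np.pi * x / pitch))
nils_value = nils_1d(image, x, edge_x=pitch / 4.0, width=pitch / 2.0)
```

For this fringe the analytic value is π, which makes the routine easy to sanity-check before applying it to simulated aerial images at several defocus values.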

Fig. 7. Comparison of Defocus-NILS window and PW for desired target pattern ${{\bf I}_{01}}$. (a) The Defocus-NILS window for traditional SMO (black dotted curve), the SMO-Adam method (black curve), the DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). (b) The PWs for traditional SMO (black dotted curve), the SMO-Adam method (black curve), the DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). Details of PWs are given in Fig. S1 of the supplementary material.

Optimization results for the desired target pattern ${{\bf I}_{0\textrm{2}}}$ are given in Fig. 8. The columns from left to right show the illumination source, the mask, and the wafer images at $\delta \;$= 0 (focal plane), 100, and 200 nm. Rows (a-e) of Fig. 8 are the imaging results optimized by the traditional SMO, SMO-Adam, and DGASMO with the defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$, $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$, respectively. Row (a) gives the results optimized by SMO, with PAE = 347, 710, 5145 and NILS = 0.905, 0.806, 0.530 at $\delta \;\textrm{ = }$ 0, 100 and 200 nm. Row (b) gives the results optimized by the SMO-Adam method, with the substantially improved PAE = 277, 575, 4005 and NILS = 1.030, 0.916, 0.599. Row (c) gives the DGASMO imaging results using the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$, with the improved PAE = 264, 470, 3166 and NILS = 1.042, 0.936, 0.639 at $\delta \;\textrm{ = }$ 0, 100 and 200 nm. Row (d) shows the DGASMO imaging results using the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$, with the improved PAE = 270, 395, 2680 and NILS = 1.045, 0.948, 0.673 at $\delta \;\textrm{ = }$ 0, 100 and 200 nm. Row (e) shows the DGASMO imaging results using the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$, which gives the best imaging performance at defocus $\delta \;\textrm{ = }$ 100 and 200 nm, with PAE = 278, 341, 2316 and NILS = 1.064, 0.968, 0.698 at $\delta \;\textrm{ = }$ 0, 100 and 200 nm. In addition, the changes of the assist features in the mask (the area in the red box) become increasingly important for compensating wafer image distortion.

Fig. 8. Simulation results with ${{\bf I}_{02}}$ as target pattern. Columns from left to right: the optimized source pattern $\hat{{\bf J}}$, the optimized mask pattern $\hat{{\bf M}}$, and the printed wafer image ${\bf I}_w^\delta $ at 0 nm, 100 nm and 200 nm defocus. Rows (a-e): the traditional SMO method, the SMO-Adam method, the DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$, $\textrm{ - }{{1 / \delta }^4}$ and $\textrm{ - }{{1 / \delta }^9}$, respectively.

Fig. 9. (a) The optimized defocus-pattern error (PAE) of desired target pattern ${{\bf I}_{02}}$ for traditional SMO (black dotted curve), the SMO-Adam method (black curve), the DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). (b) The confrontation process of defocus value using desired target pattern ${{\bf I}_{02}}$ for DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve).

Fig. 10. Comparison of Defocus-NILS window and PW for desired target pattern ${{\bf I}_{02}}$. (a) The Defocus-NILS window for traditional SMO (black dotted curve), the SMO-Adam method (black curve), the DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). (b) The PWs for traditional SMO (black dotted curve), the SMO-Adam (black curve), the DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). Details of PWs are given in Fig. S2 of the supplementary material.

Figure 9(a) depicts the optimized defocus–PAE curves using ${{\bf I}_{02}}$ as the desired target pattern, over the defocus range from −200 nm to 200 nm. The black dotted, black, green, blue and red curves correspond to the imaging results of rows (a-e) in Fig. 8, respectively. The global PAE of each curve decreases in the order of traditional SMO (black dotted curve), SMO-Adam, and DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). Similar to the results in Fig. 6(a) (using ${{\bf I}_{01}}$ as target pattern), the sensitivity of imaging to defocus is greatly reduced by the most aggressive defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). Figure 9(b) shows the confrontation behavior of the defocus value $\delta$ for DGASMO with defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). The defocus value $\delta$ rises rapidly at the beginning of the confrontation and fluctuates around ${\pm} 44$, ${\pm} 70$ and ${\pm} 91$ nm in the later iterations. However, the convergence of the defocus value $\delta$ becomes slower as the aggressiveness of the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen}$ increases, so an appropriate $\mathrm{{\cal S}}_\delta ^{gen}$ must be selected to ensure the robustness of the optimization.

Figure 10(a) shows the Defocus-NILS window of the desired target pattern ${{\bf I}_{02}}$ for traditional SMO (black dotted curve), the SMO-Adam method (black curve), and the DGASMO method with the defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). The NILS values of the SMO-Adam and the proposed DGASMO methods are larger than that of the traditional SMO method (black dotted curve) within ${\pm} 200$ nm defocus. Figure 10(b) shows the PWs of the desired target pattern ${{\bf I}_{02}}$ for the traditional SMO method (black dotted curve), the SMO-Adam method (black curve), and the DGASMO with the defocus generation terms $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{1 / \delta }$ (green curve), $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^4}$ (blue curve) and $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ (red curve). Compared with the traditional SMO method, the PW optimized by the SMO-Adam method is greatly improved. Compared with the SMO-Adam method, the proposed DGASMO method increases the PW by 54.9%, 126.8% and 190.2%, and the DOF (EL = 2%) by 50%, 86.84% and 118.42%, respectively.

The relationship between the converged defocus value $\delta$ and the DOF for the desired target patterns ${{\bf I}_{01}}$ and ${{\bf I}_{02}}$ is summarized in Table 1. For ${{\bf I}_{01}}$, the DOF optimized by the proposed DGASMO increases from 111 nm to 123 nm to 134 nm as the defocus value $\delta$ increases from 37 nm to 67 nm to 103 nm. For ${{\bf I}_{02}}$, the DOF optimized by the proposed DGASMO increases from 57 nm to 71 nm to 83 nm as the defocus value $\delta$ increases from 44 nm to 70 nm to 91 nm. Thus, the DOF follows a trend similar to that of the defocus value $\delta$. Considering also the convergence performance of the PAEs, $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ is a reasonable term for generating the defocus value.
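The DOF percentages quoted in this section can be re-derived from the Table 1 entries with simple relative-gain arithmetic. The short sketch below does this for both target patterns and also checks that the DOF grows monotonically with the converged defocus value; the helper names `relative_gain` and `is_increasing` are introduced here for illustration only.

```python
def relative_gain(new, base):
    """Percentage increase of `new` over `base`."""
    return 100.0 * (new - base) / base

def is_increasing(seq):
    """True if every element is strictly larger than the previous one."""
    return all(a < b for a, b in zip(seq, seq[1:]))

# Values copied from Table 1 (nm); DGASMO lists are ordered -1/d, -1/d^4, -1/d^9.
dof_i01 = {"SMO": 67, "SMO-Adam": 93, "DGASMO": [111, 123, 134]}   # EL = 15%
dof_i02 = {"SMO-Adam": 38, "DGASMO": [57, 71, 83]}                 # EL = 2%
delta_i01 = [37, 67, 103]
delta_i02 = [44, 70, 91]

adam_gain_i01 = round(relative_gain(dof_i01["SMO-Adam"], dof_i01["SMO"]), 2)
gains_i01 = [round(relative_gain(d, dof_i01["SMO-Adam"]), 2) for d in dof_i01["DGASMO"]]
gains_i02 = [round(relative_gain(d, dof_i02["SMO-Adam"]), 2) for d in dof_i02["DGASMO"]]

trend_ok = all(is_increasing(s) for s in
               (delta_i01, dof_i01["DGASMO"], delta_i02, dof_i02["DGASMO"]))
```

With these inputs the gains come out as 38.81% for SMO-Adam over traditional SMO, 19.35%, 32.26% and 44.09% for ${{\bf I}_{01}}$, and 50%, 86.84% and 118.42% for ${{\bf I}_{02}}$, matching the figures reported in the text.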

Table 1. The expansion trend between defocus value $\delta$ and DOF revealed by desired target patterns ${{\bf I}_{01}}$ and ${{\bf I}_{02}}$

Compared with the desired target pattern ${{\bf I}_{01}}$ (CD = 85 nm), the DOF and EL of the desired target pattern ${{\bf I}_{02}}$ (CD = 55 nm) are decreased. This is because the acyclic pattern is more sensitive to the defocus of the optical system; nevertheless, the proposed DGASMO can effectively perceive such drastic changes and thereby compensate the tolerance of the optimized source and mask to defocus. On the other hand, the slight difference between the PWs and the Defocus-NILS windows is due to the averaging of NILS.

6. Conclusions

This paper proposed an efficient DGASMO method to improve the imaging performance under lithographic defocus, formulated as an SMO problem under defocus interference. In the inverse optimization framework, the defocus generation term provides a ruler for planning the defocus and acts against the adversarial term, so that a more robust lithographic source and mask with better defocus tolerance are found. Benefiting from the constraint of the KKT condition and the acceleration of the Adam algorithm, the defocus value $\delta$ changes and converges continuously, which makes the optimization of DOF more direct and effective. Compared with the SMO-Adam method, the DGASMO with $\mathrm{{\cal S}}_\delta ^{gen}\;\textrm{ = }\;\textrm{ - }{{1 / \delta }^9}$ improves the PW and DOF (EL = 15%) by 29.12% and 44.09% at CD = 85 nm, and improves the PW and DOF (EL = 2%) by 190.2% and 118.42% at CD = 55 nm. In addition, the similarity between the Defocus-NILS window and the PW is compared, and a method for directly optimizing the PW will be considered in our subsequent work.

Appendix A

For brevity, Eq. (5) can be simplified as:

(A1)$${\bf Def}(\delta )\textrm{ = }{\mathrm{{\cal F}}^{\textrm{ - }1}}\{{\textrm{exp} [{j\delta \cdot {{\bf \Gamma }_{def}}} ]} \}, $$

where

(A2)$${{\bf \Gamma }_{def}}\textrm{ = }{{2\pi {n_w}(1 - {\gamma ^{\prime}})} / \lambda }. $$
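A minimal NumPy sketch of (A1) and (A2) is given below: it builds the defocus phase on a sampled pupil grid and takes the inverse FFT to obtain the kernel ${\bf Def}(\delta)$. Several ingredients are assumptions made only for illustration and are not taken from the paper: the grid size, the values NA = 1.35 and $n_w$ = 1.44, the helper names, and in particular the form $\gamma' = \sqrt{1 - (\alpha'^2 + \beta'^2)}$ for the wafer-side direction cosine, whose actual definition accompanies Eq. (5).

```python
import numpy as np

def direction_cosine(n_pix=64, na=1.35, n_w=1.44):
    """Assumed gamma' = sqrt(1 - (alpha'^2 + beta'^2)) on an FFT-ordered pupil grid."""
    f = np.fft.fftfreq(n_pix) * 2.0                 # normalized pupil coordinates in [-1, 1)
    alpha, beta = np.meshgrid(f, f, indexing="ij")
    rho2 = (na / n_w) ** 2 * (alpha ** 2 + beta ** 2)
    return np.sqrt(np.clip(1.0 - rho2, 0.0, None))  # clip the non-propagating region to 0

def defocus_kernel(delta_nm, gamma_prime, wavelength_nm=193.0, n_w=1.44):
    """Def(delta) = IFFT{ exp(j * delta * Gamma_def) }, with Gamma_def from (A2)."""
    gamma_def = 2.0 * np.pi * n_w * (1.0 - gamma_prime) / wavelength_nm
    pupil_phase = np.exp(1j * delta_nm * gamma_def)
    return np.fft.ifft2(pupil_phase), gamma_def

gamma_prime = direction_cosine()
kernel_focus, gamma_def = defocus_kernel(0.0, gamma_prime)     # in focus
kernel_100nm, _ = defocus_kernel(100.0, gamma_prime)           # 100 nm defocus
```

At $\delta = 0$ the phase is unity everywhere, so the kernel collapses to a single-pixel impulse and convolution with it leaves the field unchanged; for $\delta \ne 0$ the kernel spreads out while its total energy stays equal to one because the pupil term is a pure phase.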

According to Eqs. (11) and (12), we define the time-dependent scheme of the Euler-Lagrange term ${{{\partial _t}{\mathrm{{\cal S}}_L}} / {{\partial _t}\phi }}$, where $\phi $ can be $\theta$, $\omega$ or $\delta $. Following the derivation in [34], the gradient of $\mathrm{{\cal S}}_{pe}^\delta$ with respect to $\theta $ can be computed as:

(A3)$$\frac{{\partial \mathrm{{\cal S}}_{pe}^\delta }}{{\partial \theta }}\textrm{ = } - \textrm{2}a\;\textrm{sin}\theta \sum\limits_{{\alpha _s},{\beta _s}} {\frac{{\sum\nolimits_{p = x,y,z} {||{{\bf Def}(\delta )\otimes {\bf E}_p^{{\alpha_s}{\beta_s}}} ||_2^2 - {\bf I}_a^\delta } }}{{{J_{sum}}}}} \odot ({{{\bf I}_0} - {\bf I}_w^\delta } )\odot {\bf I}_w^\delta \odot ({1 - {\bf I}_w^\delta } ), $$

It is worth noting that ${{\partial \mathrm{{\cal S}}_{pe}^{\delta = 0}} / {\partial \theta }} = { {({{{\partial \mathrm{{\cal S}}_{pe}^\delta } / {\partial \theta }}} )} |_{\delta = 0}}$, and ${{\partial \mathrm{{\cal S}}_\delta ^{ad}} / {\partial \theta }} = {{\partial \mathrm{{\cal S}}_{pe}^{\delta \ne 0}} / {\partial \theta }} = { {({{{\partial \mathrm{{\cal S}}_{pe}^\delta } / {\partial \theta }}} )} |_{\delta \ne 0}}$.
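The factor $({{\bf I}_0} - {\bf I}_w^\delta) \odot {\bf I}_w^\delta \odot (1 - {\bf I}_w^\delta)$ that recurs in the gradients of this appendix has the form produced by the chain rule when a squared pattern error is passed through a sigmoid resist model. The sketch below checks that identity numerically. It assumes, as is common in pixelated ILT formulations but not confirmed from the visible text, that the pattern error is the squared $\ell_2$ distance and that the resist response is a sigmoid with steepness $a$ and threshold $t_r$; the numerical values and function names are hypothetical.

```python
import numpy as np

def resist(i_aerial, a=25.0, t_r=0.3):
    """Assumed sigmoid resist model: I_w = 1 / (1 + exp(-a * (I_a - t_r)))."""
    return 1.0 / (1.0 + np.exp(-a * (i_aerial - t_r)))

def pattern_error(i_aerial, i_target, a=25.0, t_r=0.3):
    """Assumed pattern error: squared l2 distance between target and printed image."""
    return np.sum((i_target - resist(i_aerial, a, t_r)) ** 2)

def grad_wrt_aerial(i_aerial, i_target, a=25.0, t_r=0.3):
    """Chain rule: d(error)/d(I_a) = -2a (I_0 - I_w) * I_w * (1 - I_w), elementwise."""
    i_w = resist(i_aerial, a, t_r)
    return -2.0 * a * (i_target - i_w) * i_w * (1.0 - i_w)

# Finite-difference check on a small random aerial image and a binary target.
rng = np.random.default_rng(0)
i_aerial = rng.uniform(0.0, 1.0, size=(4, 4))
i_target = (rng.uniform(size=(4, 4)) > 0.5).astype(float)

analytic = grad_wrt_aerial(i_aerial, i_target)
numeric = np.zeros_like(i_aerial)
eps = 1e-6
for idx in np.ndindex(i_aerial.shape):
    step = np.zeros_like(i_aerial)
    step[idx] = eps
    numeric[idx] = (pattern_error(i_aerial + step, i_target)
                    - pattern_error(i_aerial - step, i_target)) / (2.0 * eps)
```

The remaining factors in (A3) and (A4) then come from differentiating the aerial image with respect to the source and mask parameters.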

Similarly, for $\omega $:

(A4)$$\begin{array}{l} \frac{{\partial \mathrm{{\cal S}}_{pe}^\delta }}{{\partial \omega }}\textrm{ = }\frac{{\textrm{ - 2}a}}{{{J_{sum}}}}\;\textrm{sin}\omega \sum\limits_{{\alpha _s},{\beta _s}} {\sum\limits_{p = x,y,z} {J \cdot \textrm{Real}} } [{{B^ \ast } \odot ({{{({{\bf Def}(\delta )\otimes H_p^{{\alpha_s}{\beta_s}}} )}^{ {\ast}{\circ} }}} } \\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; { { \otimes \{{{\bf Def}(\delta )\otimes E_p^{wafer} \odot ({{{\bf I}_0} - {\bf I}_w^\delta } )\odot {\bf I}_w^\delta \odot ({1 - {\bf I}_w^\delta } )} \}} )} ]\end{array}$$

with ∗ being the conjugate operation, ◦ flipping the matrix in the argument in both the up-down and left-right directions, and $\textrm{1} \in {{\cal R}^{{N_M} \times {N_M}}}$ being the all-ones matrix. The gradients ${{\partial \mathrm{{\cal S}}_{pe}^{\delta = 0}} / {\partial \omega }}$ and ${{\partial \mathrm{{\cal S}}_\delta ^{ad}} / {\partial \omega }}$ can be computed by setting $\delta = 0$ and $\delta \ne 0$, respectively.

Similarly, the update direction of the defocus $\delta $ can be computed as:

(A5)$$\frac{{\partial \mathrm{{\cal S}}_{pe}^\delta }}{{\partial \delta }} = \frac{{4a}}{{{J_{sum}}}}\sum\limits_{({{\alpha_s},{\beta_s}} )} {{{\bf J}^{{\alpha _s}{\beta _s}}}} \sum\limits_{p = x,y,z} {\sum\limits_{{N_M},{N_M}} {\{{Re ({\Lambda _p^{{\alpha_s}{\beta_s}}} )\odot {D_1}} } } {\textrm{ + }{\mathop{\rm Im}\nolimits} ({\Lambda _p^{{\alpha_s}{\beta_s}}} )\odot {D_2}} \}$$

where $Re \{{\; \cdot \;} \}$ and ${\mathop{\rm Im}\nolimits} \{{\; \cdot \;} \}$ extract the real part and the imaginary part, respectively, and $\Lambda _p^{{\alpha _s}{\beta _s}}$, ${D_1}$ and ${D_2}$ are given by:

(A6)$$\Lambda _p^{{\alpha _s}{\beta _s}}\textrm{ = }{({{\bf E}_p^{wafer}} )^{ {\ast}{\circ} }} \otimes [{({{{\bf I}_0} - {\bf I}_w^\delta } )\odot {\bf I}_w^\delta \odot ({1 - {\bf I}_w^\delta } )} ]\odot ({{\bf Def}(\delta )\otimes {\bf E}_p^{wafer}} )$$

(A7)$${D_1}\textrm{ = }Re [{{\mathrm{{\cal F}}^{\textrm{ - }1}}\{{\;{{\bf \Gamma }_{def}}\; \odot \sin ({\delta \;{{\bf \Gamma }_{def}}} )\;} \}} ]+ {\mathop{\rm Im}\nolimits} [{{\mathrm{{\cal F}}^{\textrm{ - }1}}\{{\;{{\bf \Gamma }_{def}}\; \odot \cos ({\delta \;{{\bf \Gamma }_{def}}} )} \}} ]$$

(A8)$${D_2}\textrm{ = }Re [{{\mathrm{{\cal F}}^{\textrm{ - }1}}\{{\;{{\bf \Gamma }_{def}}\; \odot \cos ({\delta \;{{\bf \Gamma }_{def}}} )\;} \}} ]\textrm{ - }{\mathop{\rm Im}\nolimits} [{{\mathrm{{\cal F}}^{\textrm{ - }1}}\{{\;{{\bf \Gamma }_{def}}\; \odot \sin ({\delta \;{{\bf \Gamma }_{def}}} )} \}} ]$$

with ${{\partial \mathrm{{\cal S}}_{pe}^{\delta = 0}} / {\partial \delta }} = 0$ and ${{\partial \mathrm{{\cal S}}_\delta ^{ad}} / {\partial \delta }} = { {({{{\partial \mathrm{{\cal S}}_{pe}^\delta } / {\partial \delta }}} )} |_{\delta \ne 0}}$, which correspond to setting $\delta = 0$ and $\delta \ne 0$, respectively.
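The ${{\bf \Gamma }_{def}} \odot \sin (\delta \,{{\bf \Gamma }_{def}})$ and ${{\bf \Gamma }_{def}} \odot \cos (\delta \,{{\bf \Gamma }_{def}})$ products in (A7) and (A8) are the kind of terms that arise when the pupil phase of (A1) is differentiated with respect to $\delta$, since $\partial_\delta \exp (j\delta \Gamma) = j\Gamma \exp (j\delta \Gamma) = -\Gamma \sin (\delta \Gamma) + j\Gamma \cos (\delta \Gamma)$. The sketch below verifies only this identity by central finite differences; the sample values of ${{\bf \Gamma }_{def}}$ and $\delta$ are hypothetical, and the assembly of ${D_1}$ and ${D_2}$ themselves is not reproduced.

```python
import numpy as np

def pupil_phase(delta, gamma_def):
    """Pupil defocus phase exp(j * delta * Gamma_def) from (A1)."""
    return np.exp(1j * delta * gamma_def)

def d_pupil_phase_d_delta(delta, gamma_def):
    """Analytic derivative: -Gamma*sin(delta*Gamma) + j*Gamma*cos(delta*Gamma)."""
    return (-gamma_def * np.sin(delta * gamma_def)
            + 1j * gamma_def * np.cos(delta * gamma_def))

# Hypothetical samples of Gamma_def (rad/nm) and a defocus of 50 nm.
gamma_def = np.linspace(0.0, 0.02, 9)
delta = 50.0
h = 1e-3  # finite-difference step in nm

analytic = d_pupil_phase_d_delta(delta, gamma_def)
numeric = (pupil_phase(delta + h, gamma_def)
           - pupil_phase(delta - h, gamma_def)) / (2.0 * h)
```

Because the inverse Fourier transform is linear, the same derivative carries over to ${\bf Def}(\delta)$ by applying ${\mathrm{{\cal F}}^{-1}}$ to the differentiated phase.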

For the defocus generation term $\mathrm{{\cal S}}_\delta ^{gen} ={-} ({{\delta_0}/\delta } )$ with defocus scale ${\delta _0} = 1$ nm, it is obvious that ${{\partial \mathrm{{\cal S}}_\delta ^{gen}} / {\partial \theta }} = {{\partial \mathrm{{\cal S}}_\delta ^{gen}} / {\partial \omega }} = 0$, and ${{\partial \mathrm{{\cal S}}_\delta ^{gen}} / {\partial \delta }} = {{{\delta _0}} / {{\delta ^2}}}$.

Funding

Key R&D program of Hubei (2021BAA173).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. G. E. Moore, “Cramming more components onto integrated circuits,” Proc. IEEE 86(1), 82–85 (1998).

2. A. K. Wong, Resolution Enhancement Techniques in Optical Lithography (SPIE, 2001).

3. F. Schellenberg, “A little light magic,” IEEE Spectrum 40(9), 34–39 (2003).

4. X. Ma and G. R. Arce, Computational Lithography, 1st ed. (John Wiley and Sons, 2010).

5. A. K. Wong, Optical Imaging in Projection Lithography (SPIE, 2005).

6. A. Poonawala and P. Milanfar, “Mask design for optical microlithography—an inverse imaging problem,” IEEE Trans. on Image Process. 16(3), 774–788 (2007).

7. Y. Granik, “Fast pixel-based mask optimization for inverse lithography,” J. Micro/Nanolith. MEMS MOEMS 5(4), 043002 (2006).

8. X. Ma and G. R. Arce, “Pixel-based OPC optimization based on conjugate gradients,” Opt. Express 19(3), 2165–2180 (2011).

9. X. Ma, Y. Li, and L. Dong, “Mask optimization approaches in optical lithography based on a vector imaging model,” J. Opt. Soc. Am. A 29(7), 1300–1312 (2012).

10. W. Lv, S. Liu, Q. Xia, X. Wu, Y. Shen, and E. Y. Lam, “Level-set-based inverse lithography for mask synthesis using the conjugate gradient and an optimal time step,” J. Vac. Sci. Technol. B 31(4), 041605 (2013).

11. J. Yu and P. Yu, “Gradient-based fast source mask optimization (SMO),” Proc. SPIE 7973, 797320 (2011).

12. X. Ma, C. Han, Y. Li, L. Dong, and G. R. Arce, “Pixelated source and mask optimization for immersion lithography,” J. Opt. Soc. Am. A 30(1), 112–123 (2013).

13. J. Li and E. Y. Lam, “Robust source and mask optimization compensating for mask topography effects in computational lithography,” Opt. Express 22(8), 9471–9485 (2014).

14. J. Li, S. Liu, and E. Y. Lam, “Efficient source and mask optimization with augmented lagrangian methods in optical lithography,” Opt. Express 21(7), 8076–8090 (2013).

15. C. Han, Y. Li, X. Ma, and L. Liu, “Robust hybrid source and mask optimization to lithography source blur and flare,” Appl. Opt. 54(17), 5291–5302 (2015).

16. N. Jia and E. Y. Lam, “Pixelated source mask optimization for process robustness in optical lithography,” Opt. Express 19(20), 19384–19398 (2011).

17. A. Erdmann, J. Kye, Y. Mao, S. Li, G. Sun, J. Wang, L. Duan, Y. Bu, and X. Wang, “The thermal aberration analysis of a lithography projection lens,” presented at the Optical Microlithography XXX (2017).

18. A. T. Macrander, A. M. Khounsary, D. Chojnowski, D. C. Mancini, B. P. Lai, R. J. Dejus, and A. M. Khounsary, “Thermal management of masks for deep x-ray lithography,” presented at the High Heat Flux and Synchrotron Radiation Beamlines (1997).

19. A. Erdmann, “Mask modeling in the low k1 and ultrahigh NA regime: phase and polarization effects (invited paper),” Proc. SPIE 5835, 69–81 (2005).

20. C. A. Mack, Fundamental Principles of Optical Lithography: The Science of Microfabrication (John Wiley & Sons, Ltd., 2007).

21. Y. Peng, J. Zhang, Y. Wang, and Z. Yu, “Gradient-based source and mask optimization in optical lithography,” IEEE Trans. on Image Process. 20(10), 2856–2864 (2011).

22. W. Conley, X. Ma, Y. Li, X. Guo, and L. Dong, “Robust resolution enhancement optimization methods to process variations based on vector imaging model,” presented at the Optical Microlithography XXV (2012).

23. N. Jia and E. Y. Lam, “Machine learning for inverse lithography: using stochastic gradient descent for robust photomask synthesis,” J. Opt. 12(4), 045601 (2010).

24. A. C. Chen, N. Jia, B. Lin, A. K. Wong, E. Y. Lam, and A. Yen, “Robust mask design with defocus variation using inverse synthesis,” presented at the Lithography Asia (2008).

25. Y. Shen, N. Jia, N. Wong, and E. Y. Lam, “Robust level-set-based inverse lithography,” Opt. Express 19(6), 5511–5521 (2011).

26. M. K. Sears, G. Fenger, J. Mailfert, and B. W. Smith, “Extending SMO into the lens pupil domain,” Proc. SPIE 7973, 79731B (2011).

27. M. K. Sears, J. Bekaert, and B. W. Smith, “Pupil wavefront manipulation for optical nanolithography,” Proc. SPIE 8326, 832611 (2012).

28. M. K. Sears, J. Bekaert, and B. W. Smith, “Lens wave front compensation for 3D photomask effects in subwavelength optical lithography,” Appl. Opt. 52(3), 314–322 (2013).

29. R. J. Noll, “Zernike polynomials and atmospheric turbulence,” J. Opt. Soc. Am. 66(3), 207–211 (1976).

30. P. Dirksen, J. Braat, A. Janssen, and A. Leeuwestein, “Aberration retrieval for high-NA optical systems using the extended Nijboer-Zernike theory,” Proc. SPIE 5754, 262–273 (2005).

31. X. Wu, S. Liu, J. Li, and E. Y. Lam, “Efficient source mask optimization with zernike polynomial functions for source representation,” Opt. Express 22(4), 3924–3937 (2014).

32. C. Han, Y. Li, L. Dong, X. Ma, and X. Guo, “Inverse pupil wavefront optimization for immersion lithography,” Appl. Opt. 53(29), 6861–6871 (2014).

33. J.-R. Gao, X. Xu, Y. Bei, and D. Z. Pan, “MOSAIC: Mask optimizing solution with process window aware inverse correction,” Proc. ACM/IEEE Design Autom. Conf. (DAC), San Francisco, CA, USA, 2014, pp. 1–6.

34. Y. Shen, F. Peng, X. Huang, and Z. Zhang, “Adaptive gradient-based source and mask co-optimization with process awareness,” Chin. Opt. Lett. 17(12), 121102 (2019).

35. P. Wei, Y. Li, T. Li, N. Sheng, E. Li, and Y. Sun, “Multi-Objective Defocus Robust Source and Mask Optimization Using Sensitive Penalty,” Appl. Sci. 9(10), 2151 (2019).

36. X. Ma, D. Shi, Z. Wang, Y. Li, and G. R. Arce, “Lithographic source optimization based on adaptive projection compressive sensing,” Opt. Express 25(6), 7131–7149 (2017).

37. X. Ma, Z. Wang, Y. Li, G. R. Arce, L. Dong, and J. Garcia-Frias, “Fast optical proximity correction method based on nonlinear compressive sensing,” Opt. Express 26(11), 14479–14498 (2018).

38. J. W. Goodman, Introduction to Fourier Optics, Chap. 4 (McGraw-Hill Science, 1996).

39. M. Born and E. Wolf, Principles of Optics, Chap. 10 (Cambridge University, 1999).

40. Y. Shen, F. Peng, and Z. Zhang, “Efficient optical proximity correction based on semi-implicit additive opera,” Opt. Express 27(2), 1520–1528 (2019).

41. Y. Shen, F. Peng, and Z. Zhang, “Semi-implicit level set formulation for lithographic source and mask optimization,” Opt. Express 27(21), 29659–29668 (2019).

42. T. V. Pistor, A. R. Neureuther, and R. J. Socha, “Modeling oblique incidence effects in photomasks,” Proc. SPIE 4000, 228–237 (2000).

43. D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” presented at the 3rd International Conference on Learning Representations, ICLR (2015).

Optics Express

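| Method | ${{\bf I}_{01}}$ (CD = 85 nm): $\delta$ (nm) | ${{\bf I}_{01}}$ (CD = 85 nm): DOF (nm), EL = 15% | ${{\bf I}_{02}}$ (CD = 55 nm): $\delta$ (nm) | ${{\bf I}_{02}}$ (CD = 55 nm): DOF (nm), EL = 2% |
| --- | --- | --- | --- | --- |
| Traditional SMO | - | 67 | - | - |
| SMO-Adam | - | 93 | - | 38 |
| DGASMO ($\mathrm{{\cal S}}_\delta ^{gen} = - 1 / \delta$) | 37 | 111 | 44 | 57 |
| DGASMO ($\mathrm{{\cal S}}_\delta ^{gen} = - 1 / {\delta ^4}$) | 67 | 123 | 70 | 71 |
| DGASMO ($\mathrm{{\cal S}}_\delta ^{gen} = - 1 / {\delta ^9}$) | 103 | 134 | 91 | 83 |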