Fast inverse lithography approach based on a model-driven graph convolutional network

Shengen Zhang; Xu Ma; Xu Ma; Junbi Zhang

doi:10.1364/OE.493178

1. Introduction

The advance of semiconductor industry has promoted the development of the information technology. According to the Moore’s Law, the number of transistors on an integrated circuit chip would double in about 18-24 months [1]. Optical lithography is the most important process and driving force in the semiconductor manufacturing. Nowadays, deep ultraviolet (DUV) lithography is extensively used to fabricate the very large scale integrated circuit at 45-7 nm technology nodes and beyond [2]. As shown in Fig. 1(a), the radiation from the DUV light source uniformly illuminates the mask, on which the layout pattern of integrated circuit is engraved, and a portion of light rays pass through the transparent area on mask, forming the diffraction waves. Then, the diffracted light waves off the mask will be collected by the projection optics, and projected onto the wafer coated with a thin film of photosensitive material called photoresist. The photochemical reaction occurs in the exposed areas of photoresist, which make this part of photoresist solvable or unsolvable for the resist developer. The solvable part of photoresist is removed by development process to produce the print image in photoresist. The developed photoresist contour is then used as the template for the following wafer processes including the etching and ion implantation.

Fig. 1. Sketches of (a) optical lithography system, and (b) the mask patterns and print images before and after optimization [5].

Download Full Size | PDF

However, with the shrinkage of the critical dimension (CD) of integrated circuits, the interference and diffraction effects will degrade the imaging performance of lithography system and cause image distortion on the wafer, which is considered as the optical proximity effect (OPE). In order to preserve the yield and device functions of semiconductor chips, optical proximity correction (OPC) and inverse lithography technique (ILT) were proposed to compensate the OPE and improve the lithography image fidelity [3]. OPC pre-warps the geometric patterns on the mask to compensate for the image distortion and make the lithography image satisfy the manufacturing requirement as much as possible. Traditional rule-based OPC (RBOPC) approaches correct the mask patterns according to a set of specific parameters from the rule table. In addition, traditional edge-based OPC (EBOPC) approaches move the edge segments of mask pattern to offset the OPE. As one of the state-of-the-art optimization approaches, ILT deals with the mask as a pixelated image, and inversely optimizes the mask pattern from the target layout based on the lithography imaging model. Compared with RBOPC and EBOPC, ILT has higher correction flexibility by modifying all pixel values on the mask, thus can effectively improve the image resolution of lithography system [4]. Figure 1(b) illustrates the sketch of the mask patterns and print images before and after optimization, respectively.

The ILT methods were extensively studied [5–8]. However, the computational complexity is always the most challenging issue for ILT since the fine-resolution mask includes a huge amount of pixels. The values of those mask pixels need to be manipulated and optimized iteratively until the wafer image of the mask pattern is acceptable. Therefore, the computational complexity of ILT is higher than the traditional rule-based OPC and edge-based OPC approaches. Recently, the development of machine learning and deep learning provided novel perspectives for ILT methods to greatly improve the computational efficiency while retaining the image fidelity. These methods include the multilayer perceptron (MLP) neural network [9], nonparametric kernel regression [10], generative adversarial network [11], residual network [12], and model-informed deep learning (MIDL) [5].

The methods mentioned above usually use grid-structured data or Euclidean data to represent the mask patterns. Although it is simple to use the grid-structured data to represent local environment information of the mask features, it will increase the storage and computing consumption. Especially, the deep neural networks for powerful ILT have complex structures, and are still time-consuming for large mask optimization. On the other hand, the graph representation can effectively extract the useful geometric features from the local environment at much lower storage and computation costs. For example, mask local information can be represented by sub-images which are grid-structured data in the traditional machine-learning-assisted ILT methods [9]. However, the mask local information can be represented by a small number of vertices in the graph model, which require less storage space. In addition, the sparse vertex-based sampling and representation approaches can reduce the runtime of matrix manipulations in the graph convolutional network (GCN). Over past few years, graph signal processing (GSP) and GCN have been developed rapidly in many fields, which provided powerful tools to represent and analyze such kind of non-Euclidean data effectively [13–15]. GCN is mainly divided into to two categories, the spectral-based GCN and spatial-based GCN. Spectral-based GCN uses graph Fourier transform (GFT) and Chebyshev polynomials to realize the convolution on graph [16–18]. Meanwhile, the spatial-based GCN defines different aggregation functions for the central vertex that receives information from the neighboring vertices [19–22].

Previously, Zhang et al. studied the mask optimization method based on a supervised-learning GCN [23]. The input of GCN is the features extracted from the target layout using the sparse concentric circles sampling (SCCS) template, where the sampling points are arranged on the radiative lines in eight or sixteen directions from the center. The output of GCN is the optimized mask of ILT. The supervised learning method was used in that work, and the labels of the training data set are the optimized mask patterns obtained by other ILT algorithms. However, the GCN-based ILT has its own limitations. Firstly, the supervised learning consumes a lot of time for data labelling. In addition, the performance of the network is sensitive to the labelling method. The training masks that are not well-optimized by the labelling process will degrade the prediction accuracy of the network. Secondly, the SCCS template is too simple to extract sufficient local environmental features, which also reduces prediction capability. Thirdly, this method performs a pixel-wise calculation of the ILT mask under CPU framework, which is inefficient in computation.

This paper proposes a model-driven graph convolutional network (MGCN) method combined with the dense concentric circular sampling (DCCS) to improve the computational efficiency and image fidelity of ILT. The paper proposes to use a GCN-based encoder to generate the ILT mask from the target layout. The local environmental information of the target layout is extracted by the DCCS template to obtain the feature matrices, which is then inputted in the encoder. Compared with SCCS, the DCCS template can extract more abundant geometric information from the input layout. In addition, a decoder based on the partially coherent lithography imaging model is used to train the encoder with an unsupervised manner, thus avoiding the time-consuming labelling process. The loss function is defined as the difference between the target layout and the print image calculated by the decoder. With the help of the model-driven decoder which uses a lithography imaging model to drive the training of the forward GCN, the encoder can learn the physical characteristics of the lithography imaging process better, thus improving the training performance of the network. Finally, the proposed MGCN model is built based on the parallel computation of GPU to accelerate the computation speed. To our best knowledge, this paper is the first to propose the new architecture of MGCN to solve the ILT problem. In addition, the proposed DCCS method can extract more local information of mask pattern than the traditional SCCS method, thus further improving the ILT performance. The following experimental results will show the superiority of the proposed methods in terms of lithography image fidelity and computational efficiency.

This paper compares the proposed MGCN model with some other comparative methods, including the gradient-based ILT algorithm [7], MLP model [9], traditional GCN [23], MIDL model [5], and convolutional neural network (CNN). The results show that the proposed method obtains the best performance of image fidelity, and can further improve the computational efficiency.

The reminder of this paper is organized as follows. The fundamentals of lithography imaging model and GCN are provided in Section 2. The ILT method based on MGCN is proposed in Section 3. Simulation results and analysis are illustrated in Section 4. Section 5 gives the conclusions.

2. Fundamentals of lithography imaging model and GCN

2.1 Fundamentals of lithography imaging model

The imaging process of a partially coherent lithography system is shown in Fig. 2, which consists of two parts: the aerial image model and resist model. The aerial image model describes the information transfer of the layout pattern from the mask to the top of wafer, forming a stable light intensity distribution on the photoresist. The resist model describes the photochemical reactions and development process of the photoresist under the exposure. Given the aerial image result, the resist model can calculate the contour of print image after the development process. In this paper, Fourier series expansion (FSE) model and constant threshold resist (CTR) model are used to calculate the aerial image and print image, respectively.

Fig. 2. Schematic diagram of the imaging process of partially coherent lithography system.

Download Full Size | PDF

Let ${\mathbf M} \in {{\mathbb R}^{{N_M} \times {N_M}}}$ be the binary mask, where the pixel values in the transparent and opaque regions are 1 and 0, respectively. According to the Hopkins diffraction model, the aerial image of a partially coherent lithography system can be written as [24]:

(1)$${\mathbf I}({\mathbf r}) = \int\!\!\!\int {{\mathbf M}({{\mathbf r}_1}){\mathbf M}({{\mathbf r}_2})\gamma ({{\mathbf r}_1} - {{\mathbf r}_2}){{\mathbf h}^ \ast }({\mathbf r} - {{\mathbf r}_1}){\mathbf h}({\mathbf r} - {{\mathbf r}_2})\textrm{d}{{\mathbf r}_1}\textrm{d}{{\mathbf r}_2}} ,$$

where ${{\mathbf r}_1} = ({x_1},{y_1})$ and ${{\mathbf r}_2} = ({x_2},{y_2})$ are object plane coordinates located on the mask; ${\mathbf r} = (x,y)$ represents the image plane coordinates on the wafer; ${\mathbf h}$ is the point spread function (PSF); “${\ast} $” is the conjugate operator; $\gamma ({{\mathbf r}_1} - {{\mathbf r}_2})$ is the complex coherence degree, the magnitude of which represents the optical interaction extent between the two spatial positions ${{\mathbf r}_1} = ({x_1},{y_1})$ and ${{\mathbf r}_2} = ({x_2},{y_2})$.

Suppose the range of mask is defined within the square region of $(x,y) \in [ - {D / 2},{D / 2}]$, so $\gamma ({\mathbf r})$ can be expanded by the two-dimensional Fourier series, and Eq. (1) can be discretized and reformulated as the FSE model [25,26]:

(2)$${\mathbf I} = {\sum\limits_{\mathbf m} {{{\mathbf \Gamma }_{\mathbf m}}|{{\mathbf M} \otimes {{\mathbf h}^{\mathbf m}}} |} ^2},$$

where “${\otimes} $” is the convolution operator, and the two-dimensional Fourier series are:

(3)$${{\mathbf \Gamma }_\textrm{m}} = \frac{1}{{{D^2}}}\int_{{A_\gamma }} {\gamma ({\mathbf r})\exp (j{\omega _0}{\mathbf m} \cdot {\mathbf r})\textrm{d}{\mathbf r}} ,$$

where ${\omega _0} = {\pi / D}$, ${\mathbf m} = ({m_x},{m_y})$ is the coordinates of the pixel in source plane, and “${\cdot}$” represents the inner-product operation. In Eq. (2), the convolution kernel ${{\mathbf h}^{\mathbf m}}$ can be formulated as:

(4)$${{\mathbf h}^{\mathbf m}} = {\mathbf h} \cdot \exp (j{\omega _0}{\mathbf m} \cdot {\mathbf r}),$$

where ${\mathbf h} \in {{\mathbb R}^{{N_h} \times {N_h}}}$ is defined as the Fourier transform of the circular lens aperture whose cutoff frequency is $NA/\lambda $:

(5)$${\mathbf h} = \frac{{{J_1}(2\pi rNA/\lambda )}}{{2\pi rNA/\lambda }},$$

where ${J_1}({\cdot} )$ is the first kind of Bessel function, $r = \sqrt {{x^2} + {y^2}} $, NA is the numerical aperture of the projection system, $\lambda $ is the wavelength of illumination.

The photoresist effect is characterized by the CTR model that is extensively used in lithography simulations [27]. CTR assumes that the photoresist will be removed or remained in a certain area if the intensity of aerial image exceeds a resist threshold ${t_r}$. Therefore, the print image ${\mathbf Z}$ on the wafer can be expressed as:

(6)$${\mathbf Z} = \Gamma \{ {\mathbf I},{t_r}\} ,$$

where ${\mathbf I}$ is the aerial image defined in Eq. (2); $\Gamma \{{\cdot} \} $ is the hard threshold function, if $x \ge {t_r}$, $\Gamma \{ x,{t_r}\} = 1$, otherwise $\Gamma \{ x,{t_r}\} = 0$. It is worth noting that CTR model is not differentiable. So, sigmoid function is usually used to replace $\Gamma \{{\cdot} \} $ for calculating the derivative of the loss function in mask optimization process. Hence, Eq. (6) can be adjusted as follows:

(7)$${\mathbf Z} \approx sigmoid({\mathbf I},{a_r},{t_r}) = \frac{1}{{1 + \exp [ - {a_r}({\mathbf I} - {t_r})]}},$$

where ${a_r}$ is the steepness index of the sigmoid function.

2.2 Fundamentals of GCN

In GSP field, an undirected graph can be represented as $G = (V,E)$, where V and E are the vertex set and edge set, respectively. Assume that the total number of vertices in G is $|V |= {N_V}$. Let ${v_i} \in V$ be the $i$th vertex, and ${e_{i,j}} = ({v_i},{v_j}) \in E$ indicates the edge connecting ${v_i}$ and ${v_j}$($i,j = 1, \cdots ,{N_V}$) [28]. The feature vector associated with ${v_i}$ is denoted as ${{\mathbf x}_i} \in {{\mathbb R}^{F \times 1}}$, and the feature matrix of G is defined as ${\mathbf X} = {[{{{\mathbf x}_1}, \cdots ,{{\mathbf x}_i}, \cdots ,{{\mathbf x}_{{N_V}}}} ]^\mathrm{\ \top }} \in {{\mathbb R}^{{N_V} \times F}}$. The edges can be assigned with weight coefficients representing the relationship between different vertices. The weighted adjacency matrix is denoted as ${\mathbf A} \in {{\mathbb R}^{{N_V} \times {N_V}}}$, where ${{\mathbf A}_{i,j}}$ is equal to the weight value of the edge ${e_{i,j}}$. In addition, ${{\mathbf A}_{i,j}} = 0$ means that the vertices ${v_i}$ and ${v_j}$ are disconnected. The degree matrix ${\mathbf D} \in {{\mathbb R}^{{N_V} \times {N_V}}}$ is a diagonal matrix, where the $i$th diagonal element ${{\mathbf D}_{i,i}} = \sum\nolimits_j {{{\mathbf A}_{i,j}}} $ representing the sum of all edge weights connecting to ${v_i}$ [29].

The proposed method uses the GCN model [18] to predict the optimized mask pattern according to the target layout. According to the GSP and GFT theories, the layer-wise propagation of GCN is given by:

(8)$${{\mathbf H}^{(l)}} = \sigma ({{{\tilde{{\mathbf D}}}^{{{ - 1} / 2}}}\tilde{{\mathbf A}}{{\tilde{{\mathbf D}}}^{{{ - 1} / 2}}}{{\mathbf H}^{(l - 1)}}{{\mathbf W}^{(l)}}} ),\;\;\;\;l = 1,2, \cdots $$

where ${{\mathbf H}^{(l)}}$ and ${{\mathbf H}^{(l - 1)}}$ are the output feature matrix in the $l$th layer and the $(l - 1)$th layer, respectively; $\sigma ({\cdot} )$ is the active function; $\tilde{{\mathbf A}} = {\mathbf A} + {\mathbf I}$ corresponds to the weighted adjacency matrix where all vertices in G are added into the self-loops; ${\tilde{{\mathbf D}}_{i,i}} = \sum\nolimits_j {{{\tilde{{\mathbf A}}}_{i,j}}}$; ${{\mathbf W}^{(l)}}$ is the weight matrix that needs to be trained in the $l$th layer. From the perspective of spatial domain, GCN uses ${\tilde{{\mathbf D}}^{{{ - 1} / 2}}}\tilde{{\mathbf A}}{\tilde{{\mathbf D}}^{{{ - 1} / 2}}}$ to aggregate information from different vertices to realize the convolution operation.

3. Fast ILT method based on MGCN

The proposed method can be divided into four parts: feature matrix extraction with DCCS template, GCN-based encoder, model-driven decoder, and network training process. Figure 3 presents the sketch of the proposed method. For a given target layout $\tilde{{\mathbf Z}} \in {{\mathbb R}^{{N_M} \times {N_M}}}$, the proposed method firstly samples the local features surrounding each pixel in $\tilde{{\mathbf Z}}$ by the DCCS template to obtain the feature matrices $\{ {{\mathbf X}_r}\} ,r = 1, \cdots ,N_M^2$. The local features of a pixel correspond one feature matrix. Then, these feature matrices can be stacked into a three-dimensional data cube, and inputted into the encoder in parallel using GPU. Then, the optimized mask of ILT, denoted as ${\mathbf M} \in {{\mathbb R}^{{N_M} \times {N_M}}}$, is automatically generated by the encoder. Next, the output mask from the encoder is transferred into the model-driven decoder. It is noted that the decoder is composed of the lithography imaging model as described in Section 2.1, which is used to calculate the print image ${\mathbf Z}$ corresponding to the optimized mask. If the encoder generates a desired mask of ILT, then the print image should be same or very close to the target layout $\tilde{{\mathbf Z}}$. That is the output of decoder should be very close to the target layout. Therefore, GCN-based encoder can be trained by minimizing the distance between $\tilde{{\mathbf Z}}$ and ${\mathbf Z}$. Next, we describe the proposed method in details.

Fig. 3. Fast ILT method based on MGCN.

Download Full Size | PDF

3.1 Feature matrix extraction using DCCS template

Due to the optical proximity effect in lithography system, the mask correction result on a certain position should be influenced by its surrounding layout features. This paper designs a DCCS template to sample and extract local environmental features around the mask pixels. The schematic diagrams of the DCCS templates are shown in Fig. 4.

Fig. 4. The DCCS template and SCCS template. (a) The DCCS template in continuous domain, (b) the discrete DCCS template in layout grids, (c) the edge connections in discrete DCCS template, (d) the SCCS template in continuous domain, and (e) the discrete SCCS template in layout grids.

Download Full Size | PDF

Figure 4(a) shows the DCCS template in the continuous domain, where the blue circles represent the sampling points, and the radial black lines represent the edges. The lithography system considered in this paper is a symmetric imaging system, so the supporting area of its PSF is a circle. The lithography system has circular optical pupil, so the supporting area of its PSF is a circle. If the projection optics of lithography system has aberrations, it may induce variations of the amplitude and phase of PSF, but the supporting area of PSF can be still represented as a circle. In order to depict the radially symmetric interference between different pixels, the sampling points in DCCS template are placed on a set of concentric circles. In addition, according to property of OPE, a pair of mask pixels with closer distance will have a stronger interaction in the print image. The magnitude of the interaction approximately follows the inverse square law with respect to the distance [30]. Thus, the radius of the $j$th ring can be defined as ${R_j}\textrm{ = }pixel \times {\alpha ^j}$, where pixel represents the pixel size on the mask scale, and $\alpha $ is the parameter to control the radius. It shows that the sampling density near the center is larger than that of the outer rings. Each sample point on the inner ring is connected to several closer points on the outer ring, in addition, the neighboring points within the same ring are also connected.

If we fit the DCCS template in Fig. 4 (a) into the pixelated grid of layout pattern, it will become the discrete template of Fig. 4(b), where each sampling point forms a vertex. The edge connections of the discrete DCCS template are shown in Fig. 4(c). The edge weight ${{\mathbf A}_{i,j}}$ is assigned as the reciprocal of the distance between the vertices $({x_i},{y_i})$ and $({x_j},{y_j})$:

(9)$${{\mathbf A}_{i,j}} = \frac{1}{{\sqrt {{{({x_i} - {x_j})}^2} + {{({y_i} - {y_j})}^2}} }}.$$

The feature vector associated with each vertex is composed of the sampled pixel on the underlying layout pattern and the eight pixels of the target layout adjacent to this sampling point. For example, as shown in the top of Fig. 4(b), the feature vector associated with ${v_{106}}$ contains the blue sampled pixel and the eight gray pixels around ${v_{106}}$. Hence, the feature dimension of each vertex is F = 9. The DCCS template contains ${N_V}$ vertices in total, and the feature matrix of ${\tilde{{\mathbf Z}}_{p,q}}$ is denoted as ${{\mathbf X}_r} \in {{\mathbb R}^{{N_V} \times F}}$, where $r = p{N_M} + q$, and the range of r is $\{ 1,2, \cdots ,N_M^2\} $. It is worth noting that the feature matrix of the boundary pixel is obtained by padding the layout pattern.

Here we compare the DCCS template and SCCS template. Figures 4(d) and 4(e) show the SCCS template in the continuous domain and the corresponding discrete template, respectively [23]. This SCCS template is used in the simulation experiments for comparison in this paper. Overall, the red vertices in the SCCS template are placed in different directions from the center vertex. The SCCS template consists of five concentric circles, and includes 61 vertices in total. In addition, each pair of vertices in the SCCS template are connected by the weighted edge. On the other hand, the vertices distribution of DCCS template is shown in Figs. 4(a) and 4(b), which includes 109 vertices in total. Thus, the vertex number in DCCS template is more than that of SCCS template. Therefore, the DCCS method can extract more abundant mask geometric information than the SCCS method. On the other hand, the DCCS method requires more computation time than the SCCS method. The computation times of feature extraction and graph convolution are mainly influenced by the vertex number. That is, the overall computation time increases with the vertex number. By the way, the edge number in DCCS template is fewer than that of SCCS template. However, the number of weighted edges only affects the number of nonzero values in the adjacency matrix, which has little impact on the computational complexity. In addition, with the help of GPU platform, the calculation time based on DCCS method is acceptable.

It is worth noting that the mask sampling methods mentioned above can be used in both of coherent lithography imaging system and partially coherent lithography image system. Except for the DCCS and SCCS methods, other sampling approaches such as the blue noise sampling may be also used to extract the geometric features of mask pattern [31,32]. The blue noise sampling has uniform distribution, and is able to extract the high frequency components of the mask layout in spatial domain. In the future work, we plan to study different sampling approaches and their impacts on the mask synthesis results.

3.2 GCN-based encoder

The encoder predicts the underlying pixel value of ILT mask based on the features extracted by DCCS template from the target layout. The encoder can be regarded as a binary classifier, where the input is the feature matrix and the output is the pixel value of the optimized mask. The encoder consists of three graph convolutional layers and three fully connected layers. The flowchart of the encoder is presented in Fig. 5.

Fig. 5. The flowchart of the GCN-based encoder.

Download Full Size | PDF

The input of the first graph convolution layer is the feature matrix ${{\mathbf X}_r} = {\mathbf H}_r^{(0)}$. The output of the $l$th graph convolution layer can be calculated as follows:

(10)$${\mathbf H}_r^{(l)} = \textrm{ReLU}({{{\tilde{{\mathbf D}}}^{{{ - 1} / 2}}}\tilde{{\mathbf A}}{{\tilde{{\mathbf D}}}^{{{ - 1} / 2}}}{\mathbf H}_r^{(l - 1)}{\mathbf W}_{GCN}^{(l)}} ),\;\;\;\;l = 1,2,3.$$

Most of the parameters in Eq. (10) were described in Section 2.2, and ${\mathbf W}_{GCN}^{(l)}$ is the weight matrix for dimension transformation that can be optimized in the training process. Here we use the rectified linear unit (ReLU) function as the activation function. The GCN model uses the vertex feature aggregation and dimension transformation to obtain the high-level features from the whole graph. It is noted that the output ${\mathbf H}_r^{(3)}$ from the third layer is a column vector.

The input of the first fully connected layer is $\vec{y}_r^{(0)} = {\mathbf H}_r^{(3)}$. The forward feature propagation can be calculated as follows:

(11)$$\left\{ \begin{array}{l} \vec{y}_r^{(1)} = \textrm{ReLU}({\mathbf W}_{FCL}^{(1)}\vec{y}_r^{(0)} + \vec{b}_{FCL}^{(1)}),\\ \vec{y}_r^{(2)} = \textrm{ReLU}({\mathbf W}_{FCL}^{(2)}\vec{y}_r^{(1)} + \vec{b}_{FCL}^{(2)}),\\ y_r^{(3)} = \Gamma \{{{\mathbf W}_{FCL}^{(3)}\vec{y}_r^{(2)} + \vec{b}_{FCL}^{(3)},{t_m}} \}, \end{array} \right.$$

where ${\mathbf W}_{FCL}^{(1)}$, ${\mathbf W}_{FCL}^{(2)}$, and ${\mathbf W}_{FCL}^{(3)}$ are the weight matrices; $\vec{b}_{FCL}^{(1)}$, $\vec{b}_{FCL}^{(2)}$ and $\vec{b}_{FCL}^{(3)}$ are the bias vectors; $\Gamma \{{x,{t_m}} \}$ is the hard threshold function and ${t_m}$ is the threshold. The finally output $y_r^{(3)}$ is a scalar with the value of 0 or 1. Its relationship to the pixel value of the predicted ILT mask is $y_r^{(3)}$=${{\mathbf M}_{p,q}}$, where $r = p{N_M} + q$ $(p,q \le {N_M})$. For the training purpose, the sigmoid function is used to replace the hard threshold function to make it differentiable. Hence, $\vec{y}_r^{(3)}$ in Eq. (11) can be calculated as follows:

(12)$$\vec{y}_r^{(3)} = sigmoid(\vec{y}_r^{(2)},{a_m},{t_m}),$$

where ${a_m}$ is the steepness index.

As mentioned above, the GCN-based encoder is initially designed to predict the ILT mask in a pixel-wise manner. In order to improve the computational efficiency, we can input the feature matrices of all pixels into the encoder and predict the entire ILT mask simultaneously by using the parallel computing under GPU framework. After that, the predicted ILT mask, denoted as ${\mathbf M}$, is sent to the model-driven decoder to calculate the print image.

3.3 Training process

This paper uses an unsupervised method to train the network. As shown in Fig. 3, the objective of ILT is to minimize the pattern error (PE) between $\tilde{{\mathbf Z}}$ and ${\mathbf Z}$, which is defined as:

(13)$$\textrm{PE} = ||{\tilde{{\mathbf Z}} - {\mathbf Z}} ||_F ^2,$$

where $\textrm{||}\; \cdot \;\textrm{|}{\textrm{|}_\textrm{F}}$ is the Frobenius norm. In addition, we adopt the quadratic penalty to constrain the optimized mask to be close to a binary pattern [7]. The quadratic penalty term ${R_Q}$ is defined as:

(14)$${R_Q} = {\mathbf 1}_{{N_M} \times 1}^T \cdot [{4{\mathbf M} \odot ({\mathbf 1} - {\mathbf M})} ]\cdot {{\mathbf 1}_{{N_M} \times 1}},$$

where ${\mathbf 1}_{{N_M} \times 1}^T$ is the one-valued vector with dimension of ${N_M} \times 1$.

Therefore, the loss function L for training the network can be written as:

(15)$$L = ||{\tilde{{\mathbf Z}} - {\mathbf Z}} ||_F ^2 + {\gamma _Q}{R_Q},$$

where ${\gamma _Q}$ is the weight of quadratic penalty term. The backpropagation algorithm is then used to optimize the weight matrices and bias vectors by minimizing the loss function.

The proposed ILT method is an image-based method to synthesis the mask pattern and improve the lithography image fidelity. Thus, the MGCN framework can be extended to optimize the phase-shifting mask (PSM) and non-Manhattan mask. For the PSM optimization, we can modify the network output in Eq. (12) to have three optional values, which correspond to the opaque mask region, transparent mask region with phase 0, and transparent mask region with phase π [33]. In addition, the proposed DCCS method can be also applied to a pixelated mask pattern with different phases and non-Manhattan geometries. The DCCS method sets abundant sampling points outside the horizontal and vertical directions. Thus, this method can effectively extract local geometric features from the non-Manhattan masks.

4. Simulation result and analysis

This section gives the simulation results to prove the superiority and effectiveness of the proposed MGCN method. Section 4.1 provides the simulation results based on simple layout patterns. Section 4.2 gives the results based on complex layout patterns. In future work, we will study the methods for training networks with simple layouts and testing them with complex layouts. In two groups of mask pattern simulation experiments, the proposed method is compared with steepest descent (SD) method [7], MLP method [9], CNN method, MIDL method [5], traditional GCN method [23], and several degenerate versions of the proposed method. The proposed method uses a computer with Intel Core i5-9600 CPU @ 3.70 GHz, 32GB RAM memory and NVIDIA RTX 2080 Ti graphics card to carry out the simulation experiments.

4.1 Simulation results based on simple layout patterns

In this paper, a partially coherent lithography system with the illumination wavelength of $\lambda $=193 nm is used in the following simulations. The light source is an annular illumination whose inner and outer partial coherence factors are ${\sigma _{inner}}$=0.8, ${\sigma _{outer}}$=0.975, respectively. The CD of the target layout is 45 nm. The numerical aperture of the projection system is NA = 1.35. The side length of the mask pattern is ${N_M}$=100 pixels, where the pixel size on mask is 2.5 nm × 2.5 nm. In Eq. (7) the steepness index and threshold of sigmoid functions are ${a_r}$=50, and ${t_r}$=0.1, respectively. In Eq. (12) the steepness index and threshold are ${a_m}$=1, and ${t_m}$=0.5, respectively. The quadratic penalty weight is ${\gamma _Q}$=0.0001.

The training layouts and testing layouts are shown in Fig. 6. These layout patterns are the common Manhattan geometries, which contain most of the typical features in integrated circuit layouts. The training layout data can be augmented by rotating the mask patterns by 90°, 180° and 270°, therefore the total number of training layouts is 36. Due to the high cost of acquiring real data, this paper uses the theoretical calculation model of lithography imaging system to drive the training process of GCN. If the real SEM data is obtained, those data can be used to calibrate the lithography imaging model, and then the calibrated imaging model can be used to guide the training of GCN. The overall framework and principle should be consistent with the proposed method in this paper. Therefore, the proposed method can be extended to the real data scenarios, and we will research this problem in future work.

Fig. 6. The examples of the (a) training set and (b) testing set of the simple layout patterns [5].

Download Full Size | PDF

In Fig. 7, Fig. 8, and Fig. 9, the first rows show the optimized mask patterns obtained by different methods, the second rows show the corresponding print images, and the third rows show the error patterns between the target layouts and print images. The values of PEs are presented at the bottom of each column. The first column shows the results obtained by a traditional gradient-based ILT algorithm referred to as SD method. The second to the eighth columns show the results of the MLP method, CNN method, MIDL method, traditional GCN method (F = 1, SCCS, Su.), GCN method (F = 9, SCCS, Su.), GCN method (F = 9, SCCS, Unsu.), and the proposed MGCN method, respectively. It is worth noting that in the GCN methods of the fifth to eighth column, the feature dimension, sampling template and training manner are different.

Fig. 7. Simulation results of different methods based on the simple testing layout 1.

Download Full Size | PDF

Fig. 8. Simulation results of different methods based on the simple testing layout 2.

Download Full Size | PDF

Fig. 9. Simulation results of different methods based on the simple testing layout 3.

Download Full Size | PDF

In the SD method, the algorithm is iterated for 100 times, and the step size is 1.5. The MLP network contains five fully connected layers, and are trained in the supervised manner for 2000 loops with the learning rate of 0.09. The CNN model consists of three convolutional layers with the ReLU activation function, and the network is trained in an unsupervised manner similar to the proposed MGCN method for 200 loops until convergence, and the learning rate is 0.0001. The MIDL model contains five layers, and the learning rate is ${10^{ - 8}}$. In the proposed MGCN method, the DCCS template used is shown in Fig. 4(b), which contains five rings and ${N_V}$=109. The sizes of the weight matrices and bias vectors in the network are defined as ${\mathbf W}_{GCN}^{(1)} \in {{\mathbb R}^{9 \times 6}}$, ${\mathbf W}_{GCN}^{(2)} \in {{\mathbb R}^{6 \times 3}}$, ${\mathbf W}_{GCN}^{(3)} \in {{\mathbb R}^{3 \times 1}}$, $\hat{{\mathbf W}}_{FCL}^{(1)} \in {{\mathbb R}^{100 \times {N_V}}}$, $\hat{{\mathbf W}}_{FCL}^{(2)} \in {{\mathbb R}^{20 \times 100}}$, $\hat{{\mathbf W}}_{FCL}^{(3)} \in {{\mathbb R}^{1 \times 20}}$, $\hat{b}_{FCL}^{(1)} \in {{\mathbb R}^{{N_V} \times 1}}$, $\hat{b}_{FCL}^{(2)} \in {{\mathbb R}^{100 \times 1}}$, and $\hat{b}_{FCL}^{(3)} \in {{\mathbb R}^{20 \times 1}}$. The network is trained for 300 loops with the learning rate $lr$=0.0001. In these simulations, we search the optimal parameters for each algorithm using the trail-and-error method, which tries to obtain the lowest PE for each algorithm.

Among the non-GCN methods in the first four columns of Fig. 7, Fig. 8 and Fig. 9, we observe that the MLP has the highest PE, the SD method has the second highest PE, followed by CNN and MIDL. The proposed MGCN method has the lowest PE and obtains the best imaging performance. The proposed MGCN method is better than the MIDL method, since MGCN predicts the ILT mask pixel by pixel based on the sampled local environment of the target layout. Therefore, the MGCN is able to generate the detailed features on the mask pattern, such as the line-ends and corners.

We also compare the proposed MGCN method with traditional GCN and variant versions of GCN. In particular, traditional GCN (F = 1, SCCS, Su.) means using the SCCS template to sample the target layout, where the feature dimension associated with each vertex is F = 1, and the network is trained in a supervised manner. Similarly, GCN (F = 9, SCCS, Su.) means that the feature dimension associated with each vertex is F = 9, and the network is trained in a supervised manner. GCN (F = 9, SCCS, Unsu.) means that the feature dimension is F = 9, but the network is trained in an unsupervised manner similar to the proposed MGCN method. It shows that increasing the feature dimension of the graph is helpful to improve the network prediction capacity, and thus improving the final image fidelity. In addition, the unsupervised training based on the model-driven decoder can achieve better imaging performance than the supervised learning. Comparing the GCN (F = 9, SCCS, Unsu.) with the proposed MGCN, we found that using the proposed DCCS template to sample the layout pattern can effectively reduce the image error.

Table 1 also compares the computational efficiency of different methods, including the training times and average testing times over the three testing layouts in Fig. 6. The traditional SD method has no training process, so the average testing time represents the runtime of mask optimization. Compared with the MLP method, SD method, CNN method, and MIDL method, the proposed MGCN method has the lowest average testing time. Although the training time of MGCN is higher than the MLP and MIDL methods, we only need to train the network once and can reuse it for other layout patterns. In addition, the runtimes of MGCN are longer than the traditional GCN, GCN (F = 9, SCCS, Su.), and GCN (F = 9, SCCS, Unsu.), which mainly has three reasons. Firstly, the vertex number of DCCS template is more than that of SCCS template, thus, the feature extraction time with DCCS template is longer than that with SCCS method. Secondly, the model-driven decoder takes some time to compute the print image in the training process. Thirdly, the numbers of training loops to reach convergence are different for these methods. In short, the proposed MGCN method provides an efficient way to predict the ILT result with the best image fidelity among the comparative methods.

Table 1. Runtimes of different methods based on simple layout patterns

View Table | View all tables in this article

4.2 Simulation results based on complex layout patterns

In this section, we use a set of complex layouts to train and test the MGCN to verify the extensibility and generalization ability of the proposed method. Figure 10 shows the training set and testing set of the complex layouts, respectively [5]. The side length of the complex layouts is ${N_M}$=184 pixels, and the pixel size is 5.625 nm × 5.625 nm. Other parameters are the same as those in Section 4.1.

Fig. 10. The examples of the (a) training set and (b) testing set of the complex layout patterns [5].

Download Full Size | PDF

Figure 11, Fig. 12 and Fig. 13 show the simulation results of different methods for the complex testing layouts. From left to right, it shows the results of the SD method, MLP method, CNN method, MIDL method, MIDL + SD method, and proposed MGCN method, respectively. The MLP is trained for 400 loops with the learning rate of 0.3, and CNN is trained for 500 loops with the learning rate of 0.0001. The parameters of SD, MIDL and MIDL + SD methods are the same as those in [5]. It is noted that “MIDL + SD” means to combination of MIDL and SD methods. This method first uses MIDL to predict the ILT mask, and then uses the SD method to continue optimization. The structure of MGCN was described in Section 4.1. It is trained for 850 loops with the learning rate of 0.0001.

Fig. 11. Simulation results of different methods based on the complex testing layout 1.

Download Full Size | PDF

Fig. 12. Simulation results of different methods based on the complex testing layout 2.

Download Full Size | PDF

Fig. 13. Simulation results of different methods based on the complex testing layout 3.

Download Full Size | PDF

Overall, the proposed MGCN method achieves the lowest PE, the MIDL + SD method has the second lowest PE, followed by the SD method. The MLP method, CNN method, and MIDL method obtain poorer image fidelity. It is also noted that compared with the MIDL + SD method, the MGCN method can significantly reduce the PE without using any iterative optimization algorithm subsequently. In addition, both MIDL + SD method and MGCN method use the backpropagation algorithm to iteratively update the weight matrices and bias vectors in the network training processes. However, the networks are trained once, and the backpropagation algorithm is not executed in the testing stage any more. In the testing stage, the proposed MGCN can directly output the mask correction results without subsequent iterative optimization on the mask pattern.

Table 2 compares the runtimes for different methods using the complex layouts, including the training time and average testing time. It is worth noting that the training times of MIDL and MIDL + SD methods are the same. However, the average testing time of MIDL + SD method is longer than the MIDL method, because the MIDL + SD method includes additional runtime of the subsequent iterative mask optimization [5]. Although the training time of the proposed method is relatively longer, however, it has the shortest average testing time among these methods. So, the proposed MGCN method achieves the best performance in both computational efficiency and image fidelity.

Table 2. Runtimes of different methods based on complex layout patterns

View Table | View all tables in this article

5. Conclusion

This paper proposed the MGCN method to improve the image performance and mask optimization efficiency. The proposed method used a model-driven strategy to realize the unsupervised training of the network, which avoided the time-consuming labeling process. The MGCN network contained the GCN-based encoder and the model-driven decoder based on lithography imaging system. The DCCS template was designed to sample the feature matrices from the target layout, which include the local geometric feature information. Then the feature matrices were input into encoder by parallel computing in GPU to obtain the ILT mask. In the training stage, the lithography imaging process is used to improve learning performance of the network. The simulation results showed that the proposed method can effectively improve the imaging fidelity of the lithography system and the mask optimization efficiency. In the future, we will study the network structure and optimization process to further improve the speed of network optimization and the efficiency of mask optimization. In addition, we will also attempt to incorporate regularization terms into the loss function to improve the mask manufacturability obtained by the proposed method.

Funding

State Key Lab of Digital Manufacturing Equipment and Technology (DMETKF2022011).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. E. Moore, “Cramming more components onto integrated circuits,” Proc. IEEE 86(1), 82–85 (1998). [CrossRef]

2. Y. Wei, Advanced Lithography Theory and Application of VLSI (Science Press, 2016).

3. X. Ma and G. R. Arce, Computational Lithography (John Wiley and Sons, 2010).

4. X. Ma, S. Zhang, Y. Pan, J. Zhang, C. Yu, L. Dong, and Y. Wei, “Research and progress of computational lithography,” Laser Optoelectron. Prog. 59(9), 0922008 (2022). [CrossRef]

5. X. Zheng, X. Ma, Q. Zhao, Y. Pan, and G. R. Arce, “Model-informed deep learning for computational lithography with partially coherent illumination,” Opt. Express 28(26), 39475–39491 (2020). [CrossRef]

6. L. Pang, “Inverse lithography technology: 30 years from concept to practical, full-chip reality,” J. Micro/Nanolithogr., MEMS, MOEMS 20(03), 030901 (2021). [CrossRef]

7. A. Poonawala and P. Milanfar, “Mask design for optical microlithography—an inverse imaging problem,” IEEE Trans. on Image Process. 16(3), 774–788 (2007). [CrossRef]

8. X. Ma, Y. Li, and L. Dong, “Mask optimization approaches in optical lithography based on a vector imaging model,” J. Opt. Soc. Am. A 29(7), 1300–1312 (2012). [CrossRef]

9. R. Luo, “Optical proximity correction using a multilayer perceptron neural network,” J. Opt. 15(7), 075708 (2013). [CrossRef]

10. X. Ma, B. Wu, Z. Song, S. Jiang, and Y. Li, “Fast pixel-based optical proximity correction based on nonparametric kernel regression,” J. Micro/Nanolithogr., MEMS, MOEMS 13(4), 043007 (2014). [CrossRef]

11. H. Yang, S. Li, Y. Ma, B. Yu, and E. Young, “GAN-OPC: mask optimization with lithography-guided generative adversarial nets,” in Proceedings of IEEE Conference on Design Automation, 1–6 (2018).

12. Z. Xiao, S. Xu, and Z. Chen, “Mask optimization method based on residual network,” In 2nd International Conference on Laser, Optics and Optoelectronic Technology, 1234332 (SPIE, 2022).

13. D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag. 30(3), 83–98 (2013). [CrossRef]

14. S. Zhang, H. Tong, J. Xu, and R. Maciejewski, “Graph convolutional networks: a comprehensive review,” Computational Social Networks 6(1), 11–23 (2019). [CrossRef]

15. Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE Trans. Neural Netw. Learn. Syst. 32(1), 4–24 (2021). [CrossRef]

16. J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” In Proceedings of the 2nd International Conference on Learning Representations, 1–14 (2014).

17. M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” In Proceedings of the 30th International Conference on Neural Information Processing Systems, 3844–3852 (2016).

18. T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” In Proceedings of the 5th International Conference on Learning Representations, 1–14 (2017).

19. P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” In Proceedings of the 6th International Conference on Learning Representations, 1–12 (2018).

20. W. L. Hamilton, R. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 1025–1035 (2017).

21. F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein, “Geometric deep learning on graphs and manifolds using mixture model CNNs,” In Conference on Computer Vision and Pattern Recognition, 5425–5434 (IEEE, 2017).

22. K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” In Proceedings of the 7th International Conference on Learning Representations, 1–17 (2019).

23. S. Zhang, X. Ma, J. Zhang, R. Chen, Y. Pan, C. Yu, L. Dong, Y. Wei, and G. R. Arce, “Fast optical proximity correction based on graph convolution network,” Proc. SPIE 1161, 116130V (2021). [CrossRef]

24. Y. C. Pati and T. Kailath, “Phase-shifting masks for microlithography automated design and mask requirements,” J. Opt. Soc. Am. A 11(9), 2438–2452 (1994). [CrossRef]

25. B. E. A. Saleh and M. Rabbani, “Simulation of partially coherent imagery in the space and frequency domains and bymodal expansion,” Appl. Opt. 21(15), 2770–2777 (1982). [CrossRef]

26. X. Ma and G. R. Arce, “Binary mask optimization for inverse lithography with partially coherent illumination,” J. Opt. Soc. Am. A 25(12), 2960–2970 (2008). [CrossRef]

27. Y. Granik, N. B. Cobb, and T. Do, “Universal process modeling with VTRE for OPC,” In 27th Annual International Symposium on Microlithography, 377–394 (SPIE, 2002).

28. A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: overview, challenges, and applications,” Proc. IEEE 106(5), 808–828 (2018). [CrossRef]

29. F. R. K. Chung, Spectral Graph Theory (American Mathematical Society, 1996).

30. P. Gao, A. Gu, and A. Zakhor, “Optical proximity correction with principal component regression,” Proc. SPIE 6924, 69243N (2008). [CrossRef]

31. A. Parada-Mayorga, D. L. Lau, J. H. Giraldo, and G. R. Arce, “Blue-Noise Sampling on Graphs,” IEEE Trans. Signal Inf. Proc. Netw. 5(3), 554–569 (2019). [CrossRef]

32. D. L. Lau, G. R. Arce, A. Parada-Mayorga, D. Dapena, and K. Pena-Pena, “Blue-Noise Sampling of Graph and Multigraph Signals: Dithering on Non-Euclidean Domains,” IEEE Signal Process. Mag. 37(6), 31–42 (2020). [CrossRef]

33. S. H. Chan, A. K. Wong, and E. Y. Lam, “Initialization for robust inverse synthesis of phase-shifting masks in optical projection lithography,” Opt. Express 16(19), 14746–14760 (2008). [CrossRef]

Method	Training time	Average testing time	PE for simple testing layout
Method	Training time	Average testing time	Layout 1	Layout 2	Layout 3
SD	―	2.78 s	148	152	136
MLP	21.98 s	1.01 s	168	140	146
CNN	1118.94 s	10.05 × 10⁻² s	140	138	114
MIDL	29.62 s	4.01 × 10⁻² s	118	118	113
Traditional GCN (F = 1, SCCS, Su.)	284.32 s	1.16 × 10⁻² s	250	238	241
GCN (F = 9, SCCS, Su.)	67.92 s	2.05 × 10⁻² s	154	106	184
GCN (F = 9, SCCS, UnSu.)	213.05 s	2.29 × 10⁻² s	54	48	65
MGCN (F = 9, DCCS, UnSu.)	267.27 s	3.76 × 10⁻² s	36	24	39

Method	Training time	Average testing time	PE for complex testing layout
Method	Training time	Average testing time	Layout 1	Layout 2	Layout 3
SD	―	35.44 s	232	208	228
MLP	14.51 s	29.54 × 10⁻² s	1780	518	625
CNN	8115.69 s	34.41 × 10⁻² s	1415	1128	1332
MIDL	298.34 s	1.28 s	1032	724	1144
MIDL + SD	298.34 s	15.97 s	200	180	196
MGCN (F = 9, DCCS, Unsu.)	2083.88 s	11.62 × 10⁻² s	170	112	104

Method	Training time	Average testing time	PE for simple testing layout
Method	Training time	Average testing time	Layout 1	Layout 2	Layout 3
SD	―	2.78 s	148	152	136
MLP	21.98 s	1.01 s	168	140	146
CNN	1118.94 s	10.05 × 10⁻² s	140	138	114
MIDL	29.62 s	4.01 × 10⁻² s	118	118	113
Traditional GCN (F = 1, SCCS, Su.)	284.32 s	1.16 × 10⁻² s	250	238	241
GCN (F = 9, SCCS, Su.)	67.92 s	2.05 × 10⁻² s	154	106	184
GCN (F = 9, SCCS, UnSu.)	213.05 s	2.29 × 10⁻² s	54	48	65
MGCN (F = 9, DCCS, UnSu.)	267.27 s	3.76 × 10⁻² s	36	24	39

Method	Training time	Average testing time	PE for complex testing layout
Method	Training time	Average testing time	Layout 1	Layout 2	Layout 3
SD	―	35.44 s	232	208	228
MLP	14.51 s	29.54 × 10⁻² s	1780	518	625
CNN	8115.69 s	34.41 × 10⁻² s	1415	1128	1332
MIDL	298.34 s	1.28 s	1032	724	1144
MIDL + SD	298.34 s	15.97 s	200	180	196
MGCN (F = 9, DCCS, Unsu.)	2083.88 s	11.62 × 10⁻² s	170	112	104

Fast inverse lithography approach based on a model-driven graph convolutional network

Abstract

1. Introduction

2. Fundamentals of lithography imaging model and GCN

2.1 Fundamentals of lithography imaging model

2.2 Fundamentals of GCN

3. Fast ILT method based on MGCN

3.1 Feature matrix extraction using DCCS template

3.2 GCN-based encoder

3.3 Training process

4. Simulation result and analysis

4.1 Simulation results based on simple layout patterns

4.2 Simulation results based on complex layout patterns

5. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (13)

Tables (2)

Equations (15)

Optics Express