## Abstract

Computational lithography now plays an indispensable role in improving the imaging performance of optical lithography systems. This paper develops a new approach to computational lithography by introducing information theoretical channel modeling for partially coherent lithography systems. A statistical model, built on the lithography imaging model, characterizes the information transfer between the mask and the print image. This paper then calculates the optimal information transfer (OIT) in partially coherent lithography systems and derives the theoretical limit of image fidelity for optical proximity correction (OPC), a technique used extensively in computational lithography. Finally, the proposed information theoretical approaches are applied to improve the OPC solutions obtained by a gradient-based algorithm. A set of simulations is provided to verify the proposed information theoretical model and approaches.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

To date, computational lithography has been used extensively in the semiconductor industry to improve the imaging performance of optical lithography systems [1,2]. Computational lithography is a set of mathematical and algorithmic approaches that improve lithography resolution and image fidelity by individually or jointly optimizing the lithography tools, the mask, and the process parameters based on imaging and process models. Figure 1(a) illustrates a typical optical lithography system. In practical optical lithography systems, partially coherent light sources illuminate the mask on which the circuit layout is patterned. The circuit layout is then transferred from the mask onto the wafer by optical projection and photoresist development. As lithography technology enters the sub-wavelength regime, the optical proximity effect severely degrades the imaging quality of lithography systems. One example of the optical proximity effect is pattern distortion, in which dense and isolated lines produce different aerial images. Optical proximity correction (OPC) is therefore one of the most important computational lithography techniques. It compensates for image distortion by optimizing the mask patterns [3–6].

In the last several years, a number of pixelated OPC approaches have been developed to obtain optimized mask patterns by solving the inverse lithography problem [7–19]. Unlike traditional rule-based and edge-based OPC methods, pixelated OPC co-optimizes the transmissivities of all mask pixels. Pixelated OPC methods therefore dramatically increase the degrees of optimization freedom and achieve higher resolution and image fidelity than rule-based or edge-based OPC methods [2]. This raises two natural questions: how is information transferred in lithography systems, and what is the theoretical limit of the image fidelity that pixelated OPC methods can achieve? However, most current research on computational lithography focuses on developing numerical approaches to optimize lithography imaging performance. The underlying information transmission mechanism in computational lithography is still not well understood. Recently, Rieger discussed the analogy between lithography techniques and communication theory [20]. Ma et al. first established an approximate information channel model for coherent optical lithography systems [21], deriving the maximum information transfer and the theoretical limit of image fidelity for coherent lithography systems. That prior work was limited to coherent lithography systems. However, most practical lithography systems are partially coherent, with illumination that consists of a number of coherent source points [22–24]. The information channel model proposed in [21] is therefore inadequate to characterize partially coherent lithography systems. In addition, the model in [21] does not consider the correlation between different pixels on the print image. In fact, neighboring pixels on the print image are correlated with each other because of the optical proximity effect. More rigorous models are thus needed to accurately describe information transmission in partially coherent lithography systems.

To the best of our knowledge, this paper is the first to study information theoretical approaches for computational lithography in partially coherent lithography systems. It first develops an information channel model based on the statistical relationship between the mask and print images. As shown in Figs. 1(a) and 1(b), the partially coherent lithography system is regarded as an information channel that transfers the layout pattern from the mask to the wafer. The mask and print images are the input and output signals of the channel, respectively. A statistical method is used to calculate the probability transfer matrix between a batch of pixels on the mask and the corresponding pixels on the print image. We then derive the mutual information between the mask and print images.

Another contribution of this paper is the analysis of the theoretical limit of image fidelity that pixelated OPC can achieve in partially coherent lithography systems. The pixelated OPC method encodes the input mask to increase the accuracy of information transfer through the channel. Image fidelity is evaluated by the pattern error (PE), defined as the square of the Euclidean distance between the actual print image and the target layout [6,25]. The theoretical limit of image fidelity is formulated as a function of the mutual information. A numerical optimization algorithm is then applied to solve for the optimal information transfer (OIT), which leads to the best image fidelity. Finally, we discuss an application of the proposed information theoretical approaches. In this application, the optimal probability distribution of the mask pattern, obtained from the information theoretical model, is used to improve the OPC solutions provided by current gradient-based algorithms. The proposed approach is assessed by a set of simulations based on different layout patterns.

The remainder of this paper is organized as follows. Section 2 establishes the information channel model and derives the mutual information between the mask and print images. Section 3 discusses the relationship between mutual information and image fidelity in partially coherent lithography systems. Section 4 proposes an optimization method to solve for the OIT and to obtain the theoretical limit of image fidelity for pixelated OPC techniques. Section 5 discusses the application of the proposed information theoretical approaches. Conclusions are provided in Section 6.

## 2. Information channel model for partially coherent lithography systems

In this section, we first establish the information channel model for partially coherent lithography systems and then derive the mutual information between the mask pattern and the print image. The aerial image of a partially coherent lithography system can be decomposed into several images generated by coherent systems. As shown in Fig. 2, the aerial image of a partially coherent lithography system is formulated as

$$\mathbf{I}(\mathbf{r})=\sum_{\mathbf{m}}\Gamma_{\mathbf{m}}\left|\mathbf{h}^{\mathbf{m}}(\mathbf{r})\otimes\mathbf{M}(\mathbf{r})\right|^{2},\tag{1}$$

where **M**(**r**) is the mask pattern, **r** = (*x*, *y*) is the spatial coordinate, $\Gamma_{\mathbf{m}}$ is the coefficient of the Fourier series expansion of the complex degree of coherence *γ*(**r**), and ⊗ denotes convolution. Assuming that the mask pattern is confined to the square region *x*, *y* ∈ [−*D*/2, *D*/2], $\Gamma_{\mathbf{m}}$ can be calculated as

$$\Gamma_{\mathbf{m}}=\frac{1}{4D^{2}}\iint_{A_{\gamma}}\gamma(\mathbf{r})\,e^{-j\omega_{0}\mathbf{m}\cdot\mathbf{r}}\,d\mathbf{r},\tag{2}$$

where $A_{\gamma}$ is the square region *x*, *y* ∈ [−*D*, *D*], $\omega_0=\pi/D$, **m** = ($m_x$, $m_y$), $m_x$ and $m_y$ are integers, and · denotes the inner product. In Eq. (1), $\mathbf{h}^{\mathbf{m}}$ is the point spread function of the coherent component corresponding to $\Gamma_{\mathbf{m}}$ and is given by

$$\mathbf{h}^{\mathbf{m}}(\mathbf{r})=\mathbf{h}(\mathbf{r})\,e^{j\omega_{0}\mathbf{m}\cdot\mathbf{r}},\tag{3}$$

where **h**(**r**) is the convolution kernel defined as [26,27]

$$\mathbf{h}(\mathbf{r})=\frac{J_{1}\left(2\pi r\,\mathrm{NA}/\lambda\right)}{2\pi r\,\mathrm{NA}/\lambda},\qquad r=\sqrt{x^{2}+y^{2}},\tag{4}$$

where NA is the numerical aperture, *λ* is the illumination wavelength, and $J_1(\cdot)$ is the first-order Bessel function of the first kind. In addition, the effect of the photoresist is modeled by a hard threshold function [6]. The print image of a partially coherent lithography system can then be expressed as

$$\mathbf{Z}(\mathbf{r})=\Lambda\left\{\mathbf{I}(\mathbf{r})-t_{r}\right\},\tag{5}$$

where Λ{*x*} = 1 if *x* > 0 and Λ{*x*} = 0 otherwise, $t_r$ is the threshold of the photoresist, and **I** is defined in Eq. (1).
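The following minimal pure-Python sketch makes Eqs. (1)–(5) concrete. It computes the aerial image as a sum of coherent systems and then applies the hard-threshold resist model. It is an illustration under stated assumptions, not the authors' implementation:

- All helper names are hypothetical.
- The Python standard library has no Bessel function, so *J*₁ is evaluated with Bessel's integral.
- The Γ_**m** coefficients are supplied by the caller and assumed real, as they are for a degree of coherence that is real and even.
- Kernels are sampled on a square grid with a caller-chosen pixel size `pixel_nm`.
- The convolution uses zero padding and returns an output the same size as the mask.

```python
import cmath
import math


def bessel_j1(x, steps=64):
    """J1(x) via Bessel's integral (1/2pi) * int_0^{2pi} cos(tau - x sin tau) dtau.

    The integrand is periodic, so the plain trapezoidal rule converges very fast.
    """
    total = 0.0
    for k in range(steps):
        tau = 2.0 * math.pi * k / steps
        total += math.cos(tau - x * math.sin(tau))
    return total / steps


def jinc_kernel(size, pixel_nm, na, wavelength_nm):
    """Eq. (4): h(r) = J1(2 pi r NA / lambda) / (2 pi r NA / lambda) on a size x size grid."""
    c = size // 2
    kernel = []
    for i in range(size):
        row = []
        for j in range(size):
            arg = 2.0 * math.pi * math.hypot(i - c, j - c) * pixel_nm * na / wavelength_nm
            row.append(0.5 if arg == 0.0 else bessel_j1(arg) / arg)  # J1(x)/x -> 1/2 at x = 0
        kernel.append(row)
    return kernel


def modulated_kernel(kernel, m, omega0, pixel_nm):
    """Eq. (3): h^m(r) = h(r) exp(j omega0 m.r), with r measured from the kernel centre."""
    c = len(kernel) // 2
    mx, my = m
    return [[kernel[i][j] * cmath.exp(1j * omega0 * pixel_nm * (mx * (i - c) + my * (j - c)))
             for j in range(len(kernel))] for i in range(len(kernel))]


def convolve_same(mask, kernel):
    """Zero-padded 2D convolution whose output has the same size as `mask`."""
    rows, cols, c = len(mask), len(mask[0]), len(kernel) // 2
    out = [[0j] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0j
            for u in range(len(kernel)):
                for v in range(len(kernel)):
                    ii, jj = i - (u - c), j - (v - c)
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += kernel[u][v] * mask[ii][jj]
            out[i][j] = acc
    return out


def aerial_image(mask, kernel, gammas, omega0, pixel_nm):
    """Eq. (1): I(r) = sum_m Gamma_m |h^m(r) (x) M(r)|^2, with Gamma_m assumed real."""
    rows, cols = len(mask), len(mask[0])
    image = [[0.0] * cols for _ in range(rows)]
    for m, gamma in gammas.items():
        field = convolve_same(mask, modulated_kernel(kernel, m, omega0, pixel_nm))
        for i in range(rows):
            for j in range(cols):
                image[i][j] += gamma * abs(field[i][j]) ** 2
    return image


def print_image(image, t_r):
    """Eq. (5): hard threshold, 1 where I(r) - t_r > 0 and 0 otherwise."""
    return [[1 if value - t_r > 0 else 0 for value in row] for row in image]
```

As a sanity check, take a one-pixel kernel `[[1.0]]` and a single coefficient Γ_(0,0) = 1. The aerial image then reduces to |**M**|², and the printed pattern reproduces the mask. A sampled kernel from `jinc_kernel` instead blurs the mask, which is the optical proximity effect the model describes.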

In this paper, the partially coherent lithography system is regarded as an information channel whose input and output signals are the mask and the print image, respectively. According to Eq. (1), the optical proximity effect is described by the convolution between the point spread function $\mathbf{h}^{\mathbf{m}}(\mathbf{r})$ and the mask **M**(**r**). This means that the imaging of each mask pixel is influenced by its surrounding pixels within the region covered by $\mathbf{h}^{\mathbf{m}}(\mathbf{r})$. Assume that the area covered by $\mathbf{h}^{\mathbf{m}}(\mathbf{r})$ is a circle denoted as $C_p$. Each pixel at coordinate (*x*, *y*) on the print image is then affected by the set of mask pixels within $C_p$ around the coordinate (*x*, *y*). Neighboring pixels on the print image are therefore correlated with each other. Accordingly, the information on the mask is transferred to the print image by batches of pixels together rather than by independent pixels. Next, we build up the statistical relationship between a batch of pixels on the mask and on the print image, so as to account for the correlation between neighboring image pixels.

In Fig. 2, assume that region $C_p$ includes *K* pixels. Let the vector *x⃗* = ($x_1$, $x_2$, . . ., $x_K$)$^T$ represent the *K* mask pixels covered by $C_p$, where $x_i$ is the value of the *i*th pixel. For a binary mask, $x_i$ = 0 or 1. The vector *y⃗* = ($y_1$, $y_2$, . . ., $y_K$)$^T$ represents the *K* pixels on the print image corresponding to *x⃗*, where $y_i$ = 0 or 1. Let $N_x$ and $N_y$ be the numbers of one-valued pixels in *x⃗* and *y⃗*, respectively. Then

$$p_m=P_r\{N_x=m\},\qquad q_n=P_r\{N_y=n\},$$

where $p_m$ is the probability that *m* pixels in *x⃗* have value 1, $q_n$ is the probability that *n* pixels in *y⃗* have value 1, *m*, *n* = 0, 1, . . ., *K*, and $P_r\{\cdot\}$ denotes the probability of its argument. Define the vectors of probability masses as *p⃗* = ($p_0$, $p_1$, . . ., $p_K$)$^T$ and *q⃗* = ($q_0$, $q_1$, . . ., $q_K$)$^T$. Suppose $\mathbf{T}\in\mathbb{R}^{(K+1)\times(K+1)}$ is the probability transfer matrix between *p⃗* and *q⃗*, i.e.,

$$\overrightarrow{q}=\mathbf{T}\,\overrightarrow{p}.$$

In the above equation, the element of **T** in the (*n* + 1)th row and the (*m* + 1)th column is defined as **T**(*n* + 1, *m* + 1) = $P_r\{N_y=n\mid N_x=m\}$. This element is the probability that $N_y=n$ given $N_x=m$, where *m*, *n* = 0, 1, . . ., *K*.

In this paper, we use a statistical method to calculate the matrix **T** from the layout patterns shown in Fig. 3. The top and bottom rows in Fig. 3 show the mask patterns and the corresponding print images, respectively. Figures 3(a) and 3(b) illustrate the target of “layout 1” and its pixelated OPC solution, respectively. Figures 3(e) and 3(f) illustrate the target of “layout 2” and its pixelated OPC solution. Here, we use the gradient-based OPC algorithm in [6] to optimize the masks. We go through all of the sub-regions covered by the circle $C_p$ on the mask patterns and on their corresponding print images in Fig. 3. We then count the frequencies of the events {$N_x = m$ and $N_y = n$}, where *m*, *n* = 0, 1, . . ., *K*. Let #{$N_x = m$; $N_y = n$} be the number of occurrences of the event {$N_x = m$ and $N_y = n$}. We calculate the conditional probability as

$$P_r\{N_y=n\mid N_x=m\}=\frac{\#\{N_x=m;\,N_y=n\}}{\sum_{n'=0}^{K}\#\{N_x=m;\,N_y=n'\}},$$

where $P_r\{N_y=n\mid N_x=m\}$ is the element of **T** in the (*n* + 1)th row and the (*m* + 1)th column.
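The counting procedure above can be sketched in pure Python as follows. Several details are assumptions rather than statements from the paper:

- Images are nested lists of 0/1 values.
- The circle $C_p$ is rasterized as all pixel offsets within a given `radius`.
- Only windows lying fully inside the image are visited.
- Columns of **T** whose event $N_x = m$ never occurs are left as zeros.

The function and parameter names are hypothetical.

```python
import math


def disk_offsets(radius):
    """Pixel offsets inside the circle C_p of the given radius (in pixels)."""
    r = int(math.floor(radius))
    return [(di, dj) for di in range(-r, r + 1) for dj in range(-r, r + 1)
            if di * di + dj * dj <= radius * radius]


def estimate_transfer_matrix(pairs, radius):
    """Estimate T[n][m] = Pr{N_y = n | N_x = m} from (mask, print image) pairs.

    Returns (T, K), where K is the number of pixels in C_p and T is (K+1) x (K+1).
    """
    offsets = disk_offsets(radius)
    K = len(offsets)
    r = int(math.floor(radius))
    counts = [[0] * (K + 1) for _ in range(K + 1)]  # counts[n][m] = #{N_x = m; N_y = n}
    for mask, printed in pairs:
        for i in range(r, len(mask) - r):
            for j in range(r, len(mask[0]) - r):
                n_x = sum(mask[i + di][j + dj] for di, dj in offsets)
                n_y = sum(printed[i + di][j + dj] for di, dj in offsets)
                counts[n_y][n_x] += 1
    T = [[0.0] * (K + 1) for _ in range(K + 1)]
    for m in range(K + 1):
        column_total = sum(counts[n][m] for n in range(K + 1))  # #{N_x = m}
        if column_total:
            for n in range(K + 1):
                T[n][m] = counts[n][m] / column_total
    return T, K
```

Because each column is normalized by #{$N_x = m$}, every observed column of **T** sums to one, as a conditional distribution must.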

Now consider the specific case in which *n* pixels in *y⃗* have value 1. Note that *y⃗* represents the *K* pixels on the wafer covered by the circle $C_p$. There are $\binom{K}{n}$ distribution patterns of the *n* one-valued pixels in $C_p$, where $\binom{K}{n}$ is the number of *n*-combinations from a set of *K* elements (*K* choose *n*). Furthermore, the *K* grouped pixels in $C_p$ could be located in any region of the print image. Assuming that the print image can have arbitrary geometry, all of the $\binom{K}{n}$ distribution patterns have, on average, the same probability. In particular, each distribution pattern occurs with probability $P_r\{N_y=n\}/\binom{K}{n}=q_n/\binom{K}{n}$. If we know that the number of one-valued pixels in *y⃗* is *n*, then the conditional probability of each distribution pattern is $1/\binom{K}{n}$. This simple uniform distribution model describes the statistical behavior of *y⃗* only approximately and on average. Nevertheless, it leads to useful results, as described in the sequel. Based on the above analysis, the entropy of *y⃗* is

$$H(\overrightarrow{y})=-\sum_{n=0}^{K}\binom{K}{n}\frac{q_n}{\binom{K}{n}}\log_2\frac{q_n}{\binom{K}{n}}=-\sum_{n=0}^{K}q_n\log_2\frac{q_n}{\binom{K}{n}}.$$

Similarly, the conditional entropy of *y⃗* given *x⃗* is formulated as

$$H(\overrightarrow{y}\mid\overrightarrow{x})=-\sum_{m=0}^{K}p_m\sum_{n=0}^{K}\mathbf{T}_{nm}\log_2\frac{\mathbf{T}_{nm}}{\binom{K}{n}}.$$

The binomial factors cancel in the difference because $q_n=\sum_{m}p_m\mathbf{T}_{nm}$. The mutual information between *x⃗* and *y⃗* can therefore be calculated as

$$I(\overrightarrow{x};\overrightarrow{y})=H(\overrightarrow{y})-H(\overrightarrow{y}\mid\overrightarrow{x})=\sum_{n=0}^{K}\sum_{m=0}^{K}p_m\mathbf{T}_{nm}\log_2\frac{\mathbf{T}_{nm}}{\sum_{u=0}^{K}p_u\mathbf{T}_{nu}},\tag{12}$$

where $p_m$ and $p_u$ are the *m*th and *u*th elements of *p⃗*, and $\mathbf{T}_{nm}$ and $\mathbf{T}_{nu}$ are the (*n*, *m*)th and (*n*, *u*)th elements of **T**.
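As a numerical check of the derivation above, the sketch below does two things. It evaluates Eq. (12) directly, and it computes H(*y⃗*) − H(*y⃗*|*x⃗*) from the uniform-pattern model. The two should agree because the binomial factors cancel. Terms with zero probability are dropped, following the usual 0·log 0 = 0 convention. The function names are illustrative, and the example matrices in the checks are made up, not taken from the paper.

```python
import math


def output_distribution(p, T):
    """q = T p, i.e. q_n = sum_m T[n][m] * p_m."""
    return [sum(T[n][m] * p[m] for m in range(len(p))) for n in range(len(T))]


def mutual_information(p, T):
    """Eq. (12): I = sum_m sum_n p_m T_nm log2(T_nm / sum_u p_u T_nu), in bits."""
    q = output_distribution(p, T)
    return sum(p[m] * T[n][m] * math.log2(T[n][m] / q[n])
               for m in range(len(p)) for n in range(len(T))
               if p[m] > 0 and T[n][m] > 0)


def entropy_y(p, T):
    """H(y) under the uniform model: each of the C(K, n) patterns has probability q_n / C(K, n)."""
    K = len(p) - 1
    q = output_distribution(p, T)
    return -sum(q[n] * math.log2(q[n] / math.comb(K, n)) for n in range(K + 1) if q[n] > 0)


def conditional_entropy_y_given_x(p, T):
    """H(y|x) under the same model, with Pr{N_y = n | N_x = m} = T_nm."""
    K = len(p) - 1
    return -sum(p[m] * T[n][m] * math.log2(T[n][m] / math.comb(K, n))
                for m in range(K + 1) for n in range(K + 1)
                if p[m] > 0 and T[n][m] > 0)
```

Two limiting cases are useful for intuition. For a noiseless channel (**T** equal to the identity) with uniform *p⃗*, Eq. (12) gives log₂(*K* + 1) bits. If every column of **T** is identical, the output carries no information about the input and Eq. (12) gives zero.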

Note that Eq. (1) is a scalar imaging model. It considers only the amplitude of the electromagnetic field and ignores its vector nature. For immersion lithography with hyper-NA (NA > 1), the vector imaging model is more accurate than the scalar imaging model. According to Abbe's method, the vector imaging model of a partially coherent lithography system is

$$\mathbf{I}=\frac{1}{J_{sum}}\sum_{x_s}\sum_{y_s}\mathbf{J}(x_s,y_s)\sum_{p=x,y,z}\left|{\mathbf{h}}_{p}^{{x}_{s}{y}_{s}}\otimes\left(\mathbf{B}^{x_sy_s}\odot\mathbf{M}\right)\right|^{2},\tag{13}$$

where ($x_s$, $y_s$) is the coordinate on the source plane and ⊙ denotes element-by-element multiplication. **M** is the mask pattern. ${\mathbf{h}}_{p}^{{x}_{s}{y}_{s}}$ represents the point spread function of the lithography system along the *p*-axis (*p* = *x*, *y*, *z*) corresponding to the source point ($x_s$, $y_s$). $\mathbf{B}^{x_sy_s}$ represents the oblique incidence effect of the light rays on the mask. **J**($x_s$, $y_s$) is the intensity of the source point at ($x_s$, $y_s$), |·|² is taken element by element, and ${J}_{\mathit{sum}}={\sum}_{{x}_{s}}{\sum}_{{y}_{s}}\mathbf{J}({x}_{s},{y}_{s})$ is the normalization factor. Note that ${\mathbf{h}}_{p}^{{x}_{s}{y}_{s}}$ in Eq. (13) corresponds to the point spread function $\mathbf{h}^{\mathbf{m}}(\mathbf{r})$ in Eq. (1). Therefore, the method described above for establishing the information channel model applies straightforwardly to the vector imaging model. One calculates the probability transfer matrix **T** and the mutual information between *x⃗* and *y⃗* with the same statistical method.

## 3. Relationship between mutual information and image fidelity

In this section, we discuss the relationship between the mutual information in Eq. (12) and the image fidelity of partially coherent lithography systems. According to information theory, the mutual information indicates the rate at which information can be reliably transmitted over the channel. In a partially coherent lithography system, the information of the layout pattern is transferred from mask to wafer pixel by pixel. For a binary mask, each mask pixel carries 1 bit of information. However, because lithography systems are band-limited, the information on the mask cannot be completely transferred.

Equation (12) gives the mutual information between *K* pixels on the mask and on the print image, so the average mutual information per pixel is *I*(*x⃗*; *y⃗*)/*K*, where 0 ≤ *I*(*x⃗*; *y⃗*)/*K* ≤ 1. The partially coherent lithography system can therefore transmit only *I*(*x⃗*; *y⃗*)/*K* bits of information per pixel without error. In other words, at least *K*/*I*(*x⃗*; *y⃗*) pixels are needed to transfer 1 bit of information. On the print image, *K*/*I*(*x⃗*; *y⃗*) adjacent pixels within a square area are grouped together to form a macro pixel. The macro pixel is the smallest unit that transmits 1 bit of information through the lithography system. All of the single pixels in one macro pixel therefore have the same value. Figures 4(a) and 4(b) provide examples of single pixels and macro pixels, respectively. If the side length of a single pixel is *a*, the side length of a macro pixel is given by

$$a'=\sqrt{\frac{K}{I(\overrightarrow{x};\overrightarrow{y})}}\,a.\tag{14}$$

On the mask, by contrast, the side length of a macro pixel is *d* = *a*, the side length of a single pixel, because adjacent pixels on the mask can be independent of each other. Each mask pixel is thus mapped to a macro pixel on the print image.
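Eq. (14) is simple to evaluate. The sketch below uses hypothetical helper names and computes the information per pixel and the macro-pixel side *a′*. The inputs in the checks are made up, not values from the paper.

```python
import math


def info_per_pixel(info_bits, K):
    """Average mutual information per pixel, I(x;y)/K, which lies in [0, 1] bit."""
    return info_bits / K


def macro_pixel_side(info_bits, K, a):
    """Eq. (14): K / I(x;y) pixels form a square macro pixel of side sqrt(K / I) * a."""
    if not 0.0 < info_bits <= K:
        raise ValueError("I(x;y) must lie in (0, K] bits")
    return math.sqrt(K / info_bits) * a
```

For example, with *K* = 16 and *I*(*x⃗*; *y⃗*) = 4 bits, each macro pixel contains 4 single pixels and *a′* = 2*a*. If no information were lost (*I* = *K*), *a′* would shrink to *a*.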

Figures 4(d)–4(f) illustrate how image fidelity is evaluated based on mutual information. In this paper, the PE is used as the metric of lithography image fidelity. It is defined as the square of the Euclidean distance between the actual print image and the target layout [6,25]. Edge placement error (EPE) is another widely used metric of lithography image fidelity. The EPE measures the edge position error of the actual print image with respect to the desired pattern. It has been noted that the PE is approximately proportional to the average EPE for a given print image [25]. In Figs. 4(d)–4(f), the shaded regions illustrate example target layouts. As the output of the information channel, the print image consists of a set of macro pixels, which may overlap each other. We therefore try to cover the underlying target layout with overlapping macro pixels, represented by the squares enclosed by the red dotted lines. The difference between this coverage and the target layout gives the PE of the print image. A pattern error occurs where the target layout is left uncovered (incomplete coverage) or where the non-pattern area is covered (overcomplete coverage).
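For binary images, the PE definition reduces to counting mismatched pixels. Each mismatch is either an uncovered target pixel (incomplete coverage) or an extra-covered background pixel (overcomplete coverage). The minimal sketch below assumes images stored as equal-sized nested lists of 0/1 values, and its function names are hypothetical.

```python
def pattern_error(print_image, target):
    """PE = squared Euclidean distance between the print image and the target layout."""
    return sum((z - t) ** 2 for z_row, t_row in zip(print_image, target)
               for z, t in zip(z_row, t_row))


def coverage_errors(print_image, target):
    """Split the binary PE into (uncovered target pixels, extra-covered background pixels)."""
    uncovered = extra = 0
    for z_row, t_row in zip(print_image, target):
        for z, t in zip(z_row, t_row):
            if t == 1 and z == 0:
                uncovered += 1
            elif t == 0 and z == 1:
                extra += 1
    return uncovered, extra
```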

Consider first the special case where the side length of the macro pixel, *a′*, given in Eq. (14), is an integral multiple of the side length of a single pixel, *a*. There are two subcases.

In the first subcase, *a′* ≤ CD, where the critical dimension CD is defined as the minimum feature size on the layout. Note that CD is a multiple of *a*, since single pixels are the minimum units that compose the layout pattern. As shown in Fig. 4(d), the macro pixels can then cover the target layout perfectly without offset, so the PE is zero. According to Eq. (14), this subcase corresponds to $\sqrt{K/I(\overrightarrow{x};\overrightarrow{y})}\,a\le\mathrm{CD}$.

In the second subcase, *a′* > CD. To simplify the analysis, we assume that the target layout consists of lines with width equal to CD. The PE resulting from overcomplete coverage is then approximately (*a′* − CD)·$L_t$/(2*a*²), where $L_t$ is the perimeter of the target layout and *a* is the side length of a single pixel. The PE resulting from incomplete coverage, on the other hand, equals the area of the target layout divided by the area of a single pixel, $A_t/a^2$. In summary, the minimum PE of the print image in this special case is

$$\mathrm{PE}_{\min}=\begin{cases}0, & a'\le\mathrm{CD},\\[4pt] \min\left\{\dfrac{(a'-\mathrm{CD})\,L_t}{2a^{2}},\ \dfrac{A_t}{a^{2}}\right\}, & a'>\mathrm{CD}.\end{cases}$$

Consider next the most general case, where the side length of the macro pixel is not an integral multiple of the side length of a single pixel. As shown in Figs. 4(e) and 4(f), the target layout cannot be perfectly covered by the macro pixels because the side lengths of a single pixel and a macro pixel do not match. To obtain the minimum PE of the print image, we must choose between incomplete coverage, as in Fig. 4(e), and overcomplete coverage, as in Fig. 4(f). There are three subcases:

- *a′* > CD. Based on the above analysis, the PE resulting from overcomplete coverage is approximately (*a′* − CD)·$L_t$/(2*a*²), while the PE resulting from incomplete coverage equals $A_t/a^2$.
- *a′* ≤ CD and (*a′* mod *a*) ≥ *a*/2, where mod denotes the modulo operation. Here, incomplete coverage leads to a smaller PE than overcomplete coverage. The minimum PE is [*a* − (*a′* mod *a*)]·$L_t$/(2*a*²).
- *a′* ≤ CD and (*a′* mod *a*) < *a*/2. Here, overcomplete coverage leads to a smaller PE than incomplete coverage. The minimum PE is (*a′* mod *a*)·$L_t$/(2*a*²).

In summary, the minimum PE of the print image can be calculated as

$$\mathrm{PE}_{\min}=\begin{cases}\min\left\{\dfrac{(a'-\mathrm{CD})\,L_t}{2a^{2}},\ \dfrac{A_t}{a^{2}}\right\}, & a'>\mathrm{CD},\\[6pt] \dfrac{\left[a-(a'\bmod a)\right]L_t}{2a^{2}}, & a'\le\mathrm{CD}\ \text{and}\ (a'\bmod a)\ge a/2,\\[6pt] \dfrac{(a'\bmod a)\,L_t}{2a^{2}}, & a'\le\mathrm{CD}\ \text{and}\ (a'\bmod a)<a/2.\end{cases}\tag{17}$$
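The piecewise rule in Eq. (17) can be coded directly. The sketch below is hypothetical helper code. All lengths share one unit (e.g., nm), and the result is in single-pixel units. When *a′* is an integral multiple of *a*, (*a′* mod *a*) = 0, and the function reproduces the special case discussed earlier (zero PE for *a′* ≤ CD). `math.fmod` computes the modulo of the non-negative float lengths.

```python
import math


def min_pattern_error(a_prime, a, cd, perimeter, area):
    """Minimum PE following the three subcases of Eq. (17)."""
    if a_prime > cd:
        overcomplete = (a_prime - cd) * perimeter / (2 * a ** 2)
        incomplete = area / a ** 2
        return min(overcomplete, incomplete)
    remainder = math.fmod(a_prime, a)  # a' mod a
    if remainder >= a / 2:
        # Incomplete coverage gives the smaller PE.
        return (a - remainder) * perimeter / (2 * a ** 2)
    # Overcomplete coverage gives the smaller PE.
    return remainder * perimeter / (2 * a ** 2)
```

The checks below use illustrative numbers only (CD = 5*a*, $L_t$ = 100*a*). They show how quickly the bound grows once *a′* drifts away from a multiple of *a*.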

Based on the above analysis, to approach the theoretical limit of image fidelity, we should make *a′* approximate an integral multiple of *a*. At the same time, *a′* should not be much larger than CD. According to Eq. (15), the mutual information should satisfy the following equation to reduce the minimum PE of the print image:

$$I(\overrightarrow{x};\overrightarrow{y})=\frac{K}{\Pi^{2}},\tag{18}$$

where Π is a positive integer, so that *a′* = Π*a*.

## 4. Optimal information transfer and image fidelity limit

In this section, we solve for the optimal information transfer (OIT) and then calculate the theoretical limit of image fidelity that pixelated OPC can achieve. The OIT is defined as the optimal value of the mutual information, namely the value that results in the minimum PE of the print image. As discussed in Section 3, the OIT should satisfy Eq. (18). According to Eqs. (12) and (18), we calculate the OIT by finding the distribution *p⃗* that minimizes a cost function, Eq. (19). This cost function measures how far *I*(*x⃗*; *y⃗*), which depends on *p⃗*, deviates from Eq. (18). Specifically, given the initial *p⃗*, we calculate the initial value of *I*(*x⃗*; *y⃗*), denoted by $I^0$. We then substitute $I^0$ into Eq. (18) to find the positive integer Π that satisfies Eq. (18) most closely.
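Choosing Π is a one-dimensional integer search. Because *K*/Π² decreases monotonically in Π, the integer that best satisfies Eq. (18) must be the floor or the ceiling of the continuous solution √(*K*/$I^0$). The helper below is a hypothetical sketch of that step, and the inputs in the checks are illustrative, not the paper's.

```python
import math


def best_pi(info0, K):
    """Positive integer Pi minimizing |I^0 - K / Pi^2|, i.e. the closest fit to Eq. (18)."""
    if info0 <= 0:
        raise ValueError("I^0 must be positive")
    x = math.sqrt(K / info0)  # continuous solution of I^0 = K / Pi^2
    candidates = {max(1, math.floor(x)), max(1, math.ceil(x))}
    return min(candidates, key=lambda pi: abs(info0 - K / pi ** 2))
```

The target value of the mutual information for the subsequent optimization is then *K*/Π².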

Three constraints on *p⃗* must be taken into account when solving for the OIT. First, all elements of *p⃗* are restricted to the range [0,1], since a probability is always non-negative and no larger than 1. Second, the elements of *p⃗* sum to 1, i.e., ${\sum}_{m=0}^{K}{p}_{m}=1$. Third, the optimized print image obtained by computational lithography should be very close to the target layout. The probability distribution *p⃗* should therefore satisfy

$$\mathbf{T}\,\overrightarrow{p}=\tilde{\overrightarrow{q}},$$

where $\tilde{\overrightarrow{q}}$ is the probability distribution of the target layout. It is obtained by going through all of the sub-regions covered by the circle $C_p$ on the target layout and counting the frequencies of the events $N_y = n$, where *n* = 0, 1, . . ., *K*. Finally, we calculate the probability $P_r\{N_y=n\}$, which is the (*n* + 1)th element of $\tilde{\overrightarrow{q}}$.
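The target distribution $\tilde{\overrightarrow{q}}$ can be estimated with the same windowing used for **T**. The minimal sketch below rests on three assumptions: the target is a binary layout stored as nested lists, $C_p$ is rasterized as all offsets within `radius`, and only windows fully inside the layout are visited. The function name is hypothetical.

```python
import math


def window_distribution(image, radius):
    """Empirical Pr{N = n}, n = 0..K, over all circular windows C_p inside `image`."""
    r = int(math.floor(radius))
    offsets = [(di, dj) for di in range(-r, r + 1) for dj in range(-r, r + 1)
               if di * di + dj * dj <= radius * radius]
    K = len(offsets)
    hist = [0] * (K + 1)
    for i in range(r, len(image) - r):
        for j in range(r, len(image[0]) - r):
            hist[sum(image[i + di][j + dj] for di, dj in offsets)] += 1
    total = sum(hist)
    return [count / total for count in hist]
```

The same function applied to a mask pattern would give an empirical *p⃗* for that mask, which is one way to compare a mask against an optimized distribution.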

According to Eq. (19) and the constraints on *p⃗*, the optimal probability distribution $\widehat{\overrightarrow{p}}$ can be calculated by solving an optimization problem, Eq. (21). This problem minimizes the cost function in Eq. (19) over *p⃗* subject to the three constraints above, including the constraint on the target layout distribution. To solve it, we convert the second and third constraints into penalty terms and reformulate Eq. (21) as Eq. (22). Eq. (22) minimizes a penalized cost function *F*(*p⃗*), defined in Eq. (23), subject only to the range constraint on *p⃗*. In Eq. (23), $\omega_1$ and $\omega_2$ are the weight coefficients of the penalty terms.

Next, we use the following parameter transformation to restrict each $p_m$ to the range [0,1]:

$$p_m=\frac{1+\cos\theta_m}{2},$$

where $\overrightarrow{\theta}=(\theta_0,\theta_1,\ldots,\theta_K)^T$. Before optimization, the parameters are initialized as ${\overrightarrow{\theta}}^{0}={\text{cos}}^{-1}\left\{2\tilde{\overrightarrow{q}}-1\right\}$, so that the initial *p⃗* equals $\tilde{\overrightarrow{q}}$. The steepest descent algorithm is then applied to solve Eq. (25). In the (*i* + 1)th iteration, the variables are updated as

$$\overrightarrow{\theta}^{\,i+1}=\overrightarrow{\theta}^{\,i}-\text{step}\cdot\nabla F\left(\overrightarrow{\theta}^{\,i}\right),$$

where step is the step length and ∇*F*(*θ⃗*) is the gradient of the cost function with respect to *θ⃗*. After each update, we normalize *θ⃗* to guarantee that ${\sum}_{m=0}^{K}{p}_{m}=1$. Appendix A provides the details of the derivation of the gradient ∇*F*(*θ⃗*).
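The sketch below strings the steps of this section together in pure Python. The displayed forms of Eqs. (19) and (23) are not reproduced here, so the sketch makes several assumptions:

- It **assumes** the penalized cost *F*(*p⃗*) = (*I*(*x⃗*; *y⃗*) − *K*/Π²)² + $\omega_1$(Σ$p_m$ − 1)² + $\omega_2$‖**T***p⃗* − $\tilde{\overrightarrow{q}}$‖². This cost has the three-term structure described in the text, and its last gradient involves **T**ᵀ as in Appendix A, but it may differ from the authors' exact expressions.
- It interprets the normalization step as rescaling *p⃗* to unit sum and mapping back to *θ⃗*.
- It keeps the best iterate seen.

The gradient of *I* with respect to $p_m$ is Σₙ $\mathbf{T}_{nm}$[log₂($\mathbf{T}_{nm}$/$q_n$) − 1/ln 2], obtained by differentiating Eq. (12). The default values of step, $\omega_1$, $\omega_2$, and the iteration count mirror those used in the simulations reported below.

```python
import math


def _q(p, T):
    """q = T p."""
    return [sum(T[n][u] * p[u] for u in range(len(p))) for n in range(len(T))]


def mutual_information(p, T):
    """Eq. (12), in bits; zero-probability terms are skipped."""
    q = _q(p, T)
    return sum(p[m] * T[n][m] * math.log2(T[n][m] / q[n])
               for m in range(len(p)) for n in range(len(T)) if p[m] > 0 and T[n][m] > 0)


def cost_and_grad(p, T, q_target, info_target, w1, w2, eps=1e-12):
    """ASSUMED cost F = (I - I*)^2 + w1 (sum p - 1)^2 + w2 ||T p - q~||^2, and dF/dp."""
    q = _q(p, T)
    info = mutual_information(p, T)
    resid = [q[n] - q_target[n] for n in range(len(q))]
    excess = sum(p) - 1.0
    cost = (info - info_target) ** 2 + w1 * excess ** 2 + w2 * sum(r * r for r in resid)
    grad = []
    for m in range(len(p)):
        # dI/dp_m = sum_n T_nm [log2(T_nm / q_n) - 1/ln 2]
        d_info = sum(T[n][m] * (math.log2(T[n][m] / max(q[n], eps)) - 1.0 / math.log(2.0))
                     for n in range(len(T)) if T[n][m] > 0)
        d_resid = sum(T[n][m] * resid[n] for n in range(len(T)))  # (T^T r)_m
        grad.append(2.0 * (info - info_target) * d_info + 2.0 * w1 * excess + 2.0 * w2 * d_resid)
    return cost, grad


def _theta_from_p(p):
    """Inverse of p_m = (1 + cos theta_m) / 2, clamped for floating-point safety."""
    return [math.acos(min(1.0, max(-1.0, 2.0 * v - 1.0))) for v in p]


def optimize_distribution(T, q_target, info_target, step=0.1, w1=1.0, w2=10.0, iters=200):
    """Steepest descent on theta with p_m = (1 + cos theta_m) / 2; returns (best p, best cost)."""
    theta = _theta_from_p(q_target)  # theta^0 = arccos(2 q~ - 1), so p^0 = q~
    best_p, best_cost = None, float("inf")
    for _ in range(iters + 1):
        p = [(1.0 + math.cos(t)) / 2.0 for t in theta]
        cost, grad_p = cost_and_grad(p, T, q_target, info_target, w1, w2)
        if cost < best_cost:
            best_p, best_cost = p, cost
        # Chain rule: dF/dtheta_m = dF/dp_m * (-sin(theta_m) / 2).
        theta = [t - step * g * (-math.sin(t) / 2.0) for t, g in zip(theta, grad_p)]
        # Assumed normalization: rescale p to unit sum, then map back to theta.
        p_new = [(1.0 + math.cos(t)) / 2.0 for t in theta]
        total = sum(p_new)
        theta = _theta_from_p([v / total for v in p_new])
    return best_p, best_cost
```

The cosine parameterization turns the box constraint 0 ≤ $p_m$ ≤ 1 into an unconstrained problem in *θ⃗*. One caveat: entries initialized exactly at 0 or 1 have a zero gradient (sin *θ* = 0) and stay fixed.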

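After the optimization procedure, we substitute the optimal probability distribution $\widehat{\overrightarrow{p}}$ into Eq. (12) and calculate the OIT as

$$\mathrm{OIT}=\sum_{n=0}^{K}\sum_{m=0}^{K}\hat{p}_m\mathbf{T}_{nm}\log_2\frac{\mathbf{T}_{nm}}{\sum_{u=0}^{K}\hat{p}_u\mathbf{T}_{nu}}.$$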

Next, we use the proposed method to obtain the OITs of “Layout 3” and “Layout 4” in Figs. 5(a) and 5(e). The weight coefficients $\omega_1$ and $\omega_2$ in Eq. (23) are chosen empirically to balance the different terms in the cost function. In the following simulations, we set step = 0.1, $\omega_1$ = 1, and $\omega_2$ = 10, and run 200 iterations in total. The convergence curves of the cost function and the mutual information for both layouts are plotted in Fig. 6. The cost function decreases with the iteration number, while the mutual information gradually approaches the optimal value that satisfies Eq. (18). The resulting OITs for “Layout 3” and “Layout 4” are 4.5982 and 4.5900, respectively. Substituting these OITs into Eqs. (14) and (17) gives the lower bound of PE for the print images. As shown in the second column of Table 1, the limits of image fidelity for “Layout 3” and “Layout 4” are 3.05 and 17.50, respectively.

To verify the theoretical limits of image fidelity, we use the gradient-based pixelated OPC algorithm in [6] to optimize the mask patterns for “Layout 3” and “Layout 4”. In this case, the wavelength is *λ* = 193 nm, CD = 45 nm, NA = 1.25, and the resist threshold is $t_r = 0.19$. The light source is an annular illumination with inner and outer partial coherence factors $\sigma_{in} = 0.8$ and $\sigma_{out} = 0.975$ [2]. In Eq. (1), the lateral sizes of **M**(**r**) for “Layout 3” and “Layout 4” are 201 pixels and 261 pixels, respectively, and the lateral size of $\mathbf{h}^{\mathbf{m}}(\mathbf{r})$ is 21 pixels. To use the gradient-based OPC algorithm, the hard threshold function in Eq. (5) is approximated by the differentiable sigmoid function [5], so that the print image becomes

$$\mathbf{Z}(\mathbf{r})=\frac{1}{1+\exp\left[-a_r\left(\mathbf{I}(\mathbf{r})-t_r\right)\right]},$$

where $a_r$ is the steepness index of the sigmoid function. Note that $a_r$ and $t_r$ are the two key parameters of the sigmoid function. They characterize the effect of photoresist development on the print image. Gradient-based OPC methods optimize the mask patterns based on the imaging model and the photoresist model. As a key parameter of the photoresist model, $a_r$ has a significant impact on the optimization results and on lithography image fidelity. The step length also strongly influences the convergence of gradient-based OPC algorithms. To find the minimum PE of the print image that the gradient-based OPC algorithm can obtain, we repeat the simulations with different combinations of step length and $a_r$. Specifically, we sweep the step length from 0.5 to 4 in increments of 0.5, and $a_r$ from 10 to 90 in increments of 5. For each parameter combination, we run the algorithm for 100 iterations and record the minimum PE obtained during these iterations. We then compare this minimum PE with its theoretical limit, derived from the optimal information transfer, to verify the proposed information theoretical approaches.
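The following minimal sketch covers the sigmoid resist model and the parameter sweep described above. `run_opc` is a hypothetical stand-in for the gradient-based OPC algorithm of [6]. It is assumed to return the PE values recorded over its iterations, and the sweep keeps the smallest one. The sigmoid is written in a numerically stable two-branch form, which is an implementation choice rather than something stated in the text.

```python
import math


def sigmoid_print(intensity, a_r, t_r):
    """Differentiable resist model: 1 / (1 + exp(-a_r (I - t_r)))."""
    z = a_r * (intensity - t_r)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def parameter_grid():
    """Step lengths 0.5, 1.0, ..., 4.0 and a_r = 10, 15, ..., 90."""
    steps = [0.5 * k for k in range(1, 9)]
    a_values = list(range(10, 91, 5))
    return [(step, a_r) for step in steps for a_r in a_values]


def sweep(run_opc, iterations=100):
    """Return (min PE, step, a_r); run_opc(step, a_r, iterations) -> list of PE values."""
    best = None
    for step, a_r in parameter_grid():
        pe = min(run_opc(step, a_r, iterations))
        if best is None or pe < best[0]:
            best = (pe, step, a_r)
    return best
```

The grid contains 8 × 17 = 136 combinations. Larger values of $a_r$ make the sigmoid closer to the hard threshold of Eq. (5).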

Figure 5 and Table 1 present the results obtained by the OPC algorithm. For both target layouts, the best parameter combination is a step length of 1.5 and $a_r = 90$. Figures 5(b) and 5(f) show the optimized masks of “Layout 3” and “Layout 4” under the best parameter combinations, and Figs. 5(d) and 5(h) show the corresponding print images. Figures 5(c) and 5(g) are the print images obtained when the masks are equal to the target layouts. In Table 1, the second column presents the theoretical limits of image fidelity for both layouts, derived as explained above. The third column gives the minimum PEs of the print images achieved by the gradient-based OPC algorithm. For both layouts, the theoretical limits are lower than the minimum PEs obtained by the OPC algorithm.

## 5. Application of the proposed information theoretical approach to improve the performance of OPC methods

As shown in Fig. 5, the minimum PEs obtained by the gradient-based OPC algorithm are larger than the theoretical limits. In this section, we apply the proposed information theoretical approach to improve the OPC solutions shown in Fig. 5. According to Eq. (22), $\widehat{\overrightarrow{p}}$ represents the optimal probability distribution of the mask pattern, the distribution that attains the lower bound of PE. Figures 7(a) and 7(d) illustrate the optimal distributions for “Layout 3” and “Layout 4”, respectively. Figures 7(b) and 7(e) show the probability distributions of the OPC masks in Figs. 5(b) and 5(f), respectively. Relative to the optimal distributions, the root mean square errors (RMSEs) of the distributions of the OPC masks are 0.4997 and 0.2880, respectively. Compared with the distributions of the OPC masks, the optimal distributions have a smaller value of $p_0$ and larger values of $p_i$ for small *i*. Here, $p_0$ is the probability of opaque regions on the mask. The $p_i$ for small *i* represent the likelihood of sub-resolution assist features (SRAFs), and the $p_i$ for large *i* represent the likelihood of the main features. In other words, compared with the optimal distributions, the OPC patterns in Fig. 5 have too many opaque areas and fail to insert enough SRAFs around the main features. Figures 5(b) and 5(f) confirm this conclusion: the gradient-based OPC algorithm generates few SRAFs around the main features, and most of the mask area is opaque.
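The RMSE values above compare two (*K* + 1)-element probability distributions. The one-function sketch below assumes that the mean is taken over the *K* + 1 entries; the normalization used for the reported values is not stated in this section. The function name is hypothetical.

```python
import math


def distribution_rmse(p, p_ref):
    """Root mean square error between two probability-mass vectors of equal length."""
    if len(p) != len(p_ref):
        raise ValueError("distributions must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_ref)) / len(p))
```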

Next, we propose a method that refines the OPC solutions and improves image fidelity based on the optimal probability distribution $\widehat{\overrightarrow{p}}$. Let **M′** be the OPC mask obtained by the gradient-based OPC algorithm. The workflow of the proposed refinement method is provided in Table 2:

1. Calculate the contour of **M′**, denoted as *E*{**M′**}.
2. Insert SRAFs around the main features to reduce the PE, by turning on some of the zero-valued pixels outside *E*{**M′**}.
3. Go over all of the one-valued pixels inside *E*{**M′**} and turn off some of them to further reduce the PE.

The result is the refined mask pattern. Mask manufacturing constraints do not allow SRAFs to be too close to the main features on the mask pattern. The proposed method therefore inserts SRAFs only in regions that are at least 10 pixels away from the boundaries of the mask patterns produced by OPC. In practice, the distance between SRAFs and main features can be adjusted to the actual mask fabrication requirements.
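Table 2 is not reproduced in this section. The following is therefore only a simplified greedy sketch in the spirit of the workflow described above, not the authors' exact procedure. Its assumptions are:

- `pe_of` is a hypothetical callable that images a candidate mask and returns its PE.
- "Outside *E*{**M′**}" is approximated by zero-valued pixels whose Euclidean distance to every one-valued pixel of **M′** is at least `min_gap` (10 pixels in the text).
- "Inside *E*{**M′**}" is approximated by the one-valued pixels of **M′**.
- A pixel flip is kept only if it strictly lowers the PE.

```python
import math


def refine_mask(mask, pe_of, min_gap=10.0):
    """Greedy two-pass refinement of an OPC mask M'; returns (refined mask, PE)."""
    rows, cols = len(mask), len(mask[0])
    ones = [(i, j) for i in range(rows) for j in range(cols) if mask[i][j] == 1]
    refined = [row[:] for row in mask]  # copy, so the input mask is left unchanged
    best = pe_of(refined)

    # Pass 1: try to switch on candidate SRAF pixels far enough from the main features.
    for i in range(rows):
        for j in range(cols):
            far = all(math.hypot(i - a, j - b) >= min_gap for a, b in ones)
            if refined[i][j] == 0 and far:
                refined[i][j] = 1
                pe = pe_of(refined)
                if pe < best:
                    best = pe
                else:
                    refined[i][j] = 0  # revert: no improvement

    # Pass 2: try to switch off one-valued pixels of the original OPC mask.
    for a, b in ones:
        refined[a][b] = 0
        pe = pe_of(refined)
        if pe < best:
            best = pe
        else:
            refined[a][b] = 1
    return refined, best
```

Like the method in the text, this greedy scheme evaluates one pixel flip at a time and can stop at a local minimum of the PE. Each flip also requires a full imaging evaluation, so practical use would need a fast PE evaluator.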

Figure 8 shows the refined mask patterns and their corresponding print images for “Layout 3” and “Layout 4”. Figures 7(c) and 7(f) show the probability distributions of the refined masks in Figs. 8(a) and 8(c), respectively. Relative to the optimal distributions, the RMSEs of the distributions of the refined masks are 0.2067 and 0.1387, respectively. These are lower than the RMSEs of the masks optimized by the OPC method. The PEs of the refined masks, presented in the fourth column of Table 1, are also lower than those of the standard OPC masks. Comparing Figs. 5 and 8, we observe that the proposed method inserts many more SRAFs into the masks and further improves the image fidelity of the lithography system.

Note that the method in Table 2 is heuristic and is likely to converge to a local minimum of the pattern error. In addition, the mask patterns in Fig. 8 are complex and include numerous tiny features that are difficult to produce with current mask fabrication techniques. Nevertheless, the results in Fig. 8 demonstrate that the proposed information theoretical approaches can improve the performance of current OPC algorithms, and they suggest how this improvement can be achieved. In future work, we will extend the proposed information channel model to account for other realistic factors, such as process variations in lithography systems [28] and mask manufacturability [29]. We will also try to develop a global OPC optimization method that minimizes the pattern error according to the optimal probability distribution of mask patterns.

## 6. Conclusion

This paper has introduced information theoretical approaches to computational lithography in partially coherent lithography systems. An information channel model was established, and the mutual information between the mask and print images was formulated and studied. The relationship between the mutual information and lithography image fidelity was then discussed. Next, we derived the optimal information transfer and the theoretical limit of image fidelity for pixelated OPC techniques. Finally, the proposed information theoretical approaches were used to improve the OPC solutions of gradient-based algorithms. The proposed methods were verified by a set of simulations on different layout patterns. In future work, we will extend these approaches to take into account other realistic factors that influence lithography, such as process variations and mask manufacturability.

## A. Appendix

According to Eq. (23), we define the three terms in the cost function as

The gradient of $F_1$ with respect to $p_m$ is

The gradient of $F_2$ with respect to $p_m$ is

The gradient of $F_3$ with respect to $\overrightarrow{p}$ can be calculated as

where $\mathbf{T}^{T}$ is the transposition of $\mathbf{T}$.

## Funding

National Natural Science Foundation of China (NSFC) (61675021); Beijing Natural Science Foundation (4173078); the Fundamental Research Funds for the Central Universities (2018CX01025); the China Scholarship Council (Grant No. 201706035012).

## References and links

**1.** A. K. Wong, *Resolution Enhancement Techniques in Optical Lithography* (SPIE, 2001).

**2.** X. Ma and G. R. Arce, *Computational Lithography*, 1st ed. (John Wiley and Sons, 2010).

**3.** Y. Liu and A. Zakhor, “Binary and phase shifting mask design for optical lithography,” IEEE Trans. Semicond. Manuf. **5**(2), 138–152 (1992).

**4.** Y. Granik, “Fast pixel-based mask optimization for inverse lithography,” J. Microlith. Microfab. Microsyst. **5**(4), 043002 (2006).

**5.** A. Poonawala and P. Milanfar, “Mask design for optical microlithography – an inverse imaging problem,” IEEE Trans. Image Process. **16**(3), 774–788 (2007).

**6.** X. Ma and G. R. Arce, “Binary mask optimization for inverse lithography with partially coherent illumination,” J. Opt. Soc. Am. A **25**(12), 2960–2970 (2008).

**7.** X. Ma and G. R. Arce, “Generalized inverse lithography methods for phase-shifting mask design,” Opt. Express **15**(23), 15066–15079 (2007).

**8.** N. B. Cobb and Y. Granik, “Dense OPC for 65nm and below,” Proc. SPIE **5992**, 599259 (2005).

**9.** P. M. Martin, C. J. Progler, G. Xiao, R. Gray, L. Pang, and Y. Liu, “Manufacturability study of masks created by inverse lithography technology (ILT),” Proc. SPIE **5992**, 599235 (2005).

**10.** A. Poonawala and P. Milanfar, “OPC and PSM design using inverse lithography: A non-linear optimization approach,” Proc. SPIE **6154**, 61543H (2006).

**11.** A. Poonawala, B. Painter, and C. Kerchner, “Model-based assist feature placement for 32nm and 22nm technology nodes using inverse mask technology,” Proc. SPIE **7488**, 748814 (2009).

**12.** Y. Shen, N. Wong, and E. Y. Lam, “Level-set-based inverse lithography for photomask synthesis,” Opt. Express **17**(26), 23690–23701 (2009).

**13.** N. Jia and E. Y. Lam, “Machine learning for inverse lithography: Using stochastic gradient descent for robust photomask synthesis,” J. Opt. **12**(4), 045601 (2010).

**14.** J. Yu and P. Yu, “Impacts of cost functions on inverse lithography patterning,” Opt. Express **18**(8), 23331–23342 (2010).

**15.** X. Ma and G. R. Arce, “Pixel-based OPC optimization based on conjugate gradients,” Opt. Express **19**(3), 2165–2180 (2011).

**16.** X. Ma, Y. Li, and L. Dong, “Mask optimization approaches in optical lithography based on a vector imaging model,” J. Opt. Soc. Am. A **29**(7), 1300–1312 (2012).

**17.** X. Ma, Z. Song, Y. Li, and G. R. Arce, “Block-based mask optimization for optical lithography,” Appl. Opt. **52**(14), 3351–3363 (2013).

**18.** X. Ma, G. R. Arce, and Y. Li, “Optimal 3D phase-shifting masks in partially coherent illumination,” Appl. Opt. **50**(28), 5567–5576 (2011).

**19.** W. Lv, S. Liu, Q. Xia, X. Wu, Y. Shen, and E. Y. Lam, “Level-set-based inverse lithography for mask synthesis using the conjugate gradient and an optimal time step,” J. Vac. Sci. Technol. B **31**(4), 041605 (2013).

**20.** M. L. Rieger, “Communication theory in optical lithography,” J. Micro/Nanolith. MEMS MOEMS **11**(1), 013003 (2012).

**21.** X. Ma, H. Zhang, Z. Wang, Y. Li, G. R. Arce, J. Garcia-Frias, and L. Zhang, “Information theoretical aspects in coherent optical lithography systems,” Opt. Express **25**(23), 29043–29057 (2017).

**22.** B. E. A. Saleh and M. Rabbani, “Simulation of partially coherent imagery in the space and frequency domains and by modal expansion,” Appl. Opt. **21**(15), 2770–2777 (1982).

**23.** X. Ma, D. Shi, Z. Wang, Y. Li, and G. R. Arce, “Lithographic source optimization based on adaptive projection compressive sensing,” Opt. Express **25**(6), 7131–7149 (2017).

**24.** Z. Song, X. Ma, J. Gao, J. Wang, Y. Li, and G. R. Arce, “Inverse lithography source optimization via compressive sensing,” Opt. Express **22**(12), 14180–14198 (2014).

**25.** W. Lv, Q. Xia, and S. Liu, “Mask-filtering-based inverse lithography,” J. Micro/Nanolith. MEMS MOEMS **12**(4), 043003 (2013).

**26.** M. Born and E. Wolf, *Principles of Optics* (Cambridge University, 1999).

**27.** R. Wilson, *Fourier Series and Optical Transform Techniques in Contemporary Optics* (Wiley, 1995).

**28.** P. Yu, S. X. Shi, and D. Z. Pan, “True process variation aware optical proximity correction with variational lithography modeling and model calibration,” J. Micro/Nanolith. MEMS MOEMS **6**(3), 031004 (2007).

**29.** S. Jiang, X. Ma, and A. Zakhor, “A recursive cost-based approach to fracturing,” Proc. SPIE **7973**, 79732P (2011).