Design of optical systems that maximize as-built performance using tolerance/compensator-informed optimization

Brian J. Bauman; Michael D. Schneider

doi:10.1364/OE.26.013819

1. Introduction

The optical designer’s goal is often to create the best as-built optical system (i.e., the best performance when the optics are fabricated and assembled according to the designer’s tolerances and compensation schemes). In other cases, the designer’s goal is to achieve a pre-determined as-built performance while loosening tolerances as much as possible. In describing the theory that follows, we adopt the goal of optimal performance to simplify the exposition, but the work applies equally towards a goal of easing tolerances. In either case, current design techniques tend to produce the best nominal design and do not incorporate knowledge about tolerances and compensators.

As is well-known by designers, the process of optical design often starts with an initial design that is approximately of the correct form [1]. The system is characterized by a merit function that summarizes the performance of the system. Using the merit function with the aid of powerful numerical optimization codes and the designer’s insight (i.e., the combination of professional experience and informed intuition), the design is optimized to a form that meets requirements: the nominal design. Some designers will use global optimization codes to improve the nominal design. These are computationally intensive and may require extended periods of computational time.

The second part of the optical design process is tolerancing. Typically, a designer either starts with a set of candidate tolerances and assesses the resulting performance degradations, or the designer starts with a performance degradation target and allows the tolerances to adjust to meet the degradation target. In either case, the designer further adjusts tolerances as needed. Compensators are often included in the tolerancing approach. The number and type of the compensator(s) depend on the manufacturing environment and the production quantity. Compensators may be related to fabrication, such as a radius of curvature that is fabricated last to compensate for glass melt parameters (e.g., index, Abbe number, and inhomogeneity) and other as-built radii and thicknesses. Compensation may also be in the assembly: intentionally despacing, decentering or tilting a component to compensate for other tolerances. For example, adjusting focus is a common compensator. Other types of compensators can include tilting a focal plane or decentering a lens. In all cases the compensators are used to ease tolerances in the lens designs and thereby improve yield, performance and/or costs.

Assessing the effects of tolerancing is computationally intensive. Optical design software packages include a one-factor-at-a-time tolerancing analysis where the code perturbs the lens by a tolerance, optimizes the lens using a compensator(s), and calculates the performance degradation. This procedure is often performed with a user-written tolerancing script. The procedure is then repeated for each tolerance in turn and the degradations are often root-sum-squared (rss’d) to yield a measure of predicted as-built performance. In addition, the tolerance/compensation approach is often assessed using many Monte Carlo (MC) simulations of tolerance realizations.
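A minimal sketch of this rss bookkeeping follows. It assumes that each one-factor-at-a-time degradation is the rms wavefront error that a single tolerance adds in quadrature, and that the tolerances are independent. Conventions differ between design codes, and both the function name and the budget values are hypothetical.

```python
import math

def predicted_as_built_rms(nominal_rms, degradations):
    """Combine a nominal rms WFE with one-factor-at-a-time degradations.

    Assumption: each degradation is the rms WFE (after compensation) that a
    single tolerance adds in quadrature, and the tolerances are independent,
    so all contributions are root-sum-squared with the nominal value.
    """
    return math.sqrt(nominal_rms ** 2 + sum(d ** 2 for d in degradations))

# Hypothetical budget in waves: a nominal design plus three tolerances.
nominal = 0.030
per_tolerance = [0.020, 0.015, 0.010]
as_built = predicted_as_built_rms(nominal, per_tolerance)
print(f"predicted as-built rms WFE: {as_built:.4f} waves")
```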

If the candidate tolerancing and compensation approach does not meet the required performance, then either the designer tightens tolerances (at increased cost) or changes the compensation approach. If neither of these approaches is effective, then the designer, using insight, attempts to change the design to decrease tolerance sensitivity to acceptable levels. This is a challenging task and may not succeed given the resources available.

Some lens design codes have various operands or constraints that are implemented in the merit function to try to reduce tolerance sensitivity. However, these are “unlabeled knobs” where one needs to experiment with the operand/constraint and weights to find the right effect. Even so, these knobs do not consider the effect of compensators. Other codes include tolerancing scripts within the optimization routine in order to improve the as-built performance; this is a “brute-force” approach. While these scripts can include the effect of compensators, they are computationally very expensive and often impractical.

In short, the tolerancing of a lens is computationally intensive and iterative. Incorporating tolerancing considerations into the merit function is difficult and computationally slow, which impedes the designer’s effort to create designs that perform well when assembled. Some of this challenge may be solved over time as computing power grows, but we expect that the designer will welcome improved efficiency, which may leave more time to explore the design space using global optimization.

2. Previous approaches

Some authors have attempted methods to create designs that perform better when built. Rogers suggested creating an ensemble of misaligned systems (where the misalignments were MC realizations of the tolerances), which were then simultaneously optimized [2]. This approach is computationally intensive, and the designer does not know if the tolerance space has been sampled appropriately. It also does not make use of the considerable insights available through Nodal Aberration Theory (NAT), discussed in Section 5. Rogers also suggests another approach: run a global optimization and find a large number of candidate solutions (located at local minima of the merit function), then tolerance all of them and choose the solution with the best as-built performance [3]. Again, this is computationally intensive, and the design is not informed during optimization by tolerance/compensator performance. Further, this approach will not find solutions that are not at local minima of the merit function.

Chapman and Sweeney [4] developed a method that can be used for alignment, and Tessieres implemented a similar approach, but included mirror bending modes [5, 6]. An optical system’s wavefront is measured at several field points and the low-order Zernike coefficients are extracted. Meanwhile, in the optical design software, unit motions of the various compensators are simulated and the low-order Zernike coefficients at the same field points are calculated. Using Singular Value Decomposition (SVD) [7], the required compensator motion is deduced for the given measurement of Zernike coefficients. This method is computationally slow, which is acceptable for alignment purposes, but can be impractical for design. This method does not use NAT insights, and does not predict the corrected performance.
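The least-squares step at the heart of this alignment method can be sketched as follows. As a deliberate simplification, the sketch solves the normal equations rather than computing an SVD. For a well-conditioned sensitivity matrix with full column rank, both approaches give the same compensator motions, although the SVD is the numerically safer choice. The helper names and sensitivity values are hypothetical.

```python
def solve_small(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def compensator_motions(sens, measured):
    """Compensator motions that best cancel measured Zernike coefficients.

    sens[i][k]:  change in Zernike coefficient i (stacked over all field
                 points) per unit motion of compensator k, from the model.
    measured[i]: measured Zernike coefficient i.
    Minimizes ||measured + sens @ x|| via the normal equations
    (sens^T sens) x = -sens^T measured.
    """
    n_comp = len(sens[0])
    normal = [[sum(row[j] * row[k] for row in sens) for k in range(n_comp)]
              for j in range(n_comp)]
    rhs = [-sum(row[j] * z for row, z in zip(sens, measured)) for j in range(n_comp)]
    return solve_small(normal, rhs)


# Hypothetical sensitivities: 3 Zernike measurements, 2 compensators.
sens = [[1.0, 0.0],
        [0.0, 1.0],
        [1.0, 1.0]]
measured = [-2.0, -3.0, -5.0]
print(compensator_motions(sens, measured))  # [2.0, 3.0]
```

In this toy case the measured coefficients lie in the span of the sensitivities, so the returned motions cancel them exactly.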

Manuel [8, 9] recognized that Double Zernike polynomials (DZ’s) [10] could be used as a basis set to describe the aberrations of a misaligned telescope.

3. Approach

We now describe our approach to optimizing optical designs to maximize as-built performance [11]. We adopt a metric for system performance, as described in Section 4, which is the root-mean-square (rms) wavefront error, rss’d over the field. This metric can also be expressed using Double Zernike coefficients. We use Nodal Aberration Theory (NAT) [12–16] to describe the additional aberrations that are generated when optical surfaces are tilted or decentered. We then express these aberrations in terms of Double Zernike polynomials. Last, we introduce a theory of compensation that analytically produces the correct amount of compensation and generates the correct residual system wavefront error (WFE) in terms of Double Zernike polynomials, which can then be combined with the nominal performance to produce the expected as-built performance.

Section 4 defines terms that we will use. Section 5 reviews NAT. Section 6 summarizes DZ’s. Section 7 derives expressions for decentered aberrations in terms of DZ’s. Section 8 describes how the DZ’s due to tolerances are calculated, and the compensation theory is developed in Section 9. Section 10 discusses the application of this method to a common exemplar for optical design theory, the Cooke triplet. Section 11 discusses how to extend this approach to include aspheres, higher-order aberrations, consideration of pupil aberrations, other kinds of system performance metrics, and freeform surfaces.

4. Definitions

We will use a small but important deviation from most previous NAT and optical design literature, which is discussed in Manuel [8]. In most optics literature, field positions are usually taken along the y-axis and the pupil coordinate angle (θ) is described as a clockwise rotation from the y-axis. Because we will be considering the aberrations over the entire field, it is more convenient to define azimuthal angles in terms of counterclockwise rotation from the x-axis, in concert with customary practice in most science and engineering; see Fig. 1. Shack and Thompson’s NAT still applies when we shift this angle convention, except for the details of vector multiplication; this is described in Appendix 1.

Fig. 1 Definition of normalized pupil and field coordinates on the unit circle for this paper. The vectors $\vec{ρ}$ and $\vec{H}$ are illustrated. Quantities without vector symbols are magnitudes. Section 5 will briefly use $θ' = θ - ϕ$, which is the angular variable used in most aberration theory.


Following common practice, we use the rms wavefront error as the performance metric at a given field point:

W_{rms}(\vec{H}) = \left( \frac{\int_{pupil} W(\vec{ρ}; \vec{H})^{2} \, d\vec{ρ}}{\int_{pupil} d\vec{ρ}} \right)^{1/2}

where $W(\vec{ρ}; \vec{H})$ is the piston-tip-tilt-removed wavefront error function at pupil coordinate $\vec{ρ}$ and field coordinate $\vec{H}$, as defined in Fig. 1. The factor $d\vec{ρ}$ is a scalar differential area in the pupil. In general, the integration over the pupil in Eq. (1) could include vignetting as a function of field coordinate $\vec{H}$, but vignetting is neglected in this paper for simplicity.

The system wavefront performance is defined as the rss across the field of the rms wavefront error, i.e.,

W_{sys} = \left( \frac{\int_{field} W_{rms}(\vec{H})^{2} \, d\vec{H}}{\int_{field} d\vec{H}} \right)^{1/2}

where $d\vec{H}$ is the differential area in the field. We will also use the square of this expression and refer to it as the variance of the system wavefront performance.
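Eqs. (1) and (2) can be evaluated numerically when a closed form is inconvenient. The sketch below applies a simple midpoint rule on the unit disk to both the pupil and the field. The quadrature choice, grid sizes, and example wavefront are illustrative assumptions, and the wavefront function is assumed to have piston, tip, and tilt already removed.

```python
import math

def disk_mean(f, n_r=24, n_t=16):
    """Mean of f(r, t) over the unit disk: the ratio of integrals in Eqs. (1)-(2).

    Midpoint rule in r and t; the area element r dr dt is included and the
    sum is divided by the disk area (pi).
    """
    dr, dt = 1.0 / n_r, 2.0 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for k in range(n_t):
            total += f(r, (k + 0.5) * dt) * r
    return total * dr * dt / math.pi


def w_rms(wavefront, H, phi):
    """Eq. (1): rms over the pupil at field point (H, phi).

    wavefront(H, phi, rho, theta) is assumed to be piston/tip/tilt-removed.
    """
    return math.sqrt(disk_mean(lambda rho, theta: wavefront(H, phi, rho, theta) ** 2))


def w_sys(wavefront):
    """Eq. (2): rss over a circular, normalized field of the rms WFE."""
    return math.sqrt(disk_mean(lambda H, phi: w_rms(wavefront, H, phi) ** 2))


# Hypothetical wavefront: defocus that varies linearly with H_x,
# W = a * H cos(phi) * (2 rho^2 - 1); analytically W_sys = a / sqrt(12).
a = 0.1  # waves

def tilted_focus(H, phi, rho, theta):
    return a * H * math.cos(phi) * (2.0 * rho ** 2 - 1.0)

print(w_rms(tilted_focus, 1.0, 0.0), a / math.sqrt(3))
print(w_sys(tilted_focus), a / math.sqrt(12))
```

The numerical and analytic values in each printed pair should agree to a fraction of a percent. The coarse radial grid limits the accuracy.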

Of course, this is not the only way that system performance can be defined and in Section 11, we show that the approach in this paper can be applied to many reasonable metrics that integrate performance over field.

5. Review of nodal aberration theory

In the 1970s and 1980s, Shack [12], Thompson [13, 14], and Buchroeder [16] developed the fundamentals of what is now called Nodal Aberration Theory (NAT). NAT has given us great insight into the aberration behavior and performance of systems with tilted and decentered components. We do not re-develop NAT here, but rather recount enough of the theory to identify the parts of NAT that we use and to establish notation.

We use the Hopkins notation [17] as modified and popularized by Shack [18]. The system wavefront error function is expressed with

W(H, ρ, θ') = \sum_{jklm} (W_{klm})_{j} \, H^{k} ρ^{l} \cos^{m} θ', \qquad m \geq 0; \; k - m \geq 0 \text{ and even}; \; l - m \geq 0 \text{ and even}

where $W_{klm}$ is the Hopkins coefficient, j is the surface number, H is the normalized magnitude of the field location, ρ is the normalized pupil radius, and θʹ is the angle between the field position vector and pupil vector, as mentioned in the caption of Fig. 1. For brevity, we will often write $W$ rather than $W(H, ρ, θ')$. The indices k, l, m are the powers of H, ρ, and $\cos θ'$, respectively. The order of the aberration (e.g., 4th order) is given by k + l.

As we know from 4th-order aberration theory, each surface in an optical system generates aberration fields, i.e., a particular aberration pattern as a function of field position. These aberration fields include spherical aberration, coma, astigmatism, Petzval curvature, and distortion. The magnitude of these aberration fields is found from first-order calculations [19]. The surface contributions are summed together into a system aberration field. At this point, we neglect induced aberrations so that this summation is accurate to 4th order. The issue of induced aberrations [20], termed “extrinsic aberrations” by Sasian [21], is set aside until Section 11.

The Seidel aberrations are embedded in Eq. (3) above and enumerated in the first four columns of Table 1 below. Distortion, which does not affect image quality, is neglected for simplicity. We will generally call these centered aberrations or design aberrations since these are the aberrations that exist without misalignments. The vector forms of the terms, developed by Shack and Thompson, are helpful in analyzing aberrations from surface misalignments, discussed next. NAT uses a concept called vector multiplication which is similar to the notion of multiplying complex numbers on $ℜ^{2}$ . Discussion of vector multiplication properties is given in Thompson [14] and Appendix 1.
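Vector multiplication can be mimicked with Python complex numbers, provided azimuths are measured counterclockwise from the x-axis as in Fig. 1. This sketch rests on that assumption: "similar to multiplying complex numbers" is read here as "is complex multiplication," and the details of Appendix 1 are not reproduced. The example checks that the astigmatism form $\vec{H}^{2} \cdot \vec{ρ}^{2}$ reduces to $H^{2} ρ^{2} \cos 2θ'$.

```python
import cmath
import math

def vmul(a: complex, b: complex) -> complex:
    """Vector product of 2-D vectors stored as complex numbers x + iy:
    magnitudes multiply and azimuths (ccw from +x) add."""
    return a * b

def vdot(a: complex, b: complex) -> float:
    """Ordinary scalar (dot) product of two 2-D vectors."""
    return a.real * b.real + a.imag * b.imag

# Field vector H at azimuth phi, pupil vector rho at azimuth theta.
H, phi = 0.7, 0.3
rho, theta = 0.9, 1.1
H_vec = cmath.rect(H, phi)
rho_vec = cmath.rect(rho, theta)

# Astigmatism's vector form H^2 . rho^2 equals H^2 rho^2 cos(2 theta'),
# with theta' = theta - phi as in Eq. (3).
lhs = vdot(vmul(H_vec, H_vec), vmul(rho_vec, rho_vec))
rhs = H ** 2 * rho ** 2 * math.cos(2 * (theta - phi))
print(lhs, rhs)
```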

Table 1. Seidel aberrations and the decentered aberration terms that are created when surfaces are decentered.


We now consider a misalignment of a surface such as a decenter or tilt. For a system with no tilts/decenters, the aberration fields from all the surfaces merely add together to generate the system aberration field. If a surface is tilted/decentered, then the aberration field from that surface is also decentered in the image plane. Buchroeder [16] and Thompson [22] showed that the displacement of a surface’s aberration field, described by the image plane displacement vector $\vec{σ}$ , depends on the gut ray’s intercept with the transverse plane containing the surface’s center of curvature (see Fig. 2).

Fig. 2 (left) a gut ray incident on a misaligned surface; unperturbed surface is light gray, perturbed surface is dark gray; original center of curvature is C, perturbed center of curvature is Cʹ; (center) same as left panel, but the gut ray is perturbed by an upstream surface; (right) quantities used for defining $\vec{σ}$ .


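\vec{σ} \equiv \vec{β} / \bar{i}

where $\vec{β}$ and $\bar{i}$ are the quantities shown in the right panel of Fig. 2; $\bar{i}$ is the angle of incidence of the chief ray at the surface.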

The gut ray, also called the optical axis ray, is defined as the ray that originates in the center of the object field and passes through the center of the aperture stop; in a centered system, the gut ray is coincident with the optical axis. The aberration fields from each surface are still added together, but they now add in a more complicated and asymmetric way [16]. The result is that the same design aberrations are present, but we have added terms with similar pupil dependence that are lower-order in field dependence than the Seidel aberrations; these terms can be found by substituting $(\vec{H} - \vec{σ})$ for $\vec{H}$ in the fourth column of Table 1; the last two columns are the result.

The last two columns of Table 1 summarize the decentered terms and the magnitudes; we will generally refer to these as decentered aberrations. For example, a surface that generates astigmatism (which varies quadratically with field) will generate linear astigmatism and constant astigmatism when decentered (i.e., the wavefront is still astigmatic, but there is a term that depends linearly on field, and a term that is independent of field). For simplicity, we will neglect magnification/distortion effects although there is nothing in the approach that excludes their consideration as long as the magnification/distortion considerations can be included sensibly into the merit function.
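The substitution for the astigmatism entry can be checked numerically. This sketch makes two assumptions: that centered astigmatism has the vector form $\frac{1}{2} W_{222} (\vec{H}^{2} \cdot \vec{ρ}^{2})$, and that vector multiplication is complex multiplication. Both are chosen to agree with the constant-astigmatism expression of Section 7.4. The check confirms that substituting $(\vec{H} - \vec{σ})$ produces three parts: the original quadratic term, a linear-astigmatism term $-W_{222} (\vec{H} \vec{σ}) \cdot \vec{ρ}^{2}$, and a constant-astigmatism term $\frac{1}{2} W_{222} (\vec{σ}^{2} \cdot \vec{ρ}^{2})$.

```python
import random

def vdot(a: complex, b: complex) -> float:
    """Scalar product of 2-D vectors stored as complex numbers."""
    return a.real * b.real + a.imag * b.imag

def decentered_astig(w222, H, sigma, rho):
    """Astigmatism of a surface whose aberration field is displaced by sigma."""
    return 0.5 * w222 * vdot((H - sigma) ** 2, rho ** 2)

def astig_terms(w222, H, sigma, rho):
    """Quadratic (centered), linear, and constant parts of the same expression."""
    quadratic = 0.5 * w222 * vdot(H ** 2, rho ** 2)
    linear = -w222 * vdot(H * sigma, rho ** 2)
    constant = 0.5 * w222 * vdot(sigma ** 2, rho ** 2)
    return quadratic, linear, constant

# Compare the two forms at random (hypothetical) field, decenter, and pupil vectors.
rng = random.Random(1)
for _ in range(5):
    H = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    sigma = complex(rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1))
    rho = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    total = sum(astig_terms(0.25, H, sigma, rho))
    assert abs(total - decentered_astig(0.25, H, sigma, rho)) < 1e-12
print("(H - sigma)^2 expansion verified")
```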

NAT has not included effects from lateral color, but since the field dependence of lateral color is identical to that of coma, their decentered effects will be identical other than depending on a lateral color coefficient $W_{111 λ}$ rather than the coma coefficient $W_{131}$ [19].

For simplicity, we are not including aspheric terms, but nothing in this theory precludes them: they are merely more Seidel aberration contributions with different $\vec{σ}$'s, as discussed in [22]. Aspheric terms will be discussed in Section 11.

6. Double Zernike polynomials

Zernike polynomials are well-known as an orthogonal basis set for describing any continuous, well-behaved practical wavefront over a circular aperture [23–25]:

W (ρ, θ) = \sum_{m, n} A_{n m} Z_{n m} (ρ, θ)

where the $Z_{nm}$'s are Zernike polynomials defined for a generic polar variable $\vec{r} = (r, α)$ as

\begin{array}{l} Z_{nm}(\vec{r}) = Z_{nm}(r, α) \equiv \begin{cases} R_{n|m|}(r) \cos(mα), & m > 0 \\ R_{n|m|}(r) \sin(|m| α), & m < 0 \\ R_{n0}(r), & m = 0 \end{cases} \\ \text{and} \quad R_{nm}(r) \equiv \sum_{s=0}^{(n-m)/2} (-1)^{s} \frac{(n-s)!}{s! \left(\frac{n+m}{2} - s\right)! \left(\frac{n-m}{2} - s\right)!} r^{n-2s}, \quad m \geq 0 \end{array}

Note that the first index (n) is the highest power of the radial coordinate and the second index (m) is the multiple of the angular coordinate used in the sine or cosine function. For readability, if m is negative, we place the negative sign in front of n (e.g., $Z_{-31}$ denotes n = 3, m = −1).
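The definition above translates directly into code. The following sketch implements the radial factorial sum and the Fringe (zero-to-peak) normalization, using the sign convention for m stated above. The function names are hypothetical.

```python
import math

def radial(n: int, m: int, r: float) -> float:
    """Zernike radial polynomial R_{n|m|}(r) from the factorial sum above."""
    m = abs(m)
    if (n - m) % 2:
        return 0.0  # R is defined only for n - |m| even
    return sum(
        (-1) ** s * math.factorial(n - s)
        / (math.factorial(s)
           * math.factorial((n + m) // 2 - s)
           * math.factorial((n - m) // 2 - s))
        * r ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )

def zernike(n: int, m: int, r: float, alpha: float) -> float:
    """Fringe-normalized Z_nm(r, alpha): cosine for m > 0, sine for m < 0."""
    if m > 0:
        return radial(n, m, r) * math.cos(m * alpha)
    if m < 0:
        return radial(n, m, r) * math.sin(-m * alpha)
    return radial(n, 0, r)

print(zernike(3, 1, 0.5, 0.0))  # -0.625, i.e. 3 r^3 - 2 r at r = 0.5
```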

While Eq. (5) is expressed in terms of pupil coordinates, Zernike polynomials may also be used with field coordinates (or any other circular domain), as we shall see. The Zernike polynomials are orthogonal and can be normalized in one of two common ways: either so that the zero-to-peak amplitude is unity (Fringe Zernike polynomials, for which we use the symbol $Z_{nm}$) or so that the rms over the circular domain is unity (Standard Zernike polynomials, for which we use the symbol $\hat{Z}_{nm}$). Conversion between normalization schemes is given in Appendix 2.

The rms WFE is found using the coefficients of the Standard Zernikes:

W_{rms}^{2} = \sum_{m, n} A_{n m}^{2}

Kwee and Braat [10] introduced the notion of double Zernike (DZ) polynomials. Whereas Zernike polynomials are orthogonal over a circular pupil, DZ polynomials are orthogonal over the dual domain of a circular pupil and a circular field. The DZ polynomials are formed as a product of a Zernike polynomial in pupil multiplied by a Zernike polynomial in field:

Z_{k l, n m} (\vec{H}, \vec{ρ}) \equiv Z_{k l} (\vec{H}) Z_{n m} (\vec{ρ}) = Z_{k l} (H, ϕ) Z_{n m} (ρ, θ)

See Fig. 1 for definitions of variables. Note that the order of field/pupil indices is the same as Hopkins/Shack, namely, with field indices first. A similar equation can be written using standard Zernikes. We use a notation different from Kwee, because we find Kwee’s notation awkward. A table of DZ’s in this new notation is given in Appendix 3.

While fields are not necessarily circular, the underlying aberration fields of optical systems are often naturally expressed in these terms. As needed, the DZ polynomials can be related to an orthogonal basis over differently-shaped pupils and fields by appropriate linear combinations [26].

The system wavefront function, similar to the expression for a pupil, is a linear combination of (standard) DZ’s:

W (\vec{H}, \vec{ρ}) = \sum_{k l m n} A_{k l, n m} {\hat{Z}}_{k l, n m} (\vec{H}, \vec{ρ})

The DZ’s include not only the centered aberrations, but also the decentered system aberrations seen in NAT. For example, $Z_{00, 31}$ is a constant coma term, and $Z_{11, 22}$ is a linear astigmatism term.

Similar to Eq. (7), the magnitude of the DZ vector of a system yields the rms wavefront error of the optical system, rss’d over the field:

W_{sys}^{2} = \sum_{k l m n} A_{k l, n m}^{2}

This is exactly the square of the system wavefront performance defined in Eq. (2), i.e., its variance.

Note the advantage of DZ’s with respect to Seidel terms. To optimize performance, Seidel aberrations are often partially compensated by other terms. For example, spherical aberration and astigmatism are partially balanced by defocus. In other words, Seidel aberrations are not orthogonal and so are more difficult to work with. Note also that the standard Seidel aberrations are linear combinations of the low-order pupil Zernike terms (i.e., there is not a one-to-one correspondence between Seidel and Zernike terms).

7. Expressing decentered system aberrations in terms of DZ’s

Using equations from NAT and the Standard DZ definitions from the previous two sections, we can express system aberrations due to gut ray perturbation at a single surface j in terms of $\vec{σ_{j}}$. Manuel [8] and Tessieres [5] perform a similar calculation. The square root values below follow from the normalization of the Standard DZ’s as seen in Appendices 2 and 3.
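Under the same assumed Noll-type normalization, a Standard DZ is the product of a unit-rms field Zernike and a unit-rms pupil Zernike. The factor that converts a Fringe DZ coefficient into a Standard one is therefore the product of the two single-domain factors. The sketch below reproduces several of the square-root factors quoted in this section from the Fringe coefficients read off the vector expressions. For example, $ρ^{3} \cos θ$ contains one third of the Fringe $Z_{31}$, and the remainder is tilt. The helper names are hypothetical.

```python
import math

def noll_norm(n: int, m: int) -> float:
    """Factor that scales a Fringe Zernike Z_nm to unit rms on the unit disk."""
    return math.sqrt(n + 1) if m == 0 else math.sqrt(2 * (n + 1))

def dz_norm(field_nm, pupil_nm):
    """Product of field and pupil factors: Zhat_{kl,nm} = dz_norm * Z_{kl,nm}."""
    return noll_norm(*field_nm) * noll_norm(*pupil_nm)

def standard_coeff(fringe_coeff, field_nm, pupil_nm):
    """Convert a Fringe DZ coefficient to the matching Standard DZ coefficient."""
    return fringe_coeff / dz_norm(field_nm, pupil_nm)

# (Fringe coefficient per unit W and sigma, field indices, pupil indices, quoted factor)
checks = {
    "constant coma, Z_{00,31}": (1 / 3, (0, 0), (3, 1), 1 / math.sqrt(72)),
    "linear astigmatism, Z_{11,22}": (1.0, (1, 1), (2, 2), 1 / math.sqrt(24)),
    "constant astigmatism, Z_{00,22}": (0.5, (0, 0), (2, 2), 1 / math.sqrt(24)),
    "defocus, Z_{00,20}": (0.5, (0, 0), (2, 0), 1 / math.sqrt(12)),
}
for name, (c, f_nm, p_nm, quoted) in checks.items():
    print(f"{name}: computed {standard_coeff(c, f_nm, p_nm):.6f}, quoted {quoted:.6f}")
```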

7.1. Spherical aberration

Since spherical aberration does not depend on field, there are no additional terms created by decentering a surface with spherical aberration.

7.2. Constant coma

We use the expression for constant coma listed in Table 1, and substitute $\vec{σ_{j}} = (σ_{x j}, σ_{y j})$

\begin{matrix} W = W_{131j} (\vec{ρ} \cdot \vec{ρ})(\vec{ρ} \cdot \vec{σ_{j}}) = W_{131j} (σ_{xj} ρ^{3} \cos θ + σ_{yj} ρ^{3} \sin θ) \\ = \frac{W_{131j} σ_{xj}}{\sqrt{72}} \hat{Z}_{00,31} + \frac{W_{131j} σ_{yj}}{\sqrt{72}} \hat{Z}_{00,-31} + \text{tilt terms} \end{matrix}

7.3. Linear astigmatism

In an analogous way, we use the expression for linear astigmatism listed in Table 1, and substitute $\vec{σ_{j}} = (σ_{x j}, σ_{y j})$ .

W = -W_{222j} (\vec{H} \vec{σ_{j}}) \cdot \vec{ρ}^{2}

Using properties in Appendix 1,

\begin{matrix} (\vec{H} \vec{σ_{j}}) \cdot \vec{ρ}^{2} = (H_{x} σ_{xj} - H_{y} σ_{yj}, \; H_{y} σ_{xj} + H_{x} σ_{yj}) \cdot (ρ^{2} \cos 2θ, \; ρ^{2} \sin 2θ) \\ = H ρ^{2} (σ_{xj} \cos 2θ \cos ϕ - σ_{yj} \cos 2θ \sin ϕ + σ_{xj} \sin 2θ \sin ϕ + σ_{yj} \sin 2θ \cos ϕ) \end{matrix}

Therefore,

W = -\frac{W_{222j}}{\sqrt{24}} (σ_{xj} \hat{Z}_{11,22} - σ_{yj} \hat{Z}_{-11,22} + σ_{xj} \hat{Z}_{-11,-22} + σ_{yj} \hat{Z}_{11,-22})

7.4. Constant astigmatism

Using the same approach,

\begin{matrix} W = \frac{1}{2} W_{222j} (\vec{σ_{j}}^{2} \cdot \vec{ρ}^{2}) \\ = \frac{1}{2} W_{222j} (σ_{xj}^{2} - σ_{yj}^{2}, \; 2 σ_{xj} σ_{yj}) \cdot (ρ^{2} \cos 2θ, \; ρ^{2} \sin 2θ) \\ = \frac{1}{2} W_{222j} [(σ_{xj}^{2} - σ_{yj}^{2}) ρ^{2} \cos 2θ + 2 σ_{xj} σ_{yj} ρ^{2} \sin 2θ] \\ = W_{222j} \frac{σ_{xj}^{2} - σ_{yj}^{2}}{\sqrt{24}} \hat{Z}_{00,22} + W_{222j} \frac{σ_{xj} σ_{yj}}{\sqrt{6}} \hat{Z}_{00,-22} \end{matrix}

7.5 Field tilt

\begin{matrix} W = -2 W_{222j} (\vec{H} \cdot \vec{σ_{j}})(\vec{ρ} \cdot \vec{ρ}) \\ = -W_{222j} [σ_{xj} H \cos ϕ + σ_{yj} H \sin ϕ] [(2ρ^{2} - 1) + 1] \\ = -\frac{W_{222j} σ_{xj}}{\sqrt{12}} \hat{Z}_{11,20} - \frac{W_{222j} σ_{yj}}{\sqrt{12}} \hat{Z}_{-11,20} + \text{piston terms} \end{matrix}

7.6 Defocus

\begin{matrix} W = W_{222j} (\vec{σ_{j}} \cdot \vec{σ_{j}})(\vec{ρ} \cdot \vec{ρ}) = \frac{1}{2} W_{222j} (σ_{xj}^{2} + σ_{yj}^{2}) [(2ρ^{2} - 1) + 1] \\ = \frac{W_{222j} (σ_{xj}^{2} + σ_{yj}^{2})}{\sqrt{12}} \hat{Z}_{00,20} + \text{piston terms} \end{matrix}

7.7. Constant lateral color

Since the field dependence of lateral color is identical to that of coma, the expression for constant lateral color is developed similarly to that for constant coma. The formula for the lateral color coefficient, $W_{111λj}$, is found in Welford [19]. Since $W_{111λj}$ is a peak-to-valley quantity and, to first order, depends linearly on wavelength across the band, a factor of $1/\sqrt{12}$ is needed to convert it to rms. This is the same factor that relates the full width of a uniform distribution to its standard deviation.

\begin{matrix} W = W_{111λj} (\vec{ρ} \cdot \vec{σ_{j}}) = W_{111λj} (σ_{xj} ρ \cos θ + σ_{yj} ρ \sin θ) \\ = \frac{W_{111λj} σ_{xj}}{\sqrt{48}} \hat{Z}_{00,11} + \frac{W_{111λj} σ_{yj}}{\sqrt{48}} \hat{Z}_{00,-11} \end{matrix}

where

\begin{array}{l} W_{111λj} = y (n \bar{i}) \left( \frac{Δn'}{n'} - \frac{Δn}{n} \right) \\ y = \text{marginal ray height at surface } j \\ n, n' = \text{refractive index in optical space before/after surface } j \\ Δn, Δn' = \text{change of index due to dispersion in optical space before/after surface } j \\ \bar{i} = \text{angle of incidence of chief ray at surface } j \end{array}

These terms can be rss’d with the nominal design to obtain the performance with perturbations.
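The sketch below illustrates how the per-surface terms combine. It sums the constant-coma terms of Section 7.2 over several surfaces and adds them to a nominal DZ vector. A rotationally symmetric design has no field-constant coma, so its nominal DZ vector has no $\hat{Z}_{00,\pm 31}$ component. Adding these terms to it is therefore equivalent to rss'ing them with the nominal performance. All coefficient values and dictionary keys are hypothetical.

```python
import math

def constant_coma_dz(w131, sigmas):
    """Standard DZ coefficients A_{00,31} and A_{00,-31} (Section 7.2), summed
    over surfaces j with coma coefficients w131[j] and aberration-field
    displacements sigmas[j] = (sigma_xj, sigma_yj)."""
    ax = sum(w * sx for w, (sx, _) in zip(w131, sigmas)) / math.sqrt(72)
    ay = sum(w * sy for w, (_, sy) in zip(w131, sigmas)) / math.sqrt(72)
    return {"00,31": ax, "00,-31": ay}

def w_sys_from_dz(*dz_sets):
    """Add DZ coefficient dictionaries term by term and return sqrt(sum A^2)."""
    total = {}
    for dz in dz_sets:
        for key, value in dz.items():
            total[key] = total.get(key, 0.0) + value
    return math.sqrt(sum(v ** 2 for v in total.values()))

# Hypothetical three-surface example (waves; sigma in normalized field units).
w131 = [0.5, -0.2, 0.1]
sigmas = [(0.02, 0.0), (0.01, 0.01), (0.0, 0.03)]
nominal = {"00,20": 0.03, "11,31": 0.02}  # made-up nominal Standard DZ's
perturbed = constant_coma_dz(w131, sigmas)
print(perturbed, w_sys_from_dz(nominal), w_sys_from_dz(nominal, perturbed))
```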

8. Calculating DZ’s due to tolerances

Schmid [27] observed that if a surface is misaligned, then the gut ray on downstream surfaces is perturbed, thereby generating decentered aberrations from surfaces which are otherwise aligned to the nominal optical axis. This section lays out an orderly approach for assessing the decentered aberrations generated from any number of misalignments due to tolerances.

The process for calculating DZ coefficients (“DZ’s” is used below for brevity) due to tolerances is given in Fig. 3 below. Each tolerance can be defined in terms of component surface decenters (Δx, Δy). Each surface decenter can perturb the gut ray for all surfaces—these are “effective decenters” (Δxʹ, Δyʹ) and a field-decentering parameter $\vec{σ_{j}}$ is generated at each affected surface j. Each $\vec{σ_{j}}$ generates a set of DZ effects. Using each surface’s $\vec{σ_{j}}$ , we can calculate the resulting DZ terms, all of which are summed together to produce the overall set of DZ’s. A tolerance set’s effect, T, in terms of DZ’s can be expressed with general operators:

Fig. 3 Process for calculating DZ's due to tolerances. n_surf is the number of surfaces in the lens; n_tols is the number of tolerances; n_DZ is the number of DZ terms. The factor of 2 arises because there are perturbations in 2 axes (x,y)


T = M_{3} [M_{2} (M_{1} (Δx_{1}, Δy_{1}, Δx_{2}, Δy_{2}, \ldots))]

In general, $M_{2}$ and $M_{3}$ depend nonlinearly on $\vec{σ}$ , but if we neglect higher orders of $\vec{σ}$ , we can express the operators in terms of matrices and use linear algebra. This approximation is often justified because tolerances tend to be small perturbations so that usually $σ ≪ 1$ .

Working at linear order in σ, the effect of a tolerance can be found by a product of three matrices as follows.

T = M_{3} M_{2} M_{1}

The matrices can be laid out with the following schematic, where $n_{s} =$ number of surfaces and $n_{D Z} =$ number of double Zernike terms. Dimensions of the matrices $M_{1}, M_{2}, M_{3}$ are given in Fig. 3. The dimensions of $T$ are $n_{DZ} \times n_{tols}$ . Details of the construction of these matrices follow.

\begin{array}{llll} \text{Matrix} & \text{Rows} & \text{Columns} & \text{Size} \\ M_{1} & Δx_{1}, Δy_{1}, \ldots, Δx_{n_{s}}, Δy_{n_{s}} & tol_{1}, \ldots, tol_{n} & 2n_{s} \times n_{tols} \\ M_{2} & Δx'_{1}, Δy'_{1}, \ldots, Δx'_{n_{s}}, Δy'_{n_{s}} & Δx_{1}, Δy_{1}, \ldots, Δx_{n_{s}}, Δy_{n_{s}} & 2n_{s} \times 2n_{s} \\ M_{3} & A_{1}, A_{2}, \ldots, A_{n_{DZ}} & Δx'_{1}, Δy'_{1}, \ldots, Δx'_{n_{s}}, Δy'_{n_{s}} & n_{DZ} \times 2n_{s} \\ T = M_{3} M_{2} M_{1} & A_{1}, A_{2}, \ldots, A_{n_{DZ}} & tol_{1}, \ldots, tol_{n} & n_{DZ} \times n_{tols} \end{array}

A tolerance can be considered as a linear combination of decenters in x and y of various surfaces. For example, a 100 micron decenter of a lens is composed of a 100 micron decenter of each of the two surfaces of the lens. Tilts of lenses and decenters of lens groups can likewise be composed of surface decenters. The tolerance definition matrix, $M_{1}$, captures these tolerances. The tolerance expressed by the Δx’s and Δy’s in each column is called a unity-value tolerance, to be used later.

Matrix $M_{2}$ captures the effective decenters that result from a unit perturbation of each surface. To find the elements of $M_{2}$, each surface is perturbed one at a time, and a gut ray is traced. Depending on the location of the aperture stop and which surface is perturbed, a given surface may see the gut ray perturbed by an amount (Δxʹ, Δyʹ) as shown in Fig. 2; in general, each surface sees a different gut ray perturbation. The values in $M_{2}$ in that perturbed surface’s column are the effective decenters for each surface, i.e., the displacements of the gut ray at each subsequent surface’s center of curvature. The perturbation is applied with a unit value, say 1 mm. A different unit can be chosen as long as all the matrices use the same unit value.

The $M_{3}$ matrix takes a unit gut ray perturbation at a surface and finds the resulting DZ coefficients. This is generally what NAT computes but we convert Thompson decentered aberration expressions to DZ’s as shown in Section 7. Multiplying the three matrices $(M_{1}, M_{2}, M_{3})$ together yields the effect of an array of tolerances, as expressed in Eq. (21). Each column expresses the DZ effects from the corresponding tolerance.
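The three-matrix bookkeeping can be sketched with plain nested lists. The entries are hypothetical placeholders for a two-surface lens rather than values from a real design. In practice, $M_{2}$ comes from gut-ray traces and $M_{3}$ from the expressions of Section 7.

```python
def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical two-surface lens; decenter rows/columns ordered (dx1, dy1, dx2, dy2).
# M1: tolerance definitions (unit lens decenter in x, unit lens decenter in y).
M1 = [[1, 0],
      [0, 1],
      [1, 0],
      [0, 1]]
# M2: effective decenters. Each surface sees its own decenter, and a decenter of
# surface 1 is assumed to shift the gut ray at surface 2 by half as much.
M2 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.5, 0.0, 1.0, 0.0],
      [0.0, 0.5, 0.0, 1.0]]
# M3: DZ coefficients (A_{00,31}, A_{00,-31}) per unit effective decenter,
# standing in for W131_j * (sigma per unit decenter) / sqrt(72) (made-up values).
M3 = [[0.02, 0.0, -0.01, 0.0],
      [0.0, 0.02, 0.0, -0.01]]

T = matmul(M3, matmul(M2, M1))  # n_DZ x n_tols
for row in T:
    print(row)
```

Each column of T is the DZ vector of one unity-value tolerance.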

Note that while we discuss decenters of surfaces exclusively, nothing in this formulation excludes considering centered tolerances such as lens spacing, center thickness, and radii of curvature tolerances—we need only include the centered tolerances into the tolerance matrix, $M_{1}$ , and their effects into the DZ matrix, $M_{3}$ . The effect of these centered tolerances on other DZ terms such as spherical aberration and field curvature are easily calculated from the expressions for Seidel aberrations.

9. Compensation

As mentioned previously, no theory has dealt with predicting the effects of compensation. We introduce a new theory of compensation here. We have seen that we can represent the effect of a tolerance as a vector of double Zernike coefficients (“a DZ vector”) that is added to the DZ vector for the nominal system. Since a compensator has the same form as a tolerance (e.g., a decenter of an optic, or axial spacing adjustment), it too can be represented as a DZ vector. The only difference is that the compensator DZ vector is intentionally applied by the builder and its magnitude and direction can be chosen to optimize performance. We can construct a compensator matrix in the same way as we defined the tolerance matrix in the previous section (see Eq. (22)). We define a new matrix $M'_{1}$ which defines the unit compensators. The dimensions of $M'_{1}$ are $2n_{s} \times n_{comps}$, where $n_{comps}$ is the number of compensators, so $\hat{C}$ is $n_{DZ} \times n_{comps}$. $M_{2}$ and $M_{3}$ are unchanged since the effects of compensators are identical to the effects of tolerances. The compensator matrix is given by

\begin{array}{l} \hat{C} = M_{3} M_{2} M'_{1} \\ M'_{1}: \; \text{rows } Δx_{1}, Δy_{1}, \ldots, Δx_{n_{s}}, Δy_{n_{s}}; \; \text{columns } comp_{1}, \ldots, comp_{n_{comps}} \end{array}

The “hat” on $\hat{C}$ indicates that this matrix applies to unit compensator motions.

Though not necessary, it is convenient to think of a tolerance’s DZ vector as being a vector in a multidimensional space where each axis corresponds to a DZ term. The axes are mutually orthogonal since the DZ’s are orthogonal. For illustration, we consider just two such axes, constant coma ($Z_{00,31}$) and linear astigmatism ($Z_{11,22}$). A tolerance with, say, 1 wave of constant coma and 2 waves of linear astigmatism can be represented as a vector, like the red vector in the left panel of Fig. 4.

Fig. 4 (left) representation of a tolerance's and compensator's DZ vector. A two-dimensional DZ space is shown for simplicity. The A quantities are the DZ coefficients. (center) Two notional unit compensators, generally non-orthogonal to each other. (right) Same two compensators after a Gram-Schmidt process to orthogonalize them.


As described above, a compensator is simply another DZ vector (green in the left panel of Fig. 4). To use the compensator to remove as much of the tolerance’s DZ vector as possible, we project the tolerance DZ vector onto the direction of the compensator DZ vector and subtract that projection from the tolerance vector. The projection uses the dot product between the tolerance vector and the compensator DZ vector normalized to unit length in DZ space. Without this normalization, the expression below is not a true projection. The units of the projection are those of the DZ’s, i.e., system wavefront performance.

Note that, in general, the compensators will span neither DZ space nor the space spanned by the tolerances, so the correction will generally be imperfect and leave a residual. This residual is the wavefront degradation of the system due to the tolerance after compensation.

In equation form:

\vec{R} = \vec{T} - \hat{C} (\hat{C} \cdot \vec{T})

The amount of compensation applied, expressed in DZ units, is $\hat{C} \cdot \vec{T}$.
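The single-compensator step is a vector projection. The sketch below reproduces the two-axis picture of Fig. 4 with made-up numbers. The tolerance vector follows the 1-wave coma, 2-wave astigmatism example in the text; the compensator vector is hypothetical. The compensator is normalized to unit length in DZ space before projecting.

```python
import numpy as np

def compensate_single(T, c):
    """Remove the part of tolerance DZ vector T that compensator DZ vector c can correct.

    c is normalized to unit length in DZ space so that c_hat * (c_hat @ T) is an
    orthogonal projection; `amount` is the compensation applied, in DZ (wavefront) units.
    """
    c_hat = c / np.linalg.norm(c)
    amount = c_hat @ T          # C_hat . T
    R = T - c_hat * amount      # residual after compensation
    return R, amount

# Two-axis toy example in the spirit of Fig. 4: axes are (constant coma, linear astigmatism).
T = np.array([1.0, 2.0])   # tolerance: 1 wave constant coma, 2 waves linear astigmatism
c = np.array([3.0, 1.0])   # hypothetical compensator DZ vector
R, amount = compensate_single(T, c)
print(R, amount)           # R is approximately (-0.5, 1.5) and is orthogonal to c
```

To convert `amount` into a physical compensator motion, divide it by the DZ length of that compensator's unit-motion vector, `np.linalg.norm(c)`.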

9.1. Multiple compensators

Now suppose that one has more than one compensator, as shown in the middle panel of Fig. 4. How should the compensators be applied to remove as much of the magnitude of the tolerance T as possible? In general, the compensators $\hat{C_{1}}$ and $\hat{C_{2}}$ will not be orthogonal to each other. There is more than one way to tackle this problem. The most conceptually convenient approach is to form a new set of compensators, the $\hat{C^{'}}$’s, from the original set so that the new compensators are orthonormal in DZ space. This may be achieved with a Gram-Schmidt process [7], or with a modified Gram-Schmidt process [28] for greater numerical accuracy. The orthonormalized compensators can then be assembled into a matrix $C^{'}$, which has the same dimensions as $\hat{C}$ ($n_{DZ} \times n_{comps}$). Because the columns of $C^{'}$ are orthonormal, $C^{'} {C^{'}}^{T}$ projects onto the space spanned by the compensators, and the matrix form of Eq. (23) is

\begin{array}{l} R = T - C^{'} {C^{'}}^{T} T \\ R = \begin{array}{c c} & \begin{array}{ccc} {tol}_{1} & \cdots & {tol}_{n} \end{array} \\ \begin{array}{c} A_{1} \\ \vdots \\ A_{n_{DZ}} \end{array} & \left( \begin{array}{ccc} R_{1 1} & \cdots & R_{1 n} \\ \vdots & \ddots & \vdots \\ R_{n_{DZ} 1} & \cdots & R_{n_{DZ} n} \end{array} \right) \end{array} \end{array}
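The sketch below implements the multiple-compensator step, with hypothetical random DZ vectors in place of real compensator and tolerance matrices. The columns of $\hat{C}$ are orthonormalized with modified Gram-Schmidt and are assumed to be linearly independent. The residual matrix is $R = T - C^{'} {C^{'}}^{T} T$, with one column per unit tolerance. As a cross-check, an ordinary least-squares fit of the compensators to each tolerance column must leave the same residual.

```python
import numpy as np

def modified_gram_schmidt(C):
    """Orthonormalize the columns of C (n_dz x n_comps); columns assumed independent."""
    Q = np.array(C, dtype=float, copy=True)
    n_comps = Q.shape[1]
    for k in range(n_comps):
        Q[:, k] /= np.linalg.norm(Q[:, k])
        for j in range(k + 1, n_comps):
            # remove from later columns their component along the newly normalized column k
            Q[:, j] -= (Q[:, k] @ Q[:, j]) * Q[:, k]
    return Q

def residuals(T, C):
    """Residual DZ matrix R = T - C' C'^T T (one column per unit tolerance)."""
    Cp = modified_gram_schmidt(C)
    return T - Cp @ (Cp.T @ T)

rng = np.random.default_rng(1)
n_dz, n_comps, n_tols = 18, 3, 10
C = rng.standard_normal((n_dz, n_comps))   # hypothetical unit-compensator DZ vectors
T = rng.standard_normal((n_dz, n_tols))    # hypothetical unit-tolerance DZ vectors
R = residuals(T, C)

# Cross-check: least-squares compensation of each tolerance leaves the same residual.
coef, *_ = np.linalg.lstsq(C, T, rcond=None)
print(np.allclose(R, T - C @ coef))        # True
```

`np.linalg.qr(C)` returns an equivalent orthonormal basis (up to column signs) and gives the same residual. The explicit loop is shown only to mirror the Gram-Schmidt description.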

9.2. Calculation of as-built performance

Each column of Eq. (24) represents the residual DZ’s, after compensator correction, from the corresponding tolerance of unity value, i.e., the tolerance given in the corresponding column of $M_{1}$. As shown in Eq. (10), the residual system wavefront variance from the j-th unity-value tolerance equals the sum of the squares of the DZ’s in the j-th column of R: $\sum_{i} R_{i j}^{2}$. Since we have assumed linearity, the system wavefront variance degradation due to a particular tolerance realization value $τ_{j}$ is just

Δ W_{sys, j}^{2} (τ_{j}) = τ_{j}^{2} \sum_{i} R_{i j}^{2}

The expected value of the variance of the system wavefront degradation due to the range of possible realizations of the j-th tolerance is:

〈 Δ W_{sys, j}^{2} 〉 = (\int τ_{j}^{2} f (τ_{j}) d τ_{j}) \sum_{i} R_{i j}^{2}

where $f (τ_{j})$ is the probability density function of $τ_{j}$.

Suppose that the unity-value tolerance expressed in the j-th column of $M_{1}$ represents the standard deviation of a Gaussian distribution of the tolerance. It follows that $τ_{j}$ is distributed according to a standard Gaussian distribution, i.e., with zero mean and unit variance. Therefore $\int τ_{j}^{2} f (τ_{j}) d τ_{j} = 1$, and

〈 Δ W_{sys, j}^{2} 〉 = \sum_{i} R_{i j}^{2}

represents the expected system performance degradation variance for tolerance j, after compensation. This is similarly true for all tolerances. Since the tolerances are independent and zero-mean, we can find the expected variance of the residual system wavefront by adding their effects together in quadrature:

〈 Δ W_{sys}^{2} 〉 = \sum_{i j} R_{i j}^{2}
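The per-tolerance and total expected degradations reduce to column sums of the squared entries of R. In the sketch below, a hypothetical random residual matrix stands in for Eq. (24). It also includes a Monte Carlo check that a standard Gaussian $τ_{j}$ reproduces the per-tolerance result.

```python
import numpy as np

def expected_degradation(R):
    """Expected post-compensation wavefront variance, assuming each unit tolerance
    (column of R) is the standard deviation of an independent zero-mean Gaussian.

    Returns (per-tolerance variances, total variance)."""
    per_tol = np.sum(R**2, axis=0)   # <dW_j^2> = sum_i R_ij^2
    return per_tol, per_tol.sum()    # <dW^2>   = sum_ij R_ij^2

rng = np.random.default_rng(2)
R = rng.standard_normal((18, 4))     # hypothetical residual DZ matrix: 18 DZ terms, 4 tolerances
per_tol, total = expected_degradation(R)

# Monte Carlo check for tolerance j = 0: average tau_j^2 * sum_i R_i0^2 with tau_j ~ N(0, 1).
tau = rng.standard_normal(200_000)
mc = np.mean(tau**2) * np.sum(R[:, 0]**2)
print(per_tol[0], mc)                # agree to Monte Carlo precision
```

Reporting `per_tol` rather than only `total` is what allows the most sensitive tolerances to be singled out.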

It can be shown that the expected as-built performance can be described with the quadrature sum of the nominal design performance and Eq. (28):

〈 W_{sys}^{2} 〉 = W_{sys, nominal}^{2} + \sum_{i j} R_{i j}^{2}

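This is true even if the residual DZ’s are not orthogonal to the nominal DZ’s, as long as the tolerances are zero mean. The cross terms between the nominal and residual DZ’s are linear in the $τ_{j}$ and therefore average to zero.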
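The cancellation of the cross terms can be checked numerically. The sketch below uses a hypothetical nominal DZ vector that is deliberately not orthogonal to a hypothetical residual matrix, together with independent standard Gaussian tolerances.

```python
import numpy as np

rng = np.random.default_rng(3)
n_dz, n_tols, n_trials = 18, 5, 200_000

N = rng.standard_normal(n_dz)                  # hypothetical nominal DZ vector
R = rng.standard_normal((n_dz, n_tols))        # hypothetical residual DZ matrix (not orthogonal to N)
tau = rng.standard_normal((n_trials, n_tols))  # independent, zero-mean, unit-variance tolerances

W = N + tau @ R.T                              # as-built DZ vector of each trial
W2_mc = np.mean(np.sum(W**2, axis=1))          # Monte Carlo estimate of <W_sys^2>
W2_pred = N @ N + np.sum(R**2)                 # W_nominal^2 + sum_ij R_ij^2

print(W2_mc, W2_pred)                          # agree to Monte Carlo precision
```

The prediction is for the mean of $W_{sys}^{2}$. Its square root therefore corresponds to the rms of the trial values rather than to their mean.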

10. Example

As an example of this method, we take the common exemplar of the Cooke triplet. The starting point is a ±20° Cooke triplet design taken from the Zemax sample lens folder [29]. For this first optimization attempt, we reduce the field of view (FOV) from ±20° to ±14° to avoid higher-order aberrations. At ±14° FOV, the 6th-order aberrations are approximately 10% as large as the 4th-order aberrations, whose coefficients are roughly 10-50 waves for each surface. See Fig. 5 for a graphical display of the aberration coefficients.

Fig. 5 4th- and 6th-order aberration coefficients of each surface for the Cooke triplet starting point. Notation and formulas for 6th-order aberrations are from Sasián [21].


The first-order properties of this lens are: f/5, 50 mm focal length, monochromatic, ±14° field angle. We allow the six curvatures to be optimized while the lens positions and thicknesses are held fixed. We select and weight field points according to a Gaussian quadrature (GQ) scheme and build a merit function that calculates the rms wavefront error, rss’d over the field [30, 31].
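The sketch below shows one common form of a GQ sample for a circular field. It is assumed here for illustration only; the paper's own scheme follows [30, 31] and may differ. Radial nodes sit at Gauss-Legendre points in $H^{2}$ on [0, 1], which is uniform in field area, and azimuths are equally spaced. With these weights, the field-rss merit function is a weighted sum of per-field-point mean-square wavefront errors. The function names are hypothetical.

```python
import numpy as np

def field_gq(n_rad=3, n_az=6):
    """Hypothetical GQ sampling of a unit circular field (normalized field H <= 1).

    Radial nodes: Gauss-Legendre points in u = H**2 on [0, 1] (u is uniform in area).
    Azimuths: n_az equally spaced angles. The returned weights sum to 1."""
    x, w = np.polynomial.legendre.leggauss(n_rad)   # nodes and weights on [-1, 1]
    u, wu = 0.5 * (x + 1.0), 0.5 * w                # mapped to [0, 1]
    H = np.sqrt(u)
    phi = 2.0 * np.pi * np.arange(n_az) / n_az
    Hx = np.outer(H, np.cos(phi)).ravel()
    Hy = np.outer(H, np.sin(phi)).ravel()
    weights = np.repeat(wu, n_az) / n_az            # same ordering as the ravel above
    return Hx, Hy, weights

def field_rss(w_rms, weights):
    """Field-averaged rss of per-field-point rms wavefront errors."""
    return np.sqrt(np.sum(weights * np.asarray(w_rms) ** 2))

Hx, Hy, weights = field_gq()
# The area average of H^2 over the unit field is 1/2; the quadrature reproduces it exactly.
print(np.sum(weights * (Hx**2 + Hy**2)))
```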

In one case, we optimize in the conventional way, i.e., we run the damped least-squares optimizer to produce the best nominal performance. In the other case, we add to the merit function an operand that contributes the expected degradation due to tolerances. The operand value is the result of a macro that implements the method in this paper and reports in the same units as the merit function for the conventional optimization. As suggested by Eq. (29) above, the weight of the tolerancing operand equals the total weight of the nominal performance metric. The tolerances are 250-micron radial decenters (i.e., about 175 microns in each transverse axis) for each of the last five lens surfaces. The compensators are decenter of the first lens surface, focus, and image plane tilt. The tolerances follow a Gaussian distribution, and 250 microns represents the standard deviation of the tolerance. We rss the degradation with the nominal merit function to generate the expected as-built performance. The tolerance and compensator set are not meant to be practical (indeed, decentering a single surface of a lens is not a practical compensator). Rather, they illustrate the method while easing the initial code development task. The optimization process and merit function are exactly the same as for a conventional optimization, other than the presence of the macro that produces the degradation due to tolerances after compensation.

In addition to comparing as-built performance, we compare the speed of the method in this paper against Zemax’s built-in “TOLR” operand, which includes the brute-force tolerancing routine inside the optimization.

As expected, the nominal performance is worse when we design with tolerances considered than when we do not. However, the as-built performance after considering tolerances and compensators is better for the former case than for the latter. Importantly, the method has a considerable computational time advantage over Zemax’s built-in “TOLR” operand. The analytic macro written to estimate the effect of tolerances runs much more slowly than native Zemax code. Even so, it optimized the as-built performance in ~90 seconds on a 32-core, 2-GHz computer. In contrast, using the “TOLR” operand required 4-5 hours to reach a nearly identical as-built performance level. The difference in time is not important for this particular local optimization of a simple lens. However, the >100x speed difference would become very significant in a global optimization, and it would be larger still if the method were implemented in native code.

The results of the two approaches are given in Fig. 6 and Table 2. For simplicity, we report only the effects of all tolerances together; however, the theory and the macro can report the effect of each tolerance individually to highlight the sensitive tolerances.

Fig. 6 Layout and performance of Cooke triplets designed under two optimization scenarios. The upper left figure shows a lens optimized in the conventional way, using a merit function defined by the nominal performance of the lens. The upper right figure shows a lens optimized using a merit function consisting of the nominal performance rss’d with the expected tolerance degradation. The bottom figure shows the nominal performance (rms wavefront error vs. field angle) of both lenses.


Table 2. Merit function values for designs optimized conventionally and using tolerance information.


To compare the as-built performance, we run Monte Carlo simulations with realizations of the tolerance set and optimized compensators, and then compare merit function values. The merit function is configured to produce the rms wavefront error, rss’d over the field, using Gaussian quadrature selection of rays in the pupil and points in the field. Figure 7 shows the results of the MC analysis. The design created with tolerance and compensator information in the optimization loop resulted in system performance that is overall far superior to that of the conventionally-optimized design. Its performance is also well predicted by the analytic macro (Table 2 and arrows at bottom of Fig. 7). The conventionally-optimized design produced a small fraction of systems (~10%) that performed better, but the vast majority of the systems it produced performed worse. The variation of system performance is also much smaller for the tolerance-informed design than for the conventionally-designed case. In other words, its system performance is more consistent.

Fig. 7 Histogram (bars) and cumulative distribution function (lines) for Monte Carlo analysis of 10,000 trials of triplet with tolerances and compensators described above. Histogram bins are 0.01 waves wide. The blue graphics refer to the conventionally-optimized design, which does not use information about tolerances and compensators. The red graphics refer to the design optimized with tolerance and compensator information as described in this paper. The cutoff at the left side of both diagrams is at the nominal performance level.


11. Extensions

We have developed a technique for optimizing the as-built performance of an optical system under a set of assumptions: aberrations higher than 4th order are neglected; only spherical surfaces are considered; the merit function is the rms wavefront error rss’d across the field; and induced aberrations are neglected. Having developed the technique, we now show that these assumptions are not necessary. In general, an extension of the present approach requires three ingredients:
(1) formulas for the centered aberrations (e.g., $W_{131}, W_{222}$) up to the desired order;
(2) NAT (or equivalent) that includes the contemplated extension, including formulas to calculate the decentered aberrations; and
(3) an orthonormal basis set that can express the system aberrations (for this paper, the DZ’s) and whose coefficients can be rss’d into a single system performance metric.

11.1. Higher-order aberrations

We developed the theory above using 4th-order centered and decentered aberrations. In particular, we used 4th-order NAT to develop the DZ coefficients in Section 7. Intrinsic (as opposed to induced/extrinsic) higher-order aberrations can be handled the same way, and the extension to 6th-order aberrations can build on existing work. The DZ polynomial set includes arbitrarily high-order terms and so does not impose a limitation. Formulas for 6th-order aberrations under the Hopkins formulation have been developed [21], as has 6th-order NAT [32–34]. All three necessary ingredients therefore exist. We can include arbitrarily high orders in this approach as long as we can find the required formulas for aberration contributions and generate the required NAT.

Sixth-order NAT will generate several more decentered aberrations that are linear in $σ$ , as well as others that are higher-order in $σ$ . Again, as long as $σ ≪ 1$ , we may be able to neglect the high-order terms in order to allow us to use matrices; otherwise, we will need to resort to nonlinear methods.

11.2. Aspheric surfaces

We have developed the present theory using decenters of spherical surfaces. For aspheric surfaces, we need to consider tilts as well as decenters. We accomplish this by expanding the matrices in Section 8 to include tilts for aspheric surfaces. Formulas for 4th- and 6th-order aberration coefficients of aspheric surfaces exist [19, 21], as does NAT for aspheric terms [14, 32–34]. The DZ expressions in Section 7 would need to change only slightly to include the aspheric coefficients.

11.3. rms spot size or other image quality metrics

We now show that we could have used rms spot size rather than rms wavefront error as our metric for a single field point’s image quality. In Section 6, we found the rms wavefront error by using an orthonormal basis set (Standard Zernike polynomials) to describe the wavefront. The rms wavefront error was then the rss of the standard Zernike coefficients (Eq. (7)). Zhao and Burge have developed an orthonormal basis set for vector polynomials on the unit circle [35, 36]; these are derived from the gradients of the standard Zernike polynomials. Using these vector Zernikes, we can express the ray error function for a given field point:

\vec{ε} (ρ, θ) = \sum_{n, m} A_{n m} {\hat{\vec{S}}}_{n m} (ρ, θ)

where $\vec{ε} (ρ, θ)$ is the vector ray error at the image plane for the ray with pupil coordinates $(ρ, θ)$, and the ${\hat{\vec{S}}}_{n m} (ρ, θ)$ are the orthonormal vector polynomials defined by Zhao and Burge [35, 36].

We can then create a new “DZ” polynomial which again is the product of the orthonormal terms in pupil and field (cf. Equation (8)):

{\hat{\vec{S}}}_{k l, n m} (\vec{H}, \vec{ρ}) \equiv Z_{k l} (\vec{H}) {\hat{\vec{S}}}_{n m} (\vec{ρ}) = Z_{k l} (H, ϕ) {\hat{\vec{S}}}_{n m} (ρ, θ)

The system ray error can be described as

\vec{ε} (\vec{H}, \vec{ρ}) = \sum A_{k l, n m} {\hat{\vec{S}}}_{k l, n m} (\vec{H}, \vec{ρ})

Here $\vec{ε}$ serves the same function as $W$ in the previous development, and system performance can be expressed as the quadrature sum of the $A_{k l, n m}$’s. The only step remaining is to express the decentered aberrations in terms of the vector polynomials $\hat{\vec{S}}$. The rest of the development is the same.

11.4. Induced aberrations

Induced aberrations are aberration fields generated by pupil aberrations, which are neglected in 4th-order optical design. The entrance pupil for each surface is, in general, shifted transversely and/or distorted as a function of field angle. Transverse pupil shifts cause additional Seidel-like aberrations, but with different field dependence. Pupil distortions cause higher-order aberrations [20, 21]. For example, a system with 4th-order spherical aberration and 4th-order pupil distortion will generate 6th-order spherical aberration components, which can be expressed with 6th-order DZ’s. Other aberrations generated by pupil aberrations can also be represented by higher-order DZ’s. The induced aberrations are simply added to the system wavefront function:

W_{sys} (\vec{H}, \vec{ρ}) = W_{intrinsic} (\vec{H}, \vec{ρ}) + W_{induced} (\vec{H}, \vec{ρ})

The pupil aberration coefficients can be calculated paraxially, just as the Seidel aberration coefficients are. Formulas exist up to 6th order for calculating the aberration field contributions including pupil aberrations [21]. No NAT-like theory including pupil aberrations exists; however, we can anticipate its form. The perturbations produce a field-displacement vector $\vec{σ}$, which generates decentered aberration fields. In the same way, we will also have a pupil-displacement vector (call it $\vec{ψ}$), which will also generate decentered aberration fields. We expect that most terms will be linear in $\vec{ψ}$. As we did when considering aspheric terms, we expand the matrices of Section 8 to include $\vec{ψ}$ terms. All other aspects are the same.

11.5. Different field-weighting dependencies

We had also assumed that our system metric was our image quality metric (e.g., rms wavefront error or rms spot size, as described above) rss’d over the field. We now show that we can define our system metric using any polynomial field-weighting, integrated over the field.

If we generalize our system metric to include field-weighting $w (\vec{H})$ , we get

W_{sys} = {(\frac{\int_{field} {[w (\vec{H}) W_{rms} (\vec{H})]}^{2} d \vec{H}}{\int_{field} d \vec{H}})}^{1 / 2}

The field-weighting function can be expressed with Zernike polynomials

w (\vec{H}) = \sum_{p q} a_{p q} {\hat{Z}}_{p q} (\vec{H}),

and using Eq. (8), the expression in brackets in Eq. (34) becomes:

\begin{aligned} [\,\cdot\,] &= \Big(\sum_{p q} a_{p q} {\hat{Z}}_{p q} (\vec{H})\Big) \Big(\sum_{k l m n} A_{k l, n m} {\hat{Z}}_{k l, n m} (\vec{H}, \vec{ρ})\Big) \\ &= \Big(\sum_{p q} a_{p q} {\hat{Z}}_{p q} (\vec{H})\Big) \Big(\sum_{k l m n} A_{k l, n m} {\hat{Z}}_{k l} (\vec{H}) {\hat{Z}}_{n m} (\vec{ρ})\Big) \end{aligned}

A product of Zernike polynomials is itself a polynomial and can therefore be expanded as a finite linear combination of Zernike polynomials. This expression can thus be put in terms of DZ’s, albeit with different coefficients, and we can still apply the approach of this paper to image quality metrics with field weighting.
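As a one-line check of this claim, take Fringe-normalized Zernikes with the convention, assumed here, $Z_{00} = 1$, $Z_{11} = ρ \cos θ$, $Z_{20} = 2ρ^{2} - 1$ and $Z_{22} = ρ^{2} \cos 2θ$. The square of the x-tilt term then expands exactly into three Zernike terms. The same algebra applies to products of Zernikes in $\vec{H}$.

Z_{11}^{2} = ρ^{2} \cos^{2} θ = \frac{ρ^{2}}{2} + \frac{ρ^{2} \cos 2θ}{2} = \frac{1}{4} Z_{00} + \frac{1}{4} Z_{20} + \frac{1}{2} Z_{22}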

We could also have an image quality metric formed from a function of $W_{rms}$:

W_{sys} = \frac{\int_{field} f (W_{rms} (\vec{H})) d \vec{H}}{\int_{field} d \vec{H}}

By the same argument, as long as $f(\cdot)$ is a polynomial, we can express $W_{sys}$ in terms of DZ’s, and the same approach can be applied.

11.6. Freeform surfaces

We addressed aspheric surfaces above, but this approach could also be applied to freeform surfaces. Fuerschbach et al. [37] have described a nodal aberration theory for systems with freeform surfaces, including tilts and decenters. Double Zernikes span the space of system wavefront functions and so could be used to describe performance degradations due to misalignments or other errors. We therefore have the ingredients necessary to develop our approach for systems with freeform surfaces.

12. Conclusions

We have demonstrated a method that predicts as-built performance considering tolerances and the action of compensators. The as-built performance can be used as the merit function so that designs are optimized to produce the best as-built performance. Computational speed is largely unaffected since only a handful of additional rays are required and all effects of tolerances and compensators are analytically computed, rather than resorting to iterative techniques. Since the method in this paper affects just the merit function definition, all optimization techniques (e.g., damped least squares, global optimization) still work the same way.

Appendix 1 Vector multiplication

Vector multiplication has been discussed extensively by Thompson [14]. However, the angle convention used here (x-axis as reference axis) differs from Thompson’s (y-axis as reference axis), so the vector multiplication properties also differ from Thompson’s. Manuel [8] discussed different angle conventions, including the convention used here. The properties of vector multiplication with generic vectors $\vec{A}$ and $\vec{B}$ under this convention are as follows.

\begin{matrix} \vec{A} = a_{x} \hat{i} + a_{y} \hat{j} = a \exp (i α) \\ \vec{B} = b_{x} \hat{i} + b_{y} \hat{j} = b \exp (i β) \\ \vec{A} \vec{B} = a b \exp (i (α + β)) = a b \cos (α + β) + i a b \sin (α + β) \\ {(\vec{A} \vec{B})}_{x} = a b \cos (α + β) \\ {(\vec{A} \vec{B})}_{y} = a b \sin (α + β) \\ \vec{A^{2}} = a^{2} \cos (2 α) + i a^{2} \sin (2 α) \\ {(\vec{A^{2}})}_{x} = a^{2} \cos (2 α) \\ {(\vec{A^{2}})}_{y} = a^{2} \sin (2 α) \end{matrix}
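Under this x-referenced convention, the vector product behaves exactly like multiplication of the complex numbers $a_{x} + i a_{y}$. The short sketch below uses Python's built-in complex type; the values of $a$, $α$, $b$ and $β$ are arbitrary.

```python
import cmath
import math

# 2-D vectors as complex numbers a_x + i a_y (x-axis is the angle reference).
a, alpha = 2.0, math.radians(30.0)
b, beta = 1.5, math.radians(45.0)
A = cmath.rect(a, alpha)    # a exp(i alpha)
B = cmath.rect(b, beta)     # b exp(i beta)

AB = A * B                  # vector product: ab exp(i (alpha + beta))
A2 = A * A                  # vector square: a^2 exp(2 i alpha)

print(AB.real, a * b * math.cos(alpha + beta))   # x-component of AB
print(AB.imag, a * b * math.sin(alpha + beta))   # y-component of AB
print(A2.real, a**2 * math.cos(2 * alpha))       # x-component of A^2
print(A2.imag, a**2 * math.sin(2 * alpha))       # y-component of A^2
```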

Appendix 2 Zernike and DZ normalization

Conversion between the Fringe and Standard Zernike normalization schemes is given by [24]

\begin{array}{l} {\hat{Z}}_{n m} (\vec{ρ}) = \begin{cases} {(n + 1)}^{1 / 2} Z_{n m} (\vec{ρ}), & m = 0 \\ 2^{1 / 2} {(n + 1)}^{1 / 2} Z_{n m} (\vec{ρ}), & m \neq 0 \end{cases} \\ Z_{n m} (\vec{ρ}) = \begin{cases} {(n + 1)}^{- 1 / 2} {\hat{Z}}_{n m} (\vec{ρ}), & m = 0 \\ 2^{- 1 / 2} {(n + 1)}^{- 1 / 2} {\hat{Z}}_{n m} (\vec{ρ}), & m \neq 0 \end{cases} \end{array}

Using the “Standard Zernike” normalization,

\frac{1}{π} \int_{|\vec{ρ}| \le 1} {\hat{Z}}_{a b} (\vec{ρ}) {\hat{Z}}_{c d} (\vec{ρ}) \, d^{2} \vec{ρ} = δ_{a c} δ_{b d}
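This normalization can be checked numerically. The sketch below is an illustration, not code from the paper. It integrates products of a few Standard Zernikes over the unit disk with the $1/π$ area normalization, which is the measure assumed here for unit-rms polynomials, and should return the identity matrix. The Standard forms are obtained from the Fringe polynomials $Z_{00} = 1$, $Z_{11} = ρ \cos θ$, $Z_{20} = 2ρ^{2} - 1$, $Z_{22} = ρ^{2} \cos 2θ$ and $Z_{40} = 6ρ^{4} - 6ρ^{2} + 1$, using the conversion factors above.

```python
import numpy as np

def disk_mean(f, n_rad=8, n_az=16):
    """(1/pi) times the integral of f(rho, theta) over the unit disk.

    Uses Gauss-Legendre nodes in u = rho**2 on [0, 1] and equally spaced azimuths,
    which is exact for the low-order Zernike products evaluated here."""
    x, w = np.polynomial.legendre.leggauss(n_rad)
    u, wu = 0.5 * (x + 1.0), 0.5 * w
    rho = np.sqrt(u)[:, None]
    theta = (2.0 * np.pi * np.arange(n_az) / n_az)[None, :]
    vals = np.broadcast_to(f(rho, theta), (n_rad, n_az))  # radial-only terms fill every azimuth
    return np.sum(wu[:, None] * vals) / n_az

# Standard (unit-rms) Zernikes built from the Fringe forms with the factors above.
Zhat = {
    "00": lambda r, t: np.ones_like(r * t),
    "11": lambda r, t: 2.0 * r * np.cos(t),                      # sqrt(2 * 2) * Z_11
    "20": lambda r, t: np.sqrt(3.0) * (2 * r**2 - 1),            # sqrt(3) * Z_20
    "22": lambda r, t: np.sqrt(6.0) * r**2 * np.cos(2 * t),      # sqrt(2 * 3) * Z_22
    "40": lambda r, t: np.sqrt(5.0) * (6 * r**4 - 6 * r**2 + 1), # sqrt(5) * Z_40
}
funcs = list(Zhat.values())
gram = np.array([[disk_mean(lambda r, t: f(r, t) * g(r, t)) for g in funcs] for f in funcs])
print(np.round(gram, 12))   # approximately the 5 x 5 identity matrix
```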

we get the equations to convert between Fringe and Standard-style DZ’s:

{\hat{Z}}_{k l, n m} (\vec{H}, \vec{ρ}) = \begin{cases} {[(n + 1) (k + 1)]}^{1 / 2} Z_{k l, n m}, & l = 0 \text{ and } m = 0 \\ {[2 (n + 1) (k + 1)]}^{1 / 2} Z_{k l, n m}, & \text{exactly one of } l, m \text{ is } 0 \\ {[4 (n + 1) (k + 1)]}^{1 / 2} Z_{k l, n m}, & l \neq 0 \text{ and } m \neq 0 \end{cases}

Z_{k l, n m} (\vec{H}, \vec{ρ}) = \begin{cases} {[(n + 1) (k + 1)]}^{- 1 / 2} {\hat{Z}}_{k l, n m}, & l = 0 \text{ and } m = 0 \\ {[2 (n + 1) (k + 1)]}^{- 1 / 2} {\hat{Z}}_{k l, n m}, & \text{exactly one of } l, m \text{ is } 0 \\ {[4 (n + 1) (k + 1)]}^{- 1 / 2} {\hat{Z}}_{k l, n m}, & l \neq 0 \text{ and } m \neq 0 \end{cases}
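The three cases collapse into a single rule. The squared factor is $(n + 1)(k + 1)$, multiplied by 2 for each nonzero azimuthal order among $l$ and $m$. The small helper below is hypothetical and for illustration only. It makes the rule explicit and reproduces the square-root factors listed for the normalized DZ terms in Table 3.

```python
import math

def dz_norm_factor(k, l, n, m):
    """Factor relating Fringe and Standard DZ's: Zhat_{kl,nm} = factor * Z_{kl,nm}.

    k, n are the radial orders (field, pupil); l, m the azimuthal orders (sign ignored)."""
    n_nonzero = (l != 0) + (m != 0)          # 0, 1, or 2 nonzero azimuthal orders
    return math.sqrt(2**n_nonzero * (n + 1) * (k + 1))

# Spot checks against Table 3: field shift sqrt(4), focus sqrt(3), linear coma sqrt(32),
# quadratic astigmatism sqrt(36), spherical aberration sqrt(5).
for (k, l, n, m), listed in [((0, 0, 1, 1), 4), ((0, 0, 2, 0), 3), ((1, 1, 3, 1), 32),
                             ((2, 2, 2, 2), 36), ((0, 0, 4, 0), 5)]:
    print((k, l, n, m), dz_norm_factor(k, l, n, m) ** 2, listed)
```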

Appendix 3 Terms

List of terms used (Table 3).

Table 3. DZ terms


Funding

LLNL Laboratory Directed Research and Development (14-SI-005); LLNL Center for Advanced Signal and Image Sciences (CASIS).

Acknowledgments

This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract no. DE-AC52-07NA27344. The authors would like to thank the anonymous reviewers for their comments and Anastacia Manuel for helpful discussions.

References and links

1. R. E. Fischer, B. Tadic-Galeb, and P. R. Yoder, Optical system design (McGraw-Hill, 2008).

2. J. R. Rogers, “Using Global Synthesis to find tolerance-insensitive design,” Proc. SPIE 6342, 63420M (2006).

3. J. Rogers, “Global optimization and desensitization,” Proc. SPIE 9633, 96330S (2015).

4. H. N. Chapman and D. W. Sweeney, “A rigorous method for compensation selection and alignment of microlithographic optical systems,” Proc. SPIE 3331, 102–113 (1998).

5. R. Tessieres, “Analysis for alignment of optical systems,” master’s thesis (The University of Arizona, 2003).

6. R. Tessieres and J. Burge, “Alignment strategy for the LSST” (LSST Corporation, 2004).

7. G. Strang, Linear algebra and its applications (Thomson, Brooks/Cole, 2006).

8. A. M. Manuel, “Field-Dependent Aberrations for Misaligned Reflective Optical Systems,” Ph.D. dissertation (The University of Arizona, 2003).

9. A. M. Manuel and J. H. Burge, “Alignment aberrations of the New Solar Telescope,” Proc. SPIE 7433, 74330A (2009).

10. I. W. Kwee and J. J. M. Braat, “Double Zernike expansion of the optical aberration function,” Pure Appl. Opt. 2(1), 21–32 (1993).

11. B. Bauman and M. Schneider, “Optical System Design and Method for Same,” U.S. patent application 15/860,609 (January 2, 2018).

12. R. V. Shack and K. Thompson, “Influence of alignment errors of a telescope system on its aberration field,” Proc. SPIE 251, 146–153 (1980).

13. K. P. Thompson, “Aberration fields in tilted and decentered optical systems,” Ph.D. dissertation (The University of Arizona, 1980).

14. K. Thompson, “Description of the third-order optical aberrations of near-circular pupil optical systems without symmetry,” J. Opt. Soc. Am. A 22(7), 1389–1401 (2005).

15. J. R. Rogers, “Origins and Fundamentals of Nodal Aberration Theory,” Proc. SPIE 10590, 105900R (2017).

16. R. A. Buchroeder, “Tilted component optical systems,” Ph.D. dissertation (The University of Arizona, 1976).

17. H. H. Hopkins, Wave theory of aberrations (Clarendon Press, 1950).

18. R. V. Shack, “Optical Sciences 506 Class Notes” (The University of Arizona, 1989).

19. W. T. Welford, Aberrations of optical systems (Hilger, 1986).

20. J. Hoffman, “Induced Aberrations in Optical Systems,” Ph.D. dissertation (The University of Arizona, 1993).

21. J. Sasián, “Theory of sixth-order wave aberrations,” Appl. Opt. 49(16), D69–D95 (2010).

22. K. P. Thompson, T. Schmid, O. Cakmakci, and J. P. Rolland, “Real-ray-based method for locating individual surface aberration field centers in imaging optical systems without rotational symmetry,” J. Opt. Soc. Am. A 26(6), 1503–1517 (2009).

23. D. Malacara, Optical shop testing (Wiley-Interscience, 2007).

24. R. J. Noll, “Zernike polynomials and atmospheric turbulence,” J. Opt. Soc. Am. 66(3), 207–211 (1976).

25. F. Zernike, “Diffraction theory of the knife-edge test and its improved form, the phase-contrast method,” Mon. Not. R. Astron. Soc. 94(1), 377–384 (1933).

26. V. N. Mahajan and G.-M. Dai, “Orthonormal polynomials in wavefront analysis: analytical solution,” J. Opt. Soc. Am. A 24(9), 2994–3016 (2007).

27. T. Schmid, “Misalignment Induced Nodal Aberration Fields and Their Use in the Alignment of Astronomical Telescopes,” Ph.D. dissertation (University of Central Florida, 2010).

28. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing (Cambridge University, 1992).

29. “Zemax Optical Design Software,” Zemax LLC.

30. G. W. Forbes, “Optical-system assessment for design – numerical ray tracing in the Gaussian pupil,” J. Opt. Soc. Am. A 5(11), 1943–1956 (1988).

31. B. J. Bauman and H. Xiao, “Gaussian quadrature for optical design with non-circular pupils and fields, and broad wavelength ranges,” Proc. SPIE 7652, 76522S (2010).

32. K. P. Thompson, “Multinodal fifth-order optical aberrations of optical systems without rotational symmetry: spherical aberration,” J. Opt. Soc. Am. A 26(5), 1090–1100 (2009).

33. K. P. Thompson, “Multinodal fifth-order optical aberrations of optical systems without rotational symmetry: the comatic aberrations,” J. Opt. Soc. Am. A 27(6), 1490–1504 (2010).

34. K. P. Thompson, “Multinodal fifth-order optical aberrations of optical systems without rotational symmetry: the astigmatic aberrations,” J. Opt. Soc. Am. A 28(5), 821–836 (2011).

35. C. Zhao and J. H. Burge, “Orthonormal vector polynomials in a unit circle, Part I: Basis set derived from gradients of Zernike polynomials,” Opt. Express 15(26), 18014–18024 (2007).

36. C. Zhao and J. H. Burge, “Orthonormal vector polynomials in a unit circle, Part II: Completing the basis set,” Opt. Express 16(9), 6586–6591 (2008).

37. K. Fuerschbach, J. P. Rolland, and K. P. Thompson, “Nodal Aberration Theory Applied to Freeform Surfaces,” Proc. SPIE 9293, 92931V (2014).

Generating Aberration	Coefficient	Scalar form of term	Vector form of centered aberration		Decentered aberration for jth surface	Decentered aberration name
Spherical aberration	$W_{040}$	$ρ^{4}$	${(\vec{ρ} \cdot \vec{ρ})}^{2}$		none
Coma	$W_{131}$	$ρ^{3} H \cos θ$	$(\vec{ρ} \cdot \vec{ρ}) (\vec{ρ} \cdot \vec{H})$		$W_{131 j} (\vec{ρ} \cdot \vec{ρ}) (\vec{ρ} \cdot \vec{σ_{j}})$	constant coma
Astigmatism	$W_{222}$	$ρ^{2} H^{2} \cos^{2} θ$	$(\vec{ρ^{2}} \cdot \vec{H^{2}})$		$- W_{222 j} (\vec{H} \vec{σ}) \cdot (\vec{σ_{j}} \cdot \vec{ρ^{2}})$	linear astigmatism
					$(W_{222 j} / 2) (\vec{ρ} \cdot \vec{ρ}) (\vec{σ_{j}} \cdot \vec{σ_{j}})$	constant astigmatism
					$- 2 W_{222 j} (\vec{H} \cdot \vec{σ_{j}}) (\vec{ρ} \cdot \vec{ρ})$	field tilt
					$W_{222 j} (\vec{σ_{j}} \cdot \vec{σ_{j}}) (\vec{ρ} \cdot \vec{ρ})$	defocus
Medial field curvature	$W_{220 M}$	$ρ^{2} H^{2}$	$(\vec{ρ} \cdot \vec{ρ}) (\vec{H} \cdot \vec{H})$		$- 2 W_{220 M j} (\vec{H} \cdot \vec{σ_{j}}) (\vec{ρ} \cdot \vec{ρ})$	field tilt
					$W_{220 M j} (\vec{σ_{j}} \cdot \vec{σ_{j}}) (\vec{ρ} \cdot \vec{ρ})$	defocus

	Conventional optimization (no consideration of tolerances/compensators)	Tolerance/compensator-informed optimization
Nominal design	0.178 λ	0.333 λ
As-built performance due to tolerances and compensators (one-factor-at-a-time analysis)	0.811 λ	0.558 λ
Prediction of as-built performance by analytic macro	0.801 λ	0.532 λ
Mean of 10,000 Monte Carlo trials	0.808 λ	0.502 λ
rms of 10,000 Monte Carlo trials	0.889 λ	0.518 λ

Aberration	Coefficient	Normalized DZ polynomial
Field shift in x	$A_{00, 11}$	${\hat{Z}}_{00, 11} = \sqrt{4} ρ \cos θ$
Field shift in y	$A_{00, - 11}$	${\hat{Z}}_{00, - 11} = \sqrt{4} ρ \sin θ$
Focus	$A_{00, 20}$	${\hat{Z}}_{00, 20} = \sqrt{3} (2 ρ^{2} - 1)$
Field tilt in x	$A_{11, 20}$	${\hat{Z}}_{11, 20} = \sqrt{12} (2 ρ^{2} - 1) H \cos ϕ$
Field tilt in y	$A_{- 11, 20}$	${\hat{Z}}_{- 11, 20} = \sqrt{12} (2 ρ^{2} - 1) H \sin ϕ$
Linear coma in x	$A_{11, 31}$	${\hat{Z}}_{11, 31} = \sqrt{32} ρ^{3} \cos θ H \cos ϕ$
Linear coma in y	$A_{- 11, - 31}$	${\hat{Z}}_{- 11, - 31} = \sqrt{32} ρ^{3} \sin θ H \sin ϕ$
Constant coma in x	$A_{00, 31}$	${\hat{Z}}_{00, 31} = \sqrt{8} ρ^{3} \cos θ$
Constant coma in y	$A_{00, - 31}$	${\hat{Z}}_{00, - 31} = \sqrt{8} ρ^{3} \sin θ$
Quadratic astigmatism in x	$A_{22, 22}$	${\hat{Z}}_{22, 22} = \sqrt{36} ρ^{2} \cos 2θ H^{2} \cos 2ϕ$
Quadratic astigmatism in y	$A_{22, - 22}$	${\hat{Z}}_{- 22, - 22} = \sqrt{36} ρ^{2} \sin 2θ H^{2} \sin 2ϕ$
Linear astigmatism (1)	$A_{11, 22}$	${\hat{Z}}_{11, 22} = \sqrt{24} ρ^{2} \cos 2θ H \cos ϕ$
Linear astigmatism (2)	$A_{11, - 22}$	${\hat{Z}}_{11, - 22} = \sqrt{24} ρ^{2} \sin 2θ H \cos ϕ$
Linear astigmatism (3)	$A_{- 11, 22}$	${\hat{Z}}_{- 11, 22} = \sqrt{24} ρ^{2} \cos 2θ H \sin ϕ$
Linear astigmatism (4)	$A_{- 11, - 22}$	${\hat{Z}}_{- 11, - 22} = \sqrt{24} ρ^{2} \sin 2θ H \sin ϕ$
Constant astigmatism in x	$A_{00, 22}$	${\hat{Z}}_{00, 22} = \sqrt{6} ρ^{2} \cos 2θ$
Constant astigmatism in y	$A_{00, - 22}$	${\hat{Z}}_{00, - 22} = \sqrt{6} ρ^{2} \sin 2θ$
Spherical aberration	$A_{00, 40}$	${\hat{Z}}_{00, 40} = \sqrt{5} (6 ρ^{4} - 6 ρ^{2} + 1)$


Optics Express