## Abstract

We propose simultaneous measurement and reconstruction tailoring (SMaRT) for quantitative phase imaging: a joint optimization approach to inverse problems wherein minimizing the expected end-to-end error yields optimal design parameters for both the measurement and reconstruction processes. Using simulated and experimentally collected data for a specific scenario, we demonstrate that optimizing the design of the two processes together reduces phase reconstruction error relative to past techniques that consider these two design problems separately. Our results suggest at times surprising design principles, and our approach can potentially inspire improved solution methods for other inverse problems in optics as well as the natural sciences.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Imaging the spatially-varying optical path length differences encountered by light as it passes through a specimen of interest, *i.e.*, quantitative phase imaging, has wide applicability in many scientific domains. For example, the spatial distribution of absorption in many biological specimens does not exhibit high contrast, unlike quantitative phase images of the same specimen. While interferometry and holography are classical solutions to phase imaging, there also exists a large family of self-interferometric techniques wherein the axial intensity evolution of a collimated beam after passing through a specimen of interest is used to numerically estimate the specimen’s optical path length profile. Members of this family include nonlinear phase retrieval techniques like Gerchberg–Saxton–Fienup approaches [1,2] or PhaseLift [3], techniques based on the transport of intensity equation (TIE) [4–8], and techniques based on contrast transfer functions (CTFs) [9, 10]. While TIE and CTF techniques usually assume coherent or nearly coherent illumination, they have been extended to the partially coherent regime as well. Partial coherence results in loss of high frequency information away from the focal plane [8,11]; loss of spatial coherence caused by area sources can be inverted via illumination diversity [12].

Like any other inverse problem, reconstruction quality depends on the design of both the measurement and reconstruction steps of the imaging method—what measurements do we take, and how do we then numerically estimate the phase? In standard TIE imaging, transverse intensity profiles at two closely spaced planes are used to estimate the axial derivative, from which a phase image is computed via regularized inversion of the Laplace operator. Recent papers [7, 8, 13–15] suggest measurements at multiple planes for better derivative estimates, whereas other threads [16,17] discuss alternatives to Tikhonov regularization for reconstruction.

The optimal reconstruction design naturally depends on the choice of measurements taken, but conversely the optimal measurement design also depends on the choice of reconstruction method—we want to avoid wasting resources, such as imaging time or light budget, to collect information that the reconstruction method will simply ignore. This principle suggests that it is suboptimal to design measurements to capture every last bit of information about the specimen and then design a reconstruction approach based on these measurements. Rather, it would be better to *jointly* optimize the design of the two. The seminal wavefront coding papers [18,19] as well as the ensuing computational photography field followed this general approach, wherein the captured images themselves may not be the “best” images of the object being photographed, but they yield higher quality images than standard photography after appropriate numerical postprocessing. Joint optimization was also leveraged explicitly via what is known as multi-domain optimization (MDO) in the design of many computational optical systems such as the origami lens [20,21]. Inspired by these approaches, we propose phase imaging via a joint optimization framework we call simultaneous measurement and reconstruction tailoring (SMaRT), which we now outline.

Our framework builds upon the same motivation as Wiener filtering and related minimum mean square error (MMSE) estimators, but we extend the classical approach by allowing the noise statistics (*e.g.*, per-image noise level) to vary as a consequence of measurement design. This leads to seeking an optimal measurement–reconstruction design *pair* that minimizes the expected reconstruction error. More specifically, we parameterize specimens by vectors **s** in some Hilbert space *𝒮*, and we are interested in imaging a class of specimens *𝒮*_{0} ⊆ *𝒮* with prior probability density function *f*_{𝒮0}(**s**). Given design spaces *ℳ*_{0} and *ℛ*_{0} for the measurement M and reconstruction R, respectively, we wish to solve a problem of the form

$$\underset{\mathrm{M}\in\mathcal{M}_{0},\;\mathrm{R}\in\mathcal{R}_{0}}{\operatorname{minimize}}\quad\mathbb{E}\left[{\left\|r_{\mathrm{R}}\left(p_{\mathrm{M}}(\mathbf{s})\right)-\mathbf{s}\right\|}^{2}\right],\tag{1}$$

where *p*_{M}(·) is a stochastic function that describes the noisy forward model for a specific measurement method M, and *r*_{R}(·) is a deterministic function that reconstructs an estimate of **s** from data, based on reconstruction design R. While the general case is difficult to solve, the specific case of weak-phase objects illuminated by partially coherent light and imaged through a standard optical microscope under a fixed time budget enables rewriting Eq. (1) as a regularized least squares problem of the form

$$\underset{{\mathbf{x}}_{1},\ldots,{\mathbf{x}}_{L}}{\operatorname{minimize}}\quad{\left\|\sum_{l=1}^{L}{\mathbf{A}}_{l}{\mathbf{x}}_{l}-\mathbf{b}\right\|}_{2}^{2}+\gamma{\left(\sum_{l=1}^{L}{\left\|{\mathbf{x}}_{l}\right\|}_{2}\right)}^{2},\tag{2}$$

where the measurement design M is a set of exposure times over *L* candidate imaging planes, and the reconstruction design R is a set of spectral coefficients so that *r*_{R}(·) spectrally mixes the captured images accordingly. Regularization parameter *γ* is not hand-tuned, but rather computed from calibration data; the **A**_{l} matrices and vector **b** are also functions of calibration data as well as expected feature size and feature phase height, the only two user-adjustable parameters in this approach. Optimal M and R are both computed from the solution **x**_{1}, . . ., **x**_{L} to the problem above. Furthermore, it is possible to predict the performance of the entire imaging process by examining the composite function *r*_{R}(*p*_{M}(·)).
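The structure of the expected-error objective in Eq. (1) can be made concrete with a toy linear-Gaussian sketch (a hypothetical scalar-per-component model, not our actual microscope forward model): a Wiener-style reconstruction matched to the measurement's noise level achieves lower end-to-end error than a reconstruction that ignores it.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # toy specimen dimension

def p_M(s, tau):
    """Stochastic forward model: identity measurement whose shot-noise-like
    variance shrinks as the exposure time tau grows."""
    return s + rng.normal(scale=1.0 / np.sqrt(tau), size=s.shape)

def r_R(y, w):
    """Linear reconstruction: per-component shrinkage weights w."""
    return w * y

def expected_error(tau, w, trials=20000):
    """Monte Carlo estimate of E||r_R(p_M(s)) - s||^2 over the prior."""
    s = rng.normal(size=(trials, d))     # prior: unit-variance specimen class
    return np.mean(np.sum((r_R(p_M(s, tau), w) - s) ** 2, axis=1))

tau = 4.0                                # exposure spent on the one image
w_naive = np.ones(d)                     # reconstruction ignoring noise level
w_mmse = np.full(d, tau / (tau + 1.0))   # Wiener weight matched to the design
assert expected_error(tau, w_mmse) < expected_error(tau, w_naive)
```

With unit prior variance and noise variance 1/*τ*, the matched weight *τ*/(*τ* + 1) is the classical Wiener solution; jointly choosing the pair (here, exposure plus weights) is what Eq. (1) generalizes.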

The key to our SMaRT approach is the formulation of an optimization problem to jointly design the measurement and reconstruction processes for solving an inverse problem, rather than manually designing these two processes separately. This reduces the effect of human bias in the design process and allows a numerical algorithm to explore novel recipes that directly reduce the average reconstruction error. Furthermore, automated design facilitates leveraging additional information such as specimen statistics and the illumination spectrum, leading to more effective extraction of phase information than previous methods, which usually opt for simple measurement designs based on few assumptions. For example, knowledge of the spectrum of a polychromatic source allows us to mitigate the blurring caused by the source not being quasimonochromatic. Simple and intuitive parameters as well as predictable performance make our approach easy to use in practice, and the concepts introduced in this paper can potentially be applied to other imaging scenarios as well.

The remainder of the paper is organized as follows. In Section 2, we derive an expression for the forward model *p*_{M}(·) of an optical microscope and list the assumptions needed for the model to be valid; in Section 3, we use the forward model to derive a convex problem of the form in Eq. (2) and discuss how to apply it to phase imaging and make its solution computationally tractable; in Section 4, we apply our method to both simulated and experimental data in the context of imaging a standard phase target; finally, in Section 5, we discuss the implications of our technique.

## 2. Forward propagation model

We start with a standard optical microscope layout under Köhler illumination, as shown in Fig. 1. Light from a single-color light-emitting diode (LED) is collimated by a condenser lens and illuminates a thin (phase) object of interest placed at the specimen plane. An objective lens and tube lens form a 4*f* system that images the specimen onto the intermediate image plane, where a camera in a standard microscope would capture images. To accommodate phase imaging, we allow the camera to be translated axially to capture images at varying amounts of defocus. Our imaging system is typical of TIE imaging and similar to systems for low-coherence inline holography.

To simplify our model, we assume that the field at the intermediate image plane is a magnified and diffraction-limited representation of the field immediately after the specimen, since objective lenses are often highly optimized multi-element lens systems designed to minimize aberration. This allows us to mathematically model the system as a virtual partially coherent source illuminating a magnified specimen, whose image is captured at varying amounts of defocus, both negative and positive. The magnification reduces the effective range of illumination angles emanating from the virtual source, and it, together with the numerical aperture (NA) of the objective lens, also imposes a low spatial bandlimit on the field observed at the intermediate image plane. These factors enable simplifying assumptions in the mathematical model that we derive in the rest of the section. Our derivation starts with a parameterization **s** of the specimen, continues with expressions for the propagated intensity profiles under both coherent and partially coherent illumination, and concludes with a model for noisy pixel measurements, yielding the stochastic forward operator *p*_{M}(·).

#### 2.1. Thin specimen parametrization

We consider thin specimens located on some reference plane (which we will denote as *z* = 0) with coordinates (*x*, *y*) and amplitude transmission profiles of the form

$$\Theta(x,y)=\bar{\Theta}\left[1+s(x,y)\right]=\bar{\Theta}\left[1+\alpha(x,y)+\mathrm{i}\,\beta(x,y)\right],$$

where *Θ̄* is the average transmissivity, *s*(*x*, *y*) is a zero-mean complex-valued function with *α*(*x*, *y*) and *β*(*x*, *y*) as its real and imaginary components, respectively, and i is the imaginary unit. In practice, we assume *Θ̄* = 1 since we are not interested in illumination intensity for phase imaging. We assume that the specimen is not dispersive, *i.e.*, an incident coherent field *U*_{in}(*x*, *y*) at *z* = 0 for *any* wavelength of interest *λ* induces an output field at *z* = 0 given by *U*_{out}(*x*, *y*) = *Θ*(*x*, *y*)*U*_{in}(*x*, *y*). We also assume that *s*(*x*, *y*) is spatially bandlimited, and that we are only interested in the value of *s*(*x*, *y*) within a rectangular area of extent *W*_{x} × *W*_{y}. Let positive integers *M* and *N* be defined so that *W*_{x}/*M* and *W*_{y}/*N* are sufficient Nyquist sampling intervals for the bandlimit of *s*(*x*, *y*). Given these assumptions, we choose to model *s*(*x*, *y*) via a finite weighted sum of complex phasors, resulting in a periodic function with period *W*_{x} and *W*_{y} in the *x* and *y* directions, respectively:

$$s(x,y)=\sum_{m=1}^{M}\sum_{n=1}^{N}{\tilde{s}}_{mn}\exp\left[\mathrm{i}2\pi\left(\xi_{m}x+\eta_{n}y\right)\right],$$

where *ξ*_{m} = (*m* − *m̄*)/*W*_{x} and *η*_{n} = (*n* − *n̄*)/*W*_{y} give the discretized frequencies, and *m̄* = ⌊*M*/2⌋ + 1 and *n̄* = ⌊*N*/2⌋ + 1 give the indexes of the zero-frequency component. We define *α̃*_{mn} and *β̃*_{mn} similarly for *α*(*x*, *y*) and *β*(*x*, *y*), respectively. Since *α*(*x*, *y*) and *β*(*x*, *y*) are real-valued functions, their corresponding Fourier coefficients *α̃*_{mn} and *β̃*_{mn} possess symmetry, *i.e.*, ${\tilde{\alpha}}_{mn}={\tilde{\alpha}}_{{m}^{\prime}{n}^{\prime}}^{*}$ and ${\tilde{\beta}}_{mn}={\tilde{\beta}}_{{m}^{\prime}{n}^{\prime}}^{*}$, where *m′* = 2*m̄* − *m*, *n′* = 2*n̄* − *n*, and ^{*} denotes complex conjugation. Note that by definition, *s̃*_{m̄n̄} = *α̃*_{m̄n̄} = *β̃*_{m̄n̄} = 0. Thus, we can represent each specimen by a vector **s** of length 2*MN* − 2, which consists of *MN* − 1 entries corresponding to the *α̃*_{mn}s followed by *MN* − 1 entries corresponding to the *β̃*_{mn}s. For the remainder of the paper, we will use ${\sum}_{m,n}$ to denote summing over all pairs of (*m*, *n*) except for (*m̄*, *n̄*).
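The parameterization above can be sketched numerically. The following illustrative check (array sizes are hypothetical, and a standard FFT discretization is assumed) verifies that the Fourier coefficients of real-valued *α* and *β* obey the stated conjugate symmetry and pack into a specimen vector of length 2*MN* − 2:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 8
mbar, nbar = M // 2 + 1, N // 2 + 1    # 1-based index of the zero-frequency bin

# Zero-mean real fields alpha(x, y) and beta(x, y) sampled on the M x N grid
alpha = rng.standard_normal((M, N)); alpha -= alpha.mean()
beta = rng.standard_normal((M, N)); beta -= beta.mean()

# Centered spectra: entry (m, n) (1-based) holds the coefficient at (xi_m, eta_n)
alpha_t = np.fft.fftshift(np.fft.fft2(alpha)) / (M * N)
beta_t = np.fft.fftshift(np.fft.fft2(beta)) / (M * N)

# Hermitian symmetry: coefficient (m, n) is the conjugate of (2 mbar - m, 2 nbar - n)
for coeffs in (alpha_t, beta_t):
    m, n = 2, 5                         # arbitrary test bin (1-based)
    mp, np_ = 2 * mbar - m, 2 * nbar - n
    assert np.isclose(coeffs[m - 1, n - 1], np.conj(coeffs[mp - 1, np_ - 1]))
    assert abs(coeffs[mbar - 1, nbar - 1]) < 1e-12  # zero-frequency bin vanishes

# Pack the specimen vector s of length 2MN - 2 (alpha coeffs, then beta coeffs)
keep = np.ones((M, N), bool); keep[mbar - 1, nbar - 1] = False
s = np.concatenate([alpha_t[keep], beta_t[keep]])
assert s.shape == (2 * M * N - 2,)
```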

We note that we must restrict the region of interest to a finite *W*_{x} × *W*_{y} region to make any numerical approach tractable, and we assume a periodic extrapolation of the specimen to simplify notation and achieve a model compatible with the discrete Fourier transform. In practice, no matter what assumptions are made about areas outside the region of interest, there will always be boundary artifacts that signify a mismatch between these assumptions and reality.

#### 2.2. Noiseless propagated intensity for coherent illumination

We first derive the transverse intensity profile at various propagation distances *z* when the specimen is illuminated by a near-normal-incidence unit-amplitude coherent plane wave with wavelength *λ* much smaller than the features in the specimen; the magnification and numerical aperture present in the microscope’s 4*f* system satisfy this assumption. The incident field is

$$U_{\text{in}}(x,y)=\exp\left[\mathrm{i}2\pi\left(\xi x+\eta y\right)\right],$$

where *ξ*, *η* describe the angle of the incoming plane wave. The small incidence angle and large feature size assumptions imply *M*/(2*W*_{x}) + |*ξ*| ≪ 1/*λ* and *N*/(2*W*_{y}) + |*η*| ≪ 1/*λ*. These assumptions allow us to model propagation of the outgoing field *U*_{out}(*x*, *y*) using the Fresnel diffraction integral, resulting in a propagated field of the form

$$U(x,y,z;\lambda)=\phi_{z}\exp\left[\mathrm{i}2\pi\left(\xi x+\eta y\right)\right]\left[1+s_{\text{prop}}(x,y,z;\lambda)\right],$$

where *ϕ*_{z} = exp(i2*πz*/*λ*) exp[−i*πλz*(*ξ*^{2} + *η*^{2})] and

$$s_{\text{prop}}(x,y,z;\lambda)=\sum_{m,n}{\tilde{s}}_{mn}\exp\left[-\mathrm{i}\pi\lambda z\left(\xi_{m}^{2}+\eta_{n}^{2}\right)\right]\exp\left[-\mathrm{i}2\pi\lambda z\left(\xi\xi_{m}+\eta\eta_{n}\right)\right]\exp\left[\mathrm{i}2\pi\left(\xi_{m}x+\eta_{n}y\right)\right].$$

To first order in *s*, the propagated intensity is *I*(*x*, *y*, *z*; *λ*) ≈ 1 + 2Re[*s*_{prop}(*x*, *y*, *z*; *λ*)], and the Fourier series in *x* and *y* of Re[*s*_{prop}(*x*, *y*, *z*; *λ*)] can be written as the following series:

$$\mathrm{Re}\left[s_{\text{prop}}(x,y,z;\lambda)\right]=\sum_{m,n}e^{-\mathrm{i}2\pi\lambda z\left(\xi\xi_{m}+\eta\eta_{n}\right)}\left\{{\tilde{\alpha}}_{mn}\cos\left[\pi\lambda z\left(\xi_{m}^{2}+\eta_{n}^{2}\right)\right]+{\tilde{\beta}}_{mn}\sin\left[\pi\lambda z\left(\xi_{m}^{2}+\eta_{n}^{2}\right)\right]\right\}e^{\mathrm{i}2\pi\left(\xi_{m}x+\eta_{n}y\right)}.$$
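As a numerical consistency check on the weak-object linearization, we can propagate a weak specimen with the Fresnel transfer function and compare the resulting intensity spectrum against the first-order contrast-transfer prediction 2[*α̃* cos(*πλz u*²) + *β̃* sin(*πλz u*²)] at normal incidence. All parameter values below are illustrative only:

```python
import numpy as np

M = N = 32
Wx = 64e-6                            # 64 um square field of view (illustrative)
lam, z = 590e-9, 1e-3                 # wavelength and defocus (illustrative)

# A weak specimen s = alpha + i*beta with a few hand-picked Fourier modes
spec_a = np.zeros((M, N), complex)
spec_b = np.zeros((M, N), complex)
spec_a[1, 2], spec_a[3, 0] = 0.5 + 0.3j, -0.2 + 0.4j
spec_b[2, 1], spec_b[0, 5] = 0.4 - 0.2j, 0.3 + 0.1j
alpha = 1e-2 * np.real(np.fft.ifft2(spec_a)) * M * N
beta = 1e-2 * np.real(np.fft.ifft2(spec_b)) * M * N
U0 = 1.0 + alpha + 1j * beta          # field right after the specimen

# Fresnel propagation via its frequency-domain transfer function
FX, FY = np.meshgrid(np.fft.fftfreq(M, Wx / M), np.fft.fftfreq(N, Wx / N),
                     indexing="ij")
u2 = FX**2 + FY**2
Uz = np.fft.ifft2(np.fft.fft2(U0) * np.exp(-1j * np.pi * lam * z * u2))

# Intensity spectrum versus the linearized (CTF) prediction
I_spec = np.fft.fft2(np.abs(Uz) ** 2) / (M * N)
alpha_t = np.fft.fft2(alpha) / (M * N)
beta_t = np.fft.fft2(beta) / (M * N)
pred = 2 * (alpha_t * np.cos(np.pi * lam * z * u2)
            + beta_t * np.sin(np.pi * lam * z * u2))
pred[0, 0] = I_spec[0, 0]             # DC bin carries mean intensity, not s
assert np.allclose(I_spec, pred, atol=1e-4)
```

The residual mismatch is the second-order term |*s*_{prop}|², which is negligible for weak objects.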

#### 2.3. Noiseless propagated intensity for partially coherent illumination

Using results from the fully coherent case, we now derive an expression for the propagated intensity *I*(*x*, *y*, *z*) when we illuminate the specimen with a partially coherent field whose cross spectral density [24] at the *z* = 0 plane has the form

$$W\left(x_{1},y_{1},x_{2},y_{2},\omega\right)=\mu\left(x_{2}-x_{1},\,y_{2}-y_{1},\,\omega\right),\tag{11}$$

where *x*_{1}, *y*_{1} and *x*_{2}, *y*_{2} are coordinates on this plane, *ω* = 2*πc*/*λ* is the optical frequency, *c* is the speed of light, *k* = 2*π*/*λ* is the wave number, and *μ* is proportional to the complex degree of spectral coherence [25] and related to the two-dimensional Fourier transform of a nonnegative function *ρ*(*a*, *b*, *ω*) by

$$\mu\left(\Delta x,\Delta y,\omega\right)=\iint\rho\left(a,b,\omega\right)\exp\left[-\mathrm{i}k\left(a\,\Delta x+b\,\Delta y\right)\right]\mathrm{d}a\,\mathrm{d}b.\tag{12}$$

Both *ρ* and *μ* are normalized such that ∫ *μ*(0, 0, *ω*) d*ω* = 1. Any field of the form given by Eq. (11) and Eq. (12) can be interpreted as a stochastic incoherent mixture of plane waves parameterized by *ξ* = −*aω*/(2*πc*) and *η* = −*bω*/(2*πc*), with probability density *ρ*(*a*, *b*, *ω*). Thus, the resulting intensity can be written as

$$I(x,y,z)=\int\!\!\iint\rho\left(a,b,\omega\right)I\left(x,y,z;\lambda,\xi,\eta\right)\mathrm{d}a\,\mathrm{d}b\,\mathrm{d}\omega,$$

whose Fourier series in *x* and *y* is

$$I(x,y,z)=1+\sum_{m,n}{\tilde{I}}_{mn}(z)\exp\left[\mathrm{i}2\pi\left(\xi_{m}x+\eta_{n}y\right)\right],$$

where the coefficients corresponding to the nonzero frequencies of *s*(*x*, *y*) are given by

$${\tilde{I}}_{mn}(z)=2\int\mu\left(-\lambda z\xi_{m},-\lambda z\eta_{n},\omega\right)\left\{{\tilde{\alpha}}_{mn}\cos\left[\pi\lambda z\left(\xi_{m}^{2}+\eta_{n}^{2}\right)\right]+{\tilde{\beta}}_{mn}\sin\left[\pi\lambda z\left(\xi_{m}^{2}+\eta_{n}^{2}\right)\right]\right\}\mathrm{d}\omega.$$

For the special case of a uniform circular source of radius *r*_{source}, *i.e.*, *ρ*(*a*, *b*, *ω*) = *ρ̂*(*ω*) when ${a}^{2}+{b}^{2}<{r}_{\text{source}}^{2}$ and 0 otherwise, we can write *μ*(−*λzξ*_{m}, −*λzη*_{n}, *ω*) = *πr*_{source}^{2} *ρ̂*(*ω*) *J*_{1}(2*r̂*)/*r̂*, where $\widehat{r}=\pi z{r}_{\text{source}}{\left({\xi}_{m}^{2}+{\eta}_{n}^{2}\right)}^{1/2}$ and *J*_{1} is the Bessel function of the first kind and order one.
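The contrast damping caused by an extended source can be illustrated by directly averaging the tilt-dependent phase factor over a uniform disk; this average is exactly what the closed-form *J*_{1}(2*r̂*)/*r̂* profile evaluates. Radius and frequency values below are illustrative (a 25 μm pinhole implies a 12.5 μm source radius):

```python
import numpy as np

# Numerically average the tilt-induced phase factor over a uniform disk source;
# the closed form of this average is the Airy-type J1(2*r_hat)/r_hat profile.
r_src = 12.5e-6                           # 25 um pinhole -> 12.5 um radius
a = np.linspace(-r_src, r_src, 401)
A, B = np.meshgrid(a, a, indexing="ij")
disk = A**2 + B**2 < r_src**2

def damping(z, f):
    """Contrast retained at spatial frequency f (along x) after propagating z."""
    return np.cos(2 * np.pi * z * A[disk] * f).mean()

f = 1e6                                   # 1 cycle per micron
d_near = damping(250e-6, f)               # short propagation distance
d_far = damping(24e-3, f)                 # long propagation distance
assert np.isclose(damping(24e-3, 0.0), 1.0)   # no loss at zero frequency
assert abs(d_far) < abs(d_near) < 1.0     # coherence loss grows with z
```

This is the loss of high-frequency information away from the focal plane mentioned in the introduction: the farther the plane, the narrower the band of frequencies that retain contrast.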

#### 2.4. Noisy imaging model

We now model the relationship between the propagated intensity and measured pixel values. We assume that images are captured via an *M* × *N* array of sensor pixels covering the same *W*_{x} × *W*_{y} extent used in the definition of the specimen, with one image for each of the propagation distances *z*_{1}, . . ., *z*_{L}. For the (*m*, *n*) pixel in the *l*th image, we model the noiseless measured value as

$$\text{pixel}_{lmn}=\mathcal{C}_{1}\tau_{l}\iint\chi\left(x-x_{m},y-y_{n}\right)I\left(x,y,z_{l}\right)\mathrm{d}x\,\mathrm{d}y,$$

where *𝒞*_{1} is a constant scaling factor, *τ*_{l} is the exposure time for the *l*th image, *χ*(*x*, *y*) is the pixel’s spatial sensitivity profile normalized to ∬ *χ*(*x*, *y*) d*x* d*y* = 1, *I*(*x*, *y*, *z*) is the propagated intensity derived earlier, and (*x*_{m}, *y*_{n}) is the pixel center for the (*m*, *n*) pixel. This model relies on the assumption that the illumination spectrum is narrowband enough so that wavelength-dependent pixel sensitivity effects do not apply.

For noisy measurements, we assume that: (1) we record ≫ 10 photons per pixel, (2) Poisson shot noise dominates all other noise sources, and (3) the weak-perturbation assumption of *I* ≈ *I*_{0} applies. While more powerful techniques such as deep neural networks can be applied to low photon count scenarios (see for example [26]), in this paper we focus on typical imaging scenarios, where all three assumptions are usually valid. These assumptions allow us to approximate the noise at all the pixels as independent and identically distributed (i.i.d.) zero-mean Gaussian noise with variance proportional to exposure time *τ*_{l} and average intensity *I*_{0}:

$$\text{pixel}_{lmn}^{(\text{noisy})}\sim\text{pixel}_{lmn}+\mathcal{N}\left(0,\mathcal{C}_{2}\tau_{l}I_{0}\right),$$

where *𝒞*_{2} is another constant scaling factor and *I*_{0} is the average intensity. For the special case of a uniform intensity field, this reduces to $\text{pixel}_{lmn}^{(\text{noisy})}\sim\mathcal{N}\left({\mathcal{C}}_{1}{\tau}_{l}{I}_{0},{\mathcal{C}}_{2}{\tau}_{l}{I}_{0}\right)$. A centered unitary discrete Fourier transform (DFT) of the pixel values yields

$${\tilde{\text{pixel}}}_{lmn}^{(\text{noisy})}=\mathcal{C}_{1}\tau_{l}\hat{I}\left(\xi_{m},\eta_{n},z_{l}\right)+{\tilde{\text{noise}}}_{lmn},$$

where *Î*(*ξ*, *η*, *z*) = *χ̃*(*ξ*, *η*) [*Ĩ*(*ξ*, *η*, *z*) ⊗ *W*_{x}*W*_{y} sinc(*W*_{x}*ξ*) sinc(*W*_{y}*η*)] describes the Fourier transform of the intensity after cropping by the sensor area and applying the pixel sensitivity function *χ*(*x*, *y*), whose Fourier transform is *χ̃*(*ξ*, *η*); we define sinc(*a*) = sin(*πa*)/(*πa*). Given the derivations in the previous section, *Î*(*ξ*_{m}, *η*_{n}, *z*_{l}) = *I*_{0} when *ξ*_{m} = *η*_{n} = 0; for other frequency coordinates on the sampling grid,

$$\hat{I}\left(\xi_{m},\eta_{n},z_{l}\right)=I_{0}\,\tilde{\chi}\left(\xi_{m},\eta_{n}\right){\tilde{I}}_{mn}(z_{l}).$$

We define **y**_{l} as a vector of length *MN* − 1 whose entries correspond to ${\tilde{\text{pixel}}}_{lmn}^{(\text{noisy})}/\left({\mathcal{C}}_{1}{\tau}_{l}{I}_{0}\right)$ for (*m*, *n*) ≠ (*m̄*, *n̄*). Let us also define **n**_{l} as a random vector of length *MN* − 1 whose entries are i.i.d. Gaussian random variables with zero mean and variance equal to ${\sigma}_{l}^{2}={\mathcal{C}}_{2}/\left({\mathcal{C}}_{1}^{2}{\tau}_{l}{I}_{0}\right)$. For notational convenience, we define **y** and **n** to be the vertical concatenation of **y**_{1}, . . ., **y**_{L} and **n**_{1}, . . ., **n**_{L}, respectively, and we define measurement specification M as the collection of exposure times *τ*_{1}, . . ., *τ*_{L}. We can then express the forward operator *p*_{M} as the sum of a deterministic linear function acting on specimen vector **s** and a stochastic noise component **n**, *i.e.*,

$$\mathbf{y}=p_{\mathrm{M}}(\mathbf{s})=\mathbf{P}\mathbf{s}+\mathbf{n},$$

with the linear relationship defined by the *L*(*MN* − 1) × 2(*MN* − 1) sparse matrix **P**. Let us index the entries of **y** and thus the rows of **P** by the triplet *lmn*, where 1 ≤ *l* ≤ *L*, and (*m*, *n*) takes on values between (1, 1) and (*M*, *N*) with the exclusion of (*m̄*, *n̄*). Similarly, let us index the entries of **s** and thus the columns of **P** by the triplet *pqr*, where *p* and *q* are spectral coordinates corresponding to *m* and *n*, and *r* ∈ {1, 2} determines whether we are considering the real or imaginary parts of *s̃*_{pq}, *i.e.*, *s*_{pqr} = *α̃*_{pq} if *r* = 1 and *s*_{pqr} = *β̃*_{pq} if *r* = 2. Given these index definitions, the (*lmn*, *pqr*) entry of matrix **P** is given by

$$P_{lmn,pqr}=\begin{cases}2\tilde{\chi}\left(\xi_{m},\eta_{n}\right)\displaystyle\int\mu\left(-\lambda z_{l}\xi_{m},-\lambda z_{l}\eta_{n},\omega\right)\cos\left[\pi\lambda z_{l}\left(\xi_{m}^{2}+\eta_{n}^{2}\right)\right]\mathrm{d}\omega, & (p,q)=(m,n),\;r=1,\\[1ex]2\tilde{\chi}\left(\xi_{m},\eta_{n}\right)\displaystyle\int\mu\left(-\lambda z_{l}\xi_{m},-\lambda z_{l}\eta_{n},\omega\right)\sin\left[\pi\lambda z_{l}\left(\xi_{m}^{2}+\eta_{n}^{2}\right)\right]\mathrm{d}\omega, & (p,q)=(m,n),\;r=2,\\[1ex]0, & \text{otherwise},\end{cases}$$

with *λ* = 2*πc*/*ω* inside the integrals.
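The i.i.d. Gaussian model for the spectral noise vectors **n**_{l} rests on the fact that a unitary DFT redistributes i.i.d. pixel noise evenly across frequency bins without changing its variance. A quick numerical sketch (sizes and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
M = N = 16
sigma = 0.1                              # per-pixel sensor noise level
trials = 4000

power = np.zeros((M, N))
for _ in range(trials):
    noise = rng.normal(scale=sigma, size=(M, N))   # i.i.d. pixel noise
    spec = np.fft.fft2(noise) / np.sqrt(M * N)     # unitary-normalized DFT
    power += np.abs(spec) ** 2
power /= trials

# Expected noise power is the same sigma^2 in every frequency bin, which is
# why each spectral noise vector n_l can be modeled as i.i.d. with a single
# variance sigma_l^2 set by the exposure time.
assert np.isclose(power.mean(), sigma**2, rtol=0.05)
assert power.std() < 0.2 * power.mean()
```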

## 3. Formulation as a convex optimization problem

The sparsity structure of **P** suggests a linear reconstruction operator of the form *r*_{R}(**y**) = **Ry**, where **R** is a 2(*MN* − 1) × *L*(*MN* − 1) matrix with similar sparsity structure parameterized by 2*L*(*MN* − 1) coefficients:

$$R_{pqr,lmn}=\begin{cases}x_{mnrl}, & (p,q)=(m,n),\\0, & \text{otherwise}.\end{cases}$$

Let **x**_{l} be a complex-valued vector of length 2*MN* − 2 whose entries consist of the *x*_{mn1l}s, which parameterize the reconstruction coefficients for recovering the *α̃*_{mn} components of **s** from the *l*th image, followed by the *x*_{mn2l}s for likewise recovering the *β̃*_{mn} components; let **x** be the vertical concatenation of **x**_{1}, . . ., **x**_{L}. Therefore, we can say that **x** is a parameterization for the reconstruction design R.

As stated previously, the measurement design M is a list of exposure times *τ*_{1}, . . ., *τ*_{L} ≥ 0 for each candidate imaging plane, with imaging skipped when the exposure time is zero. To make Eq. (1) meaningful, we must impose some cost for the exposure time, or else we could unrealistically aim for infinite exposure time in order to reduce noise. We propose a simple light budget scheme constraining total exposure time to be less than some user-defined constraint *τ*_{0}.

With these definitions of M and R as well as the stochastic forward model function *p*_{M} defined in the previous subsection, we can rewrite Eq. (1) as a constrained convex problem that attempts to minimize the expectation of reconstruction error across all possible realizations of specimen **s** in specimen class *𝒮*_{0} and noise **n** (which we assume to be uncorrelated). For each spectral coordinate (*m*, *n*), let *s̃*_{mn} = (*α̃*_{mn}, *β̃*_{mn})^{⊤}, let **n**_{mn} be an *L*-vector containing the relevant components of **n**, and let **R**_{mn} and **P**_{mn} be the nonoverlapping 2 × *L* and *L* × 2 blocks of **R** and **P** acting on that coordinate. We also obtain a factor **B**_{mn} via factoring the specimen’s spectral (raw) covariance matrix **C**_{mn}, *i.e.*,

$${\mathbf{C}}_{mn}=\mathbb{E}\left[{\tilde{\mathbf{s}}}_{mn}{\tilde{\mathbf{s}}}_{mn}^{\mathrm{H}}\right]={\mathbf{B}}_{mn}{\mathbf{B}}_{mn}^{\mathrm{H}}.\tag{32}$$

Taking expectations over **s** and **n** then yields

$$\begin{aligned}\underset{\mathbf{x},\,\tau_{1},\ldots,\tau_{L}}{\operatorname{minimize}}\quad&\sum_{m,n}{\left\|\left({\mathbf{R}}_{mn}{\mathbf{P}}_{mn}-{\mathbf{I}}_{2}\right){\mathbf{B}}_{mn}\right\|}_{F}^{2}+\sum_{l=1}^{L}{\sigma}_{l}^{2}{\left\|{\mathbf{x}}_{l}\right\|}_{2}^{2}\\\text{subject to}\quad&\sum_{l=1}^{L}\tau_{l}\leq\tau_{0},\qquad\tau_{l}\geq0\;\;\text{for }1\leq l\leq L,\end{aligned}\tag{34}$$

where the first term is the specimen-averaged reconstruction bias and the second term is the noise energy passed through the reconstruction, with ${\sigma}_{l}^{2}={\mathcal{C}}_{2}/\left({\mathcal{C}}_{1}^{2}{\tau}_{l}{I}_{0}\right)$ as before; we adopt the convention that a term with **x**_{l} = **0** and *τ*_{l} = 0 contributes nothing, to avoid ambiguity. The first term can also be written compactly as ${\Vert \sum_{l}{\mathbf{A}}_{l}{\mathbf{x}}_{l}-\mathbf{b}\Vert}_{2}^{2}$, where **b** is a vector of length 4(*MN* − 1) consisting of the *b*_{mn1}s followed by the *b*_{mn2}s, *b*_{mn3}s and *b*_{mn4}s (the four entries of each **B**_{mn}), and **A**_{l} is a block diagonal matrix whose *mn*th block **A**_{lmn} is given by

$${\mathbf{A}}_{lmn}={\left({\mathbf{P}}_{mn}{\mathbf{B}}_{mn}\right)}_{l,:}^{\top}\otimes{\mathbf{I}}_{2},$$

under a consistent vectorization of the 2 × 2 products. If the exposure times in Eq. (34) were fixed, then the *σ*_{l}s would become constants, making it the standard regularized least squares problem obtained in the derivation of the Wiener filter; such an optimization problem could easily be solved using standard methods such as conjugate gradients. However, since we are *jointly* optimizing both reconstruction and measurement design, our solution is no longer standard Wiener filtering, and we need to rewrite Eq. (34) to solve it.

Note that only the second term in the objective function of Eq. (34) depends on the *τ*_{l}s; therefore, we can reduce Eq. (34) to an unconstrained optimization over only the **x**_{l}s if we can derive a simplified expression for the minimal value of ${\sum}_{l=1}^{L}{\sigma}_{l}^{2}{\Vert {\mathit{x}}_{l}\Vert}_{2}^{2}$ given the **x**_{l}s, *i.e.*, solving

$$\begin{aligned}\underset{\tau_{1},\ldots,\tau_{L}}{\operatorname{minimize}}\quad&\sum_{l=1}^{L}\frac{{\mathcal{C}}_{2}}{{\mathcal{C}}_{1}^{2}{\tau}_{l}{I}_{0}}{\left\|{\mathbf{x}}_{l}\right\|}_{2}^{2}\\\text{subject to}\quad&\sum_{l=1}^{L}\tau_{l}\leq\tau_{0},\qquad\tau_{l}\geq0,\end{aligned}\tag{36}$$

whose optimal solution occurs when *τ*_{l} is proportional to ‖**x**_{l}‖_{2} for 1 ≤ *l* ≤ *L*. We note that this exposure allocation recipe applies to *any* spectral mixing reconstruction method where noise energy is equally distributed in the frequency domain and images are dominated by Poisson noise with more than ∼10 photons per pixel. The closed form solution for Eq. (36) allows us to rewrite Eq. (34) as an unconstrained convex problem:

$$\underset{{\mathbf{x}}_{1},\ldots,{\mathbf{x}}_{L}}{\operatorname{minimize}}\quad{\left\|\sum_{l=1}^{L}{\mathbf{A}}_{l}{\mathbf{x}}_{l}-\mathbf{b}\right\|}_{2}^{2}+\frac{{\mathcal{C}}_{2}}{{\mathcal{C}}_{1}^{2}{\tau}_{0}{I}_{0}}{\left(\sum_{l=1}^{L}{\left\|{\mathbf{x}}_{l}\right\|}_{2}\right)}^{2}.\tag{37}$$

The objective is convex in the **x**_{l}s; it can be solved using proximal gradient methods [27–30], which were designed for minimizing sums of smooth and nonsmooth convex functions. The appropriate proximal operator for the regularization term is derived in Appendix C.
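The closed-form exposure allocation can be verified numerically: for *σ*_{l}² ∝ 1/*τ*_{l}, setting *τ*_{l} ∝ ‖**x**_{l}‖_{2} minimizes the noise term over any split of the budget *τ*_{0} (a Cauchy–Schwarz argument). A sketch with hypothetical coefficient norms:

```python
import numpy as np

rng = np.random.default_rng(4)
L, tau0 = 6, 3.0
x_norms = rng.uniform(0.1, 2.0, size=L)     # ||x_l||_2 for each candidate plane

def noise_cost(tau):
    """sum_l sigma_l^2 ||x_l||_2^2 with sigma_l^2 proportional to 1/tau_l."""
    return np.sum(x_norms**2 / tau)

# Closed-form optimum: tau_l proportional to ||x_l||_2
tau_opt = tau0 * x_norms / x_norms.sum()
best = noise_cost(tau_opt)
assert np.isclose(best, x_norms.sum() ** 2 / tau0)   # equals (sum ||x_l||)^2 / tau0

# Random feasible allocations of the same light budget never do better
for _ in range(2000):
    tau = rng.uniform(0.01, 1.0, size=L)
    tau *= tau0 / tau.sum()
    assert noise_cost(tau) >= best - 1e-9
```

The minimal value (Σ_{l}‖**x**_{l}‖₂)²/τ₀ is exactly what produces the group-norm-squared regularizer above.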

_{l}Once the optimal *x** _{l}*s have been found, creating the reconstruction function

*r*_{R}(·) is straight-forward. The optimal

*τ*s can be obtained by setting ${\tau}_{l}={\tau}_{0}{\Vert {\mathit{x}}_{l}\Vert}_{2}/\left({\sum}_{k}{\Vert {\mathit{x}}_{k}\Vert}_{2}\right)$. The end-to-end spectral imaging performance of the measurement–reconstruction pair can be analyzed by observing the entries of the 2 × 2 matrix

_{l}

*H**=*

_{mn}

*R*

_{mn}

*P**. The diagonal terms give essentially the discretized transfer function for*

_{mn}*α*(

*x*,

*y*) and

*β*(

*x*,

*y*), whereas the cross-terms denote whether information about

*α*bleeds into

*β*and vice versa.

Solving Eq. (37) technically yields a measurement design M and reconstruction design R for estimating the functions *α*(*x*, *y*) and *β*(*x*, *y*) in the general case. For the remainder of this section, we will discuss how SMaRT can be applied specifically to quantitative phase imaging as well as how Eq. (37) can be solved efficiently in practice.

#### 3.1. Estimating the covariance for phase imaging

Solutions to Eq. (37) depend on prior knowledge regarding the set *𝒮*_{0} of specimens of interest; the **A**_{l} matrices and the vector **b** in Eq. (37) are functions of the **C**_{mn}s, the (prior) spectral covariance matrices defined in Eq. (32). We propose a generic two-parameter prior for phase imaging that works well in our experiments, although more apt **C**_{mn}s could be derived for specific applications.

We start with the assumption that our specimen is a piecewise-constant phase specimen *ϕ*(*x*, *y*) where each piece has phase uniformly distributed between −*ϕ*_{max}/2 and *ϕ*_{max}/2. The corresponding *α*(*x*, *y*) and *β*(*x*, *y*) functions are given by

$$\alpha(x,y)=\cos\left[\phi(x,y)\right]-\zeta,\qquad\beta(x,y)=\sin\left[\phi(x,y)\right],$$

where *ζ* = 2 sin(*ϕ*_{max}/2)/*ϕ*_{max} is a normalization term to ensure that *α*(*x*, *y*) is zero-mean.

Furthermore, we assume that the probability of any two points belonging to the same “piece” is equal to exp(−*Δ*_{xy}/*Δ*), where *Δ*_{xy} is the distance between the two points and *Δ* is a parameter, which we will define as the average feature size. A convex tessellation of the plane where the number of discontinuities in any line segment is a Poisson distribution would induce this type of exponential falloff, but here we make no additional assumptions about the exact form of *ϕ*(*x*, *y*). Using results from Appendix B, we find that the resulting spectral covariance matrices **C**_{mn} are diagonal with a Lorentzian-like falloff in spatial frequency.
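A quick Monte Carlo check of the normalization, assuming (as our reading of the prior) the natural choice *α* = cos *ϕ* − *ζ* and *β* = sin *ϕ* for a pure phase piece:

```python
import numpy as np

rng = np.random.default_rng(5)
phi_max = 0.3                               # the value used later in Section 4
zeta = 2 * np.sin(phi_max / 2) / phi_max

# Sample piece phases uniformly in [-phi_max/2, phi_max/2]
phi = rng.uniform(-phi_max / 2, phi_max / 2, size=200000)
alpha = np.cos(phi) - zeta                  # assumed real part of s = exp(i*phi) - mean
beta = np.sin(phi)                          # assumed imaginary part

assert abs(alpha.mean()) < 1e-3             # zeta makes alpha zero-mean
assert abs(beta.mean()) < 1e-3              # sin is odd, so beta is zero-mean
```

The analytic mean of cos *ϕ* over the uniform interval is sin(*ϕ*_{max}/2)/(*ϕ*_{max}/2) = *ζ*, which is what the simulation confirms.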

Although most specimens are not piecewise-constant, naturally occurring specimens often have large, nearly constant regions. The Lorentzian-like falloff of the cross spectral density is also reminiscent of natural image priors with power law falloffs [31–33], which can be stochastically approximated using a piecewise-constant “dead leaves” model [34].

We note that in practice, **C**_{mn} mismatch results in a noisier phase image, but the reconstructed images are still *predictable*, as we can examine *r*_{R}(*p*_{M}(·)) for the transfer function of the imaging system to interpret the results in context. This is in contrast to a regularized reconstruction approach using total variation (TV) [17] or nonlinear diffusion [16], which forces reconstructed phase images to approximate piecewise-constant functions in order to minimize the regularization term. With our approach, we simply use prior statistics as a way to estimate the approximate signal-to-noise ratio at each spectral band in order to allocate measurement resources; our approach does not force compatibility with the prior, and thus our results are also less sensitive to the design of the prior.

#### 3.2. Accommodating tilt in the illumination

For two optical imaging systems whose illumination profiles are identical except for a constant tilt, the optimal measurement–reconstruction pair for one can be adapted for the other. Suppose the two imaging systems’ illumination density functions are related by

$$\rho_{2}\left(a,b,\omega\right)=\rho_{1}\left(a-a_{0},\,b-b_{0},\,\omega\right).$$

Then, the reconstruction operator for the second system is the same as multiplying the input images’ Fourier transforms by exp[i2*πz*_{l}(*ξ*_{m}*a*_{0} + *η*_{n}*b*_{0})] and then applying the reconstruction operator for the first system. This is because the tilt induces a linear phase term in **P**, which can be neutralized by an equal and opposite linear phase term in **R**. Therefore, in practice, SMaRT allows for a slight tilt in the system, which can be removed using calibration.
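The equivalence between a constant tilt and a frequency-domain linear phase ramp is just the Fourier shift theorem; a minimal numerical sketch (using an integer pixel shift so the comparison is exact):

```python
import numpy as np

rng = np.random.default_rng(6)
M = 32
img = rng.standard_normal((M, M))
shift = 3                                   # integer-pixel lateral shift

# A constant illumination tilt shifts each defocused image laterally; in the
# frequency domain this is a linear phase ramp, which reconstruction can undo.
f = np.fft.fftfreq(M, 1.0)
FX = np.meshgrid(f, f, indexing="ij")[0]
ramp = np.exp(-1j * 2 * np.pi * FX * shift)
shifted = np.real(np.fft.ifft2(np.fft.fft2(img) * ramp))
assert np.allclose(shifted, np.roll(img, shift, axis=0))

# Applying the conjugate ramp before reconstruction restores the original image
restored = np.real(np.fft.ifft2(np.fft.fft2(shifted) * np.conj(ramp)))
assert np.allclose(restored, img)
```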

#### 3.3. Exploiting symmetry

Directly solving Eq. (37) can present numerical challenges, since the number of unknowns is *𝒪*(*LMN*), leading to slow convergence and high storage costs. One saving grace is that the solution **x**_{1}, . . ., **x**_{L} is usually highly redundant. The optical layout of a microscope is often circularly symmetric about the imaging axis, and the spectral covariance of the specimen is circularly symmetric as well, leading to nearly circularly symmetric solutions. Therefore, we propose a modification to Eq. (37) to exploit these symmetries.

Let **T** ∈ ℝ^{(MN−1)×K} be an interpolation operator that takes *K* coefficients and interpolates them linearly into an *M* × *N* image, while disregarding the center pixel. In other words, we have a real-valued interpolation map vector **t** of length *MN* − 1 whose *mn*th element *t*_{mn}, with 1 ≤ *t*_{mn} ≤ *K*, gives the position of spectral coordinate (*m*, *n*) on the *K*-point interpolation grid. We then define the (*mn*, *k*)th entry of **T** to be

$$T_{mn,k}=\max\left(0,\,1-\left|t_{mn}-k\right|\right).$$

Let **I**_{2} and **I**_{4} be 2 × 2 and 4 × 4 identity matrices, respectively, and let **Q** be an equalization operator for **T** so that **TQ** has unit singular values and that **Q**^{⊤}**T**^{⊤}**TQ** = **I**; we can easily construct **Q** from the singular value decomposition of the tridiagonal matrix **T**^{⊤}**T**. With these definitions, we propose an efficient approximation to Eq. (37) by replacing **b** and the **A**_{l}s with **b̂** and the ${\widehat{\mathit{A}}}_{\mathit{l}}$s given by

$$\widehat{\mathbf{b}}={\left({\mathbf{I}}_{4}\otimes\mathbf{TQ}\right)}^{\top}\mathbf{b},\qquad{\widehat{\mathbf{A}}}_{l}={\left({\mathbf{I}}_{4}\otimes\mathbf{Q}\right)}^{\top}\left[{\left({\mathbf{I}}_{4}\otimes\mathbf{T}\right)}^{\top}{\mathbf{A}}_{l}\left({\mathbf{I}}_{2}\otimes\mathbf{T}\right)\right]\left({\mathbf{I}}_{2}\otimes\mathbf{Q}\right).$$

We then solve for the reduced coefficient vector **x̂** and compute **x**_{approx} = [**I**_{2} ⊗ (**TQ**)]**x̂**; the regularization term in the convex problem remains unchanged because **Q**^{⊤}**T**^{⊤}**TQ** = **I**.

We note that the bracketed expression in ${\widehat{\mathit{A}}}_{l}$’s definition is sparse and thus can be compactly stored in memory, and only one copy of **Q** needs to be stored, regardless of the size of *L*. If **T** is an operator that interpolates *K* coefficients into a radially symmetric image, then as long as the illumination pattern *ρ*(*a*, *b*, *ω*) is radially symmetric about some point (*a*_{0}, *b*_{0}), the only approximation error incurred is with the pixel function *χ*(*x*, *y*), which is usually not circularly symmetric. However, in practice this error can be ignored, as the pixel function is usually a rough approximation as well.
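A small sketch of the interpolation-and-equalization construction, with a hypothetical tent-function **T** on a uniform radial grid (the exact **T** used in a real implementation may differ): the equalizer **Q** built from **T**^{⊤}**T** indeed gives **TQ** unit singular values.

```python
import numpy as np

n, K = 63, 8                                  # n spectral bins, K radial knots

# Linear ("tent") interpolation matrix T: row i blends the two knots nearest
# to its radial coordinate t_i in [1, K]
t = np.linspace(1.0, K, n)
T = np.maximum(0.0, 1.0 - np.abs(t[:, None] - np.arange(1, K + 1)[None, :]))

# Equalizer Q from the eigendecomposition of the tridiagonal matrix T^T T,
# chosen so that (TQ)^T (TQ) = I, i.e., TQ has unit singular values
w, V = np.linalg.eigh(T.T @ T)
Q = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
G = (T @ Q).T @ (T @ Q)
assert np.allclose(G, np.eye(K), atol=1e-10)
```

Because tent functions at non-adjacent knots never overlap, **T**^{⊤}**T** is tridiagonal, which keeps this factorization cheap even for large spectral grids.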

#### 3.4. Pruning

While the regularizer in Eq. (37) is a mild sparsity promoter, solutions for large *L* usually still yield many nonzero *τ _{l}*s, some of them very small. This can create two problems in practical applications. First, having too many positions introduces a lot of mechanical motion into the experimental apparatus, which may be undesirable as it can disturb the specimen, gradually cause the apparatus to go out of alignment, and increase the amount of time needed to capture the images, since the image sensor has to wait for any residual motion to subside before exposing. Second, very short exposure times easily violate the assumption of Gaussian noise with variance proportional to average intensity, because quantization noise and readout noise start dominating the desired signal, neither of which scale with average intensity.

We propose a pruning method that, while not yielding a globally optimal solution, still results in a practical measurement–reconstruction design pair. We start by solving Eq. (37) for the full set of possible positions (or terminating after a fixed number of iterations) and then iteratively prune the positions. In each iteration, we remove a small set of positions corresponding to those with the lowest exposure times (or zero exposure), and then recompute Eq. (37) with the reduced set of positions. This is repeated until either: (1) the number of positions is reduced below a certain threshold, or (2) the shortest exposure time is above a certain threshold, or (3) the computed expected error exceeds some relative threshold compared to the full position set solution. Since the initial number of unknowns can be large, pruning improves results in practice by speeding up convergence, allowing better results in a fixed number of iterations. A similar approach has been applied successfully in the deep learning literature [35,36]; we can also imagine pruning as a variant of orthogonal matching pursuit [37], wherein we remove basis vectors instead of adding them before recomputing coefficients.
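The pruning schedule can be sketched on a toy least squares problem, using |*x*_{l}| as a stand-in for the exposure share ‖**x**_{l}‖_{2} (all sizes and index supports below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(8)
L_full, d = 20, 40
A = rng.standard_normal((d, L_full))           # one column per candidate plane
truth = np.zeros(L_full)
truth[[2, 7, 11]] = [1.0, -0.8, 0.5]           # only three planes truly matter
b = A @ truth + 0.01 * rng.standard_normal(d)

def solve(cols):
    x, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
    return x

# Iteratively drop the candidates with the smallest "exposure share" |x_l|
# and re-solve on the reduced set, mimicking the pruning schedule
cols = list(range(L_full))
while len(cols) > 5:
    weakest = set(np.argsort(np.abs(solve(cols)))[:2])
    cols = [c for i, c in enumerate(cols) if i not in weakest]

x = solve(cols)
assert {2, 7, 11} <= set(cols)                 # the informative planes survive
assert np.linalg.norm(A[:, cols] @ x - b) < 0.5
```

As in orthogonal matching pursuit run in reverse, the coefficients are recomputed after each removal, so earlier greedy deletions cannot silently degrade the remaining fit.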

## 4. Imaging a phase target

We now apply our SMaRT approach to image a standard phase resolution target. For illumination, we use a Newport M-20X objective to focus light from a Thorlabs LED4C1 installed with the amber LED module (nominal wavelength 590 nm) onto a 25 μm pinhole placed at the front focal plane of a 50 mm focal-length plano-convex lens. The collimated light illuminates the specimen, which is then imaged using a Mitsutoyo 10X Plan Apo objective (NA = 0.28) and a 100 mm plano-convex tube lens for a total magnification of 5 × at the intermediate image plane. Both plano-convex lenses have diameters of 25 mm. Images of the propagated intensity were captured using a UI-2240-M camera with pixel pitch 4.65 μm placed on a Newport XMS50 linear stage, with tilt correction followed by cropping to produce 1001 × 1001 pixel images.

To calibrate the camera’s noise level, we took uniform field images at various exposure levels; we assumed *𝒞*_{1} = 1 and estimated *𝒞*_{2} ≈ 1.33 × 10^{−4}. Using a one-second exposure of the target, we obtained an estimate of *I*_{0} ≈ 0.113. We set *τ*_{0} = 3 s and *L* = 385 for uniformly spaced *z* positions 250 μm apart between ±24 mm. For our phase imaging prior, we set the average feature size *Δ* = 59.7 μm and *ϕ*_{max} = 0.3 rad. We also pruned down to 7 images on either side of *z* = 0. Solving Eq. (37) using the proximal gradient algorithm given in [30] with the symmetry modifications as well as pruning yielded the measurement design shown in Fig. 2; computation was via MATLAB, taking ∼23 min on a quad-core Intel Xeon W3520 CPU. The corresponding transfer functions for the real and imaginary parts of the transmission profile, shown in Fig. 3, were computed from the forward model and the optimal reconstruction design.

Although the LED emits light in a narrow range of wavelengths, its polychromaticity is still an important factor; to demonstrate its importance, we also compute a measurement–reconstruction pair (which we will call SMaRT-qm) under the assumption that the illumination is quasimonochromatic and well approximated by a single wavelength. Furthermore, to facilitate comparison with Gaussian process regression transport of intensity equation phase imaging (GP-TIE) [8], a similar state-of-the-art method, we captured an additional set of measurements: one image at *z* = 0 plus 14 images at positions exponentially spaced between *z* = ±250 μm and *z* = ±24 mm, with exposures of 0.2 s each; this recipe is shown in Fig. 2 as well. For comparison with standard TIE, we took 1 s exposures at *z* = 0, ±300 μm (TIE short) as well as *z* = 0, ±10 mm (TIE long). We used code provided by [8] and adjusted the Tikhonov parameter to match the low frequency cutoff of our method, as shown in Fig. 3.

According to CTF theory, the closest propagation distance offering the highest contrast for a particular spatial frequency is inversely proportional to the square of that frequency, and both the SMaRT and GP-TIE approaches use different propagation distances to capture different sets of spatial frequencies. However, as Fig. 2 shows, SMaRT’s distribution of positions is clearly trimodal, in contrast to the first-principles exponential placement of positions suggested by the GP-TIE approach. It appears that SMaRT chose not to incorporate intermediate positions because they are not as cost efficient as splitting exposure time between a set of positions close to *z* = 0 and a set of positions far from *z* = 0 on either side. This may seem surprising at first but can be explained as follows. Since contrast transfer functions are nearly periodic in *z*, falling off only as a function of partial coherence, the intensity pattern at a position far from *z* = 0 contains not only low frequency components but also partial information about intermediate frequency components. Therefore, we can forgo positions intentionally designed for intermediate frequency components and allocate more time to the other positions, reducing noise levels for the remaining images.
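As a back-of-the-envelope check of this inverse-square relationship (using the idealized coherent phase CTF sin(πλzν²) rather than the paper's partially coherent model), the defocus giving the first contrast maximum for spatial frequency ν is z = 1/(2λν²):

```python
def first_contrast_peak(wavelength, spatial_freq):
    """Defocus z at which the idealized phase CTF sin(pi*lambda*z*nu^2)
    first peaks, i.e., pi*lambda*z*nu^2 = pi/2  =>  z = 1/(2*lambda*nu^2)."""
    return 1.0 / (2.0 * wavelength * spatial_freq ** 2)

wavelength = 590e-9        # nominal LED wavelength from the text, in m
nu_coarse = 1 / 59.7e-6    # one cycle per average feature size Delta
nu_fine = 2 * nu_coarse    # doubling the frequency...

z_coarse = first_contrast_peak(wavelength, nu_coarse)  # ~3.0 mm
z_fine = first_contrast_peak(wavelength, nu_fine)      # ...quarters z: ~0.76 mm
```

Both distances fall comfortably inside the ±24 mm scan range used in the measurement design.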

To verify our approach, we applied the computed SMaRT measurement–reconstruction pair as well as the GP-TIE and TIE methods to both simulated and experimental data of a phase-only Siemens star spoke pattern.

### 4.1. Simulation

For ground truth, we computed a 40-spoke Siemens star phase object with a diameter of 2092.5 μm and a phase height of 0.3 rad at the intermediate image plane with an effective sampling interval of 4.65/9 μm, filtered the result to remove all frequency components that cannot be captured given the NA and magnification of the system, and then downsampled to a sampling interval equal to the pixel pitch (4.65 μm). Simulated noisy images of size 1001 × 1001 were used to reconstruct phase images using the SMaRT, SMaRT-qm, GP-TIE and TIE approaches. The center 501 × 501 pixel region containing the star is shown in Fig. 4, and two smaller regions of the object are shown magnified in Fig. 5 and Fig. 6 for the SMaRT, SMaRT-qm and GP-TIE approaches. Cross sections of the reconstruction are shown in Fig. 7 (we omit SMaRT-qm for clarity), with the resulting error for the SMaRT and GP-TIE approaches shown in Fig. 8.

Reconstructed images in Fig. 4 as well as cross sections in Fig. 7 show that standard TIE results in either sharp yet cloudy images (TIE short) or blurry images with good low frequency performance (TIE long). Both SMaRT and GP-TIE leverage diversity to keep both high frequency and low frequency components, yielding significantly better reconstructions. However, the magnified crops in Fig. 5 and Fig. 6 show that SMaRT is more efficient at combating noise, resulting in cleaner spokes in the center of the target as well as flatter phase in uniform regions. Similarly, the rms error values in Fig. 4 and the cross section of the error in Fig. 8 support this observation. Lastly, we see in Fig. 5 and Fig. 6 the importance of considering the polychromaticity of the source, as the SMaRT-qm results—which assume monochromatic illumination—are markedly blurrier than the SMaRT results.

### 4.2. Experiment

Using the computed measurement design, we captured images of a Benchmark Technologies quantitative phase resolution target, which has a refractive index of 1.52 and various binary phase patterns. We imaged the 50 nm height Siemens star, corresponding to a phase height of ∼0.277 rad. Tilt in the illumination was estimated by applying the computed reconstruction function *r*_{R}(·) to the ±24 mm images individually, finding centroids of the same small point feature in both images and then estimating the slope of the tilt. Images of size 1280 × 1024 from the camera were then adjusted for tilt and cropped to 1001 × 1001 before reconstruction using either the SMaRT, GP-TIE or TIE methods. Reconstructed images, close-ups and cross sections of the phase are shown in Fig. 9, Fig. 10 and Fig. 11, respectively.

The computed phase from experimental data exhibits behavior similar to the results from the simulated data—both SMaRT and GP-TIE easily outperform single distance TIE reconstructions, but SMaRT has a slight edge in noise performance compared to GP-TIE. The GP-TIE reconstruction appears to be more sensitive to low frequency noise, as seen in Fig. 9. Furthermore, Fig. 10 shows SMaRT reconstructing the flat areas of the phase target with less noise than GP-TIE; the high frequency center area of the target is also cleaner in the SMaRT result.

## 5. Discussion

We have demonstrated that a SMaRT joint optimization approach can reduce reconstruction error in self-interferometric phase imaging by designing the measurement process jointly with the reconstruction process. Our approach leverages additional information, such as the estimated spatial spectral distribution of the specimen class being imaged as well as the spectrum of the polychromatic illumination source, to make an informed decision about how to allocate imaging exposure times, resulting in better noise performance under a fixed photon budget compared to previous techniques.

While we employed a piecewise-constant phase model as a prior, this prior serves only as a guide for estimating the spectral signal-to-noise ratio and does not force the reconstructed phase images to appear piecewise-constant, unlike many other nonlinear methods. Since we do retrieve the real and imaginary parts of the specimen’s transmission profile, our method can be readily adapted to image other weakly-scattering specimens with a different prior, for example replacing standard CTF methods in electron microscopy or X-ray phase imaging, where there may be spatially-varying absorption in addition to optical thickness variations. One can also potentially build a collection of spectral covariance matrices suitable for various imaging scenarios, with more accurate priors when more information is known about the specimen to be imaged.

Although we used a translation stage to image different axial positions, movement can be minimized via judicious use of liquid lenses. Furthermore, while we assumed for simplicity that the objective–tube lens system is diffraction limited, it is trivial to incorporate information about shift-invariant aberrations such as spherical aberration into the forward operator. In general, the resolution of phase images generated by our method is primarily limited by the numerical aperture of the objective lens and any unaccounted optical aberrations in the system.

One limitation is that our method cannot be directly applied to arbitrary phase specimens, as large phase differences violate the small-perturbation assumption that we use in estimating noise and modeling the propagated intensity. Automatic differentiation machinery commonly used for deep neural networks, as well as suitable application of transmission cross coefficients from the nonlinear electron microscopy literature, can potentially extend our approach to strong phase objects in the future. Furthermore, we could also directly leverage deep neural networks, converting our model-driven framework to a data-driven one and solving Eq. (1) using stochastic gradient descent methods.

## Appendix A. Optimal exposure allocation subproblem

We give a solution for problems of the form

$$\underset{{v}_{1},\ldots ,{v}_{L}}{\mathrm{minimize}}\ {\sum}_{k}\frac{{\gamma}_{k}}{{v}_{k}}\quad \text{subject to}\quad {\sum}_{k}{v}_{k}\le {v}_{0},\ {v}_{k}\ge 0,\tag{43}$$

where the *γ _{k}* are all positive. First we note that the objective function and constraints are all convex. Furthermore, any solution to the problem must satisfy the Karush–Kuhn–Tucker (KKT) conditions [38, 39]. The stationarity condition requires that the optimal ${v}_{k}=\frac{1}{u}\sqrt{{\gamma}_{k}}$ for some constant *u* ≥ 0. *u* cannot be zero because that would make the *v _{k}* infinite and violate primal feasibility. Thus, complementary slackness requires ${\sum}_{k}{v}_{k}={v}_{0}$, and therefore the optimal ${v}_{k}={v}_{0}\sqrt{{\gamma}_{k}}/{\sum}_{l}\sqrt{{\gamma}_{l}}$.

For our specific problem in Eq. (36), we can write the subproblem in the form of Eq. (43) by setting ${\gamma}_{k}={\mathcal{C}}_{2}{\Vert {\mathit{x}}_{k}\Vert}_{2}^{2}/\left({\mathcal{C}}_{1}^{2}{I}_{0}\right)$, ${v}_{k}={\tau}_{k}$ and ${v}_{0}={\tau}_{0}$. This yields an optimal value of $\gamma {\left({\sum}_{l=1}^{L}{\Vert {\mathit{x}}_{l}\Vert}_{2}\right)}^{2}$, where $\gamma ={\mathcal{C}}_{2}/\left({\mathcal{C}}_{1}^{2}{I}_{0}{\tau}_{0}\right)$. The *l*th optimal exposure time can be obtained via ${\tau}_{l}={\tau}_{0}{\Vert {\mathit{x}}_{l}\Vert}_{2}/\left({\sum}_{{l}^{\prime}=1}^{L}{\Vert {\mathit{x}}_{{l}^{\prime}}\Vert}_{2}\right)$.
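A quick numerical illustration of this closed form, using made-up norms ‖*x _{l}*‖_{2} and the τ_{0} = 3 s budget from Section 4: the proportional allocation attains the stated optimal value and beats a uniform split.

```python
def allocate_exposures(norms, tau0):
    """Optimal split: tau_l proportional to ||x_l||_2, summing to tau0."""
    total = sum(norms)
    return [tau0 * n / total for n in norms]

def expected_error(norms, taus, c=1.0):
    """Objective sum_l gamma_l / tau_l with gamma_l = c*||x_l||_2^2,
    i.e., the noise variance accumulated across the L images."""
    return sum(c * n * n / t for n, t in zip(norms, taus))

norms = [4.0, 2.0, 1.0, 1.0]   # made-up values of ||x_l||_2
tau0 = 3.0                     # total exposure budget in seconds
opt = allocate_exposures(norms, tau0)       # [1.5, 0.75, 0.375, 0.375]
uniform = [tau0 / len(norms)] * len(norms)
closed_form = sum(norms) ** 2 / tau0        # gamma*(sum of norms)^2 with c=1
```

Positions with larger ‖*x _{l}*‖_{2} (stronger signal content) receive proportionally longer exposures.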

## Appendix B. Spectral covariance of a piecewise-constant vector function

We consider an ensemble of piecewise-constant two-dimensional real vector-valued functions **p**(*x*, *y*) where the values **v** for individual pieces of this function are independent and identically distributed (i.i.d.) according to some continuous statistical distribution with density function *f*_{𝒱}(**v**). We define lines through this function using parameters *s* and *θ* to specify the line and parameter *r* to specify position on this line, *i.e.*, **l**(*r*; *s*, *θ*) = **p**(*x*, *y*), where (*x*, *y*) is the point selected by the line parameters (*s*, *θ*) and the position *r* along that line. For **l**(*r*; *s*, *θ*), we assume the probability that **l**(*r* + *ρ*; *s*, *θ*) is still in the same “piece” and hence has the same value is equal to exp(−*ρ*/*Δ*). Since the pieces of this function are i.i.d., the spectral covariance, *i.e.*, the Fourier transform of the cross-correlation, can be computed via the Hankel transform of the exponential, yielding a radial profile proportional to ${\mathrm{\Delta}}^{2}{\left(1+{\left(2\pi \mathrm{\Delta}\nu \right)}^{2}\right)}^{-3/2}$.
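The underlying Hankel-transform identity, ∫₀^∞ e^{−ρ/Δ} J₀(2πνρ) ρ dρ = Δ²(1 + (2πΔν)²)^{−3/2}, can be checked with pure-Python quadrature (grid sizes here are arbitrary accuracy choices):

```python
import math

def j0(x, n=160):
    """Bessel J0 via (1/pi) * integral_0^pi cos(x sin t) dt (trapezoid)."""
    h = math.pi / n
    total = 0.5 * (math.cos(x * math.sin(0.0)) + math.cos(x * math.sin(math.pi)))
    for k in range(1, n):
        total += math.cos(x * math.sin(k * h))
    return total * h / math.pi

def hankel_of_exponential(delta, nu, rmax_folds=30, n=4000):
    """Trapezoid quadrature of integral_0^rmax exp(-r/delta) J0(2 pi nu r) r dr."""
    rmax = rmax_folds * delta
    h = rmax / n
    total = 0.0  # integrand is zero at r = 0 and ~exp(-30) at rmax
    for k in range(1, n):
        r = k * h
        total += math.exp(-r / delta) * j0(2.0 * math.pi * nu * r) * r
    return total * h

delta, nu = 1.0, 0.2
numeric = hankel_of_exponential(delta, nu)
closed = delta ** 2 * (1.0 + (2.0 * math.pi * delta * nu) ** 2) ** -1.5
```

The numerical and closed-form values agree to roughly three decimal places, confirming the 3/2-power falloff of the spectral covariance with frequency.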

## Appendix C. On the squared sum of norms penalty term

We consider a problem of the form

$$\underset{\mathit{x}}{\mathrm{minimize}}\ \frac{1}{2}{\Vert \mathit{A}\mathit{x}-\mathit{b}\Vert}_{2}^{2}+\frac{1}{2}\,\mathrm{max}{\left\{0,{\sum}_{i=1}^{n}{w}_{i}{\Vert {\mathit{x}}_{i}\Vert}_{2}\right\}}^{2},\tag{50}$$

where the *w _{i}* > 0 are weights, and *x*_{1}, . . ., *x _{n}* are *m*-length non-overlapping partitions of **x** ∈ ℂ^{nm}. Without loss of generality, we can assume that **x** consists of the *x*_{1}, . . ., *x _{n}* vertically concatenated. Let us denote the least squares term *f*(**x**) and the penalty term *g*(**x**).

We note that *g*(**x**) is a convex function of **x**, since it is the composition of a nondecreasing convex function *s*(*ν*) = max{0, *ν*}^{2} and a convex function, the weighted sum of *ℓ*_{2} norms. Since *f*(**x**) is also a convex function, the above problem can be solved by convex optimization. While *f*(**x**) is also smooth, *g*(**x**) is *not* smooth, *e.g.*, when one or more of the *x _{i}* are zero. This suggests the use of a proximal gradient method [27–29] to solve Eq. (50), and thus we require the proximal operator for *g*(**x**), defined as:

$${\mathrm{prox}}_{\alpha g}\left(\mathit{x}\right)=\underset{\mathit{y}}{\mathrm{arg}\,\mathrm{min}}\ \frac{1}{2}{\Vert \mathit{y}-\mathit{x}\Vert}_{2}^{2}+\alpha g\left(\mathit{y}\right),\tag{51}$$

where *α* > 0, and *y*_{1}, . . ., *y _{n}* form the same partition of **y** as the *x*_{1}, . . ., *x _{n}* do of **x**. We note the following property of the optimal **y** and hence the *y _{i}*s:

**Theorem 1.** *The optimal y _{i} for Eq. (51) are of the form y _{i}* = *β _{i}x _{i}*, *where β _{i}* ∈ [0, 1].

*Proof.* Since the expression in Eq. (51) is also the dual of a convex function, strong duality implies that there exist *∊*_{1}, . . ., *∊ _{n}* ≥ 0 such that the optimal **y** is also a solution to the following constrained optimization problem:

$$\underset{\mathit{y}}{\mathrm{minimize}}\ \frac{1}{2}{\Vert \mathit{y}-\mathit{x}\Vert}_{2}^{2}\quad \text{subject to}\quad {\Vert {\mathit{y}}_{i}\Vert}_{2}\le {\u025B}_{i},\ i=1,\ldots ,n.$$

Since *s*(*ν*) = max{0, *ν*}^{2} is nondecreasing, the above problem separates over the partitions, and each *y _{i}* can be solved separately via:

$${\mathit{y}}_{i}=\mathrm{min}\left\{1,{\u025B}_{i}/{\Vert {\mathit{x}}_{i}\Vert}_{2}\right\}{\mathit{x}}_{i},$$

which is of the stated form with *β _{i}* ∈ [0, 1]. □

Therefore, in order to solve Eq. (51), we can instead solve the following:

$$\underset{{\beta}_{1},\ldots ,{\beta}_{n}\in \left[0,1\right]}{\mathrm{minimize}}\ \frac{1}{2}{\sum}_{i}{\left(1-{\beta}_{i}\right)}^{2}{\Vert {\mathit{x}}_{i}\Vert}_{2}^{2}+\frac{\alpha}{2}{\left({\sum}_{i}{w}_{i}{\beta}_{i}{\Vert {\mathit{x}}_{i}\Vert}_{2}\right)}^{2}.$$

Define *γ*_{1}, . . ., *γ _{n}* to be elements of vector **γ** where *γ _{i}* = *β _{i}*‖*x _{i}*‖_{2}, and likewise *ξ*_{1}, . . ., *ξ _{n}* to be elements of vector **ξ** where *ξ _{i}* = ‖*x _{i}*‖_{2}. Hence, the above problem is equivalent to the following nonnegative least squares formulation:

$$\underset{\mathit{\gamma}\succcurlyeq \mathbf{0}}{\mathrm{minimize}}\ \frac{1}{2}{\Vert \mathit{\gamma}-\mathit{\xi}\Vert}_{2}^{2}+\frac{\alpha}{2}{\left({\mathit{w}}^{\top}\mathit{\gamma}\right)}^{2}.\tag{56}$$

Once the optimal **γ** are obtained, we can then compute the optimal *y _{i}*s. While any nonnegative least squares algorithm can be used to solve Eq. (56), we note the following property about our specific problem:

_{i}**Lemma 1.** *Let* *γ̂**denote the nonzero components of the optimal* *γ**for* *Eq.* (56)*, and let* *ξ̂**and* *ŵ**denote the corresponding elements of* *ξ**and* *w**, respectively. Then, the* *γ̂**must satisfy:*

*Proof.*Since the zero-valued entries of

**annihilate corresponding components of the expression to be minimized in Eq. (56), we can reduce that expression to**

*γ***=**

*W***+**

*I**α*

*ŵŵ*^{⊤}. Since

*α*> 0,

**is invertible, and thus**

*W**l*(

**) has minimizer**

*γ̂*

*ξ̂*^{⊤}

**.**

*γ̂*We now consider the following theorem regarding the solution to Eq. (56):

**Theorem 2.** *Consider a vector* **γ** ≽ **0***; let 𝒫* ⊆ {1, . . ., *n*} *denote the indexes of entries that are positive. Then,* **γ** *is a solution to Eq. (56) if and only if the following statements are true:*

$${\gamma}_{i}={\xi}_{i}-{s}_{\star}{w}_{i}\quad \text{for all}\ i\in \mathcal{P},\tag{61}$$

$${\xi}_{i}\le {s}_{\star}{w}_{i}\quad \text{for all}\ i\notin \mathcal{P},\tag{62}$$

*where s*_{★} *is as it is defined in Lemma 1, computed over the entries indexed by 𝒫.*

*Proof.* By inspection, we note that *α***ŵ**^{⊤}**γ̂** = *α***w**^{⊤}**γ** = *s*_{★}, based on the definition of *s*_{★} and the structure of **γ**. Thus, the first and second statements are equivalent to

$$\hat{\mathit{\gamma}}=\hat{\mathit{\xi}}-{s}_{\star}\hat{\mathit{w}}\quad \text{and}\quad \bar{\mathit{\xi}}-{s}_{\star}\bar{\mathit{w}}\preccurlyeq \mathbf{0},$$

where **ξ̄** and **w̄** are entries of **ξ** and **w**, respectively, that correspond to the zero-valued entries of **γ**. Thus, both statements being true is equivalent to:

$$\mathit{\gamma}-\mathit{\xi}+\alpha \left({\mathit{w}}^{\top}\mathit{\gamma}\right)\mathit{w}=\mathit{u},$$

where entries *u _{i}* of **u** are zero for *i* ∈ 𝒫 and nonnegative for *i* ∉ 𝒫. This equation itself is equivalent to the stationarity condition of the Karush-Kuhn-Tucker (KKT) conditions, and the structure of **u** is equivalent to the combination of complementary slackness and dual feasibility. Primal feasibility is satisfied by the nonnegativity of **γ** assumed by the theorem. Thus, satisfying the two statements of the theorem under the assumed conditions is equivalent to the KKT conditions. Since the primal problem exhibits strong duality, **γ** is a solution to Eq. (56) if and only if it satisfies the two statements.

Without loss of generality, let us assume that

$$\frac{{\xi}_{1}}{{w}_{1}}\ge \frac{{\xi}_{2}}{{w}_{2}}\ge \cdots \ge \frac{{\xi}_{n}}{{w}_{n}},\tag{65}$$

since we can sort and unsort the entries as needed if this assumption doesn’t hold. Under this assumption, we have the following corollary:

**Corollary 1.** *If Eq. (65) holds and* **γ** *is a solution to Eq. (56), then 𝒫* = {1, . . ., *n*_{★}} *for some* 0 ≤ *n*_{★} ≤ *n, with n*_{★} = 0 *if and only if* **ξ** = **0**.

*Proof.* *𝒫* is a contiguous range of indexes via a combination of Eq. (62) and Eq. (65). If *n*_{★} is zero, then *𝒫* = ∅ and hence **0** ≽ **ξ**. Since the elements of **ξ** are norms, the only possible conclusion is that **ξ** = **0**. Now, suppose **ξ** = **0**; then, since *s*_{★}*w _{i}* ≥ 0 by construction, no *γ _{i}* > 0 can exist via Eq. (61), and thus *𝒫* = ∅.

With these observations, we propose Algorithm 1 for computing Eq. (51). This algorithm has asymptotic complexity equal to *𝒪*(*mn* + *n*^{2}), where the *𝒪*(*mn*) part is due to computing the *ℓ*_{2} norms in step 3 and rescaling in step 18, whereas the *𝒪*(*n*^{2}) part is due to the while loop of at most *n* iterations wherein each iteration requires inner products of vectors of at most length *n*. Sorting the elements only takes *𝒪*(*n* log *n*) time and is dominated by the other operations.
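Algorithm 1 itself is not reproduced in this excerpt, but its core, scanning sorted candidate supports as licensed by Theorem 2 and Corollary 1, can be sketched as follows. This sketch assumes Eq. (56) takes the form minimize ½‖**γ** − **ξ**‖₂² + (α/2)(**w**^⊤**γ**)² subject to **γ** ≽ **0**, consistent with the lemmas above:

```python
def nnls_prefix(xi, w, alpha):
    """Solve  min_{g >= 0}  0.5*||g - xi||^2 + 0.5*alpha*(w.g)^2
    by testing supports P = sorted prefixes (structure from Corollary 1)."""
    n = len(xi)
    order = sorted(range(n), key=lambda i: xi[i] / w[i], reverse=True)
    for n_star in range(n, -1, -1):
        prefix, rest = order[:n_star], order[n_star:]
        # s_star from Lemma 1 (Sherman-Morrison on W = I + alpha*w w^T).
        num = alpha * sum(w[i] * xi[i] for i in prefix)
        den = 1.0 + alpha * sum(w[i] ** 2 for i in prefix)
        s = num / den
        # Checks from Theorem 2: positive on P, dual-feasible off P.
        if (all(xi[i] - s * w[i] > 0.0 for i in prefix)
                and all(xi[i] - s * w[i] <= 1e-12 for i in rest)):
            g = [0.0] * n
            for i in prefix:
                g[i] = xi[i] - s * w[i]
            return g
    raise RuntimeError("no KKT-consistent support found")
```

Because Theorem 2 gives necessary and sufficient conditions, the first support passing both checks is globally optimal and the scan can stop immediately; each check costs an inner product, matching the 𝒪(*n*²) loop cost quoted above.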

## Funding

National Research Foundation Prime Minister’s Office Singapore (CREATE Programme Singapore-MIT Alliance for Research and Technology BioSystems and Micromechanics IRG).

## Acknowledgments

The authors would like to thank Colin Sheppard, Jon Petruccelli and Chenglong Bao for their helpful discussions.

## References

**1. **R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of the phase from image and diffraction plane pictures,” Optik **35**, 237–246 (1972).

**2. **J. R. Fienup, “Phase retrieval algorithms: A comparison,” Appl. Opt. **21**, 2758–2769 (1982).

**3. **E. J. Candès, Y. C. Eldar, T. Strohmer, and V. Voroninski, “Phase retrieval via matrix completion,” SIAM J. Imaging Sci. **6**, 199–225 (2013).

**4. **M. R. Teague, “Deterministic phase retrieval: a Green’s function solution,” J. Opt. Soc. Am. **73**, 1434–1441 (1983).

**5. **N. Streibl, “Phase imaging by the transport equation of intensity,” Opt. Commun. **49**, 6–10 (1984).

**6. **A. Barty, K. A. Nugent, D. Paganin, and A. Roberts, “Quantitative optical phase microscopy,” Opt. Lett. **23**, 817–819 (1998).

**7. **L. Waller, L. Tian, and G. Barbastathis, “Transport of intensity phase-amplitude imaging with higher order intensity derivatives,” Opt. Express **18**, 12552–12561 (2010).

**8. **J. Zhong, R. A. Claus, J. Dauwels, L. Tian, and L. Waller, “Transport of intensity phase imaging by intensity spectrum fitting of exponentially spaced defocus planes,” Opt. Express **22**, 10661–10674 (2014).

**9. **J.-P. Guigay, “Fourier transform analysis of Fresnel diffraction patterns and in-line holograms,” Optik **49**, 121–125 (1977).

**10. **Y. I. Nesterets and T. E. Gureyev, “Partially coherent contrast-transfer-function approximation,” J. Opt. Soc. Am. A **33**, 464–474 (2016).

**11. **J. C. Petruccelli, L. Tian, and G. Barbastathis, “The transport of intensity equation for optical path length recovery using partially coherent illumination,” Opt. Express **21**, 14430–14441 (2013).

**12. **T. Chakraborty and J. C. Petruccelli, “Source diversity for transport of intensity phase imaging,” Opt. Express **25**, 9122–9137 (2017).

**13. **B. Xue, S. Zheng, L. Cui, X. Bai, and F. Zhou, “Transport of intensity phase imaging from multiple intensities measured in unequally-spaced planes,” Opt. Express **19**, 20244–20250 (2011).

**14. **S. Zheng, B. Xue, W. Xue, X. Bai, and F. Zhou, “Transport of intensity phase imaging from multiple noisy intensities measured in unequally-spaced planes,” Opt. Express **20**, 972–985 (2012).

**15. **J. Sun, C. Zuo, and Q. Chen, “Iterative optimum frequency combination method for high efficiency phase imaging of absorptive objects based on phase transfer function,” Opt. Express **23**, 28031–28049 (2015).

**16. **L. Tian, J. C. Petruccelli, and G. Barbastathis, “Nonlinear diffusion regularization for transport of intensity phase imaging,” Opt. Lett. **37**, 4131–4133 (2012).

**17. **L. Tian, J. C. Petruccelli, Q. Miao, H. Kudrolli, V. Nagarkar, and G. Barbastathis, “Compressive x-ray phase tomography based on the transport of intensity equation,” Opt. Lett. **38**, 3418–3421 (2013).

**18. **E. R. Dowski Jr. and W. T. Cathey, “Extended depth of field through wave-front coding,” Appl. Opt. **34**, 1859–1866 (1995).

**19. **W. T. Cathey and E. R. Dowski, “New paradigm for imaging systems,” Appl. Opt. **41**, 6080–6092 (2002).

**20. **M. D. Stenner, A. Ashok, and M. A. Neifeld, “Multi-domain optimization for ultra-thin cameras,” in *Frontiers in Optics*, (Optical Society of America, 2006), p. FWH5.

**21. **E. J. Tremblay, J. Rutkowski, I. Tamayo, P. E. X. Silveira, R. A. Stack, R. L. Morrison, M. A. Neifeld, Y. Fainman, and J. E. Ford, “Relaxing the alignment and fabrication tolerances of thin annular folded imaging systems using wavefront coding,” Appl. Opt. **46**, 6751–6758 (2007).

**22. **J. Frank, *Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State* (Oxford University, 2006).

**23. **M. Vulović, L. M. Voortman, L. J. van Vliet, and B. Rieger, “When to use the projection assumption and the weak-phase object approximation in phase contrast cryo-EM,” Ultramicroscopy **136**, 61–66 (2014).

**24. **E. Wolf, “New theory of partial coherence in the space-frequency domain. Part I: spectra and cross spectra of steady-state sources,” J. Opt. Soc. Am. **72**, 343–351 (1982).

**25. **L. Mandel and E. Wolf, “Spectral coherence and the concept of cross-spectral purity,” J. Opt. Soc. Am. **66**, 529–535 (1976).

**26. **A. Goy, K. Arthur, S. Li, and G. Barbastathis, “Low photon count phase retrieval using deep learning,” arXiv preprint arXiv:1806.10029 (2018).

**27. **Y. Nesterov, “Gradient methods for minimizing composite objective function,” CORE Discussion Papers 2007076, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE) (2007).

**28. **A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sci. **2**, 183–202 (2009).

**29. **B. O’Donoghue and E. Candès, “Adaptive restart for accelerated gradient schemes,” Found. Comp. Math. **15**, 715–732 (2015).

**30. **C. Bao, G. Barbastathis, H. Ji, Z. Shen, and Z. Zhang, “Coherence retrieval using trace regularization,” SIAM J. Imaging Sci. **11**, 679–706 (2018).

**31. **G. J. Burton and I. R. Moorhead, “Color and spatial structure in natural scenes,” Appl. Opt. **26**, 157–170 (1987).

**32. **D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” J. Opt. Soc. Am. A **4**, 2379–2394 (1987).

**33. **D. J. Tolhurst, Y. Tadmor, and T. Chao, “Amplitude spectra of natural images,” Ophthalmic Physiol. Opt. **12**, 229–232 (1992).

**34. **A. B. Lee, D. Mumford, and J. Huang, “Occlusion models for natural images: A statistical study of a scale-invariant dead leaves model,” Int. J. Comput. Vis. **41**, 35–59 (2001).

**35. **Y. LeCun, J. S. Denker, and S. A. Solla, “Optimal brain damage,” in *Advances in Neural Information Processing Systems 2*, D. S. Touretzky, ed. (Morgan-Kaufmann, 1990), pp. 598–605.

**36. **P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, “Pruning convolutional neural networks for resource efficient inference,” in *Proceedings of the International Conference on Learning Representations (ICLR)*, (2017).

**37. **Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in *Proceedings of 27th Asilomar Conference on Signals, Systems and Computers*, (1993), Vol. 1, pp. 40–44.

**38. **W. Karush, “Minima of functions of several variables with inequalities as side constraints,” Ph.D. thesis, University of Chicago (1939).

**39. **H. W. Kuhn and A. W. Tucker, “Nonlinear programming,” in *Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability*, (University of California, 1951), pp. 481–492.