## Abstract

We propose an alternative method for solving the Transport of Intensity equation (TIE) from a stack of through–focus intensity images taken by a microscope or lensless imager. Our method enables quantitative phase and amplitude imaging with improved accuracy and reduced data capture, while also being computationally efficient and robust to noise. We use prior knowledge of how intensity varies with propagation in the spatial frequency domain in order to constrain a fitting algorithm [Gaussian process (GP) regression] for estimating the axial intensity derivative. Solving the problem in the frequency domain inspires an efficient measurement scheme which captures images at exponentially spaced focal steps, significantly reducing the number of images required. Low–frequency artifacts that plague traditional TIE methods can be suppressed without an excessive number of captured images. We validate our technique experimentally by recovering the phase of human cheek cells in a brightfield microscope.

© 2014 Optical Society of America

## 1. Introduction

Quantitative phase imaging has found useful applications in biology, surface metrology and X-ray imaging [1, 2]. Methods that use a series of through–focus intensity images (e.g., [3–9]) are especially popular due to their experimental simplicity. In–focus intensity images contain no phase information; however, defocus introduces phase contrast. In fact, any imaging system with a complex transfer function will provide some phase contrast. These images can then be inverted to recover phase and amplitude quantitatively. In defocus–based methods, the experimental procedure involves simply moving the sample (or camera) axially while capturing multiple images through–focus (see Fig. 1), or using any of the recently proposed schemes for simultaneous multi-plane capture [10–13].

Recovering phase (and amplitude) from a series of defocused images is a nonlinear problem because intensity is bilinear with phase and amplitude. One of the most successful methods for this inversion is the iterative method, which takes a nonlinear convex optimization approach [14, 15]. Though the problem is non-convex, results usually converge, especially when multiple images are used at various focal planes [5]. In another approach, where a direct solution is desired, the problem is linearized for small defocus distances using the Transport of Intensity Equation (TIE) [3, 4]. The TIE relates phase and amplitude to variations of intensity along the optical axis *z* [3]:

*I*(

*x*,

*y*, 0) is the intensity at focus,

*ϕ*(

*x*,

*y*) is the phase,

*λ*is the spectrally-weighted mean wavelength of illumination and ∇

_{⊥}denotes the gradient operator in lateral directions (

*x*,

*y*) only. Besides being simple, the TIE method achieves phase accuracy equivalent to that of interferometric methods, but it is much less sensitive to coherence in the illumination [16]. Numerical solvers require no phase unwrapping and are computationally efficient, employing FFT–based inversion solvers [17].

Numerical solutions for the TIE can be understood by considering the case of a pure–phase object, where *I*(*x*, *y*, 0) is constant [18]. In this situation, the right hand side of Eq. (1) simplifies to a second derivative (Laplacian). Phase can then be recovered from a Laplacian inversion

*u*,

*v*) is the Fourier transform of

*ϕ*(

*x*,

*y*),

*F*(

*u*,

*v*) is the Fourier transform of the measured first derivative ${\frac{\partial I(x,y,z)}{\partial z}|}_{z=0}$ scaled by 2

*π/λ*, and

*u*,

*v*are the spatial frequency variables. When

*I*(

*x*,

*y*, 0) is not constant, two Laplacian inversions are required [3]. Note that the denominator of Eq. (2) goes towards zero as the spatial frequency goes towards zero, and a small regularization parameter must be added in order to avoid division-by-zero instability. This means that the DC phase term is lost and low frequency noise is amplified, resulting in phase errors that give cloudy phase results. We discuss here strategies for mitigating these problems.

Limitations of the TIE method mainly stem from noise and nonlinearity in the intensity derivative estimate, both of which we address here. Nonlinearity error comes from the estimation of the intensity derivative,
${\frac{\partial I(x,y,z)}{\partial z}|}_{z=0}$, from through focus images. We cannot measure this linear (first) derivative directly and so it is usually estimated by finite difference methods. However, intensity is not linear through focus, due to diffraction, and so any nonlinearity in the through focus intensity corrupts the derivative estimate [6, 19]. Nonlinearity error can be removed by using higher order TIE [7], which performs polynomial fitting on each pixel’s intensity vs. *z* plot, then extracts the first order derivative. This requires multiple images through–focus, which provides some inherent noise stability; however, there is a danger of over-fitting (fitting to noise) and the optimal order to use will be phase dependent, making it a difficult parameter to optimize. The Savitzky-Golay differentiation filter (SGDF) TIE [20] was proposed to solve this trade-off between the order of fitting and noise in higher order TIE. It recovers phase images with different orders of polynomial fitting, then combines these phase images into a final reconstructed phase with band-pass filters. Limitations are that the fitting becomes unstable when the order is large [21] and the derivation assumes equally spaced focus steps, which we show here to be non-ideal.

We propose here a new TIE phase recovery method in which we also perform through–focus fitting of the data, except we do the fitting in Fourier space, rather than in real space. The advantage of this method can be understood from Fig. 2. On the left, we show a stack of through–focus intensity images, with a few example plots of the intensity vs. *z* behavior. These are the curves that would be fit to polynomials in higher order TIE [7]. It is easy to see that they do not conform to any particular functional form. On the right side of Fig. 2, we take 2D Fourier transforms of the intensity images at each of the defocus steps (showing here only the real part), then plot the intensity spectrum vs. *z* plots for a few example spatial frequencies. In contrast to the real space plots, these curves display a distinct sinusoidal behavior which can be predicted by the Contrast Transfer Function [22–24]. By using this a priori information of how light propagates in spatial frequency space, we can simultaneously deal with nonlinearity and noise in order to obtain a better derivative estimate. Here, we propose to fit the spectral data with Gaussian process (GP) regression [25]. GP regression does not require that the inputs are equally spaced and can suppress the unwanted high–frequency components in the fitted function by using the squared exponential covariance function (see details in Appendix A). The curves in Fig. 2 follow sinusoidal trajectories, but actual behavior may deviate from this model due to breaking of the small phase approximation or unmodeled coherence effects. GP regression is able to flexibly fit the data to variations in the model using the Contrast Transfer Function as a priori information to set a threshold for suppressing the unwanted high frequency components in the fitted function. In this way, the over-fitting to noise is avoided without knowing or fitting to the precise function of a sinusoid.

In studying the frequency behavior for through–focus images, we find that equally spaced defocus steps in *z* are not ideal. Previous methods generally use images equally spaced in *z*, though higher order TIE can accommodate arbitrary *z* steps [26]. In a recent study [27], an exponentially-growing measurement scheme is proposed to deal with the regularization problem. Here, we show that exponentially spaced defocus steps can provide a large reduction in the number of images required for a high quality phase result. We suggest an efficient scheme for choosing the *z* plane distances (within the limits of the motion stage), from the perspective of efficiently transferring phase information to intensity images, and show how our method can achieve high quality phase results with far fewer images than equally spaced schemes.

The rest of the paper is structured as follows. In Section 2, we apply GP regression to TIE phase recovery and propose an exponential spacing measurement scheme. In Section 3, we compare the error performance of the proposed algorithm with traditional TIE methods. In Section 4, we discuss the experimental results. We offer concluding remarks in Section 5.

## 2. Theory

#### 2.1. Intensity spectrum fitting

We consider the intensity derivative estimation problem from the spatial frequency domain. When the phase *ϕ*(*x*, *y*) is small, the variations of intensity over *z* can be approximated in frequency space by the Contrast Transfer Function [22–24]:

*δ*(

*u*,

*v*) denotes Dirac delta function, and

*ℐ*(

*u*,

*v*,

*z*),

*U*(

*u*,

*v*), and Φ(

*u*,

*v*) are Fourier transforms of

*I*(

*x*,

*y*,

*z*), $-\frac{1}{2}\text{ln}I(x,y,0)$, and

*ϕ*(

*x*,

*y*), respectively. For each spatial frequency (

*u*,

_{m}*v*),

_{n}*ℐ*(

*u*,

_{m}*v*,

_{n}*z*) follows a sinusoidal pattern that oscillates with frequency $\pi \lambda ({u}_{m}^{2}+{v}_{n}^{2})$. Thus, instead of fitting the intensity curves in real space to polynomials (as in higher order TIE [7]), we use the sinusoids in the intensity spectra through–focus as prior in the fitting.

#### 2.2. TIE using Gaussian process regression

To fit the data to our model, we employ Gaussian process regression. We estimate the derivative of the intensity spectrum with respect to defocus
${\frac{\partial \mathcal{I}(u,v,z)}{\partial z}|}_{z=0}$ from *ℐ*(*u*, *v*, *z*_{1})..., *ℐ*(*u*, *v*, *z _{N}*), which are the 2D Fourier transforms of the measured intensity images. For each spatial frequency (

*u*,

_{m}*v*), ${\frac{\partial \mathcal{I}({u}_{m},{v}_{n},z)}{\partial z}|}_{z=0}$ can be obtained from the continuous function fitted from the discrete data points

_{n}*ℐ*(

*u*,

_{m}*v*,

_{n}*z*

_{1}),...,

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*).

_{N}GP regression is a suitable fitting method that allows us to fit the discrete points *ℐ*(*u _{m}*,

*v*,

_{n}*z*

_{1}),...,

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*) to a continuous sinusoidal function (see details in Appendix A). In the fitted function, frequency components of the axial oscillation that are higher than a threshold

_{N}*s*can be suppressed by initializing appropriate hyper-parameters in the regression (see details in Appendix B). From the Contrast Transfer Function (Eq. (3)),

_{c}*ℐ*(

*u*,

_{m}*v*,

_{n}*z*), as a function of

*z*, does not contain frequency components higher than $\pi \lambda ({u}_{m}^{2}+{v}_{n}^{2})$. Therefore, the threshold can be set just above $\pi \lambda ({u}_{m}^{2}+{v}_{n}^{2})$ in order to eliminate high–frequency noise while fitting over the inputs

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*

_{1}),...,

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*). Although the Contrast Transfer Function is derived under the weak phase assumption, it only serves as a guideline to pick a frequency threshold, above which variations are treated as noise. Thus, large phase variations can still be recovered, beyond the small phase approximation. Furthermore, by using GP regression, other prior knowledge about the sinusoidal pattern could be incorporated into the regression -for example, coherence effects of the illumination could be modeled in the Contrast Transfer Function as a product term, since reduced spatial coherence blurs defocused images and thus reduces the amplitude of the oscillations away from the focus plane.

_{N}The resulting algorithm of GP TIE is summarized in Table 1. The scaling factor *γ* in Step (2) of the table allows a trade-off between accuracy and noise filtering. Since the computational complexity of GP regression is proportional to the cubic of the number of measurements [25], the algorithm has a computational complexity of *𝒪*(*N*(*N _{z}*)

^{3}), where

*N*is the number of pixels in one image, and

*N*is the number of measurements. This algorithm is very amenable to parallel processing and can be implemented with Graphics Processing Units (GPUs) for near real-time performance, in conjunction with GPU–based TIE solvers [28]. Further, GP regression does not require the measuring positions

_{z}*z*

_{1},

*z*

_{2},...,

*z*to be equally spaced, so any set of measurement positions along the axial dimension can be used as input. We show in the next session why this is not only convenient, but also leads to better phase recovery.

_{N}#### 2.3. Exponential spacing measurement scheme

In looking at this problem from Fourier space, we see that equally spaced *z* steps are not the ideal measurement scheme. The choice of *z* distances is crucial, since it defines how much phase information from the object is transferred into intensity contrast in the defocused images [29]. Each *z* distance measurement can be thought of as having its own phase to intensity transfer function, with its own set of preferred spatial frequencies. The low–frequency information of phase is particularly poorly represented, as seen in the Laplacian inversion of Eq. (2). To better capture this low frequency phase information, images with large *z* are required, ideally out to the maximum range of the axial motion stage. However, the high–frequency components favor a small *z* and are important for recovering fine detail in the phase result. We would like to capture images *both* at the minimum *z* possible as well as at the maximum *z* possible. For linearly spaced schemes, this will require excessive data capture in between. To avoid this, we introduce an unequally spaced measurement scheme that balances these concerns. Since diffraction goes as (Δ*x*)^{2}/*λz* [18], with Δ*x* relating to the object size, we expect the ideal spacing to follow a nonlinear trajectory.

To derive an ideal *z* sampling scheme, we start with Eq. (3) and extract the phase transfer function, *g*(*u*, *v*, *z*):

*z*, the phase transfer function is a sine function of

*πλ*(

*u*

^{2}+

*v*

^{2}) that will be maximized for particular spatial frequencies, as shown in Fig. 3. Since each object contains a different mix of spatial frequencies in its phase information, the theoretically optimal measurement planes are object-dependent [29], so it is not possible to determine them without knowing the object itself. Thus, we aim not for the absolute optimal set of measurements, but rather for the optimal coverage of sensitivity across all spatial frequencies that pass through the imaging system, treating each equally.

Consider the goal of selecting a set of *z* planes such that the maximum phase measurement sensitivity (set by the value of the transfer function) is at least *α* for the largest range of spatial frequencies possible. In the following, we demonstrate an exponential spacing measurement scheme that achieves this goal with the least number of measurement required. The highest frequency that can be recovered is set by the diffraction limit as NA/*λ*, where NA is the numerical aperture of the imaging system. This will define the minimum defocus distance from which we can capture relevant information. We define *f* = *πλ*(*u*^{2} + *v*^{2}) and
$f\le \pi \lambda {\left(\frac{\mathit{NA}}{\lambda}\right)}^{2}$. First, we select the minimum defocus distance *z*_{1} such that the sensitivity *g*(*u*, *v*, *z*_{1}) is *α* at frequencies corresponding to the maximum frequency allowed
${f}_{1}=\pi \lambda {\left(\frac{\mathit{NA}}{\lambda}\right)}^{2}$, the solution of which is
${z}_{1}=\frac{\pi -\text{arcsin}(\alpha )}{\pi {(\mathit{NA})}^{2}}\lambda $. Defocus steps smaller than *z*_{1} will not provide useful information and are thus unnecessary, though in practice the minimum *z* step size may be set by the axial motion stage. As shown by the blue curve in Fig. 3, the sensitivity remains above *α* until
${f}_{2}=\frac{\text{arcsin}(\alpha )\pi {(\mathit{NA})}^{2}}{\lambda [\pi -\text{arcsin}(\alpha )]}$. Next, we select the second defocus distance *z*_{2} = *βz*_{1}, where
$\beta =\frac{\pi -\text{arcsin}(\alpha )}{\text{arcsin}(\alpha )}>1$, such that the sensitivity *g*(*u*, *v*, *z*_{2}) is *α* at *f*_{2}, and will remain at least *α* for a range of frequencies, as illustrated by the green curve in Fig. 3. By induction, the optimal measurement scheme that satisfies a minimum phase measurement sensitivity *α* should satisfy the following exponential relation for the defocus distances

*z*can be reached with far fewer measurements as compared to the equal-spacing measurement schemes. Large

*z*images are crucial for transferring low–frequency phase information, and the exponential spacing scheme enables us to reach this without taking an excessive number of measurements or trading off high–frequency information. The larger the maximum

*z*, the better the low–frequency result, and we desire to keep the minimum

*z*as small as possible in order to maintain diffraction–limited phase resolution, both extremes being limited by the motion stage axial range. Accuracy can be traded off against number of images through the choice of

*α*. Larger values results in more accurate phase retrieval, at the cost of requiring more images.

## 3. Simulations

We compare the performance of various phase recovery methods that use equal spacing and exponential spacing. In the simulation, the illumination wavelength is set as 632.8*nm*, and each image has 100 × 100 pixels (pixel size 2*μm* × 2*μm*). **Equally Spaced Stack** has 9 intensity images simulated with a constant defocus step size of 5*μm*. **Exponentially Spaced Stack** also has 9 images; however, they are exponentially spaced, at *z* positions around focus of ±5*μm*, ±20*μm*, ±80*μm* and ±320*μm* (see Fig. 4). Although both data sets use the same number of images, the exponentially spaced data set contains more information about the low–frequency phase information because it has a higher *z* range.

In order to assess the error performance, the intensity images of the equally and exponentially spaced stacks are corrupted by white noise with SNR ranging from 18.5 to 8 *dB* (noise variance from 0 to 0.02). Fig. 5 shows the average mean square error (MSE) of the recovered phase over 50 trials as SNR decreases. For the Equally Spaced Stack, higher order TIE performs significantly worse than SGDF TIE, and GP TIE is slightly better than SGDF TIE. For the Exponentially Spaced Stack, we show the results for two possible choices of Higher order TIE: *m* = 9, which performs better in low noise, and *m* = 5, which performs better in high noise. GP TIE with exponential spacing clearly exhibits the lowest MSE. This can be explained by the fact that the exponentially spaced data contains more low–frequency phase content in the measurements than the equally spaced data, and there is no trade–off between noise and nonlinearity. Figure 6 gives an example of the recovered phase at SNR of 11.1*dB*.

## 4. Experimental results

We tested our method experimentally in a microscope (magnification 20×, NA=0.5) with filtered white light illumination (center wavelength 650*nm*, 10*nm* bandwidth). **Data Set 1** comprises 129 images of human cheek cells, equally spaced by a constant small step size *dz* = 4*μm* over a large defocus range [*−*252*μm* to 252*μm*] (Fig. 8(a)). In Fig. 7, the GP fitted intensity spatial frequency variations along the propagation direction *z* for 3 different (*u*, *v*) values are shown. Both the measured and fitted curves follow nearly sinusoidal patterns, as predicted by Eq. (3). Note that the delay of each sinusoid is dependent on the absorption of the object at the corresponding spatial frequency. When the frequency (*u*, *v*) is high, the intensity spectrum variations diminish for large defocus distance (see plot of (*u*_{60}, *v*_{60}) in Fig. 7). This is due to the effect of partially coherent illumination, which will be the subject of future work.

With our exponentially spaced measurement scheme, GP TIE requires fewer images to be captured in order to obtain a high quality phase result. To demonstrate this, we compare the recovered phase by GP TIE from different subsets of Data Set 1 (Fig. 8) by sampling the image stack along *z* using various strategies. Figure 8(b) uses data subsets with equally spaced z–planes. First, we show the best possible phase result, when all 129 images are used. With equally spaced planes, we have two possibilities for reducing the number of images: we can either reduce the defocus range (keeping the step size small) or increase the step size (reducing the defocus range). If we reduce the defocus range, the recovered phase becomes susceptible to low–frequency noise (see the middle of Fig. 8(b)). This is due to the fact that the low–frequency information of phase is not well captured at small defocus distances. If we instead increase the step size (keeping the defocus range large), high-frequency components are lost due to nonlinearity (see the right of Fig. 8(b)). In order to accurately capture both high and low–frequency information with the same reduced number of images, we need nonlinearly spaced measurements. Near the focus, the step size should be small, yet a large focus range can still be covered with only a few measurements in our exponentially spaced scheme. In Fig. 8(c), we extract images from Data Set 1 according to the exponential spacing scheme described in Section 2.3. As can be seen from Fig. 8(c), the phase results are free of low-frequency noise and also have high resolution. Even after further reducing the dataset to only 9 exponentially spaced images, we obtain a similar result to that with all 129 images. Thus, we have reduced the data capture requirement by more than an order of magnitude, without sacrificing quality.

Having shown that exponential spacing is advantageous, we now compare GP TIE with Higher order TIE, both of which are capable of using unequally spaced data. **Data Set 2** comprises 9 images of human cheek cells, exponentially spaced about focus at ±5.7*μm*, ±11.4*μm*, ±22.8*μm*, and ±45.6*μm* (Fig. 9(a)). The imaging system has 10× magnification and NA of 0.5. Each image has 350 × 360 pixels of size 0.62*μm* × 0.62*μm*. Figure 9(b) shows the recovered phase of Data Set 2 obtained by Higher order TIE and GP TIE. We show the results for higher order TIE with the order of polynomial fitting *m* equal to 2, 3, and 4. The phase images of *m* = 2 and 3 have small low–frequency noise but appear blurred. The phase of *m* = 4 has strong contrast in some regions, but contains low–frequency noise. In contrast, GP TIE does not suffer from this tradeoff, so the phase recovered has less low–frequency noise and also exhibits high contrast. We can see details inside cells in the phase image recovered by GP TIE.

## 5. Conclusions

In this paper, we proposed a TIE phase recovery method that exploits prior knowledge of spatial frequency variations of the intensity in order to yield more accurate phase images from exponentially spaced images through–focus. Our proposed method estimates the intensity derivative by means of GP regression. It is robust and stable with noise, while incorporating the a priori knowledge of the sinusoidal patterns predicted when fitting spectrum variations over *z*. We derived an exponentially spaced measurement scheme which guarantees a minimum phase measurement sensitivity across the full range of available spatial frequencies with the least possible number of images required. In future work, we will extend the current technique to add more a priori information, such as partial coherence of the illumination and large phase effects. With the freedom to choose nonlinearly spaced measurement positions, we can develop new strategies to optimize measurements if any prior knowledge of the phase spectrum is known.

It should be noted that improving the TIE result by incorporating prior knowledge has been previously considered in [30–32]. The difference is that the previous approach considers priors over the phase [30, 31] or refractive index distribution [32], whereas here we use a prior on intensity spectral evolution to improve the intensity derivative estimate. We expect the approach considered here should have wide application as it not only can be directly apply to the special cases in [30–32], but also to more general situations without constraints on the phase distribution. If any reader is interested in this algorithm, open source code can be obtained by emailing the authors or visiting the website either www.laurawaller.com or www.dauwels.com.

## Appendix A: review of Gaussian process regression

We review the basics of Gaussian process regression [25]. Consider the problem of 2-D regression: given input/output pairs (*z _{n}*,

*f*), where

_{n}*n*= 1,...,

*N*, we would like to estimate

*f*(

*z*) at arbitrary position

*z*. Under the Gaussian process assumption, the outputs

*f*are drawn from the zero-mean Gaussian distribution with the covariance as a function of

_{n}*z*:

_{n}**K**(

**Z**,

**Z**) is the covariance matrix of the outputs given the input set

**Z**, and

*σ*is the variance of additive Gaussian noise in outputs. Generally, the squared exponential covariance function is used to model the covariance matrix:

_{n}**K**

*is the element at coordinate (*

_{ij}*i*,

*j*) of the matrix

**K**(

**Z**,

**Z**). The parameters

*σ*,

_{f}*ℓ*, and

*σ*in Eq. (6) are defined as the hyper-parameters of the GP model. We can write the joint distribution of the observed input/output pairs with the unknown value of

_{n}*f*(

*z*) at

*z*as:

**f**= [

*f*

_{1},

*f*

_{2},...,

*f*]

_{N}*. The conditional distribution of the unknown output*

^{T}*f*(

*z*) at

*z*is calculated as:

**h**(

*z*):

*f̄*(

*z*) can be understood as a weighted combination of the shifted equivalent kernel

*h*(

*z*) [25, 33]. In [33], the Fourier transform of the equivalent kernel

*h*(

*z*) for the squared exponential covariance function is given as:

*ρ*is the average number of observations per unit (for example length). When

*b*is small,

*h̃*(

_{SE}**s**) is approximated by a step function. The rapid change from 1 to 0 happens at the point when Therefore, the frequency components above the threshold

*s*in the fitted function

_{c}*f̄*(

*z*) are suppressed. From Eq. (15), we can set the desired threshold

*s*by changing the hyper-parameters. The property of suppressing the unwanted high frequency components in the fitted function is applied in the regression of the intensity images in TIE.

_{c}## Appendix B: derivation of GP TIE

We have a 3D stack *ℐ*(*u*, *v*, *z*_{1}),..., *ℐ*(*u*, *v*, *z _{N}*), which are the 2D Fourier transforms of the measured intensity images. Our goal is to use regression to estimate the first derivative of the intensity spectrum at

*z*= 0, ${\frac{\partial \mathcal{I}(u,v,z)}{\partial z}|}_{z=0}$. Instead of doing 3D regression, we perform GP regression for each lateral spatial frequency (

*u*,

_{m}*v*) on

_{n}*N*data points

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*

_{1}),

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*

_{2}),...,

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*). It is easy to observe that

_{N}*ℐ*(

*u*,

_{m}*v*,

_{n}*z*) is taken from

*ℐ*(

*u*,

*v*,

*z*) at the same frequency coordinates (

*u*,

_{m}*v*). From Eq. (3),

_{n}*ℐ*(

*u*,

_{m}*v*,

_{n}*z*) follows a sinusoidal pattern with frequency $\pi \lambda ({u}_{m}^{2}+{v}_{n}^{2})$. The sinusoids prior is incorporated into the GP regression by setting the frequency threshold

*s*as $\pi \lambda ({u}_{m}^{2}+{v}_{n}^{2})$. This is realized by setting appropriate hyper-parameters

_{c}*σ*,

_{f}*σ*, and

_{n}*ℓ*in the regression. The hyper-parameters

*σ*and

_{f}*σ*are initialized to keep

_{n}*b*in Eq. (14) small. Next, the parameter

*ℓ*is solved from Eq. (15) with

*s*and

_{c}*σ*,

_{f}*σ*already known. The frequency threshold

_{n}*s*can be larger than $\pi \lambda ({u}_{m}^{2}+{v}_{n}^{2})$ to allow trade-offs between accuracy and noise filtering.

_{c}Define
${\frac{\partial \mathcal{I}({u}_{m},{v}_{n},z)}{\partial z}|}_{z=0}$ as the element of
${\frac{\partial \mathcal{I}(u,v,z)}{\partial z}|}_{z=0}$ at the coordinate (*u _{m}*,

*v*). From Eqs. (10)(12), the function

_{n}*ℐ̄*(

*u*,

_{m}*v*,

_{n}*z*) fitted by GP regression is expressed as:

**I**

*= [*

_{mn}*ℐ*(

*u*,

_{m}*v*,

_{n}*z*

_{1}),

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*

_{2}),...,

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*)]

_{N}*and*

^{T}**h**(

*u*,

_{m}*v*,

_{n}*z*) can be obtained from Eq. (12). The fitted function

*ℐ̄*(

*u*,

_{m}*v*,

_{n}*z*) is a function of the single variable

*z*. Therefore, its first derivative at

*z*= 0, ${\frac{\partial \mathcal{I}({u}_{m},{v}_{n},z)}{\partial z}|}_{z=0}$, is approximated by:

**I**

*is known from the measurements, and the vector ${\frac{\partial \mathbf{h}{({u}_{m},{v}_{n},z)}^{T}}{\partial z}|}_{z=0}$ can be derived from the GP regression with hyper-parameters already known. Therefore, ${\frac{\partial \mathcal{I}({u}_{m},{v}_{n},z)}{\partial z}|}_{z=0}$ is estimated by performing GP regression over data points*

_{mn}*ℐ*(

*u*,

_{m}*v*,

_{n}*z*

_{1}),

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*

_{2}),...,

*ℐ*(

*u*,

_{m}*v*,

_{n}*z*).

_{N}By repeating the same process, we can obtain all of the frequency components for the first derivative of intensity spectrum at focus
${\frac{\partial \mathcal{I}(u,v,z)}{\partial z}|}_{z=0}$, then the phase is recovered with the Laplacian inversion in Eq. (1)(2). Because the computational complexity of GP regression is proportional to the cubic of the number of measurements [25], the algorithm has a computational complexity of *𝒪*(*N*(*N _{z}*)

^{3}), where

*N*is the number of pixels in one image, and

*N*is the number of measurements. In order to save computational time, the frequency components which have similar ${u}_{m}^{2}+{v}_{n}^{2}$ values can share the same hyper-parameters and hence the same ${\frac{\partial \mathbf{h}{({u}_{m},{v}_{n},z)}^{T}}{\partial z}|}_{z=0}$.

_{z}## Acknowledgments

The authors would like to thank Sijia Liu and Jingyan Wang for helping with experiments, and Aamod Shanker, Daniel Shuldman, and Andrew R. Neureuther for helpful discussions.

## References and links

**1. **G. Popescu, *Quantitative phase imaging of cells and tissues* (McGraw-HillNew York, 2011).

**2. **K. Nugent, D. Paganin, and T. Gureyev, “A phase odyssey,” Physics Today **54**, 27–32 (2001). [CrossRef]

**3. **M. R. Teague, “Deterministic phase retrieval: a Greens function solution,” J. Opt. Soc. Am. **73**, 1434–1441 (1983). [CrossRef]

**4. **N. Streibl, “Phase imaging by the transport equation of intensity,” Opt. Commun. **49**, 6–10 (1984). [CrossRef]

**5. **L. J. Allen and M. P. Oxley, “Phase retrieval from series of images obtained by defocus variation,” Opt. Commun. **199**, 65–75 (2001). [CrossRef]

**6. **M. Soto and E. Acosta, “Improved phase imaging from intensity measurements in multiple planes,” Appl. Opt. **46**, 7978–7981 (2007). [PubMed]

**7. **L. Waller, L. Tian, and G. Barbastathis, “Transport of intensity phase-amplitude imaging with higher order intensity derivatives,” Opt. Express **18**, 12552–12561 (2010). [PubMed]

**8. **L. Waller, M. Tsang, S. Ponda, S. Yang, and G. Barbastathis, “Phase and amplitude imaging from noisy images by Kalman filtering,” Opt. Express **19**, 2805–2814 (2011). [PubMed]

**9. **Z. Jingshan, J. Dauwels, M. A. Vázquez, and L. Waller, “Sparse ACEKF for phase reconstruction,” Opt. Express **21**, 18125–18137 (2013). [PubMed]

**10. **L. Waller, S. S. Kou, C. J. R. Sheppard, and G. Barbastathis, “Phase from chromatic aberrations,” Opt. Express **18**, 22817–22825 (2010). [PubMed]

**11. **L. Waller, Y. Luo, S. Y. Yang, and G. Barbastathis, “Transport of intensity phase imaging in a volume holographic microscope,” Opt. Lett. **35**, 2961–2963 (2010). [CrossRef] [PubMed]

**12. **P. M. Blanchard and A. H. Greenaway, “Simultaneous multiplane imaging with a distorted diffraction grating,” Appl. Opt. **38**, 6692–6699 (1999). [CrossRef]

**13. **S. S. Gorthi and E. Schonbrun, “Phase imaging flow cytometry using a focus-stack collecting microscope,” Opt. Lett. **37**, 707–709 (2012). [PubMed]

**14. **J. R. Fienup, “Reconstruction of an object from the modulus of its Fourier transform,” Opt. Lett. **3**, 27–29 (1978). [PubMed]

**15. **J. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. **21**, 2758–2769 (1982). [PubMed]

**16. **D. Paganin and K. A. Nugent, “Noninterferometric phase imaging with partially coherent light,” Phys. Rev. Lett. **80**, 2586–2589 (1998).

**17. **T. Gureyev and K. Nugent, “Rapid quantitative phase imaging using the transport of intensity equation,” Opt. Commun. **133**, 339–346 (1997).

**18. **T. Gureyev, A. Pogany, D. Paganin, and S. Wilkins, “Linear algorithms for phase retrieval in the Fresnel region,” Opt. Commun. **231**, 53–70 (2004).

**19. **M. Soto, E. Acosta, and S. Ríos, “Performance analysis of curvature sensors: optimum positioning of the measurement planes,” Opt. Express **11**, 2577–2588 (2003). [PubMed]

**20. **C. Zuo, Q. Chen, Y. Yu, and A. Asundi, “Transport-of-intensity phase imaging using Savitzky-Golay differentiation filter-theory and applications,” Opt. Express **21**, 5346–5362 (2013). [PubMed]

**21. **R. W. Schafer, “What is a Savitzky-Golay filter?[lecture notes],” Signal Processing Magazine, IEEE **28**, 111–117 (2011). [CrossRef]

**22. **S. Zabler, P. Cloetens, J. Guigay, J. Baruchel, and M. Schlenker, “Optimization of phase contrast imaging using hard x rays,” Review of Scientific Instruments **76**, 073705 (2005). [CrossRef]

**23. **J. Guigay, M. Langer, R. Boistel, and P. Cloetens, “Mixed transfer function and transport of intensity approach for phase retrieval in the Fresnel region,” Opt. Lett. **32**, 1617–1619 (2007). [CrossRef] [PubMed]

**24. **D. Paganin, T. E. Gureyev, K. M. Pavlov, R. A. Lewis, and M. Kitchen, “Phase retrieval using coherent imaging systems with linear transfer functions,” Opt. Commun. **234**, 87–105 (2004). [CrossRef]

**25. **C. E. Rasmussen and C. K. I. Williams, *Gaussian Processes for Machine Learning* (the MIT Press, 2006)..

**26. **B. Xue, S. Zheng, L. Cui, X. Bai, and F. Zhou, “Transport of intensity phase imaging from multiple intensities measured in unequally-spaced planes,” Opt. Express **19**, 20244–20250 (2011). [CrossRef] [PubMed]

**27. **K. Falaggis, T. Kozacki, and M. Kujawinska, “Optimum plane selection criteria for single beam phase retrieval techniques based on the contrast transfer function,” Opt. Lett. **39**, 30–33 (2014). [CrossRef]

**28. **N. Loomis, L. Waller, and G. Barbastathis, “High-speed phase recovery using chromatic transport of intensity computation in graphics processing units,” in “Biomedical Optics and 3-D Imaging,” (Optical Society of America, 2010), p. JMA7.

**29. **D. J. Lee, M. C. Roggemann, and B. M. Welsh, “Cramér-Rao analysis of phase-diverse wave-front sensing,” J. Opt. Soc. Am. A **16**, 1005–1015 (1999). [CrossRef]

**30. **L. Tian, J. C. Petruccelli, and G. Barbastathis, “Nonlinear diffusion regularization for transport of intensity phase imaging,” Opt. Lett. **37**, 4131–4133 (2012). [CrossRef] [PubMed]

**31. **A. Kostenko, K. J. Batenburg, A. King, S. E. Offerman, and L. J. van Vliet, “Total variation minimization approach in in-line X-ray phase-contrast tomography,” Opt. Express **21**, 12185–12196 (2013). [CrossRef] [PubMed]

**32. **L. Tian, J. C. Petruccelli, Q. Miao, H. Kudrolli, V. Nagarkar, and G. Barbastathis, “Compressive X-ray phase tomography based on the transport of intensity equation,” Opt. Lett. **38**, 3418–3421 (2013). [CrossRef] [PubMed]

**33. **P. Sollich and C. K. I. Williams, “Using the equivalent kernel to understand Gaussian process regression,” in *Advances in Neural Information Processing Systems 17*,” (the MIT Press, 2005), pp. 1313–1320.