Coherent diffraction imaging methods enable imaging beyond lens-imposed resolution limits. In these methods, the object can be recovered by minimizing an error metric that quantifies the difference between the observed diffraction patterns and those calculated from a current guess of the object. Efficient minimization methods require analytical calculation of the derivatives of the error metric, which is not always straightforward. This limits our ability to explore variations of basic imaging approaches. In this paper, we propose to substitute analytical derivative expressions with the automatic differentiation method, whereby object reconstruction is achieved by specifying only the physics-based experimental forward model. We demonstrate the generality of the proposed method through straightforward object reconstruction for a variety of complex ptychographic experimental models.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Ptychography is a coherent diffraction imaging (CDI) technique that acquires a series of intensity diffraction patterns through spatial shifts of the illumination (probe) across the sample (object) in a set of overlapping beam positions. Given a large number of overlapping beam positions, the ptychography experiment yields sufficient redundant information with which we can reconstruct the object structure to sub-beam-size spatial resolution, and even determine additional experimental parameters such as the structure of the probe itself. First proposed by Hoppe in 1969, the ptychographic technique was realized experimentally and rapidly developed algorithmically in the 2000s [2–6]. By removing the typical CDI limitation that the probe size has to be larger than the sample, ptychography has enabled high-resolution imaging of extended objects, making it a powerful imaging technique. As such, the ptychographic technique has found application not only as a 2D far-field diffraction imaging method but also as a 2D near-field diffraction imaging method, a 3D Bragg imaging method, a 3D multislice imaging method, and a part of the 3D tomographic imaging method, including for objects beyond the depth of focus limit.
Imaging with ptychography involves solving the challenging phase retrieval problem, where one attempts to reconstruct an object from only the magnitude of its Fourier transform. In general, the phase retrieval problem is ill-posed; solving it requires the use of oversampling and support constraints. These are typically used in an iterative projection framework that updates the object guess by applying a Fourier magnitude projection and a real-space constraint projection [13–16]. Alternatively, we can also frame phase retrieval as a nonlinear minimization problem, where we minimize an error metric using a gradient-based approach. The gradient-based approach is flexible and can include in the forward model a large variety of the physical phenomena related to the probing light (such as partial coherence, source fluctuations, and errors in positions [19, 20]), or the detection process (such as the measurement noise [21, 22] and the finite size of the pixel). As such, this method has been the focus of much recent literature, leading to the development of steepest descent methods [16, 24, 25], conjugate gradient methods [4, 16, 26], Gauss-Newton methods, and quasi-Newton methods. These algorithms have found application in the far-field ptychographic problem not only to solve for the object alone [24, 25, 29–32] but also to additionally solve for the probe [32, 33].
Gradient-based phase retrieval methods in the literature tend to rely on the availability of a closed-form expression for the gradient calculation. This closed-form expression is typically obtained by writing down an explicit expression for the error metric to minimize, then symbolically differentiating the error metric with respect to the individual input parameters. Calculating the gradient in this fashion is laborious; a slight modification of the forward model usually requires a complete rederivation and algorithmic reimplementation of the gradient expressions. This becomes especially limiting if we desire to explore variations of, or introduce new capabilities to, our basic experimental methodology. As such, it is highly desirable to have an approach beyond symbolic differentiation, so that we can easily explore a variety of algorithms and approaches.
Automatic differentiation, or algorithmic differentiation, provides such an alternative to symbolic differentiation. This approach is based upon the observation that vector-valued functions can, in general, be interpreted as composites of basic arithmetic operations and a limited number of elementary functions (including exponentials and trigonometric functions). Differentiation of such functions can then be understood as a recursive application of the chain rule of differentiation, wherein we repeatedly differentiate the same elementary functions (with known derivatives), only with different input parameters. This is a mechanistic process and can hence be performed entirely in software. Given a set of input numeric parameters, the automatic differentiation method computes the exact derivative by accumulating the numerical values of the elementary functions and their derivatives, without ever calculating the closed-form derivative expression (Section 2).
While the field of automatic differentiation has a long and storied history, it is only recently that the emergence of deep neural network methods has driven its widespread adoption in the optimization and machine learning communities. Specifically, there has now arisen a need to perform gradient-based minimization to optimize state-of-the-art neural networks, which can be compositions of thousands or even millions of individual elementary functions. Since calculating closed-form derivatives for these is not feasible, automatic differentiation has become the tool of choice, thus leading to the recent rapid adoption and advancement of such software. In 2014, when Jurling and Fienup first proposed an automatic differentiation framework for phase retrieval, they commented on the lack of suitable existing software packages. Since then, we have seen the rise of multiple powerful, easy-to-use, and computationally efficient automatic differentiation frameworks such as TensorFlow, PyTorch, and Autograd. More recently, we have even seen proof-of-concept demonstrations [38, 39] that successfully adapt these software packages to solve the phase retrieval problem.
In this paper, we first provide an overview of the reverse-mode automatic differentiation algorithm (also referred to as the backpropagation algorithm) for gradient calculations (Section 2), and then mathematically justify the application of this algorithm as a general framework for ptychographic phase retrieval. We demonstrate the numerical correctness of the reverse-mode automatic differentiation framework through a comparison with the popular symbolic-gradient-based ePIE method (Section 3). Finally, we demonstrate the generalizability of this framework through successful phase retrieval for increasingly complex ptychographic forward models–near-field ptychography and 3D Bragg projection ptychography–emphasizing the flexibility and potential of the approach for non-standard ptychography experiments (Section 4). This shows that the reverse-mode automatic differentiation framework allows the practitioner to update and change the forward model as necessary to better reflect the physics of the problem, without prior consideration of how to symbolically differentiate the error metric.
2. Overview of automatic differentiation
In this section, we provide a limited overview of the automatic differentiation procedure, focusing solely on the reverse-mode automatic differentiation (or reverse-mode AD) framework, and finally motivating the application of this framework to the phase retrieval problem. Detailed and rigorous examinations of the automatic differentiation procedure–the various modes and the algorithmic frameworks–are available in the literature, as is a detailed exposition of the reverse-mode AD procedure in application to the phase retrieval problem.
To demonstrate the idea of automatic differentiation, we first consider a differentiable function f given by the composition f(x) = ϕ3(ϕ2(ϕ1(x))), with the assumption that ϕ1, ϕ2, and ϕ3 are elementary functions whose individual derivatives ϕ′1, ϕ′2, and ϕ′3 are known a priori. At a given point x = x0, we can compute the value of f through the sequence of successive evaluations shown in Table 1. We refer to this sequence of computations as the forward pass.
In the evaluation trace shown in Table 1, we follow the notation in the literature [34, 40, 41] and index the variables stored in memory as vi, with v0 for the input variable, and v1, v2, v3 for the intermediate computed variables. To calculate the derivative of f at x = x0, we use the chain rule of differentiation: df/dx = ϕ′3(v2) · ϕ′2(v1) · ϕ′1(v0).
To evaluate this derivative, we perform a backward pass (i.e., a reversed sequence of computations) during which we associate each intermediate variable vi from the forward pass with a new adjoint variable v̄i = df/dvi; we then evaluate these adjoints in reverse order, from i = 3 down to i = 0, as shown in Table 1. Noting that v̄0 is precisely the desired derivative df/dx, to calculate the final derivative value we need to:
- identify the derivatives of the elementary functions ϕ1, ϕ2, and ϕ3,
- perform a forward pass evaluation of the function and store the intermediate values calculated, and
- perform a backward pass to accumulate the final derivative.
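The three steps above can be sketched in plain Python. This is our own illustration (not taken from the paper), using the hypothetical composite f(x) = exp(sin(x²)), i.e. ϕ1(x) = x², ϕ2 = sin, ϕ3 = exp:

```python
import math

# Elementary functions and their known derivatives
phi = [lambda v: v * v, math.sin, math.exp]
dphi = [lambda v: 2 * v, math.cos, math.exp]

def f_and_grad(x0):
    # Forward pass: evaluate f, storing the intermediate values v0, v1, v2, v3
    v = [x0]
    for p in phi:
        v.append(p(v[-1]))
    # Backward pass: accumulate the adjoints from the output back to the input
    vbar = 1.0                       # df/dv3 = 1
    for i in reversed(range(len(phi))):
        vbar *= dphi[i](v[i])        # df/dv_i = df/dv_{i+1} * phi'_{i+1}(v_i)
    return v[-1], vbar               # f(x0) and df/dx at x0

value, grad = f_and_grad(0.7)        # grad equals 2x·cos(x²)·exp(sin(x²)) at x = 0.7
```

The derivative emerges purely from the stored forward-pass values and the elementary derivatives; no closed-form expression for df/dx is ever constructed.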
This reverse-mode automatic differentiation scheme calculates the exact derivative value (up to floating point errors) without relying on the closed form expression of the derivative; instead, it relies only on the structure of the computational graph.
From Table 1, we can see that for the reverse-mode AD gradient calculation, the values of the intermediate variables calculated during the forward pass are shared with the backward pass; each elementary expression is computed only once but reused multiple times. This ensures that the gradient calculation is computationally very efficient. In fact, for scalar-valued functions of this form, if the function evaluation (forward pass) requires a computation time T, then the full gradient can be calculated in a time proportional to T, with a small constant of proportionality [34, 40]. In other words, for reverse-mode AD, barring memory limitations, the time required to calculate the gradient is always within an order of magnitude of the time required to calculate the function value itself. This is known as the cheap gradient principle.
As such, the reverse-mode AD procedure is ideally suited to gradient-based iterations aimed at solving optimization problems. A quintessential case in point is machine learning with neural networks–the ‘training step’ in large neural networks boils down to the numerical optimization of an error metric involving a large number of functional compositions and up to millions of input parameters. In such a situation, the derivation of the closed-form gradient with pen and paper is simply out of reach. Thus, the implementation of gradient descent procedures that use the AD framework has been key to the recent meteoric rise in machine learning applications.
In contrast to the neural network case, a typical error metric for the phase retrieval problem only involves a limited number of functional compositions, and the closed-form gradients can be calculated manually. This has been demonstrated in prior phase retrieval literature: the Wirtinger flow method and its variants [24, 25, 42], the PIE family of methods, and numerous other methods proposed in the literature [14, 16, 22, 29, 43] are examples of gradient-based descent procedures that rely on such closed-form gradient expressions. However, the impressively flexible AD framework not only simplifies these selfsame gradient calculations, but also allows us to modify the forward model; this empowers us to address the full range of problems that are related to phase retrieval. Consequently, we expect that the AD framework should also greatly benefit the phase retrieval community.
As we demonstrate in Section 3 for the ptychographic problem, the error metric for the phase retrieval problem is a scalar-valued multivariate objective function of complex variables [16, 44]. To minimize such a function using a gradient descent approach, we adopt the Wirtinger gradient descent formalism [45–47]. For a complex variable z = x + iy, the Wirtinger gradient operator is defined as ∂f/∂z* = (1/2)(∂f/∂x + i ∂f/∂y); the reverse-mode AD procedure accumulates this complex derivative from the derivatives with respect to the real and imaginary parts (see also Table 1). This ensures that the gradient calculation procedure is very efficient, with the time cost once again comparable to that for the objective function itself.
3. Validation: Far-field transmission ptychography
In this section, we first establish a forward model for far-field transmission ptychography. We then use reverse-mode AD to set up a gradient descent procedure that is, by construction, equivalent to the ePIE reconstruction method, which uses a closed-form gradient expression. We compare these frameworks numerically and establish that the automatic differentiation framework calculates gradient values that are identical to those calculated via the closed-form gradient expressions. This comparison to the well-known ePIE method serves to establish the validity of the AD framework for phase retrieval. The ePIE method, however, cannot be used out-of-the-box for object reconstruction once we modify the ptychographic forward model–as such, it is not well-suited for use with the AD framework. Instead, we demonstrate that we can use the flexible and easy-to-use state-of-the-art accelerated adaptive gradient descent algorithms (like the Adaptive Moment Estimation or Adam algorithm) that are commonly available out-of-the-box with AD software to efficiently solve the ptychographic reconstruction problem.
3.1. Forward model for far-field transmission ptychography
In far-field transmission ptychography, an unknown object is illuminated with a coherent beam, called the probe, which is localized to a small area on the object. The intensity of the wavefront transmitted through the object is then measured at the far field by a pixel-array detector. The beam is used to raster scan the object in a grid of J spatially overlapping illumination spots, generating a set of J diffraction patterns at the detector plane. The forward model for far-field ptychography consists of the following steps:
- The complex-valued object transmission function is approximated by a total of N pixels in the object plane. For the illumination position rj, a shift operator with an M × N matrix representation extracts the M object pixels illuminated by the probe beam, which itself contains M pixels in the object plane. The transmitted wave function is then represented by the exit wave ψj:
- Each transmitted exit wave ψj propagates to the far-field detector plane, where the detector records a real-valued intensity pattern containing M pixels. In the absence of noise fluctuations, the expected intensity of this jth wave field at the detector plane reads
- The model in Eq. (4) describes the behavior of the detected intensity on average. During an experiment, each diffraction pattern is subject to random fluctuations produced by instrumental noise and Poisson counting fluctuations (shot noise). If one further assumes that the instrumental (thermal) noise is negligible, a Gaussian additive perturbation is usually accurate enough to describe how the counting fluctuation affects the square root of the measured intensity:
The above relations are the forward (observation) model that predicts how the observations behave when the sample and the probe are jointly given. The inversion/reconstruction step simply aims at retrieving both these quantities from the observations. For that purpose, the maximum likelihood estimator defines the solution of this reconstruction step via a minimization problem that can be easily derived from the fluctuation model of Eq. (5); in our case it leads to a least-squares error metric built on the square root of the measured intensities [4, 22]. In any case, the optimization step derived from the noise model involves a large-scale phase retrieval problem that is rather challenging. Phase retrieval problems are NP-hard because of their inherent non-convex structure, and the search for good and computationally efficient heuristics is still an open problem. The overlap between successive scanning probe positions is, however, recognized as a key factor that helps prevent stagnation for standard, derivative-based iterations. Thanks to AD, these derivatives can be efficiently computed and further used by one of the iterative solvers discussed in the next section.
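The forward model above can be sketched in a few lines of numpy. This is an illustrative toy of our own (hypothetical array sizes and function names), with the shift operator realized as a simple patch extraction:

```python
import numpy as np

def farfield_intensity(obj, probe, shift):
    """Expected far-field intensity for one probe position."""
    r, c = shift
    m = probe.shape[0]
    patch = obj[r:r + m, c:c + m]       # shift operator: M illuminated object pixels
    exit_wave = probe * patch           # thin-object (multiplicative) interaction
    farfield = np.fft.fft2(exit_wave)   # propagation to the far-field detector plane
    return np.abs(farfield) ** 2        # the detector measures intensity only

# A measured pattern adds counting (Poisson) fluctuations to the expectation
rng = np.random.default_rng(0)
expected = farfield_intensity(np.ones((8, 8), complex),
                              np.ones((4, 4), complex), (2, 2))
measured = rng.poisson(expected)
```

Each of the J probe positions yields one such pattern; the reconstruction then inverts this map from the set of measured intensities.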
3.2. Iterative ptychographic reconstruction with reverse-mode AD: ePIE and Adam
With b = 1, Algorithm 1 exhibits one iteration of the ePIE procedure, during which it uses the information in a single diffraction pattern to cyclically update the current probe and object estimates. The core (or “Engine”) of this update is the correction (Eq. (10)) computed from the considered probe position using the derivatives of g with respect to the sample and the probe. In its original formulation, the ePIE procedure uses the well-known closed-form expressions of these derivatives (see Eqs. (6) and (7) of the original ePIE publication) to compute these updates.
Alternatively, we can use the reverse-mode AD procedure for the numerical computation of these derivatives by relying on the Wirtinger formalism (Eq. (2)) for complex derivatives. The derivatives with respect to the real and imaginary parts of each complex variable are evaluated separately, and they are then accumulated into the Wirtinger derivative according to Eq. (2).
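As a sanity check on this accumulation rule (an illustration of ours, substituting finite differences for the exact partial derivatives that AD software would compute):

```python
def wirtinger_grad(f, z, eps=1e-6):
    """Accumulate df/dz* of a real-valued f from its partial derivatives
    with respect to the real and imaginary parts of z."""
    dfdx = (f(z + eps) - f(z - eps)) / (2 * eps)            # partial w.r.t. Re(z)
    dfdy = (f(z + 1j * eps) - f(z - 1j * eps)) / (2 * eps)  # partial w.r.t. Im(z)
    return 0.5 * (dfdx + 1j * dfdy)

# For f(z) = |z|^2, the Wirtinger derivative df/dz* is simply z
z0 = 0.3 - 0.8j
g = wirtinger_grad(lambda z: abs(z) ** 2, z0)
```

The two real-valued partials are exactly what a reverse-mode AD framework returns when the complex variable is stored as a pair of real arrays.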
The AD version of ePIE is mathematically equivalent to the original ePIE; thus, both algorithms should generate the same sequence of iterates. We refer to the AD version of the ePIE solver as AD-ePIE.
In Fig. 1 we present a simplified representation of the computational graph structure that generates the AD-ePIE iterates (see also Table 1). We can see from the figure that the modular reverse-mode AD framework is agnostic as to the choice of the error metric, the update scheme, and even to the forward model itself; it generalizes straightforwardly to any variations in these. We demonstrate this in Section 4, where we vary both the forward model and the update scheme.
A natural extension to the ePIE/AD-ePIE algorithm is to use several probe positions to compute a single update: that is, to increase the minibatch size b in Algorithm 1. In this perspective, Algorithm 1 belongs to a larger family of iterative methods that take advantage of the natural partitioning of the dataset. Similar or identical optimization strategies are indeed well known under different names in several communities, such as ‘ordered subsets’ in image processing, ‘incremental gradient’ in the optimization literature, or ‘stochastic gradient’ in neural-network learning algorithms.
A salient feature of ePIE/AD-ePIE is that the chosen step sizes (Eq. (9)) for the object and probe updates equate to the inverses of the Lipschitz constants of the corresponding partial gradients. This choice of step sizes is well known in the optimization literature, and is particularly useful in a batch gradient descent setting where the derivatives in the update (Eq. (10)) are calculated using all available probe positions at once (setting b = J). As an example, with a steepest-descent iteration algorithm these step sizes would then ensure global convergence (toward a local minimizer). However, the Lipschitz constants are not usually known a priori; they generally have to be carefully derived using the closed-form gradient expressions. Changes to the forward model, or the inclusion of additional terms in the error metric (e.g. regularizers), can thus require a re-derivation of both the closed-form gradient expression and the Lipschitz constant. As such, using the Lipschitz constants to calculate the step sizes would negate the flexibility that is the hallmark of the AD procedure.
To circumvent this limitation and enable phase retrieval within the AD framework, we can substitute the ePIE/AD-ePIE method with a choice of state-of-the-art adaptive gradient descent algorithms that are widely used in the machine learning literature, such as Momentum, Nesterov Momentum, Adagrad, Adadelta, RMSProp, or Adam. The performance of these methods specifically in phase retrieval applications is a new but promising area of research, with the recent literature [32, 53, 54] demonstrating that momentum-based accelerated gradient descent methods converge to a solution faster than standard gradient descent methods. In this work, we use the Adam (Adaptive Moment Estimation) gradient descent procedure, which uses a momentum-like acceleration and, crucially, does not rely on the Lipschitz constant for the gradient descent. As such, the Adam method is robust to changes in the error metric and therefore well-suited to phase retrieval applications.
The Adam optimization method is available out-of-the-box in the commonly used AD toolsets TensorFlow and PyTorch, so that it can be used directly without first implementing the underlying algorithm. Nevertheless, for the sake of clarity, we present in Appendix A the parameter initialization step and the variable update step that make up the Adam optimizer for the object variable (the probe updates are similarly calculated). To solve the ptychography problem, we incorporate the Adam optimizer in a fashion identical to Algorithm 1, but with steps 4 and 5 substituted with the update computed from the Adam optimizer. In Section 3.3, we provide a minimal performance comparison of the Adam method with the ePIE/AD-ePIE method by iterating through the individual probe positions and calculating the object and probe updates separately per probe position. In Section 3.4, we present the search strategy used for the choice of the Adam hyperparameters (viz., the initial probe update step size, the initial object update step size, and the minibatch size) adopted in this work, and provide heuristic guidelines for the choice of hyperparameters to achieve computationally efficient ptychographic reconstruction. Beyond our heuristic approach, a more detailed evaluation of the application of the Adam algorithm for phase retrieval is an important research question but is beyond the scope of this paper.
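For reference, a single Adam step follows the standard Kingma–Ba update rule; the sketch below is our own (not a transcription of Appendix A) and applies the rule to Wirtinger-style complex gradients:

```python
import numpy as np

def adam_update(grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam step; state = (m, v, t) holds the running moments."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad                 # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * np.abs(grad) ** 2    # second-moment estimate
    mhat = m / (1 - b1 ** t)                     # bias corrections
    vhat = v / (1 - b2 ** t)
    step = lr * mhat / (np.sqrt(vhat) + eps)     # per-parameter adaptive step
    return step, (m, v, t)

# Minimal usage: descend E(z) = |z|^2, whose Wirtinger gradient dE/dz* is z
z = np.array([1.0 + 1.0j])
state = (np.zeros_like(z), np.zeros_like(z, dtype=float), 0)
for _ in range(500):
    step, state = adam_update(z, state, lr=0.05)
    z = z - step
```

Note that no Lipschitz constant appears anywhere: the effective step size is set by the initial learning rate and the running moment estimates alone.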
3.3. Numerical results
For a numerical validation of the reverse-mode AD procedure, we simulated a far-field transmission ptychography experiment with an incident x-ray of energy keV. We used the ‘Mandrill’ and ‘Cameraman’ images, each 128 × 128 pixels in size, as the test object magnitude and phase respectively, and embedded the test object at the center of a simulation box of size 190 × 190 pixels (with tight support). We illuminated the test object with a complex-valued probe function approximated using an array of size 64 × 64 pixels and with a total integrated intensity of photons at the object plane. The probe function was obtained by propagating the exit wave from a square aperture of width m to a distance of ζ = 15 cm so that it contained diffraction fringes characterized by m. The raster grid for the object scan was obtained by translating the probe latitudinally and longitudinally in steps of 3.5 μm (6 pixels). The real-space pixel grid was obtained by assuming exact Nyquist sampling with respect to a detector with a pixel pitch of 55 μm placed at a distance of m from the object. We thereby generated 484 far-field diffraction patterns at the detector position, then incorporated Poisson noise into these diffraction patterns to get the simulated far-field data points.
Our implementation of the AD-based ptychographic reconstruction framework follows previous work [38, 39] and uses the TensorFlow machine learning library for the gradient calculation. For the reconstruction, we used a m (12 pixels) square aperture as the initial probe guess, and a random complex array as the initial object guess. Using the same starting parameters, we performed 150 iterations through the data set (with the probe value held constant for the first iteration) for the ePIE, AD-ePIE, and Adam methods, with the same randomized sequence of diffraction patterns for all three algorithms. For the Adam method, we used initial update step sizes and for the probe and object respectively.
As we demonstrate in Fig. 2, the AD-ePIE method followed the same reconstruction path as the ePIE method for both the error metric and the normalized object reconstruction error calculated per pixel from the ground truth. This demonstrates that the reverse-mode AD framework described in this paper calculates gradient values numerically identical to those calculated using the closed-form symbolic derivatives. Additionally, reconstruction using the Adam algorithm also converges to the same final probe and object structures as the ePIE algorithm: the final Adam object and probe structures differ by only 5% and 4% respectively from the corresponding final ePIE structures.
The reconstruction algorithms implemented in this work do not use any advanced domain decomposition techniques to parallelize the ptychographic reconstruction [38, 56]; they iterate sequentially through minibatches of diffraction patterns. When these algorithms were run on a single 3.00 GHz Intel Xeon processor with a minibatch size b = 1, the runtimes per minibatch update for the forward model and the ePIE algorithm (implemented with the numpy library in Python) were found to be ms and ms respectively. The corresponding runtimes for the AD-ePIE and Adam algorithms (implemented with TensorFlow) were found to be ms and ms respectively. Disregarding backend discrepancies, this is in agreement with the expected computational costs described in Section 2—the ePIE algorithm implements the symbolic gradient expressions directly and has a runtime comparable to the forward model itself. The AD-ePIE method has a runtime that is within a small scaling factor of that for the forward model, and the Adam algorithm requires some additional computation for the necessary moment updates (Algorithm 2). In practice, the Adam algorithm converges to the final solution in 80 iterations ( s) while the ePIE algorithm requires 150 iterations ( s) to achieve convergence. Once we use larger minibatch sizes, however, as we demonstrate in Section 3.4, the Adam algorithm can converge to a solution much faster than the basic ePIE approach.
3.4. Choosing Adam hyperparameters for efficient ptychographic reconstruction
In this work, for reconstructions with the Adam algorithm (Algorithms 1 and 2), we manually supply three key hyperparameters: the minibatch size b, and the initial Adam object and probe update step sizes. For optimal performance of these reconstruction algorithms, we need to choose hyperparameter values that simultaneously optimize the hardware utilization, the time cost per minibatch update, and the total number of minibatch updates required to converge to a solution—this makes for a difficult research problem. As such, in this work, we take a primarily heuristic two-step approach: 1) we pick a minibatch size that optimizes the hardware utilization and time cost per minibatch update, and then, with this minibatch size fixed, 2) we perform a gridsearch to identify the step size values that reduce the number of iterations required for the reconstruction.
Ptychographic reconstruction with b = 1, as used in Section 3.3, has an optimally low time cost per minibatch update, but is not suitable for implementation on parallel computing hardware. In fact, when implemented on an NVIDIA K40 GPU, this reconstruction procedure has minimal GPU utilization (), and shows no noticeable improvement over the CPU version in the time cost per minibatch update. This suggests the use of larger minibatches for optimal GPU utilization. As we increase the minibatch size, however, the number of intermediate values stored in memory also increases proportionally. As such, we need to choose a minibatch size that ensures that the resulting computational graph, including all the intermediate values calculated in the forward and backward passes, fits within the GPU memory. For the ptychographic dataset simulated in Section 3.3, we can set the minibatch size accordingly; this choice combines much greater GPU utilization () with minimal time cost per minibatch update ( ms), and also maintains stochasticity in the gradient directions computed per update.
With the minibatch size fixed, we can then perform a gridsearch to identify the initial step size values that lead to fast ptychographic reconstruction. While there exist sophisticated methods to modify these parameters within the descent procedure, our algorithms use the more common approach of fixed step sizes. Similarly, while there exist early stopping methods to identify the convergence of the reconstruction procedure, we simply monitor the error metric [Eq. (7)] for a fixed, large number of iterations and choose the initial step sizes that give the lowest values for the error metric.
To examine how the choice of these hyperparameters affects the reconstruction process under different experimental conditions, we followed the procedure in Section 3.3, but with the integrated probe intensity set to and photons respectively, to generate two additional ptychographic datasets. To reconstruct the object and probe from each dataset, we used the Adam algorithm with a fixed minibatch size, and performed a gridsearch on a logarithmic grid of initial probe update step sizes [100, 10, 1, 0.1, 0.01] and initial object update step sizes [10, 1, 0.1, 0.01, 0.001].
In Fig. 3, we show the object NRMSE obtained for each of these hyperparameter sets after 1500 iterations (a runtime of s). We can see that remains an optimal object update step size for all three experiments. The probe update step size requires more careful tuning, with a good starting choice for the tuning procedure. For , after 1500 iterations, we obtained object and probe reconstruction errors of , , and respectively for integrated probe intensities of , , and . In comparison, when we stop the ePIE algorithm (b = 1) after 150 iterations ( s) without ensuring convergence, we get corresponding reconstruction errors of , , and . This demonstrates that the Adam algorithm, with the provided hyperparameter search strategy, is a fast and robust choice for ptychographic reconstruction within the AD framework.
4. Applications to other ptychographic forward models
In this section, we apply the reverse-mode AD framework developed in Section 3 for the far-field transmission ptychography experiment to the more complex experimental regimes of near-field ptychography and multi-angle Bragg ptychography.
4.1. Near-field ptychography
Just as in the far-field case, the near-field ptychography experiment uses a probe beam to raster scan the object in a grid of spatially overlapping illumination spots, generating a set of J exit waves after the probe-object interaction. The detector, however, is placed at a distance z from the object such that the Fresnel number satisfies the high Fresnel number condition characteristic of the near-field regime at any given probe position. In this regime, given a probe, an object, a lateral shift operator, and an exit wave, the expected intensity at the detector plane is given by the near-field (Fresnel) propagation of the exit wave [59]. The experimentally measured intensity patterns are again denoted as yj. Detailed characterizations of this near-field ptychography experiment can be found in the literature.
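The near-field forward model replaces the far-field Fourier transform with a Fresnel propagator. A minimal sketch of our own (the wavelength, pixel pitch, and propagation distance below are hypothetical values), using the paraxial transfer function, is:

```python
import numpy as np

def fresnel_propagate(field, wavelength, pixel_size, z):
    """Propagate a wavefield a distance z using the Fresnel (paraxial)
    transfer function, appropriate for the high Fresnel number regime."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pixel_size)           # spatial frequency samples
    f2 = fx[:, None] ** 2 + fx[None, :] ** 2
    kernel = np.exp(-1j * np.pi * wavelength * z * f2)
    return np.fft.ifft2(np.fft.fft2(field) * kernel)

# Expected near-field intensity for one (hypothetical) exit wave
exit_wave = np.ones((64, 64), complex)
intensity = np.abs(fresnel_propagate(exit_wave, 1.4e-10, 75e-9, 0.01)) ** 2
```

Since the transfer function has unit modulus, the propagation is unitary and conserves the total intensity; only this propagation step changes relative to the far-field computational graph.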
We simulated a near-field ptychography experiment using an incident 8.7 keV probe beam approximated using an array of size 512×512 pixels at the object plane, with the pixel pitch set to m. The probe was initialized as a Gaussian with a FWHM of m (50 pixels), then passed through a diffuser and modeled as a speckle pattern with an average flux of photons/pixel. The simulated object consists of the ‘Mandrill’ and ‘Cameraman’ images of size 192×192 pixels, modeled as the object magnitude and phase respectively. The raster grid was obtained by translating the object in the horizontal and vertical directions in steps of m (44 pixels), generating a total of 25 diffraction patterns at a detector placed cm from the object plane. We added Poisson noise to the diffraction patterns to obtain our simulated dataset.
For the near-field reconstruction, we obtained an initial probe estimate by backpropagating the average measured intensity to the object plane.
The object was initialized as a 192×192 array of random complex numbers. To solve for the probe and object structures, we again used the Adam optimizer in the approach outlined in Algorithm 1, but considering 5 probe positions per iteration (i.e. with a minibatch size of 5). The initial Adam update step sizes were set to and for the object and probe respectively. The minibatch size and initial step size were chosen as described in Section 3.4.
After 10,000 iterations of Adam gradient descent, we obtained the object and probe reconstructions shown in Fig. 4. The object was reconstructed with an NRMSE of 0.01, and the probe with an NRMSE of 0.12. The discrepancies in the reconstructed probe were confined to the edges of the full-field probe, in regions that did not interact with the object and were thus unconstrained. These reconstructions demonstrate that the straightforward generalization of the reverse-mode AD reconstruction framework from the far-field ptychography case (Fig. 1) to the near-field case successfully solves the near-field ptychography phase retrieval problem.
4.2. Multi-angle Bragg ptychography
Multi-angle Bragg projection ptychography (maBPP)  is a ptychographic experiment that allows for two degrees of freedom in the scan parameters: 1) the choice of the planar scan positions for the usual two-dimensional ptychographic scan, and 2) the choice of angular scan positions corresponding to small rotations of a crystalline object oriented to satisfy a Bragg diffraction condition. The maBPP experiment uses the far-field ptychography setup, but with the detector placed to measure a crystalline Bragg peak, typically displaced from the direct beam by tens of degrees. The detector records a set of two-dimensional (2D) coherent diffraction patterns at overlapping scan positions at one or more angles near the Bragg diffraction condition. The diffraction patterns are then used to reconstruct a three-dimensional (3D) strain-sensitive image of an extended crystalline sample . An example experimental geometry, with the exit beam at a 60° angle to the incident beam direction, is shown in Fig. 5.
To develop the maBPP forward model, we consider a 3D diffracting crystal volume and an illumination volume. The incident probe direction and the exit beam direction satisfy the Bragg diffraction condition when the scattering vector coincides with some reciprocal lattice vector of the crystal. At the Bragg condition, if the probe volume interacts with the slice of the object indicated by the lateral shift operator, then the 2D exit wave ψj is obtained by projecting the 3D probe-object product along the exit beam direction.
When the object is rotated by a small angle, the scattering vector deviates from the Bragg condition. The corresponding change in the object-probe interaction can be encoded in a phase shift operator, a complex exponential phase ramp (determined by the deviation of the scattering vector) applied across the interaction volume. The 2D exit wave is then given by the projection of the phase-modulated probe-object product along the exit beam direction.
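A minimal sketch of this exit-wave calculation is given below, assuming for illustration that the angular deviation produces a phase ramp along a single axis and that the projection reduces to a sum along the exit-beam direction; the function name, arguments, and single-axis simplification are hypothetical conveniences, since the deviation vector generally has components along several axes.

```python
import numpy as np

def bragg_exit_wave(probe3d, obj3d, q_shift, voxel_pitch, proj_axis=1):
    """Sketch of the maBPP exit wave for one angle: apply the
    angle-dependent phase ramp exp(i * q_shift * r) to the 3D
    probe-object product, then project along the exit-beam axis."""
    n = obj3d.shape[proj_axis]
    # real-space coordinate along the projection axis, centered on the box
    r = (np.arange(n) - n / 2) * voxel_pitch
    shape = [1, 1, 1]
    shape[proj_axis] = n
    phase_ramp = np.exp(1j * q_shift * r).reshape(shape)  # phase shift operator
    # projection operator: sum the modulated 3D interaction volume
    return np.sum(probe3d * obj3d * phase_ramp, axis=proj_axis)
```

Setting `q_shift = 0` recovers the plain projection at the exact Bragg condition, so the zero-angle case of this sketch matches the previous equation.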
We can again adapt the forward model of the far-field ptychography experiment to the multi-angle Bragg experiment by generalizing the probe and object models to 3D, and applying the angle-dependent phase shifts along with the projection operator. This 3D generalization comes at the cost of a dramatic increase in the parameter space, and the reconstruction becomes correspondingly more difficult. With this in mind, for ease of reconstruction we use a speckle pattern as a highly diverse structured probe illumination, an approach found to help solve the phase retrieval problem in 2D CDI [65, 66]. We also impose the commonly applied  additional constraints on the reconstruction problem: we assume that the probe structure and the object thickness (along the direction) are known in advance.
For the multi-angle Bragg ptychography experiments, we simulated the object as a compact crystal represented by a set of grid points inside a faceted volume. We set the interior points to have magnitude 0.5 and a spatially varying phase structure emulating a strain field. We defined the orthonormal real space axes such that each object voxel is of size nm, and such that the polyhedral crystal is situated obliquely within (with of the volume of) a cuboid of size voxels (). The cuboid was itself placed at the center of a simulation box of total size voxels (). A 64×64 pixel detector with a pixel pitch of m was placed at a distance of 1.5 m from the object, normal to the real space y-axis. The probe was initialized as a Gaussian beam of energy 8.7 keV with an FWHM of 396 nm (6 pixels) contained entirely within a 64×64 pixel array, then passed through a diffuser and modeled as a speckle pattern. The probe was then interpolated to approximate the incident beam at a Bragg condition (Fig. 5). The beam profile was assumed to be constant along the propagation direction during its propagation through the simulation box. To obtain the ptychographic data set, we applied phase modulations corresponding to angular shifts around the Bragg condition with an angular step size of , for a total of 15 angles. At each such angle, we translated the probe along the y and z directions with a step size of 132 nm (2 pixels) in a raster grid of 41×24 scan positions, i.e. with an overlap of per step in the y direction and per step in the z direction. This generated a total of 14760 diffraction patterns, to each of which we added Poisson noise, giving a data set with an overall intensity maximum of 13560 photons/pixel.
During the reconstruction, we solved for a volume of size voxels, initialized as a random complex array and placed centrally in the simulation box. The object support along the direction was set to be slightly larger than the actual object thickness. This allows for the expected spreading of the object induced by resolution effects due to the low photon count . To solve for the object structure, we again adapted the approach outlined in Algorithm 1 to use the Adam optimizer with the initial update step size , and with minibatches each containing 400 diffraction patterns selected in a stochastic fashion irrespective of either probe position or object rotation. The minibatch size and initial update step size were chosen as described in Section 3.4. After 70 iterations of Adam gradient descent, we obtained the reconstruction of the crystal (and the loose support) shown in Fig. 6. The overall normalized reconstruction error, calculated using a 3D adaptation of the sub-pixel registration algorithm presented in , was found to be 0.09, which is much larger than the NRMSE observed for the 2D objects in Sections 3.3 and 4.1. As we can see in Fig. 6, the error in the reconstruction is primarily due to discrepancies at the object edges; this is a physical effect that can be attributed to the shot noise that corrupts the diffraction data . This result demonstrates that the reconstruction accurately captures the physics of the experiment.
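The stochastic minibatch selection used above, with batches drawn irrespective of probe position or object rotation, can be sketched as follows; the function name and the generator-based interface are illustrative choices.

```python
import numpy as np

def minibatches(n_patterns, batch_size, rng):
    """Yield index minibatches drawn without replacement from the full
    set of diffraction patterns, ignoring scan position and rotation angle."""
    order = rng.permutation(n_patterns)  # one random pass over all patterns
    for start in range(0, n_patterns, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(minibatches(14760, 400, rng))
print(len(batches))  # 37 minibatches per epoch (36 full + 1 partial)
```

Drawing each batch across the whole angular and positional scan, rather than angle by angle, keeps every gradient step informed by the full diversity of the data set.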
Our results in this paper demonstrate that the automatic differentiation technique can be used to construct a general gradient-based inversion framework for ptychographic phase retrieval. Specifically, we have shown that we can minimize a ptychographic error metric by first using reverse-mode AD to numerically calculate the gradients of the error metric, then using these within a state-of-the-art adaptive gradient descent algorithm. We can thereby solve the ptychographic problem without ever performing an analytical derivation for any closed-form gradient expression. This inversion framework is robust to changes in the forward model and can be used identically for phase retrieval in such varying experimental configurations as far-field transmission ptychography, near-field ptychography, and multi-angle Bragg ptychography.
The inversion framework showcased in this work does not rely on any particular choice of the error metric or the experimental configuration. We can change the error metric either to accommodate alternate noise models, or to incorporate a priori knowledge about the experiment through the choice of a regularizer. We can tailor the experimental setup and introduce new degrees of freedom when necessary, then simply change the forward model correspondingly. Additionally, by using the state-of-the-art TensorFlow library for the gradient calculations, we gain built-in scalability and parallelization (multi-CPU/GPU architectures), allowing for convenient phase retrieval for experimental models large and small. In the future, we aim to leverage these flexibilities to incorporate a variety of physical phenomena, such as probe fluctuations, positional inaccuracies, and limitations in the detection process, into the phase retrieval framework. This would enable high-resolution phase retrieval from near-field/far-field 3D (or 2D) Bragg and non-Bragg experiments, all within a single consistent platform.
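As a sketch of this flexibility, the snippet below treats the error metric as a plug-in argument to a generic descent step. Both loss functions and all names are illustrative, and a finite-difference gradient stands in for the AD-computed gradient so the sketch stays library-free.

```python
import numpy as np

# Two interchangeable error metrics for measured counts y and model intensity I.
def gaussian_amplitude_loss(I, y):
    """Amplitude-based least-squares metric."""
    return np.sum((np.sqrt(I) - np.sqrt(y))**2)

def poisson_likelihood_loss(I, y, eps=1e-9):
    """Negative Poisson log-likelihood (up to a constant in y)."""
    return np.sum(I - y * np.log(I + eps))

def reconstruct_step(params, forward, loss, y, step=1e-2, h=1e-6):
    """One generic descent step: the loss is a plug-in argument. A central
    finite difference stands in here for the AD-computed gradient."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        p_plus = params.copy()
        p_plus[i] += h
        p_minus = params.copy()
        p_minus[i] -= h
        grad[i] = (loss(forward(p_plus), y) - loss(forward(p_minus), y)) / (2 * h)
    return params - step * grad
```

Swapping the noise model then amounts to passing a different `loss`, with the descent loop and forward model left untouched; adding a regularizer would simply add a term inside the chosen loss function.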
The flexible AD-based inversion framework is expected to provide a unified approach to phase retrieval for general (ptychographic and non-ptychographic) CDI experiments with x-rays, electrons, or optical waves, even as we move towards increasingly complex imaging regimes. This would potentially give researchers the ability to explore variations of basic CDI methodologies in a convenient and straightforward manner, and could thereby prove a powerful addition to the phase retriever’s toolbox.
A. Parameter update with the Adam optimizer
To examine how the Adam optimizer performs the parameter updates, we consider the far-field transmission ptychography setting from Section 3.1, with the error metric that is to be minimized with respect to the object parameter. The optimization is performed in the minibatch setting described in Algorithm 1, wherein the partial derivative at each step is calculated using Eq. (8).
The Adam optimization step consists of a parameter initialization step and a variable update step. The initialization step, which precedes the gradient descent iterations, is outlined in Algorithm 2a. The parameter update step in Algorithm 2b replaces Steps 4 and 5 in the iterative descent setting outlined in Algorithm 1. Typical out-of-the-box implementations of the Adam optimizer only support gradient descent for real-valued variables, in which case the real and imaginary components of the object are separately updated. To formalize this update step, we can separate the real and imaginary components of the object and apply the standard real-valued Adam update rule to each [48].
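A minimal NumPy sketch of this split update, following the Adam rules of [48], is given below. The packing of the real and imaginary parts into a single gradient vector, and the function names, are one possible convention rather than the paper's implementation.

```python
import numpy as np

def adam_init(z):
    """Initialization step: zero first and second moments, zero timestep,
    sized for the stacked (Re z, Im z) representation."""
    return (np.zeros(2 * z.size), np.zeros(2 * z.size), 0)

def adam_step(z, grad_z, state, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update of a complex array z, applied separately to the
    real and imaginary parts as in typical real-valued implementations.
    grad_z packs the partials w.r.t. (Re z, Im z) as a complex array."""
    m, v, t = state
    g = np.concatenate([grad_z.real.ravel(), grad_z.imag.ravel()])
    t += 1
    m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g**2     # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1**t)          # bias corrections
    v_hat = v / (1 - b2**t)
    update = lr * m_hat / (np.sqrt(v_hat) + eps)
    du = update[:z.size] + 1j * update[z.size:]  # repack into complex form
    return z - du.reshape(z.shape), (m, v, t)
```

Minimizing the simple metric |z - target|^2, whose packed gradient is 2(z - target), drives z toward the target under repeated calls to `adam_step`, mirroring the object update in Algorithm 2b.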
Office of Science, U.S. Department of Energy, Contract (DE-AC02-06CH11357); Basic Energy Sciences; Advanced Scientific Computing Research; Laboratory Directed Research and Development (LDRD-2017-080); National Institutes of Health (R01 GM104530, R01 MH115265).
1. W. Hoppe, “Beugung im inhomogenen Primarstrahlwellenfeld. III. Amplituden-und Phasenbestimmung bei unperiodischen Objekten,” Acta Crystallogr. A 25, 508–514 (1969). [CrossRef]
2. J. M. Rodenburg and H. M. Faulkner, “A phase retrieval algorithm for shifting illumination,” Appl. Phys. Lett. 85, 4795–4797 (2004). [CrossRef]
7. M. Stockmar, P. Cloetens, I. Zanette, B. Enders, M. Dierolf, F. Pfeiffer, and P. Thibault, “Near-field ptychography: phase retrieval for inline holography using a structured illumination,” Sci. Rep. 3, 1927 (2013). [CrossRef]
8. P. Godard, G. Carbone, M. Allain, F. Mastropietro, G. Chen, L. Capello, A. Diaz, T. Metzger, J. Stangl, and V. Chamard, “Three-dimensional high-resolution quantitative microscopy of extended crystals,” Nat. Commun. 2, 568 (2011). [CrossRef] [PubMed]
9. A. M. Maiden, M. J. Humphry, and J. M. Rodenburg, “Ptychographic transmission microscopy in three dimensions using a multi-slice approach,” J. Opt. Soc. Am. A 29, 1606–1614 (2012). [CrossRef]
10. M. Dierolf, A. Menzel, P. Thibault, P. Schneider, C. M. Kewish, R. Wepf, O. Bunk, and F. Pfeiffer, “Ptychographic x-ray computed tomography at the nanoscale,” Nature 467, 436–439 (2010). [CrossRef] [PubMed]
12. R. H. T. Bates, “Fourier phase problems are uniquely solvable in more than one dimension. I. Underlying theory,” Optik 61, 247–262 (1982).
13. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35, 237–246 (1972).
15. V. Elser, “Phase retrieval by iterated projections,” J. Opt. Soc. Am. A 20, 40–55 (2003). [CrossRef]
16. S. Marchesini, “A unified evaluation of iterative projection algorithms for phase retrieval,” Rev. Sci. Instruments 78, 011301 (2007). [CrossRef]
19. A. Tripathi, I. McNulty, and O. G. Shpyrko, “Ptychographic overlap constraint errors and the limits of their numerical recovery using conjugate gradient descent methods,” Opt. Express 22, 1452–1466 (2014). [CrossRef] [PubMed]
20. P. Dwivedi, A. Konijnenberg, S. Pereira, and H. Urbach, “Lateral position correction in ptychography using the gradient of intensity patterns,” Ultramicroscopy 192, 29–36 (2018). [CrossRef] [PubMed]
21. P. Thibault and M. Guizar-Sicairos, “Maximum-likelihood refinement for coherent diffractive imaging,” New J. Phys. 14, 063004 (2012). [CrossRef]
24. E. J. Candes, X. Li, and M. Soltanolkotabi, “Phase retrieval via Wirtinger flow: theory and algorithms,” IEEE Trans. Inf. Theory 61, 1985–2007 (2015). [CrossRef]
25. H. Zhang and Y. Liang, “Reshaped Wirtinger flow for solving quadratic system of equations,” Adv. Neural Inf. Process. Syst. 29, 2622–2630 (2016).
26. Z. Wei, W. Chen, C.-W. Qiu, and X. Chen, “Conjugate gradient method for phase retrieval based on the Wirtinger derivative,” J. Opt. Soc. Am. A 34, 708 (2017). [CrossRef]
27. J. Zhong, L. Tian, P. Varma, and L. Waller, “Nonlinear optimization algorithm for partially coherent phase retrieval and source recovery,” IEEE Trans. Comput. Imaging 2, 310–322 (2016). [CrossRef]
28. J. Li and T. Zhou, “Numerical optimization algorithms for wavefront phase retrieval from multiple measurements,” Inverse Probl. & Imaging 11, 721–743 (2017). [CrossRef]
29. J. Qian, C. Yang, A. Schirotzek, F. Maia, and S. Marchesini, “Efficient algorithms for ptychographic phase retrieval,” Inverse Problems and Applications, Contemp. Math. 615, 261–280 (2014).
31. L.-H. Yeh, “Analysis and comparison of Fourier ptychographic phase retrieval algorithms,” Technical Report No. UCB/EECS-2016-86 (University of California, 2016).
32. A. M. Maiden, D. Johnson, and P. Li, “Further improvements to the ptychographical iterative engine,” Optica 4, 736–745 (2017). [CrossRef]
33. R. Hesse, D. R. Luke, S. Sabach, and M. K. Tam, “Proximal heterogeneous block implicit-explicit method and application to blind ptychographic diffraction imaging,” SIAM J. Imaging Sci. 8, 426–457 (2015). [CrossRef]
34. A. Griewank and A. Walther, Evaluating derivatives: principles and techniques of algorithmic differentiation, vol. 105 (SIAM, 2008). [CrossRef]
35. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. J. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Józefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. G. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. A. Tucker, V. Vanhoucke, V. Vasudevan, F. B. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv:1603.04467 (2016).
36. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in 31st Conference on Neural Information Processing Systems (NIPS, 2017).
37. D. Maclaurin, “Modeling, inference and optimization with composable differentiable procedures,” Ph.D. thesis, Harvard University (2016).
38. Y. S. G. Nashed, T. Peterka, J. Deng, and C. Jacobsen, “Distributed automatic differentiation for ptychography,” Procedia Comput. Sci. 108, 404–414 (2017). [CrossRef]
39. S. Ghosh, Y. S. Nashed, O. Cossairt, and A. Katsaggelos, “ADP: Automatic differentiation ptychography,” in IEEE International Conference on Computational Photography (ICCP) (IEEE, 2018), pp. 1–10.
40. A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, “Automatic differentiation in machine learning: a survey,” J. Mach. Learn. Res. 18, 1–43 (2015).
41. P. H. Hoffmann, “A hitchhiker’s guide to automatic differentiation,” Numer. Algorithms 72, 775–811 (2016). [CrossRef]
43. Z. Wen, C. Yang, X. Liu, and S. Marchesini, “Alternating direction methods for classical and ptychographic phase retrieval,” Inverse Probl. 28, 115010 (2012). [CrossRef]
44. J. R. Fienup, T. R. Crimmins, and W. Holsztynski, “Reconstruction of the support of an object from the support of its autocorrelation,” J. Opt. Soc. Am. 72, 610–624 (1982). [CrossRef]
45. D. Brandwood, “A complex gradient operator and its application in adaptive array theory,” IEE Proc. F-Communications, Radar Signal Process. 130, 11–16 (1983). [CrossRef]
46. K. Kreutz-Delgado, “The complex gradient operator and the CR-calculus,” arXiv:0906.4835 (2009).
47. L. Sorber, M. V. Barel, and L. D. Lathauwer, “Unconstrained optimization of real functions in complex variables,” SIAM J. Optim. 22, 879–898 (2012). [CrossRef]
48. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2014).
49. S. Ahn and J. A. Fessler, “Globally convergent image reconstruction for emission tomography using relaxed ordered subsets algorithms,” IEEE Trans. Med. Imaging 22, 613–626 (2003). [CrossRef] [PubMed]
50. D. P. Bertsekas, Nonlinear programming (Athena Scientific, 1999).
51. Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural networks: Tricks of the trade, (Springer, 2012), pp. 9–48.
52. S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv:1609.04747 (2016).
53. E. J. R. Pauwels, A. Beck, Y. C. Eldar, and S. Sabach, “On fienup methods for sparse phase retrieval,” IEEE Trans. Signal Process. 66, 982–991 (2018). [CrossRef]
54. R. Xu, M. Soltanolkotabi, J. P. Haldar, W. Unglaub, J. Zusman, A. F. Levi, and R. M. Leahy, “Accelerated Wirtinger flow: A fast algorithm for ptychography,” arXiv:1806.05546 (2018).
56. Y. S. Nashed, D. J. Vine, T. Peterka, J. Deng, R. Ross, and C. Jacobsen, “Parallel ptychographic reconstruction,” Opt. Express 22, 223015 (2014). [CrossRef]
57. J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” J. Mach. Learn. Res. 13, 281–305 (2012).
58. Y. Yao, L. Rosasco, and A. Caponnetto, “On early stopping in gradient descent learning,” Constr. Approx. 26, 289–315 (2007). [CrossRef]
59. J. W. Goodman, Introduction to Fourier Optics (W.H. Freeman, 2017).
60. M. Stockmar, I. Zanette, M. Dierolf, B. Enders, R. Clare, F. Pfeiffer, P. Cloetens, A. Bonnin, and P. Thibault, “X-ray near-field ptychography for optically thick specimens,” Phys. Rev. Appl. 3, 1–6 (2015). [CrossRef]
62. A. L. Robisch, K. Kröger, A. Rack, and T. Salditt, “Near-field ptychography using lateral and longitudinal shifts,” New J. Phys. 17, 073033 (2015). [CrossRef]
63. M. O. Hill, I. Calvo-Almazan, M. Allain, M. V. Holt, A. Ulvestad, J. Treu, G. Koblmuller, C. Huang, X. Huang, H. Yan, E. Nazaretski, Y. S. Chu, G. B. Stephenson, V. Chamard, L. J. Lauhon, and S. O. Hruszkewycz, “Measuring three-dimensional strain and structural defects in a single InGaAs nanowire using coherent x-ray multiangle Bragg projection ptychography,” Nano Lett. 18, 811–819 (2018). [CrossRef] [PubMed]
64. S. O. Hruszkewycz, M. Allain, M. V. Holt, C. E. Murray, J. R. Holt, P. H. Fuoss, and V. Chamard, “High-resolution three-dimensional structural microscopy by single-angle Bragg ptychography,” Nat. Mater. 16, 244–251 (2017). [CrossRef]
65. A. Fannjiang, “Absolute uniqueness of phase retrieval with random illumination,” Inverse Probl. 28, 075008 (2012). [CrossRef]
66. T. B. Edo, D. J. Batey, A. M. Maiden, C. Rau, U. Wagner, Z. D. Pešić, T. A. Waigh, and J. M. Rodenburg, “Sampling in x-ray ptychography,” Phys. Rev. A 87, 053850 (2013). [CrossRef]
67. V. Chamard, M. Allain, P. Godard, A. Talneau, G. Patriarche, and M. Burghammer, “Strain in a silicon-on-insulator nanostructure revealed by 3D x-ray Bragg ptychography,” Sci. Rep. 5, 9827 (2015). [CrossRef]