
Packed domain Rayleigh-Sommerfeld wavefield propagation for large targets


Abstract

For applications in the domain of digital holographic microscopy, we present a fast algorithm to propagate scalar wave fields from a small source area to an extended, parallel target area of coarser sampling pitch, using the first Rayleigh-Sommerfeld diffraction formula. Our algorithm can take full advantage of the fast Fourier transform by decomposing the convolution kernel of the propagation into several convolution kernel patches. Using partial overlapping of the patches together with a soft blending function, the Fourier spectrum of these patches can be reduced to a low number of significant components, which can be stored in a compact sparse array structure. This allows for rapid evaluation of the partial convolution results by skipping over negligible components during the Fourier domain pointwise multiplication and by directly mapping the remaining multiplication results into a Fourier domain representation of the coarsely sampled target patch. The algorithm has been verified experimentally at a numerical aperture of 0.62, showing no significant resolution limitations.

© 2010 Optical Society of America

1. Introduction

The scalar wave propagation from a two-dimensional source wave field o(x,y) to a target wave field u(x′,y′) is governed by the Helmholtz equation. Its solution depends on the boundary conditions, and different choices lead to several competing solutions such as the Fresnel-Kirchhoff or the two Rayleigh-Sommerfeld diffraction formulas [1]. Each of these solutions equals a convolution

u(x',y') = \iint dx\, dy\; h(x'-x,\, y'-y,\, z)\, o(x,y),
with the corresponding kernel h and the distance z between the parallel planes. There are many other fields of science where similar convolution integrals must be solved. To do this numerically, both wave fields are represented by complex valued two-dimensional arrays o = {om,n} and u = {up,q}, i.e.
o_{m,n} = o(x_0 + \Delta x\, m,\; y_0 + \Delta y\, n)
u_{p,q} = u(x_0' + \Delta x'\, p,\; y_0' + \Delta y'\, q),
where the source field o has to be Nyquist-Shannon sampled [2]. The Nyquist-Shannon sampling has to be guaranteed by sufficiently low values of the sampling pitches Δx and Δy. A more detailed discussion of this topic and some modifications to the straightforward sampling of the continuous field distributions are given in an article by Shen and Wang [3]. For the purpose of this article we consider the discretisations (2) and (3) to be sufficient. Inserting them into Eq. (1) yields
u_{p,q} = \sum_m \sum_n h(p\,\Delta x' - m\,\Delta x,\; q\,\Delta y' - n\,\Delta y,\; z)\, o_{m,n}.
Restricting this to equal pixel pitches, i.e. Δx′ = Δx and Δy′ = Δy, one arrives at the discrete convolution
u_{p,q} = \sum_m \sum_n h_{p-m,\,q-n}\, o_{m,n} \equiv \{h * o\}_{p,q},
and quickly realizes that, with the further restriction to equal field sizes, i.e. equal numbers of sampling points of u and o, it can be computed using efficient fast Fourier techniques. Equal field sizes and pixel pitches are, however, an unacceptable restriction in many applications. As a concrete example we refer to digital inline holographic microscopy (DIHM) [4, 5] with point sources. Here the source field is a hologram recorded on a CCD chip with a given pixel size of a few microns, whereas the target field is the wavefront at the object, for which we demand submicron resolution, i.e. a pixel size at least an order of magnitude smaller than that in the source field. A solution to the dilemma of different pixel pitches was given some time ago for DIHM based on the Kirchhoff-Helmholtz (K-H) diffraction formula with a very efficient and accurate algorithm [6, 7]. This algorithm transforms the hologram to an image of the object of strongly decreased lateral extent, i.e. it reconstructs the microscopic object structure. We will refer to such a reconstruction as a backward propagation.
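
As a numerical illustration of this point (not part of the original paper; array size and names are chosen for illustration), the following NumPy sketch compares a direct evaluation of the discrete convolution in Eq. (5), under the cyclic boundary condition that the FFT implies, with the evaluation via the convolution theorem:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16

# Complex source field o and kernel h of equal size and pixel pitch.
o = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
h = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

# Direct evaluation of Eq. (5) under the cyclic boundary condition implied
# by the FFT (the paper later removes the wrap-around by zero padding).
u_direct = np.zeros((N, N), dtype=complex)
for p in range(N):
    for q in range(N):
        for m in range(N):
            for n in range(N):
                u_direct[p, q] += h[(p - m) % N, (q - n) % N] * o[m, n]

# Evaluation via the convolution theorem and the FFT.
u_fft = np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(o))

print(np.max(np.abs(u_direct - u_fft)))   # agreement up to round-off
```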

While usual algorithms for inline holographic reconstruction only require such a single backward propagation, there are also measurement approaches that are based on iterative forward and backward propagation between different planes. They do so, for instance, to retrieve phase information without a reference light source (such as the point source in DIHM setups) or to apply constraints present in different planes of a DIHM setup. Examples are the Gerchberg-Saxton algorithm [8], the Yang-Gu algorithm [9] and the conjugate gradient phase retrieval algorithm by Grjasnow et al. [10]. In these approaches, the K-H DIHM algorithm can also be used to perform the forward propagation, computing a hologram of strongly increased lateral extent from a given microscopic object image. For relatively short axial distances compared to the object extent, however, the K-H formula is no longer sufficient to exploit the full numerical aperture. Instead, it becomes necessary to use the Rayleigh-Sommerfeld (R-S) diffraction formula, which is known to perform well under such conditions. The R-S formula, however, does not allow for a field-size change similar to that used in the K-H DIHM algorithm. Based on a different approach, fast and memory saving algorithms for exact evaluation of the Rayleigh-Sommerfeld diffraction integral through the equivalent [11, 12] angular spectrum method [13] have already been given in two previous papers [14, 15]. These algorithms, however, only work in the backward propagation direction, from an extended in-line holographic wave field back to a small source or object region. They cannot be used for forward propagation.

The scope of this paper is to provide a complementary forward propagation algorithm that works with the same DIHM geometries. The basic idea of the algorithm is a subdivision of the propagation into partial propagation patches (see Fig. 1) with limited response spectra and a consequent restriction of all usually time-consuming computations to this low number of significant components. Limited response spectra here means that there is a low number of significant spectral or “Fourier” components per patch. The restriction of the computations to these components will be referred to as “working in the packed domain”, since the propagator patches are packed in memory and the accelerated steps of the algorithm work directly on these packed data.


Fig. 1 The main aim of the algorithm is the computational scalar wave field propagation of a small, densely sampled source field to a large, coarsely sampled detector image using an unaltered Rayleigh-Sommerfeld kernel. The sampling of the detector image will here differ from the sampling of the source field by an integer factor T. The propagation is divided into several patches with overlapping target regions on the detector.


2. Patched propagation

2.1. Fourier domain propagation and subdivision into patches

The basic formula for a discrete propagation between parallel planes has been given in equation (5). For our holography task we will specialize to the discretized version of the first Rayleigh-Sommerfeld integral [1], whose kernel, the free space impulse response, is given by

h_{p,q} = \frac{\Delta x\, \Delta y}{2\pi}\, \frac{z}{r_{p,q}^2} \left( \frac{1}{r_{p,q}} - ik \right) \exp\!\left( ik\, r_{p,q}\, \frac{z}{|z|} \right),
where
r_{p,q} = \left[ (\Delta x\, p)^2 + (\Delta y\, q)^2 + z^2 \right]^{1/2},
is the distance between the source point and the target point of the corresponding summation step in Eq. (5) (p and q here are relative pixel positions), z is the distance between the source and target planes, and k = 2π/λ is the wave number for the wavelength λ. For the remainder of this paper we assume both pixel pitches to be identical, i.e.
\Delta x \equiv \Delta y,
to simplify the notation. The field sizes and the patch sizes introduced later will also be implicitly assumed to be identical for the x- and y-directions, again for the sake of a simpler notation. An extension to different pitches and sizes in the x- and y-directions is, however, possible in a straightforward manner. As indicated above, for equal pixel pitches and field sizes of the source and target fields, the convolution in Eq. (5) is typically computed using the convolution theorem [11, 13] and the fast implementation of the discrete Fourier transform (DFT), the fast Fourier transform (FFT), leading to
u = \mathcal{F}^{-1}\!\left[ \mathcal{F}(h)\, \mathcal{F}(o) \right] \equiv \mathcal{F}^{-1}\!\left[ H\, O \right].
Here and in the remainder of this paper, the Fourier transformed versions of arrays such as o, u and h are generally denoted by the corresponding uppercase symbols. The two-dimensional DFT over N × N pixels shall be defined as
\left[ \mathcal{F}(u) \right]_{s,t} \equiv U_{s,t} = \sum_{p=0}^{N-1} \sum_{q=0}^{N-1} u_{p,q} \exp\!\left( -2\pi i\, \frac{s\,p + t\,q}{N} \right).
With this approach the computational complexity can be kept at a convenient figure of 𝒪(N² log N). The method cannot, however, be directly employed for the propagation of a small source field to a large and more coarsely sampled target field, because of the requirement for equal pixel pitches and field sizes. This requirement is not yet included in the underlying field propagation equation [Eq. (4)], but a direct evaluation of Eq. (4) for a source array with N × N and a target array with M × M pixels would lead to an inconvenient time complexity of 𝒪(N²M²). For practical applications such a time complexity may be acceptable if the computation were done on a modern graphics processing unit (GPU) with several hundred processing cores. It would, however, hardly be acceptable on current main processors, and it would also not be capable of producing video-rate propagation on a current GPU. To circumvent this problem, Nascov et al. [16] propose to extend the source field to the size of the target field by zero padding and to adapt the sampling pitch of the target field to that of the source field. This approach works well at moderate field size ratios. At larger ratios it is, however, limited by the large pixel numbers that can result for the two modified fields.
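
As a point of reference for the patched method developed below, here is a minimal NumPy sketch (our illustration, not the authors' code) of this baseline: sampling the Rayleigh-Sommerfeld kernel of Eqs. (6) and (7) on a grid of relative pixel offsets and performing the same-pitch, same-size FFT convolution of Eq. (9), with zero padding to 2N × 2N to suppress the cyclic wrap-around. The function names and the handling of negative offsets are our assumptions, chosen to be consistent with the formulas above.

```python
import numpy as np

def rs_kernel(N, dx, z, wavelength):
    """Sample the first Rayleigh-Sommerfeld impulse response of Eqs. (6)-(7)
    on a 2N x 2N grid of relative pixel offsets (-N .. N-1 per axis)."""
    k = 2.0 * np.pi / wavelength
    offsets = np.arange(-N, N)
    P, Q = np.meshgrid(offsets, offsets, indexing="ij")
    r = np.sqrt((dx * P) ** 2 + (dx * Q) ** 2 + z ** 2)
    return ((dx * dx / (2.0 * np.pi)) * (z / r ** 2)
            * (1.0 / r - 1j * k) * np.exp(1j * k * r * np.sign(z)))

def propagate_same_grid(o, dx, z, wavelength):
    """Baseline propagation u = F^-1[H O] (Eq. (9)) for equal pixel pitch and
    field size, zero padded to 2N x 2N to avoid cyclic wrap-around."""
    N = o.shape[0]
    h = rs_kernel(N, dx, z, wavelength)
    o_pad = np.zeros((2 * N, 2 * N), dtype=complex)
    o_pad[:N, :N] = o
    # ifftshift moves the zero-offset sample of the kernel to array index (0, 0)
    # so that the cyclic convolution reproduces the linear one on the source grid.
    H = np.fft.fft2(np.fft.ifftshift(h))
    u_pad = np.fft.ifft2(H * np.fft.fft2(o_pad))
    return u_pad[:N, :N]

if __name__ == "__main__":
    # Example call with arbitrary illustrative values.
    rng = np.random.default_rng(0)
    o = rng.standard_normal((128, 128)) + 1j * rng.standard_normal((128, 128))
    u = propagate_same_grid(o, dx=2e-6, z=5e-3, wavelength=500e-9)
    print(u.shape)
```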

In contrast, the first step of the algorithm presented here is to go back to the formulation in Eq. (5) and to subdivide the contained convolution into a set of partial convolutions, which can be computed independently using an FFT based approach. The resulting partial convolution results can then be resampled to a coarser raster and accumulated (i.e. added up) inside a coarsely sampled target array. As shown later, some properties of the Fourier transform and the use of a packed representation of the Fourier transformed partial convolution kernels allow this to be done very efficiently. Efficiency here comprises a minimal memory and time overhead and the avoidance of an explicit computation of the densely sampled partial convolution results. The equality of the sampling pitches of the source and target arrays, required by Eq. (5), will thus only be virtual.

The subdivision of the convolution into partial convolutions works by decomposing the convolution kernel h into smaller pieces h^{μν}. Here they shall have a size of N × N pixels and be arranged in a regular grid with a pitch corresponding to S (S ≤ N) pixel pitches. The latter condition guarantees that every point of h is covered by at least one of the patches. The decomposition of h can then be written as

h_{p,q} = \sum_{\mu\nu} h^{\mu\nu}_{p - S\mu,\; q - S\nu}, \qquad \left( 0 \le \{ p - S\mu,\; q - S\nu \} < N \right),
so that equation (5) changes to
u_{p,q} = \sum_{\mu\nu} \left( \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} h^{\mu\nu}_{p - S\mu - m,\; q - S\nu - n}\, o_{m,n} \right) = \sum_{\mu\nu} \{ h^{\mu\nu} * o \}_{p - S\mu,\; q - S\nu}.
The geometrical arrangement of patches according to this class of decompositions is shown in Fig. 2. So far, no specific decomposition has been chosen. Without loss of generality, the source array o shall also have a size of N × N pixels. Larger source arrays can be handled by tiling and smaller ones by zero padding. Thus Eq. (12) becomes a sum of several conveniently computable convolutions between the source array o and the partial convolution kernels h^{μν}, each having N × N elements and pixel pitch Δx. The individual convolutions can now be carried out independently using the time-efficient FFT approach. Before these FFTs, the field sizes are increased to 2N × 2N by zero padding, to remove the cyclic boundary condition introduced by the FFT and to enable an unambiguous representation of the convolution result, which has a natural size of 2N × 2N pixels. The convolution results are then accumulated at the proper positions in the target array. The latter can now be of any desired size and have a coarser sampling than the source array, since it is no longer directly involved in the FFT.
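
The decomposition of Eqs. (11) and (12) can be checked with the following sketch (illustrative only: it uses a synthetic kernel covering only non-negative offsets, disjoint patches, and none of the blending introduced in the next section). Accumulating the zero-padded FFT convolutions of the source field with the N × N kernel patches at offsets (Sμ, Sν) reproduces the direct evaluation of the full convolution.

```python
import numpy as np
from numpy.fft import fft2, ifft2

rng = np.random.default_rng(1)
N = 32            # patch size (and source field size)
S = 32            # patch pitch; S = N means disjoint patches, no blending yet
K = 3             # patches per direction; the toy kernel covers offsets 0 .. K*S-1

o = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
h = rng.standard_normal((K * S, K * S)) + 1j * rng.standard_normal((K * S, K * S))

# Reference: direct linear convolution of o with the full kernel h.
M = K * S + N - 1
u_ref = np.zeros((M, M), dtype=complex)
for m in range(N):
    for n in range(N):
        u_ref[m:m + K * S, n:n + K * S] += o[m, n] * h

# Patched evaluation following Eq. (12): convolve o with each N x N kernel
# patch h^{mu,nu} via zero-padded FFTs and accumulate at offset (S*mu, S*nu).
u_patched = np.zeros((M, M), dtype=complex)
for mu in range(K):
    for nu in range(K):
        patch = h[S * mu:S * mu + N, S * nu:S * nu + N]
        h_pad = np.zeros((2 * N, 2 * N), dtype=complex)
        o_pad = np.zeros((2 * N, 2 * N), dtype=complex)
        h_pad[:N, :N] = patch
        o_pad[:N, :N] = o
        conv = ifft2(fft2(h_pad) * fft2(o_pad))[:2 * N - 1, :2 * N - 1]
        u_patched[S * mu:S * mu + 2 * N - 1, S * nu:S * nu + 2 * N - 1] += conv

print(np.max(np.abs(u_patched - u_ref)))   # agreement up to round-off
```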


Fig. 2 Geometrical arrangement of propagator kernel patches around an arbitrarily chosen kernel patch h^{μν}; different patches are arranged with a pitch of S ≤ N pixels, leading to an overlap of w = N − S pixels between adjacent patches. The sum of all patches gives the total propagator kernel h. The blending function shown here is linked to the particular decomposition of h and will be described in the next section. The bitmap overlay for h^{μν} is figurative.


In the following, we will restrict ourselves to a uniform target array raster with a sampling pitch of TΔx, T ∈ ℕ, whose sampling points are a subset of the sampling points of the convolution results. This will enable the Fourier domain resampling described below to avoid computing convolution samples between the sampling points of the target array. T does not need to be a divisor of N, but making it a divisor helps to simplify the software implementation.

2.2. Patch blending and packing

From a geometrical point of view, each individual propagator patch h^{μν} describes the propagation of the source field distribution into a small solid angle range. If a proper decomposition of the propagator is chosen, this should correspond to a small number of relevant Fourier components in each propagator patch. For the propagator packing described later in this section, it is intended to reduce this number to a very low fraction of the total number of Fourier components. In this sense, a simple subdivision of h into disjoint squares (i.e. S = N) is not a proper decomposition, since the hard cut of the kernel function would introduce a large number of non-negligible excess Fourier components [see Fig. 3(a)]. These additional excess components would be canceled out by the excess components of the surrounding patches after recombining the individual patch propagation results in the target array. However, since each patch is propagated independently of the other patches, their presence would lead to an unnecessarily high computational load and a significantly increased memory consumption for the kernel patches in the context of the optimizations described below. To avoid this, adjacent patches have to be overlapped (i.e. S < N); by using a smooth blending function, the number of relevant Fourier components can then be reduced by several orders of magnitude.


Fig. 3 Effect of soft blending of a propagator kernel patch; The left column shows the spatial amplitude distribution of an arbitrarily chosen (off-axis) propagator kernel patch without (a) and with (b) application of a soft blending function. Uncolored (grey and white) parts mean a phase angle of 0 or π, while red and blue color overlays stand for phase angles of π/2 and (3/2)π, respectively. The same color encoding is used for the Fourier domain representation of the patches shown on the right side. The figure visualizes the strong reduction of relevant Fourier components through the application of soft blending.


Figure 3(b) shows the propagator patch spectrum for the same conditions as used in Fig. 3(a), but with a w = N/4 wide smoothing region, using the blending function

b(d) = \sin^2\!\left( \frac{\pi}{2}\, \frac{d}{w} \right),
where d is the distance from the patch border. This function ensures that Eq. (12) is still satisfied, i.e. the sum of all weight factors b(d) contributing to a single pixel of the kernel h is always guaranteed to be unity. Because of this guaranteed unity of all contributions, the blending function does not produce any distortion of the kernel. Different possible blending functions only differ in how well they suppress the edge-imposed excess frequency components. Among a range of possible candidates, we have chosen the particular sinusoidal blending function (13), since it performs well in practice. It is, however, beyond the scope of this article to investigate and compare the performance of different blending functions. The sinusoidal blending function was used for all experiments reported in this article, mostly with w = N/8 (i.e. S = 7/8 × N), and led to fractions of relevant Fourier components between 0.015% and 1%. Here, for each considered patch, a relevance threshold of 10⁻⁸ times the maximum squared Fourier component of the patch was used.
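
The partition-of-unity property of the blending weights can be verified with the short sketch below (our construction; the half-pixel distance convention and the separable two-dimensional extension b(d_x)·b(d_y) are assumptions, not taken from the paper). It builds the one-dimensional sin² profile of Eq. (13) for patches of pitch S = N − w and checks that the overlapping weights sum to one.

```python
import numpy as np

N, w = 64, 8               # patch size and overlap width (here w = N/8)
S = N - w                  # patch pitch

def blend_1d(N, w):
    """One-dimensional blending profile: the sin^2 ramp of Eq. (13) over the
    w border pixels on each side of an N-pixel patch, weight 1 in between.
    Distances are taken to pixel centres (an assumed convention)."""
    i = np.arange(N)
    d = np.minimum(i + 0.5, N - 0.5 - i)   # distance to the nearer patch border
    b = np.ones(N)
    ramp = d < w
    b[ramp] = np.sin(0.5 * np.pi * d[ramp] / w) ** 2
    return b

# Partition of unity check: a row of K patches placed with pitch S should give
# a total weight of one for every pixel away from the outer, unpaired ramps.
K = 6
b = blend_1d(N, w)
total = np.zeros(S * (K - 1) + N)
for mu in range(K):
    total[S * mu:S * mu + N] += b

print(np.max(np.abs(total[w:-w] - 1.0)))   # ~0 up to round-off
```

A separable two-dimensional weight b(d_x)·b(d_y) inherits this unity property; it is one possible way to realize the blending sketched in Fig. 2.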

With such a sufficiently low threshold, all non-relevant Fourier components can be truncated to zero without a perceivable impact on the propagated field. The low number of remaining non-zero Fourier components can now be exploited to reduce the memory required for storing the patch. To accomplish this, the non-zero components are pre-computed and then packed by storing them inside a sparse-array structure. In this way, not only is the memory consumption of the stored propagator minimized; the multiplication between the components of the Fourier transformed source array and the Fourier components of the propagator patch is also accelerated by two to four orders of magnitude, since all truncated components can be quickly skipped over during the multiplication.
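
One possible packed representation is sketched below. The paper does not specify the exact sparse-array layout, so the flat index/value pairs, the function names, and the synthetic demo patch are our assumptions: the Fourier transform of a blended, zero-padded kernel patch is thresholded at a fraction of its maximum squared component, only the surviving components are stored, and the pointwise multiplication with the transformed source array is restricted to these components.

```python
import numpy as np

def pack_patch_spectrum(h_patch_padded, rel_threshold=1e-8):
    """Transform a (blended, zero padded) kernel patch to the Fourier domain,
    drop every component whose squared magnitude lies below rel_threshold
    times the maximum, and keep only the surviving (flat index, value) pairs."""
    H = np.fft.fft2(h_patch_padded)
    power = np.abs(H) ** 2
    keep = power > rel_threshold * power.max()
    idx = np.flatnonzero(keep)
    return idx, H.ravel()[idx], H.shape

def packed_multiply(packed_patch, O):
    """Pointwise multiplication H * O restricted to the stored significant
    components of the packed patch; truncated components are simply skipped."""
    idx, values, shape = packed_patch
    return idx, values * O.ravel()[idx], shape

# Synthetic demo: a smoothly windowed off-axis plane wave stands in for a
# blended propagator patch; its spectrum is concentrated in few components.
N = 128
y, x = np.mgrid[0:2 * N, 0:2 * N]
window = np.exp(-((x - N / 2) ** 2 + (y - N / 2) ** 2) / (2 * (N / 6) ** 2))
patch = np.exp(2j * np.pi * (5 * x + 3 * y) / (2 * N)) * window

packed = pack_patch_spectrum(patch)
print(len(packed[0]) / (4 * N * N))    # fill factor of the packed demo patch
```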

Alternatively, one could think about multiplexing the different patches into one or a few single structures in a way similar to that described by Paturzo et al. [17]. However, this approach currently still poses some unresolved questions and will not be considered here.

2.3. Fourier domain resampling

In a straightforward approach, the few resulting Fourier component products would have to be inserted at their proper positions inside a 2N × 2N pixel array, which subsequently would have to be transformed back into the spatial domain, downsampled to the coarser 2N′ × 2N′ raster (where N′ = N/T), and finally added to the target array. In contrast to the single 2N × 2N Fourier transform necessary for the source array, this would have to be repeated for each propagator patch and would thus significantly limit the overall performance.

Consider that the intensity of the field is still Nyquist sampled at the coarsened sampling pitch of TΔx and that the aim of the computation is to compute the intensity distribution. Then an adequate procedure for downsampling the array is to simply take every T-th pixel out of the 2N × 2N array and add it directly to the corresponding pixel in the target array. This approach also yields the correct phase values for the picked points, although the phase as such would commonly no longer be Nyquist sampled. At this point, a significant speed-up is possible, since such a picking of values from a sub-raster can be achieved much more efficiently through a manipulation directly in the Fourier domain, requiring only a subsequent 2N′ × 2N′ Fourier back-transform instead of a 2N × 2N one. For the sake of clarity this will be shown here for the one-dimensional case instead of the two-dimensional one. In one dimension the inverse Fourier transform of the Fourier-domain propagated field U can be written as

u_p = \frac{1}{2N} \sum_{t=0}^{2N-1} U_t \exp\!\left( 2\pi i\, \frac{t\,p}{2N} \right).
Now we only want to compute every T-th pixel of u, possibly shifted by an offset of s pixels with respect to the origin. We thus have to compute the values of u for raster positions p = Tj + s, where j becomes the pixel index in the downsampled array û. This can be done as follows:
\hat{u}_j \equiv u_{Tj+s} = \frac{1}{2N} \sum_{t=0}^{2N-1} U_t \exp\!\left( 2\pi i\, \frac{t\,(Tj+s)}{2N} \right) = \frac{1}{2N'} \sum_{l=0}^{2N'-1} \hat{U}_l \exp\!\left( 2\pi i\, \frac{j\,l}{2N'} \right),
where
\hat{U}_l \equiv \frac{1}{T} \sum_{m=0}^{T-1} U_{l + 2N'm} \exp\!\left( 2\pi i\, \frac{s\,(l + 2N'm)}{2N} \right).

The term Û_l, which was drawn out of the initial sum, only depends on the newly introduced index l and is a weighted sum of T Fourier components of the propagated array U (or fewer, if components are skipped in a packed domain computation). Each component of U contributes to at most one single Û_l. The remaining expression in Eq. (15) is again an inverse Fourier transform, this time over 2N′ elements. Û can thus be transformed directly into û using an inverse Fourier transform:

\hat{u}_j = \left[ \mathcal{F}^{-1} \hat{U} \right]_j
The same procedure works identically in two dimensions. So, as desired, the resampling of the propagated array U^{μν} = H^{μν} · O can be done in the Fourier domain, and the inverse Fourier transform then only has a field size of 2N′ × 2N′ pixels, leading to a relative speed-up of the inverse Fourier transform part by more than a factor of T². According to Eqs. (15) and (17), the non-zero Fourier component products resulting from the multiplication between the packed propagator patches and the Fourier transformed source array O can be accumulated directly at the proper positions inside a 2N′ × 2N′ target patch Û^{μν}, using the two-dimensional counterpart of the weighting function in Eq. (16). As in the process of skipping over truncated Fourier components of the propagator during the multiplication, no computations involving truncated components need to be performed during this accumulation. The accumulation process of the target patch is thus accelerated in the same manner as the multiplication process. After the accumulation of the single target patch Û^{μν} is finalized, the patch is inversely Fourier transformed [Eq. (17)] and the result û^{μν} is accumulated, i.e. co-added, at the corresponding sub-frame of the target array. The position of the sub-frame is determined by Eq. (12). After this, the memory used to store Û^{μν} can be reused for the next patch. The overall algorithm is sketched in Fig. 4.
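
The Fourier domain resampling of Eqs. (14)-(17) can be verified in one dimension with a few lines of NumPy (the values of N, T and s below are illustrative): folding the 2N-point spectrum into 2N′ bins with the phase weights of Eq. (16) and applying a single 2N′-point inverse FFT reproduces exactly the samples picked at every T-th position of the full inverse transform.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, s = 64, 4, 1           # half field size, downsampling factor, sample offset
Np = N // T                  # N' in the text

U = rng.standard_normal(2 * N) + 1j * rng.standard_normal(2 * N)
u = np.fft.ifft(U)           # Eq. (14): full inverse transform over 2N points

# Spatial-domain reference: pick every T-th sample, offset by s.
u_picked = u[s::T]           # length 2N' = 2N/T

# Fourier-domain resampling, Eq. (16): fold the 2N spectrum into 2N' bins
# with the phase weights exp(2 pi i s t / 2N) and divide by T.
t = np.arange(2 * N)
phase = np.exp(2j * np.pi * s * t / (2 * N))
U_hat = (U * phase).reshape(T, 2 * Np).sum(axis=0) / T

# Eq. (17): a single 2N'-point inverse FFT yields the decimated samples.
u_hat = np.fft.ifft(U_hat)

print(np.max(np.abs(u_hat - u_picked)))   # agreement up to round-off
```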


Fig. 4 Schematic flow chart of the patch propagation algorithm; The point-wise multiplication and Fourier domain resampling are carried out in the packed domain, causing the main acceleration of the algorithm.


3. Performance of the algorithm

The speed and memory consumption of the algorithm have been investigated through a number of particular propagation computations, using different patching, overlapping and downsampling factors, as well as different propagation distances. Table 1 shows some of the determined performance parameters for a base sampling pitch of Δx = 125 nm and a detector pixel pitch of TΔx = 4 μm. The total memory figures given include all memory involved in the computation, for instance the bookkeeping overhead for the packed propagators, the memory for the input and output fields, additional internal memory, and so on. Accordingly, the given computation times not only include the core propagation, but also the overhead for copying input field tiles into a working field and copying back the result of the propagation. If the propagation to the 1 mm and 20 mm wide detector fields had been performed using a straightforward angular spectrum computation without zero padding, the memory requirements would have been 0.5 GiB for the 1 mm detector and 190.7 GiB for the 20 mm detector. For the more accurate convolutional method with zero padding, the memory requirements would be eight times these values (a factor of four for the zero-padded fields and a factor of two because the propagator is then stored explicitly in memory). For the latter case the pure computation times can be roughly estimated to be at least 9 seconds and 20300 seconds, respectively. These values would be independent of the propagation distance. As could be expected, the numerical effort in computation time as well as in data memory depends strongly on the fill factor of the propagator, i.e. the fraction of non-negligible coefficients of the propagator kernel. This fraction decreases substantially when the propagation distance becomes large compared to the lateral extent of the initial field, but even for the short-distance propagations it remains clearly below 1%. In the limit of very large distances the computation time is bounded by the times needed for the Fourier transforms of the source array and the Fourier domain resampled target array patches. In the computations, no parallelization of the patch propagation was done, although the algorithm would easily allow for it. Only the individual, sequentially performed FFTs may have taken advantage of the internal parallelization of the FFTW library used [18].


Table 1. Representative figures for propagation times, average propagator fill factors, and total memory requirements for different detector sizes, propagation distances and patch sizes (N). All other parameters were held constant: wavelength λ = 500 nm; sampling raster Δx = 125 nm; detector pixel pitch TΔx = 4 μm; overlap between the patches: N/8; truncation threshold for propagator Fourier components: 10⁻⁸. Computer: Intel motherboard DP35DP with Core 2 Duo processor E6750 at 2.66 GHz, 2 GiB RAM. FFT using the FFTW library [18] and float (32 bit) arithmetic; propagator patches were pre-computed using double precision (64 bit) arithmetic; compiler: MSVC++ 2005. The given memory requirements also include the memory for the input and output fields. Propagator patches were reused up to eight times when linked to other patches through a mirroring operation. No parallelization of the patch propagation was done.

For the experimental evaluation of the accuracy of the propagation algorithm, two methods have been used. In a first evaluation, a phase-resolved holographic image of a cluster of 1.07 μm polymer beads, which had been acquired using a slightly modified version of the Fourier domain tile superposition propagation algorithm (FTSP, [15]), was propagated from the object plane to the detector plane using the algorithm described above with an intensity-related truncation threshold of 10⁻⁸. The fill factor of the packed propagator was 0.3%, consuming 54.7 MiB of memory, and the propagation took 6.7 seconds. The distance between the object and detector planes was about 3.8 mm, the wavelength was 661 nm, and the detector pixel pitch was 3.5 μm. The object plane was accordingly sampled at a pitch of 218.75 nm. According to the resolved 1.07 μm beads, the realized numerical aperture was at least 0.62. Both the object plane image and the detector plane image had a total of 2048 × 2048 pixels. After propagation from the object to the detector plane, the generated detector image was propagated back again to the object plane using the modified FTSP algorithm. The complex difference between the original phase-resolved image and the result after one round trip was then calculated and converted into an intensity image, showing a peak difference of about 2.0 × 10⁻⁴ compared to the maximum intensity of the original image. Detail images of the original, round-tripped and difference images are shown in Fig. 5(a)-(c). The difference image shows no object-related structures, but rather a statistical behaviour.


Fig. 5 The validity of the patched propagation was investigated by propagating an object wave of a real inline holography sample containing clusters of 1.06 μm polymer beads (a) to a detector using the described algorithm and then back to the object plane (b), using the Fourier domain tile superposition propagation algorithm [15]. The images show a 40 × 76 μm² detail of the object plane. The distance between object and detector plane was about 3.8 mm. A truncation threshold of 10⁻⁸ was used for the forward propagation. No perceivable differences can be seen, indicating that the full information of the sample is retained through the forward-backward iteration cycle. This is verified by the 5000 times intensified difference intensity image (c), showing only minor, non-systematic differences. The wavelength of the measurement was 661 nm, indicating a realized numerical aperture of at least 0.62. Image (d) shows the 2 billion times intensified difference between the round-tripped image (b) and a round-tripped image generated without truncation in the forward propagation.


To determine whether the difference image is caused by the truncation of propagator kernel components or by some other effect, a second round-trip image was computed, this time using a truncation threshold of zero, i.e. taking all propagator kernel components into account. Because of the large number of coefficients, the propagator patches had to be computed on the fly; the resulting round-trip image was then subtracted from the round-trip image obtained with the 10⁻⁸ threshold [Fig. 5(d)]. The peak intensity computed from the complex difference between the two round-trip images was about 5.5 × 10⁻¹⁰ of the maximum intensity of the original image, clearly indicating that kernel component truncation was not the limiting factor in the first evaluation. The on-the-fly propagation without truncation of components took 1980 seconds of computation time and would have to be repeated for each propagation, while the pre-computation of the reusable packed propagator required only 188 seconds, after which every propagation only needed the above-stated 6.7 seconds. For comparison, the time requirement for a straightforward convolution method propagation of a corresponding 32768 × 32768 field, with zero padding similar to Nascov’s method, can be estimated at roughly 750 seconds, provided that 16 GiB of memory were available for the field and the pre-computed propagator alone. Pre-computation of the 8 GiB propagator would additionally require about 380 seconds.

4. Conclusions

We present a fast convolution type algorithm for the numerical evaluation of the Rayleigh-Sommerfeld diffraction integral between differently sized and sampled source and target wave fields. In particular, the algorithm can be applied to the numerical propagation of a densely sampled source wave field with a smaller lateral extent to a more coarsely sampled target field with a larger lateral extent. An example is the propagation of the wave field from a microscopic sample to an extended image detector in an in-line holographic setup. The algorithm complements the previously given algorithms [14, 15] for quick computation of propagations in the opposite direction. The lateral extents of source and target field are independent of each other, and the sampling pitch of the target field can be any integer multiple of the source field’s sampling pitch. Through the use of a patching approach, the algorithm takes full advantage of the fast Fourier transform (FFT).

As an initial step of the algorithm, the kernel of the convolution type propagation operator is subdivided into smaller, overlapping propagator kernel patches. These can be thought of as partial beams going into different directions within a smaller solid angle. The size of this solid angle is directly connected to the number of essentially required Fourier components of the kernel patch. To minimize the introduction of excess Fourier components due to the cutting of the kernel at the patch borders, we use a soft blending between adjacent kernel patches. All Fourier components below a certain threshold level can then be neglected. This enables a computational regime that efficiently skips over all neglected components during propagation and also does not hold these components in memory. In contrast to physically justified approximations, this one is of a numerical nature and can be fine-tuned through the threshold parameter. With the blending function used and a relative threshold of 10⁻⁸ with respect to the peak squared absolute component, the number of non-neglected kernel Fourier components was two to four orders of magnitude below the total number of components. The imposed numerical errors were in the range of round-off errors for single precision (32 bit) computation and well below usual detector noise levels. Consequently, no perceivable limitations on the usable numerical aperture were implied. The numerical aperture was determined to be at least 0.62 in the reported experiments. Generally, it should be possible to achieve any value permitted by the underlying Rayleigh-Sommerfeld convolution kernel of the propagation.

Besides the holography applications described above, the presented algorithm has already been applied to the design of diffractive optical elements. It is, however, not limited to the propagation of light waves in the Rayleigh-Sommerfeld regime and can also be extended to one-dimensional or higher-dimensional convolutions. It can potentially be used anywhere a large convolution kernel has to be convolved with an arbitrary source function. Besides all kinds of wave propagation algorithms, this may include quite different fields such as general computer graphics, sound processing, or computational chemistry.

Acknowledgments

The authors wish to thank C. Graulig for supporting the experiments and testing the algorithm’s suitability for the design of diffractive optical elements. The work was partly supported by the Thueringer Aufbaubank (Reference 2008FE9128).

References and links

1. M. Gu, Advanced Optical Imaging Theory, vol. 75 of Optical Sciences (Springer, Berlin, Germany, 1999).

2. P. P. Vaidyanathan, “Generalizations of the Sampling Theorem: Seven Decades After Nyquist,” IEEE Trans. Circ. Syst. I Fundam. Theory Appl. 48, 1094–1109 (2001).

3. F. Shen and A. Wang, “Fast-Fourier-transform based numerical integration method for the Rayleigh-Sommerfeld diffraction formula,” Appl. Opt. 45(6), 1102–1110 (2006).

4. D. Gabor, “A new microscopic principle,” Nature 161(4098), 777 (1948).

5. U. Schnars, “Direct phase determination in hologram interferometry with use of digitally recorded holograms,” J. Opt. Soc. Am. A 11(7), 2011–2015 (1994).

6. H. J. Kreuzer, “Low energy electron point source microscopy,” Micron 26(6), 503–509 (1995).

7. H. J. Kreuzer, “Holographic microscope and method of hologram reconstruction,” US patent 6,411,406 B1, June 25, 2002 (Canadian patent CA2376395).

8. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik (Stuttg.) 35, 227–246 (1972).

9. G.-Z. Yang, B.-Z. Dong, B.-Y. Gu, J.-Y. Zhuang, and O. K. Ersoy, “Gerchberg-Saxton and Yang-Gu algorithms for phase retrieval in a nonunitary transform system: a comparison,” Appl. Opt. 33(2), 209–218 (1994).

10. A. Grjasnow, A. Wuttig, and R. Riesenberg, “Phase resolving microscopy by multi-plane diffraction detection,” J. Microsc. 231(1), 115–123 (2008).

11. G. C. Sherman, “Application of the Convolution Theorem to Rayleigh’s Integral Formulas,” J. Opt. Soc. Am. 57(4), 546–547 (1967). Proof: Rayleigh-Sommerfeld integral (CV) equals angular spectrum (AS).

12. J. Li, Z. Peng, and Y. Fu, “Diffraction transfer function and its calculation of classic diffraction formula,” Opt. Commun. 280(2), 243–248 (2007).

13. J. W. Goodman, Introduction to Fourier Optics, 2nd ed. (McGraw-Hill Companies, Inc., USA, 1996).

14. M. Kanka, R. Riesenberg, and H. J. Kreuzer, “Reconstruction of high-resolution holographic microscopic images,” Opt. Lett. 34(8), 1162–1164 (2009).

15. M. Kanka, A. Wuttig, C. Graulig, and R. Riesenberg, “Fast exact scalar propagation for an in-line holographic microscopy on the diffraction limit,” Opt. Lett. 35(2), 217–219 (2010).

16. V. Nascov and P. C. Logofătu, “Fast computation algorithm for the Rayleigh-Sommerfeld diffraction formula using a type of scaled convolution,” Appl. Opt. 48(22), 4310–4319 (2009).

17. M. Paturzo, P. Memmolo, L. Miccio, A. Finizio, P. Ferraro, A. Tulino, and B. Javidi, “Numerical multiplexing and demultiplexing of digital holographic information for remote reconstruction in amplitude and phase,” Opt. Lett. 33(22), 2629–2631 (2008).

18. M. Frigo and S. G. Johnson, “The Design and Implementation of FFTW3,” Proc. IEEE 93, 216–231 (2005).
