Memory-efficient reference calculation of light propagation using the convolution method

Petr Lobaz

doi:10.1364/OE.21.002795

1. Introduction

Calculation of coherent light propagation in a free space is a fundamental tool in Fourier optics [1, 2], digital holography [3], computer generated holography (e.g. [4–8]) and other areas of optics. A very important and common task is the calculation of light propagation between two parallel planes; however, the general case of light propagation between tilted planes is also important in applications [9–11].

The problem is often given in this way: there is an area Σ in a plane z = 0 containing an image called the source that is lit by a coherent light, mostly by a plane wave. The task is to calculate the light field in a plane z = z₀ in an area called the target.

In this task, we usually assume validity of the scalar approximation of the light [1]. A good approximation of the correct solution is then given by, e.g., a Rayleigh-Sommerfeld integral of the first kind. In this article, let us assume this approximation as the reference one.

The Rayleigh-Sommerfeld solution cannot usually be used in an analytic calculation due to its complexity. Sometimes it is possible to get some results by using its mathematically equivalent form, the angular spectrum decomposition [12]. It is, however, most usual to restrict the calculation to the paraxial approximation in either the near (Fresnel) or far (Fraunhofer) region.

It is possible to evaluate the “reference” Rayleigh-Sommerfeld integral numerically using computers; thanks to its form of convolution, the calculation leads to the use of three fast Fourier transforms (FFT). The reason for the use of the aforementioned forms or approximations lies in the number of FFT’s: the angular spectrum decomposition leads to two FFT’s; the Fresnel or Fraunhofer approximation lead to just one FFT or fast fractional Fourier transform [13].

The implementations of these faster algorithms are unfortunately not straightforward, as the discretization of their equations leads to various problems. Correct implementation of the angular spectrum decomposition is especially tricky [14]; even Fresnel and Fraunhofer approximations have to be discretized carefully [2, 15, 16].

It is therefore wise to verify fast algorithms by comparing them with a reference method based on the carefully discretized Rayleigh-Sommerfeld integral [17, 18]. The discretization process must consider both correct sampling of the source and sampling of the illumination light field and the Rayleigh-Sommerfeld convolution kernel. It is also necessary to consider the inverse operation to the sampling, i.e. the reconstruction. All of these considerations often lead to sampling distances smaller than the wavelength of light, and therefore to huge memory demands. This article studies the discretization process and suggests a method to avoid huge memory demands and consequent time demands of large arrays FFT’s.

The structure of the article is as follows. Section 2 shows a naive method of discretization; the example given will show an illuminated amplitude diffraction grating with a sine transmittance profile. We will show that the naive discretization leads to “wrong” results; we will explain the “wrong” result in a physical way and show that a fine sampling leads to the correct result (and to huge memory demands). In Section 3, we will show how to lower memory demands by a simple 1D example. In Section 4 we will remove certain simplifications introduced in Section 3, and in Section 5 we will discuss the general 2D algorithm. Finally, in Section 6 we will present the time and memory demands of the algorithm and in Section 7 we will give conclusions.

In the rest of the article we will assume SI units. Absolute value of a complex amplitude is electric field amplitude, unit volt/m. For conversion of a complex amplitude to an intensity value, see e.g. [1, 2]. Please also note that 1D examples should not be interpreted physically as propagation integrals were derived for 3D space; they just demonstrate main ideas of the final algorithm.

2. Effect of naive discretization

Both theoretical analysis and experiments show that an amplitude diffraction grating with a sine transmittance profile illuminated by a plane wave (let us call it the source) creates just three diffraction maxima in the far field – the directly transmitted wave and the plus-minus first diffraction order [1]. Sampling of the source is easy in this case; it is necessary to use a sampling frequency at least 2× higher than the frequency of the pattern. Let us choose the perpendicular illumination and set its complex amplitude at z = 0 to be 1. Let us choose a sampling whose samples coincide with maxima and minima of the transmittance of the grating, i.e. the samples will be progressively ..., 0, 1, 0, 1, 0, 1,... (this is exactly at the Nyquist limit, see also the end of the section), and let us calculate the diffraction pattern using the Rayleigh-Sommerfeld integral. It is given as

U (x, y, z_{0}) = \frac{- 1}{2 π} \int_{Σ} U (ξ, η, 0) \frac{\partial}{\partial z} \frac{exp (j k r)}{r} d ξ d η

where U(x, y, z₀) is the calculated complex amplitude at a point [x, y, z₀] of the target, U(ξ, η, 0) is the complex amplitude at a point [ξ, η, 0] of the source (i.e. the product of the complex amplitude of incoming light and the transmittance of the grating), Σ is the extent of the source, j² = −1, k = 2π/λ is the wave number and

r = {({(x - ξ)}^{2} + {(y - η)}^{2} + z_{0}^{2})}^{- 1 / 2}

is the distance between points [x, y, z₀] and [ξ, η, 0].

Let us discretize the calculation by replacing the double integral with a double sum. The result shown in Fig. 1(a) differs a lot from the theoretical result. Where is the problem? (Note that Fig. 1 displays diffraction at sine grating of finite rectangular area in a finite distance, therefore the diffraction maxima have rectangular shape.)

Fig. 1 Light diffraction at sine grating 5 × 5 mm² of period 50 cycles/mm illuminated perpendicularly by a plane wave with λ = 650 nm at a distance z₀ = 0.5 m. a) Discretization using Δ = 10 μm. b) Discretization using Δ = 10/12 μm = 0.83 μm.

Download Full Size | PDF

When talking about discretization, let us discuss the direction of varying transmittance only (i.e. perpendicular to the grating stripes). The other direction is not important in this case.

The replacement of the integral by the sum means, in fact, that the original continuous function U(ξ, η, 0) is replaced by an array of Dirac pulses; in other words, by ideal point light sources arranged in a periodic lattice with spacing Δ. Light originating from a lattice of sources of equal complex amplitude interferes and creates m-th diffraction maximum in angle θ_m = arcsin mλ/Δ (see e.g. [19]), where m is an integer. In our case, every second sample is zero; that is, we are working in fact with a lattice with spacing 2Δ. This lattice creates diffraction maxima of equal intensities for all m, and the first diffraction maximum of this lattice coincides with the first diffraction maximum of the original sine grating, because its period is, thanks to sampling distance, equal to 2Δ. The result presented in Fig. 1a is therefore physically correct – although not for a sine grating, but for another experiment. The problem is that the discretization process did not take into account how to reconstruct the original continuous function U(ξ, η, 0) from the samples of the source, i.e. if there is zero transmittance between samples, if the samples represent a sine profile, a rectangular profile, etc. If we took this into account, the diffraction maxima created by the lattice would be attenuated somehow and the result would correspond with the original continuous situation.

A simple solution is easy. In the continuous situation, two close enough point light sources of the same complex amplitude do not create any interference pattern, i.e. at least one destructive interference. In the discretized situation, two point light sources of the same complex amplitude in adjacent samples can interfere destructively if the sampling distance is too big. Therefore, the sampling of the source has to represent the source correctly, and, moreover, the effect of the discretization has to be hidden – it must be such that the first-order destructive interference of adjacent point light sources created by the discretization (assume they have the same complex amplitude for now) has to lie out of the target area; note that in numerical calculations we are dealing with finite areas only. Mathematically, for any point T of the target and any adjacent samples S₁, S₂ of the source, the inequality |T − S₁| − |T − S₂| < λ/2 must hold as the density of the samples has to resemble continuous nature of the source. The same idea could be expressed in terms of correct sampling of the propagation integral kernel (e.g. [2, 14, 18]), but this explanation based on point light sources is perhaps more intuitive. It is interesting to note that this simple explanation based on point light source model has not been explicitly published yet (as far as I know). Also note that advanced solutions exist that do not need finer sampling, e.g. [13, 15]; the purpose of this paragraph is to provide simple insight and a starting point for the following sections.

The result of the simple solution is presented in Fig. 1b. It was necessary to work with 12× finer sampling, i.e. with an array 12² = 144× bigger. This results in noticeably higher memory and time demands. In the following text we will present a way to discretize correctly without increasing memory demands.

At the end of the section it is worth noting that the sampling of the sine grating in the example above was not done “correctly”, as the Nyquist limit (in its simplest form) requires a sampling frequency higher than double the maximum frequency contained in a signal. We have used exactly double the maximum frequency, and moreover, we did not consider that the source is spatially limited. However, had we used the mathematically precise method, the result would be the same and the discussion would not be as clear.

3. Simplified solution

In search of a memory-efficient algorithm, let us start with the solution presented in the last section, i.e. the sampling finer than requested by the Nyquist limit. For the sake of clarity, let us introduce two simplifications: let the source and the target be one-dimensional objects in the xz plane (this will be relaxed in Section 5) and let us suppose they are unbounded (this will be relaxed in Section 4).

Let the object source represented by samples source[i], i ∈ Z (let the sample source[0] be located at the point [0, 0]) be perpendicularly illuminated by a plane wave of wavelength λ (see Fig. 2). We will consider transmittance of the source to be complex, i.e. any source illuminated by any light can be converted to this scenario.

Fig. 2 Geometry of the simplified 1D case. Samples with index 0 are depicted as full circles, the others as empty circles.

Download Full Size | PDF

Let us calculate complex amplitudes of propagated light in the “plane” z = z₀ > 0 in the unbounded area target represented by samples target[j], j ∈ Z (let the sample target[0] be located at the point [x₀, z₀]). The samples source[i] are samples of the complex function U(x, 0); the samples target[j] represent the function U(x, z₀) (see Eq. (1)). If the sampling distance is equal in both the source and the target, let us call it Δ, the following holds:

\begin{array}{l} target [j] = U (j Δ + x_{0}, z_{0}) & = & Δ \sum_{i = - \infty}^{\infty} U (i Δ, 0) h ((j - i) Δ + x_{0}, 0, z_{0}) = \\ = & Δ \sum_{i = - \infty}^{\infty} source [i] h_{x_{0}, z_{0}, 1} [j - i] = \\ = & Δ (source [] \otimes h_{x_{0}, z_{0}, 1} []) [j] \end{array}

where ⊗ is the discrete convolution and the array h_x₀,z₀,1[] represents the Rayleigh-Sommerfeld convolution kernel (impulse response) defined as

h_{x_{0}, z_{0}, ups} [j] = h (j Δ / ups + x_{0}, 0, z_{0})

and according to (1)

h (x, y, z) = - \frac{1}{2 π} \frac{\partial}{\partial z} \frac{exp (j k r)}{r} = \frac{- z}{2 π} (j k - \frac{1}{r}) \frac{exp (j k r)}{r^{2}}, r = \sqrt{x^{2} + y^{2} + z^{2}}

Equation (2) is a discretization of the integral (1), i.e. we have changed the integral to the sum and the differential to the difference Δ. As we will be interested just in the structure of the target, we will omit the constant term Δ.

Let us suppose that the source is sampled correctly, such that the structure of U(x, 0) is fully acquired, but not fine enough for the propagation calculation. Please note that this “correct sampling” is not the same as sampling that obeys the sampling theorem; for example, piecewise constant signal cannot be sampled correctly according to the basic formulation of the sampling theorem as it has infinite frequency extent. However, if we know that the signal is piecewise constant with “steps” at known locations, then we can express the signal precisely using just one sample per constant part of the signal.

For the propagation calculation, we have to use a ups-times finer sampling, ups ∈ Z, ups ≥ 1. Let us call the preliminary upsampled array source_ups[]. To obtain this array, let us put (ups − 1) zero samples between every two original samples of the source (see Fig. 3 top):

{source}_{ups} [j] = {\begin{array}{l} source [j / ups] & if (j / ups) \in Z \\ 0 & otherwise \end{array}

Fig. 3 Examples of interpolation convolution kernels (filters) for ups = 3. The windowed sinc filter shown is the normalized Lanczos filter for a = 2.

Download Full Size | PDF

The final upsampled array $sourc e_{ups}^{fin} []$ is calculated using convolution with a kernel filter[], i.e. $sourc e_{ups}^{fin} [] = {source}_{ups} [] \otimes filter []$ . The convolution kernel filter[] has to be chosen according to the nature of the function U(x, 0): a rectangular kernel provides a piecewise constant interpolation (which is suitable if U(x, 0) represents a pixelated spatial light modulator), a windowed sinc kernel provides a good interpolation in terms of frequency content (which is suitable if U(x, 0) is a general continuous function), a triangular kernel provides a piecewise linear interpolation (which is faster than a windowed sinc kernel and can provide acceptable results), etc. (see Fig. 3). Examples of kernel implementations will be shown at the end of Section 4.

The main idea of the method to be derived exploits the fact that the length of the filter[] array is much smaller than the length of the array source[] and target[]. For simplicity, let us assume the length of the filter[] array to be odd. Let us write it as 2 × fwh + 1, where fwh ∈ Z, fwh ≥ 0.

Thanks to associativity of the convolution, the following holds:

\begin{array}{l} {target}_{ups} [] & = ({source}_{ups} [] \otimes filter []) \otimes h_{x_{0}, z_{0}, ups} [] = {source}_{ups} [] \otimes (filter [] \otimes h_{x_{0}, z_{0}, ups} []) = \\ = {source}_{ups} [] \otimes h_{x_{0}, z_{0}, ups}^{fin} [] \end{array}

where target_ups[] represents the target sampled with a period Δ/ups and

h_{x_{0}, z_{0}, ups}^{fin} []

is the propagation kernel convolved by the array filter[]. Specifically,

\begin{array}{l} {target}_{ups} [j] & = \sum_{i = - \infty}^{\infty} {source}_{ups} [j - i] h_{x_{0}, z_{0}, ups}^{fin} [i] = \\ = \sum_{i = - \infty}^{\infty} {source}_{ups} [j - i] \sum_{k = - f w h}^{f w h} filter [k] h_{x_{0}, z_{0}, ups} [i - k] \end{array}

However, we need the final result ups-times downsampled, i.e. target[j] = target_ups[ups × j]. Moreover, the sample source_ups[i] is zero for i/ups ∉ Z. We can therefore omit these samples from the sum. It follows that

\begin{array}{l} target [j] = {target}_{ups} [ups \times j] & = \sum_{i = - \infty}^{\infty} {source}_{ups} [ups \times (j - i)] h_{x_{0}, z_{0}, ups}^{fin} [ups \times i] = \\ = \sum_{i = - \infty}^{\infty} source [j - i] h_{x_{0}, z_{0}, 1}^{fin} [i] \end{array}

where

h_{x_{0}, z_{0}, 1}^{fin} []

is a filtered propagation kernel:

h_{x_{0}, z_{0}}^{fin} [i] = \sum_{k = - f w h}^{f w h} filter [k] h_{x_{0}, z_{0}, ups} [ups \times i - k]

The propagation calculation leads to two discrete convolutions: in the first one (5), we work with sampling period Δ, in the second one (6), with sampling period Δ/ups.

The aforementioned equations worked with an infinite extent of the indices in order to avoid array boundary effects. In the following section, we will adjust the indices extent but the structure of the result will remain the same. The section will also explain the advantage of the presented method, which can be briefly described as follows.

The discrete convolution can be calculated indirectly using FFT or directly using the definition Eq. (2). The first way is advantageous if the convolution kernel is large; we will use it, therefore, for Eq. (5). The second way is better in the opposite case; we will therefore use it for Eq. (6). Here we will also use a nice property of direct calculation: the samples $h_{x_{0}, z_{0}, 1}^{fin} [i]$ can be calculated with minimum memory demands.

4. Practical 1D solution

Let us assume that the source is sampled using M samples and the target is sampled using N samples. If we want to use FFT for the target[] calculation, we have to work with cyclic convolution. This means that the arrays source[] and target[] have to be zero-padded to C samples, C ≥ M + N − 1 (see [17]). Then:

target [j] = \sum_{i = 0}^{C - 1} source [i mod C] h_{x_{0}, z_{0}, 1}^{fin} [(j - i) mod C]

The array source[] contains correct values for sample indices 0, 1,...,M − 1, the array target[] for indices 0, 1,...,C − M. The array $h_{x_{0}, z_{0}, 1}^{fin} []$ has to be then calculated as:

h_{x_{0}, z_{0}, 1}^{fin} [i] = {\begin{array}{l} \sum_{k = - fwh}^{fwh} filter [k] h_{x_{0}, z_{0}, ups} [ups \times i - k] & if 0 \leq i < N \\ \sum_{k = - fwh}^{fwh} filter [k] h_{x_{0}, z_{0}, ups} [ups \times (i - C) - k] & if N \leq i < C \end{array}

It is worth noting that in the calculation of the sample $h_{x_{0}, z_{0}, 1}^{fin} [i]$ , it is possible to use the samples h_{x₀,z₀,ups}[] calculated before, specifically for $h_{x_{0}, z_{0}, 1}^{fin} [i - 1]$ . It is therefore convenient to save the samples h_{x₀,z₀,ups}[] in a temporary buffer of size 2 × fwh + 1 samples, and to replace part of them in the calculation of $h_{x_{0}, z_{0}, 1}^{fin} [i]$ with new values. The number of these new samples depends on the filter type.

To calculate the filter[] array, we have to choose the number of samples of the source[] array that contribute to the calculation of the interpolated sample in the source^fin[] array, or in other words, in the $h_{x_{0}, z_{0}, 1}^{fin} [i]$ array. This user-defined number specifies all the parameters needed: the size of the filter[] array, the interpolation type, and the number of the samples shared in the calculation of the neighbouring samples of the $h_{x_{0}, z_{0}, 1}^{fin} []$ array.

It is practical to use separable kernels when working with 2D arrays. We can discuss them right now, when working with 1D arrays. For clarity, some examples are given in Fig. 3. To make things simpler, we will show pure interpolation kernels in both the Fig. 3 and the following text, i.e. they do not preserve signal energy. For propagation calculation it is, however, appropriate to normalize the filter, i.e. the sum of its coefficients equals 1.

Piecewise constant interpolation

It is suitable if the source is split to rectangular pixels of non-zero area. In this case it is appropriate for ups to be odd, due to symmetry. Then the interpolation process adds an even number of samples between every two samples of the source[] array. Therefore, fwh = (ups − 1)/2 and the kernel is given as: filter[i] = 1 for −fwh ≤ i ≤ fwh.

Piecewise linear interpolation

It is suitable if the source represents a continuous function and it does not contain fine details. We need two neighbouring original samples for the calculation of the interpolated one, i.e. fwh = ups − 1, filter[i] = 1 − |i|/(fwh + 1) for −fwh ≤ i ≤ fwh.

Windowed sinc interpolation

It is suitable if the source represents a continuous function and we care about good replication of its frequency content. Choice of the number of the original source[] samples has to be a compromise. The more samples are included, the better is the frequency content replication; on the other hand, too large kernels perform badly in the spatial domain. A Lanczos filter is considered a reasonable compromise. It considers 2a neighbouring samples, where usually a = 2 or a = 3 [20]. Then fwh = a×ups−1 and a preliminary kernel is defined as filter_prel[i] = lanczos(a×i/(fwh+1), a), where lanczos(x, a) = asin(πx)sin(πx/a)/(π²x²) for −fwh ≤ i ≤ fwh. The final kernel has to be adjusted before use: the coefficients contributing to the same sample calculation, i.e. the coefficients ups samples apart, have to sum to 1 [21]. Mathematically, filter[i] = filter_prel[i]/ ∑_kfilter_prel[i + k × ups] for all allowed values of k.

5. Final 2D solution

Generalization of the aforementioned ideas is straightforward; instead of 1D cyclic convolutions (or FFT’s), we have to use 2D versions. For simplicity, let us assume the sampling periods in both x and y directions to be the same, let us call them Δ.

To propagate the source sampled by M_x × M_y samples to the target sampled by N_x × N_y samples, let us create the arrays source[,] and target[,] of size C_x × C_y, C_x ≥ M_x + N_x − 1, C_y ≥ M_y + N_y − 1. It is convenient to choose the numbers C_x and C_y so that the FFT of these arrays runs fast, e.g. powers of 2. The samples of the source must be stored in the array source[,] at indices from [0, 0] to [M_x − 1, M_y − 1]. After the calculation, the correct samples are located in the array target[,] at indices from [0, 0] to [N_x − 1, N_y − 1].

As the next step, we have to choose the parameter ups. The way to do it is as follows. Let us assume for a while that we work with the original lattice, i.e. ups = 1. In the propagation calculation, we have to calculate (4) for every vector T − S, where S is a 3D position of a sample in the source and T is a 3D position of a sample in the target. In (4) we have to use r = |T − S| (see its application in (2)). Let S₁ and S₂ be positions of adjacent samples in the source and these samples have the same value; they represent two point light sources of the same complex amplitude. Let us calculate r₁ = |T − S₁| and r₂ = |T − S₂|. If |r₁ − r₂| = λ/2, then the contributions from S₁ and S₂ cancel each other at the point T, i.e. in this direction there is the first diffraction minimum. As we have shown in Section 2, we need to exclude the first diffraction minimum from the target. To make this happen, we have to refine the sampling, i.e. increase the parameter ups until |r₁ − r₂| < λ/2. If a more precise result is needed, we can refine further. Numerical experiments have shown that higher ups than those leading to |r₁ − r₂| < λ/5 did not have any significant impact. The adjacent points S₁, S₂ and the point T have to be chosen as “the worst case”, i.e. the angle between T − S₁ (or T − S₂) and the z axis has to be as big as possible.

As the next step, we have to calculate the array $h_{x_{0}, y_{0}, z_{0}, 1}^{fin} [,]$ , where [x₀, y₀, z₀] is the position of the sample target[0, 0]. We assume the position of the sample source[0, 0] to be [0, 0, 0]. For the calculation, we need the numbers h_{x₀,y₀,z₀,ups}[i, j] = h(iΔ/ups + x₀, jΔ/ups + y₀, z₀) (see (3) for comparison). Thanks to separability of the filters, we can calculate them for one upsampled row only, convolve them with filter[] and downsample, i.e. to use the procedure described in Section 4. We need to calculate 2×fwh+1 of such rows, convolve them by columns and down-sample; this procedure leads to one row of the array $h_{x_{0}, y_{0} z_{0}, 1}^{fin} [,]$ . The other rows are calculated in the same way.

Finally, we can calculate the propagation itself:

target [,] = IFFT (FFT (source [,]) ⊙ FFT (h_{x_{0}, y_{0}, z_{0}, 1}^{fin} [,])

where FFT and IFFT are fast Fourier transform and inverse fast Fourier transform, respectively, and ⊙ is the Hadamard product (element-wise product). The propagation calculation is complete now. Examples of the propagations are shown in Fig. 4.

Fig. 4 Examples of diffraction by grating with vertical strips; grating size 5 × 5 mm², sampling period 10 μm, samples in each row 1, 0, 1, 0, ... (i.e. 50 slits/mm). Propagation distance 500 mm, illumination at normal incidence, λ = 650 nm. The left images show right halves of the diffraction patterns (compare with Fig. 1); the graphs to the right show the intensity relative to the central intensity. The interpolation used is a) none, b) rectangular filter, c) triangular filter, d) Lanczos filter, a = 2, e) Lanczos filter, a = 3.

Download Full Size | PDF

It is worth adding three remarks:

The same result can be obtained using the basic method, i.e. using upsampling, interpolation of the source, and convolution with a common Rayleigh-Sommerfeld propagation kernel. The upsampled array source_ups[,] would have ups − 1 zero samples between original samples in every row (column), and additionally fwh zero samples to the sides in order to correctly calculate the convolution with an interpolation kernel filter[,] of size (2 × fwh + 1) × (2 × fwh + 1) samples. The size of the array source_ups would then be 2 × fwh + 1 + ups × (M_x − 1) × 2 × fwh + 1 + ups × (M_y − 1) samples, sampling period Δ/ups. This means that the physical size of the source increases a bit for fwh ≥ 1; this is the reason why the article never mentions the exact physical sizes of the source and the target. The additional borders are an interpolation artifact. However, their effect is negligible for big arrays.
The proposed method with propagation kernel filtering is nothing else than a calculation rearrangement. The article [17] that describes the propagation calculation with large source and target or with different sampling periods of source and target is therefore fully compatible with the proposed method; is is sufficient to replace the calculation of propagation kernels.
The calculation rearrangement leads to different rounding errors in numerical calculation and thus to differences between the basic and the proposed method. The differences are negligible, though. As it is hard to tell which method gives a more precise result, we will not discuss the numerical aspects of the proposed method.

6. Time and memory requirements

The motivation to create the proposed method was to calculate reference propagation using a small amount of memory. Let us compare its memory and time demands with the basic method. In this section, we will assume square arrays for simplicity, i.e. M_x = M_y = M, N_x = N_y = N.

The basic method, as we have shown in the last section, upsamples the array source[,] to the array source_ups with 2 × fwh + 1 + ups × (M − 1) ² samples and calculates propagation to the array target_ups. There is no need to introduce additional zero borders due to interpolation, i.e. the target will have 1 + ups × (N − 1) ² samples. The propagation will be calculated in the arrays with 2×fwh+1+ups×(M +N −2) ² samples, which gives the memory requirements of the basic method.

The proposed method uses the original arrays source[,] and target[,], so the propagation will be calculated in arrays with (M + N − 1)² samples. The memory demands are, however, a bit higher, as we have to take account of temporary memory used in the calculation of the filtered propagation kernel $h_{x_{0}, y_{0}, z_{0}, 1}^{fin} [,]$ . It is 2 × fwh + 1 samples for “row convolution” and 2 × fwh + 1 rows with M + N − 1 samples for “column convolution”; together (2 × fwh + 1)(M + N) samples. Typically, fwh = a × ups, where a is small (in our examples at most 3), and ups is much smaller than M + N. Therefore, it is possible to ignore this amount in further discussion.

By comparing the memory demands, we conclude that the propagation calculation using the proposed method takes approximately 2 × fwh + 1 + ups × (M_x + N_x − 2) ²/(M + N −1)² ≈ ups²-times less memory. Common experiments in computer generated holography with centimetre-sized fields using a sampling period of about 10 μm off-axis propagated to a distance in the order of tens of centimetres require the ups parameter up to 20. We can therefore say that the proposed method has about 100-times less memory demands than the basic method.

On the other hand, the time of the computation does not fall so quickly. This is because the calculation consists not just of a convolution calculation using the FFT, which is very fast in the proposed method, but of a convolution kernel calculation as well that is approximately as slow as in the basic method. More precisely, the FFT works with arrays approximately ups²-times smaller than in the basic method, which means it is approximately ups²-times faster. The convolution kernel calculation requires calculating the array h_{x₀,y₀,z₀,ups} [,] (the same as in the basic method), filtering it by rows and by columns using a filter of width 2 × fwh + 1, where again fwh = a × ups, and downsampling the result. The analysis shows that the convolution kernel calculation in the proposed method is approximately 4a²-times slower than in the basic method, where a is again a small number. It is not worth making a precise analysis, as the speed of the FFT, the Rayleigh-Sommerfeld convolution kernel and its filtering are hard to compare theoretically. It is more useful to measure the calculation times (see Fig. 5). The graph to the right shows that the speedup of the proposed method is not as big as its memory savings; this is mainly because the calculation of the convolution kernel dominates in the total time of the calculation.

Fig. 5 Time of the calculation comparison. The graphs to the left and in the middle show the dependency of the time of propagation calculation for N = M = 500 on the upsampling factor ups; the vertical scale used defines the time of the basic method for ups = 1 to be 1. Besides the complete time of the calculation, the times of the FFTs and the propagation kernel calculation times are shown. The rightmost graph shows the ratio of the proposed and the basic method calculation times; e.g. a value of 4 means that the proposed method is 4× faster for a given ups.

Download Full Size | PDF

7. Conclusions

In Section 2 we have shown and explained in terms of physics that in the discretization of the light propagation between parallel planes it is necessary to take into account both the correct sampling of the source and the opposite procedure, the reconstruction. We have shown that the correct result can be obtained using upsampling; we have shown in Section 5 how fine this upsampling should be. We have demonstrated in Section 6 that in typical tasks in computer generated holography the upsampling can be approximately 10×, which leads to 100× slower calculation and 100× bigger memory demands.

In Sections 3 to 5 we have derived a procedure based on filtration of the propagation kernel by the “interpolation” kernel that is used for interpolated upsampling of the source. Thanks to the properties of the interpolation kernel (small support, separability), the proposed method is faster and cuts the memory demands to approximately those values which would be needed if no upsampling was used. It can thus be said that the proposed method has approximately 100× smaller memory demands than the basic method of the same precision. As the method just rearranges the calculation, the result is mathematically the same.

Acknowledgments

This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic projects LC06008 and LH12181. The author would like to thank Václav Skala for discussions concerning the subject.

References and links

1. J. W. Goodman, Introduction to Fourier Optics (Roberts & Company Publishers, 2004), 3rd ed.

2. D. G. Voelz, Computational Fourier Optics: A Matlab Tutorial, Tutorial texts in optical engineering (SPIE Press, 2011).

3. U. Schnars and W. Jueptner, Digital Holography: Digital Hologram Recording, Numerical Reconstruction, and Related Techniques (Springer, 2005).

4. K. Matsushima, “Computer-generated holograms for three-dimensional surface objects with shade and texture,” Applied Optics 44, 4607–4614 (2005). [CrossRef] [PubMed]

5. I. Hanák, M. Janda, and V. Skala, “Detail-driven digital hologram generation,” Visual Computer 26, 83–96 (2010). [CrossRef]

6. Y. Sakamoto, M. Takase, and Y. Aoki, “Hidden surface removal using z-buffer for computer-generated hologram,” Practical Holography XVII and Holographic Materials IX 5005, 276–283 (2003). [CrossRef]

7. M. Yamaguchi, “Ray-based and wavefront-based holographic displays for high-density light-field reproduction,” Three-Dimensional Imaging, Visualization, and Display 2011 8043, 804306 (2011). [CrossRef]

8. P. W. M. Tsang, J. P. Liu, K. W. K. Cheung, and T. C. Poon, “Modern methods for fast generation of digital holograms,” 3D Research 1, 11–18–18 (2010). [CrossRef]

9. N. Delen and B. Hooker, “Free-space beam propagation between arbitrarily oriented planes based on full diffraction theory: a fast fourier transform approach,” J. Opt. Soc. Am. A 15, 857–867 (1998). [CrossRef]

10. K. Matsushima, H. Schimmel, and F. Wyrowski, “Fast calculation method for optical diffraction on tilted planes by use of the angular spectrum of plane waves,” J. Opt. Soc. Am. A 20, 1755–1762 (2003). [CrossRef]

11. L. Onural, “Exact solution for scalar diffraction between tilted and translated planes using impulse functions over a surface,” J. Opt. Soc. Am. A 28, 290–295 (2011). [CrossRef]

12. E. Lalor, “Conditions for the validity of the angular spectrum of plane waves,” J. Opt. Soc. Am. 58, 1235–1237 (1968). [CrossRef]

13. H. M. Ozaktas, S. O. Arik, and T. Coşkun, “Fundamental structure of fresnel diffraction: natural sampling grid and the fractional fourier transform,” Opt. Lett. 36, 2524–2526 (2011). [CrossRef] [PubMed]

14. K. Matsushima, “Shifted angular spectrum method for off-axis numerical propagation,” Opt. Express 18, 18453–18463 (2010). [CrossRef] [PubMed]

15. L. Onural, “Exact analysis of the effects of sampling of the scalar diffraction field,” J. Opt. Soc. Am. A 24, 359–367 (2007). [CrossRef]

16. L. Onural, A. Gotchev, H. Ozaktas, and E. Stoykova, “A survey of signal processing problems and tools in holographic three-dimensional television,” Circuits and Systems for Video Technology, IEEE Transactions on 17, 1631 –1646 (2007). [CrossRef]

17. P. Lobaz, “Reference calculation of light propagation between parallel planes of different sizes and sampling rates,” Opt. Express 19, 32–39 (2011). [CrossRef] [PubMed]

18. V. Katkovnik, J. Astola, and K. Egiazarian, “Discrete diffraction transform for propagation, reconstruction, and design of wavefield distributions,” Appl. Opt. 47, 3481–3493 (2008). [CrossRef] [PubMed]

19. E. Steward, Fourier Optics: An Introduction, Ellis Horwood Series in Physics (Dover Publications, 2004), 2nd ed.

20. K. Turkowski, “Filters for common resampling tasks,” in “Graphics gems,”, A. S. Glassner, ed. (Academic Press Professional, Inc., San Diego, CA, USA, 1990).

21. D. P. Mitchell and A. N. Netravali, “Reconstruction filters in computer-graphics,” SIGGRAPH Comput. Graph. 22, 221–228 (1988). [CrossRef]

Memory-efficient reference calculation of light propagation using the convolution method

Abstract

1. Introduction

2. Effect of naive discretization

3. Simplified solution

4. Practical 1D solution

Piecewise constant interpolation

Piecewise linear interpolation

Windowed sinc interpolation

5. Final 2D solution

6. Time and memory requirements

7. Conclusions

Acknowledgments

References and links

Cited By

Figures (5)

Equations (12)

Optics Express