Reference calculation of light propagation between parallel planes of different sizes and sampling rates

Petr Lobaz

doi:10.1364/OE.19.000032

1. Introduction

A fundamental tool of digital holography, or computer generated holography, is a numerical simulation of coherent light propagating in free space. We will use, as usual, scalar approximation of the vectorial nature of light, and will not consider time-dependent behavior of the light [1]. One of the most usual tasks is to calculate light propagation between two parallel planes; the Rayleigh-Sommerfeld integral [1], its mathematical equivalent angular spectrum [1,2] or their various approximations are used most often.

Those equations need to be discretized for numerical calculation, i.e. one has to both sample and spatially restrict optical fields involved. The discretization itself leads to various errors well described in literature [3,4]. The calculation tries to avoid those errors while trying to work as fast as possible.

It is, however, often hard to decide what is an error of discretization itself, what is an inherent error of a given fast method and what is just an error of a particular implementation. The goal of this article is to describe the reference numerical calculation of light propagation between parallel planes that has just one “error”, the discretization. Any other method can be then compared to this one. As we will assume fine sampling that does not lead to aliasing errors, we have to deal with a large amount of data. We will try to handle it as fast as possible while retaining the accuracy of the calculation.

The proposed method focuses on off-axis light propagation between two rectangular areas that share neither size nor sampling, or, between a spatial light modulator (SLM) and a camera sensor. Reference calculation is described, e.g., in [5], off-axis propagation, e.g., in [6,7]; different samplings are treated in [8] by coordinate system change, in [9] by shifted convolution kernel, in [10] by scaled Fourier transform; different sizes of SLM and sensor is solved in [10,11] using tiling while decreasing memory demands. This article solves all the requirements in a unified way by using the convolution approach and tiling; in contrast to references given, it deals with optimization too.

2. One-dimensional case

We are going to show all the principles in a one-dimensional case before showing the full 2D version. This means that we will calculate light propagation between two line segments instead of two rectangular areas. We will call them Source and Target.

In linear optics it is assumed that light sources do not influence each other and that every single point $t (x)$ of Target (e.g. a camera sensor) is affected by the shining of all points $s (x)$ of Source (e.g. a SLM, see Fig. 1a ). As mentioned in the introduction, we will assume scalar approximation of coherent light, i.e. the light source can be completely described by amplitude A and phase ϕ, or complex amplitude $A \exp (j ϕ)$ , where $j^{2} = - 1$ .

Fig. 1 One-dimensional light propagation. Description of (a), (b), (c) is given in the text below.

Download Full Size | PDF

Light changes both amplitude and phase by propagation. This change can by described by multiplication with some complex number p. Light propagation is space invariant; therefore, it is just the mutual position of points on Source and Target that matters, not their absolute positions. It follows that the calculation will have a convolution form:

t (x) = \int_{S o u r c e} s (ξ) p (x - ξ) d ξ

In the equation there is no distance along z axis between Source and Target because it is a constant and can be a part of the function p.

We can discretize the equation by equidistant splitting of Source and Target into M and N basic elements $s [m]$ ( $0 \leq m \leq M - 1$ ) and $t [n]$ ( $0 \leq n \leq N - 1$ ), i.e. by uniform sampling. Therefore, the element $t [n]$ receives from the element $s [m]$ light with complex amplitude $s [m] p [n - m]$ , see Fig. 1b. Equation (1) can be written in discrete form:

t [n] = \sum_{m = 0}^{M - 1} s [m] p [n - m] for 0 \leq m \leq M - 1, 0 \leq n \leq N - 1.

To compute all elements

t [n]

, we have to know numbers

p [n - m]

for

- (M - 1) \leq n - m \leq N - 1

, i.e.

M + N - 1

different values. We can obtain them in different ways, some of which will be mentioned in section 3. For now, let us assume we know them.

The calculation of all elements $t [n]$ is most often done by rewriting Eq. (2) as a cyclic convolution and subsequent use of the discrete Fourier transform. The cyclic convolution has a form:

t [n] = \sum_{m = 0}^{C - 1} s [m] p [(n - m) mod C] for 0 \leq m \leq C - 1, 0 \leq n \leq C - 1,

where

C \geq M + N - 1

and

s [m] = 0

for

M \leq m \leq C - 1

(i.e., the s is zero-padded to the size C). Notice that the important values of

t [n]

are those for

0 \leq n \leq C - M

; the others are damaged by the cyclic behavior of indices in arithmetic

(mod C)

. It is also worth mentioning that in cyclic convolution it is usually assumed that

M = N

. The proof of validity for the case

M \neq N

consists just in the expansion of equations for

t [n]

. Finally, C is an arbitrary number bigger than or equal to

M + N - 1

. By choosing a suitable C, we can speed up the computation significantly, because

t = I D F T (D F T (s) \times D F T (p)),

where t, s and p are C-dimensional vectors (arrays) of complex numbers,

D F T

is a discrete Fourier transform of a vector,

I D F T

is an inverse discrete Fourier transform of a vector and × is the Hadamard product (element by element product). The speedup is expected due to the fact that the calculation of

D F T

, or

I D F T

, can be done in time

O (C \log C)

[12]. Choosing a suitable C is important because the actual calculation time is highly sensitive to its character.

Let us assume that the sampling rate of Target is twice as fine as the sampling rate of Source. Then the subset of even samples from Target has the same sampling rate as Source. Consequently, we can easily calculate the propagation of Source to even samples of Target. Obviously, we can do the same with odd samples. It means we can split the calculation of light propagation into two calculations; they are denoted in Fig. 1c by black and magenta arrows. It follows that the same principle can be applied when the sampling rate of Target is τ-times finer than the sampling rate of Source, where τ is a natural number; we have to split Target into τ “interleaved tiles”, i.e. the calculation has to be split into τ calculations.

The same situation appears when Source has σ-times finer sampling than Target. The idea can be generalized: if the sampling rates of Source and Target are in a ratio $σ : τ$ , where σ and τ are coprime natural numbers (i.e. σ elements of Source have the same size as τ elements of Target), we can split the calculation into $σ \times τ$ independent calculations. More precisely, we have to σ-times calculate the propagation for the sampling rate ratio $1 : τ$ and sum the results. Usually we do not care about the exact value of the sampling rate; therefore, we can choose such σ and τ that approximate the desired sampling fairly well.

It follows from Eq. (3) that the vectors s and t have to be padded to size C. This means that for $M = N$ , approximately 50% of elements are held in memory uselessly; in 2D convolution, it is as much as 75%. Therefore, for big M and N we can expect a lack of memory very soon, especially when using special hardware for $D F T$ calculation, e.g. GPU. For example a naive approach to propagation of a microscopic Source to an extended detector Target could require hundreds of gigabytes.

We need two memory spaces of size $C \geq M + N - 1$ for the calculation of Eq. (4); in practice a restriction may appear: that just two spaces of size $P < C$ are available. Let us assume, for example, that $M = 512$ , $N = 1024$ and $P = 1024$ . We cannot make the calculation directly because $C \geq 1535$ ; but we can split Target in the middle into two parts (let us call them “common tiles”) with $N^{'} = 512$ elements and make two calculations. For them, we need only two spaces of size at least $M + N^{'} - 1 = 1023$ , and therefore we are not limited by $P = 1024$ . The same idea would apply if $M = 1024$ , $N = 512$ and $P = 1024$ : we would calculate the light propagation of both small parts of Source to Target and sum the results. As in the “different sampling” case, even this idea can be generalized: Source can be split into S parts, Target into T parts and then we have to calculate $S \times T$ propagations.

3. Two-dimensional case

The extension of equations from section 2 to the 2D case is straightforward: instead of sums we just put double sums there. For calculation of light propagation of a part of Source to a part of Target, it is necessary to carefully calculate the convolution kernel, i.e. 2D array p. A practical aid is the fact that its element $p [0, 0]$ describes light propagation from element $s [0, 0]$ of a particular part of Source to element $t [0, 0]$ of a particular part of Target. The equations for sampling rates, samples counts and offsets are simple but technically demanding, so we will not show them here.

We should, however, mention the calculation of convolution kernel values. The Rayleigh-Sommerfeld equation [2] for the light propagation from the plane $z = 0$ to point $[x, y, z]$ is:

U (x, y, z) = \frac{- 1}{2 π} {\int ​ ​ \int ​ ​}_{- \infty}^{\infty} U (ξ, η, 0) \frac{\partial}{\partial z} \frac{\exp (j k r)}{r} d ξ d η

where

U (x, y, z)

is a complex amplitude of light in a point

[x, y, z]

,

k = 2 π / λ

is a wavenumber (λ is a wavelength), and r is the distance between points

[x, y, z]

and

[ξ, η, 0]

. This equation can be written in a convolution form:

U (x, y, z) = U (x, y, 0) \otimes [\frac{- z}{2 π} (​
j k - \frac{1}{\sqrt{x^{2} + y^{2} + z^{2}}}) \frac{\exp (j k \sqrt{x^{2} + y^{2} + z^{2}})}{x^{2} + y^{2} + z^{2}}]

where ⊗ is the 2D convolution operator,

(f \otimes g) (x, y) = \int^{​} {\int^{​}}_{- \infty}^{\infty} f (ξ, η) g (x - ξ, y - η) d ξ d η

.

The simplest method of convolution kernel discretization (the expression on the right side of the convolution operator in Eq. (5)) is a plain sampling, i.e. its calculation for a particular x, y (z is a constant). Alternatively, we can assume that a sample of Source in fact expresses – using some pixel spread function – the shining of a particular non-zero-area element of Source and alter the kernel accordingly. If we take non-zero area of Target (sensor) elements into account, we can pre-filter the kernel. If Source is a mathematical model of a real display, we can even measure the kernel. The proposed method therefore does not depend on particular features of the kernel; its only assumption is the description of light propagation as a convolution. In our implementation, we did not deal with advanced methods of kernel calculation, however, and we have chosen plain sampling.

4. Theoretical time of computation

The calculation works with arrays of size $C = C_{x} \times C_{y}$ samples. It consists of three $D F T$ 's (more precisely, two forward and one backward), calculation of the convolution kernel p with complexity $O (C)$ and the Hadamard product of the same complexity. For a large C, only times spent on $D F T$ 's matter. The time of calculation using the fast Fourier transform is therefore proportional [12] to

(D F T ​
's count) C \log C

In the following analysis, we will assume light propagation from a square area of size

M \times M

samples to a square area

N \times N

samples. For the convolution calculation, we will assume memory space

C = (M + N) \times (M + N)

. Time of calculation is then given by:

t_{basic} (M, N) = 3 {(M + N)}^{2} \log {(M + N)}^{2} \equiv 6 {(M + N)}^{2} \log (M + N)

Let us begin with the case that Source and Target share the sampling. Then we can also split Target into

T \times T

common tiles of size

(N / T) \times (N / T)

. It follows that we calculate

D F T (S o u r c e)

once and calculate

D F T (p)

and

I D F T (D F T (S o u r c e) \otimes D F T (p))

T \times T

-times, while we assume the array sizes for

D F T

as

C = (M + N / T) \times (M + N / T)

. Using Eq. (6), we get time of calculation

t_{common tiles} (M, N, T) = (1 + 2 T^{2}) 2 {(M + \frac{N}{T})}^{2} \log (M + \frac{N}{T})

Let us assume that

M = ω N

and watch the speedup

s_{1} (ω, N, T) = \frac{t_{basic} (ω N, N)}{t_{common tiles} (ω N, N, T)}

of tiled calculation versus direct calculation. The graph of

s_{1} (ω, N, T)

shows that tiled calculation starts to be faster for

T = 2

and

ω < 1 / 4

, while for bigger T it is faster for even smaller ω. Figure 2a shows the graph for

N = 1024

; for different N it does not change very much.

Fig. 2 The ratio of direct calculation time to (a) tiled calculation for given ratio ω of Source to Target side sizes (for Target side size N = 1024), and (b) tiled calculation for given ratio τ of Target to Source and sampling rates (for Source side size M = 1024). FFTW library was used for time measurement.

Download Full Size | PDF

A more interesting result arises when Source and Target have different sampling rates. Let us assume that Target has τ-times finer sampling than Source, i.e. $N = τ M$ . For direct calculation, Source must have the same sampling rate as Target, i.e. it must have $τ M \times τ M$ samples too. We can increase number of samples by interpolation or just by interleaving current ones by a suitable amount of zeros. By substituting $C = (τ M + τ M) \times (τ M + τ M)$ into Eq. (6), we get time of calculation

t_{upscaled} (M, τ) = 6 {(τ M + τ M)}^{2} \log (τ M + τ M) \equiv 24 τ^{2} M^{2} \log (2 τ M)

If we split Target into

τ \times τ

interleaved tiles, we have to calculate

τ \times τ

light propagations from Source to a part of Target of size

M \times M

samples. By substituting

C = (M + M) \times (M + M)

into Eq. (6), we get time of calculation

t_{interleaved tiles} (M, τ) = (1 + 2 τ^{2}) 2 {(M + M)}^{2} \log (M + M) \equiv (1 + 2 τ^{2}) 8 M^{2} \log (2 M)

The speedup

s_{2} (M, τ) = \frac{t_{upscaled} (M, τ)}{t_{interleaved tiles} (M, τ)} \equiv \frac{3 τ^{2} \log (2 τ M)}{(1 + 2 τ^{2}) \log (2 M)}

is bigger than 1 for

τ > 1

, i. e., tiled calculation is always faster; a graph of

s_{2} (M, τ)

dependent on τ for

M = 1024

is in Fig. 2b. Again, a different M leads to a similar graph.

One can see in Fig. 2 that measured speedups are scattered around theoretical curves. A discussion of this fact is contained in the following section.

Until now we have assumed that Target has more samples than Source. That led to saving nearly one-third of all $D F T$ 's. In the opposite situation this does not hold any more; however, the results of the analysis are similar, although not so outstanding. The analysis of general situations, where the ratio of sampling rates or sizes is a general fraction, leads to the following results: speedup is greater when the nominator and denominator of the sampling rate ratio are big too; splitting of Source and Target of the same sampling rate into common tiles in a complicated ratio leads rather to slowdown if we do not mind memory restrictions.

5. Implementation notes

The implementation of the proposed method is simple, although due to a number of variables quite demanding. The first step is to estimate the ratio of Source and Target sampling rates, and to split them into interleaved tiles of common sampling. In case we have enough memory to calculate light propagations between these tiles, we can calculate them (but see later). In another case, we are facing a still unresolved problem.

Let us assume that two memory spaces of size C are available. The interleaved tiles created by splitting Source and Target due to the requirement of common sampling have to be split into $S_{x} \times S_{y}$ common tiles of size $M_{x} \times M_{y}$ , and $T_{x} \times T_{y}$ common tiles of size $N_{x} \times N_{y}$ (due to integer division one row and one column may be smaller) so that:

(M_{x} + N_{x} - 1) (M_{y} + N_{y} - 1) \leq C

while the time of the calculation has to be as short as possible. We cannot trust the theoretical time complexity of

D F T

O (C \log C)

when searching for optimum; algorithms of

D F T

for special array sizes (e.g. power of 2) are usually faster than for comparable (e.g. prime) array sizes. For example, FFTW library used in our implementation calculates DFT of special array sizes up to 6 × faster than for comparable prime sizes. This is the first expected reason for scattered look of the measured data in Fig. 2. One can see in Fig. 2b that measured speedups are bigger than theoretical ones. This behavior starts to appear for

M > 512

in our implementation. We expect this behavior appears due to a following fact. When calculating a propagation without any tiling, we have to increase Source side size τ-times, so DFT has to work with a big array. Using interleaved tiling, DFT works with smaller arrays that fit into cache memory more easily, and therefore bigger speedup can be expected.

Implemented heuristic suggests splitting in this way. First, we have measured calculation times $t i m e [i]$ of one-dimensional $D F T$ 's of array sizes i. From these times we have picked “friendly-size” ones that satisfy condition $t i m e [f r i e n d l y - s i z e] < t i m e [i]$ for all measured $i > f r i e n d l y - s i z e$ . Two-dimensional $D F T$ is separable, so it is expected that a two-dimensional array of friendly-size side sizes will be “friendly” too. Next, we have measured calculation times for those arrays. The propagation itself is then calculated in 2D “friendly-size” arrays. In case we need to split into common tiles due to memory restrictions, we choose tiles of maximum width (tiles are then in fact horizontal stripes); height of Source and Target tiles is chosen to be approximately equal and as big as possible. We have compared this heuristic to the optimal solution found using brute force; it suggests at most approximately $1.7 \times$ worse tiling.

A topic that has not been discussed yet is the precision of the proposed algorithm compared to the precision of direct calculation without any tiling. We have tried to propagate the Source to the Target of the same size (for several different sizes) using different tiling schemes. We have found that the relative difference of the calculated complex amplitudes is in the order of 10⁻¹⁰ or smaller; the difference was calculated as the absolute value of complex amplitudes differences. This difference is so small that we have not tried to find where it comes from.

6. Conclusion

A method for reference calculation of light propagation between two rectangular areas of parallel planes has been presented. These areas do not have to have either the same sampling rate or the same size, see Fig. 3 . The calculation can be split into tiles to meet memory restrictions given; on the other hand, saving memory often leads to worse calculation times. We have shown that in the case of a simple sampling rates ratio, the proposed method actually speeds up the calculation up to $2 \times$ by splitting the areas into interleaved tiles compared to zero-padding and direct calculation. We have shown as well that if one area is much bigger than the other, it is faster to split the bigger one into common tiles. A still unresolved problem is how to split the calculation into common tiles (due to memory restrictions) to get the fastest calculation. We have, however, proposed suboptimal heuristic to address this problem.

Fig. 3 Speckle simulation: an example of algorithm output. A patch of size 0.1 × 0.1 mm² with a random phase (a) and a Gaussian intensity (b) was sampled to a 1024 × 1024 array of complex amplitudes. Intensity of the off-axis propagation to the distance 5 mm is shown in the subimage (c), size of the subimage is 4.7 × 1.6 mm². A naive approach to the convolution would require approx. 28 GiB of memory while proposed algorithm can work on a common PC.

Download Full Size | PDF

Acknowledgments

This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic under the research program LC-06008 (Center for Computer Graphics). The author would like to thank Ivo Hanák, Václav Skala and the reviewers for their valuable comments.

References and links

1. J. W. Goodman, Introduction to Fourier Optics (Roberts & Company Publishers, 2004), 3rd ed.

2. E. Lalor, “Conditions for the validity of the angular spectrum of plane waves,” J. Opt. Soc. Am. 58(9), 1235–1237 (1968). [CrossRef]

3. L. Onural, “Exact analysis of the effects of sampling of the scalar diffraction field,” J. Opt. Soc. Am. A 24(2), 359–367 (2007). [CrossRef]

4. L. Onural, A. Gotchev, H. Ozaktas, and E. Stoykova, “A survey of signal processing problems and tools in holographic three-dimensional television,” IEEE Trans. Circ. Syst. Video Tech. 17(11), 1631–1646 (2007). [CrossRef]

5. V. Katkovnik, J. Astola, and K. Egiazarian, “Discrete diffraction transform for propagation, reconstruction, and design of wavefield distributions,” Appl. Opt. 47(19), 3481–3493 (2008). [CrossRef] [PubMed]

6. N. Delen and B. Hooker, “Free-space beam propagation between arbitrarily oriented planes based on full diffraction theory: a fast Fourier transform approach,” J. Opt. Soc. Am. A 15(4), 857–867 (1998). [CrossRef]

7. J.-L. Kaiser, E. Quertemont, and R. Chevallier, “Light propagation in the pseudo-paraxial fresnel approximation,” Opt. Commun. 233(4-6), 261–269 (2004). [CrossRef]

8. E. Sziklas and A. Siegman, “Diffraction calculations using fast Fourier transform methods,” Proc. IEEE 62(3), 410–412 (1974). [CrossRef]

9. R. P. Muffoletto, J. M. Tyler, and J. E. Tohline, “Shifted Fresnel diffraction for computational holography,” Opt. Express 15(9), 5631–5640 (2007). [CrossRef] [PubMed]

10. F. Zhang, G. Pedrini, and W. Osten, “Reconstruction algorithm for high-numerical-aperture holograms with diffraction-limited resolution,” Opt. Lett. 31(11), 1633–1635 (2006). [CrossRef] [PubMed]

11. K. Matsushima and S. Nakahara, “Extremely high-definition full-parallax computer-generated hologram created by the polygon-based method,” Appl. Opt. 48(34), H54–H63 (2009). [CrossRef] [PubMed]

12. M. Frigo and S. G. Johnson, “The design and implementation of FFTW3,” Proc. IEEE 93(2), 216–231 (2005) (Special issue on “Program Generation, Optimization, and Platform Adaptation”). [CrossRef]

Reference calculation of light propagation between parallel planes of different sizes and sampling rates

Abstract

1. Introduction

2. One-dimensional case

3. Two-dimensional case

4. Theoretical time of computation

5. Implementation notes

6. Conclusion

Acknowledgments

References and links

Cited By

Figures (3)

Equations (14)

Optics Express