
HORN-9: Special-purpose computer for electroholography with the Hilbert transform


Abstract

Holography is a technology that uses light interference and diffraction to record and reproduce three-dimensional (3D) information. Electroholography, which uses computers to generate and display holographic 3D scenes, has been widely studied. Nevertheless, its practical application requires enormous computing power, and current computers have limitations in real-time processing. In this study, we show that holographic reconstruction (HORN)-9, a special-purpose computer for electroholography with the Hilbert transform, can compute a 1,920 × 1,080-pixel computer-generated hologram from a point cloud of 65,000 points in 0.030 s (33 fps) on a single card. This performance is 8, 7, and 170 times more efficient than the previously developed HORN-8, a graphics processing unit (GPU), and a central processing unit (CPU), respectively. We also demonstrated the real-time processing and display of 400,000 points on multiple HORN-9s, achieving an acceleration of 600 times with four HORN-9 units compared with a single CPU.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Holography [1] can reconstruct the wavefront of light and provides all the cues required for human depth perception [2]. Stereoscopic viewing lacks some of these depth cues, which gives rise to the vergence–accommodation conflict (VAC) and can cause nausea and fatigue. Holography is attracting attention as an ideal three-dimensional (3D) display method that does not cause VAC [3].

In holography, wavefronts are recorded as holograms using light interference. These wavefronts can be reconstructed using light diffraction. Holograms obtained by computer simulation of the light behavior are called computer-generated holograms (CGHs) [4]. Electroholography (EH) [5,6], which uses a spatial light modulator (SLM) to display hologram patterns, mainly uses CGH. Additionally, EH can reconstruct a 3D moving image by switching CGHs at high speed.

High-resolution SLMs and high computer performance are paramount factors in EH applications [7]. SLMs with a 1 µm pixel pitch have been developed [8,9], and their resolution is improving; 8K4K-size SLMs have also been developed [10], making larger screens feasible. Moreover, various approaches have been applied to accelerate the CGH calculation, but sufficient performance has not yet been achieved.

In CGH methods, there is a tradeoff between computational speed and graphic representation. Examples of CGH methods are point cloud-based methods [11,12], polygon-based methods [13,14], layer-based methods [15,16], ray-based methods [17–19], and sparse-based methods [20,21]. Point cloud-based methods, which represent objects as point sources, have simple algorithms, and various acceleration approaches have been proposed; however, they have limitations in graphical representation [22].

In a cloud of $M$ points, a complex hologram $u_c(x_a, y_a)$ under the condition $z_j \gg x_j, y_j$ is expressed as follows:

$$u_c\left(x_a,y_a\right)=\sum_{j=1}^{M}{A_j\exp{\left(i2\pi\theta_{aj}\right)}},$$
$$\theta_{aj}=\rho_j\left(x_{aj}^2+y_{aj}^2\right),$$
where $\rho _j=1/\left (2\lambda \left |z_j\right |\right )$, $x_{aj}=x_a-x_j$, $y_{aj}=y_a-y_j$; $x_a$ and $y_a$ are coordinates on the CGH, $x_j$, $y_j$, and $z_j$ represent the coordinates of the point cloud, $A_j$ denotes the amplitude intensity of the point cloud, and $\lambda$ denotes the reference light’s wavelength. Commercial displays cannot present complex holograms directly; instead, either the phase or the amplitude distribution must be extracted and displayed.

For phase-type holograms, the phase distribution is extracted using the following equation:

$$u_p\left(x_a,y_a\right)=\tan^{{-}1}{\frac{\mathrm{Im}\{u_c\}}{\mathrm{Re}\{u_c\}}},$$
where $\mathrm {Re}\{u_c\}$ and $\mathrm {Im}\{u_c\}$ represent functions that extract the real and imaginary parts, respectively, from the complex distribution.

Amplitude holograms can be created by extracting the real part using $\mathrm {Re}\{u_c\}$. To reduce the computational cost, however, they can be calculated directly using Eq. (4) instead of Eqs. (1) and (2) followed by $\mathrm {Re}\{u_c\}$.

$$u_a\left(x_a,y_a\right)=\sum_{j=1}^{M}{A_j\cos{\left[2\pi\theta_{aj}\right]}}.$$

In a point cloud-based CGH calculation, when the number of cloud points is $M$, and the numbers of horizontal and vertical pixels are $N_{\mathrm {x}}$ and $N_{\mathrm {y}}$, respectively, the computational complexity is $\mathrm {O}(MN_{\mathrm {x}}N_{\mathrm {y}})$. The phase-type CGH calculation is more complicated than the amplitude-type CGH calculation because it requires both the real and imaginary parts and an arctangent operation; however, the phase type is superior in diffraction efficiency [23].
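For reference, the direct calculation of Eqs. (1)–(4) can be sketched in a few lines of NumPy. The point coordinates, pixel pitch, and wavelength below are illustrative values, not the experimental parameters of this study, and the function name is ours:

```python
import numpy as np

def point_cloud_cgh(points, amps, nx, ny, pitch=8.0e-6, wavelength=532e-9):
    """Direct point-cloud CGH calculation [Eqs. (1)-(4)].

    points: (M, 3) array of (x_j, y_j, z_j) in metres; amps: (M,) amplitudes A_j.
    Returns the phase-type CGH u_p and the amplitude-type CGH u_a.
    """
    # Pixel coordinates (x_a, y_a) of the hologram plane
    xa = (np.arange(nx) - nx / 2) * pitch
    ya = (np.arange(ny) - ny / 2) * pitch
    Xa, Ya = np.meshgrid(xa, ya)                         # shape (ny, nx)

    u_c = np.zeros((ny, nx), dtype=np.complex128)
    u_a = np.zeros((ny, nx))
    for (xj, yj, zj), Aj in zip(points, amps):           # O(M Nx Ny) loop
        rho = 1.0 / (2.0 * wavelength * abs(zj))         # rho_j
        theta = rho * ((Xa - xj) ** 2 + (Ya - yj) ** 2)  # Eq. (2)
        u_c += Aj * np.exp(1j * 2.0 * np.pi * theta)     # Eq. (1)
        u_a += Aj * np.cos(2.0 * np.pi * theta)          # Eq. (4)

    u_p = np.arctan2(u_c.imag, u_c.real)                 # Eq. (3)
    return u_p, u_a

# Small illustrative example (reduced resolution to keep the demo fast)
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-1e-3, 1e-3, 100),
                       rng.uniform(-1e-3, 1e-3, 100),
                       rng.uniform(0.05, 0.10, 100)])
u_p, u_a = point_cloud_cgh(pts, np.ones(100), nx=256, ny=256)
```

The nested loop over points and pixels makes the $\mathrm {O}(MN_{\mathrm {x}}N_{\mathrm {y}})$ cost explicit; the acceleration methods discussed below attack exactly this term.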

We have developed special-purpose computers using field-programmable gate arrays (FPGAs) [24–35]. These special-purpose computers have been highly effective in accelerating calculations, achieving speedups of roughly 100 times over personal computers (PCs) with central processing units (CPUs). As special-purpose computers for CGH, we developed point cloud-based amplitude- and phase-type CGH computers. Compared with the amplitude type, the performance of special-purpose computers for phase-type CGH is halved because the phase-type calculation requires more resources per compute core and therefore fewer cores fit on the device, resulting in a longer computation time [31–33,35].

In this study, we implement the Hilbert transform [36] in a special-purpose computer architecture that computes phase-type CGH—holographic reconstruction (HORN)-9. The implementation of a special circuit hides the computation time required for the Hilbert transform and shows that phase-type CGH can be computed in the same computation time as amplitude-type CGH. The performance is not halved and is even more efficient than conventional calculations. We also demonstrate the real-time processing and display of 400,000 points on two HORN-9s.

2. Related research

Research on the acceleration of calculations for EH can be classified into software- and hardware-based approaches. In software-based approaches, there has been extensive research using lookup tables (LUTs) [11,12], which accelerate calculations by referencing tables that store precomputed results. Additionally, some approaches reduce computational complexity by devising algorithms [36–39]. Moreover, modern computers are generally multicore, and the acceleration effect of parallel processing is high [40]. Even in software-based approaches, acceleration has been achieved by algorithms that focus on hardware characteristics and enable parallel processing. Hardware-based approaches that use dedicated computing resources include acceleration using many-core graphics processing units (GPUs) [41–43], FPGAs [24–33,35,44], and application-specific integrated circuits [45].

2.1 Recurrence relation algorithm

A recurrence relation algorithm [37] reduces computational complexity by exploiting the uniform pixel spacing of SLMs. It is also well suited to parallel processing and can be further accelerated on multicore systems. We define the following recurrence relation algorithm:

$$\mathit{\Gamma}_j=\frac{1}{\lambda{z_j}}=2\rho_j,$$
$$\mathit{\Delta}_{0j}=\rho_j\left\{2\left(x_0-x_j\right)+1\right\}.$$
In the recurrence relation algorithm, the initial phase $\theta _{0j}$ is first calculated using Eq. (2). The n-th phase $\theta _{nj}$ along the x-axis is then obtained from the recurrence relation as follows:
$$\theta_{nj}=\theta_{\left(n-1\right)j}+\mathit{\Delta}_{\left(n-1\right)j}.$$
Additionally, we update $\mathit {\Delta }_{nj}$ using the following equation:
$$\mathit{\Delta}_{nj}=\mathit{\Delta}_{\left(n-1\right)j}+\mathit{\Gamma}_j.$$
We can calculate $\theta _{nj}$ by simply repeating Eqs. (7) and (8).
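As an illustration, the recurrence can be written as follows. This sketch is ours: it assumes pixel-unit coordinates (so the step between adjacent pixels is 1) with $\rho _j$ absorbing the pixel pitch, and it checks the result against a direct evaluation of Eq. (2):

```python
import numpy as np

def row_phase_recurrence(x0, xj, rho_y_term, rho_j, n_pixels):
    """Phase along one CGH row via the recurrence relation [Eqs. (5)-(8)].

    Coordinates are in pixel units (adjacent pixels differ by 1), so rho_j is
    assumed to already absorb the pixel pitch.  rho_y_term = rho_j*(y_a - y_j)^2
    is the constant y contribution for this row.
    """
    gamma = 2.0 * rho_j                                  # Eq. (5)
    theta = rho_j * (x0 - xj) ** 2 + rho_y_term          # Eq. (2): initial phase theta_0j
    delta = rho_j * (2.0 * (x0 - xj) + 1.0)              # Eq. (6)
    thetas = np.empty(n_pixels)
    for n in range(n_pixels):
        thetas[n] = theta
        theta += delta                                   # Eq. (7): theta_nj = theta_(n-1)j + delta_(n-1)j
        delta += gamma                                   # Eq. (8): delta_nj = delta_(n-1)j + gamma_j
    return thetas

# Consistency check against the direct evaluation of Eq. (2)
rho, xj, x0, npix = 1.2e-4, 37.0, 0.0, 1920
rec = row_phase_recurrence(x0, xj, 0.0, rho, npix)
direct = rho * (np.arange(npix) - xj) ** 2
assert np.allclose(rec, direct)
```

Only two additions per pixel are required in the loop, which is why this formulation maps well onto hardware adders.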

2.2 Hilbert transform

Efforts are underway to use the Hilbert transform to accelerate phase-type CGH calculations [36]. The Hilbert transform can generate an analytic function from a causal real-valued function. Whereas the phase-type CGH is usually computed in the order of Eqs. (1) and (2) followed by Eq. (3), a method has been proposed that recovers the orthogonal component from an amplitude-type CGH to obtain the complex hologram and then the phase-type CGH; that is, the calculation proceeds from Eq. (4) to Eq. (3). The Hilbert transform can be computed independently row by row in parallel, further accelerating the process [36].

Hilbert transform using the one-dimensional fast Fourier transform (FFT) is expressed as follows:

$$\hat{h}\left(x\right)=\mathrm{FFT}^{{-}1}\left[\mathrm{FFT}\left[u_a\left(x\right)\right]H\left(f\right)\right],$$
$$H\left(f\right)= \begin{cases}1 &{(f=0),} \\ 1/2 &(f < W), \\ 0 & (\text{otherwise}).\end{cases}$$
FFT and $\mathrm {FFT}^{-1}$ denote the forward and inverse transformations of the one-dimensional FFT, respectively; $W$ denotes the width of the image. The Hilbert transform can generate complex holograms from amplitude-type CGH. From the complex hologram obtained using the Hilbert transform, the phase CGH can be calculated by extracting the phase distribution using Eq. (3).
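A minimal row-wise sketch of this procedure in NumPy, written by us and following Eqs. (9), (10), and (3) as stated (the constant scale introduced by $H(f)$ does not affect the extracted phase), is given below:

```python
import numpy as np

def phase_cgh_via_hilbert(u_a):
    """Recover a phase-type CGH from an amplitude-type CGH [Eqs. (9), (10), (3)].

    Each row is processed independently with a one-dimensional FFT.
    """
    ny, nx = u_a.shape
    f = np.fft.fftfreq(nx) * nx              # integer frequency indices
    H = np.zeros(nx)
    H[f == 0] = 1.0                          # keep the DC component
    H[f > 0] = 0.5                           # keep (half of) the positive frequencies
    spectrum = np.fft.fft(u_a, axis=1)       # row-wise FFT, Eq. (9)
    u_c = np.fft.ifft(spectrum * H, axis=1)  # complex hologram estimate
    return np.arctan2(u_c.imag, u_c.real)    # Eq. (3): extract the phase distribution

# e.g. u_p = phase_cgh_via_hilbert(u_a), with u_a from the sketch in Section 1
```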

2.3 GPU

GPUs are often used in hardware-based approaches: they have on the order of 10,000 computation cores and achieve high acceleration through massively parallel computation [41–43]. One severe problem of acceleration using GPUs is power consumption; a single card consumes nearly 400 W.

2.4 HORN

HORN is a series of special-purpose computers for holography built with integrated circuits, digital signal processors (DSPs), and FPGAs [24–32]. HORN-8 [31] is a cluster system comprising eight dedicated boards that calculates a $1,920 \times 1,080$-pixel amplitude-type CGH from 65,000 cloud points at 60 fps using the recurrence relation algorithm. The performance of a phase-type CGH calculation under the same conditions was approximately 30 fps, half that value [32]. To compute the phase-type CGH, it is necessary to obtain a complex hologram (computation of real and imaginary parts) and compute the phase [Eqs. (1)–(3)], whereas the amplitude-type CGH can be obtained from the real part only [Eq. (4)]. Compared with the amplitude-type CGH, the phase-type CGH therefore requires twice the arithmetic circuit resources because both the real and imaginary parts must be calculated. Furthermore, a large argument table memory [Eq. (3)] is required to calculate the phase-type CGH.

In a previous study [32], by exploiting the dependencies between calculations, the issue of argument table memory capacity was addressed by sharing one argument table memory among multiple calculation cores. However, the circuit configuration became more complex, and the lower operating frequency and increase in resources could not be completely avoided. HORN-8 has one FPGA for control and seven FPGAs for arithmetic. The amplitude-type CGH has a maximum of 640 parallel stages per arithmetic FPGA, and the phase-type CGH has half that number, 320. When the number of parallel computation cores is halved, the performance is also halved. A single HORN-8 board computing phase-type CGH had the same computational performance as a GPU (NVIDIA GeForce GTX 1080 Ti) [32]. The power consumption has not been verified.

3. Hardware design and implementation

HORN-9, the next-generation special-purpose computer for EH, newly applies the Hilbert transform to phase-type CGH calculations and is built on the Xilinx Alveo U250 data center accelerator card (U250). The U250 is an FPGA board that connects to a PC as an expansion card. As shown in Fig. 1, we developed a system with four U250 boards connected to a PC.

Fig. 1. HORN-9 system: PC connected with four U250s (red devices). The markings on the device surfaces differ because the devices were purchased at different times, but all cards have the same specifications.

Figure 2 shows a block diagram of the special-purpose computation circuit implemented on the U250. The circuit comprises the recurrence relation unit (RRU) [Eqs. (2), (4)–(8)], which computes the amplitude-type CGH, and the Hilbert transform unit (HTU), which performs the Hilbert transform [Eqs. (9) and (10)] and the phase transform [Eq. (3)]. A pair consisting of one RRU and one HTU computes one column of the CGH. With 10 such units, the CGH is divided vertically into 10 parts, and each part is computed in parallel. One feature of the circuit configuration is that the RRU and HTU sandwich a RAM (CGH RAM) that temporarily stores the CGH. The RRUs and HTUs operate simultaneously to hide the computation time required for the Hilbert transform. If the RRU computation time is larger than the HTU computation time, the HTU computation time can be hidden and the phase-type CGH can be computed in the RRU computation time alone (the amplitude-type CGH computation time).
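The effect of this arrangement can be captured by a toy timing model: with the CGH RAM buffered between the two units, each stage waits only for the slower unit, so the total time approaches the RRU time once the RRU dominates. The per-line times below are assumed values derived from the aggregate figures in Section 4, not measurements of individual lines, and the model is ours rather than a description of the actual RTL:

```python
def pipelined_total(t_rru_line, t_htu_line, n_lines):
    """Toy timing model of one RRU/HTU pair sharing a buffered CGH RAM:
    while the HTU transforms line k, the RRU already computes line k + 1,
    so the steady-state cost per line is max(t_rru_line, t_htu_line)."""
    return t_rru_line + (n_lines - 1) * max(t_rru_line, t_htu_line) + t_htu_line

# Assumed per-line times derived from the aggregate figures in Section 4
# (0.030 s RRU-bound / 0.027 s HTU-bound per frame); purely illustrative.
n_lines = 108                       # lines handled by one of the 10 unit pairs
t_rru, t_htu = 0.030 / n_lines, 0.027 / n_lines
print(pipelined_total(t_rru, t_htu, n_lines))   # ~0.030 s: the HTU time is hidden
```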

Fig. 2. Top block diagram of the special calculation circuit. MUX stands for multiplexer.

Figure 3 shows a block diagram of the RRU, comprising the basic phase unit [Eqs. (2), (4)–(6)], which calculates the initial phase of the recurrence relation algorithm, and the additional phase unit [Eqs. (7) and (8)], which computes the additive phase. The recurrence relation algorithm is applied along the x-axis of the CGH to compute $N = 1,920$ pixels at a time for a $1,920 \times 1,080$-pixel CGH. $A_j$ is fixed to 1, and $\rho _j$ is precalculated by the CPU. Fixed-point arithmetic is often used in special-purpose computers; the bit lengths required for EH were verified in a previous study [31], and the same values are used here. The values next to the shaded lines in the figure are the bit lengths, and “COS” is the six-bit LUT that performs the cosine calculation.
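The following sketch illustrates the idea of a six-bit cosine table reference; the exact fixed-point format of the RRU follows Ref. [31] and is not reproduced here, so the mapping below is only an assumption for illustration:

```python
import numpy as np

# Illustrative 6-bit cosine lookup table: the fractional part of the phase
# theta (in turns, i.e. units of 2*pi) addresses a 64-entry table.
COS_LUT = np.cos(2.0 * np.pi * np.arange(64) / 64.0)

def cos_lut_lookup(theta_turns):
    """Approximate cos(2*pi*theta) with a 6-bit table reference."""
    index = int(theta_turns * 64) & 0x3F       # keep the top six fractional bits
    return COS_LUT[index]

# The error of the 64-entry table is bounded by the phase step 2*pi/64
print(abs(cos_lut_lookup(0.37) - np.cos(2 * np.pi * 0.37)))
```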

Fig. 3. Block diagram of the RRU. MSB stands for most significant bit.

Figure 4 shows a block diagram of the HTU, which transforms the output $I_{x_a}\in \{0, 1\}$ of the RRU into the amplitude distribution ${I'}_{x_a}\in \{-128, 127\}$ used as input to the Hilbert transform. FFT/IFFT in Fig. 4 is a unit that performs the FFT and inverse FFT [Eq. (9)]; it is provided by the Xilinx Fast Fourier Transform Version 9.1 IP (FFT IP) [46]. The amplitude distribution after the FFT is multiplied by the internally generated Hilbert filter $H(f)$ [Eq. (10)], and the inverse FFT is performed on all rows. In Fig. 4, “ATAN” is the LUT for calculating the arctangent [Eq. (3)]; as in the previous study [32], it is a 10-bit LUT with five bits each for the real and imaginary parts. To enable computation with a 10-bit table reference, normalization is performed in the Normalization Unit. The obtained phase distribution $\mathit {{\Phi }_{x_a}}$ is output to the CGH RAM, and the data in the CGH RAM are copied by the CPU to the frame buffer for screen output.
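The role of the normalization step and the 10-bit arctangent table can be sketched as follows. The quantization and normalization shown here are our assumptions for illustration, not the exact fixed-point format of HORN-9:

```python
import numpy as np

# Illustrative 10-bit arctangent table: 5 signed bits each for the real and
# imaginary parts (1,024 entries), mirroring the LUT described for HORN-8 [32].
codes = np.arange(-16, 16)                                # signed 5-bit codes
ATAN_LUT = np.arctan2(codes[:, None], codes[None, :])    # indexed [im_code, re_code]

def lut_phase(re, im):
    """Phase of (re + i*im) via normalization and a 10-bit table reference."""
    scale = max(abs(re), abs(im), 1e-12) / 15.0           # normalize to the 5-bit range
    qr = int(np.clip(round(re / scale), -16, 15))
    qi = int(np.clip(round(im / scale), -16, 15))
    return ATAN_LUT[qi + 16, qr + 16]

print(lut_phase(0.3, -0.7), np.arctan2(-0.7, 0.3))        # coarse but close
```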

Fig. 4. Block diagram of the HTU.

Table 1 shows the resource utilization of the FPGA. The operating frequency is 250 MHz. BRAM and URAM are memory resources, used as the point cloud RAM and the CGH RAM, respectively. The maximum number of cloud points that can be computed simultaneously is $2^{19}$, and the maximum CGH size is $2^{24}\ (\approx 3,840 \times 2,160)$ pixels. According to a Xilinx guideline [47], the resource utilization that yields the best operating frequency is 70%–80%; therefore, for the U250, 10 parallel units were the upper limit. In Table 1, the HTUs occupy less than 1% of the resources; the resources are dominated by the RRUs, and the HTU overhead is almost negligible.


Table 1. FPGA resource usage of a phase-type CGH computation circuit using Hilbert transform.

4. Performance

The time required to compute a $1,920 \times 1,080$-pixel CGH from a cloud of 65,000 points on each computer was measured (Table 2). The CPU was an Intel i9-10980XE (18 cores, 4.80 GHz), the compiler was icpc (Intel C++ Compiler) 2021.5.0 20211109, and the FFT library was FFTW 3.3.8 [48]. The GPU was an NVIDIA GeForce RTX 3080 Ti (10,240 CUDA cores, GDDR6X 12 GB, 1,785 MHz); the compiler was CUDA V11.6, and gcc GNU 9.3.0; and the FFT library was CUDA enclosed cuFFT [49].


Table 2. Comparison of phase-type CGH computation times (single-node/single-board).

We compared CPU and GPU computation times using the direct calculation [Eqs. (1)–(3)], recurrence relation algorithm [Eqs. (1)–(3), (5)–(8)], and Hilbert transform [Eqs. (2)–(10)], which are acceleration methods for parallel computing. Besides the commonly used float32 precision for CPUs, comparisons were made for int32 precision. The results showed that the Hilbert transform was the fastest on the CPU, whereas the direct calculation was the fastest on the GPU. Because GPUs are optimized for single instruction, multiple data (SIMD) calculations, the direct calculation, which is more suitable for SIMD calculations, was faster than both the recurrence relation algorithm and Hilbert transform, which reduced the computational complexity.

The theoretical computation time for HORN-9 is equivalent to the time it takes to compute the RRU [Eqs. (2), (4)–(8)] when the computation time for the Hilbert transform is completely hidden and is expressed as follows:

$$T_{\text{logic}}[\text{s}] \approx \frac{K \times M}{P \times f},$$
where $K$ denotes the total number of pixels in the CGH, $M$ denotes the number of cloud points, $P$ is the number of pixels that a single HORN-9 can process in parallel, and $f$ is the operating frequency. When $K = N_{\mathrm {x}}N_{\mathrm {y}} = 2~\text{megapixels}$ and $M=65,000~\text{points}$, $T_{\text{logic}}=0.03$ s because our system processes $P=19,200~\text{pixels}$ in parallel (10 units $\times$ 1,920 pixels per RRU) and operates at $f=250~\text{MHz}$. Table 2 shows that the measured computation time agrees with this theoretical value. In fact, the computation time for the RRU alone (the amplitude-type CGH computation time) is also 0.03 s; the HTU time is hidden, and the phase-type CGH is computed in the same time as the amplitude-type CGH.
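Substituting these values into Eq. (11) gives approximately 0.028 s, consistent with the measured 0.030 s; a short check of the arithmetic:

```python
# Theoretical HORN-9 computation time from Eq. (11), using the values in the text
K = 1920 * 1080          # CGH pixels
M = 65_000               # cloud points
P = 10 * 1920            # pixels processed in parallel (10 units x one RRU row)
f = 250e6                # operating frequency [Hz]
T_logic = K * M / (P * f)
print(f"T_logic = {T_logic:.3f} s")   # ~0.028 s, consistent with the measured 0.030 s
```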

Real-time EH systems require a performance of 10 fps or higher for smooth moving images. HORN-9 has demonstrated satisfactory performance with a single hologram calculated in 0.030 s (33 fps), which is 8, 7, and 170 times more efficient than HORN-8, a GPU, and a CPU, respectively.

Power consumption is an essential practical factor. The power consumptions of HORN-9, CPU, and GPU are 130, 165, and 370 W, respectively. Therefore, the proposed architecture is power efficient.

The computation times for a $1,920 \times 1,080$-pixel phase-type CGH as a function of the number of cloud points on a CPU (Hilbert transform), GPU (direct calculation), one HORN-9, two HORN-9s, and four HORN-9s are illustrated in Fig. 5. Unlike the CPU and GPU, HORN-9 does not show a purely linear computation time; this is a characteristic of special-purpose computers. A single card takes 0.027 s, constant up to about 60,000 points. This is the time required for the Hilbert transform (the HTU computation time). The computational complexity of the Hilbert transform is $\mathrm {O}(N_{\mathrm {y}}N_{\mathrm {x}}\log {N_\mathrm {x}})$ and is independent of the number of cloud points. When the number of cloud points is small, the RRU computation time is short and the HTU computation time cannot be hidden, so the HTU computation time directly becomes the overall computation time. In the proposed architecture, the RRU computation time exceeds the HTU computation time above roughly 60,000 points, and the HTU computation time is then hidden.
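The crossover can be estimated from Eq. (11): the point count at which the RRU time equals the measured HTU-bound time of 0.027 s is roughly 62,500, in line with the observed plateau up to about 60,000 points. A short check of this estimate:

```python
# Rough estimate of the point count above which the HTU time is hidden:
# the RRU time K*M/(P*f) from Eq. (11) must exceed the point-independent HTU time.
K, P, f = 1920 * 1080, 10 * 1920, 250e6
T_htu = 0.027                                        # measured HTU-bound time [s]
M_crossover = T_htu * P * f / K
print(f"M_crossover ~ {M_crossover:,.0f} points")    # ~62,500 points
```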

Fig. 5. Computation time of phase-type CGH versus number of cloud points on different computers.

Using multiple units of HORN-9 further accelerates the time required for CGH calculations. Figure 5 shows that the time required to produce a $1,920 \times 1,080$-pixel image from 65,000 points is 0.015 s (67 fps) with two HORN-9 units and 0.008 s (125 fps) with four units, achieving an acceleration of 600 times with four units compared with a single CPU.

5. Image quality and optical demonstration

Strictly evaluating the image quality of reproduced images is difficult because of the effects of point cloud overlap and speckle noise. In this study, we simply compared simulated images reconstructed from CGHs using the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM), which are image quality evaluation indices. The reconstruction simulation conditions were a $1,920 \times 1,080$-pixel image, a wavelength of $\lambda$ = 532 nm, and an SLM pixel pitch of 8.0 µm. The simulated reconstructed image of the CGH calculated using Eqs. (1)–(3) with float32 accuracy was used as the ground truth and compared with the reconstructed image of the CGH calculated by HORN-9. The PSNR and SSIM were 44 dB and 0.97, respectively; the PSNR and SSIM of the previous system, HORN-8, were likewise 44 dB and 0.97. The simulated reconstructed images are shown in Fig. 6. Figure 7 shows a reconstructed image obtained with the optical system under the same conditions; no difference could be seen by the naked eye. We confirmed that the reproduced image was equivalent to that calculated by the CPU.
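Such a comparison can be scripted with scikit-image, for example; the two reconstructed images below are random placeholders standing in for the CPU (ground-truth) and HORN-9 reconstructions, so the printed values are illustrative only:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# u_ref: reconstruction simulated from the float32 CPU CGH (ground truth)
# u_test: reconstruction simulated from the HORN-9 CGH
# Both are hypothetical placeholders here, normalized to [0, 1].
rng = np.random.default_rng(0)
u_ref = rng.random((1080, 1920))
u_test = np.clip(u_ref + 0.005 * rng.standard_normal(u_ref.shape), 0, 1)

psnr = peak_signal_noise_ratio(u_ref, u_test, data_range=1.0)
ssim = structural_similarity(u_ref, u_test, data_range=1.0)
print(f"PSNR = {psnr:.1f} dB, SSIM = {ssim:.3f}")
```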

Fig. 6. Comparison of CGH reconstructed simulated images: (a) overview of the original point cloud, (b) simulated reconstructed image with CGH computed by CPU, (c) simulated reconstructed image with CGH computed by HORN-8, and (d) simulated reconstructed image with CGH computed by HORN-9.

Fig. 7. Optically reconstructed images of CGH: (a) computed by CPU and (b) computed by HORN-9.

For the optical reproduction of large point clouds, the spatiotemporal division method is used [50,51]. There is an upper limit to the number of cloud points that can be displayed on the SLM. Therefore, a large point cloud is divided into several smaller groups, and a CGH is calculated for each. Although each CGH reconstructs only its own group, by switching the CGHs on the SLM at high speed, the reconstructed images are synthesized by the human eye through the afterimage effect and can be observed as if they were the point cloud before division. When a point cloud is divided into six segments, six frames are used to represent one object; on a monitor updated at 60 fps, the entire point cloud appears to run at 10 fps.
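The bookkeeping of the spatiotemporal division method amounts to splitting the cloud and cycling one CGH per frame, as in the following sketch (the random cloud and the segment assignment are illustrative, not the experimental data):

```python
import numpy as np

def spatiotemporal_schedule(points, n_segments=6):
    """Split a large point cloud into segments, one CGH per SLM frame.

    Frame k of the SLM shows the CGH of segment (k mod n_segments); displaying
    the per-segment CGHs in rapid succession lets the eye fuse them into the
    full cloud.  Only the bookkeeping is sketched here.
    """
    return np.array_split(points, n_segments)

cloud = np.random.default_rng(1).uniform(-1, 1, size=(400_000, 3))
segments = spatiotemporal_schedule(cloud)
print([len(s) for s in segments])        # ~66,667 points per segment
# At a 60 fps SLM refresh, one full cycle of 6 segments repeats at 10 fps.
```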

HORN-9 is efficient enough that two units can reproduce a cloud of 400,000 points, divided into six segments, in real time using the spatiotemporal division method. Figure 8 shows an optical reproduction of a cloud of 400,000 points calculated by HORN-9 using this method.

Fig. 8. Reproduced image of 400,000 points using the spatiotemporal division method (Visualization 1). The camera shutter speed (10 fps) was chosen to approximate the afterimage response of the human eye.

6. Conclusion and future work

In this study, we proposed HORN-9, which is capable of both phase- and amplitude-type CGH calculations. HORN-9 is faster than the previous HORN-8 owing to two technological advances: the Hilbert transform circuit, which doubles the effective number of RRUs, and a more careful design of the RRU circuit, which quadruples the number of RRUs. As a result, HORN-9 is eight times faster overall than HORN-8. A $1,920 \times 1,080$-pixel CGH from a cloud of 65,000 points can be computed in 0.030 s (33 fps) with a single card and in 0.008 s (125 fps) with four cards. This is 8, 7, and 170 times more efficient than HORN-8, the GPU, and the CPU, respectively; four boards were up to 600 times more efficient than a single CPU. HORN-9 also has superior performance per watt.

The time required for the Hilbert transform was successfully hidden by the design of the circuit configuration. Phase-type CGH calculations, which are generally considered more computationally intensive than amplitude-type CGH calculations, can thus be performed at the same resource scale and in the same computation time as amplitude-type CGH calculations. The reconstructed images are equivalent to those computed on a CPU. Only a special-purpose computer can construct a datapath that hides the Hilbert transform in this way, demonstrating the superiority of the proposed phase-type CGH computation over CPU and GPU implementations.

In the future, we will develop a dedicated board that carries multiple FPGA chips, as in HORN-8. Additionally, although the CGH size that can currently be computed is limited to 4K $\times$ 2K pixels, we will attempt larger-scale hologram computations using external memory.

Funding

Japan Society for the Promotion of Science (19H01097, 21K21294).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. D. Gabor, “A new microscopic principle,” Nature 161(4098), 777–778 (1948). [CrossRef]  

2. S. Reichelt, R. Häussler, G. Fütterer, and N. Leister, “Depth cues in human visual perception and their realization in 3D displays,” in Three-Dimensional Imaging, Visualization, and Display 2010 and Display Technologies and Applications for Defense, Security, and Avionics IV, vol. 7690, B. Javidi, J.-Y. Son, J. T. Thomas, and D. D. Desjardins, eds. (SPIE, 2010), pp. 92–103.

3. G. Kramida, “Resolving the Vergence-Accommodation Conflict in Head-Mounted Displays,” IEEE Trans. Visual. Comput. Graphics 22(7), 1912–1931 (2016). [CrossRef]  

4. A. W. Lohmann and D. P. Paris, “Binary Fraunhofer Holograms, Generated by Computer,” Appl. Opt. 6(10), 1739–1748 (1967). [CrossRef]  

5. P. St-Hilaire, S. A. Benton, M. E. Lucente, M. L. Jepsen, J. Kollin, H. Yoshikawa, and J. S. Underkoffler, “Electronic display system for computational holography,” in Practical Holography IV, vol. 1212, S. A. Benton, ed. (SPIE, 1990), pp. 174–182.

6. N. Hashimoto, S. Morokawa, and K. Kitamura, “Real-time holography using the high-resolution LCTV-SLM,” in Practical Holography V, vol. 1461, S. A. Benton, ed. (SPIE, 1991), pp. 291–302.

7. M. Lucente, “Interactive three-dimensional holographic displays,” SIGGRAPH Comput. Graph. 31(2), 63–67 (1997). [CrossRef]  

8. K. Aoshima, N. Funabashi, K. Machida, Y. Miyamoto, K. Kuga, T. Ishibashi, N. Shimidzu, and F. Sato, “Submicron Magneto-Optical Spatial Light Modulation Device for Holographic Displays Driven by Spin-Polarized Electrons,” J. Disp. Technol. 6(9), 374–380 (2010). [CrossRef]  

9. Y. Isomae, Y. Shibata, T. Ishinabe, and H. Fujikake, “Design of 1-µm-pitch liquid crystal spatial light modulators having dielectric shield wall structure for holographic display with wide field of view,” Opt. Rev. 24(2), 165–176 (2017). [CrossRef]  

10. K. Wakunami, P.-Y. Hsieh, R. Oi, T. Senoh, H. Sasaki, Y. Ichihashi, M. Okui, Y.-P. Huang, and K. Yamamoto, “Projection-type see-through holographic three-dimensional display,” Nat. Commun. 7(1), 12954 (2016). [CrossRef]  

11. M. E. Lucente, “Interactive computation of holograms using a look-up table,” J. Electron. Imaging 2(1), 28 (1993). [CrossRef]  

12. S.-C. Kim and E.-S. Kim, “Effective generation of digital holograms of three-dimensional objects using a novel look-up table method,” Appl. Opt. 47(19), D55–D62 (2008). [CrossRef]  

13. K. Matsushima, M. Nakamura, and S. Nakahara, “Silhouette method for hidden surface removal in computer holography and its acceleration using the switch-back technique,” Opt. Express 22(20), 24450–24465 (2014). [CrossRef]  

14. J.-P. Liu and H.-K. Liao, “Fast occlusion processing for a polygon-based computer-generated hologram using the slice-by-slice silhouette method,” Appl. Opt. 57(1), A215–A221 (2018). [CrossRef]  

15. J.-S. Chen and D. P. Chu, “Improved layer-based method for rapid hologram generation and real-time interactive holographic display applications,” Opt. Express 23(14), 18143–18155 (2015). [CrossRef]  

16. H. Zhang, L. Cao, and G. Jin, “Computer-generated hologram with occlusion effect using layer-based processing,” Appl. Opt. 56(13), F138–F143 (2017). [CrossRef]  

17. T. Yatagai, “Stereoscopic approach to 3-D display using computer-generated holograms,” Appl. Opt. 15(11), 2722–2729 (1976). [CrossRef]  

18. K. Wakunami, H. Yamashita, and M. Yamaguchi, “Occlusion culling for computer generated hologram based on ray-wavefront conversion,” Opt. Express 21(19), 21811–21822 (2013). [CrossRef]  

19. H. Zhang, Y. Zhao, L. Cao, and G. Jin, “Fully computed holographic stereogram based algorithm for computer-generated holograms with accurate depth cues,” Opt. Express 23(4), 3901–3913 (2015). [CrossRef]  

20. T. Shimobaba and T. Ito, “Fast generation of computer-generated holograms using wavelet shrinkage,” Opt. Express 25(1), 77–87 (2017). [CrossRef]  

21. D. Blinder and P. Schelkens, “Accelerated computer generated holography using sparse bases in the STFT domain,” Opt. Express 26(2), 1461–1473 (2018). [CrossRef]  

22. P. W. M. Tsang, T.-C. Poon, and Y. M. Wu, “Review of fast methods for point-based computer-generated holography,” Photonics Res. 6(9), 837–846 (2018). [CrossRef]  

23. L. B. Lesem, P. M. Hirsch, and J. A. Jordan, “The kinoform: a new wavefront reconstruction device,” IBM J. Res. Dev. 13(2), 150–155 (1969). [CrossRef]  

24. T. Ito, T. Yabe, M. Okazaki, and M. Yanagi, “Special-purpose computer HORN-1 for reconstruction of virtual image in three dimensions,” Comput. Phys. Commun. 82(2-3), 104–110 (1994). [CrossRef]  

25. T. Ito, H. Eldeib, K. Yoshida, S. Takahashi, T. Yabe, and T. Kunugi, “Special-purpose computer for holography HORN-2,” Comput. Phys. Commun. 93(1), 13–20 (1996). [CrossRef]  

26. T. Shimobaba, N. Masuda, T. Sugie, S. Hosono, S. Tsukui, and T. Ito, “Special-purpose computer for holography HORN-3 with PLD technology,” Comput. Phys. Commun. 130(1-2), 75–82 (2000). [CrossRef]  

27. T. Shimobaba, S. Hishinuma, and T. Ito, “Special-purpose computer for holography HORN-4 with recurrence algorithm,” Comput. Phys. Commun. 148(2), 160–170 (2002). [CrossRef]  

28. T. Ito, N. Masuda, K. Yoshimura, A. Shiraki, T. Shimobaba, and T. Sugie, “Special-purpose computer HORN-5 for a real-time electroholography,” Opt. Express 13(6), 1923–1932 (2005). [CrossRef]  

29. Y. Ichihashi, H. Nakayama, T. Ito, N. Masuda, T. Shimobaba, A. Shiraki, and T. Sugie, “HORN-6 special-purpose clustered computing system for electroholography,” Opt. Express 17(16), 13895–13903 (2009). [CrossRef]  

30. N. Okada, D. Hirai, Y. Ichihashi, A. Shiraki, T. Kakue, T. Shimobaba, N. Masuda, and T. Ito, “Special-purpose computer HORN-7 with FPGA technology for phase modulation type electro-holography,” Proceedings of the International Display Workshops, vol. 3, pp. 1284–1287 (2012).

31. T. Sugie, T. Akamatsu, T. Nishitsuji, R. Hirayama, N. Masuda, H. Nakayama, Y. Ichihashi, A. Shiraki, M. Oikawa, N. Takada, Y. Endo, T. Kakue, T. Shimobaba, and T. Ito, “High-performance parallel computing for next-generation holographic imaging,” Nat. Electron. 1(4), 254–259 (2018). [CrossRef]  

32. T. Nishitsuji, Y. Yamamoto, T. Sugie, T. Akamatsu, R. Hirayama, H. Nakayama, T. Kakue, T. Shimobaba, and T. Ito, “Special-purpose computer HORN-8 for phase-type electro-holography,” Opt. Express 26(20), 26722–26733 (2018). [CrossRef]  

33. Y. Yamamoto, N. Masuda, R. Hirayama, H. Nakayama, T. Kakue, T. Shimobaba, and T. Ito, “Special-purpose computer for electroholography in embedded systems,” OSA Continuum 2(4), 1166–1173 (2019). [CrossRef]  

34. Y. Yamamoto, S. Namba, T. Kakue, T. Shimobaba, T. Ito, and N. Masuda, “Special-purpose computer for digital holographic high-speed three-dimensional imaging,” Opt. Eng. 59(05), 1 (2020). [CrossRef]  

35. Y. Yamamoto, T. Shimobaba, H. Nakayama, T. Kakue, N. Masuda, and T. Ito, “System-on-a-chip-based special-purpose computer for phase electroholography,” OSA Continuum 3(12), 3407–3415 (2020). [CrossRef]  

36. T. Shimobaba, T. Kakue, Y. Yamamoto, I. Hoshi, H. Shiomi, T. Nishitsuji, N. Takada, and T. Ito, “Hologram generation via Hilbert transform,” OSA Continuum 3(6), 1498–1503 (2020). [CrossRef]  

37. T. Shimobaba and T. Ito, “An efficient computational method suitable for hardware of computer-generated hologram with phase computation by addition,” Comput. Phys. Commun. 138(1), 44–52 (2001). [CrossRef]  

38. T. Shimobaba, N. Masuda, and T. Ito, “Simple and fast calculation algorithm for computer-generated hologram with wavefront recording plane,” Opt. Lett. 34(20), 3133–3135 (2009). [CrossRef]  

39. A. Symeonidou, D. Blinder, A. Munteanu, and P. Schelkens, “Computer-generated holograms by multiple wavefront recording plane method with occlusion culling,” Opt. Express 23(17), 22149–22161 (2015). [CrossRef]  

40. Y. Wang, D. Dong, P. J. Christopher, A. Kadis, R. Mouthaan, F. Yang, and T. D. Wilkinson, “Hardware implementations of computer-generated holography: a review,” Opt. Eng. 59(10), 102413 (2020). [CrossRef]  

41. N. Masuda, T. Ito, T. Tanaka, A. Shiraki, and T. Sugie, “Computer generated holography using a graphics processing unit,” Opt. Express 14(2), 603–608 (2006). [CrossRef]  

42. B. J. Jackin, S. Watanabe, K. Ootsu, T. Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai, and T. Baba, “Decomposition method for fast computation of gigapixel-sized Fresnel holograms on a graphics processing unit cluster,” Appl. Opt. 57(12), 3134–3145 (2018). [CrossRef]  

43. H. Sannomiya, N. Takada, K. Suzuki, T. Sakaguchi, H. Nakayama, M. Oikawa, Y. Mori, T. Kakue, T. Shimobaba, and T. Ito, “Real-time spatiotemporal division multiplexing electroholography for 1,200,000 object points using multiple-graphics processing unit cluster,” Chin. Opt. Lett. 18(7), 070901 (2020). [CrossRef]

44. P. W. M. Tsang, J. P. Liu, T. C. Poon, and K. W. K. Cheung, “Fast generation of hologram sub-lines based on field programmable gate array,” in Digital Holography and Three-Dimensional Imaging, (Optical Society of America, 2009), p. DWC2.

45. Y.-H. Seo, Y.-H. Lee, and D.-W. Kim, “ASIC chipset design to generate block-based complex holographic video,” Appl. Opt. 56(9), D52–D59 (2017). [CrossRef]  

46. “Xilinx Product Specification LogiCORE IP Fast Fourier Transform v7.1,” https://www.xilinx.com/support/documentation/ip_documentation/xfft_ds260.pdf.

47. “UltraFast Design Methodology Timing Closure Quick Reference Guide (UG1292),” https://docs.xilinx.com/v/u/en-US/ug1292-ultrafast-timing-closure-quick-reference.

48. “FFTW,” http://www.fftw.org/.

49. “cuFFT,” https://developer.nvidia.com/cufft.

50. N. Takada, M. Fujiwara, C. Ooi, Y. Maeda, H. Nakayama, T. Kakue, T. Shimobaba, and T. Ito, “High-Speed 3-D Electroholographic Movie Playback Using a Digital Micromirror Device,” IEICE Trans. Electron. E100.C(11), 978–983 (2017). [CrossRef]  

51. Y. Yamamoto, H. Nakayama, N. Takada, T. Nishitsuji, T. Sugie, T. Kakue, T. Shimobaba, and T. Ito, “Large-scale electroholography by HORN-8 from a point-cloud model with 400,000 points,” Opt. Express 26(26), 34259–34265 (2018). [CrossRef]

Supplementary Material (1)

Visualization 1: An optical reproduction of a cloud of 400,000 points calculated by HORN-9 using the spatiotemporal division method.
