
Ensemble convolutional neural network for classifying holograms of deformable objects


Abstract

Recently, a method known as “deep learning invariant hologram classification” (DL-IHC) for classifying holograms of deformable objects with a deep-learning network (DLN) has been demonstrated. However, DL-IHC requires substantial computational resources to attain a near-perfect success rate ($\ge 99\%$). In practice, it is always desirable to achieve a higher success rate with a low-complexity DLN. In this paper we propose a low-complexity DLN known as “ensemble deep learning invariant hologram classification” (EDL-IHC). In comparison with DL-IHC, our proposed hologram classifier improves the success rate by 2.86% in the classification of holograms of handwritten numerals.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

A hologram is a holistic representation of a three-dimensional (3-D) image. Unlike stereoscopic imaging, in which the 3-D object image is encapsulated as a pair of images projected to the left and right eyes, and which may lead to accommodation-vergence conflict [1] (i.e., unpleasant visual fatigue or even headaches), a hologram is capable of recording all the views of the object scene within the field of vision. A holographic imaging system is shown in Fig. 1. A hologram acquisition system, such as one based on optical scanning holography [2] or phase-shifting holography [3], is used to capture digital holograms of physical objects. The digital hologram is displayed on a spatial light modulator (SLM), showing a virtual 3-D image with both intensity and depth perception. A hologram classifier is used to determine the identity of the hologram. Although holograms are superior to 2-D photography and stereoscopic imaging in displaying 3-D images, it is more difficult to identify their contents. The main reason is that a hologram is comprised of complicated fringe patterns that bear little resemblance to the object image. It is generally infeasible to apply classical image classification techniques to obtain the identity of an object in a hologram.

Fig. 1. A holographic imaging system.

Attempts at image recognition through 2-D correlation of a source hologram/image and a set of reference holograms/images have been reported in [4–9]. A reference hologram/image and a source hologram/image are considered to be of the same class if their correlation score is sufficiently high. However, the correlation score is sensitive to the pose and deformation of the object shapes. If a pair of holograms represents two different deformations of the same object, their correlation score is very small. This problem has been overcome by adopting deep learning for classifying holograms of deformable objects [10]. Deep learning is commonly implemented with a convolutional neural network (CNN), which can be trained, after being presented with a sufficient number of examples, to identify important features that are representative of a given subject. Presumably, those features should also be invariant to a reasonable degree of change in the subject. In simple terms, a CNN is capable of generalizing its knowledge over a vast domain, based on its learning from a relatively much smaller set of examples. Apart from hologram classification, deep learning has been applied to various problems in holography, such as the works reported in [11–14]. The method employed in [10] is referred to as “deep learning invariant hologram classification” (DL-IHC). The magnitude components of a large collection of holograms are employed to train the CNN, which is then applied to classify the identities of unknown holograms based on their magnitude components. In [10], holograms of handwritten characters are employed to train, and to test, the CNN. Although the method is feasible, the complexity of the CNN, and hence the computational resources involved, are high. In this paper, we propose a method known as “ensemble deep learning invariant hologram classification” (EDL-IHC) for increasing the success rate in hologram classification. In EDL-IHC, a pair of low-complexity CNNs is used, one for classifying the magnitude component, and the other for classifying the phase component of a hologram. The output of the CNN with the higher classification score is taken as the identity of the input hologram. The complexity of our proposed CNN is about four times lower than that of the DL-IHC method in [10].

2. Proposed ensemble deep learning invariant hologram classification (EDL-IHC) method

Our proposed EDL-IHC hologram classification method can be divided into two parts, as shown in Fig. 2. First, a pair of CNNs is trained with a set of augmented holograms, each corresponding to one of the reference objects that has been subjected to a certain degree of rigid and/or deformable transformation. Second, the deep-learning network is used to classify unknown input holograms. These two parts are described as follows.

Fig. 2. Training of the deep learning network in our proposed EDL-IHC method.

2.1 Part 1: Training the CNNs for hologram classification

In this part, a large set of augmented holograms is generated and applied to train a deep-learning network that is implemented with a pair of CNNs. One of the CNNs receives the magnitude component of the holograms as its input data, while the other accepts the phase component. The process in this part can be divided into two stages.

Stage 1: Data augmentation and training of the CNN

The images of the set of objects to be identified by the CNN are established. Different variations of each object image are generated by applying different degrees of Euclidean transformation and/or deformation, resulting in an augmented data set. We have assumed that the number of variations (i.e., the data augmentation factor) is identical for all the reference object images. Next, a Fresnel hologram is generated for each image in the augmented data set. Supposing $I_{p;q}(m,n)$ denotes the image corresponding to the $q$th variation of the $p$th object, its Fresnel hologram can be computed from Fresnel diffraction [15] as

$$H_{p;q}(m,n) = I_{p;q}(m,n) \ast h(m,n),$$
where $h(m,n) = \exp\left[\frac{i2\pi}{\lambda}\frac{(m^2\delta^2 + n^2\delta^2)}{2z}\right]$ is the free-space impulse response, with $\lambda$, $\delta$, and $z$ being the wavelength of light, the pixel size of the hologram, and the axial distance between the object image and the hologram plane, respectively.
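As an illustration, Eq. (1) can be evaluated numerically with a frequency-domain (circular) convolution. The following Python sketch is ours, not the authors' code; the default wavelength and pixel size follow the values used in Section 3, while the depth $z = 0.3\,\textrm{m}$ is an assumed placeholder.

```python
import numpy as np

def fresnel_hologram(img, wavelength=540e-9, pixel=6.4e-6, z=0.3):
    """Evaluate Eq. (1): convolve the image with the free-space impulse
    response h(m, n). The convolution is realized as a product in the
    frequency domain; z = 0.3 m is an assumed object-hologram distance."""
    rows, cols = img.shape
    m = np.arange(rows) - rows // 2
    n = np.arange(cols) - cols // 2
    M, N = np.meshgrid(m, n, indexing="ij")
    # h(m, n) = exp[(i 2 pi / lambda) * (m^2 + n^2) * delta^2 / (2 z)]
    h = np.exp(1j * 2 * np.pi * (M**2 + N**2) * pixel**2 / (wavelength * 2 * z))
    # ifftshift moves the origin of h to index (0, 0) before the FFT
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(np.fft.ifftshift(h)))
```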

Stage 2: Training the CNNs

After a hologram $H_{p;q}(m,n)$ is generated, it is decomposed into the magnitude component $M_{p;q}(m,n) = |H_{p;q}(m,n)|$ and the phase component $P_{p;q}(m,n) = \arg[H_{p;q}(m,n)]$ (where $\arg[X]$ denotes the phase angle of $X$). However, the phase component in angular representation suffers from a Hamming cliff problem. For example, $10^\circ$ and $359^\circ$ have a large numerical difference of $349^\circ$, but their physical angular separation is only $11^\circ$. To avoid this problem, the phase component is represented by $C_{p;q}(m,n) = \cos[P_{p;q}(m,n)]$. The cosine function discards half of the phase information, but as will be shown later, the CNN is capable of classifying the hologram with a high success rate from the remaining information. Next, $M_{p;q}(m,n)$ and $C_{p;q}(m,n)$ are taken to train the magnitude CNN and the phase CNN, respectively. Training of these two CNNs is independent of each other. The structure of the CNN, which is identical for learning the magnitude and the phase components of the holograms, is shown in Fig. 3.
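A minimal sketch of this decomposition (the function name is our own):

```python
import numpy as np

def hologram_components(H):
    """Split a complex hologram into the two CNN inputs: the magnitude
    |H|, and the cosine of the phase angle arg(H). The cosine removes
    the 0/360-degree wrap-around (the Hamming cliff) at the cost of
    half of the phase information."""
    return np.abs(H), np.cos(np.angle(H))
```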

Fig. 3. Structure of the CNN adopted in our proposed method.

As the CNN has been described in numerous works, e.g., [16], we only show its overall structure. Our CNN is comprised of two convolutional layers and a dense layer. Each convolutional layer includes a rectified linear unit (ReLU) layer, and is coupled to a pooling layer and a dropout layer. The first and second convolutional layers are each comprised of a set of 2-D finite impulse response (FIR) filters. Each filter comprises a $3 \times 3$ array of filter coefficients, the values of which are trained through a learning process. The dense layer, as well as the output dense layer, is a regular, fully connected neural network for classifying holograms of similar images into their respective classes according to the features extracted from the convolutional layers.
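The layer topology of Fig. 3 can be sketched in Keras as follows. The filter counts, pooling size, dropout rate, and dense-layer width below are placeholders (the actual hyperparameters are those listed in Table 2); only the overall structure, two $3 \times 3$ convolutional layers with ReLU, pooling, and dropout, followed by the dense layers, is taken from the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(num_classes=10, input_shape=(64, 64, 1)):
    """Sketch of the CNN of Fig. 3; hyperparameter values here are
    assumed, the actual ones are given in Table 2."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(8, 3, padding="same", activation="relu"),   # conv layer 1 + ReLU
        layers.MaxPooling2D(2),                                   # pooling layer
        layers.Dropout(0.25),                                     # dropout layer
        layers.Conv2D(16, 3, padding="same", activation="relu"),  # conv layer 2 + ReLU
        layers.MaxPooling2D(2),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),                      # dense layer
        layers.Dense(num_classes, activation="softmax"),          # output dense layer
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```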

2.2 Part 2: Hologram classification

In the second part of the operation, the deep-learning network that has been trained with the set of augmented holograms is applied to classify unknown input holograms. As shown in Fig. 4, the magnitude and phase components of an unknown hologram are extracted, each being input to its corresponding CNN. The output of each CNN is a class identity (i.e., one of the K classes), together with a matching score reflecting the similarity of the unknown hologram to the identified class. Through an ensemble decision maker, the class identity output by the CNN with the higher matching score is selected as the identity of the input hologram. From Fig. 4, it can be seen that the DL-IHC method proposed in [10] is equivalent to the magnitude convolutional neural network of the EDL-IHC.
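A sketch of the ensemble decision maker, assuming the softmax output probability serves as the matching score (the paper does not state the exact form of the score):

```python
import numpy as np

def ensemble_classify(mag_cnn, phase_cnn, magnitude, cos_phase):
    """Fig. 4 decision rule: each CNN classifies its own component of the
    hologram, and the class identity with the higher matching score wins."""
    p_mag = mag_cnn.predict(magnitude[np.newaxis, ..., np.newaxis], verbose=0)[0]
    p_phs = phase_cnn.predict(cos_phase[np.newaxis, ..., np.newaxis], verbose=0)[0]
    if p_mag.max() >= p_phs.max():
        return int(p_mag.argmax()), float(p_mag.max())
    return int(p_phs.argmax()), float(p_phs.max())
```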

Fig. 4. Proposed ensemble hologram classification method.

3. Experimental results

Holograms of handwritten numbers (0 to 9) are numerically generated to test our proposed method. To train the CNNs, an augmented data set of 12500 images is generated from 1250 different variants of each number image. All the augmented images have an identical size of 64 rows and 64 columns. Each variation may involve one or more of the following: a Euclidean transform (rotation, scaling, and translation on the x-y plane), shifting along the on-axis ‘z’ direction (i.e., the depth between the image and the hologram plane), and non-rigid deformation to simulate deviations in handwriting style. The values of rotation, translation, scaling, and depth that are used to generate the augmented images are given in Table 1. A set of augmented images of the handwritten numbers is shown in Fig. 5.

Fig. 5. A set of augmented images of the 10 handwritten numbers.


Table 1. Values of rotation, translation, scaling, and depth of the augmented images
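For illustration, the Python sketch below applies one random Euclidean transform to an object image. The parameter ranges shown are placeholders standing in for the values of Table 1, and depth shifting and non-rigid deformation are omitted for brevity.

```python
import numpy as np
from scipy.ndimage import affine_transform

def augment(img, rng):
    """Random Euclidean transform (rotation, scaling, and translation on
    the x-y plane). Ranges below are assumed; see Table 1 for actual ones."""
    theta = np.deg2rad(rng.uniform(-10, 10))   # rotation (placeholder range)
    s = rng.uniform(0.9, 1.1)                  # scaling (placeholder range)
    t = rng.uniform(-3, 3, size=2)             # translation in pixels (placeholder)
    c = (np.array(img.shape) - 1) / 2          # image centre
    # affine_transform uses the inverse (output -> input) mapping
    A = np.array([[np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]]) / s
    return affine_transform(img, A, offset=c - A @ (c + t))

# e.g., 1250 variants per number image:
# variants = [augment(image, np.random.default_rng(q)) for q in range(1250)]
```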

Subsequently, Eq. (1) is applied to generate an augmented Fresnel hologram for each augmented image, with wavelength $\lambda = 540\,\textrm{nm}$ and hologram pixel size $\delta = 6.4\,\mathrm{\mu m}$. We have assumed the dimensions and the pixel size of the image and the hologram to be identical. As an example, an augmented image of the handwritten number ‘5’ is shown in Fig. 6(a), while the magnitude and phase components of its hologram are shown in Figs. 6(b) and 6(c), respectively.

Fig. 6. (a) Image of the handwritten number ‘5’; (b) and (c) magnitude and phase components of the hologram of the number in (a), respectively.

The magnitude and phase components of 80% of the set of augmented holograms (i.e., 1000 out of the 1250 augmented holograms for each number) are taken to train the CNNs. This set of augmented holograms is referred to as the in-training set. The remaining augmented holograms (250 for each number), which are not used to train the CNNs, are referred to as the out-training set. Each CNN has the structure shown in Fig. 3, with the configuration (i.e., the hyperparameters) shown in Table 2. The CNNs are implemented with the Keras application programming interface (API). To reduce the complexity of the CNNs, a small number of simple $3 \times 3$ FIR filters is employed in both convolutional layers. Fifty training epochs are conducted, and in each epoch the holograms corresponding to a set of 10 augmented handwritten number images are selected from the training set and applied to train the magnitude and phase CNNs.


Table 2. Configuration of the CNN of the proposed EDL-IHC method
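Building on the `build_cnn` sketch above, and assuming `mags`, `cos_phases`, and `labels` are arrays holding the hologram components and class labels, the split and training could look as follows. The array names and the random 80/20 split are our assumptions; the paper's per-epoch sampling of 10 holograms is not reproduced.

```python
import numpy as np

idx = np.random.default_rng(0).permutation(len(labels))  # 12500 holograms
tr, out = idx[:10000], idx[10000:]  # ~80/20 in-training / out-training split
mag_cnn, phase_cnn = build_cnn(), build_cnn()
mag_cnn.fit(mags[tr][..., np.newaxis], labels[tr], epochs=50,
            validation_data=(mags[out][..., np.newaxis], labels[out]))
phase_cnn.fit(cos_phases[tr][..., np.newaxis], labels[tr], epochs=50,
              validation_data=(cos_phases[out][..., np.newaxis], labels[out]))
```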

After the CNNs have been trained, the augmented holograms are used to test the success rates of our proposed EDL-IHC method and the DL-IHC method. The success rate is defined as the ratio of the number of holograms that have been correctly classified to the total number of holograms. As mentioned in Section 2, the DL-IHC is equivalent to the magnitude convolutional neural network of the EDL-IHC. Hence, the success rates of DL-IHC are taken as the success rates of the magnitude CNN of the EDL-IHC. The success rates in identifying the in-training set, the out-training set, and the full set (in-training and out-training sets) of holograms for the two methods, as well as the difference in the success rates achieved, are shown in Table 3. The success rate of EDL-IHC is improved by 2.86% as compared with DL-IHC. For the EDL-IHC method, a success rate of 99.69% (i.e., a failure rate of 0.31%) is noted in the classification of the full set of hologram samples. Based on a commodity PC, the training and inference times of our proposed method are around 128 ms and 1.3 ms, respectively.


Table 3. Success rates for classifying in-training and out-training holograms with the EDL-IHC and the DL-IHC methods
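The success rate of each CNN can be measured with a short helper along the following lines (a sketch; the function name is ours):

```python
import numpy as np

def success_rate(cnn, x, y):
    """Ratio of correctly classified holograms to the total number."""
    pred = cnn.predict(x[..., np.newaxis], verbose=0).argmax(axis=1)
    return float((pred == y).mean())
```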

4. Conclusion

The proposed EDL-IHC method is capable of classifying holograms of deformable handwritten numerals with an over 99.6% success rate, based on a pair of low-complexity deep-learning networks. In comparison with [10], which also employs a deep-learning network for hologram classification, our proposed method exhibits a higher success rate with a reduction of about four times in the complexity of the CNN. The computational resources that affect the training and inference times are reduced by a similar factor. Further research can be conducted by applying the EDL-IHC framework to classifying holograms of more complicated 3-D objects.

Disclosures

The authors declare no conflicts of interest.

References

1. C. Vienne, L. Sorin, L. Blondé, Q. Huynh-Thu, and P. Mamassian, “Effect of the accommodation-vergence conflict on vergence eye movements,” Vision Res. 100, 124–133 (2014). [CrossRef]  

2. T.-C. Poon, Optical Scanning Holography with MATLAB® (Springer, New York, USA, 2007).

3. I. Yamaguchi and T. Zhang, “Phase-shifting digital holography,” Opt. Lett. 22(16), 1268–1270 (1997). [CrossRef]  

4. D. Kumar and N. K. Nishchal, “Recognition of three-dimensional objects using joint fractional correlator and nonlinear joint fractional correlator with the help of digital Fresnel holography: a comparative study,” Opt. Rev. 22(2), 256–263 (2015). [CrossRef]  

5. D. Kumar and N. K. Nishchal, “Three-dimensional object recognition using joint fractional Fourier transform correlators with the help of digital Fresnel holography,” Optik 126(20), 2690–2695 (2015). [CrossRef]  

6. A. Alfalou and C. Brosseau, “Robust and discriminating method for face recognition based on correlation technique and independent component analysis model,” Opt. Lett. 36(5), 645–647 (2011). [CrossRef]  

7. T. Kim and T.-C. Poon, “Three-dimensional matching by use of phase-only holographic information and the Wigner distribution,” J. Opt. Soc. Am. A 17(12), 2520–2528 (2000). [CrossRef]  

8. T.-C. Poon and T. Kim, “Optical image recognition of three dimensional objects,” Appl. Opt. 38(2), 370–381 (1999). [CrossRef]  

9. T. Kim and T.-C. Poon, “Extraction of 3-D location of matched 3-D object using power fringe-adjusted filtering and Wigner analysis,” Opt. Eng. 38(12), 2176–2183 (1999). [CrossRef]  

10. H. Lam and P. W. M. Tsang, “Invariant classification of holograms of deformable objects based on deep learning,” in 28th IEEE International Symposium on Industrial Electronics (ISIE), Vancouver, Canada (2019).

11. R. Horisaki, R. Takagi, and J. Tanida, “Deep-learning-generated holography,” Appl. Opt. 57(14), 3859–3863 (2018). [CrossRef]  

12. T. Pitkäaho, A. Manninen, and T. Naughton, “Focus prediction in digital holographic microscopy using deep convolutional neural networks,” Appl. Opt. 58(5), A202–A208 (2019). [CrossRef]  

13. Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light: Sci. Appl. 7(2), 17141 (2018). [CrossRef]  

14. C. Trujillo and J. Garcia-Sucerquia, “Automatic detection and counting of phase objects in raw holograms of digital holographic microscopy via deep learning,” Opt. Lasers Eng. 120, 13–20 (2019). [CrossRef]

15. T.-C. Poon and J.-P. Liu, Introduction to Modern Digital Holography with MATLAB (Cambridge University Press, 2014).

16. I. Zafar, G. Tzanidou, R. Burton, N. Patel, and L. Araujo, Hands-On Convolutional Neural Networks with TensorFlow: Solve Computer Vision Problems with Modeling in TensorFlow and Python (Packt Publishing, 2018).
