Abstract
We propose a new edge detection scheme for single multimode fiber imaging based on deep learning. In this scheme, we design a neural network whose input is a one-dimensional light intensity sequence and whose output is the edge detection result of the target. Different from the traditional scheme, this neural network directly extracts the edge information of unknown objects without reconstructing the image first. Simulation and experimental results show that, compared with the traditional method, this method recovers better edge details, especially at low sampling rates: it increases the structural similarity index of edge detection imaging from 0.38 to 0.62 at a sampling rate of 0.6%. The robustness of the method to fiber bending is also demonstrated. This scheme improves the edge detection performance of endoscopic images and provides a promising way toward the practical application of multimode fiber endoscopy.
© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Medical images have become an important basis and means for clinical medical diagnosis, pathological analysis, and treatment [1,2]. Edge detection [3] is one of the key technologies in medical image processing; its purpose is to determine the edges of the target in images against noisy backgrounds [4]. The performance of edge detection determines the results of medical image analysis and even affects the subsequent treatment process. In view of the poor edge detection performance in the field of endoscopy, a new edge detection method is proposed in this article.
Currently, edge detection in endoscopic imaging is usually performed by morphological methods or by applying edge detection operators directly [5–7]. In 2005, B. V. Dhandra et al. proposed a method of edge detection for endoscopic images using morphological watershed segmentation [7]. In 2010, M. Häfner et al. extracted edges from endoscopic images using the Canny operator [6]. In 2016, Jamil A. M. Saif et al. concluded that the Canny algorithm is more suitable for endoscopic images than Sobel and Prewitt [5]. The common disadvantage of these methods is that the edge detection result is poor if the image is very noisy. However, when using an endoscope based on a single multimode fiber (MMF), it is not easy to obtain a sufficiently clear image [8]. Although optimization algorithms such as compressive sensing (CS) have been proposed [9,10], an ideal image still requires sufficient measurements [11] and keeping the fiber as unbent as possible.
In this paper, a new framework is proposed for high-quality edge detection in MMF imaging at low sampling rates. The method is based on deep learning (DL) [12,13], a machine learning technology widely used in many fields. For multimode fibers, DL has been used for image transmission [14], image classification [15], high-definition imaging [16], and so on. Our method does not reconstruct the image and then detect edges; instead, it obtains the edge of the target directly from the light intensity sequence. Compared with edge detection that applies an edge extraction operator to CS-reconstructed images, DL significantly reduces the number of measurements and yields better edge detection results. We also demonstrate that DL is superior to CS in terms of robustness against fiber bending. The analysis shows that the edge detection results of this method remain excellent in a disturbed environment, which is very helpful for practical applications.
2. Method
2.1 Principle of edge detection using neural network
The experiment is divided into two steps. The first step is the calibration stage, in which the speckle patterns passing through the MMF are collected by the CMOS camera. The collected speckles are denoted by ${I_m}(x,y)$, where $m = 1,2,\ldots ,M$. The second step is the measurement stage, in which a bucket detector with no spatial resolution receives the light intensity transmitted through the target. The object is denoted as $T(x,y)$, and the light field in the measurement stage is the same as that in the calibration stage. Therefore, the light intensity sequence collected by the bucket detector can be expressed as:
$${S_m} = \int\!\!\!\int {I_m}(x,y)\,T(x,y)\,dx\,dy,\quad m = 1,2,\ldots ,M.$$
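The two-stage measurement model can be illustrated with a short NumPy sketch. Random fields stand in for real MMF speckle patterns, and the discretized bucket signal replaces the integral with a pixel-wise sum:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 225                 # number of speckle patterns / measurements
H = W = 150             # target resolution used in the paper

# Calibration stage: speckle patterns I_m(x, y) recorded by the CMOS camera.
# Random fields stand in for real MMF speckle here.
speckles = rng.random((M, H, W))

# Measurement stage: a binary target T(x, y) in front of the bucket detector.
target = np.zeros((H, W))
target[40:110, 60:90] = 1.0

# Discretized bucket signal: S_m = sum_{x,y} I_m(x, y) * T(x, y)
bucket = np.einsum('mxy,xy->m', speckles, target)

# Normalized one-dimensional sequence that serves as the network input
bucket_norm = (bucket - bucket.min()) / (bucket.max() - bucket.min())
```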
For edge detection in MMF imaging using CS and an edge detection operator, two steps are needed. The first step is to reconstruct the image with CS, and the second step is to extract the edge from the reconstructed image with the edge extraction operator. In this paper, the Canny operator, which has excellent performance, is used for comparison. First, an image reconstructed by CS can be viewed as finding a solution
$$T{(x,y)_{CS}} = \arg \min {\|{\Psi T(x,y)}\|_{{L_1}}},$$
which minimizes the ${L_1}$ norm in the sparse basis subject to the measurement constraint, where $\Psi $ is the transform operator to the sparse basis [12]. The edge detection imaging can then be expressed as
$$\nabla {T^{canny}}{(x,y)_{CS}} = \mathrm{Canny}\{{T{(x,y)_{CS}}}\}.$$
Here, $\nabla {T^{canny}}{(x,y)_{CS}}$ is the edge of the reconstructed image $T{(x,y)_{CS}}$ extracted with the Canny operator.
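As a rough illustration of this two-step baseline, the sketch below recovers a toy sparse target with ISTA (iterative soft thresholding) and then applies a gradient-magnitude edge operator. It assumes sparsity directly in the pixel basis and uses zero-mean Gaussian sensing patterns for conditioning; the paper's actual pipeline uses real speckle patterns, a transform basis $\Psi$, and the Canny operator:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def cs_reconstruct(A, b, lam=0.1, n_iter=200):
    """ISTA for the L1-regularized least-squares problem
    min_x 0.5*||A x - b||^2 + lam*||x||_1 (pixel-basis sparsity assumed)."""
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (b - A @ x) / L, lam / L)
    return x

def gradient_edges(img, thresh=0.2):
    """Gradient-magnitude edge map (a simple stand-in for Canny)."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    return (mag > thresh * mag.max()).astype(float)

# Toy demo: 32 x 32 sparse target measured at a 40% sampling rate
rng = np.random.default_rng(1)
H = W = 32
target = np.zeros((H, W))
target[10:22, 12:20] = 1.0
M = int(0.4 * H * W)
A = rng.standard_normal((M, H * W))     # zero-mean sensing patterns
b = A @ target.ravel()

rec = cs_reconstruct(A, b).reshape(H, W)
edges = gradient_edges(rec)
```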
The proposed DL method obtains the edge of the object directly from the bucket signal ${S_m},\;m = 1,2,\ldots ,M$, which can be formulated as:
$$\nabla {T^{canny}}{(x,y)_{DL}} = \mathrm{{\cal R}}\{{S_m}\},$$
where $\mathrm{{\cal R}}\{{\cdot} \} $ is the mapping relationship between the input and output of the neural network. Ideally, we would conduct a large number of repeated experiments to create a dataset and train the neural network. Because repeated experiments are too time-consuming, we instead use simulated data to create the training dataset. We assume that
$$\mathrm{{\cal R}} \approx {\mathrm{{\cal R}}_{simulated}},$$
where ${\mathrm{{\cal R}}_{simulated}}$ represents the neural network mapping relationship trained on simulated data. The training process can then be expressed as
$${\mathrm{{\cal R}}_{simulated}} = \arg \min \sum {{{\|{\mathrm{{\cal R}}\{{S_m}\} - \nabla {T^{canny}}(x,y)}\|}^2}}.$$
2.2 Structural design of neural network
Inspired by U-Net [17], we propose the neural network framework shown in Fig. 1. The input of this neural network is the normalized one-dimensional bucket signal, and the output is the predicted edge detection of the object. Although the length of the input is fixed, any under-sampled signal can be standardized to this fixed length during experimental preprocessing, so the proposed neural network can be used at any sampling rate. In the design of this network, a large number of convolution layers and pooling layers are used to extract the deeper information hidden in the bucket-value sequence and to reduce its dimension. Meanwhile, many concatenation layers are designed for feature fusion. To prevent over-fitting, dropout layers, skip connections, and batch normalization layers are employed.
2.3 Training of neural network
Before training the neural network, a large amount of labeled data needs to be prepared. We randomly select 20,000 images from the EMNIST handwritten-letters database [18], binarize them, and resize each image to 150 × 150 pixels, consistent with the speckle pattern size in the experimental calibration stage. Next, a simulation experiment is carried out with these images, and the bucket signal generated by each image under the same optical fiber condition is obtained. For the labels, the Canny operator [3] is used to extract the edges from each image. After completing the above steps, the training dataset is ready. For the test set, we take another 500 images from EMNIST and prepare them by the same method. In the simulation and experiment, the size of the target is 150 × 150 pixels, so the sampling rate is β = M/22500. To study the influence of β on edge detection, we mainly consider five different values of M: 135, 225, 675, 1125, and 2250, so that β = 0.6%, 1%, 3%, 5%, and 10%, respectively.
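The sampling-rate bookkeeping and the length standardization mentioned above can be sketched as follows; the fixed input length of 2250 and the linear-interpolation resampling are illustrative assumptions, since the paper does not specify the exact preprocessing:

```python
import numpy as np

N_PIXELS = 150 * 150        # target size -> 22500 pixels
FIXED_LEN = 2250            # assumed fixed network input length (the 10% case)

def sampling_rate(m):
    """beta = M / 22500, as defined in the paper."""
    return m / N_PIXELS

def standardize_length(bucket, fixed_len=FIXED_LEN):
    """Resample an under-sampled bucket sequence to the fixed input length by
    linear interpolation, then normalize to [0, 1]. The resampling scheme is
    an assumption; the paper only states that any under-sampled signal is
    standardized to a fixed length."""
    bucket = np.asarray(bucket, dtype=float)
    old = np.linspace(0.0, 1.0, len(bucket))
    new = np.linspace(0.0, 1.0, fixed_len)
    resampled = np.interp(new, old, bucket)
    return (resampled - resampled.min()) / (np.ptp(resampled) + 1e-12)

for m in (135, 225, 675, 1125, 2250):
    print(f"M = {m:4d}  ->  beta = {sampling_rate(m):.1%}")
```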
The training process of a neural network is essentially an optimization of its parameter set, including the weighting factors and biases of the neurons. We iteratively optimize the network using the simulated training dataset above as constraints, so that the network accurately outputs edge detection images. The mean square error (MSE) [19] is used as the loss function in the training process:
$$MSE = \frac{1}{N}\sum\limits_{i = 1}^N {{{({{\hat{y}}_i} - {y_i})}^2}},$$
where ${\hat{y}_i}$ and ${y_i}$ are the $i$-th pixel values of the network output and the label, respectively, and $N$ is the number of pixels.
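A minimal NumPy version of this loss, applied pixel-wise between a predicted edge map and its reference label:

```python
import numpy as np

def mse_loss(pred, label):
    """MSE = (1/N) * sum_i (pred_i - label_i)^2 over all N pixels."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.mean((pred - label) ** 2))

# Example: a 2x2 predicted edge map against its binary label
pred = np.array([[0.9, 0.1], [0.2, 0.8]])
label = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = mse_loss(pred, label)   # (0.01 + 0.01 + 0.04 + 0.04) / 4 = 0.025
```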
3. Results and discussions
3.1 Experimental setup
The experimental setup is schematically shown in Fig. 2. Similar to recovering images from an MMF using CS [9], our setup is divided into two phases: the calibration phase and the detection phase. In the calibration phase, a continuous-wave laser (λ = 532 nm) is guided to a phase-only spatial light modulator (SLM, Holoeye Pluto NIR-011-A) through a beam expander and a polarizer, and is modulated by the SLM with M random-phase patterns. The modulated beam reflected by the mirror is coupled through a collimator (Thorlabs F220FC-532) into a 50 cm long MMF (Corning ClearCurve MMF) with a core diameter of 50 microns and a numerical aperture (NA) of 0.22. At the output end of the fiber, a CMOS camera (XiMEA xiQ-M002RG) records the speckle patterns corresponding to the light fields modulated by the SLM one by one. The size of each recorded speckle field is 150 × 150 pixels. In the detection phase, most of the setup remains the same; we simply replace the CMOS camera with a bucket detector (Thorlabs PDA100/A) with no spatial resolution and place the target object in front of it. In both phases, the SLM uses the same modulation scheme to ensure the same speckle patterns at the output end of the fiber.
3.2 Simulation results
Before the experiment, we conduct a simulation on the computer to test the performance of the algorithm. To visually observe the difference between DL and CS, we take the letter “W” from the test set for the simulation experiment. At five different sampling rates, DL and CS are used to generate edge detection images, respectively. In Fig. 3, we can observe that the edge detection performance of CS is poor, especially at low sampling rates, while DL performs well at all of the sampling rates considered. To evaluate the differences more accurately, we use the structural similarity (SSIM) [20] as the evaluation metric. SSIM measures the similarity between two images and ranges from −1 to 1; the larger the value, the more similar the two images are. It is calculated as [21]:
$$\mathrm{SSIM}(x,y) = \frac{{({2{\mu_x}{\mu_y} + {c_1}})({2{\sigma_{xy}} + {c_2}})}}{{({\mu_x^2 + \mu_y^2 + {c_1}})({\sigma_x^2 + \sigma_y^2 + {c_2}})}},$$
where ${\mu_x}$, ${\mu_y}$ are the means, $\sigma_x^2$, $\sigma_y^2$ the variances, and ${\sigma_{xy}}$ the covariance of images $x$ and $y$, and ${c_1}$, ${c_2}$ are small constants that stabilize the division.
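For reference, a single-window (global) SSIM can be computed directly from this formula; common library implementations (e.g. scikit-image) average a locally windowed version, so absolute values differ slightly:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM with the standard constants from Wang et al. [21].
    Library implementations usually average a locally windowed version."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For identical images the score is exactly 1; anticorrelated images drive the covariance term negative and the score toward −1.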
3.3 Experimental results
After using the simulation to preliminarily verify the superiority of DL over the traditional method, experimental data are used for further verification. As in the simulation, the two algorithms are compared at five different sampling rates. We stabilize the MMF and keep it from shaking as much as possible. The experimental setup is given in Fig. 2. To verify the generalization of the neural network trained on the EMNIST dataset, we use the Chinese character “cloud” as an object that is very different from the English letters in EMNIST. The results are shown in Fig. 5, which gives the object image and the results generated by DL and CS at different sampling rates. It is obvious that, at all five sampling rates, the edge detection produced by DL is superior to that of CS. From a sampling rate of 0.6% to 10%, the SSIM of the CS edge detection increases from 0.38 to 0.63, while that of DL is always greater than 0.62. In particular, the edge detection of CS is very blurry at the sampling rate of 0.6%, but the DL result is still clearly visible. Edge detection using DL is almost noise-free, which makes it a good fit for image recognition [22]. The data quantified by SSIM are shown in Fig. 6; the experimental results are roughly consistent with the simulation results. Although the contour of the edge detection using DL does not exactly match the target, this result still has considerable advantages over CS for target detection and other applications. The error in the experimental results stems from training on the EMNIST dataset, since Chinese characters and English letters are not very similar. The generalizability of this neural network is an area for further work, and phase retrieval [23] may be used to improve efficiency and security.
Anti-interference performance is also very important in the field of endoscopy, as disturbance is unavoidable in actual detection. We therefore test the performance of DL edge detection when the MMF is disturbed. Previous research [24] has shown that the CS algorithm is robust against fiber bending. To investigate the robustness of DL edge detection against fiber bending, we use a 2D translation stage to translate the MMF. Since it has been shown that the performance along the X and Y directions is very similar [25], we only translate the MMF in the transverse direction (X-axis). In this experiment, we set the bending distance of the fiber along the X-axis from 0 to 8 mm, and the sampling rate of both the CS and DL methods is 10%. Different from the previous situation, in which the fiber is stationary, the neural network training set is created not from 20,000 light-field images of the fixed fiber in the calibration phase but from 4,000 light-field images for each of 5 different curvature cases. The results are shown in Fig. 7: as the fiber disturbance distance increases, the edge detection using CS deteriorates sharply, while the edge detection using DL deteriorates only slightly. The edge detection of CS is very fuzzy after 4 mm of bending, but the edge detection results of DL can still be clearly identified. The data quantified by the SSIM index are shown in Fig. 8; as the disturbance distance increases, the SSIM of the CS edge detection drops from 0.66 to 0.36, while the SSIM of DL consistently remains greater than 0.72. The anti-interference performance of DL is better than that of CS, which means that in practical endoscopic applications DL will perform better than CS in the face of inevitable interference, such as disturbance during in vivo observation and bending or vibration of the optical fiber.
We believe that DL will offer even greater advantages in the practical application of endoscopes.
4. Conclusion
In conclusion, we propose a new method of edge detection in the field of MMF-based endoscopy. After training the neural network, we only need to collect the light intensity sequence in the measurement phase to easily obtain the boundary outline of the target. Compared with the traditional method, DL achieves edge detection in a single step, has better performance at the same sampling rate, and has stronger anti-disturbance ability. In an actual complex environment, DL can realize edge detection of the target more simply, with fewer samples. In addition, our experiment demonstrated the performance of DL when the similarity between the target image and the training-set images is low. Further work will focus on improving the generalization performance. We believe that this method will bring edge detection in MMF imaging closer to commercial application.
Funding
National Natural Science Foundation of China (62071059).
Disclosures
The authors declare that they have no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
References
1. M. J. Gora, M. J. Suter, G. J. Tearney, and X. Li, “Endoscopic optical coherence tomography: technologies and clinical applications [Invited],” Biomed. Opt. Express 8(5), 2405 (2017). [CrossRef]
2. X. Wen, P. Lei, K. Xiong, P. Zhang, and S. Yang, “High-robustness intravascular photoacoustic endoscope with a hermetically sealed opto-sono capsule,” Opt. Express 28(13), 19255 (2020). [CrossRef]
3. J. Canny, “A Computational Approach to Edge Detection,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8(6), 679–698 (1986). [CrossRef]
4. D. Marr and E. Hildreth, “Theory of edge detection,” Proc. R. Soc. Lond. B. 207(1167), 187–217 (1980). [CrossRef]
5. J. A. M. Saif, M. H. Hammad, and I. A. A. Alqubati, “Gradient Based Image Edge Detection,” IJET 8(3), 153–156 (2016). [CrossRef]
6. M. Häfner, A. Gangl, M. Liedlgruber, A. Uhl, A. Vecsei, and F. Wrba, “Endoscopic Image Classification Using Edge-Based Features,” in 2010 20th International Conference on Pattern Recognition (IEEE, 2010), pp. 2724–2727.
7. B. V. Dhandra and R. Hegadi, “Classification of Abnormal Endoscopic Images using Morphological Watershed Segmentation,” (n.d.).
8. T. Fukui, Y. Kohno, R. Tang, Y. Nakano, and T. Tanemura, “Single-Pixel Imaging Using Multimode Fiber and Silicon Photonic Phased Array,” J. Lightwave Technol. 39(3), 839–844 (2021). [CrossRef]
9. L. V. Amitonova and J. F. de Boer, “Compressive imaging through a multimode fiber,” Opt. Lett. 43(21), 5427 (2018). [CrossRef]
10. L. V. Amitonova and J. F. de Boer, “Endo-microscopy beyond the Abbe and Nyquist limits,” Light: Sci. Appl. 9(1), 81 (2020). [CrossRef]
11. T. Čižmár and K. Dholakia, “Exploiting multimode waveguides for pure fibre-based imaging,” Nat. Commun. 3(1), 1027 (2012). [CrossRef]
12. F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: An end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560 (2019). [CrossRef]
13. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778.
14. P. Fan, T. Zhao, and L. Su, “Deep learning the high variability and randomness inside multimode fibers,” Opt. Express 27(15), 20241 (2019). [CrossRef]
15. N. Borhani, E. Kakkava, C. Moser, and D. Psaltis, “Learning to see through multimode fibers,” Optica 5(8), 960 (2018). [CrossRef]
16. L. Zhang, R. Xu, H. Ye, K. Wang, B. Xu, and D. Zhang, “High definition images transmission through single multimode fiber using deep learning and simulation speckles,” Optics and Lasers in Engineering 140, 106531 (2021). [CrossRef]
17. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (Springer, 2015), pp. 234–241.
18. G. Cohen, S. Afshar, J. Tapson, and A. van Schaik, “EMNIST: an extension of MNIST to handwritten letters,” arXiv:1702.05373 [cs] (2017).
19. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 [cs] (2017).
20. E. Tajahuerce, V. Durán, P. Clemente, E. Irles, F. Soldevila, P. Andrés, and J. Lancis, “Image transmission through dynamic scattering media by single-pixel photodetection,” Opt. Express 22(14), 16945 (2014). [CrossRef]
21. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]
22. D. Keysers, T. Deselaers, C. Gollan, and H. Ney, “Deformation Models for Image Recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 29(8), 1422–1435 (2007). [CrossRef]
23. X. Chang, L. Bian, and J. Zhang, “Large-scale phase retrieval,” eLight 1(1), 4 (2021). [CrossRef]
24. M. Lan, D. Guan, L. Gao, J. Li, S. Yu, and G. Wu, “Robust compressive multimode fiber imaging against bending with enhanced depth of field,” Opt. Express 27(9), 12957 (2019). [CrossRef]
25. M. Lyu, H. Wang, G. Li, and G. Situ, “Exploit imaging through opaque wall via deep learning,” Optics Express (2017).