Abstract
We propose a new edge detection scheme for single multimode fiber imaging based on deep learning. In this scheme, we design a neural network whose input is a one-dimensional light intensity sequence and whose output is the edge detection result of the target. Different from the traditional scheme, this neural network directly extracts the edge information of unknown objects without reconstructing the image first. Simulation and experimental results show that, compared with the traditional method, this method recovers better edge details, especially at low sampling rates: it increases the structural similarity index of edge detection imaging from 0.38 to 0.62 at a sampling rate of 0.6%. The robustness of the method to fiber bending is also demonstrated. This scheme improves the edge detection performance of endoscopic images and provides a promising way toward the practical application of multimode fiber endoscopy.
© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Medical images have become an important basis and means for clinical medical diagnosis, pathological analysis, and treatment [1,2]. Edge detection [3] is one of the key technologies in medical image processing; its purpose is to determine the edges of the target in images against noisy backgrounds [4]. The performance of edge detection determines the results of medical image analysis and even affects the subsequent treatment process. In view of the poor edge detection performance in the field of endoscopy, a new edge detection method is proposed in this article.
Currently, edge detection in endoscopic imaging is usually performed by morphological methods or by applying edge detection operators directly [5–7]. In 2005, B. V. Dhandra et al. proposed a method of edge detection for endoscopic images using morphological watershed segmentation [7]. In 2010, M. Häfner et al. extracted edges from endoscopic images using the Canny operator [6]. In 2016, Jamil A. M. Saif et al. concluded that the Canny algorithm is more suitable for endoscopic images than Sobel and Prewitt [5]. The common disadvantage of these methods is that the edge detection result is poor if the image is very noisy. However, when using an endoscope based on a single multimode fiber (MMF), it is not easy to obtain a sufficiently clear image [8]. Although optimization algorithms such as compressive sensing (CS) have been proposed [9,10], an ideal image still requires sufficient measurements [11] and keeping the fiber as unbent as possible.
In this paper, a new framework is proposed for high-quality edge detection in MMF imaging at low sampling rates. The method is based on deep learning (DL) [12,13], a machine learning technology widely used in many fields. For multimode fibers, DL has been used for image transmission [14], image classification [15], high-definition imaging [16], and so on. Our method does not reconstruct the image and then detect edges; instead, it obtains the edge of the target directly from the light intensity sequence. Compared with edge detection that applies an edge extraction operator to CS-reconstructed images, DL significantly reduces the number of measurements and yields better edge detection results. We also demonstrate that DL is superior to CS in terms of robustness against fiber bending. The analysis shows that the edge detection results of this method remain excellent in a disturbed environment, which is very helpful for practical applications.
2. Method
2.1 Principle of edge detection using neural network
The experiment is divided into two steps. The first step is the calibration stage, in which the speckle patterns passing through the MMF are collected by the CMOS camera. The collected speckles are denoted by ${I_m}(x,y)$, where $m = 1,2,\ldots ,M$. The second step is the measurement stage, in which a bucket detector with no spatial resolution receives the light intensity transmitted through the target. The object is denoted as $T(x,y)$, and the light field in the measurement stage is the same as that in the calibration stage. Therefore, the light intensity sequence collected by the bucket detector can be expressed as:
$${S_m} = \int\!\!\!\int {I_m}(x,y)\,T(x,y)\,dx\,dy,\quad m = 1,2,\ldots ,M.$$
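The two-stage measurement model can be illustrated with a short NumPy sketch. Random fields stand in for real MMF speckle patterns, and the discretized bucket signal replaces the integral with a pixel-wise sum:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 225                 # number of speckle patterns / measurements
H = W = 150             # target resolution used in the paper

# Calibration stage: speckle patterns I_m(x, y) recorded by the CMOS camera.
# Random fields stand in for real MMF speckle here.
speckles = rng.random((M, H, W))

# Measurement stage: a binary target T(x, y) in front of the bucket detector.
target = np.zeros((H, W))
target[40:110, 60:90] = 1.0

# Discretized bucket signal: S_m = sum_{x,y} I_m(x, y) * T(x, y)
bucket = np.einsum('mxy,xy->m', speckles, target)

# Normalized one-dimensional sequence that serves as the network input
bucket_norm = (bucket - bucket.min()) / (bucket.max() - bucket.min())
```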
For edge detection in MMF imaging using CS and an edge detection operator, two steps are needed. The first step is to reconstruct the image with CS, and the second step is to extract the edge from the reconstructed image with the edge extraction operator. In this paper, the Canny operator, which has excellent performance, is used for comparison. First, an image reconstructed by CS can be viewed as finding a solution
$$T{(x,y)_{CS}} = \arg \min {\|{\Psi T(x,y)}\|_{{L_1}}},$$
which minimizes the ${L_1}$ norm in the sparse basis subject to the measurement constraint, where $\Psi $ is the transform operator to the sparse basis [12]. The edge detection imaging can then be expressed as
$$\nabla {T^{canny}}{(x,y)_{CS}} = \mathrm{Canny}\{{T{(x,y)_{CS}}}\}.$$
Here, $\nabla {T^{canny}}{(x,y)_{CS}}$ is the edge of the reconstructed image $T{(x,y)_{CS}}$ extracted with the Canny operator.
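As a rough illustration of this two-step baseline, the sketch below recovers a toy sparse target with ISTA (iterative soft thresholding) and then applies a gradient-magnitude edge operator. It assumes sparsity directly in the pixel basis and uses zero-mean Gaussian sensing patterns for conditioning; the paper's actual pipeline uses real speckle patterns, a transform basis $\Psi$, and the Canny operator:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def cs_reconstruct(A, b, lam=0.1, n_iter=200):
    """ISTA for the L1-regularized least-squares problem
    min_x 0.5*||A x - b||^2 + lam*||x||_1 (pixel-basis sparsity assumed)."""
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (b - A @ x) / L, lam / L)
    return x

def gradient_edges(img, thresh=0.2):
    """Gradient-magnitude edge map (a simple stand-in for Canny)."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    return (mag > thresh * mag.max()).astype(float)

# Toy demo: 32 x 32 sparse target measured at a 40% sampling rate
rng = np.random.default_rng(1)
H = W = 32
target = np.zeros((H, W))
target[10:22, 12:20] = 1.0
M = int(0.4 * H * W)
A = rng.standard_normal((M, H * W))     # zero-mean sensing patterns
b = A @ target.ravel()

rec = cs_reconstruct(A, b).reshape(H, W)
edges = gradient_edges(rec)
```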
The proposed DL method obtains the edge of the object directly from the bucket signal ${S_m},\;m = 1,2,\ldots ,M$, which can be formulated as:
$$\nabla {T^{canny}}{(x,y)_{DL}} = \mathrm{{\cal R}}\{{S_m}\},$$
where $\mathrm{{\cal R}}\{{\cdot} \} $ is the mapping relationship between the input and output of the neural network. Ideally, we would conduct a large number of repeated experiments to create a dataset and train the neural network. Because repeated experiments are too time-consuming, we instead use simulated data to create the training dataset. We assume that
$$\mathrm{{\cal R}} \approx {\mathrm{{\cal R}}_{simulated}},$$
where ${\mathrm{{\cal R}}_{simulated}}$ represents the neural network mapping relationship trained on simulated data. The training process can then be expressed as
$${\mathrm{{\cal R}}_{simulated}} = \arg \min \sum {{{\|{\mathrm{{\cal R}}\{{S_m}\} - \nabla {T^{canny}}(x,y)}\|}^2}}.$$
2.2 Structural design of neural network
Inspired by U-Net [17], we propose the neural network framework shown in Fig. 1. The input of this neural network is the normalized one-dimensional bucket signal, and the output is the predicted edge detection of the object. Although the length of the input is fixed, any under-sampled signal can be standardized to this fixed length during experimental preprocessing, so the proposed neural network can be used at any sampling rate. In the design of this network, a large number of convolution layers and pooling layers are used to extract the deeper information hidden in the bucket-value sequence and to reduce its dimension. Meanwhile, many concatenation layers are designed for feature fusion. To prevent over-fitting, dropout layers, skip connections, and batch normalization layers are employed.
2.3 Training of neural network
Before training the neural network, a large amount of labeled data needs to be prepared. We randomly select 20,000 images from the EMNIST handwritten-letters database [18], binarize them, and resize each image to 150 × 150 pixels, consistent with the speckle pattern size in the experimental calibration stage. Next, a simulation experiment is carried out with these images, and the bucket signal generated by each image under the same optical fiber condition is obtained. For the labels, the Canny operator [3] is used to extract the edges from each image. After completing the above steps, the training dataset is ready. For the test set, we take another 500 images from EMNIST and prepare them by the same method. In the simulation and experiment, the size of the target is 150 × 150 pixels, so the sampling rate is β = M/22500. To study the influence of β on edge detection, we mainly consider five different values of M: 135, 225, 675, 1125, and 2250, so that β = 0.6%, 1%, 3%, 5%, and 10%, respectively.
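The sampling-rate bookkeeping and the length standardization mentioned above can be sketched as follows; the fixed input length of 2250 and the linear-interpolation resampling are illustrative assumptions, since the paper does not specify the exact preprocessing:

```python
import numpy as np

N_PIXELS = 150 * 150        # target size -> 22500 pixels
FIXED_LEN = 2250            # assumed fixed network input length (the 10% case)

def sampling_rate(m):
    """beta = M / 22500, as defined in the paper."""
    return m / N_PIXELS

def standardize_length(bucket, fixed_len=FIXED_LEN):
    """Resample an under-sampled bucket sequence to the fixed input length by
    linear interpolation, then normalize to [0, 1]. The resampling scheme is
    an assumption; the paper only states that any under-sampled signal is
    standardized to a fixed length."""
    bucket = np.asarray(bucket, dtype=float)
    old = np.linspace(0.0, 1.0, len(bucket))
    new = np.linspace(0.0, 1.0, fixed_len)
    resampled = np.interp(new, old, bucket)
    return (resampled - resampled.min()) / (np.ptp(resampled) + 1e-12)

for m in (135, 225, 675, 1125, 2250):
    print(f"M = {m:4d}  ->  beta = {sampling_rate(m):.1%}")
```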
The training process of a neural network is essentially an optimization of its parameter set, including the weighting factors and biases of the neurons. We iteratively optimize the network using the simulated training dataset above as constraints, so that the network accurately outputs edge detection images. The mean square error (MSE) [19] is used as the loss function in the training process:
$$MSE = \frac{1}{N}\sum\limits_{i = 1}^N {{{({{\hat{y}}_i} - {y_i})}^2}},$$
where ${\hat{y}_i}$ and ${y_i}$ are the $i$-th pixel values of the network output and the label, respectively, and $N$ is the number of pixels.
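A minimal NumPy version of this loss, applied pixel-wise between a predicted edge map and its reference label:

```python
import numpy as np

def mse_loss(pred, label):
    """MSE = (1/N) * sum_i (pred_i - label_i)^2 over all N pixels."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.mean((pred - label) ** 2))

# Example: a 2x2 predicted edge map against its binary label
pred = np.array([[0.9, 0.1], [0.2, 0.8]])
label = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = mse_loss(pred, label)   # (0.01 + 0.01 + 0.04 + 0.04) / 4 = 0.025
```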
3. Results and discussions
3.1 Experimental setup
The experimental setup is schematically shown in Fig. 2. Similar to recovering images from an MMF using CS [9], our setup is divided into two phases: the calibration phase and the detection phase. In the calibration phase, a continuous-wave laser (λ = 532 nm) is guided to a phase-only spatial light modulator (SLM, Holoeye Pluto NIR-011-A) through a beam expander and a polarizer, and is modulated by the SLM with M random-phase patterns. The modulated beam reflected by the mirror is coupled through a collimator (Thorlabs F220FC-532) into a 50 cm long MMF (Corning ClearCurve MMF) with a core diameter of 50 microns and a numerical aperture (NA) of 0.22. At the output end of the fiber, a CMOS camera (XiMEA xiQ-M002RG) records the speckle patterns corresponding to the light fields modulated by the SLM one by one. The size of each recorded speckle field is 150 × 150 pixels. In the detection phase, most of the setup remains the same; we simply replace the CMOS camera with a bucket detector (Thorlabs PDA100/A) with no spatial resolution and place the target object in front of it. In both phases, the SLM uses the same modulation scheme to ensure the same speckle patterns at the output end of the fiber.
3.2 Simulation results
Before the experiment, we conduct a simulation on the computer to test the performance of the algorithm. To visually observe the difference between DL and CS, we take the letter “W” from the test set for the simulation experiment. At five different sampling rates, DL and CS are used to generate edge detection images, respectively. In Fig. 3, we can observe that the edge detection performance of CS is poor, especially at low sampling rates, while DL performs well at all of the sampling rates considered. To evaluate the differences more accurately, we use the structural similarity (SSIM) [20] as the evaluation metric. SSIM measures the similarity between two images and ranges from −1 to 1; the larger the value, the more similar the two images are. It is calculated as [21]:
$$\mathrm{SSIM}(x,y) = \frac{{({2{\mu_x}{\mu_y} + {c_1}})({2{\sigma_{xy}} + {c_2}})}}{{({\mu_x^2 + \mu_y^2 + {c_1}})({\sigma_x^2 + \sigma_y^2 + {c_2}})}},$$
where ${\mu_x}$, ${\mu_y}$ are the means, $\sigma_x^2$, $\sigma_y^2$ the variances, and ${\sigma_{xy}}$ the covariance of images $x$ and $y$, and ${c_1}$, ${c_2}$ are small constants that stabilize the division.
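For reference, a single-window (global) SSIM can be computed directly from this formula; common library implementations (e.g. scikit-image) average a locally windowed version, so absolute values differ slightly:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM with the standard constants from Wang et al. [21].
    Library implementations usually average a locally windowed version."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For identical images the score is exactly 1; anticorrelated images drive the covariance term negative and the score toward −1.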
3.3 Experimental results
After using the simulation to preliminarily verify the superiority of DL over the traditional method, experimental data are used for further verification. As in the simulation, the two algorithms are compared at five different sampling rates. We stabilize the MMF and keep it from shaking as much as possible. The experimental setup is given in Fig. 2. To verify the generalization of the neural network trained on the EMNIST dataset, we use the Chinese character “cloud” as an object that is very different from the English letters in EMNIST. The results are shown in Fig. 5, which gives the object image and the results generated by DL and CS at different sampling rates. It is obvious that, at all five sampling rates, the edge detection produced by DL is superior to that of CS. From a sampling rate of 0.6% to 10%, the SSIM of the CS edge detection increases from 0.38 to 0.63, while that of DL is always greater than 0.62. In particular, the edge detection of CS is very blurry at the sampling rate of 0.6%, but the DL result is still clearly visible. Edge detection using DL is almost noise-free, which makes it a good fit for image recognition [22]. The data quantified by SSIM are shown in Fig. 6; the experimental results are roughly consistent with the simulation results. Although the contour of the edge detection using DL does not exactly match the target, this result still has considerable advantages over CS for target detection and other applications. The error in the experimental results stems from training on the EMNIST dataset, since Chinese characters and English letters are not very similar. The generalizability of this neural network is an area for further work, and phase retrieval [23] may be used to improve efficiency and security.
Anti-interference performance is also very important in the field of endoscopy, as disturbance is unavoidable in actual detection. We therefore test the performance of DL edge detection when the MMF is disturbed. Previous research [24] has shown that the CS algorithm is robust against fiber bending. To investigate the robustness of DL edge detection against fiber bending, we use a 2D translation stage to translate the MMF. Since it has been shown that the performance along the X and Y directions is very similar [25], we only translate the MMF in the transverse direction (X-axis). In this experiment, we set the bending distance of the fiber along the X-axis from 0 to 8 mm, and the sampling rate of both the CS and DL methods is 10%. Different from the previous situation, in which the fiber is stationary, the neural network training set is created not from 20,000 light-field images of the fixed fiber in the calibration phase but from 4,000 light-field images for each of 5 different curvature cases. The results are shown in Fig. 7: as the fiber disturbance distance increases, the edge detection using CS deteriorates sharply, while the edge detection using DL deteriorates only slightly. The edge detection of CS is very fuzzy after 4 mm of bending, but the edge detection results of DL can still be clearly identified. The data quantified by the SSIM index are shown in Fig. 8; as the disturbance distance increases, the SSIM of the CS edge detection drops from 0.66 to 0.36, while the SSIM of DL consistently remains greater than 0.72. The anti-interference performance of DL is better than that of CS, which means that in practical endoscopic applications DL will perform better than CS in the face of inevitable interference, such as disturbance during in vivo observation and bending or vibration of the optical fiber.
We believe that DL will offer even greater advantages in the practical application of endoscopes.
4. Conclusion
In conclusion, we propose a new method of edge detection in the field of MMF-based endoscopy. After training the neural network, we only need to collect the light intensity sequence in the measurement phase to easily obtain the boundary outline of the target. Compared with the traditional method, DL achieves edge detection in a single step, has better performance at the same sampling rate, and has stronger anti-disturbance ability. In an actual complex environment, DL can realize edge detection of the target more simply, with fewer samples. In addition, our experiment demonstrated the performance of DL when the similarity between the target image and the training-set images is low. Further work will focus on improving the generalization performance. We believe that this method will bring edge detection in MMF imaging closer to commercial application.
Funding
National Natural Science Foundation of China (62071059).
Disclosures
The authors declare that they have no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
References
1. M. J. Gora, M. J. Suter, G. J. Tearney, and X. Li, “Endoscopic optical coherence tomography: technologies and clinical applications [Invited],” Biomed. Opt. Express 8(5), 2405 (2017). [CrossRef]
2. X. Wen, P. Lei, K. Xiong, P. Zhang, and S. Yang, “High-robustness intravascular photoacoustic endoscope with a hermetically sealed opto-sono capsule,” Opt. Express 28(13), 19255 (2020). [CrossRef]
3. J. Canny, “A Computational Approach to Edge Detection,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8(6), 679–698 (1986). [CrossRef]
4. D. Marr and E. Hildreth, “Theory of edge detection,” Proc. R. Soc. Lond. B. 207(1167), 187–217 (1980). [CrossRef]
5. J. A. M. Saif, M. H. Hammad, and I. A. A. Alqubati, “Gradient Based Image Edge Detection,” IJET 8(3), 153–156 (2016). [CrossRef]
6. M. Häfner, A. Gangl, M. Liedlgruber, A. Uhl, A. Vecsei, and F. Wrba, “Endoscopic Image Classification Using Edge-Based Features,” in 2010 20th International Conference on Pattern Recognition (IEEE, 2010), pp. 2724–2727.
7. B. V. Dhandra and R. Hegadi, “Classification of Abnormal Endoscopic Images using Morphological Watershed Segmentation,” (n.d.).
8. T. Fukui, Y. Kohno, R. Tang, Y. Nakano, and T. Tanemura, “Single-Pixel Imaging Using Multimode Fiber and Silicon Photonic Phased Array,” J. Lightwave Technol. 39(3), 839–844 (2021). [CrossRef]
9. L. V. Amitonova and J. F. de Boer, “Compressive imaging through a multimode fiber,” Opt. Lett. 43(21), 5427 (2018). [CrossRef]
10. L. V. Amitonova and J. F. de Boer, “Endo-microscopy beyond the Abbe and Nyquist limits,” Light: Sci. Appl. 9(1), 81 (2020). [CrossRef]
11. T. Čižmár and K. Dholakia, “Exploiting multimode waveguides for pure fibre-based imaging,” Nat. Commun. 3(1), 1027 (2012). [CrossRef]
12. F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: An end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560 (2019). [CrossRef]
13. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778.
14. P. Fan, T. Zhao, and L. Su, “Deep learning the high variability and randomness inside multimode fibers,” Opt. Express 27(15), 20241 (2019). [CrossRef]
15. N. Borhani, E. Kakkava, C. Moser, and D. Psaltis, “Learning to see through multimode fibers,” Optica 5(8), 960 (2018). [CrossRef]
16. L. Zhang, R. Xu, H. Ye, K. Wang, B. Xu, and D. Zhang, “High definition images transmission through single multimode fiber using deep learning and simulation speckles,” Optics and Lasers in Engineering 140, 106531 (2021). [CrossRef]
17. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (Springer, 2015), pp. 234–241.
18. G. Cohen, S. Afshar, J. Tapson, and A. van Schaik, “EMNIST: an extension of MNIST to handwritten letters,” arXiv:1702.05373 [cs] (2017).
19. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 [cs] (2017).
20. E. Tajahuerce, V. Durán, P. Clemente, E. Irles, F. Soldevila, P. Andrés, and J. Lancis, “Image transmission through dynamic scattering media by single-pixel photodetection,” Opt. Express 22(14), 16945 (2014). [CrossRef]
21. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]
22. D. Keysers, T. Deselaers, C. Gollan, and H. Ney, “Deformation Models for Image Recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 29(8), 1422–1435 (2007). [CrossRef]
23. X. Chang, L. Bian, and J. Zhang, “Large-scale phase retrieval,” eLight 1(1), 4 (2021). [CrossRef]
24. M. Lan, D. Guan, L. Gao, J. Li, S. Yu, and G. Wu, “Robust compressive multimode fiber imaging against bending with enhanced depth of field,” Opt. Express 27(9), 12957 (2019). [CrossRef]
25. M. Lyu, H. Wang, G. Li, and G. Situ, “Exploit imaging through opaque wall via deep learning,” Optics Express (2017).