
Wide-field color imaging through multimode fiber with single wavelength illumination: plug-and-play approach

Open Access

Abstract

Multimode fiber (MMF) is extensively studied for its ability to transmit light modes in parallel, potentially minimizing optical fiber size in imaging. However, current research predominantly focuses on grayscale imaging, with limited attention to color studies. Existing colorization methods often involve costly white light lasers or multiple light sources, increasing the expense and footprint of the optical system. To achieve wide-field color imaging with a typical monochromatic-illumination MMF imaging system, we propose a data-driven “colorization” approach and a neural network called SpeckleColorNet, which merges U-Net and conditional GAN (cGAN) architectures and is trained with a combined loss function. This approach, demonstrated on a 2-meter MMF system with single-wavelength illumination and the Peripheral Blood Cell (PBC) dataset, outperforms grayscale imaging and alternative colorization methods in readability, definition, detail, and accuracy. Our method aims to integrate MMF into clinical medicine and industrial monitoring, offering cost-effective high-fidelity color imaging. It serves as a plug-and-play replacement for conventional grayscale algorithms in MMF systems, eliminating the need for additional hardware.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Fiber optic imaging excels at capturing images in challenging, hard-to-reach spaces where other detectors face difficulties. This technology offers the advantages of compact size and the ability to navigate complex environments. Consequently, its applications span various fields, such as biomedical endoscopy [1], industrial borescopes [2], video surveillance and security [3], as well as environmental monitoring [4]. Researchers aim to further minimize the size of optical fibers in imaging systems, facilitating access to narrower spaces with minimal disruption. Presently, the prevalent commercial optical fiber imaging system utilizes a single-mode optical fiber bundle. However, single-mode fiber can only transmit fundamental-mode information, limiting each fiber to transmitting a single pixel of an image. Consequently, achieving high-resolution imaging requires fiber bundles with thousands of individual fibers, imposing severe constraints on the minimal size of single-mode fiber bundles. Multimode fiber (MMF) allows parallel transmission of numerous light modes, boasting a mode count per unit area one to two orders of magnitude higher than single-mode fibers [5]. This suggests the potential for achieving high-resolution imaging using a single or a few MMFs, substantially diminishing the necessary size of optical fibers.

Owing to mode dispersion and intermodal coupling in MMF [6,7], incident light carrying image information generates a complex speckle intensity distribution at the MMF output, resembling light scattering in turbid media [8,9]. Because MMF can be treated as a linear space-varying imaging system, the widely used methods for wide-field imaging through MMF are based on inverse calculations and generally include transmission matrix calculation [10,11] and deep learning [12,13]. Among them, deep learning methods, as a data-driven approach, can learn the mapping between any image and its corresponding speckle pattern, offering greater universality, robustness, and ease of use. It is worth noting that current wide-field MMF imaging research primarily centers on grayscale imaging, i.e., single-wavelength illumination and detection. While brightness offers valuable content-related insights into shape and structure, many optical fiber imaging scenarios demand color information, such as clinical medicine [14] and industrial monitoring [15]. Existing MMF color imaging methods require adding multiple light sources for illumination, such as lasers at different wavelengths [16,17], LEDs [18], or a white light laser [13], on top of the grayscale imaging system. Utilizing multiple lasers at different wavelengths or employing a white light laser for illumination, together with a modified detection scheme, necessitates significant changes to the typical grayscale imaging system, resulting in a substantial increase in cost and space requirements. It should also be noted that, because images must be reconstructed from speckle patterns for each color channel, the local distortion, low definition, and edge blur caused by shortcomings of existing image reconstruction algorithms greatly increase the difficulty of color image reconstruction [19,20].

We posit that employing “colorization” techniques can achieve wide-field color imaging through MMF with single-wavelength illumination in a plug-and-play manner, preserving the typical grayscale imaging system. Colorization, automatically transforming grayscale images into color images through computation, has been a persistent focus in graphics and finds extensive application in various scenarios, including surveillance systems [21], historical image restoration [22], remote sensing [23], astronomical imaging [24], etc. Essentially, grayscale images only contain luminance information, whereas color images comprise multiple facets, including luminance, color, saturation, etc. Yet, considering the interconnections and constraints between grayscale and color images, encompassing structural and shape information, local and global relationships, prior knowledge, and semantic information [25], grayscale images can be realistically reconstructed into color images through the colorization process using deep learning models involving information supplementation and inference. In MMF imaging with single-wavelength illumination, original images transform into speckle patterns upon passing through the MMF. While the image information and semantic knowledge within image datasets persist, they adopt a more intricate and disordered form. Data-driven deep learning methods emerge as a logical choice for colorization.

In this paper, for wide-field color imaging through MMF with single-wavelength illumination via colorization, we devised SpeckleColorNet, a neural network combining U-Net and cGAN architectures. Additionally, we crafted a unified loss function and adopted a progressive training strategy to tackle the task of extracting semantic information from speckle patterns. We showcased MMF color imaging's superiority over grayscale imaging in enhancing image readability and information retrieval capabilities, employing the Peripheral Blood Cells (PBC) dataset and a typical MMF imaging system. Additionally, we highlighted our method's superior performance in color image reconstruction compared to other deep learning-based colorization methods. Our fully data-driven approach, tailored for plug-and-play, achieves high accuracy, clarity, and intricate detail in color image reconstruction. To demonstrate the color image reconstruction method’s high fidelity, we conducted downstream color cell image classification, comparing neural network classification accuracy between reconstructed and original color images. Our approach will expedite the cost-effective and seamless integration of MMF in fields requiring high-fidelity color imaging capabilities, such as clinical healthcare and industrial monitoring.

In the subsequent section, we will first introduce the single-wavelength illumination MMF imaging system employed in the experiment and derive the theoretical foundation for reconstructing original images from speckle patterns. Next, we provide a detailed exposition of SpeckleColorNet, step-by-step training approach and combined loss function. Then, we present color imaging results based on the PBC dataset, showcasing the superiority and the accuracy through comparisons with alternative approaches and downstream experiments. Finally, we discuss the performance of the proposed method and conclude by outlining the broad prospects of this work in MMF imaging applications.

2. Principle and setup

2.1 Experimental setup

In order to demonstrate the plug-and-play capability of the method proposed in this paper, we used a typical MMF grayscale imaging system, and the experimental setup is shown in Fig. 1. We used a single-mode fiber-coupled continuous wave solid-state laser (MGL-U-532, Changchun New Industry Photoelectric Technology Co., Ltd.) to provide single-wavelength illumination with a center wavelength of 532 nm, which passed through a collimator (FC520-6.1-APC, Shenzhen Lubang Technology Co., Ltd.) and then formed a beam with a diameter of about 6 mm. We loaded the RGB images in the dataset onto a Digital Micromirror Device (DMD, DLP LightCrafter 4500, Texas Instruments, pixel size 7.56 µm, number of pixels 912 × 1140) in screen projection mode. The light beam, after undergoing intensity modulation by the DMD, carried grayscale image information and was coupled into an approximately 2-meter-long MMF (LMA-GDF-30/250-M, Nufern Inc., core NA 0.062, first cladding NA 0.46, cladding diameter 247 µm, core diameter 30 µm, coating diameter 395 µm) by objective lens Obj1 (Shanghai Zhaoyi Optoelectronics Technology Co., Ltd., 0.4 NA, 20×, WD 10.4 mm, Semi APO). The speckle formed through the MMF was collected by the objective lens Obj2 (Shanghai Zhaoyi Optoelectronics Technology Co., Ltd., 0.7 NA, 40×, WD 0.17 mm, Semi APO), then was received by the camera (MER2-301-125U3 M, China Daheng (Group) Co., Ltd., pixel size 3.45 µm, number of pixels 2048 × 1536, 8 bit). We adjusted the distance between the objective lens and the camera to ensure that the speckle pattern covered as many camera sensor pixels as possible.

Fig. 1. Schematic diagram of experimental setup for wide-field color imaging through MMF with single-wavelength illumination.

2.2 Physical model

In the optical system of this article, after being modulated by DMD, the light wave ${E_{({x,\; y} )}}$ carrying the grayscale images propagated through the imaging system, forming a light intensity distribution ${I_{({x^{\prime},y^{\prime}} )}}$ on the camera detector plane. The forward process can be divided into three basic components: propagation in free space, modulation of the objective lens and transmission through MMF. The specific process is shown in Fig. 2.

Fig. 2. Schematic diagram of information transmission and mapping relationships in the experimental device.

The propagation in free space can be represented in angular spectrum form [26], taking the example of propagation from the DMD to the entrance pupil of the objective lens Obj1. It can be expressed as follows:

$$E_{in}^{s} = \mathcal{F}^{-1}\left\{ \mathcal{F}\left( A_{DMD}E_{ill} \right)e^{i\frac{(2\pi f_{x_1})^2 + (2\pi f_{y_1})^2}{k_0 n_0}z_1} \right\}$$
where ${E_{ill}}$ is the complex amplitude of the illuminating light, which can be approximately regarded as a plane wave, ${A_{DMD}}$ is the amplitude modulation by the DMD, ${k_0} = \frac{2\pi}{\lambda}$, ${n_0}$ is the average refractive index of the medium, $({f_{x_1}},{f_{y_1}})$ are the spatial frequencies in the two-dimensional spatial coordinate system $({x_1},{y_1})$, and $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the Fourier transform and its inverse, respectively. The light propagation over the distances ${z_1}$, ${z_2}$, ${z_3}$, and ${z_4}$ can all be expressed in the form of Eq. (1).
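For readers who wish to simulate this propagation step numerically, the following is a minimal NumPy sketch of a paraxial (Fresnel transfer-function) angular-spectrum propagator in the spirit of Eq. (1); the function name, the sampling grid, the example distance, and the use of the paraxial form of the transfer function are our own assumptions rather than the authors' implementation.

```python
import numpy as np

def propagate_angular_spectrum(E, dx, wavelength, z, n0=1.0):
    """Propagate a 2D complex field E over a distance z (paraxial angular-spectrum method).

    E          : 2D complex array sampled on a grid with pitch dx (meters)
    wavelength : vacuum wavelength (meters)
    z          : propagation distance (meters)
    n0         : average refractive index of the medium
    """
    k = 2 * np.pi * n0 / wavelength                     # k0 * n0
    ny, nx = E.shape
    fx = np.fft.fftfreq(nx, d=dx)                       # spatial frequencies along x
    fy = np.fft.fftfreq(ny, d=dx)                       # spatial frequencies along y
    FX, FY = np.meshgrid(fx, fy)
    # Paraxial transfer function H = exp(-i [(2*pi*fx)^2 + (2*pi*fy)^2] z / (2 k))
    H = np.exp(-1j * ((2 * np.pi * FX) ** 2 + (2 * np.pi * FY) ** 2) * z / (2 * k))
    return np.fft.ifft2(np.fft.fft2(E) * H)

# Example: a plane wave amplitude-modulated by a DMD pattern, propagated over an
# illustrative distance of 0.1 m (the DMD pitch of 7.56 um is taken from Section 2.1).
A_dmd = np.random.rand(256, 256)                        # stand-in for the loaded grayscale image
E_in_s = propagate_angular_spectrum(A_dmd.astype(complex), dx=7.56e-6, wavelength=532e-9, z=0.1)
```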

The modulation of the objective lens can be expressed in two parts: phase modulation and aperture limitation.

$$E_{out}^{obj} = E_{in}^{obj}\exp\left[ -i\frac{k_0}{2f}\left( x_{obj}^2 + y_{obj}^2 \right) \right]P\left( x_{obj}^2 + y_{obj}^2 \right)$$
where $E_{in}^{obj}$ and $E_{out}^{obj}$ are the incident and exit light fields of the objective lens respectively, f is the focal length of the objective lens, and $({x_{obj}},{y_{obj}})$ are the coordinates in the objective lens plane. The beam diameter in the optical system of this article is much smaller than the objective lens aperture, so the aperture restriction term P can be ignored.

In optical fiber waveguides, the optical field can be represented as a combination of a series of eigenmodes. The decomposition of the incident light wave into a linear combination of these modes can be expressed as:

$$E_{in}^{MMF}({{\xi_{MMF}},\; {\eta_{MMF}}} )= \mathop \sum \nolimits_{l,\; \; m} {a_{lm}}(0 ){E_{lm}}({x,y} )$$
where ${E_{lm}}({x,y} )$ is the field distribution of the modes with subscripts l and m, ${a_{lm}}(0 )$ is the initial complex amplitude coefficient of each mode. Due to the presence of mode dispersion and intermodal coupling in MMF [6,7], the loss coefficient ${\alpha _{lm}}$ and transmission constant ${\beta _{lm}}$ of each mode can be used to calculate the changes in amplitude and phase during its transmission. The terminal mode coefficient ${a_{lm}}(z )$ can be expressed as
$${a_{lm}}(z )= {a_{lm}}(0 )\textrm{exp}\left( { - \frac{{{\alpha_{lm}}}}{2}z + i{\beta_{lm}}z} \right)$$

From this, the light field distribution at the output end can be obtained based on the terminal mode coefficient and mode field.

$$E_{out}^{MMF}({{x_M},\; {y_M},z} )= \mathop \sum \nolimits_{l,\; \; m} {a_{lm}}(z ){E_{lm}}({x,y} )$$
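As an illustration of Eqs. (3)–(5), the short sketch below propagates a set of modal coefficients through the fiber and re-synthesizes the output field; the arrays a0, alpha, beta, and mode_fields are hypothetical inputs (e.g., obtained from an eigenmode solver), not values used in the paper.

```python
import numpy as np

def propagate_modes(a0, alpha, beta, z):
    """Eq. (4): attenuate and phase-shift each modal coefficient over fiber length z."""
    return a0 * np.exp(-0.5 * alpha * z + 1j * beta * z)

def output_field(a_z, mode_fields):
    """Eq. (5): superpose the eigenmode fields weighted by the terminal coefficients.
    mode_fields has shape (num_modes, H, W); a_z has shape (num_modes,)."""
    return np.tensordot(a_z, mode_fields, axes=1)
```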

This equation illustrates that in MMF, the complex transmission process of different modes exhibits effects similar to strong scattering. Additionally, a structurally determined MMF has a specific modulation effect on the transmitted optical field. By combining Eq. (3) and Eq. (5), we can obtain:

$$E_{out}^{MMF}({{x_M},\; {y_M}} )= \mathop \sum \nolimits_{{\xi _M},\; \; {\eta _M}} {T_M}({{x_M},\; {y_M};\; {\xi_M},\; {\eta_M}} )E_{in}^M({{\xi_M},\; {\eta_M}} )$$

So far, by establishing a system of the above equations, we can achieve forward modeling from the light wave ${E_{({x,\; y} )}}$ modulated by the DMD to the light intensity ${I_{({x^{\prime},y^{\prime}} )}}$ on the CCD. It can be observed that when the optical system is fixed, ${I_{({x^{\prime},y^{\prime}} )}}$ has a fixed mapping with ${E_{({x,\; y} )}}$. This means that images can be reconstructed from the speckle patterns received on the camera, and learning this inverse mapping with an artificial neural network to reconstruct images is a feasible approach for imaging through MMF.
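To make the fixed mapping concrete, the sketch below pushes a flattened input field through a complex matrix standing in for the transmission operator of Eq. (6) and then takes the squared modulus, mimicking intensity-only detection on the camera. The matrix T here is a random surrogate used purely for illustration, not a measured transmission matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N input sampling points, M camera sampling points.
N, M = 64 * 64, 128 * 128
T = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * N)

def speckle_intensity(E_in_flat):
    """Apply the fixed linear mapping of Eq. (6), then intensity detection: I = |E_out|^2."""
    E_out = T @ E_in_flat                # complex field at the fiber output plane
    return np.abs(E_out) ** 2            # the camera records intensity only

speckle = speckle_intensity(rng.random(N))   # a random test input yields a speckle-like pattern
```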

3. Design of neural network and training strategy

3.1 Architecture of SpeckleColorNet and training strategy

We believe that the key to reconstructing the original color images from the grayscale speckle patterns consists of two main components: (1) efficient and comprehensive extraction of features from complex speckle patterns and the semantic information contained in the dataset, and (2) the high-fidelity generation of color images. Considering the significant differences between speckle patterns and original images, the high-fidelity requirements for reconstructing color images, and the challenges associated with generating color images that adhere to natural color distributions, we proposed a convolutional neural network called SpeckleColorNet for reconstructing the original color RGB images from the grayscale speckle patterns output from the MMF, based on the U-Net and cGAN [27,28] architectures, as illustrated in Fig. 3. SpeckleColorNet was trained in a step-by-step manner. In Step 1, we used a U-Net type neural network, which has strong training stability and is suitable for translating between images in two domains with large differences, such as speckle patterns and original images. However, the images it generates often suffer from over-smoothing and insufficient detail. In this paper, it was therefore used to extract high-level features from the speckle patterns and reconstruct the basic morphological information of the grayscale images, under the constraint of an MSE loss against the grayscale images obtained by converting the original RGB images. In Step 2, a cGAN type neural network extracted high-level features from the reconstructed grayscale image and the semantic information it carried to perform colorization, enhancing the richness of details and local authenticity of the reconstructed image. The images a cGAN type network generates exhibit excellent richness of detail, but such networks often prove challenging to train. Therefore, we trained it on the images generated in Step 1, employing a combined loss function to alleviate the training difficulty.

Fig. 3. Schematic diagram of the convolutional neural network SpeckleColorNet for reconstructing original color images from grayscale speckle patterns, which was trained in a step-by-step manner. In Step 1, U-Net type neural network extracted high-level features from speckle patterns to reconstruct basic grayscale patterns. In Step 2, cGAN type neural network extracted high-level features and semantic information for colorization and local detail restoration.

This approach, compared to end-to-end neural networks, offered better interpretability and ease of adjustment when dealing with image reconstruction tasks where the initial information and target images have significant differences. In Step 1, we used the grayscale speckle patterns captured by the camera as the input to generator G1, with the grayscale images obtained by converting the original RGB image as the ground truth. Generator G1 was trained under the MSE loss, and the speckle dataset was input to the trained G1 to obtain the reconstructed grayscale images. In the subsequent training in Step 2, we used the reconstructed grayscale images as conditional information input into generator G2 to generate the reconstructed RGB color images. The reconstructed RGB color images, along with the original RGB color images, were then fed into discriminator D to assess the authenticity of the reconstructed images, thereby providing effective feedback on the quality of the reconstructed images. During the concurrent training of generator G2 and discriminator D, generator G2 strived to generate more convincing images to deceive discriminator D. This resulted in higher image quality and a closer alignment with the distribution characteristics of real images.

Due to the necessity for high-level feature extraction and image generation both in reconstructing grayscale images from speckle patterns and in reconstructing the original color images from grayscale images, we used an Encoder-Decoder structure to build both G1 and G2, based on convolutional layers and transposed convolutional layers, as shown in Fig. 4. G1 and G2 shared a similar structure, with the distinction that the input for G2 involved replicating the reconstructed grayscale images generated by G1 into three channels. This choice was based on our experimental observations, which indicated that configuring the input images of G2 as three channels results in improved training stability. This structure has four key features: (1) We deepened the convolutional layers and gradually increased the number of feature channels to 512. This not only helped in extracting high-level features but also extended the receptive field of the convolution kernels in deep layers, enhancing the ability to capture long-distance correlations and improving robustness. (2) The skip connection structure was employed to preserve fine-grained feature map details and context information while assisting gradient propagation to prevent vanishing gradients. (3) We made extensive use of Instance Normalization and employed Leaky-ReLU during the down-sampling process. These techniques contributed to improved training stability and faster model convergence. (4) The extensive use of dropout layers served a dual purpose: it suppressed overfitting stemming from complex speckle image datasets during G1 training, and it enhanced the generative capacity of the cGAN architecture during G2 training by promoting diversity, reducing mode collapse, and enhancing robustness. By randomly deactivating a fraction of neurons during training, dropout introduces variability in the learning process, preventing the model from overly relying on specific neurons. This diversity encourages the generation of a wider range of outputs, mitigating the risk of mode collapse where the generator produces limited and repetitive patterns. Additionally, dropout contributes to the robustness of the model, making it less sensitive to minor variations and improving its adaptability to different inputs.
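The PyTorch sketch below illustrates the kind of encoder-decoder generator described above (strided convolutions, Instance Normalization, Leaky-ReLU, dropout, and skip connections, with channels growing to 512). The exact depth, channel progression, kernel sizes, and output activation of G1 and G2 are not fully specified in the text, so the values here are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class Down(nn.Module):
    """Encoder block: strided 4x4 convolution + Instance Normalization + Leaky-ReLU (+ dropout)."""
    def __init__(self, c_in, c_out, dropout=0.0):
        super().__init__()
        layers = [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                  nn.InstanceNorm2d(c_out),
                  nn.LeakyReLU(0.2, inplace=True)]
        if dropout > 0:
            layers.append(nn.Dropout(dropout))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class Up(nn.Module):
    """Decoder block: transposed 4x4 convolution + Instance Normalization + ReLU, then skip concatenation."""
    def __init__(self, c_in, c_out, dropout=0.0):
        super().__init__()
        layers = [nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                  nn.InstanceNorm2d(c_out),
                  nn.ReLU(inplace=True)]
        if dropout > 0:
            layers.append(nn.Dropout(dropout))
        self.block = nn.Sequential(*layers)

    def forward(self, x, skip):
        return torch.cat([self.block(x), skip], dim=1)

class Generator(nn.Module):
    """Simplified four-scale encoder-decoder with skip connections (assumed layout)."""
    def __init__(self, c_in=1, c_out=1):
        super().__init__()
        self.d1, self.d2 = Down(c_in, 64), Down(64, 128)
        self.d3, self.d4 = Down(128, 256), Down(256, 512, dropout=0.5)
        self.u1 = Up(512, 256, dropout=0.5)
        self.u2 = Up(256 + 256, 128)
        self.u3 = Up(128 + 128, 64)
        self.head = nn.Sequential(nn.ConvTranspose2d(64 + 64, c_out, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        s1 = self.d1(x)
        s2 = self.d2(s1)
        s3 = self.d3(s2)
        b = self.d4(s3)
        y = self.u1(b, s3)
        y = self.u2(y, s2)
        y = self.u3(y, s1)
        return self.head(y)

G1 = Generator(c_in=1, c_out=1)   # speckle pattern -> reconstructed grayscale image
G2 = Generator(c_in=3, c_out=3)   # replicated grayscale (3 channels) -> reconstructed RGB image
```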

Fig. 4. The structure of generator G1 and G2 in SpeckleColorNet. Color images were reconstructed through a step-by-step training manner. In Step 1, G1 took the captured speckle patterns from the camera as input and produced the reconstructed grayscale image as output. In Step 2, G2 took the reconstructed grayscale image as input and generated the reconstructed RGB color image as output.

In order to achieve the most realistic and detail-rich image reconstruction, as well as to balance the performance of the deep convolutional generator and discriminator, we improved the traditional cGAN discriminator by utilizing multiple convolutional layers to obtain patch-level classification results for high-level features, as illustrated in Fig. 5. The discriminator initially processed the input images through a convolutional layer to extract fundamental image features. Subsequently, more advanced features were extracted through a series of convolutional layers with increasing feature channels, incorporating Instance Normalization and Leaky-ReLU to obtain improved feature representations. Finally, a single-channel “authenticity map” containing multiple pixels was output through a convolutional layer. Each pixel of this “authenticity map” judged whether the patch it represented (16 × 16 pixels in this article) resembled a real-world image. This method, known as PatchGAN [28], enhances local perception, captures fine textures, and facilitates adversarial training. PatchGAN discriminated based on local image regions rather than the entire image, allowing the generator to focus on finer details and textures. This approach provided a more granular assessment of the generated image, facilitating better control over the realism of local structures. It is worth noting that, considering the significant training challenges arising from the vast differences between input and target images, we introduced a regularization technique known as Spectral Normalization [29] (SN) to modify all the convolutional layers in the discriminator, which is advantageous for enhancing training stability, improving the generator's training effectiveness, preventing mode collapse, and reducing training oscillations. SN helps prevent the discriminator from becoming too sensitive to small changes, contributing to more consistent and effective learning.
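A compact PyTorch sketch of such a spectral-normalized PatchGAN discriminator follows; the layer count, channel widths, and resulting patch size are illustrative assumptions (the paper states 16 × 16 patches), and torch.nn.utils.spectral_norm stands in for the SNConv blocks of Fig. 5.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(c_in, c_out, stride):
    """Spectral-normalized 4x4 convolution (an SNConv block)."""
    return spectral_norm(nn.Conv2d(c_in, c_out, 4, stride=stride, padding=1))

class PatchDiscriminator(nn.Module):
    """PatchGAN discriminator: outputs a single-channel 'authenticity map' of logits,
    each element scoring one local patch of the (condition, image) pair."""
    def __init__(self, c_in=6):   # reconstructed grayscale (3 ch) concatenated with an RGB image (3 ch)
        super().__init__()
        self.net = nn.Sequential(
            sn_conv(c_in, 64, 2), nn.LeakyReLU(0.2, inplace=True),
            sn_conv(64, 128, 2), nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
            sn_conv(128, 256, 2), nn.InstanceNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            sn_conv(256, 512, 1), nn.InstanceNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
            sn_conv(512, 1, 1),   # raw logits; BCEWithLogitsLoss applies the sigmoid
        )

    def forward(self, condition, image):
        return self.net(torch.cat([condition, image], dim=1))

D = PatchDiscriminator(c_in=6)
```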

Fig. 5. The structure of discriminator D in SpeckleColorNet. The discriminator guided the generator G2 to reconstruct higher-quality RGB color images with the help of Spectral Normalization and PatchGAN. SNConv refers to Spectral Normalized convolution.

3.2 Loss function

The setting of the loss function determines whether the generated images are realistic, clear, and rich in detail. In the step-by-step training of SpeckleColorNet, the training objective of Step 1 is to faithfully reconstruct the grayscale images ${\tilde{y}_1}$ from the grayscale speckle patterns x. Therefore, we used Mean Squared Error (MSE) between ${\tilde{y}_1}$ and the grayscale images ${y_1}$ converted from the color image ${y_2}$ as its loss function, which is formulated as follows:

$$\mathcal{L}_{MSE}(G_1) = \mathbb{E}_{y_1 \sim P_{data}(Y_1)}\left[ \| \tilde{y}_1 - y_1 \|_2 \right]$$
where ${G_1}({\cdot} )$ is generator G1, ${||\cdot||_2}$ represents MSE loss. After the completion of Step 1 training, we input the training set within the speckle dataset into the trained G1 to obtain the reconstructed grayscale image ${\tilde{y}_1}$, which served as the input for Step 2 training.

During the training of Step 2, we considered all the factors that could enhance the quality of the final color image reconstruction, in order to achieve more accurate colorization and improve the detail integrity and clarity of the images. Consequently, we developed a combined loss function that included an adversarial loss, a mean absolute deviation (MAD) loss, a perceptual loss, and a total variation (TV) loss.

  • 1) Adversarial loss: To reconstruct highly realistic color images, we calculated the adversarial loss of discriminator at the patch-level using BCEWithLogitsLoss. By having the generator and discriminator compete with each other during the training process, the adversarial loss helped the generator gradually produce images that were challenging to distinguish from real ones.

The goal of the generator was to produce images that resembled real ones, making it difficult for the discriminator to differentiate between them. Therefore, its adversarial loss can be expressed as:

$${\mathrm{{\cal L}}_{adv}}({{G_2}} )= {\mathrm{\mathbb{E}}_{{{\tilde{y}}_2}\sim {P_{fake}}}}[{ - \log ({D({{{\tilde{y}}_2},\; {{\tilde{y}}_1}} )} )} ]$$
where ${G_2}({\cdot} )$ is generator G2, D$({\cdot} )$ is discriminator D. ${\tilde{y}_1}$ served not only as the input to generator G2 but also as a conditional term. It was concatenated with the images generated by G2 along the channel dimension and then fed into discriminator D. The training objective of generator was to minimize the probability assigned by the discriminator that the generated images were fake.

The discriminator's objective was to accurately distinguish between real and generated images, so its adversarial loss can be expressed as:

$${\mathrm{{\cal L}}_{adv}}(D )= 0.5 \times \{{{\mathrm{\mathbb{E}}_{{y_2}\sim {P_{real}}}}[{ - \log ({D({{y_2},\; {{\tilde{y}}_1}} )} )} ]+ {\mathrm{\mathbb{E}}_{{{\tilde{y}}_2}\sim {P_{fake}}}}[{ - \log ({1 - D({{{\tilde{y}}_2},\; {{\tilde{y}}_1}} )} )} ]} \}$$
where the parameters have the same meanings as in Eq. (8). The objective of discriminator training was to maximize its ability to differentiate between real and generated images.
  • 2) MAD loss: While MSE loss is commonly used and efficient, it may result in overly smooth generated images, leading to a loss of fine details and high-frequency information. To address this issue during Step 2 training and achieve more realistic and detailed images, we opted for MAD loss. This loss helps maintain structural and detail consistency between the reconstructed RGB color image ${\tilde{y}_2}$ and the original RGB color image ${y_2}$. The formula for this loss is expressed as:
$$\mathcal{L}_{MAD}(G_2) = \mathbb{E}_{y_2 \sim P_{data}(Y_2)}\left[ \| G_2(\tilde{y}_1) - y_2 \|_1 \right]$$
where ${||\cdot||_1}$ represents MAD loss.
  • 3) Perceptual loss: When using the cGAN architecture for image reconstruction tasks, the traditional MAD loss can be highly sensitive to noise and minor distortions. This can lead to excessive smoothing, color distortions, and the generation of artifacts, ultimately resulting in distorted textures [30]. To mitigate this issue, we introduced a VGG-16 network pre-trained on the ImageNet dataset to compute a perceptual loss [31]. The perceptual loss in this paper measured the similarity between images by comparing the differences in the deep feature representations of the pre-trained VGG-16 network when provided with the reconstructed image ${\tilde{y}_2}$ and the target image ${y_2}$ as inputs. This helped the reconstructed images retain content and structure more similar to the original images. The formula for this is expressed as:
$$\mathcal{L}_{perceptual}(G_2) = \sum_i \| \phi_i(\tilde{y}_2) - \phi_i(y_2) \|_1$$
where i represents the index of the selected VGG-16 feature layer, ${\phi _i}({\cdot} )$ represents the feature representation of the image at the $i$-th layer, and in this paper, we selected the feature maps output by all pooling layers.
  • 4) Total variation loss (TV loss): In order to increase the spatial smoothness of the reconstructed image and make the picture clearer, we also introduced TV loss [32], which can be expressed as
$$\mathcal{L}_{TV}(G_2) = \| \nabla_x \tilde{y}_2 + \nabla_y \tilde{y}_2 \|_1$$
where ${\nabla _x}$ and ${\nabla _y}$ represent the horizontal and vertical differential operation matrices respectively.
  • 5) Full loss: We combined the loss functions described above through a weighted sum to obtain the complete loss function, represented as:
$${\mathrm{{\cal L}}_{step2}}({{G_2},\; D} )= {\mathrm{{\cal L}}_{adv}}({{G_2},\; D} )+ \alpha {\mathrm{{\cal L}}_{MAD}}({{G_2}} )+ \beta {\mathrm{{\cal L}}_{perceptual}}({{G_2}} )+ \gamma {\mathrm{{\cal L}}_{TV}}({{G_2}} )$$
where α, β, and γ are adjustable weights that are tuned based on the characteristics of the dataset. In this paper, we set α = 10.0, β = 5.0, and γ = 5.0. A code sketch of this combined loss is given after the optimization objective below.

The optimization objective of parameter learning was set as

$$G_2^{\ast} = \arg\mathop{\min}\limits_{G_2}\mathop{\max}\limits_{D} \mathcal{L}_{step2}(G_2, D)$$
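The sketch below, written against the PyTorch API the paper reports using, assembles the Step-2 objective: the patch-level adversarial terms of Eqs. (8)–(9) via BCEWithLogitsLoss, the MAD (L1) term of Eq. (10), a VGG-16 perceptual term in the spirit of Eq. (11), and a total-variation term for Eq. (12), combined with the weights α = 10.0, β = 5.0, γ = 5.0 quoted above. The VGG input preprocessing, the anisotropic TV formulation, and the helper names are our assumptions.

```python
import torch
import torch.nn as nn
import torchvision

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

# Feature extractor for the perceptual loss: VGG-16 pre-trained on ImageNet, frozen.
vgg = torchvision.models.vgg16(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)
POOL_LAYERS = {4, 9, 16, 23, 30}   # indices of the five max-pooling layers in vgg16.features

def vgg_features(x):
    feats, h = [], x
    for i, layer in enumerate(vgg):
        h = layer(h)
        if i in POOL_LAYERS:
            feats.append(h)
    return feats

def tv_loss(y):
    """Total-variation term (anisotropic form): L1 norm of horizontal and vertical differences."""
    dx = y[:, :, :, 1:] - y[:, :, :, :-1]
    dy = y[:, :, 1:, :] - y[:, :, :-1, :]
    return dx.abs().mean() + dy.abs().mean()

def generator_loss(D, y1_rec, y2_fake, y2_real, alpha=10.0, beta=5.0, gamma=5.0):
    """Combined Step-2 loss for G2: adversarial + MAD (L1) + perceptual + TV, Eq. (13)."""
    logits_fake = D(y1_rec, y2_fake)
    adv = bce(logits_fake, torch.ones_like(logits_fake))            # Eq. (8)
    mad = l1(y2_fake, y2_real)                                      # Eq. (10)
    perc = sum(l1(f, r) for f, r in zip(vgg_features(y2_fake),
                                        vgg_features(y2_real)))     # Eq. (11)
    return adv + alpha * mad + beta * perc + gamma * tv_loss(y2_fake)

def discriminator_loss(D, y1_rec, y2_fake, y2_real):
    """Eq. (9): average of the real and fake BCE terms, conditioned on the Step-1 output."""
    logits_real = D(y1_rec, y2_real)
    logits_fake = D(y1_rec, y2_fake.detach())
    return 0.5 * (bce(logits_real, torch.ones_like(logits_real)) +
                  bce(logits_fake, torch.zeros_like(logits_fake)))
```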

4. Experiments and results

4.1 Preparation of dataset and implementation details

Considering the potential development of MMF imaging in the field of clinical medical imaging, to validate the effectiveness and accuracy of the proposed wide-field color imaging method through MMF with single-wavelength illumination, we chose the publicly available peripheral blood cell image (PBC) [33,34] dataset as an example to demonstrate the reconstructed color images.

This dataset comprises 17,092 RGB color cell pathology images with a pixel count of 360 × 363, collected by the CellaVision DM96 analyzer. These images represent eight types of peripheral blood cells (3329 neutrophils, 3117 eosinophils, 1218 basophils, 1214 lymphocytes, 1420 monocytes, 2895 immature granulocytes, 1551 erythroblasts, and 2348 platelets). The high imaging quality and the accuracy of the label classification of this dataset have been confirmed by pathologists. We downsized these cell pathology images using an anti-aliasing interpolation method [35] to a pixel count of 256 × 256, forming the original RGB color images ${y_2}$. Next, we loaded ${y_2}$ onto the DMD in screen projection mode to achieve intensity modulation, and captured the corresponding grayscale speckle pattern formed after passing through the MMF using the camera. We used the open-source image processing library Pillow (Python 3.8) to convert ${y_2}$ into grayscale images ${y_1}$.
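As a concrete illustration of this preparation step, a minimal Pillow sketch is shown below; the choice of the LANCZOS resampler as the anti-aliasing filter and the helper name are our assumptions, since the text only states that an anti-aliasing interpolation method was used.

```python
from PIL import Image

def prepare_targets(path):
    """Load one PBC image, resize it to 256 x 256 with an anti-aliasing resampler,
    and derive the grayscale target y1 used in Step 1 alongside the RGB target y2."""
    y2 = Image.open(path).convert("RGB").resize((256, 256), resample=Image.LANCZOS)
    y1 = y2.convert("L")   # Pillow's luminance conversion (ITU-R 601-2 weighting)
    return y1, y2
```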

It is worth noting that preprocessing is required before feeding the captured speckle patterns into the neural network. The common preprocessing of speckle patterns typically involves cropping the circular speckle from the raw captured images and then compressing it to a pixel count permissible within the computational capacity, often 128 × 128 or 256 × 256 pixels. This approach leaves large, consistently black regions at the corners of the images eventually fed into the neural network, wasting pixels that could otherwise contribute valuable data and causing ineffective computations for the associated neurons. Therefore, in this paper, to maximize the utilization of the detected information and efficiently train the neural networks, our optimized approach further crops the inscribed square of the circular speckle obtained after the initial cropping, followed by compression, thereby obtaining the speckle pattern x. This removes the consistently black regions and retains only information-rich pixels, avoiding the need to convey pixel values devoid of information to the neural network. The comparison of the two methods is shown in Fig. 6. We randomly partitioned the dataset containing paired speckle patterns and original images into training, validation, and test sets following a ratio of 7:1:2.
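The inscribed-square crop can be sketched as follows; the speckle center (cx, cy) and radius are assumed to have been estimated beforehand (for example, by thresholding an averaged speckle image), and the helper name and resampler are hypothetical.

```python
import numpy as np
from PIL import Image

def crop_inscribed_square(raw, cx, cy, radius, out_size=256):
    """Crop the largest square inscribed in the circular speckle (center (cx, cy),
    radius in pixels) and resize it to the network input size."""
    half = int(radius / np.sqrt(2))        # half side length of the inscribed square
    square = raw[cy - half:cy + half, cx - half:cx + half]
    return np.array(Image.fromarray(square).resize((out_size, out_size), Image.LANCZOS))
```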

Fig. 6. (a) The common preprocessing of speckle patterns typically involves cropping circular speckles from the raw images captured and then compressing these circular speckles to a pixel count permissible within the computational capacity. (b) Our optimized approach involves further cropping the inscribed square of the circular speckle obtained after the initial cropping process, followed by compression.

We implemented our network using PyTorch 1.7.0 and Python 3.8 on a system with two NVIDIA RTX 3090 GPUs and CUDA (version 12.1). In the training of Step 1, generator G1 was trained with the Adam optimizer, a batch size of 500, and 100 epochs; the learning rate ${l_1}$ was set to $2 \times {10^{ - 4}}$. In the training of Step 2, to accelerate the training procedure, generator G2 and discriminator D were trained with the Adam optimizer, a batch size of 10, and 80 epochs, where the momentum parameters ${\beta _1}$ and ${\beta _2}$ were set to 0.5 and 0.999, and the learning rate ${l_2}$ was set to $2 \times {10^{ - 5}}$.
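For reference, this optimizer configuration and one Step-2 update can be written as the minimal sketch below, reusing the G1, G2, D modules and loss helpers sketched in Section 3; the Step-1 betas are left at Adam's defaults because the text specifies them only for Step 2.

```python
import torch

# Step 1: G1 trained with MSE against the grayscale targets y1.
mse = torch.nn.MSELoss()
opt_g1 = torch.optim.Adam(G1.parameters(), lr=2e-4)

# Step 2: G2 and D trained adversarially with the combined loss.
opt_g2 = torch.optim.Adam(G2.parameters(), lr=2e-5, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-5, betas=(0.5, 0.999))

def step2_iteration(y1_rec, y2_real):
    """One alternating update of D and G2; y1_rec is the Step-1 output replicated to 3 channels."""
    opt_d.zero_grad()
    discriminator_loss(D, y1_rec, G2(y1_rec), y2_real).backward()
    opt_d.step()

    opt_g2.zero_grad()
    generator_loss(D, y1_rec, G2(y1_rec), y2_real).backward()
    opt_g2.step()
```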

4.2 RGB color image reconstruction results and performance assessment of SpeckleColorNet

The results of reconstructing color RGB images from the grayscale speckle patterns captured by the CCD using the proposed SpeckleColorNet, combined loss, and step-by-step training manner are shown in Fig. 7. Figure 7 presents the comparative results of two sets of reconstructed images. In each set, the first row consists of the grayscale speckle patterns x captured by the CCD, the second row contains the original color images ${y_2}$, the third row displays the grayscale images ${\tilde{y}_1}$ reconstructed in Step 1 using the speckle pattern x as input, and the fourth row showcases the color images ${\tilde{y}_2}$ reconstructed in Step 2 using the reconstructed grayscale image ${\tilde{y}_1}$ as input. It can be observed that, compared to grayscale images, color images exhibit a distinct advantage in readability. Furthermore, the cGAN training in Step 2 enhanced the realism of image details, addressing the loss of detail that occurs when complex images are reconstructed using only a U-Net type network. To demonstrate the advantages of the proposed step-by-step training manner, we also performed one-step color image reconstruction using the speckle patterns x as the neural network input and the color images as the reconstruction target. We used a U-Net type network or a cGAN type network with a structure similar to the G1 or G2&D parts of SpeckleColorNet for one-step color image reconstruction, under training conditions similar to those above; the reconstruction results are shown in the fifth and sixth rows. When using only the U-Net type network for one-step color image reconstruction, the reconstructed images are very blurry and do not meet the imaging requirements at all. When using only the cGAN type network, the significant differences between the speckle pattern x and the target color image ${y_2}$ resulted in training difficulties and mode collapse, making it impossible to achieve image reconstruction.

Fig. 7. Two sets of examples of the speckle-image pair dataset and the reconstructed results. 1) The first and second row respectively display the speckle patterns captured by the CCD and the original color images. 2) The third and fourth row respectively show the grayscale images reconstructed in Step 1 and the color images reconstructed in Step 2 during the SpeckleColorNet step-by-step training process. 3) The fifth and sixth row respectively show the results of one-step color image reconstruction using the U-Net type and cGAN type networks.

To quantitatively demonstrate the ability of SpeckleColorNet to reconstruct color images from grayscale speckle patterns in step-by-step training manner, we also plotted the curves of the loss function with respect to the number of epochs during training, as illustrated in Fig. 8.

Fig. 8. (a) During the training process of Step 1, the change in loss function (MSE) for both the training and validation datasets as a function of epochs. (b) During the training process of Step 2, the change in loss function (MAD) for both the training and validation datasets as a function of epochs.

Table 1 shows the metrics achieved by different neural networks and training methods for color image reconstruction on the test set, including MSE, MAD, PSNR, and SSIM. PSNR and SSIM are commonly used metrics for measuring the similarity between two images. Calculating the PSNR and SSIM between the reconstructed color image ${\tilde{y}_2}$ and the original color image ${y_2}$ allowed us to quantify the capability of the proposed method. The formulas for calculating PSNR and SSIM are as follows:

$$PSNR(\tilde{y}_2, y_2) = 20 \times \log_{10}\frac{L}{\sqrt{MSE(\tilde{y}_2, y_2)}}$$
$$SSIM(\tilde{y}_2, y_2) = \frac{\left( 2\mu_{\tilde{y}_2}\mu_{y_2} + C_1 \right)\left( 2\sigma_{\tilde{y}_2 y_2} + C_2 \right)}{\left( \mu_{\tilde{y}_2}^2 + \mu_{y_2}^2 + C_1 \right)\left( \sigma_{\tilde{y}_2}^2 + \sigma_{y_2}^2 + C_2 \right)}$$
where MSE represents the mean squared error between the reconstructed color images ${\tilde{y}_2}$ and the original color images ${y_2}$, ${\mu _{{{\tilde{y}}_2}}}$ and ${\mu _{{y_2}}}$ represent the means of ${\tilde{y}_2}$ and ${y_2}$, ${\sigma _{{{\tilde{y}}_2}{y_2}}}$ represents the covariance between ${\tilde{y}_2}$ and ${y_2}$, and $\sigma_{\tilde{y}_2}^2$ and $\sigma_{y_2}^2$ represent the variances of ${\tilde{y}_2}$ and ${y_2}$. ${C_1}$ and ${C_2}$ are two parameters defined as ${C_1} = {({{K_1}L})^2}$ and ${C_2} = {({{K_2}L})^2}$, respectively. As is commonly agreed upon, we set ${K_1} = 0.01$ and ${K_2} = 0.03$. Here, L represents the dynamic range of the image pixels, and since 8-bit images were used in this paper, L = 255. From the data, it can be seen that the step-by-step training of SpeckleColorNet outperformed the other methods not only in visual quality but also across all four evaluation metrics.
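The two metrics can be computed in a few lines of NumPy, as sketched below. Note that this is a global, single-window SSIM following Eq. (16) directly; library implementations such as skimage.metrics.structural_similarity average SSIM over local windows and will give slightly different values.

```python
import numpy as np

def psnr(rec, ref, L=255.0):
    """Eq. (15): peak signal-to-noise ratio between two images in the [0, 255] range."""
    mse = np.mean((rec.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return 20.0 * np.log10(L / np.sqrt(mse))

def ssim_global(rec, ref, L=255.0, K1=0.01, K2=0.03):
    """Eq. (16) evaluated over the whole image (single window)."""
    x = rec.astype(np.float64)
    y = ref.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```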

Table 1. Comparison of metrics for color image reconstruction results using different methods

4.3 Downstream task: comparison of classification results of original and reconstructed color images

As MMF imaging holds promise for applications in fields such as clinical medicine and industrial monitoring, where imaging authenticity is crucial, the accuracy of the color images reconstructed from grayscale speckle patterns is of paramount importance in our research. In this paper, we validated the reconstruction results by completing a downstream task. Specifically, we employed a neural network with a simple architecture, as depicted in Fig. 9, to classify the original and the corresponding reconstructed color images of eight different types of cells from the PBC dataset, respectively.

Fig. 9. The neural network architecture used for classifying original color images and the corresponding reconstructed color images of different types from PBC dataset, respectively.

The classification results are presented in Fig. 10. From the results, it can be observed that the classification accuracy of reconstructed color images was slightly lower than that of the original color images. However, the classification accuracy for each sub-class remained above 90%. This indicated that the reconstructed results preserved the majority of image features and demonstrated good reconstruction accuracy.

Fig. 10. Normalized confusion matrix for classification of original color images and the corresponding reconstructed color images of different types from PBC dataset.

5. Discussion

The SpeckleColorNet neural network, trained with the two-step method and combined loss function proposed here, enables color imaging in monochromatic illumination MMF systems in a plug-and-play manner. Experimental results showed that this “colorization” approach reconstructed color images with high readability, clarity, detail richness, and accuracy. It's noteworthy that wide-field imaging through MMF with deep learning typically employed a one-step training approach. This entails employing convolutional neural networks, like U-Net, to directly reconstruct original images from speckle patterns in an end-to-end approach. As shown in this paper, with this traditional method, image clarity and the ability to reconstruct details were often insufficient for reconstructing complex images. The step-by-step training method, combining U-Net type networks and cGAN type networks as mentioned here, further enhanced image clarity and detail representation compared to U-Net type networks alone. Additionally, it helps mitigate the mode collapse issue that cGAN networks may face when directly translating images between two significantly different sets, like reconstructing original images from speckle patterns. Thus, the method proposed here is applicable not only for color imaging with single-wavelength illumination but also shows potential for significantly enhancing the quality of reconstructed images. This suggests its wide applicability in various fields, such as edge detection [36] and hyperspectral imaging [37].

Another thing worth mentioning is that the process of collecting the speckle dataset from the optical system in this paper took approximately 1.5 hours. Additionally, the imaging setup was placed in a normal room temperature environment without any additional temperature stabilization or vibration isolation. High-quality reconstructed color images demonstrated the robustness of the optical imaging system and image reconstruction algorithm established in this paper during the extended collection process. This suggested the practicality of the proposed method in clinical and industrial applications.

6. Conclusion

In summary, we proposed a method for wide-field color imaging through MMF using single-wavelength illumination. The method employed a data-driven approach for color imaging without modifying the traditional MMF grayscale imaging system, ensuring plug-and-play functionality. Leveraging the linear mapping invariance between MMF input and output light fields, along with semantic information from input images, we designed SpeckleColorNet, a neural network combining U-Net and cGAN architectures. Additionally, we designed a combined loss function and employed a step-by-step training approach. Experimental results with the PBC dataset and a traditional MMF grayscale imaging optical system showed that our plug-and-play color imaging method markedly enhanced readability, clarity, detail richness, and imaging accuracy compared to monochrome imaging methods. Furthermore, we demonstrated the superior imaging quality of our method compared to other deep learning methods. To validate our color image reconstruction method, we conducted downstream color cell image classification. Results indicated that the classification accuracy of reconstructed color images was slightly lower than that of the original ones. However, sub-class accuracy remained above 90%, suggesting that the color image reconstruction method, based on colorization techniques, preserved most image features with good accuracy. This method contributes to MMF application in fields like clinical medicine and industrial monitoring, demanding high-fidelity color imaging. Compared to multi-wavelength illumination for color imaging, it presents advantages of lower cost, reduced system complexity, easier deployment, and a smaller overall system footprint. It holds promising prospects for various applications.

Funding

National Natural Science Foundation of China (62075113, 62122040).

Acknowledgments

The authors would like to thank Hufei Duan, Yonghong He from Tsinghua University for the valuable suggestions.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. Keiser, F. Xiong, Y. Cui, et al., “Review of diverse optical fibers used in biomedical research and clinical practice,” J. Biomed. Opt 19(8), 080902 (2014). [CrossRef]  

2. Z. Yi, Z. Jiang, J. Huang, et al., “Optimization method of the installation direction of industrial endoscopes for increasing the imaged burden surface area in blast furnaces,” IEEE Trans. Ind. Inf. 18(11), 7729–7740 (2022). [CrossRef]  

3. A. C. Caputo, Digital video surveillance and security (Elsevier, 2014), Chap. 4.

4. M. B. Stuart, A. J. McGonigle, and J. R. Willmott, “Hyperspectral imaging in environmental monitoring: A review of recent developments and technological advances in compact field deployable systems,” Sensors 19(14), 3071 (2019). [CrossRef]  

5. Y. Choi, C. Yoon, M. Kim, et al., “Scanner-free and wide-field endoscopic imaging by using a single multimode optical fiber,” Phys. Rev. Lett. 109(20), 203901 (2012). [CrossRef]  

6. L. G. Wright, Z. M. Ziegler, P. M. Lushnikov, et al., “Multimode nonlinear fiber optics: massively parallel numerical solver, tutorial, and outlook,” IEEE J. Select. Topics Quantum Electron. 24(3), 1–16 (2018). [CrossRef]  

7. S.-Y. Lee, V. J. Parot, B. E. Bouma, et al., “Efficient dispersion modeling in optical multimode fiber,” Light: Sci. Appl. 12(1), 31 (2023). [CrossRef]  

8. S. Yoon, M. Kim, M. Jang, et al., “Deep optical imaging within complex scattering media,” Nat. Rev. Phys. 2(3), 141–158 (2020). [CrossRef]  

9. G. Wetzstein, A. Ozcan, S. Gigan, et al., “Inference in artificial intelligence with deep optics and photonics,” Nature 588(7836), 39–47 (2020). [CrossRef]  

10. S. Li, S. A. Horsley, T. Tyc, et al., “Memory effect assisted imaging through multimode optical fibres,” Nat. Commun. 12(1), 3751 (2021). [CrossRef]  

11. S. Li, C. Saunders, D. J. Lum, et al., “Compressively sampling the optical transmission matrix of a multimode fibre,” Light: Sci. Appl. 10(1), 88 (2021). [CrossRef]  

12. N. Borhani, E. Kakkava, C. Moser, et al., “Learning to see through multimode fibers,” Optica 5(8), 960–966 (2018). [CrossRef]  

13. X. Hu, J. Zhao, J. E. Antonio-Lopez, et al., “Unsupervised full-color cellular image reconstruction through disordered optical fiber,” Light: Sci. Appl. 12(1), 125 (2023). [CrossRef]  

14. O. Yélamos, R. Garcia, B. D’Alessandro, et al., “Understanding Color,” in Photography in Clinical Medicine, P. Pasquali, (Springer Nature), 99–113 (2020).

15. M. Abd Al Rahman and A. Mousavi, “A review and analysis of automatic optical inspection and quality monitoring methods in electronics industry,” IEEE Access 8, 183192 (2020). [CrossRef]

16. D. Wu, L. Qin, J. Luo, et al., “Delivering targeted color light through a multimode fiber by field synthesis,” Opt. Express 28(13), 19700–19710 (2020). [CrossRef]  

17. S. Ohayon, A. Caravaca-Aguirre, R. Piestun, et al., “Minimally invasive multimode optical fiber microendoscope for deep brain fluorescence imaging,” Biomed. Opt. Express 9(4), 1492–1509 (2018). [CrossRef]  

18. N. Shabairou, E. Cohen, O. Wagner, et al., “Color image identification and reconstruction using artificial neural networks on multimode fiber images: Towards an all-optical design,” Opt. Lett. 43(22), 5603–5606 (2018). [CrossRef]  

19. D. Liao, Y. Qian, J. Zhou, et al., “A manifold alignment approach for hyperspectral image visualization with natural color,” IEEE Trans. Geosci. Remote Sensing 54(6), 3151–3162 (2016). [CrossRef]  

20. R. Lukac and K. N. Plataniotis, Color image processing: methods and applications (CRC press, 2018), Chap. 10.

21. F. Brahim, S. Hamid, A. Herman, et al., (Springer Nature, 2013), pp. 27–36.

22. S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Let there be color! joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification,” ACM Trans. Graph. 35(4), 1–11 (2016). [CrossRef]  

23. S. Yoo, H. Bahng, S. Chung, et al., “Coloring with limited data: Few-shot colorization via memory augmented networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (IEEE, 2019), pp. 11283–11292.

24. A. Popowicz and B. Smolka, “Overview of grayscale image colorization techniques,” in Color Image and Video Enhancement, M. E. Celebi, (Springer Nature2015), pp. 345–370.

25. I. Žeger, S. Grgic, J. Vuković, et al., “Grayscale image colorization methods: Overview and evaluation,” IEEE Access 9, 113326–113346 (2021). [CrossRef]  

26. J. W. Goodman, Introduction to Fourier optics (Roberts and Company publishers, 2005), Chap. 3.

27. M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arxiv, arXiv:1411.1784 (2021). [CrossRef]  

28. P. Isola, J.-Y. Zhu, T. Zhou, et al., “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 1125–1134.

29. T. Miyato, T. Kataoka, M. Koyama, et al., “Spectral normalization for generative adversarial networks,” arxiv, arXiv:1802.05957 (2017). [CrossRef]  

30. T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, et al., “High-resolution image synthesis and semantic manipulation with conditional gans,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (IEEE, 2018), pp. 8798–8807.

31. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arxiv, arXiv:1409.1556 (2018). [CrossRef]  

32. H. A. Aly and E. Dubois, “Image up-sampling using total-variation regularization with a new observation model,” IEEE Trans. on Image Process. 14(10), 1647–1659 (2005). [CrossRef]  

33. A. Acevedo, S. Alférez, A. Merino, et al., “Recognition of peripheral blood cell images using convolutional neural networks,” Comput. Methods Programs Biomed. 180, 105020 (2019). [CrossRef]  

34. A. Acevedo, A. Merino, S. Alférez, et al., “A dataset of microscopic peripheral blood cell images for development of automatic recognition systems,” Data Brief 30 (2020). [CrossRef]

35. R. Szeliski, Computer vision: algorithms and applications (Springer Nature, 2022), Chap. 3.

36. G. Wu, Z. Song, M. Hao, et al., “Edge detection in single multimode fiber imaging based on deep learning,” Opt. Express 30(17), 30718–30726 (2022). [CrossRef]  

37. U. Kürüm, P. R. Wiecha, R. French, et al., “Deep learning enabled real time speckle recognition and hyperspectral imaging using a multimode fiber array,” Opt. Express 27(15), 20965–20979 (2019). [CrossRef]  
