
Low sampling high quality image reconstruction and segmentation based on array network ghost imaging


Abstract

High-quality imaging at low sampling times is an important step toward the practical application of computational ghost imaging (CGI). At present, the combination of CGI and deep learning has achieved promising results. However, to the best of our knowledge, most researchers focus on single-pixel CGI based on deep learning, and the combination of array-detection CGI and deep learning, which offers higher imaging performance, has not been reported. In this work, we propose a novel multi-task CGI detection method based on deep learning and an array detector, which can directly extract target features from one-dimensional bucket detection signals at low sampling times and, in particular, output high-quality reconstruction and image-free segmentation results at the same time. By binarizing the trained floating-point spatial light field and fine-tuning the network, the method enables fast light-field modulation on devices such as the digital micromirror device and thereby improves imaging efficiency. Meanwhile, the problem of partial information loss in the reconstructed image caused by the gaps between detection units of the array detector is also solved. Simulation and experimental results show that our method can simultaneously obtain high-quality reconstructed and segmented images at a sampling rate of 0.78%. Even when the signal-to-noise ratio of the bucket signal is 15 dB, the details of the output image remain clear. This method helps to improve the applicability of CGI and can be applied to resource-constrained multi-task detection scenarios such as real-time detection, semantic segmentation, and object recognition.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The second-order correlation in ghost imaging (GI) was introduced by Hanbury Brown and Twiss to measure the angular size of stars, namely the HBT effect [1,2]. Subsequently, Shih et al. [3,4] obtained the first experimental results of entangled two-photon GI based on the HBT effect. Abbas et al. [5,6] extended the HBT effect from the spatial to the spatiotemporal domain and gave generalized analytical results for the spatiotemporal HBT effect and the spatiotemporally coupled HBT effect. They then demonstrated spatiotemporal ghost imaging and interference (STGII) of spatiotemporal dynamic objects (STDO) by extending the theory of optical coherence to the spatiotemporal domain [7]. GI is an imaging technique that obtains target information by correlating a one-dimensional measurement signal with a two-dimensional light-field distribution, where the one-dimensional signal is the overall statistical measurement produced by the interaction between the target and the two-dimensional light field. The two-dimensional light field differs from one measurement to the next and can be random or follow a preset design such as a Hadamard or Fourier matrix. Two-dimensional light fields can be generated by rotating ground glass, a digital micromirror device (DMD), a spatial light modulator (SLM), LED arrays, various machined modulators, etc. Owing to its unique imaging mechanism, high sensitivity, strong noise immunity, and other advantages, GI can make up for the shortcomings of traditional optical imaging. It has therefore attracted extensive attention in research and application fields and has achieved notable results in lidar imaging [8–10], terahertz imaging [11–13], X-ray imaging [14–17], spectral imaging [18–20], microscopic imaging [21–25], etc. However, it is difficult to achieve fast, high-quality GI. The main limitation is the finite modulation rate of spatial light modulation devices: the higher the desired image quality and the larger the number of pixels, the more measurements are required.

At present, on the premise of ensuring imaging quality, the imaging speed of GI can be improved mainly in three ways: increasing the modulation rate of the spatial light modulation device, improving the reconstruction algorithm, and optimizing the spatial light field. First, rotating ground glass, spatial light modulators, digital projectors, and digital micromirror devices are the commonly used spatial light modulation devices; among them, the DMD offers a comparatively high modulation rate of up to 22 kHz. In recent years, some new spatial light modulation methods for GI have been reported. The LED array developed by Sun et al. [26] can modulate the spatial light field at up to 12.5 MHz and achieve high-speed GI at 25,000 fps with $32\times 32$ resolution. However, with current technology it is difficult for such a modulator to achieve fast modulation at small device size and high image resolution. The optical phased array proposed by Liu et al. [27] can further increase the modulation rate of the spatial light field, theoretically up to the GHz range, but optical fiber arrays with high spatial resolution remain difficult to realize, and a large amount of data must be stored and processed under high-speed acquisition. Second, reconstruction methods such as differential GI, compressed sensing, and deep learning have greatly improved the imaging performance of GI and greatly reduced the number of measurements required while preserving imaging quality. Differential and compressed-sensing methods have improved considerably, but because of resource and time consumption it is still difficult for them to achieve fast imaging at low sampling. Deep learning, by contrast, has shown remarkable capability: trained on suitable datasets, it can dramatically reduce both the number of measurements and the image reconstruction time.

Third, the design and optimization of the spatial light field has proven surprisingly effective. In terms of spatial light field optimization, researchers have focused on random light field optimization [28–31] and the ordering of orthogonal light fields [32–35]. Zhou [36] proposed a multi-scale random speckle optimization method that obtains high-quality images at low sampling times. Subsequently, Zhang [37] proposed singular value decomposition compressed ghost imaging, which also achieved high-quality imaging, and Sun [32] sorted and optimized the Hadamard orthogonal matrix to further improve imaging quality. In recent years, researchers have combined machine learning with CGI to obtain high-quality images at low sampling times [38–41]. Hu et al. [42] proposed a modulation light field optimization method based on dictionary learning, which greatly improved the imaging quality of specific targets. Higham et al. [43] proposed a deep convolutional auto-encoder network that achieves high-quality imaging at low sampling times by training binary weights in the encoder for target modulation. Subsequently, physics-enhanced deep learning for CGI was proposed by Wang [44], which uses physical prior knowledge to guide the training of the network and proves superior to several other widely used CGI algorithms in terms of robustness and fidelity. Chen et al. [45] then introduced physics enhancement into hyperspectral ghost imaging and proposed an end-to-end V-DUnet method for obtaining 3D hyperspectral images in GI via sparsity constraints (GISC spectral camera), where the differential GI results and the measured values are used as network inputs to realize rapid reconstruction of high-quality 3D hyperspectral images. However, these studies mainly focus on using deep learning to optimize the quality of CGI reconstructed images based on a single-pixel detector; parallel multi-task CGI detection with a multi-pixel detector has not been addressed.

In this work, we report a technique for multi-task CGI detection based on an array detector (multi-pixel detector) and an array neural network (Array-Net). We embed array CGI into the deep learning framework: through the automatic design and optimization of the array spatial light field, target features are extracted directly from the one-dimensional signal at extremely low sampling times, so that high-quality reconstruction and image-free segmentation are achieved simultaneously. To improve the imaging speed, we binarize the trained floating-point spatial light field and compensate the binary spatial light field through fine-tuning of the Array-Net. On the premise of high-quality CGI reconstruction, fast light-field modulation is realized on modulation devices such as the DMD. In addition, the technique scales readily to array detectors with any number of detection units. Numerical simulations and experiments confirm the effectiveness and superiority of our method for parallel multi-task CGI detection at low sampling times. It is helpful to promote the development of CGI in real-time detection, semantic segmentation, target recognition, and other resource-constrained multi-task detection fields.

2. Methods

2.1 Array detection CGI reconstruction method

In an array-detection CGI system, the transmission coefficient of the target object is denoted as $T(x,y)$ (of size $r\times c$), and the N modulated sub-light fields are represented as $A^{m}_{n}(x, y)$, where $m$ ($m=1,2,\ldots,M$) indexes the measurements, $n$ ($n=1,2,\ldots,N$) indexes the sub-light fields, and $x=1,2,3,\ldots,r$, $y=1,2,3,\ldots,c$. The transmitted beam carrying the target information $T(x,y)$ is modulated by the N sub-light fields $A^{m}_{n}(x, y)$, and the modulated echo signals are collected by the multi-pixel detector. The bucket detection signal collected by each detection unit of the multi-pixel detector is $B^{m}_{n}=\sum _{x=1}^r\sum _{y=1}^c A^{m}_{n}(x,y)T_{n}(x,y)$, where $T_{n}(x,y)$ is the target area corresponding to each sub-light field. The target object can then be recovered from $A^{m}_{n}$ and $B^{m}_{n}$:

$$O_{n}(x,y) = \langle B^{m}_{n}A^{m}_{n}{(x,y)}\rangle - \langle B^{m}_{n}\rangle \langle A^{m}_{n}{(x,y)}\rangle,$$
where, $m$ is the measurement index, $m=1,2,3,\ldots,M$, $\langle \cdot \rangle =\frac {1}{M}\sum _{m=1}^M(\cdot )$, $n$ indexes the sub-light fields, $n=1,2,\ldots,N$, and $O(x,y)$ is the entire target image, which is composed of the multiple sub-images $O_{n}(x,y)$.
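For clarity, the following is a minimal NumPy sketch of the correlation reconstruction in Eq. (1) for one sub-light field; the array shapes, the random binary patterns, and the simulated target are illustrative assumptions rather than the authors' code.

import numpy as np

def reconstruct_sub_image(A_n, B_n):
    # A_n: (M, r, c) sub-light fields; B_n: (M,) bucket detection signals
    M = B_n.shape[0]
    corr = np.tensordot(B_n, A_n, axes=(0, 0)) / M    # <B_n^m A_n^m(x, y)>
    return corr - B_n.mean() * A_n.mean(axis=0)       # minus <B_n^m><A_n^m(x, y)>

# simulated example: one 64x64 sub-target measured with 512 random binary patterns
rng = np.random.default_rng(0)
T_n = rng.random((64, 64))
A_n = rng.integers(0, 2, size=(512, 64, 64)).astype(float)
B_n = np.einsum('mxy,xy->m', A_n, T_n)                # bucket values B_n^m
O_n = reconstruct_sub_image(A_n, B_n)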

2.2 Array-Net imaging and image-free segmentation method

The light field of CGI is usually designed and optimized manually, and it is difficult to obtain a light field that yields high-quality imaging at low sampling times. Therefore, we embed array CGI into the neural network as a network layer, referred to as the physical layer, whose function is to simulate the imaging process of array CGI. The light field can then be designed and optimized automatically by the powerful information-processing capability of deep learning. The schematic diagram of the array neural network imaging and segmentation method is shown in Fig. 1, which illustrates the joint learning of array CGI and the neural network during training. The light beam generated by the light source is modulated by the spatial light field A after passing through the object. All the echo signals are then measured by the array detector and converted into bucket detection signals. The reconstructed images of the multiple sub-targets are obtained by the physical-layer CGI reconstruction formula (1) and then fed into the Array-Net to obtain the reconstructed and segmented images.

Fig. 1. Schematic diagram of array neural network imaging and segmentation method. It illustrates the joint learning and training process of array CGI and neural network.

When training the Array-Net, the loss values of the reconstructed image and the segmented image output by the network are calculated separately. A gradient-based optimization algorithm then adjusts the element values of the CGI spatial light field matrix A according to the loss value, thereby realizing automatic design and optimization of the spatial light field matrix. With the trained network, high-quality reconstruction and segmentation results of array CGI can be obtained simultaneously at low sampling times.
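One way to realize such a trainable physical layer is sketched below in PyTorch: the spatial light field A is held as an nn.Parameter, and the forward pass simulates the bucket measurement and the Eq. (1) correlation so that gradients of the loss can flow back into A. The class name, pattern count, and image size are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class PhysicalLayer(nn.Module):
    def __init__(self, num_patterns=512, h=128, w=128):
        super().__init__()
        # trainable floating-point spatial light field, one pattern per measurement
        self.A = nn.Parameter(0.01 * torch.randn(num_patterns, h, w))

    def forward(self, target):                                  # target: (batch, h, w)
        # simulated bucket signals: B^m = sum_xy A^m(x, y) T(x, y)
        B = torch.einsum('mxy,bxy->bm', self.A, target)
        # differential correlation reconstruction of Eq. (1)
        O = torch.einsum('bm,mxy->bxy', B, self.A) / self.A.shape[0]
        O = O - B.mean(dim=1, keepdim=True)[..., None] * self.A.mean(dim=0)
        return O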

2.3 Network architecture and training

To overcome the limitation of traditional imaging, in which the image must be formed first and only then segmented, we designed an array neural network whose structure is shown in Fig. 2. The network has a W-shaped structure and mainly consists of the physical layer, a compensation module, an encoding module, a residual module, a reconstruction network, and a segmentation network. The physical layer simulates the CGI imaging process and is mainly composed of a CGI encoding module, a CGI decoding module, and a fusion module; the fusion module merges the physically reconstructed sub-images of the multiple sub-light fields into one complete detection image. The compensation module compensates the reconstructed image obtained through physical knowledge; the specific details are explained later. The encoding module encodes the input image, extracts image features, and converts low-dimensional information into high-dimensional features. It mainly includes convolution layers, batch normalization layers, and activation functions; the convolution layers use $7\times 7$ and $3\times 3$ kernels, and the activation function is ReLU. The residual module can include multiple residual blocks (ResBlocks), each of which consists of two convolution layers. The reconstruction network decodes the high-dimensional features output by the encoding module back into low-dimensional features and reconstructs the target image; it also includes convolution layers, batch normalization layers, and activation functions, with convolution layers and activation functions similar to those of the encoding module. The segmentation network is connected to the encoding module by skip connections, similar to the U-Net [46]. This branch fuses features of different scales and resolutions, that is, pixel-level and semantic-level features, which enables pixel-level semantic segmentation and high-quality segmentation results. The segmentation network is mainly composed of upsampling layers, convolution layers, batch normalization layers, and activation functions; specifically, the upsampling layer is implemented with an upsampling function, the convolution layers use $7\times 7$ and $3\times 3$ kernels, and the activation function is ReLU.

Fig. 2. Array-Net. The Array-Net can realize the reconstruction and segmentation of the target at the same time. The left branch performs image reconstruction, and the other branch performs image segmentation.

However, in the experiment, the data obtained by CGI are bucket detection signals, which cannot be directly fed into the network as an input image. The Array-Net therefore needs different architectures for training and testing. Fig. 2 shows the training architecture, which has eight parts: the input image, a physical layer, a compensation module, an encoder, a residual module, a reconstruction network, a segmentation network, and the output images. For the network used in test mode, the input image and the CGI encoder in the physical layer are removed, and the bucket detection signals are used directly as the network input. The network can then learn target features from one-dimensional bucket detection signals and output reconstructed and segmented images in parallel.
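A skeleton of this dual-mode behaviour is sketched below; the real sub-networks are replaced by placeholders, and the module names are assumptions made only to show how training (image in, measurement simulated) and testing (measured bucket signals in) can share the same decoding path.

import torch.nn as nn

class ArrayNetSkeleton(nn.Module):
    def __init__(self):
        super().__init__()
        self.cgi_encoder = nn.Identity()    # physical layer: image -> bucket signals
        self.cgi_decoder = nn.Identity()    # Eq. (1) reconstruction + sub-image fusion
        self.compensation = nn.Identity()
        self.backbone = nn.Identity()       # encoding module + residual blocks
        self.recon_head = nn.Identity()     # reconstruction network
        self.seg_head = nn.Identity()       # segmentation network

    def forward(self, image=None, buckets=None):
        if buckets is None:                 # training mode: simulate the measurement
            buckets = self.cgi_encoder(image)
        x = self.compensation(self.cgi_decoder(buckets))
        feats = self.backbone(x)
        return self.recon_head(feats), self.seg_head(feats)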

As is well known, a loss function generally evaluates only a single output and cannot handle multiple outputs. For example, common loss functions such as the MAE (mean absolute error), MSE (mean squared error), and binary cross-entropy [47–49] are only valid for one training result and cannot satisfy the task of training with multiple output results. To train the Array-Net, we therefore propose a weighted loss function, defined as:

$$L = \alpha L_{mse} + \beta L_{dice},$$
where, $L$ represents the total loss function of the Array-Net. $L_{mse}$ is the loss function of the image reconstruction network, for which the mean squared error loss is generally used, and $L_{dice}$ is the loss function of the image segmentation network, for which the Dice loss is generally used. $\alpha$ and $\beta$ are the weights of the reconstruction and segmentation networks, respectively, with $\alpha + \beta =1$. Since the reconstructed image carries more detailed information than the segmented image, the weight of the reconstruction network is set larger than that of the segmentation network; the values of $\alpha$ and $\beta$ are chosen as 0.99 and 0.01. The initial learning rate $r$ and the parameters $\beta _{1}$ and $\beta _{2}$ of the Adam optimizer are set to 0.0002, 0.5, and 0.999, respectively. All training tasks were completed on a workstation (Intel Xeon CPU and one Nvidia GeForce 2080 Ti GPU).
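A hedged PyTorch sketch of the weighted loss in Eq. (2) with the stated weights is given below; the soft-Dice formulation and the tensor layout (batch, channel, height, width) are assumptions, not the authors' exact code.

import torch
import torch.nn.functional as F

def dice_loss(seg_logits, seg_mask, eps=1e-6):
    pred = torch.sigmoid(seg_logits)
    inter = (pred * seg_mask).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + seg_mask.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def array_net_loss(recon, target_img, seg_logits, seg_mask, alpha=0.99, beta=0.01):
    # Eq. (2): L = alpha * L_mse + beta * L_dice
    return alpha * F.mse_loss(recon, target_img) + beta * dice_loss(seg_logits, seg_mask)

# Adam settings quoted in the text:
# optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.5, 0.999))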

The parameters of the convolution kernels in a neural network are generally floating-point matrices. However, floating-point matrices are difficult to load onto the DMD, which limits the imaging speed of CGI. To improve the imaging speed, we binarize the trained floating-point spatial light field matrix. The binarization is expressed as:

$$A^{\prime} = \mathrm{sign}(A) = \begin{cases} 1, & A \ge 0\\ -1, & \text{otherwise} \end{cases}$$
where, $A^{\prime }$ represents a binary matrix whose elements are only +1 and -1.
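In code, Eq. (3) is a simple element-wise operation; a torch.where is used instead of torch.sign so that zero-valued elements map to +1, exactly as defined above.

import torch

def binarize(A):
    # Eq. (3): +1 where A >= 0, -1 otherwise
    return torch.where(A >= 0, torch.ones_like(A), -torch.ones_like(A))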

We can observe that the binary matrix $A^{\prime }$ is not equal to $A$. Therefore, when the binary matrix $A^{\prime }$ is used as the spatial light field for CGI imaging, the quality of the reconstruction and segmentation results output by the Array-Net is low. To solve this problem, we compensate the binary matrix $A^{\prime }$ to make it as close as possible to the floating-point light field matrix $A$. The error between the binary light field matrix $A^{\prime }$ and the floating-point light field matrix $A$ can be defined as:

$$A = A^{\prime} + \varepsilon,$$
where, $A$ is the floating-point spatial light field matrix trained by the Array-Net, and $A^{\prime }$ is the binary matrix obtained after the binarization function sign. $\varepsilon$ is the error matrix.

Thus, equation (1) can be rewritten as:

$$ \begin{aligned} O_n(x, y) & =\left\langle B_n^m A_n^m(x, y)\right\rangle-\left\langle B_n^m\right\rangle\left\langle A_n^m(x, y)\right\rangle \\ & =\left\langle B_n^m\left(A_n^m(x, y)^{\prime}+\varepsilon_n^m(x, y)\right)\right\rangle-\left\langle B_n^m\right\rangle\left\langle A_n^m(x, y)^{\prime}+\varepsilon_n^m(x, y)\right\rangle \\ & =\left\langle B_n^m A_n^m(x, y)^{\prime}\right\rangle-\left\langle B_n^m\right\rangle\left\langle A_n^m(x, y)^{\prime}\right\rangle+\left\langle B_n^m \varepsilon_n^m(x, y)\right\rangle-\left\langle B_n^m\right\rangle\left\langle\varepsilon_n^m(x, y)\right\rangle \\ & =O_n(x, y)^{\prime}+O_n(x, y)_{error} \end{aligned} $$
where, $O_{n}(x,y)$ is the $n$th reconstructed sub-image of CGI with the floating-point spatial light field $A$, $O_{n}(x,y)^{\prime }$ is the CGI reconstructed image based on the binary spatial light field $A^{\prime }$, and $O_{n}(x,y)_{error}$ is the error image between $O_{n}(x,y)$ and $O_{n}(x,y)^{\prime }$.

It can be found from Eq. (5) that there is a large error in the reconstructed image obtained by using the binary matrix $A^{\prime }$ for CGI, which reduces the output image quality of the Array-Net. To improve the output image quality, we use a compensation module to compensate the reconstructed image $O_{n}(x,y)^{\prime }$ and fine-tune the network: only the compensation module is trained, while the other weight parameters of the network are fixed. The compensation module, shown in Fig. 2, is mainly composed of $1\times 1$ convolution layers, batch normalization layers, activation functions, and residual blocks.
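One common way to implement this fine-tuning step is sketched below, assuming the trained Array-Net is held in a variable model and its compensation module is registered under the name compensation (both assumptions): every other parameter is frozen and only the compensation module is updated.

import torch

for name, param in model.named_parameters():
    param.requires_grad = name.startswith('compensation')   # train compensation only

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=2e-4, betas=(0.5, 0.999))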

2.4 Performance evaluation

To objectively confirm the effectiveness of our method, we use the peak signal-to-noise ratio (PSNR) [50] and the Dice coefficient (DICE) [51] to evaluate the quality of the output results. The PSNR measures the similarity between the target image and the reconstructed image.

$$PSNR = 10 \times log_{10}\left[\frac{maxVal^{2}}{MSE}\right],$$
where, $maxVal$ is the maximum possible pixel value of the image. The $MSE$ is expressed as:
$$MSE=\frac{1}{r\times c}\sum_{x=1}^r\sum_{y=1}^c\left[T(x,y)-O(x,y)\right]^{2},$$
where, $T(x,y)$ represents the target image ($r \times c$ pixels), and $O(x,y)$ denotes the reconstructed image.
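A direct NumPy implementation of Eqs. (6) and (7) is given below; maxVal is 1.0 for images normalized to [0, 1] (or 255 for 8-bit images).

import numpy as np

def psnr(target, recon, max_val=1.0):
    mse = np.mean((target - recon) ** 2)        # Eq. (7)
    return 10.0 * np.log10(max_val ** 2 / mse)  # Eq. (6)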

The DICE is a commonly used evaluation criterion in segmentation tasks. It is a set-similarity measure that quantifies the overlap between two samples; its value ranges from 0 to 1, and the closer the value is to 1, the better the segmentation result. The DICE is defined as:

$$DICE = \frac{2\left| S_{T}\cap S_{P} \right|}{\left|S_{T}\right|+\left|S_{P}\right|},$$
where, $S_{T}$ is the ground truth area, and $S_{P}$ is the prediction area.
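The corresponding computation of Eq. (8) for binary masks is equally short; the small eps only guards against empty masks.

import numpy as np

def dice(gt_mask, pred_mask, eps=1e-6):
    inter = np.logical_and(gt_mask, pred_mask).sum()
    return (2.0 * inter + eps) / (gt_mask.sum() + pred_mask.sum() + eps)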

3. Results

3.1 Numerical simulation results

In order to demonstrate that our method can perform multi-task CGI detection at low sampling times and realize parallel image reconstruction and segmentation, numerical simulations of CGI with detectors having different numbers of pixels are conducted.

Numerical simulation of CGI with a single-pixel detector. We used 1280 images from the WBC (White Blood Cell) dataset [52] as the training set and 320 images as the test set. To increase the diversity of the WBC dataset, we randomly flipped and rotated the images. All dataset images ($120 \times 120$-pixel color images) were converted to grayscale and resized to $128 \times 128$ pixels. In addition, since the ground-truth segmentation results in the WBC dataset are three-category images, which are not suitable for our network, we converted them into binary (two-category) images.
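The preprocessing described above can be expressed, for example, with torchvision transforms as sketched below; the foreground threshold used to collapse the three-category mask to a binary mask is an assumption, and in practice the same geometric augmentation must be applied jointly to the image and its mask.

import numpy as np
from torchvision import transforms

img_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),   # color -> grayscale
    transforms.Resize((128, 128)),                 # 120x120 -> 128x128
    transforms.RandomHorizontalFlip(),             # augmentation: random flips
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(90),                 # augmentation: random rotation
    transforms.ToTensor(),
])

def to_binary_mask(mask_3class):
    # treat every non-background label as foreground
    return (np.asarray(mask_3class) > 0).astype(np.float32)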

Firstly, we trained the Array-Net with the training set, then binarized the optimized floating-point spatial light field, and finally re-trained the Array-Net to obtain the binary spatial light field matrix (dimensions $512 \times 16384$). The total number of training epochs for CGI with the single-pixel detector is 80 (40 epochs for the floating-point spatial light modulation matrix and 40 epochs of fine-tuning), and training the Array-Net took about 2 hours. To illustrate the effectiveness of our binarization fine-tuning method for the floating-point spatial light field, we conducted simulation experiments using the floating-point spatial light field and the binary spatial light field, respectively. The resolution of the target image is $128 \times 128$ pixels, and the number of measurements is 512. Fig. 3 shows the numerical simulation results of CGI with the single-pixel detector using the floating-point spatial light field (Fig. 3(a)) and the binary spatial light field (Fig. 3(b)). The PSNR and DICE values in Fig. 3(a) are 28.897 dB and 0.944, respectively, and those in Fig. 3(b) are 28.194 dB and 0.938. The PSNR and DICE values are very close, which confirms the effectiveness of our binarization fine-tuning method for the floating-point spatial light field.

Fig. 3. The numerical simulation results of CGI with single-pixel detector using floating-point spatial light field and binary spatial light field respectively. The number of measurements for CGI is 512. (a) is the numerical simulation result of CGI based on the floating-point spatial light field. (b) is the numerical simulation result of CGI with the binary spatial light field. The real target image is the ground truth target image. The reconstructed image is the output of the reconstruction network in the Array-Net. And the segmented image is the output of the segmentation network in the Array-Net.

Next, to demonstrate the generalization of our method, we performed numerical simulations of multi-task CGI detection with the single-pixel detector using different target images, as shown in Fig. 4. The binary speckle patterns used (size $512\times 16384$) were obtained by the binarization fine-tuning training of the spatial light field matrix. It can be observed that the structure of the white blood cells is well restored and the PSNR values are all above 20 dB, indicating high reconstruction quality. The segmentation results obtained by the Array-Net are close to the ground truth, as confirmed by the calculated DICE values. Therefore, the results show that our method can achieve parallel multi-task CGI for different target images with the single-pixel detector.

Fig. 4. The numerical simulation results of CGI with single-pixel detector using different target images. (a)-(f) are the numerical simulation results of CGI with different target images. The real image and ground truth are the original target and segmented image, respectively. The reconstructed image and the segmented image are the numerical simulation results of the Array-Net.

Lastly, a numerical comparison of the CGI imaging quality obtained with different algorithms was conducted; the result is shown in Fig. 5. The images in Fig. 5 are reconstructed by TVAL3 (Total Variation Augmented Lagrangian Alternating-direction Algorithm) [53], the physics-informed DNN [44], the fine-tuning process of [44], and our Array-Net, with and without learned speckle patterns. The learned speckle patterns were used by the physics-informed DNN, the fine-tuning process, and the Array-Net, but not by TVAL3. It can be observed from Fig. 5 that the reconstruction quality of TVAL3 is the worst and that of the fine-tuning process is the best; our Array-Net is second only to the fine-tuning process and outperforms the physics-informed DNN. This numerical simulation confirms that our method is competitive in imaging quality.

Fig. 5. The numerical simulation results of CGI with different algorithms at 512 sampling times. (a) and (b) are the images ($128\times 128$ pixels) reconstructed by TVAL3, physics-informed DNN, the fine-tuning process and Array-Net with and without learned patterns.

Numerical simulation of CGI with an array detector. The above simulation experiments confirm that our Array-Net can perform parallel multi-task CGI detection with a single-pixel detector. To illustrate that the Array-Net can be applied to array detectors with different numbers of detection units, we conducted a numerical simulation of multi-task CGI detection with a $2\times 2$ array detector. The training and testing data of the Array-Net were both resized to $256 \times 256$ pixels, and the number of measurements $M$ is 512. Fine-tuning training based on the binary spatial light field (dimensions $512 \times 65536$) was carried out to obtain high-quality numerical simulation results. For CGI with the $2\times 2$ array detector, training takes 80 epochs and about 5 hours. To illustrate the generalization of the Array-Net, we used the different target images shown in Fig. 6. Comparing the real images with the reconstructed images, it can be found that even at a sampling rate of 0.78% (sampling rate $\gamma ={M}/{N}\times 100\%={512}/{65536}\times 100\%=0.78\%$), the main features of the white blood cells in the reconstructed images are consistent with the real images, and the details of the white blood cells are also relatively clear. On the other hand, the segmented images obtained by the Array-Net are close to the ground truths. To objectively evaluate the segmentation performance, we calculated the DICE values of the segmentation results; they are all above 0.9, confirming the high segmentation performance of our method. Therefore, we conclude that our method also achieves high reconstructed and segmented image quality with the $2\times 2$ array detector, while effectively reducing the amount of data by 93.8%, which makes it applicable to resource-constrained fields.

Fig. 6. The numerical simulation results of CGI with $2\times 2$ array detector. (a)-(f) are the numerical simulation results of CGI for different target images. The images of the numerical simulation are $256\times 256$ pixels. And the number of measurements for CGI is 512.

To illustrate the generalization of our method to different types of datasets, we performed numerical simulations using the Carvana dataset [54]. All 5088 images of the Carvana dataset ($1918\times 1280$ pixels) were converted to grayscale and resized to $256\times 256$ pixels. The Array-Net was trained on the Carvana dataset for 80 epochs, which took about 17 hours. The results in Fig. 7 were obtained by multi-task CGI detection with the array detector at 512 sampling times. To test the generalization of the Array-Net, target car images at different locations were input into the network, and the output images are shown in Figs. 7(a)-(f). Although the types and locations of the cars in the test images differ, high-quality reconstructed and segmented images are still obtained, and the PSNR and DICE values in Fig. 7 confirm that our method is effective and generalizable.

Fig. 7. The numerical simulation results of CGI with $2\times 2$ array detector using the Carvana dataset. (a)-(f) are the numerical simulation results of CGI for cars in different positions under the measurement times of 512. The resolution of the car images is $256\times 256$ pixels.

To demonstrate the robustness of our method, we conducted numerical simulations of multi-task CGI detection at different noise levels with the $2\times 2$ array detector, shown in Fig. 8. We use the signal-to-noise ratio (SNR) of the bucket detection signal to quantify the noise level, and Gaussian white noise is added to the bucket detection signal. The SNR is expressed as:

$$SNR=10\log_{10}\frac{B_{s}}{B_{n}}$$
where, $B_{s}$ and $B_{n}$ represent the effective power of the bucket detection signal and noise, respectively.
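In simulation, Gaussian white noise at a prescribed SNR can be added to the bucket signal as sketched below; the function and variable names are illustrative.

import numpy as np

def add_noise(bucket, snr_db, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(bucket ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))   # Eq. (9) rearranged
    return bucket + rng.normal(0.0, np.sqrt(noise_power), size=bucket.shape)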

Fig. 8. The simulation results of the proposed method compared with some other CGI algorithms at a low sampling ratio under different noise levels. The images in (a), (c) and (e) are reconstructed by MHGI, TVAL3 and our Array-Net. (b) and (d) are the segmentation results of the reconstructed images in (a) and (c) using the U-Net. And (f) is the segmentation result of the output of our Array-Net.

It can be observed that other CGI algorithms such as MHGI (Hadamard-based multi-resolution progressive computational ghost imaging) [33] and TVAL3 (Total Variation Augmented Lagrangian Alternating-direction Algorithm) [53] must reconstruct the image before segmentation, whereas our method realizes reconstruction and segmentation of the target during detection, which greatly improves the detection efficiency of CGI. The segmentation results of MHGI and TVAL3 are obtained with the U-Net [46]. From Fig. 8 we find that MHGI gives good results when the data are noise-free, but when the SNR is 30 dB the target object is already difficult to distinguish in the MHGI result. The TVAL3 algorithm yields higher-quality results than MHGI, but the reconstructed image is blurred and its resolution is lower. The quality of the reconstructed and segmented images obtained by our method is better than that of MHGI and TVAL3; even when the SNR is 15 dB, the details of the target can still be clearly observed. These results suggest that our method is robust and can obtain high-quality results at different noise levels.

To compare the performance of the three CGI algorithms more specifically, we calculated their PSNR and DICE values, shown in Fig. 9. The PSNR values of our method are higher than those of the MHGI and TVAL3 algorithms; even when the SNR of the bucket signal is 15 dB, the PSNR of our method still exceeds 25 dB, while the PSNR values of the other two algorithms are already very low. From Fig. 9(b), it can be found that the noise level has little impact on the segmentation performance of our method: even at an SNR of 5 dB, the DICE value of our method remains above 0.8, while the DICE values of MHGI and TVAL3 are below 0.3. Therefore, the numerical simulation results at different noise levels prove that our method has good robustness. Furthermore, our method avoids the imaging-first, processing-later paradigm of traditional methods, realizes multi-task image-free CGI detection, and improves detection efficiency.

Fig. 9. The numerical curves of PSNR (a) and DICE (b) under different noise levels.

3.2 Experimental results

To prove that our method can realize parallel multi-task image-free CGI detection at low sampling times, actual array-detection CGI experiments were carried out. The experimental system is illustrated in Fig. 10 and includes a collecting lens, an imaging lens, a DMD, and a four-quadrant detector. The four-quadrant detector is a $2 \times 2$ array detector (First Sensor QP50-6, a $50~mm^2$ quadrant PIN photodiode with a $4\times 12~mm^2$ active area). The DMD is a micromirror array device with $1024\times 768$ micromirrors; each micromirror can switch between two directions of $\pm 12^\circ$, corresponding to 1 and 0, and the DMD can load binary spatial light fields at up to 22k patterns/s. Under ambient lighting (Thorlabs MCWHLP1 cool white LED), the light transmitted through the target object is directed onto the DMD through a lens ($D_{1}=25.4$ mm, $f_{1}=75$ mm) for light-field modulation. The echo signal reflected by the micromirrors is focused onto the four-quadrant detector through lenses ($D_{2}=50$ mm, $f_{2}=250$ mm; $D_{3}=35.4$ mm, $f_{3}=60$ mm). The target object is a 25.4 mm diameter film model of a white blood cell (see Object in Fig. 10), located about 22.6 cm from the DMD, which in turn is about 27.2 cm from the four-quadrant detector.

Fig. 10. The experiment system diagram of CGI.

Multi-task CGI detection with the single-pixel detector. The reconstructed target object is a white blood cell image (see Object in Fig. 11). Fig. 11 shows the experimental reconstruction and segmentation results of CGI with the binary spatial light field (obtained by binarizing the trained floating-point spatial light field, dimensions $512\times 16384$) when the number of measurements is 512. We collected experimental CGI data and divided it into training data and test data; the training data were added to the training dataset for a second round of fine-tuning so that high-quality multi-task CGI detection could be performed in the experimental environment. The resolution of the reconstructed and segmented images is $128 \times 128$ pixels. Figs. 11(a)-(f) are the experimental results of the target object at different angles. The CGI images in Fig. 11 are the results of the correlation between the binary spatial light field and the bucket detection signal computed with the CGI reconstruction algorithm. The image quality of the CGI reconstruction algorithm is very poor: a large amount of target information is lost, and even the outline of the target cannot be distinguished. However, when the bucket detection signal of the single-pixel detector is input into our Array-Net, the network outputs high-quality reconstructed and segmented images. These results verify that our method can output high-quality reconstruction and segmentation results simultaneously at low sampling times and thereby realize multi-task CGI detection.

Fig. 11. Experimental results of multi-task CGI detection with the single-pixel detector at low sampling times. (a)-(f) are the experimental results of the same target rotated at different angles.

Multi-task CGI detection with the four-quadrant detector. The target object is shown as Object in Fig. 12. The experiment was carried out with 512 measurements (sampling rate $\gamma =0.78\%$) using the binary spatial light field (dimensions $512\times 65536$) obtained by binarizing the trained floating-point spatial light field. Part of the collected experimental data was again used for a second round of network fine-tuning to obtain high-quality CGI detection results. As Fig. 12 shows, the object information is lost in the image reconstructed by the plain CGI algorithm and the outline of the target cannot be distinguished. The reconstructed and segmented images in Fig. 12 are the results of our method for the object viewed from different angles: the reconstructed image is similar to the object in both details and contours, and even after the object is rotated, the reconstruction still successfully restores it. In addition to high-quality reconstructed images, segmentation results of the object at different angles are also generated by the Array-Net. The results in Fig. 12 confirm that, at a sampling rate of 0.78%, our method is not only effective for detectors with different numbers of detection units but also outputs reconstructed and segmented images at the same time, realizing parallel multi-task CGI detection.

Fig. 12. Experimental results of multi-task CGI detection with the four-quadrant detector at low sampling times. The images in (a)-(f) are the experimental results of the same target at different angles.

4. Conclusion

In this paper, we propose and demonstrate a new multi-task array CGI detection method based on deep learning, which integrates array CGI into deep learning to achieve high-quality reconstruction and image-free segmentation in parallel at very low sampling times. Our Array-Net has four advantages. First, through the binary fine-tuning training of the spatial light field matrix, fast light-field modulation on devices such as the DMD can be realized, improving imaging efficiency. Second, high-quality reconstructed and segmented images can be obtained simultaneously, which confirms that image-free multi-task CGI detection is achievable. Third, our method is not only applicable to array detectors with different numbers of detection units, but also solves the problem of partial information loss in reconstructed images caused by the gaps between detection units. Lastly, our method directly infers target images and segmented images from one-dimensional bucket detection signals at an extremely low sampling rate of 0.78%, achieving reconstruction and image-free segmentation while reducing the amount of data by 93.8%. Numerical simulation and experimental results show the effectiveness and advancement of this method. In summary, our method is meaningful for fast multi-task CGI detection with array detectors and will also be valuable for real-time detection, semantic segmentation, object recognition, etc.

Funding

Key Research and Development Projects of Jilin Province Science and Technology Department (20210201042GX); Jilin Province Advanced Electronic Application Technology Trans-regional Cooperation Science and Technology Innovation Center (20200602005ZP); Key Program for Science and Technology Development of Jilin Province (20220204134YY); Science Foundation of the Education Department of Jilin Province (JJKH20221150KJ); Special Funds for Provincial Industrial Innovation in Jilin Province (2019C025); Science and Technology Planning Project of Jilin Province (20200404141YY).

Disclosures

The authors declare that there are no conflicts of interest related to this paper.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. R. Hanbury Brown and R. Q. Twiss, “A new type of interferometer for use in radio astronomy,” Phil. Mag. 45(366), 663–682 (1954). [CrossRef]  

2. R. Hanbury Brown and R. Q. Twiss, “A test of a new type of stellar interferometer on sirius,” Nature 178(4541), 1046–1048 (1956). [CrossRef]  

3. T. B. Pittman, Y. Shih, D. Strekalov, and A. V. Sergienko, “Optical imaging by means of two-photon quantum entanglement,” Phys. Rev. A 52(5), R3429–R3432 (1995). [CrossRef]  

4. D. Strekalov, A. Sergienko, D. Klyshko, and Y. Shih, “Observation of two-photon ghost interference and diffraction,” Phys. Rev. Lett. 74(18), 3600–3603 (1995). [CrossRef]  

5. A. Abbas and L.-G. Wang, “Hanbury brown and twiss effect in spatiotemporal domain,” Opt. Express 28(21), 32077–32086 (2020). [CrossRef]  

6. A. Abbas and L.-G. Wang, “Hanbury brown and twiss effect in the spatiotemporal domain ii: the effect of spatiotemporal coupling,” OSA Continuum 4(8), 2221–2231 (2021). [CrossRef]  

7. A. Abbas, C. Xu, and L.-G. Wang, “Spatiotemporal ghost imaging and interference,” Phys. Rev. A 101(4), 043805 (2020). [CrossRef]  

8. W. Gong, C. Zhao, H. Yu, M. Chen, W. Xu, and S. Han, “Three-dimensional ghost imaging lidar via sparsity constraint,” Sci. Rep. 6(1), 1–6 (2016). [CrossRef]  

9. W. Gong, H. Yu, C. Zhao, Z. Bo, M. Chen, and W. Xu, “Improving the imaging quality of ghost imaging lidar via sparsity constraint by time-resolved technique,” Remote Sensing 8(12), 991 (2016). [CrossRef]  

10. X. Mei, W. Gong, Y. Yan, S. Han, and Q. Cao, “Experimental research on prebuilt three-dimensional imaging lidar,” Chin. J. Lasers 43, 0710003 (2016). [CrossRef]  

11. R. I. Stantchev, D. B. Phillips, P. Hobson, S. M. Hornett, M. J. Padgett, and E. Hendry, “Compressed sensing with near-field thz radiation,” Optica 4(8), 989–992 (2017). [CrossRef]  

12. L. Olivieri, J. S. T. Gongora, L. Peters, V. Cecconi, A. Cutrona, J. Tunesi, R. Tucker, A. Pasquazi, and M. Peccianti, “Hyperspectral terahertz microscopy via nonlinear ghost imaging,” Optica 7(2), 186–191 (2020). [CrossRef]  

13. S.-C. Chen, Z. Feng, J. Li, W. Tan, L.-H. Du, J. Cai, Y. Ma, K. He, H. Ding, Z.-H. Zhai, Z.-R. Li, C.-W. Qiu, X.-C. Zhang, and L.-G. Zhu, “Ghost spintronic thz-emitter-array microscope,” Light: Sci. Appl. 9(1), 1–9 (2020). [CrossRef]  

14. H. Yu, R. Lu, S. Han, H. Xie, G. Du, T. Xiao, and D. Zhu, “Fourier-transform ghost imaging with hard x rays,” Phys. Rev. Lett. 117(11), 113901 (2016). [CrossRef]  

15. D. Pelliccia, A. Rack, M. Scheel, V. Cantelli, and D. M. Paganin, “Experimental x-ray ghost imaging,” Phys. Rev. Lett. 117(11), 113902 (2016). [CrossRef]  

16. A. Schori and S. Shwartz, “X-ray ghost imaging with a laboratory source,” Opt. Express 25(13), 14822–14828 (2017). [CrossRef]  

17. A.-X. Zhang, Y.-H. He, L.-A. Wu, L.-M. Chen, and B.-B. Wang, “Tabletop x-ray ghost imaging with ultra-low radiation,” Optica 5(4), 374–377 (2018). [CrossRef]  

18. L. Bian, J. Suo, G. Situ, Z. Li, J. Fan, F. Chen, and Q. Dai, “Multispectral imaging using a single bucket detector,” Sci. Rep. 6(1), 1–7 (2016).

19. Z. Liu, S. Tan, J. Wu, E. Li, X. Shen, and S. Han, “Spectral camera based on ghost imaging via sparsity constraints,” Sci. Rep. 6(1), 1–10 (2016).

20. Z. Li, J. Suo, X. Hu, C. Deng, J. Fan, and Q. Dai, “Efficient single-pixel multispectral imaging via non-mechanical spatio-spectral modulation,” Sci. Rep. 7(1), 1–7 (2017). [CrossRef]  

21. Y. Wu, P. Ye, I. O. Mirza, G. R. Arce, and D. W. Prather, “Experimental demonstration of an optical-sectioning compressive sensing microscope (csm),” Opt. Express 18(24), 24565–24578 (2010). [CrossRef]  

22. N. Radwell, K. J. Mitchell, G. M. Gibson, M. P. Edgar, R. Bowman, and M. J. Padgett, “Single-pixel infrared and visible microscope,” Optica 1(5), 285–289 (2014). [CrossRef]  

23. Y. Liu, J. Suo, Y. Zhang, and Q. Dai, “Single-pixel phase and fluorescence microscope,” Opt. Express 26(25), 32451–32462 (2018). [CrossRef]  

24. J. Peng, M. Yao, J. Cheng, Z. Zhang, S. Li, G. Zheng, and J. Zhong, “Micro-tomography via single-pixel imaging,” Opt. Express 26(24), 31094–31105 (2018). [CrossRef]  

25. W. Li, Z. Tong, K. Xiao, Z. Liu, Q. Gao, J. Sun, S. Liu, S. Han, and Z. Wang, “Single-frame wide-field nanoscopy based on ghost imaging via sparsity constraints,” Optica 6(12), 1515–1523 (2019). [CrossRef]  

26. Z.-H. Xu, W. Chen, J. Penuelas, M. Padgett, and M.-J. Sun, “1000 fps computational ghost imaging using led-based structured illumination,” Opt. Express 26(3), 2427–2434 (2018). [CrossRef]  

27. C. Liu, J. Chen, J. Liu, and X. Han, “High frame-rate computational ghost imaging system using an optical fiber phased array and a low-pixel apd array,” Opt. Express 26(8), 10048–10064 (2018). [CrossRef]  

28. X. Wang, Y. Tao, F. Yang, and Y. Zhang, “An effective compressive computational ghost imaging with hybrid speckle pattern,” Opt. Commun. 454, 124470 (2020). [CrossRef]  

29. F. Liu, X.-F. Liu, R.-M. Lan, X.-R. Yao, S.-C. Dou, X.-Q. Wang, and G.-J. Zhai, “Compressive imaging based on multi-scale modulation and reconstruction in spatial frequency domain,” Chin. Phys. B 30(1), 014208 (2021). [CrossRef]  

30. J. Cao, D. Zhou, Y. Zhang, H. Cui, F. Zhang, K. Zhang, and Q. Hao, “Optimization of retina-like illumination patterns in ghost imaging,” Opt. Express 29(22), 36813–36827 (2021). [CrossRef]  

31. C. Zhang, J. Tang, J. Zhou, and S. Wei, “Singular value decomposition compressed ghost imaging,” Appl. Phys. B 128(1), 1–11 (2022). [CrossRef]  

32. M.-J. Sun, L.-T. Meng, M. P. Edgar, M. J. Padgett, and N. Radwell, “A russian dolls ordering of the hadamard basis for compressive single-pixel imaging,” Sci. Rep. 7(1), 1–7 (2017).

33. C. Zhou, T. Tian, C. Gao, W. Gong, and L. Song, “Multi-resolution progressive computational ghost imaging,” J. Opt. 21(5), 055702 (2019). [CrossRef]  

34. H. Ma, A. Sang, C. Zhou, X. An, and L. Song, “A zigzag scanning ordering of four-dimensional walsh basis for single-pixel imaging,” Opt. Commun. 443, 69–75 (2019). [CrossRef]  

35. W.-K. Yu, “Super sub-nyquist single-pixel imaging by means of cake-cutting hadamard basis sort,” Sensors 19(19), 4122 (2019). [CrossRef]  

36. C. Zhou, H. Huang, B. Liu, and L. Song, “Hybrid speckle-pattern compressive computational ghost imaging,” Acta Optica Sinica 36, 0911001 (2016). [CrossRef]  

37. X. Zhang, X. Meng, X. Yang, Y. Wang, Y. Yin, X. Li, X. Peng, W. He, G. Dong, and H. Chen, “Singular value decomposition ghost imaging,” Opt. Express 26(10), 12948–12958 (2018). [CrossRef]  

38. F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: An end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560–25572 (2019). [CrossRef]  

39. R. Zhu, H. Yu, Z. Tan, R. Lu, S. Han, Z. Huang, and J. Wang, “Ghost imaging based on y-net: a dynamic coding and decoding approach,” Opt. Express 28(12), 17556–17569 (2020). [CrossRef]  

40. C. Zhou, X. Liu, Y. Feng, X. Li, G. Wang, H. Sun, H. Huang, and L. Song, “Real-time physical compression computational ghost imaging based on array spatial light field modulation and deep learning,” Opt. Lasers Eng. 156, 107101 (2022). [CrossRef]  

41. H. Liu, L. Bian, and J. Zhang, “Image-free single-pixel segmentation,” Opt. Laser Technol. 157, 108600 (2023). [CrossRef]  

42. C. Hu, Z. Tong, Z. Liu, Z. Huang, J. Wang, and S. Han, “Optimization of light fields in ghost imaging using dictionary learning,” Opt. Express 27(20), 28734–28749 (2019). [CrossRef]  

43. C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. 8(1), 2369 (2018). [CrossRef]  

44. F. Wang, C. Wang, C. Deng, S. Han, and G. Situ, “Single-pixel imaging using physics enhanced deep learning,” Photonics Res. 10(1), 104–110 (2022). [CrossRef]  

45. Z. Chen, Z. Liu, C. Hu, H. Wu, J. Wu, J. Lin, Z. Tong, H. Yu, and S. Han, “Hyperspectral image reconstruction for spectral camera based on ghost imaging via sparsity constraints using v-dunet,” arXiv, arXiv:2206.14199 (2022). [CrossRef]  

46. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.

47. X. Zhai, Z. Cheng, Y. Chen, Z. Liang, and Y. Wei, “Foveated ghost imaging based on deep learning,” Opt. Commun. 448, 69–75 (2019). [CrossRef]  

48. Y. Ni, D. Zhou, S. Yuan, X. Bai, Z. Xu, J. Chen, C. Li, and X. Zhou, “Color computational ghost imaging based on a generative adversarial network,” Opt. Lett. 46(8), 1840–1843 (2021). [CrossRef]  

49. X. Liu, T. Han, C. Zhou, J. Hu, M. Ju, B. Xu, and L. Song, “Computational ghost imaging based on array sampling,” Opt. Express 29(26), 42772–42786 (2021). [CrossRef]  

50. H. Huang, C. Zhou, W. Gong, and L. Song, “Block matching low-rank for ghost imaging,” Opt. Express 27(26), 38624–38634 (2019). [CrossRef]  

51. A. A. Taha and A. Hanbury, “Metrics for evaluating 3d medical image segmentation: analysis, selection, and tool,” BMC Med. imaging 15(1), 29 (2015). [CrossRef]  

52. X. Zheng, Y. Wang, G. Wang, and J. Liu, “Fast and robust segmentation of white blood cell images by self-supervised learning,” Micron 107, 55–71 (2018). [CrossRef]  

53. C. Li, W. Yin, H. Jiang, and Y. Zhang, “An efficient augmented lagrangian method with applications to total variation minimization,” Comput. Optim. Appl. 56(3), 507–530 (2013). [CrossRef]  

54. V. Iglovikov and A. Shvets, “Ternausnet: U-net with vgg11 encoder pre-trained on imagenet for image segmentation,” arXiv, arXiv:1801.05746 (2018). [CrossRef]  
