
Polarization motivating high-performance weak targets’ imaging based on a dual-discriminator GAN


Abstract

High-level detection of weak targets under bright light has always been an important yet challenging task. In this paper, a method that effectively fuses intensity and polarization information is proposed to tackle this issue. Specifically, an attention-guided dual-discriminator generative adversarial network (GAN) is designed for the fusion of these two sources, in which the fusion results maintain the rich background information of the intensity images while significantly completing the target information from the polarization images. The framework consists of one generator and two discriminators, which retain as much texture and salient information as possible from the source images. Furthermore, an attention mechanism is introduced to focus on contextual semantic information and enhance long-range dependency. To preserve salient information, a suitable loss function is introduced to constrain the pixel-level distribution between the result and the original images. Moreover, a real-scene dataset of weak targets under bright light is built, and the effects of fusing polarization and intensity information for different weak targets are investigated and discussed. The results demonstrate that the proposed method outperforms other methods in both subjective evaluation and objective indices, which proves its effectiveness for accurate detection of weak targets against a bright light background.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Imaging quality is susceptible to several key factors in practical applications, such as weather, light intensity and exposure time. When detection is performed under a bright light field, the target is not completely visible, resulting in dark and weak targets. Specifically, the target energy is weak while the background energy is strong, so the target is overwhelmed by the background and is difficult to detect and process. In addition, the lack of texture and shape features makes it hard to extract target features during detection. Therefore, fast and accurate detection of targets in such scenarios has positive implications for improving the detection distance and accuracy of visible-light detection systems. Owing to the limitations of a single sensor, a visible-light detection system cannot effectively and comprehensively describe the imaging scene. The fusion of information from different sensors is therefore considerably helpful for improving imaging quality: it can combine useful information, remove redundant information, and finally obtain images with richer content, better suitability for human vision and better machine readability [1].

Image fusion has been developed for different fused sources [2–4]. These fusion methods can be generally classified into conventional methods and deep learning-based methods. Traditional methods fall into three categories: sparse representation-based methods [5], pseudo-color-based methods [6,7], and multi-scale transform-based methods [8], of which multi-scale transform-based methods are the most widespread [9–14]. Traditional fusion methods mainly perform mathematical transformations in the spatial and frequency domains, and their fusion performance depends on the feature extraction ability and the fusion strategy. Although these methods achieved good fusion results in the early stage, as fusion scenes change, the fusion rules become more and more complicated, which greatly limits conventional fusion methods. Moreover, the correlation between different source images is not considered and the same transform is used to extract features from all of them, which leads to poor performance of the final fusion.

Deep learning-based methods have been found to solve these problems with great success in image fusion. In general, based on the framework structure, deep learning-based methods can be classified into auto-encoder (AE)-based, convolutional neural network (CNN)-based and generative adversarial network (GAN)-based methods. DenseFuse [15] is a classical AE-structured image fusion method that uses a pre-trained encoder and decoder to perform feature extraction and reconstruction for the fusion of infrared and visible images. Jian et al. [16] further proposed a symmetric encoding-decoding structure based on residual blocks, which achieved good fusion results. For end-to-end CNN structures, Zhang et al. [17] proposed a combined gradient and intensity loss function to guide the model to generate fused images and accomplish various fusion tasks. Liu et al. [18] proposed a method that extracts features with a traditional pyramid decomposition and then reconstructs fused images using a CNN. However, image fusion is an unsupervised problem with no standard fused images as ground truth. Therefore, GAN-based architectures have been proposed and have attracted widespread attention. Ma et al. [19] first used a GAN for the fusion of infrared and visible images, known as FusionGAN. In FusionGAN, the infrared and visible images are concatenated as the input of the network to generate fused images; the generated and visible images are then fed into the discriminator for adversarial gaming so that the network learns more visible-image information. Although this method achieves good results, the design with only one discriminator inevitably makes the fused images focus on the meaningful information of only one source image. Thus, Ma et al. [20] designed another fusion network for infrared and visible images with two discriminators, called DDcGAN. Although all the above GAN-based fusion networks can produce visually desirable fusion results, they cannot highlight salient information about the target, so the fusion results appear low in contrast and ignore typical details or features present in the source images. Therefore, the attention mechanism has been introduced into the field of image fusion to obtain salient information from the source images.

In some environments, the thermal radiation information of infrared images can complement the target information by fusing images in the infrared and visible dimensions. However, the resolution of infrared images is very low: the target is blurred and its features are difficult to distinguish. In addition, the target information in infrared images is not comprehensive, and details are partially lost. Compared with infrared light, polarized light has strong anti-reflection and anti-scattering properties [21–23], which can represent the surface features of low-contrast targets even when the target information is overwhelmed by strong illumination or is difficult to obtain under low illumination. Moreover, in polarization imaging the polarization images have the same resolution as the intensity images, and polarization imaging can describe the inherent polarization properties of the scene in a stable manner. Therefore, the polarization components can be fused with the visible image instead of the infrared information, which is more responsive to the surface characteristics of the object and enhances the characteristic information of the target [24–26].

According to the Stokes vector representation [27], polarization imaging provides not only intensity images but also polarization images. The intensity images represent most of the light intensity information, capturing the reflected and transmitted light of the target, while the polarization images record polarization characteristics, providing detailed characteristics of the weak target, such as surface shape, contrast, and roughness. Since the polarization image is not correlated with the intensity image, the source images can be fused to provide complementary target information from different dimensions. Polarization images are able to show valuable weak-target information compared with intensity images, containing more texture and polarization information of low-contrast objects, and can be used as complementary information to intensity images. However, polarization images also contain a large amount of redundant background information, and fusing this redundant information directly harms the quality of the fused images and is not suitable for human vision. We expect the fusion results to preserve most of the intensity distribution from the intensity images except at the weak target, and to record the high-contrast polarization characteristics of the weak target from the polarization images, so that they contain the rich background information of the intensity images and the prominent target information of the polarization images.

To achieve weak target imaging under bright light background and inspired by the attention mechanism [28], we propose an attention-guided dual-stream generative adversarial learning network. The polarization images are fused with intensity images to represent the surface features of the object target, to enhance the characteristic information of targets and to realize the imaging of weak targets. In our network, the generator can generate fused images containing background information of the intensity image and polarization information of different materials, and no ground-truth is required to satisfy the fusion task.

Our contributions can be summarized in the following three aspects:

  • (1) Based on the fact that DoLP images can provide feature information such as the surface shape, contrast and roughness of the target even under bright light, and that the fusion of intensity and polarization information significantly improves image quality, we explore the effect of polarization on the fusion results of different weak targets in the laboratory to achieve weak targets’ polarization imaging.
  • (2) The adopted attention-based mechanism for the GAN uses the self-attention block and the detail supplement block to fuse the background intensity information and the high-contrast polarized information effectively.
  • (3) We design a suitable loss function to train the network to preserve the background brightness of the intensity images and the polarization information of the polarized images.

The rest of this paper is structured as follows. A review of related work is provided in Sec. 2. In Sec. 3, we introduce the proposed network framework. In Sec. 4, we present the effect of polarization on the fusion results of different weak targets under a bright light background in the laboratory, followed by the generalization experimental results on outdoor weak targets. Finally, we conclude this paper in Sec. 5.

2. Theoretical backgrounds

In this section, some theoretical backgrounds have been introduced, including the polarization information representation and basic model of the GAN.

2.1 Polarization information

Polarization information is more sensitive to surface roughness, 3D shape [29] and target material [30]. Therefore, polarization is widely used in the field of imaging [31–34]. Polarization information can be obtained directly from calculations on the intensity images at four polarization angles $({{I_{0^\circ }},{I_{45^\circ }},{I_{90^\circ }},{I_{135^\circ }}} )$ captured by a polarization camera. Specifically, one representation of polarization information, the Stokes vector $({{S_0},{S_1},{S_2}} )$ [27], can be obtained as follows:

$$\begin{array}{l} {S_0} = ({{I_{0^\circ }} + {I_{45^\circ }} + {I_{90^\circ }} + {I_{135^\circ }}} )/2,\\ {S_1} = {I_{0^\circ }} - {I_{90^\circ }},\\ {S_2} = {I_{45^\circ }} - {I_{135^\circ }}, \end{array}$$
where ${S_0}$ indicates the total intensity, ${S_1}$ indicates the intensity difference between horizontally polarized and vertically polarized lights, and ${S_2}$ indicates the intensity difference between 45° polarized and 135° polarized lights. In addition, the degree of linear polarization (DoLP) and angle of polarization (AoP) describe the polarization characteristics, expressed as:
$$DoLP = \sqrt {{{({{S_1}} )}^2} + {{({{S_2}} )}^2}} /{S_0},$$
$$AoP = \arctan ({{S_2}/{S_1}} )/2.$$

The DoLP value is always between 0 and 1, indicating the ratio of polarized light to total light intensity.
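For readers implementing this step, a minimal sketch of Eqs. (1)–(3) in Python/NumPy is given below; the small epsilon guarding the division and the use of arctan2 for numerical robustness are our own assumptions rather than details from the paper.

```python
import numpy as np

def stokes_from_polarizer_images(i0, i45, i90, i135, eps=1e-6):
    """Compute S0, S1, S2, DoLP and AoP from the four polarizer-angle
    intensity images, following Eqs. (1)-(3)."""
    i0, i45, i90, i135 = [np.asarray(im, dtype=np.float64) for im in (i0, i45, i90, i135)]
    s0 = (i0 + i45 + i90 + i135) / 2.0           # total intensity
    s1 = i0 - i90                                # horizontal minus vertical polarization
    s2 = i45 - i135                              # 45 deg minus 135 deg polarization
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)   # degree of linear polarization, in [0, 1]
    aop = 0.5 * np.arctan2(s2, s1)               # angle of polarization, in radians
    return s0, s1, s2, dolp, aop
```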

2.2 Generative adversarial networks

The concept of the GAN was first introduced by Goodfellow et al. [35] in 2014. In a GAN, the real data fed to the network are assumed to be $X = \{{{x_1},{x_2},{x_3}, \cdots ,{x_n}} \}$, obeying a specific distribution. During training, the generator fits the distribution of the real data X from random noise to generate data $G(X )$, so that the fake data distribution $({{P_G}} )$ approaches the real data distribution $({{P_{data}}} )$. The discriminator determines whether the generated data are real, distinguishing the distributions of the real data X and $G(X )$ and outputting probabilities.

Through continuous iterative training, the generator network gradually learns how to generate more realistic data, while the discriminator network gradually becomes more accurate. Eventually, the generator network can generate new data similar to the training data.

The originally proposed GAN model has several drawbacks. During the training iterations, the discriminator cannot effectively fit the data even though it can identify the authenticity of the data, and the generator cannot update its loss in a timely way. This leads to undesirable results, and the generated images are not of high quality. Mao et al. [36] therefore proposed the least squares GAN, which uses least squares to calculate the loss. The loss functions are defined as follows:

$$\mathop {\min }\limits_D {V_{LSGAN}}(D )= \frac{1}{2}{E_{X\sim {P_{\textrm{data}}}}}[{({D(x )- a} )^2}] + \frac{1}{2}{E_{X\sim {P_G}}}[{({D({G(x )} )- b} )^2}],$$
$$\mathop {\min }\limits_G {V_{LSGAN}}(G )= \frac{1}{2}{E_{X\sim {P_G}}}[{({D({G(x )} )- c} )^2}],$$
where a, b and c are the probability labels that guide the training. Specifically, a and b are the labels for real and fake data, and c is the label for the generated fake data to successfully deceive the discriminator. Therefore, b should be as close to 0 as possible, and conversely, a and c should be as close to 1 as possible.
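As a minimal illustration (not code from the paper), the least-squares objectives of Eqs. (4) and (5) can be written in PyTorch as follows, with the labels a, b and c set to 1, 0 and 1 in line with the discussion above.

```python
import torch

def lsgan_discriminator_loss(d_real, d_fake, a=1.0, b=0.0):
    """Least-squares discriminator loss, Eq. (4): push D(real) toward a
    and D(G(x)) toward b."""
    return 0.5 * torch.mean((d_real - a) ** 2) + 0.5 * torch.mean((d_fake - b) ** 2)

def lsgan_generator_loss(d_fake, c=1.0):
    """Least-squares generator loss, Eq. (5): push D(G(x)) toward c so the
    generated samples are judged as real."""
    return 0.5 * torch.mean((d_fake - c) ** 2)
```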

3. Proposed method

In this section, we first describe the dataset production, including the generalized outdoor dataset and the indoor experimental dataset. Then, a detailed description of the designed network framework is presented, introducing the structures of the designed generator and discriminators. Finally, the designed loss function and the settings of the hyperparameters are introduced.

3.1 Dataset construction

  • (1) Experimental outdoor dataset: To obtain a well-trained network, a large training set is desirable. Outdoor scenes in bright light are captured in a single shot by a division-of-focal-plane (DoFP) polarization camera (LUCID, PHX055S-PC). The camera was mounted on a fixed tripod and captured intensity images in the four polarization directions $({{I_{0^\circ }},{I_{45^\circ }},{I_{90^\circ }},{I_{135^\circ }}} )$ at an image resolution of 2480 × 1860. The entire capturing process is controlled by a computer. According to Eqs. (1) and (2), the ${S_0}$ and DoLP images can be calculated as the input for training the network. We acquired 100 sets of images, each including both intensity and polarization images resized to 768 × 576. Sixty of these sets are used for training and the rest for validation and testing. The training images are cropped into small patches of size 128 × 128, giving a total of 25,664 patches. Meanwhile, we use flipping and rotation to augment the data and improve training efficiency.
  • (2) Experimental indoor dataset: We have also built indoor scenarios of weak targets under a bright light background, as shown in Fig. 1. The setup consists of an LED light source, a black box, targets, a DoFP camera, and a computer. The LED is used as the light source; the black box (60 cm × 100 cm) is used to construct a dark condition. Objects with various materials and complexities are selected as the investigated targets.

    The unpolarized light emitted from the LED illuminates the targets placed in the closed black box. By adjusting the size of the hole in the box, half or part of each target is illuminated to represent weak targets under bright light. Finally, the light reflected from the targets is captured by the DoFP camera, and the images of the Stokes parameters are stored in the computer.

  • (3) Training details: We train the generator and discriminators with the Adam optimizer, with a batch size of 32, an initial learning rate of 0.0001 that decays by a factor of 0.999, and 15 epochs. In each training step of the GAN, we train the discriminators ${D_{{S_0}}}$ and ${D_{DoLP}}$ three times and then train the generator once; a schematic training loop under these settings is sketched after this list. The proposed networks are trained and tested on a GeForce RTX 3090 (24 GB RAM) machine.
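A schematic training loop under the settings listed above is sketched below. The helper functions discriminator_loss and generator_loss are hypothetical placeholders for the losses of Sec. 3.3, the loader is assumed to yield paired ${S_0}$/DoLP patch batches, and applying the 0.999 learning-rate decay once per epoch is our assumption.

```python
import torch
from torch.optim import Adam

def train(generator, d_s0, d_dolp, loader, discriminator_loss, generator_loss,
          epochs=15, lr=1e-4, lr_decay=0.999, d_steps=3):
    """Alternating GAN training: 3 discriminator updates per generator update."""
    g_opt = Adam(generator.parameters(), lr=lr)
    d_opt = Adam(list(d_s0.parameters()) + list(d_dolp.parameters()), lr=lr)
    g_sched = torch.optim.lr_scheduler.ExponentialLR(g_opt, gamma=lr_decay)
    d_sched = torch.optim.lr_scheduler.ExponentialLR(d_opt, gamma=lr_decay)
    for _ in range(epochs):
        for s0, dolp in loader:                       # 128x128 patch batches of size 32
            fused = generator(s0, dolp)
            for _ in range(d_steps):                  # train both discriminators 3 times ...
                d_loss = discriminator_loss(d_s0, d_dolp, s0, dolp, fused.detach())
                d_opt.zero_grad(); d_loss.backward(); d_opt.step()
            fused = generator(s0, dolp)               # ... then the generator once
            g_loss = generator_loss(d_s0, d_dolp, s0, dolp, fused)
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
        g_sched.step(); d_sched.step()                # learning-rate decay (per epoch here)
```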

Fig. 1. Schematic of indoor experiment.

3.2 Framework overview

  • (1) Architecture of generator: The generator structure is shown as the blue block of the network framework diagram in Fig. 2. The generator mainly consists of three blocks: the self-attention block, the detail supplement block and the fusion block, which are described in detail below. The ${S_0}$ image and the DoLP image are used as the source images. The feature information of the source images is fed into the self-attention block and the detail supplement block, respectively, and the semantic information and deep detail information are extracted using a multi-network combination approach.

The self-attention block extracts contextual semantic features, considering global correlations that are independent of the receptive field. However, this block is affected by the max-pooling layer, which causes detail loss in the feature maps. Therefore, we use the detail supplement block to retain these lost parts and extract the deep feature information of the source images. Finally, the fusion block fuses the information obtained from the self-attention block and the detail supplement block to reconstruct the fused image.

Fig. 2. Illustration of the overall framework. Network architecture consists of two discriminators and one generator. The part located in the blue block indicates the generator and the part located in the yellow block indicates the discriminator. Layers of the same color denote layers with the same function but no shared parameters. The same color arrows indicate the same feature mapping.

Self-attention block (Fig. 2(a)): The purpose of this block is to compute an attention map that helps the generator focus more on the target region. The features captured from the source images are local and vary with spatial location, so feature extraction requires an attention mechanism with long-range dependency; global dependencies need to be depicted to correlate different locations between the fusion result and the source images. For CNNs, on the one hand, the interaction between images and convolution kernels is content-independent; on the other hand, the long-range modeling capability of convolution is weak when extracting local features. Therefore, it is difficult for CNN-based methods to extract the salient features of intensity and polarization images.

To obtain the self-attention map of the source images, feature extraction is first performed in the self-attention block by a 5 × 5 convolutional layer with a concatenation operation, and the concatenated features are then processed by a 3 × 3 convolutional layer. A batch normalization (BN) layer is added after every convolutional layer to avoid gradient vanishing, and the activation is Leaky ReLU. After that, a max-pooling layer with a stride of 2 is used to improve the ability to extract features. The feature maps then undergo the self-attention operation shown in Fig. 3, which follows the self-attention mechanism in [37]; a minimal sketch of this operation is given below. Finally, the feature map is upsampled twice using nearest-neighbor interpolation to restore the same scale as the input image, and each upsampling passes through a 3 × 3 convolutional layer to compress the feature channels and obtain the final self-attention map.
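A minimal PyTorch sketch of the SAGAN-style self-attention operation [37] used in this block is given below; the channel-reduction factor of 8 and the zero-initialized residual weight follow the original SAGAN formulation and are assumptions here, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention: 1x1 convolutions produce query, key and value
    maps, and a softmax over all spatial positions builds the attention map."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))      # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, h*w, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, h*w)
        attn = F.softmax(torch.bmm(q, k), dim=-1)      # (b, h*w, h*w) attention map
        v = self.value(x).flatten(2)                   # (b, c, h*w)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection
```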

Fig. 3. Flowchart of self-attention mechanism operation.

Detail supplement block (Fig. 2(b)): All convolutional layers of the detail supplement block use zero padding to keep the feature maps the same size, and the stride is set to 1. The convolutional layers are followed by batch normalization and a Leaky ReLU activation layer. Specifically, the shallow features of the source images are first extracted by a 5 × 5 convolutional layer. The obtained shallow features are then passed through a dense block (DB) [15] to obtain deeper nonlinear feature information. The DB architecture preserves as much detailed information as possible and avoids overfitting. The DB structure is shown in Fig. 2, where each convolutional layer outputs 16 channels, and the input of each layer in the DB is directly connected to all previous layers for feature reuse; a sketch is given below.
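A sketch of such a dense block is shown below. The growth rate of 16 channels and the feature-reuse connectivity follow the description above, while the 3 × 3 kernel size and the number of layers are our assumptions.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense block (DB) sketch in the spirit of [15]: each convolution outputs 16
    channels, and the input of every layer is the concatenation of all previous
    feature maps (feature reuse)."""
    def __init__(self, in_channels, growth=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3, padding=1),  # zero padding keeps size
                nn.BatchNorm2d(growth),
                nn.LeakyReLU(0.2, inplace=True),
            ))
            channels += growth

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))    # reuse all previous feature maps
            features.append(out)
        return torch.cat(features, dim=1)
```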

Fusion block (Fig. 2(c)): The feature maps obtained from the previous two blocks are concatenated as the input to the fusion block. The first two layers are 3 × 3 convolutional layers followed by batch normalization and Leaky ReLU activation, while the third layer uses a 1 × 1 convolutional layer with a tanh activation to reconstruct the final fused image.

  • (2) Architecture of discriminator (PatchGAN): In this method, ${D_{{S_0}}}$ and ${D_{DoLP}}$ are two Markovian discriminators with identical architectures and independent parameters [38], as shown in the yellow block of Fig. 2. Because a Markovian discriminator constrains high-frequency information in the data, it is often used in vision tasks that require a high degree of detail clarity, as shown in Fig. 4. The network consists of five 3 × 3 convolutional layers and uses batch normalization (BN) to prevent gradient explosion. The output of the discriminator is a matrix of size 5 × 5, and each matrix value reflects a local region (the receptive field) of the discriminated image. The final probability is the average of the probabilities of all pixel blocks. Thus, this discriminator forces the generator to pay more attention to details in the adversarial game and retains more information about the source images; a minimal sketch is given below.
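A minimal PatchGAN-style sketch follows. The five 3 × 3 convolutional layers with BN and Leaky ReLU follow the description above, whereas the channel widths and strides are assumptions chosen only to produce a small patch map (the paper reports a 5 × 5 output), each entry of which scores one receptive field.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Markovian (PatchGAN) discriminator sketch: five 3x3 convolutions with
    batch normalization and Leaky ReLU, producing a small patch score map."""
    def __init__(self, in_channels=1):
        super().__init__()
        chs = [in_channels, 16, 32, 64, 128]           # channel widths are assumptions
        layers = []
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(c_out),
                       nn.LeakyReLU(0.2, inplace=True)]
        layers.append(nn.Conv2d(chs[-1], 1, kernel_size=3, stride=2, padding=1))  # 5th conv: patch scores
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)   # the final decision averages these patch scores
```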

Fig. 4. Markovian discriminators applied in our method.

3.3 Loss function

  • (1) Generator loss function: The training generator G learns the map from the source images (${S_0}$ and DoLP) to generate the fused image. The loss function of the generator consists of two components: adversarial loss and content loss:
    $${L_G} = {L_{adv}}(G )+ \lambda {L_{con}},$$
    where ${L_G}$ is the total loss, the first term ${L_{adv}}$ is the adversarial loss, and the second term ${L_{con}}$ is the content loss. $\lambda $ is a hyperparameter that balances the two loss terms and is set to 50.

Adversarial loss of the generator: In the proposed network structure, two Markovian discriminators with independent architectures are set up so that the fusion result retains more information about the intensity and the polarization, respectively. The adversarial loss is divided into two parts, which distinguish the probability distributions of the generated images from ${S_0}$ and DoLP, respectively. The adversarial loss of the generator is defined as:

$${L_{adv}}(G )= \frac{1}{N}{\sum\limits_{n = 1}^N {({{D_{{S_0}}}({I_f^n} )- c} )} ^2} + \frac{1}{N}{\sum\limits_{n = 1}^N {({{D_{DoLP}}({I_f^n} )- c} )} ^2},$$
where ${I_f}$ is the fused image, ${D_{{S_0}}}({\cdot} )$ is the probability that the discriminator ${D_{{S_0}}}$ classifies the fused image as an intensity image, and ${D_{DoLP}}({\cdot} )$ is the probability that the discriminator ${D_{DoLP}}$ classifies the fused image as a polarization image. Here $n \in {\mathrm{\mathbb{N}}_N}$, N represents the number of fused images, and $I_f^n$ represents the ${n_{th}}$ fused image.

Content loss: The content loss encourages the generator to generate images whose data distribution is similar to that of the source images. For the intensity image, the background information is more abundant, which is consistent with the observation habits of human eyes; for the polarization image, the polarization information still has high contrast under low-illumination conditions. We therefore design a suitable loss function to fit the pixel distribution of the source images and output a fusion result with a reasonable distribution. The content loss consists of two parts, a structural similarity loss ${L_{ssim}}$ and an intensity loss ${L_{int}}$, denoted as:

$${L_{con}} = {L_{ssim}} + \beta {L_{int}},$$
where $\beta $ is the factor that controls the balance between them, and $\beta $ = 10.

Considering the contrast and structural similarity between the fused image and the source images, the loss ${L_{ssim}}$ is added to the content loss as follows:

$${L_{ssim}} = 1 - ({{\gamma_w} \cdot SSIM({{I_{{S_0}}},{I_f}} )+ ({1 - {\gamma_w}} )SSIM({{I_{DoLP}},{I_f}} )} ),$$
where ${I_{{S_0}}}$ and ${I_{DoLP}}$ denote the intensity image and the DoLP image, respectively, and ${\gamma _w}$ is a weighting coefficient. The structural similarity measure $SSIM({x,y} )$ quantifies the luminance, contrast and structural difference between x and y [39]. It is calculated as follows:
$$SSIM({x,y} )= \frac{{({2{\mu_x}{\mu_y} + {c_1}} )({2{\sigma_{xy}} + {c_2}} )}}{{({\mu_x^2 + \mu_y^2 + {c_1}} )({\sigma_x^2 + \sigma_y^2 + {c_2}} )}},$$
where µ denotes the pixel mean of an image and $\sigma $ denotes its standard deviation. The larger the SSIM, the higher the structural similarity between the two images. ${c_1}$ and ${c_2}$ are constants, set to 1 × 10−4 and 9 × 10−4, respectively.

The weighting factor ${\gamma _w}$ indicates the local detail richness of the source images: the greater the local variance, the greater ${\gamma _w}$. It is calculated as follows:

$${\gamma _w} = \frac{{f({\sigma_{{S_0}}^2} )}}{{f({\sigma_{{S_0}}^2} )+ f({\sigma_{DoLP}^2} )}},$$
where $f(x )= \max ({x,0.0001} )$ enhances the stability of the function.

The loss ${L_{ssim}}$ imposes a relatively weak constraint on the pixel value distribution, so we introduce an intensity loss to fit the distribution of the source images. The intensity loss is defined as follows:

$${L_{int}} = \frac{1}{{HW}}||{{I_f} - \max ({{I_{{S_0}}},{I_{DoLP}}} )} ||_F^2,$$
where H and W represent the height and width of the image, respectively. The max operation selects the maximum pixel value between the two terms. ${|| \cdot ||_F}$ denotes the Frobenius norm.
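A compact sketch of the content loss of Eqs. (8)–(12) is given below. For brevity it uses a whole-image (global) SSIM and global variances for ${\gamma _w}$, whereas the formulation above is based on local statistics, so this is a simplified illustration rather than the exact implementation.

```python
import torch

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Whole-image SSIM following Eq. (10); standard SSIM uses a sliding window,
    so this is a simplification."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def content_loss(i_s0, i_dolp, i_f, beta=10.0):
    """Content loss of Eq. (8): weighted SSIM loss plus intensity loss."""
    clamp = lambda v: torch.clamp(v, min=1e-4)               # f(x) = max(x, 0.0001), Eq. (11)
    gamma_w = clamp(i_s0.var()) / (clamp(i_s0.var()) + clamp(i_dolp.var()))
    # Eq. (9): structural-similarity loss weighted by gamma_w
    l_ssim = 1.0 - (gamma_w * ssim_global(i_s0, i_f)
                    + (1.0 - gamma_w) * ssim_global(i_dolp, i_f))
    # Eq. (12): intensity loss against the element-wise maximum of the sources
    l_int = torch.mean((i_f - torch.maximum(i_s0, i_dolp)) ** 2)
    return l_ssim + beta * l_int
```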
  • (2) Discriminator loss function: If a GAN-based approach is designed with only one discriminator, some information present in the source images may be lost. In the proposed network, two independent discriminators are used to constrain the generator. The loss functions are therefore defined as:
    $${L_{{D_{{S_0}}}}} = \frac{1}{N}{\sum\limits_{n = 1}^N {({{D_{{S_0}}}({I_{{S_0}}^n} )- a} )} ^2} + \frac{1}{N}{\sum\limits_{n = 1}^N {({{D_{{S_0}}}({I_f^n} )- b} )} ^2},$$
    $${L_{{D_{DoLP}}}} = \frac{1}{N}{\sum\limits_{n = 1}^N {({{D_{DoLP}}({I_{DoLP}^n} )- a} )} ^2} + \frac{1}{N}{\sum\limits_{n = 1}^N {({{D_{DoLP}}({I_f^n} )- b} )} ^2},$$
    where ${L_{{D_{{S_0}}}}}$ denotes the loss of the intensity image discriminator, and ${L_{{D_{DoLP}}}}$ denotes the loss of the polarization image discriminator. $I_{{S_0}}^n$ represents the ${n_{th}}$ intensity image, and $I_{DoLP}^n$ represents the ${n_{th}}$ polarization image. ${D_{{S_0}}}({\cdot} )$ is dedicated to distinguishing the intensity image from the fused image, while ${D_{DoLP}}({\cdot} )$ is intended to distinguish the polarization image from the fused image. The discriminators are in an adversarial relationship with the generator, which is forced to improve its generation capability during training and strive to capture the key features of the intensity and polarization images; a sketch of these adversarial losses is given after this list.
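For completeness, the least-squares adversarial terms of Eqs. (7), (13) and (14) can be sketched as follows; averaging the Markovian patch outputs with .mean() and detaching the fused image for the discriminator updates are standard practices assumed here.

```python
import torch

def adversarial_losses(d_s0, d_dolp, i_s0, i_dolp, i_f, a=1.0, b=0.0, c=1.0):
    """Eqs. (7), (13) and (14): least-squares adversarial losses for the generator
    and the two Markovian discriminators (patch outputs are averaged)."""
    # Discriminator losses: source images pushed toward a, fused images toward b
    loss_d_s0 = torch.mean((d_s0(i_s0) - a) ** 2) + torch.mean((d_s0(i_f.detach()) - b) ** 2)
    loss_d_dolp = torch.mean((d_dolp(i_dolp) - a) ** 2) + torch.mean((d_dolp(i_f.detach()) - b) ** 2)
    # Generator adversarial loss, Eq. (7): fused image should be judged as real (c)
    loss_g_adv = torch.mean((d_s0(i_f) - c) ** 2) + torch.mean((d_dolp(i_f) - c) ** 2)
    return loss_d_s0, loss_d_dolp, loss_g_adv
```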

4. Experimental results

In this section, we first explore the effect of polarization on the fusion results of different weak targets under bright light in the laboratory, and then compare the fusion results of our network with those of several other advanced methods, in both subjective and objective terms, on the indoor and outdoor datasets. These methods include four traditional methods, CVT [9], DTCWT [10], Wavelet [13] and GTF [14], and seven deep learning-based methods, GANMcC [40], Perceptual_FusionGan [41], PMIG [17], U2Fusion [2], DIFNet [42], PFNet [24], and TIPFNet [43]. The parameters of these 11 methods are set as reported in the original works.

4.1 Indoor polarization information fusion results

  • (1) Subjective evaluation: We use the successfully trained network model to explore the effect of polarization information on the fusion results. We first explore the effect of different polarization components on the fusion results in the laboratory under bright light, choosing a wooden board as the background and an ink letter as the target, as can be seen from Fig. 5. The polarization components ${S_1}$ and ${S_2}$ have the weakest effect on highlighting weak targets. AoP can present some salient target information, but it is unstable and contains a lot of noise. DoLP presents the best and most stable effect.

Different target materials have different polarization characteristics. Under the same experimental conditions, the following test is performed to investigate the polarization fusion results for targets of different materials. As shown in Fig. 6, the target of (a1) is an ink letter on a wooden board background; the target of (a2) is an iron letter on a wooden board background; the target of (a3) is a laser inscription on an iron plate with an iron board background; the target of (a4) is a paper letter on an iron board background; and the target of (a5) is a paper letter on a wooden board background. As can be seen from the fusion results in Figs. 6(c1)–(c3), because of the higher degree of polarization of ink and iron, the DoLP images can highlight the low-contrast weak target information (the red box marks the information supplemented by DoLP) and are fused with the ${S_0}$ images to generate images of improved quality in which the target information can be detected completely. Compared with the ${S_0}$ images, the fused images have a better imaging effect. Because the degree of polarization of iron is higher than that of the paper strip, the DoLP image also brings out the complete target information, as is evident from Fig. 6(c4). In Fig. 6(c5), the degrees of polarization of both the wooden board and the paper strip are relatively low, so the DoLP image fails to highlight the target information, although some improvement can still be observed.

Fig. 5. Imaging weak targets: (a1–a4) ${S_0}$; (b1–b4) ${S_1}$, ${S_2}$, AoP, DoLP; (c1–c4) fusion results from ${S_0}$ and ${S_1}$, ${S_2}$, AoP, DoLP, respectively.

Fig. 6. Imaging weak targets with different materials: (a1–a5) ${S_0}$; (b1–b5) DoLP; (c1–c5) fusion results from ${S_0}$ and DoLP.

To further explore the effect of image fusion for weak targets of different complexity, we tested a metal ruler strip, a magic cube, a plastic toy car and a small fan; the results are shown in Fig. 7. For complex targets, our fusion method also significantly improves the imaging effect. The polarization information supplemented by the DoLP images is indicated by the red and yellow boxes in the fusion results. Compared with the ${S_0}$ images, our results contain most of the target information. Although the imaging results differ from those under normal illumination, most of the target information can be represented, which considerably improves the imaging effect.

Fig. 7. Imaging weak targets with different complexity: (a1–a4) ${S_0}$; (b1–b4) DoLP; (c1–c4) fusion results from ${S_0}$ and DoLP.

Subsequently, as illustrated in Fig. 8, we present an additional comparison of the fusion results with other algorithms. The first two images are the polarization and intensity images, respectively, followed by the results of the other methods; the last image is from our proposed method. The red and yellow boxes indicate supplementary polarization information. Perceptual_FusionGan, GANMcC and PMIG show significant distortion and noise, DIFNet lacks texture, and the traditional methods are less effective in retaining polarization information. Overall, our fusion result maintains high quality while preserving rich background intensity information and high-contrast polarization information.

  • (2) Objective evaluation: Subjective evaluation relies on the visual perception of the human eye, which is rather individual. Therefore, this paper introduces seven important objective metrics for evaluating the quality of fused images, including information entropy (EN) [44], standard deviation (SD) [45], mutual information (MI) [46], QAB/F [47], the structural similarity index measure (SSIM) [39], peak signal-to-noise ratio (PSNR), and visual information fidelity (VIF) [48].

Fig. 8. Fusion results of the indoor dataset by different methods.

EN and SD are important indexes of the information content and contrast of the fused images. MI indicates the correlation between two images by calculating the mutual information between the source image and the fused image. SSIM measures the structural similarity between the source and fused images. PSNR measures the distortion of the fused images; the higher the PSNR value, the less distorted the fused image. QAB/F is a pixel-level image fusion quality assessment metric that reflects the quality of the visual information obtained from the fusion of the input images. VIF calculates the fidelity of the fused image information and can evaluate the loss of image information during the fusion process. The higher the values of these objective metrics, the better the fusion quality.
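As a reference, minimal NumPy sketches of three of these metrics (EN, SD and PSNR) are given below; the 256-bin histogram for EN and the peak value of 255 for PSNR are common conventions assumed here, and for image fusion PSNR is usually evaluated against each source image and averaged.

```python
import numpy as np

def entropy(img, bins=256):
    """Information entropy (EN): Shannon entropy of the gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def standard_deviation(img):
    """Standard deviation (SD), reflecting the contrast of the image."""
    return float(np.std(img))

def psnr(fused, reference, peak=255.0):
    """Peak signal-to-noise ratio (PSNR) between a fused image and a reference image."""
    diff = np.asarray(fused, dtype=np.float64) - np.asarray(reference, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```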

As shown in Table 1, on the indoor test set our method achieves the highest averages for the three metrics SSIM, QAB/F and VIF, and sub-optimal performance for SD, MI and PSNR, indicating that our method preserves stronger contrast with good texture and image fidelity and is better suited to human visual perception. Overall, our method performs well on the indoor dataset.

Table 1. Objective evaluation of the indoor dataset by different methods. The average values of the different metrics are shown in this table (red: optimal, blue: suboptimal)

4.2 Outdoor polarization information fusion results

  • (1) Subjective evaluation: As can be seen from Figs. 9 and 10, for each set of images, the first two images are the polarization image and the intensity image, respectively, followed by the results of the other methods; the image produced by our proposed method is the last one. The red and green boxes are enlargements of the fusion results.

The polarization image (DoLP) can be regarded as representing the strongest polarization state. As can be seen from the polarization images in Fig. 9 and Fig. 10, the polarization characteristics in DoLP are not overwhelmed by the bright light, exhibiting a stable imaging effect. Specifically, high-contrast target information is reflected in the DoLP image in Fig. 9 because of the high degree of polarization of the glass and the person; the Chinese character markers on the backlit building in Fig. 10 behave similarly. Moreover, polarized light has strong penetration, making the target inside the window visible, as can be seen in Fig. 9. Meanwhile, the DoLP images reflect the target surface properties and usually contain more texture, as shown by the enlarged green boxes in Fig. 9 and Fig. 10.

Fig. 9. Fusion results of the outdoor dataset in scene 1 by different methods.

Fig. 10. Fusion results of the outdoor dataset in scene 2 by different methods.

Considering the compared algorithms, the subjective fusion results of most methods achieve appropriate fusion of the polarization and intensity images. However, it is obvious that our results maintain not only the high-contrast polarization information but also the rich background intensity information.

In Fig. 9, most of the methods suffer from some loss of information or contrast. The fusion results of the traditional methods do not look natural, especially GTF, where smoothing of the image causes the texture to disappear and the important polarization information is not preserved. Among the deep learning methods, DIFNet shows a noticeable lack of target information and texture. Perceptual_FusionGan does not retain sufficient polarization information although the background information is better preserved. For PMIG, the polarization information is obvious, but the background information is insufficient. The overall brightness of GANMcC and U2Fusion is low, so they cannot achieve good results. The fusion result of PFNet does not preserve the background information of the intensity image well, resulting in low image contrast. In contrast, our method preserves many obvious plant details (as shown in the green box) and the salient polarization information of the person (as shown in the red box). In addition, the background brightness of our images is closer to that of the intensity image, and the final fusion result is not disturbed by the redundant information of the polarization image, which would otherwise make the overall image dark, low in contrast and visually poor.

As can be seen from Fig. 10, our method performs significantly better than the other methods. The result preserves not only sufficiently salient polarization information but also rich intensity information of the background. The polarization information in the red and green boxes in Fig. 10 is clear and obvious, and the enhancement of polarization information for weak targets is fully represented here. Furthermore, regarding human visual perception as an important subjective criterion, our images look natural.

In conclusion, our method retains the information of both the polarization and intensity dimensions well. The fusion results show high-contrast target polarization information and rich background intensity information, enabling weak-target polarization imaging in a bright light background.

  • (2) Objective evaluation: As shown in Table 2, on the outdoor test set our method not only achieves the highest averages for four metrics, MI, SSIM, QAB/F and VIF, but also obtains suboptimal performance for EN, SD and PSNR.

Table 2. Objective evaluation of the outdoor dataset by different methods. The average values of the different metrics are shown in this table (red: optimal, blue: suboptimal)

The values of MI and SSIM indicate that the fusion results are more informative with respect to the source images. The maximum values of QAB/F and VIF indicate that our fusion results retain more target information and are more consistent with human visual perception. Although the results for EN and SD are suboptimal, they still show that our method retains sufficient target information and texture. Regarding the suboptimal PSNR result, a possible reason is that we fuse more polarization information, and the redundant information of the polarization image brings a lot of noise, which degrades the PSNR metric. Nevertheless, judging from the subjective results, our images retain more information from the source images and appear natural.

4.3 Ablation study

  • (1) Ablation study of the network structure: The generator is mainly composed of a self-attention block, a detail supplement block and a fusion block. To prove the effectiveness of the network structure, we made some ablative modifications to it. To verify the effect of the detail supplement block, the block is removed directly, so the generator consists of the self-attention block and the fusion block. To verify the effect of the self-attention mechanism, we remove the self-attention block, so the generator consists of the detail supplement block and the fusion block. As shown in Fig. 11, the texture of the plant, the salient information of the person and the railings on the building are attenuated. As confirmed by Table 3, among the various configurations, removing the detail supplement block gives the worst metrics compared with the combination of the detail supplement block and the self-attention block. Although removing the self-attention block gives better results, our proposed method still outperforms it significantly.
  • (2) Ablation experiments with the loss function: We fit the pixel distribution of the polarization image fusion with the structural similarity loss ${L_{ssim}}$ and the intensity loss ${L_{int}}$, and set hyperparameters to adjust the weights between them. To verify the effect of ${L_{ssim}}$ and ${L_{int}}$, we remove one of them and retrain the model respectively. The subjective results can be seen in Fig. 11. In the absence of ${L_{ssim}}$, although the fusion result contains rich textures, the overall image contrast is low. On the other hand, in the absence of ${L_{int}}$, the fusion result suffers from severely missing texture, bias and visual distortion, including missing wall texture and distortion of the tower crane. In conclusion, a single loss is not enough to fit the pixel distribution of the fusion result; ${L_{ssim}}$ and ${L_{int}}$ are needed together to preserve more texture and contrast. As can be seen from Table 3, for EN, SSIM, QAB/F and VIF, our full loss outperforms the ablated variants.

5. Conclusion

In this paper, a new network for polarization and intensity image fusion is proposed to achieve imaging of weak targets under bright light. The network framework consists of a dual-discriminator structure and a generator, which enables the generator to consider information from both polarization and intensity images. Meanwhile, a self-attention mechanism is introduced to emphasize context dependency and highlight salient information in the source images. Because polarization images can provide detailed features such as surface shape, contrast and roughness even under poor illumination and low contrast, the effect of polarized light on the fusion results of different weak targets is first explored in a bright light background, i.e., image fusion with different polarization components, with different materials, and with different complexity. In addition, the generalizability and advancement of the network are discussed. Compared with 11 other advanced fusion algorithms, our proposed method produces fused images that are more natural in subjective evaluation and better match human visual perception. In objective evaluation, based on different metrics, the proposed method obtains multiple optimal values.

Fig. 11. Visualization results of the ablation on the dataset. The groups are DoLP images, ${S_0}$ images, fusion results without the self-attention block, fusion results without the detail supplement block, fusion results without the ${L_{int}}$, fusion results without the ${L_{ssim}}$ and our fusion results.

Table 3. Objective evaluation of ablation on the dataset (bold: optimal)

Funding

Anhui Provincial Key Research and Development Plan (2023z04020018); National Natural Science Foundation of China (61775050).

Disclosures

The authors declare no conflicts of interest.

Data availability

The data that support the findings of this study are openly available at [49].

References

1. J. Ma, Y. Ma, and C. Li, “Infrared and visible image fusion methods and applications: A survey,” Inf. Fusion 45, 153–178 (2019). [CrossRef]  

2. H. Xu, J. Ma, J. Jiang, et al., “U2fusion: A unified unsupervised image fusion network,” IEEE Trans. Pattern Anal. Mach. Intell. 44(1), 502–518 (2022). [CrossRef]  

3. H. Zhang, Z. Le, Z. Shao, et al., “MFF-GAN: An unsupervised generative adversarial network with adaptive and gradient joint constraints for multi-focus image fusion,” Inf. Fusion 66, 40–53 (2021). [CrossRef]  

4. H. Xu, J. Ma, and X.-P. Zhang, “MEF-GAN: Multi-exposure image fusion via generative adversarial networks,” IEEE Trans. on Image Process. 29, 7203–7216 (2020). [CrossRef]  

5. B. Yang and S. Li, “Multifocus image fusion and restoration with sparse representation,” IEEE Trans. Instrum. Meas. 59(4), 884–892 (2010). [CrossRef]  

6. L. B. Wolff, “Polarization vision: a new sensory approach to image understanding,” Image Vis Comput. 15(2), 81–93 (1997). [CrossRef]  

7. H. Shen and P. Zhou, “Near natural color polarization imagery fusion approach,” in Proceedings of the IEEE International Congress on Image and Signal Processing, (2010), pp. 2802–2805.

8. M. Bina, D. Magatti, M. Molteni, et al., “Backscattering differential ghost imaging in turbid medium,” Phys. Rev. Lett. 110(8), 083901 (2013). [CrossRef]  

9. F. Nencini, A. Garzelli, S. Baronti, et al., “Remote sensing image fusion using the curvelet transform,” Inf. Fusion 8(2), 143–156 (2007). [CrossRef]  

10. J. Lewis, R. O’Callaghan, S. Nikolov, et al., “Pixel-and region-based image fusion with complex wavelets,” Inf. Fusion 8(2), 119–130 (2007). [CrossRef]  

11. V. Naidu, “Image fusion technique using multi-resolution singular value decomposition,” Def. Sc. Jl. 61(5), 479–484 (2011). [CrossRef]  

12. A. Toet, “Image fusion by a ratio of low-pass pyramid,” Pattern Recognit Lett. 9(4), 245–253 (1989). [CrossRef]  

13. L. Chipman, T. Orr, and L. Graham, “Wavelets and image fusion,” in Proceedings of the IEEE International Conference on Image Processing, (1995), pp. 248–251.

14. J. Ma, C. Chen, C. Li, et al., “Infrared and visible image fusion via gradient transfer and total variation minimization,” Inf. Fusion 31, 100–109 (2016). [CrossRef]  

15. H. Li and X.-J. Wu, “Densefuse: A fusion approach to infrared and visible images,” IEEE Trans. on Image Process. 28(5), 2614–2623 (2019). [CrossRef]  

16. L. Jian, X. Yang, Z. Liu, et al., “SEDRFuse: A symmetric encoder–decoder with residual block network for infrared and visible image fusion,” IEEE Trans. Instrum. Meas. 70, 1 (2021). [CrossRef]  

17. H. Zhang, H. Xu, Y. Xiao, et al., “Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity,” in Proceedings of the AAAI Conference on Artificial Intelligence, (2020), pp. 12797–12804.

18. Y. Liu, X. Chen, J. Cheng, et al., “A medical image fusion method based on convolutional neural networks,” in Proceedings of 2017 20th International Conference on Information Fusion, (2017), pp. 1–7.

19. J. Ma, W. Yu, P. Liang, et al., “FusionGan: A generative adversarial network for infrared and visible image fusion,” Inf. Fusion 48, 11–26 (2019). [CrossRef]  

20. J. Ma, H. Xu, J. Jiang, et al., “DDcGAN: A Dual-Discriminator Conditional Generative Adversarial Network for Multi-Resolution Image Fusion,” IEEE Trans. on Image Process. 29, 4980–4995 (2020). [CrossRef]  

21. B. Lin, X. Fan, D. Li, et al., “High-Performance Polarization Imaging Reconstruction in Scattering System under Natural Light Conditions with an Improved U-Net,” Photonics. 10(2), 204 (2023). [CrossRef]  

22. D. Li, B. Lin, X. Wang, et al., “High-Performance Polarization Remote Sensing with the Modified U-Net Based Deep-Learning Network,” IEEE Trans. Geosci. Remote Sensing 60, 1–10 (2022). [CrossRef]  

23. X. Wang, T. Hu, D. Li, et al., “Performances of Polarization-Retrieve Imaging in Stratified Dispersion Media,” Remote. Sens. 12(18), 2895 (2020). [CrossRef]  

24. J. Liu, J. Duan, Y. Hao, et al., “Semantic-guided polarization image fusion method based on a dual-discriminator GAN,” Opt. Express 30(24), 43601–43621 (2022). [CrossRef]  

25. J. Zhang, J. Shao, J. Chen, et al., “PFnet: an unsupervised deep network for polarization image fusion,” Opt. Lett. 45(6), 1507–1510 (2020). [CrossRef]  

26. J. Zhang, J. Shao, J. Chen, et al., “Polarization image fusion with self-learned fusion strategy,” Pattern Recognit. 118, 108045 (2021). [CrossRef]  

27. B. Schaefer, E. Collett, R. Smyth, et al., “Measuring the stokes polarization parameters,” Am. J. Phys. 75(2), 163–168 (2007). [CrossRef]  

28. F. Liu, P. L. Han, Y. Wei, et al., “Deeply seeing through highly turbid water by active polarization imaging,” Opt. Lett. 43(20), 4903–4906 (2018). [CrossRef]  

29. X. Wang, R. Girshick, A. Gupta, et al., “Non-local neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), pp. 7794–7803.

30. D. Li, K. Guo, Y. Sun, et al., “Depolarization Characteristics of Different Reflective Interfaces Indicated by Indices of Polarimetric Purity (IPPs),” Sensors 21(4), 1221 (2021). [CrossRef]  

31. P. Wang, D. Li, X. Wang, et al., “Analyzing Polarization Transmission Characteristics in Foggy Environments Based on the Indices of Polarimetric Purity,” IEEE Access 8, 227703–227709 (2020). [CrossRef]  

32. D. Li, C. Xu, L. Yan, et al., “High-performance scanning-mode polarization based computational ghost imaging (SPCGI),” Opt. Express 30(11), 17909 (2022). [CrossRef]  

33. D. Li, C. Xu, M. Zhang, et al., “Measuring glucose concentration in solution based on the Indices of Polarimetric Purity,” Biomed. Opt. Express 12(4), 2447–2459 (2021). [CrossRef]  

34. Y.-Q. Zhao, P. Gong, and Q. Pan, “Object detection by spectropolarimeteric imagery fusion,” IEEE Trans. Geosci. Remote Sensing 46(10), 3337–3345 (2008). [CrossRef]  

35. I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative adversarial nets,” in Proceedings of the 27th International Conference on Neural Information Processing Systems, (2014), pp. 2672–2680.

36. X. Mao, Q. Li, H. Xie, et al., “Least squares generative adversarial networks,” in Proceedings of IEEE International Conference on Computer Vision, (2017), pp. 2794–2802.

37. H. Zhang, I. Goodfellow, D. Metaxas, et al., “Self-attention generative adversarial networks,” in Proceedings of the 36th International Conference on Machine Learning, (2019), pp. 7354–7363.

38. H. Zhang, J. Yuan, X. Tian, et al., “GAN-FM: Infrared and visible image fusion using GAN with full-scale skip connection and dual Markovian discriminators,” IEEE Trans. Comput. Imaging 7, 1134–1147 (2021). [CrossRef]  

39. Z. Wang and A. C. Bovik, “A universal image quality index,” IEEE Signal Process. Lett. 9(3), 81–84 (2002). [CrossRef]  

40. J. Ma, H. Zhang, Z. Shao, et al., “GANMcC: A generative adversarial network with multiclassification constraints for infrared and visible image fusion,” IEEE Trans. Instrum. Meas. 70, 1–14 (2021). [CrossRef]  

41. Y. Fu, X.-J. Wu, and T. Durrani, “Image fusion based on generative adversarial network consistent with perception,” Inf. Fusion 72, 110–125 (2021). [CrossRef]  

42. H. Jung, Y. Kim, H. Jang, et al., “Unsupervised deep image fusion with structure tensor representations,” IEEE Trans. on Image Process. 29, 3845–3858 (2020). [CrossRef]  

43. K. Li, M. Qi, S. Zhuang, et al., “TIPFNet: a Transformer-based infrared polarization image fusion network,” Opt. Lett. 47(16), 4255–4258 (2022). [CrossRef]  

44. J. W. Roberts, J. A. Van Aardt, and F. B. Ahmed, “Assessment of image fusion procedures using entropy, image quality, and multispectral classification,” J Appl Remote Sens. 2, 1–28 (2008). [CrossRef]  

45. Y.-J. Rao, “In-fibre bragg grating sensors,” Meas. Sci. Technol. 8(4), 355–375 (1997). [CrossRef]  

46. G. Qu, D. Zhang, and P. Yan, “Information measure for performance of image fusion,” Electron. Lett. 38(7), 313–315 (2002). [CrossRef]  

47. C. Xydeas and V. Petrovic, “Objective image fusion performance measure,” Electron. Lett. 36(4), 308–309 (2000). [CrossRef]  

48. H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Trans. on Image Process. 15(2), 430–444 (2006). [CrossRef]  

49. X. Zeng, “Polarization-motivating-high-performance-weak-targets-imaging-based-on-dual-discriminator-GAN,” Github, (2023) https://github.com/zxb116/Polarization-motivating-high-performance-weak-targets-imaging-based-on-dual-discriminator-GAN



