Dual-path joint correction network for underwater image enhancement

Dehuan Zhang; Jiaqi Shen; Jingchun Zhou; Erkang Chen; Weishi Zhang; Weishi Zhang

doi:10.1364/OE.468633

1. Introduction

With the scarcity of land resources, the precious resources in the marine world have gradually become the focus, and people have begun to formulate marine strategies [1] actively. Underwater images are an essential medium of ocean information. The quality of underwater images is significant for exploring and utilizing the deep sea [2], such as target detection, submarine mineral, and marine energy exploration. To date, underwater optical imaging has been one of the challenging fields in computer vision research [3–5]. The most common underwater imaging model derives from the Jaffe McGlamery model [6–9].

(1)$${I^c}(x) = {J^c}(x){t^c}(x) + {B^c}(1 - {t^c}(x)),c \in \{ r,g,b\} ,$$

where ${I^c}(x)$ is the observed intensity in the color channel c of the input image at the pixel x, ${J^c}$ represents the restored image, ${B^c}$ indicates the background light, and ${t^c}$ is the transmission map, where c represents the red, green, and blue channels.

(2)$${t^{c = }}{e^{ - {\beta ^c}d(x)}},$$

where $d(x)$ is the distance from the camera to the radiant object, and ${\beta ^c}$ is the spectral volume attenuation coefficient for the channel c, where c is one of the red, green, and blue channels.

As shown in the basic model of underwater optical images [10] in Fig. 1, due to the complexity of the underwater scene, the absorption of light by water and the scattering of light by suspended particles are the main reasons for the attenuation of light energy [11]. The scattering includes forward scattering (the reflected light is randomly deviated from the propagation trajectory and then received by the camera) and backscattering (part of the light is reflected by water or suspended particles before reaching the target scene and then received by the camera). Due to the different light attenuation rates of different wavelengths, red light attenuates the fastest in light propagation in water, and blue-green light is the slowest attenuating light, so the collected underwater image has a blue and green tone [12]. The attenuated direct transmission weakens the scene’s intensity, and the surrounding scattered light causes the image’s actual scene color distortion, low contrast, and blurred details [13,14]. Many researchers are devoted to image enhancement methods aimed at obtaining high-quality underwater images [15], exploring the underwater world [16].

Fig. 1. The underwater optical imaging model.

Download Full Size | PDF

Traditional methods based on non-physical models do not consider the relevant complex factors of underwater imaging models and directly improve images’ visual effect by adjusting the images’ pixel values. Traditional methods based on physical models rely on underwater imaging models to generate clear images by estimating atmospheric light and transmission maps [17]. Learning-based methods accurately extract feature information to restore the degraded image, which overcomes the shortcomings of traditional methods, such as enough robustness of detailed information restoration. Learning-based methods combined with the physical model utilize the convolution neural network to obtain the transmission map and atmospheric light, which improves the accuracy of the estimated parameters and then performs an invertible procedure to obtain the restored underwater image.

Although the traditional underwater image enhancement methods [18–20] ignore the optical characteristics of underwater imaging, it is easy to introduce color casts, artifacts, and aggravated noise. Traditional restoration methods [21–24] rely on a specific prior, which leads to the complexity and limitations of estimating parameters, and cannot effectively solve the color distortion of images and removal of artificial light sources. When the traditional methods are invalid for some significantly distorted images or the processing effect is not ideal because of limited technology, deep learning technology has become the mainstream method for image enhancement. The deep learning methods [25–27] mainly revolve around the problems of overall image color distortion and loss of background detail information to restore images. Selective absorption of light in water can cause the change of color tones, and different channels of an RGB image have different absorption coefficients. This leads to the phenomenon of color cast in the extraction of color features in the process of restoring underwater images. In this study, we individually process the RGB color channels to extract different color features for color correction and highlight representative color features by combining the color feature fusion attention module [28,29]. However, the color features alone cannot make the feature extraction comprehensive, the scattering of light will cause a haze effect. The degradation of image quality will make the extraction of texture details appear blurred and unclear. Therefore, a dual-dimensional attention module (DDAM) is designed to extract texture features with high accuracy and refine background information to restore the texture details of images. The fusion of color features and texture features are reconstructed into comprehensive and rich feature content, forming a dual-path network in which the Light Absorption Correction Path (LACP) and the Light Scattering Correction Path (LSCP) are designed in parallel. The dual-path connected U-Net structure aims to enhance the semantic information of the context to obtain rich global features that can focus on the texture and color of the image. The Dual-path Joint Correction Network (DJC-NET) is divided into two stages to effectively extract multi-level feature information and better integrate the characteristics of different features. Figure 2 shows the color distribution and tricolor histogram of the original degraded underwater image and the enhanced result after DJC-NET processing.

Fig. 2. An example of an underwater image before and after enhancement. Top row, from left to right are the degraded underwater image, the corresponding three-dimensional color distribution, and the tricolor histogram. Bottom row, from left to right are enhanced underwater images, and the corresponding three-dimensional color distribution and tricolor histogram.

Download Full Size | PDF

It is evident from Fig. 2 that the color distribution of the original image is heavily greenish, while the color distribution of the enhanced image is well-balanced and broad. The results of the tricolor histogram show that the RGB components of the original image are not only deviated but also that no pixels are distributed between pixel values 0-50, while the RGB components of the enhanced result image are evenly distributed. DJC-NET obtains high-quality features in vivid color, background detail, and contrast-friendly. The main contributions are as follows:

1) We design a novel Triple Color Feature Extraction Module (TCFEM) for underwater image enhancement, which reconstructs the color balance of shallow-deep features by adjusting the unbalanced distribution of RGB triple color channels due to light absorption.
2) We develop a novel Color Feature Fusion Attention Module (CFFAM) to efficiently realize the interaction of color channel information and recalibrate the valuable information of features according to the channel dependencies.
3) We propose a novel dual-dimensional attention mechanism to preserve background texture information of underwater images, enhancing rough texture edges obtained by blurred inputs caused by light scattering.
4) We introduce a joint residual enhancement method of two sub-paths to overcome the interference caused by light’s absorption and scattering effects, ensuring that the network maintains suitable accuracy in deeper cases.

2. Related work

In recent years, deep learning methods have relied on the ingenious network structure design and loss function construction through convolution layers to effectively extract different features, from low-level details to high-level semantics, which can significantly improve the visual representation of underwater images. This section describes two types of implementation processes of the branch network: single-branch structure and multi-branch structure network to introduce advanced methods.

2.1 Single branch deep learning methods

The single-branch deep learning method can intuitively show the structural composition of the UIE network, and the algorithm has better adaptability. Wang et al. [30] improved the generative adversarial network for underwater images (UWGAN) to generate realistic underwater images. [31] introduced a real-time underwater image enhancement model (FUnIE-GAN) based on conditional generative adversarial networks. However, the discriminator stops prematurely in the training process, which affects the learning of the generator. The generator and the discriminator restrict each other, resulting in unstable image recovery and significantly reducing the network performance. Li et al. [32] designed an underwater image enhancement convolutional neural network CNN model (UWCNN) based on underwater scene priors. UWCNN adopts a more intuitive, lightweight single-branch CNN model, which is suitable for the frame-by-frame enhancement of multi-class scenes and videos. Dudhane et al. [33] introduced a deep end-to-end network with a single-branch color feature extraction module combined with a dense residual feature extraction module. The structure of such a single-branch network is relatively simple. The enhancement of underwater images cannot achieve a pleasing effect and cannot be applied to multiple types of scenes.

2.2 Multi-branch deep learning methods

Although the enhancement effect of the single-branch network structure on the image is barely satisfactory, the multi-branch network is more accurate in processing details. In the single-branch end-to-end network model, it is usually challenging to determine the contribution and role of the component modules. In contrast, the multi-branch network focuses on separately processing feature information at different levels to comprehensively restore degraded images.

Wang et al. [34] proposed a CNN network (UIE-Net) based on a pixel interference strategy, which contains enhancement tasks from two branches: color correction (CC-Net) and haze removal (HR-Net). Wang et al. [35] introduced a dual-branch joint dehazing network with a triple-based color correction module and an essential feature extraction module, aiming to preserve the image’s color balance and sharpness by learning transformation coefficients in an iterative mechanism. Although this kind of dual-branch structure considers the task learning of various features of color correction and haze removal, it fails to be widely used in various scenarios due to the use of synthetic datasets for training.

Li et al. [36] proposed a multi-branch fusion network (Water-Net) that combines the input with color correction, contrast enhancement, and sharpening of branches. Wu et al. [37] proposed a two-stage underwater image convolutional neural network (UWCNN-SD) based on structural decomposition, which divided the enhanced network into two branches of high frequency and low frequency and combined them to refine the network again to optimize image quality. However, this method is difficult to effectively perform structural decomposition of high-frequency and low-frequency components, which makes it challenging to achieve the ideal enhancement effect for significantly degraded images.

Li et al. [38] designed an RGB, lab, and HSV three-branch path network (Ucolor) based on transmission guidance by GDCP as the constraint input. However, GDCP exhibits some failure conditions that misguide the image restoration. [36] and [37] utilize traditional methods for pre-processing, Water-Net adpoted the results of traditional methods as the inputs of network, Ucolor introduces medium transmission. The introduction of traditional methods may lead to the existence of effective enhancement for a single underwater scene, the complexity of parameter estimation, and the instability of the recovery process.

Although these methods are successful in varying degrees, their performance is limited by the accuracy of the assumptions used in the target scenes. Compared with the single-branch and multi-branch methods on display, our method effectively solves such problems and performs well in various underwater scenes.

3. Proposed method

In this section, we discuss the specific enhancement details of the underwater image enhancement method. Figure 3 shows the general framework and detailed structure of the DJC-NET. The distorted images are fed into an end-to-end network that simultaneously goes through a two-stage LSCP and LACP correction process.

Fig. 3. The specific architecture of Dual-path Joint Correction Network. Light absorption correction paths and light scattering correction paths are used to improve image quality and extract accurate feature information. The restoration of the degraded image is achieved through the U-Net automatic codec, and an enhanced result image is obtained.

Download Full Size | PDF

Through the processing in the LSCP stage, the color cast caused by light absorption is corrected, and the final color feature information is constructed. By processing the LACP branch, an effect similar to Gaussian blurring caused by light scattering is corrected, and the background texture information of the image is extracted. The color and texture features are tightly connected to an integrative feature fed into the U-Net [39] part of DJC-NET. The extracted feature information is gradually downsampled to a latent vector in the encoding stage. In the decoding stage, the latent vector is upsampled to the original size of the image. The image after each upsampling is skip-connected with the output of the symmetric layer of the corresponding encoding stage, which plays a role in better capturing and disseminating the contextual semantic information, and finally outputs the enhanced result image.

3.1 Light absorption correction path

When light is transmitted underwater, it is absorbed by the water medium, resulting in severe attenuation of the light energy irradiated on the target. The effect of light absorption becomes more robust with increasing object-camera distance as more energy is absorbed by water. Different channels of RGB images have different absorption coefficients, resulting in an unbalanced pixel distribution in the R, G, and B channels of image and a severe color cast.

In the LSCP stage, the triplet color feature extraction module is used to individually process the three-color channels of R, G, and B separately. The neural network can accurately assign weights to the color channels and extract color feature information. The color feature obtains the dependency between channels by combining the Color Feature Fusion Attention Module (CFFAM). Since the attenuation of the red channel is higher than that of the blue and green channels, CFFAM is used to endow different learning weights to different color channels to compensate for the feature map of the red channel and promote the overall color features.

3.1.1 Triplet color feature extraction module

The detailed structure of the triplet color feature extraction module is shown in Fig. 3. Inspired [40], the average intensities of the red, green, and blue channels in the same scene are equal, implying that the RGB distribution of an undistorted image is balanced. However, the attenuation characteristics of light in water are entirely different from those in air, red light has the lowest energy. It is absorbed first, so when the underwater scene reaches the human eye or sensor, it loses its color and is covered with a blue-green filter. Due to the unbalanced grayscale ratio of R, G, and B color in underwater scenes, we perform high-precision extraction of features for each channel separately. TCFEM learns its strong feature representation according to each channel’s different characteristics, reflecting the importance of various features. We use multiple deep convolutional filters of size $3 \times 3$ to extract features for each channel and then cascade $C{F^\textrm{r}}$, $C{F^\textrm{g}}$, and $C{F^\textrm{b}}$ with each other to form the new color feature $CF$.

(3)$$C{F^\textrm{r}} = F_1^\textrm{r} \ast F_2^\textrm{r} \cdot{\cdot} \cdot{\ast} F_n^\textrm{r},$$

(4)$$C{F^\textrm{g}} = F_1^\textrm{g} \ast F_2^\textrm{g} \cdot{\cdot} \cdot{\ast} F_n^\textrm{g},$$

(5)$$C{F^b} = F_1^b \ast F_2^b \cdot{\cdot} \cdot{\ast} F_n^b,$$

(6)$$CF = Cat(C{F^\textrm{r}},C{F^\textrm{g}},C{F^\textrm{b}}) \in {R^{N \times H \times W}},$$

where ${\ast} $ represents the convolution operation, $Cat$ represents the feature concatenation, H and W represent the height and width of the input image, respectively, and n and N represent the number of feature maps.

3.1.2 Color feature fusion attention module

The basic structure of the color feature fusion attention module is shown in Fig. 4. Channel-wise attention-based works [28,29] cannot effectively extract representative color features, and lack the critical information of RGB feature map integration. Therefore, we design the Color Feature Fusion Attention Module to learn the interdependencies between color features. CFFAM addresses the beneficial information ignored when extracting feature information from convolutional streams, aiming to suppress less useful features and only allow more informative features to pass further. This improvement is due to the recalibration of the feature responses towards the most informative and important components of the inputs.

Fig. 4. Schematic illustration of the color feature fusion attention module.

Download Full Size | PDF

The Color Feature Fusion Attention Module consists of $3 \times 3$ convolutional layers, RELU layers, 1 × 1 convolutional layers, and an improved SE block, providing refinement with different weights for each channel. The process of $C{F_{{k_1}}}$ extracting features through convolutional flow is expressed as:

(7)$$C{F_{{k_2}}} = {f^{1 \times 1}}(\delta ({f^{3 \times 3}}(C{F_{{k_1}}}),$$

where represents the RELU activation function, ${f^{3 \times 3}}$ represents the convolution operation with a filter size of 3 × 3, and ${f^{1 \times 1}}$ represents the convolution operation with a filter size of 1 × 1.

The $C{F_{{k_2}}}$ feature is first squeezed using global average pooling to obtain the feature map of the global receptive field, and the size of the feature map is stretched to 1 × 1×C. This process is expressed as:

(8)$${D_{k = }}{F_{\textrm{sq}}}(C{F_k}) = \frac{1}{{H \times W}}\sum\limits_x^H {\sum\limits_y^W {C{F_k}} } (x,y),$$

where D_k represents the channel-wise attention coefficient, H and W are the height and width of the feature map, respectively.

The fully connected operation is first used to perform a nonlinear transformation on the squeezed result in the excitation step. Then, by learning parameters ${W_1}$ and ${W_2}$ to generate weights for each feature channel, we aim to explicitly model the correlation between feature channels, using different activation functions to achieve multiple activations.

(9)$$U = {F_{\textrm{ex}}}(D,W) = \sigma (h(D,W)) = \sigma ({W_2} \ast \delta ({W_1} \ast D)),$$

where $\delta$ represents the ReLU activation function, $\sigma$ represents the Sigmoid activation function, ${\ast} $ is the convolution operation, ${W_1}$ and ${W_2}$ are the weights of the two fully connected layers.

The $C{F_{{k_2}}}$ feature is multiplied by itself after applying a 1 × 1 convolution operation and an activation function sigmoid, and then performs residual path mapping to obtain $C{F_{{k_3}}}$. Depending on the applied 1 × 1 convolution, cross-channel interaction and information integration can be achieved, and the parameters of the network can also be reduced. This process is expressed as:

(10)$$C{F_{{k_3}}} = {F_{Interact}}(C{F_{{k_2}}}) = \sigma ({f^{1 \times 1}}(C{F_{{k_2}}})) \cdot C{F_{{k_2}}} + C{F_{{k_2}}},$$

where ${F_{Interact}}(C{F_{{k_2}}})$ represents the interaction of information across channels, $\sigma$ represents the Sigmoid activation function, and ${f^{1 \times 1}}$ represents the convolution operation with a filter size of 1 × 1. The final output of the CFFAM is reconstructed as:

(11)$$\widetilde {CF} = {F_{\textrm{scale}}}(U,C{F_{{k_3}}}) + C{F_{{k_1}}} = C{F_{{k_3}}} \cdot U + C{F_{{k_1}}},$$

where ${F_{\textrm{scale}}}(U,CF)$ represents the channel-wise multiplication between feature maps, $C{F_{{k_3}}} \in {R^{N \times H \times W}}$ and U to complete the recalibration of the original features in the channel dimension.

3.2 Light scattering correction path

Water contains a greater variety and amounts of suspended particles than air, which can cause severe light scattering. As a result, the underwater image is disturbed, the contrast is significantly declined, the texture is disordered, and the details are blurred. Due to lighting unevenness and noise issues caused by light scattering, the Light Scattering Correction Path was designed to help us model these factors and produce more pleasing tone, lighting variation, and texture information.

Motivated by attention mechanisms [41–43], the LACP branch comprises a dual-dimensional attention module and a residual network. In the LACP stage, through the visual feature attention module, various texture features are extracted from the feature map for each channel, and the feature information details are visually enriched. The extracted features are extracted using the pixel attention module to extract internal relationships, focus on the essential parts of the features, and form recalibrated texture detail features. Setting the input feature as $F \in {R^{C \times H \times W}}$, the entire process that goes through the dual-dimensional attention module can be described as:

(12)$${F_1} = MA{P_v}(F) \otimes BF,$$

(13)$${F_2} = MA{P_s}({F_1}) \otimes {F_1},$$

where $MA{P_v}$ is the visual attention map, and $MA{P_s}$ is the pixel attention map. $BF$ is the basic feature extracted by the filter, and ${\otimes}$ is the multiplication of the corresponding element. ${F_1}$ is the feature reconstructed by the visual feature attention module and ${F_2}$ is the final reconstructed output feature vector.

3.2.1 Visual feature attention module

The structure of the optical feature attention module is depicted in Fig. 3. The visual feature attention module is more valuable for focusing on the “what” of the image information, given the input. First, we perform a convolution operation of 3 × 3 to extract the basic features BF, and then use pooling to extract higher-order features further. However, the pooling process reduces both features and parameters. To reduce the error in feature extraction, we use global max pooling and global average pooling to aggregate the spatial information of feature maps simultaneously. In image processing tasks, especially underwater image enhancement, we need to extract global contextual information because of the continuity of the scene in the image. However, due to the influence of suspended particles in the water, the underwater image contains more information irrelevant to noise and image enhancement, so Gaussian filtering can effectively extract the practical information in the image. Considering the global information, we adopt the combination of avgpool and maxpool to remove the influence of noise and irrelevant information on the underwater image enhancement task.

The global max pooling can reduce the error of the increase in the variance of the estimated value caused by the limited neighborhood size during feature extraction, and achieve more background information on the image. The global average pooling averages the feature points in the neighborhood, which can reduce the error of the mean shift caused by the parameter error of the convolution layer during feature extraction, and aims to preserve rich texture details [44]. Double pooling is used to generate two different feature descriptors as $F_m^v$, and $F_a^v$, which represent the features passed through the max-pooling and the average pooling layers. Then, sum and combine elements to output a new feature vector, finally, and pass the sigmoid excitation function to improve the performance of the local network. Then, $F_m^v$ and $F_a^v$ are merged by element summation to output a new feature vector, and lastly, the sigmoid excitation function is used to improve the performance of the local network. In short, the process can be described as:

(14)$$\begin{aligned} MA{P_v}(F) &= \sigma (Maxpool({f^{3 \times 3}}(F)) + Avgpool({f^{3 \times 3}}(F))),\\ &= \sigma (Maxpool(BF) + Avgpool(BF)), \\ &= \sigma (F_m^v + F_a^v), \end{aligned}$$

where $\sigma $ represents the Sigmoid activation function, and ${f^{3 \times 3}}$ represents a convolution operation with the kernel size of 3 × 3.

3.2.2 Pixel attention module

The detailed structure of the pixel attention module is shown in Fig. 3. The features output by the optical feature attention module are used as the input features of the pixel attention module. Unlike the visual feature attention module, the spatial module attention mainly focuses on the spatial information of “where”. PA (Pixel Attention) performs pixel-level attention on high-level feature map pixels, which enables the network to pay attention to the local features of objects and the spatial relationship between parts of objects, and enables the network to embed contextual features of different scales.

Similarly, global max pooling and global average pooling are applied first in the spatial dimension, and then the pooling features are connected to generate new and effective feature descriptors. The global max pooling can filter out more useless information and select more distinctive features. The global average pooling is biased towards obtaining overall characteristics and preventing the loss of too much high-dimensional information. We apply 2D convolutions to aggregate information on the concatenated feature descriptors, generating pixel attention maps that can encode places of emphasis or suppression $MA{P_s}$. Through the role of smooth convolutional layers in PA, the problem of artifacts caused in the task of image enhancement is solved, and valuable image pixels get more attention. Therefore, this structure can help DJC-NET to reconstruct damaged information from low-quality images. The calculation process of $MA{P_s}$ is:

(15)$$\begin{aligned} MA{P_s}({F_1}) &= \sigma ({f^{3 \times 3}}([Avgpool({F_1});Maxpool({F_1})])),\\ &= \sigma (F_m^s + F_a^s), \end{aligned}$$

where $\sigma $ represents the Sigmoid function, and ${f^{3 \times 3}}$ represents the convolution operation with a filter size of 3 × 3.

3.3 U-Net architecture

The structure of the U-Net network proposed in this paper is shown in Fig. 5. The U-Net network has excellent image feature learning and reconstruction capabilities. It mainly consists of an encoding path for contextual semantic information and a symmetrical decoding path for precise localization [45].

Fig. 5. U-net architecture for underwater image restoration.

Download Full Size | PDF

The reconstructed features are fed into the restoration network of the U-Net. In the encoder stage, each downsampling always uses two filters of size 3 × 3 for multiple convolution operations to generate complete high-level semantic information. Subsequently, the ReLU activation function is used to reduce the interdependence of the parameters, which alleviates the occurrence of overfitting. Lastly, using max-pooling with stride 2 as the core of downsampling, the feature tensor size becomes smaller, and the number of feature channels doubles in each downsampling stage.

In the decoder stage, the potential high-dimensional vector is upsampled to form the original size of the symmetric encoding layer. Each time the upsampling operation is performed, two convolution operations with a convolution kernel of 3 × 3 are performed, and the rectified linear unit ReLU function is used as the excitation function after each convolution layer. The deconvolution operation used in the extension path of the U-Net is designed to maximize the preservation of crack details lost by layer-by-layer pooling. The number of channels of the feature map is gradually reduced to 3 with the sequential operation of upsampling, and the resolution of the final output image is 512 × 512 pixels.

3.4 Loss function

To achieve a good balance between visual quality and quantitative scores, we employ loss function ${L_1}$ and the perceptual loss ${L_{per}}$.The loss function ${L_{total}} = {L_1} + {L_{per}}$ represents the minimum absolute deviation and measures the sum of all absolute differences between the true and predicted values. ${L_1}$ has good robustness and can maintain consistency between pixels as a content loss; it is expressed as:

(16)$${L_1}(X) = \frac{1}{N}\sum\limits_{x \in X} {|{\hat{J}(x) - J(x)} |} ,$$

where x represents the index of the pixel within the region of X, $\hat{J}(x)$ is the image pixel reconstructed via the DJC-NET network, and $J(x)$ is the corresponding real image pixel value.

The perceptual loss compares the features obtained from the enhanced image with the features of the original image, making the high-level information close. The perceptual loss defined on the VGG-19 network pre-trained on ImageNet can be represented as:

(17)$${L_{per}} = \sum\limits_{x = 1}^H {\sum\limits_{y = 1}^W {|{\hat{J}(x,y) - J(x,y)} |} }.$$

The total loss function is defined as:

(18)$${L_{total}} = {L_1} + \lambda {L_{per}},$$

where $\lambda$ is the hyper-parameter, which is empirically set to 0.2.

4. Experiments and analysis

In this sub-section, we first refer to the natural underwater image enhancement benchmark UIEB created by Li et al. [36] and to the underwater color deviation subset UCCS in the RUIE dataset established by Liu et al. [46] under natural light. By describing the experimental setup and implementation details, to demonstrate the feasibility of our method, we compare our model with non-deep learning methods and state-of-the-art deep learning underwater image enhancement methods, presenting our experimental results through visual result graphs and objective data. Ultimately, we conduct a series of ablation studies to verify the role of each module structure in the network.

4.1 Underwater image enhancement benchmark datasets

Underwater Image Enhancement Benchmark Datasets (UIEB): UIEB [36] contains 890 high-resolution raw underwater images, corresponding high-quality reference images, and 60 challenge images for which no corresponding reference images were obtained. UIEB covers a variety of underwater scenes, and the content of the images covers a wide range, such as marine life, divers, and seabed corals and reefs, and the quality of underwater images is significantly declined.

Underwater Color Cast Sets (UCCS): The UCCS is a subset of RUIE [46], which provides a good platform for various algorithms to evaluate the ability to correct the color deviation. The test of the UCCS dataset effectively verifies the efficient applicability of the DJC-NET. The UCCS dataset forms three tonal underwater environment categoriese: blue, blue-green, and green.

4.2 Experimental implementation details

Training Details: We implemented the DJC-NET using the Pytorch. During the training process, the epochs of training are set to 500, and the batch size is 16. We use the Adam optimizer as the optimization algorithm. The learning rate is set to $1{e^{ - 4}}$, and the default values of ${\beta _\textrm{1}}$ and ${\beta _\textrm{2}}$ are 0.5 and 0.999, respectively. During the training process, the learning rate is kept constant, input and output resolution is 256 × 256, and the model is trained using 800 pairs of raw and sharp images selected from the UIEB.

Testing Details: We use the remaining 90 images in UIEB to test how well our method handled degraded images. For generalization evaluation, we test using the UCCS dataset of three water types, each containing 100 images of underwater degradation.

Comparison Methods: We compare DJC-NET with state-of-the-art methods, including traditional methods and deep learning methods. Traditional methods include the ULAP [21] method and SMBL [23], and deep learning methods include UWGAN [30], UWCNN [32], FUnIE-GAN [31], Water-Net [36], and the latest Ucolor method [38]. We compared the methods’ output corresponding results based on existing models. For UWCNN, we need to use precise water types and have limitations.

Evaluation Metrics: For quantitative measurement, we use the peak signal-to-noise ratio (PSNR) [47], structural similarity index metric (SSIM) [48], underwater color image quality evaluation (UCIQE) [49], and underwater image quality measure (UIQM) [50] as objective reference standards for image quality. The PSNR is a full-reference image quality evaluation metric, and it is based on the error between corresponding pixels. The higher the PSNR score, the better the image quality. SSIM measures the visual impact of three features of an image: brightness, contrast, and structure. A higher SSIM value suggests a higher similarity between the dehazed image and the ground truth. UCIQE mainly measures the degree of dehazing and color recovery of distorted images. UCIQE is one of the most comprehensive image evaluation criteria. UIQM is necessary for evaluating color, sharpness, and contrast.

4.3 Visual comparison results

Visual Comparison of UIEB: Owing to the light absorption and scattering by suspended particles, the images acquired often have poor visibility, including color shift, low contrast, noise, and blurring issues. As a result, underwater images usually present various scenes, which seriously degrades the quality of underwater images. Therefore, we select some images in the test set to show the corresponding results of the contrasting methods, and mainly divide them into six underwater scenes: blue scenes, green scenes, yellow scenes, shallow water area scenes, yellow-white scenes, and low-illuminated scenes.

As shown in Figs. 6–11, color deviation and low contrast seriously affect the visual quality of underwater images. Traditional methods usually only consider one underwater image type and cannot effectively remove color casts. ULAP uses the difference between the maximum value of G-B intensity and the value of R intensity in one pixel of the underwater image to estimate the scene depth. However, there are some color interference problems in the underwater image objects that cause ULAP to lose its enhancement effect.SMBL estimates the transmission map of the R channel through the new underwater dark channel prior, and then uses ULAP and reversed saturation map for optimization. However, the transmission map estimation of the SMBL method relies on the information of the red channel, which leads to failure in images with severe red channel attenuation.

Fig. 6. Visual comparison of blue scenes on the UIEB.

Download Full Size | PDF

Fig. 7. Visual comparison of green scenes on the UIEB.

Download Full Size | PDF

Fig. 8. Visual comparison of yellow scenes on the UIEB.

Download Full Size | PDF

Fig. 9. Visual comparison of shallow water area scenes on the UIEB.

Download Full Size | PDF

Fig. 10. Visual comparison of yellow-white scenes on the UIEB.

Download Full Size | PDF

Fig. 11. Visual comparison of low-illuminated scenes on the UIEB.

Download Full Size | PDF

UWGAN is not robust in real underwater images due to the use of synthetic datasets, which makes most scenes appear foggy and blue. Although we retrained UWCNN using the UIEB dataset, compared to our network, the UWCNN network structure is too simple to recover underwater images effectively. Owing to the mutual constraints of the generator-discriminator, FUnIE-GAN has no obvious effect on the removal of color casts for blue, green, and yellow scenes, and red color difference is introduced in other scenes. The enhancement effects of Water-Net and Ucolor are relatively natural and realistic, and some scenes can effectively remove the color cast problem and improve the contrast. But Water-Net is based on the pre-processing of traditional methods, since these traditional methods introduce partial red exposure (Fig. 7). Compared with Ucolor, our method more effectively enhances the contrast of underwater images and restores the color of underwater images. Benefiting from our dual-path structure, it can be seen in Figs. 6–11. The removal of color casts and the enhancement of contrast are better than the enhancement effects of all contrast methods. In addition, our method also significantly outperforms other methods in terms of brightness and sharpness, and is suitable for changing underwater environments.

Visual Comparison of UCCS: The UCCS is mainly divided into three water tones, namely blue, blue-green and green, so we show the results of the contrasting method and our method through these three scenes. Among them, images1-2 are blue scenes, images3-4 are blue-green scenes, and images5-6 are green. The results show that the local objects are marked with recognition boxes to prove that our method is the most realistic for the detail restoration of degraded images.

Figure 12 presents the enhanced results on the UCCS dataset. In the blue scene, the enhancement effects of the traditional methods ULAP and SMBL are similar, and there is a problem that the color cast removal effect is not obvious. The results of UWGAN and UWCNN fail to correct the intrinsic color, and the reconstruction results appear as blue and dark green. FUnIE-GAN, Water-Net, and Ucolor effectively remove color projection, but cannot achieve the expected effect for recognizing distant local objects. Our method is robust in recognizing near and far objects in blue scenes. In the blue-green scene, ULAP, UWCNN, and Ucolor cannot solve the problem of image color cast, and the color correction effect for the original image is very tiny.

Fig. 12. Visual comparison of blue, blue-green, and green scenes on the UCCS.

Download Full Size | PDF

SMBL and Water-Net’s enhancement effects make objects clearer, but introduce a red appearance, resulting in over-saturated enhancement results. UWGAN and FUnIE-GAN incorrectly rectified the original image, and their enhanced results show too much intense deep blue and yellow-green, resulting in severe color casts in complex scenes. Our method does not introduce other incorrect tones to degrade the image and effectively removes color casts in blue-green scenes. In green scenes, the enhancement results of ULAP, SMBL, and UWCNN still have green severe chromatic aberration, which makes the visibility of objects shallow. UWGAN, FUnIE-GAN, and Water-Net all show incorrect color enhancement results with oversaturation. Although Ucolor improves the green color cast, the enhancement results are limited, and it still has problems such as low contrast and blurred images. In green scenes, our method shows a clear improvement in blurriness and achieves visually pleasing results in terms of color and texture detail.

4.4 Evaluation metrics

To better evaluate and compare the generality and performance of our method in real applications, we use 90 images from UIEB and 300 images from UCCS for test validation. All methods are used to crop images to test metrics. For UIEB, we sequentially use PSNR, SSIM, UCIQE, and UIQM to evaluate image restoration performance, and our method achieves the highest score compared to the contrasting methods. For UCCS, we adopt UCIQE and UIQM to analyze the enhancement results of the algorithm, and our method ranks first in UCIQE and ranks third in UIQM.

Table 1 shows the full-reference image quality evaluation results and the evaluation results without reference metrics, using PSNR, SSIM, UCIQE, and UIQM to evaluate the performance of all methods. A higher PSNR score means that the resulting image is closer to the reference image in image content, while a higher SSIM score means that the resulting image is more similar to the reference image in terms of image texture and structure. The higher UCIQE score indicates that the structure image performs excellently in color saturation and contrast, and a higher UIQM score indicates that it is more consistent with human visual perception. Our method achieves the best performance in terms of full-reference image quality assessment, proving our proposed method is good at handling details textures. Our method far outperforms other contrastive methods in no-reference image quality assessment, indicating that our method is closer to human visual perception in terms of color saturation and contrast.

Table 1. The PSNR, SSIM, UCIQE, and UIQM of different enhancement methods on the UIEB.^a

View Table | View all tables in this article

Table 2 shows the results of no-reference metrics, using two commonly used evaluation metrics for underwater images (UCIQE and UIQM). Our method achieves the highest scores in UCIQE and has good visuals. Although both Water-Net and FUnIE-GAN have higher UIQM scores than our method, from the visual results, both Water-Net and FUnIE-GAN introduce a strong color cast. They have serious blurring problems while enhancing underwater images. Our method achieves satisfactory results from visual comparison and objective evaluation of generalization capability in dealing with underwater images.

Table 2. The UCIQE and UIQM of different enhancement methods on the UCCS.^a

View Table | View all tables in this article

4.5 Ablation study

To demonstrate the effectiveness of the proposed DJC-Net, we conduct different ablation studies to analyze different elements, containing different modules and dual paths, respectively. We first construct our base network as the baseline of the enhancement network, which mainly consists of U-Net and convolution layers. Subsequently, we add the different modules into the base network as:

(a) Base: Add nothing.
(b) Base + TCFEM: Add the TCFEM into the baseline.
(c) Base + TCFEM + CFFAM: Add both TCFEM and CFFAM to the baseline.
(d) Base + TCFEM + CFFAM + Residual: Based on (c), add a residual mapping.
(e) Base + TCFEM + CFFAM + Residual + Avgpool: Add the avg pool operation to (d).
(f) Base + TCFEM + CFFAM + Residual + Avgpool + Maxpool: Add max pool operation to (e).

Table 3 shows the comparison results of PSNR and SSIM with different modules added. With the increase of modules, the performance is gradually increased, which proves that different components play an essential role in DJC-NET. Although the SSIM in (b) is slightly lower than (a), it can be seen from the visual comparison in Fig. 13 that after (b), the red artifacts and halos in (a) are eliminated. Therefore, it is proved that the contribution of TCFEM to the network is to adjust the image’s color balance. When CFFAM was added to the network, PSNR and SSIM were significantly improved, respectively, effectively proving that CFFAM is the core factor for improving network performance. But after adding (e) and (f), it can be seen from the visual demonstration that the removal of the color cast is significantly improved, while eliminating part of the aperture.

Fig. 13. Visual comparison of the ablation study of different modules. The alphabetical numbers below correspond to the numbers in Table 3.

Download Full Size | PDF

Table 3. Performance of the ablation study of different modules on the UIEB^a

View Table | View all tables in this article

We conduct an in-depth study to verify the vital role of the light absorption correction path (LACP) and the light scattering correction path (LSCP) in the network. The following experiments were conducted:

(a) We construct the U-Net network and convolutional layers as the baseline
(b) We add the LACP branch to the baseline.
(c) We add the LASP branch to the baseline.
(d) We add LACP and LSCP to the baseline to build to DJC-NET.

Table 4 shows the comparison results of PSNR and SSIM with LACP and LASP. With the gradual superposition of LACP and LSCP, a complete DJC-NET is finally formed. It can be found from Table 4 that PSNR and SSIM are gradually increasing, which tend to become flat. As shown in Fig. 14, after adding LACP, although the local aperture is resolved and the red appearance is partially improved, it causes red artifacts. Therefore, after adding LSCP, the red artifact phenomenon is greatly improved. We found that only when LACP and LSCP were applied simultaneously, the problems of artifacts and aperture are completely resolved. It is found that the color features obtained after LACP correction have an essential impact on the color balance of the image and content. While LSCP focuses more on the enhancement of texture details, and has an essential contribution to removing noise and artifacts. Experiments show that only by simultaneously focusing on the restoration of color and texture, the degraded underwater image can eliminate the phenomenon of color cast, low contrast, artifact, and blur, so DJC-NET processes the image closer to the reference image.

Fig. 14. Visual comparison of the ablation study of dual-path structure. The alphabetical numbers below correspond to the numbers in Table 4.

Download Full Size | PDF

Table 4. Performance of the ablation study of dual-path structure on the UIEB.^a

View Table | View all tables in this article

4.6 Run time

The running time of our model is recorded with one NVIDIA RTX A5000 GPU for the image whose resolution is 256 × 256, 512 × 512, and 1024 × 1024, respectively. The average run time of 100 images on the UIEB is shown in Table 5.

Table 5. Average run time for different sizes of 100 images

View Table | View all tables in this article

5. Conclusion

In this paper, we have designed a novel dual-path feature joint correction network called DJC-NET for underwater image enhancement. Benefiting from the design of the triple color feature extraction module, color feature fusion attention module and the dual dimensional attention mechanism, DJC-NET effectively removes the color cast caused by light absorption and successfully improves the blurring of image texture caused by light scattering. The U-Net is adopted to integrate the two sub-paths, LSCP, and LACP, to convey the correlation of the two feature extraction branches, which significantly improves the overall recovery performance of the network. The quantitative and qualitative evaluations have demonstrated that our proposed method outperforms existing state-of-the-art underwater image enhancement methods, where it is particularly noteworthy that our method achieves the best performance in both UIEB and UCCS. Furthermore, our method can be used as a guide for subsequent research on underwater image color correction. The effectiveness of critical components in DJC-NET has been validated in ablation studies. Experiments demonstrate that our method is superior to state-of-the-art underwater image enhancement methods in multi-class, diverse scenes.

Funding

Fundamental Research Funds for the Central Universities (3132019205, 3132019354); Liaoning Provincial Natural Science Foundation of China (20170520196); National Natural Science Foundation of China (61702074).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Dataset UIEB, Ref. [36] and Dataset UCCS, Ref. [46].

References

1. T. Li, S. Rong, W. Zhao, L. Chen, Y. Liu, H. Zhou, and B. He, “Underwater image enhancement using adaptive color restoration and dehazing,” Opt. Express 30(4), 6216–6235 (2022). [CrossRef]

2. P. Zhuang, C. Li, and J. Wu, “Bayesian retinex underwater image enhancement,” Eng. Appl. Artificial Intelligence 101, 104171 (2021). [CrossRef]

3. W. Zhang, P. Zhuang, H. Sun, G. Li, S. Kwong, and C. Li, “Underwater image enhancement via minimal color loss and locally adaptive contrast enhancement,” IEEE Trans. on Image Process. 31, 3997–4010 (2022). [CrossRef]

4. C. Li, C. Guo, L. Han, M. Cheng, J. Gu, and C. Chen, “Low-light image and video enhancement using deep learning: A survey,” IEEE Trans. Pattern Anal. Mach. Intell. (to be published). [CrossRef]

5. C. Li, C. Guo, and C. Chen, “Learning to enhance low-light image via zero-reference deep curve estimation,” IEEE Trans. on Pattern Analysis and Machine Intelligence 44(8), 4225–4238 (2021). [CrossRef]

6. T. Li, Q. Yang, S. Rong, L. Chen, and B. He, “Distorted underwater image reconstruction for an autonomous underwater vehicle based on a self-attention generative adversarial network,” Appl. Opt. 59(32), 10049–10060 (2020). [CrossRef]

7. J. Zhou, Y. Wang, W. Zhang, and C. Li, “Underwater image restoration via feature priors to estimate background light and optimized transmission map,” Opt. Express 29(18), 28228–28245 (2021). [CrossRef]

8. H. Li and P. Zhuang, “DewaterNet: a fusion adversarial real underwater image enhancement network,” Signal Processing: Image Commun. 95, 116248 (2021). [CrossRef]

9. J. Zhou, D. Zhang, W. Ren, and W. Zhang, “Auto color correction of underwater images utilizing depth information,” IEEE Geoscience and Remote Sensing Lett. 19, 1–5 (2022). [CrossRef]

10. J. Zhou, T. Yang, W. Chu, and W. Zhang, “Underwater image restoration via backscatter pixel prior and color compensation,” Eng. Appl. Artificial Intelligence 111, 104785 (2022). [CrossRef]

11. W. Zhang, Y. Wang, and C. Li, “Underwater image enhancement by attenuated color channel correction and detail preserved contrast enhancement,” IEEE J. Oceanic Eng. 47(3), 718–735 (2022). [CrossRef]

12. Q. Jiang, Y. Gu, C. Li, R. Cong, and F. Shao, “Underwater image enhancement quality evaluation: Benchmark dataset and objective metric,” IEEE Trans. Circ. Sys. Video Tech. (to be published). [CrossRef]

13. W. Ren, J. Zhang, J. Pan, S. Liu, J. Ren, J. Du, X. Cao, and M. Yang, “Deblurring dynamic scenes via spatially varying recurrent neural networks,” IEEE Trans. Pattern Anal. Mach. Intell. 44(8), 1 (2021). [CrossRef]

14. J. Zhou, D. Zhang, and W. Zhang, “Classical and state-of-the-art approaches for underwater image defogging: a comprehensive survey,” Front Inform. Technol. Electron. Eng. 21(12), 1745–1769 (2020). [CrossRef]

15. W. Ren, W. Pan, H. Zhang, X. Cao, and M. Yang, “Single image dehazing via multi-scale convolutional neural networks with holistic edges,” Int. J. Comput. Vis. 128(1), 240–259 (2020). [CrossRef]

16. J. Zhou, T. Yang, W. Ren, D. Zhang, and W. Zhang, “Underwater image restoration via depth map and illumination estimation based on single image,” Opt. Express 29(19), 29864–29886 (2021). [CrossRef]

17. M. Iqbal, M. M. Riaz, S. Sohaib Ali, A. Ghafoor, and A. Ahmad, “Underwater Image Enhancement Using Laplace Decomposition,” IEEE Geosci. Remote Sensing Lett. 19, 1–5 (2022). [CrossRef]

18. G. Singh, N. Jaggi, S. Vasamsetti, H. K. Sardana, S. Kumar, and N. Mittal, “Underwater image/video enhancement using wavelet based color correction (WBCC) method,” IEEE Underwater Technology (UT) 2015, 1–5 (2015). [CrossRef]

19. L. Zheng, H. Shi, and S. Sun, “Underwater image enhancement algorithm based on CLAHE and USM,” IEEE International Conference on Information and Automation (ICIA), 2016, 585–590 (2016).

20. H. Zhang, D. Li, L. Sun, D. Li, and Y. Li, “An underwater image enhancement method based on local white balance,” 5th International Conference on Mechanical Control and Computer Engineering (ICMCCE), 2055–2060 (2020).

21. W. Song, Y. Wang, D. Huang, and D. Tjondronegoro, “A rapid scene depth estimation model based on underwater light attenuation prior for underwater image restoration,” 19th Pacific-Rim Conference on Multimedia (PCM), 678–688 (2018).

22. M. Yang, J. Hu, C. Li, G. Rohde, K. Du, and Y. Hu, “An in-depth survey of underwater image enhancement and restoration,” IEEE Access 7, 123638–123657 (2019). [CrossRef]

23. W. Song, Y. Wang, D. Huang, A. Liotta, and C. Perra, “Enhancement of underwater images with statistical model of background light and optimization of transmission map,” IEEE Trans. Broadcast. 66(1), 153–169 (2020). [CrossRef]

24. J. Zhou, T. Yang, and W. Zhang, “Underwater vision enhancement technologies: a comprehensive review, challenges, and recent trends,” Appl. Intell., 1–28 (2022). [CrossRef]

25. Y. Wang, J. Guo, H. Gao, and H. Yue, “UIEC^2-Net: CNN-based underwater image enhancement using two color space,” Signal Processing: Image Commun. 96, 116250 (2021). [CrossRef]

26. C. Fabbri, M. Islam, and J. Sattar, “Enhancing underwater imagery using generative adversarial networks,” IEEE International Conference on Robotics and Automation (ICRA), 2018, 7159–7165 (2018).

27. Y. Ueki and M. Ikehara, “Underwater image enhancement with multi-scale residual attention network,” International Conference on Visual Communications and Image Processing (VCIP), 2021, 1–5 (2021).

28. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” IEEE Conference on Computer Vision and Pattern Recognition, 2018, 7132–7141 (2018).

29. P. Hambarde, S. Murala, and A. Dhall, “UW-GAN: single-image depth estimation and image enhancement for underwater images,” IEEE Trans. Instrum. Meas. 70, 1–12 (2021). [CrossRef]

30. N. Wang, Y. Zhou, F. Han, H. Zhu, and J. Yao, “UWGAN: underwater GAN for real-world underwater color restoration and dehazing,” arXiv:1912.10269 (2019).

31. M. Islam, Y. Xia, and Y. Sattar, “Fast underwater image enhancement for improved visual perception,” IEEE Robot. Autom. Lett. 5(2), 3227–3234 (2020). [CrossRef]

32. C. Li, S. Anwar, and F. Porikli, “Underwater scene prior inspired deep underwater image and video enhancement,” Pattern Recognition 98, 107038 (2020). [CrossRef]

33. A. Dudhane, P. Hambarde, P. Patil, and S. Murala, “Deep underwater image restoration and beyond,” IEEE Signal Process. Lett. 27, 675–679 (2020). [CrossRef]

34. Y. Wang, J. Zhang, Y. Cao, and Z. Wang, “A deep CNN method for underwater image enhancement,” IEEE International Conference on Image Processing (ICIP), 2017, 1382–1386 (2017).

35. K. Wang, L. Shen, Y. Lin, M. Li, and Q. Zhao, “Joint iterative color correction and dehazing for underwater image enhancement,” IEEE Robot. Autom. Lett. 6(3), 5121–5128 (2021). [CrossRef]

36. C. Li, C. Guo, W. Ren, R. Cong, J. Hou, S. Kwong, and D. Tao, “An underwater image enhancement benchmark dataset and beyond,” IEEE Trans. on Image Process. 29, 4376–4389 (2020). [CrossRef]

37. S. Wu, T. Luo, G. Jiang, M. Yu, H. Xu, Z. Zhu, and Y. Song, “A two-stage underwater enhancement network based on structure decomposition and characteristics of underwater imaging,” IEEE J. Oceanic Eng. 46(4), 1213–1227 (2021). [CrossRef]

38. C. Li, S. Anwar, J. Hou, R. Cong, C. Guo, and W. Ren, “Underwater image enhancement via medium transmission-guided multi-color space embedding,” IEEE Trans. on Image Process. 30, 4985–5000 (2021). [CrossRef]

39. Y. Hashisho, M. Albadawi, T. Krause, and U. Lukas, “Underwater color restoration using u-net denoising autoencoder,” International Symposium on Image and Signal Processing and Analysis (ISPA), 2019, 117–122 (2019).

40. E. Lam, “Combining gray world and retinex theory for automatic white balance in digital photography,” The Ninth International Symposium on Consumer Electronics (ISCE), 2005, 134–139 (2005).

41. J. Sheng, G. Lv, G. Du, Z. Wang, and Q. Feng, “Multi-scale residual attention network for single image dehazing,” Digital Signal Processing 121, 103327 (2022). [CrossRef]

42. C. Wang, H. Shen, F. Fan, M. Shao, C. Yang, J. Luo, and L. Deng, “EAA-Net: a novel edge assisted attention network for single image dehazing,” Knowledge-Based Systems 228, 107279 (2021). [CrossRef]

43. S. Woo, J. Park, J. Lee, and I. Kweon, “Cbam: convolutional block attention module,” The European Conference on Computer Vision (ECCV), 2018, 3–19 (2018).

44. H. Chen, H. Li, Y. Li, and C. Chen, “Shaping visual representations with attributes for few-shot recognition,” arXiv:2112.06398 (2021).

45. L. Thampi, R. Thomas, S. Kamal, A. Balakrishnan, T. Mithun, and M. Supriya, “Analysis of U-Net based image segmentation model on underwater images of different species of fishes,” International Symposium on Ocean Technology (SYMPOL), 2021, 1–5 (2021).

46. R. Liu, X. Fan, M. Zhu, M. Hou, and Z. Luo, “Real-World underwater enhancement: challenges, benchmarks, and solutions under natural light,” IEEE Trans. Circuits Syst. Video Technol. 30(12), 4861–4875 (2020). [CrossRef]

47. J. Korhonen and J. You, “Peak signal-to-noise ratio revisited: is simple beautiful?” Fourth International Workshop on Quality of Multimedia Experience, 2012, 37–38 (2012).

48. Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]

49. M. Yang and A. Sowmya, “An underwater color image quality evaluation metric,” IEEE Trans. on Image Process. 24(12), 6062–6071 (2015). [CrossRef]

50. K. Panetta, C. Gao, and S. Agaian, “Human-visual-system-inspired underwater image quality measures,” IEEE J. Oceanic Eng. 41(3), 541–551 (2016). [CrossRef]

Methods	PSNR	SSIM	UCIQE	UIQM
ULAP	15.9746	0.7495	0.5920	2.3019
SMBL	16.6807	0.8005	0.5886	2.5982
UWGAN	14.1214	0.7558	0.5020	2.9232
UWCNN	13.2686	0.614	0.4643	2.6345
FUnIE-GAN	16.7588	0.6894	0.5424	2.9363
Water-Net	19.2053	0.8531	0.5763	2.9996
Ucolor	21.2573	0.8793	0.5534	3.0603
Ours	21.3027	0.8817	0.6319	3.1557

Methods	UCIQE	UIQM
ULAP	0.4636	2.2192
SMBL	0.4800	2.4127
UWGAN	0.4308	2.5991
UWCNN	0.4094	2.2955
FUnIE-GAN	0.5024	3.0741
Water-Net	0.5694	3.1595
Ucolor	0.5179	3.0069
Ours	0.6033	3.0103

Model	(a)	(b)	(c)	(d)	(e)	(f)
Base	✓	✓	✓	✓	✓	✓
+TCFEM		✓	✓	✓	✓	✓
+CFFAM			✓	✓	✓	✓
+Residual				✓	✓	✓
+Avg pool					✓	✓
+Max pool						✓
PSNR	17.6814	18.1078	20.2517	20.4189	20.7660	21.3027
SSIM	0.8296	0.8271	0.8748	0.8782	0.8779	0.8871

Model	(a)	(b)	(c)	(d)
Base	✓	✓	✓	✓
+LACP		✓		✓
+LSCP			✓	✓
PSNR	17.6814	20.4189	20.8109	21.3027
SSIM	0.8296	0.8782	0.8775	0.8871

Methods	PSNR	SSIM	UCIQE	UIQM
ULAP	15.9746	0.7495	0.5920	2.3019
SMBL	16.6807	0.8005	0.5886	2.5982
UWGAN	14.1214	0.7558	0.5020	2.9232
UWCNN	13.2686	0.614	0.4643	2.6345
FUnIE-GAN	16.7588	0.6894	0.5424	2.9363
Water-Net	19.2053	0.8531	0.5763	2.9996
Ucolor	21.2573	0.8793	0.5534	3.0603
Ours	21.3027	0.8817	0.6319	3.1557

Dual-path joint correction network for underwater image enhancement

Abstract

1. Introduction

2. Related work

2.1 Single branch deep learning methods

2.2 Multi-branch deep learning methods

3. Proposed method

3.1 Light absorption correction path

3.1.1 Triplet color feature extraction module

3.1.2 Color feature fusion attention module

3.2 Light scattering correction path

3.2.1 Visual feature attention module

3.2.2 Pixel attention module

3.3 U-Net architecture

3.4 Loss function

4. Experiments and analysis

4.1 Underwater image enhancement benchmark datasets

4.2 Experimental implementation details

4.3 Visual comparison results

4.4 Evaluation metrics

4.5 Ablation study

4.6 Run time

5. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (14)

Tables (5)

Equations (18)

Optics Express