In digital photography, improving imaging quality in low-light shooting is a key user need. Unfortunately, conventional smartphone cameras that use a single, small image sensor cannot provide satisfactory quality in low-light images. A color-plus-mono dual camera, which consists of two horizontally separated image sensors that simultaneously capture a color and mono image pair of the same scene, could be useful for improving the quality of low-light images. However, incorrect fusion of the color and mono image pair can also have negative effects, such as introducing severe visual artifacts into the fused images. This paper proposes a selective image fusion technique that applies adaptive guided filter-based denoising and selective detail transfer only to those pixels deemed reliable with respect to binocular image fusion. We employ a dissimilarity measure and binocular just-noticeable-difference (BJND) analysis to identify unreliable pixels that are likely to cause visual artifacts during image fusion via joint color image denoising and detail transfer from the mono image. Using an experimental color-plus-mono camera system, we demonstrate that BJND-aware denoising and selective detail transfer help improve image quality in low-light shooting.
© 2017 Optical Society of America
Recently, dual cameras, consisting of two separate image sensors, lenses, and image signal processors, have become a major trend in smartphone development due to the benefits they provide with respect to image enhancement and post effects when capturing photos. Using dual cameras in smartphones helps to meet consumer needs in a broad range of applications, such as dual zoom (using two different optical zoom lenses), bokeh (i.e., shallow depth of field) effect, and improvement in low light level imaging capability.
Among these applications, given the extremely small image sensors imposed by the form factor of current smartphones, improving image quality in low-light shooting conditions stands out as one of users’ most important needs. The small image sensor in a smartphone usually suffers from noise during low-light shooting, as well as a lack of sharpness due to failure of the camera’s passive auto-focus (AF) function. Clearly, a trade-off exists between image sensor size and image quality when shooting in low-light conditions.
To overcome this problem, a dual-lens camera setup (i.e., two image sensors and two lenses slightly separated in the horizontal direction) seems to be a good approach that helps improve image quality in low-light shooting conditions by using an additional camera. The two separate cameras should complement each other, in the sense that, especially in low light shooting conditions, the auxiliary camera is capable of capturing images that are, in some aspects, of better quality than the primary camera.
An appropriate choice for the auxiliary camera is a mono sensor camera without any color filter array as it can generally provide good performance in low-light shooting conditions. It is known that a mono sensor, due to the absence of a color filter, takes pictures with a higher signal-to-noise ratio (SNR) and with more detail information when shooting in low-light conditions than does a sensor with a color filter [1,2]. Given a color and mono image pair taken by a dual-lens camera, an algorithm for image fusion can then be applied in order to produce a visually plausible image in low-light shooting conditions. This process of registering and combining two images from a dual camera to improve the image quality is known as image fusion.
Image fusion techniques have a long history in the fields of remote sensing, medical imaging, and consumer photography [3–9]. In consumer photography, multi-focus image fusion and multi-exposure image fusion have been studied as effective quality enhancement techniques and have been widely adopted in consumer electronics [6,7]. One technique that can improve images taken in low-light conditions fuses a flash and no-flash image pair taken by a single camera in a time-sequential manner. The no-flash image is denoised by applying a joint bilateral filter that uses the flash image as a reference. Furthermore, detail information from the flash image is transferred to the no-flash image in order to sharpen its details.
However, we stress that most such multiple-image fusion applications in digital photography deal with image pairs that have no pixel disparity (or very little disparity), because the images are usually taken in a time-sequential manner.
When a stereo image pair is taken by a multi-spectral dual camera whose sensors are slightly separated in the horizontal direction, pixel disparity is introduced, so per-pixel registration must be handled carefully through a disparity-compensated image fusion process. A fused image generated without a perceptually plausible disparity compensation method is prone to visual artifacts (e.g., image structure distortions and image blur). To the best of our knowledge, no existing method achieves fully reliable per-pixel registration in real scene imagery. Stereo matching, which estimates dense disparity maps, is one of the most popular research areas in computer vision, and matching algorithms have yet to reach acceptable performance. Inevitably, disparity estimation errors from current stereo matching techniques make it difficult for a fusion algorithm to produce high-quality fused images without introducing noticeable visual artifacts. Therefore, a new method is needed to tackle these problems in digital photography with a dual camera.
This paper proposes a method for enhancing images taken in low-light conditions via image fusion, using a color-plus-mono camera to improve imaging quality. To mitigate the effect of disparity compensation errors during fusion of a given color and mono image pair, the proposed approach combines an adaptive guided filter for denoising with a selective detail transfer method, both of which are aware of a binocular just-noticeable-difference (BJND) model of binocular image fusion in human vision. To that end, we measure the degree of dissimilarity and the BJND between corresponding pixels in the color and mono image pair. Based on the BJND of the image pair, we apply adaptive denoising using the guided filter, and transfer detail from the mono image only to reliable pixels whose dissimilarity is unnoticeable.
The main novelty of our approach is the use of BJND and dissimilarity information to jointly denoise the color image and to transfer the detail of the mono image locally in a highly selective fashion. This improves the trade-off between artifacts and detail enhancement. BJND-aware denoising and detail transfer allow us to automatically adjust the amount of local detail transferred according to local content characteristics and human visual system characteristics (i.e., BJND). This is achieved by the ‘BJND-aware merge’ and ‘detail manipulation’ processes in the proposed method.
An experimental color-plus-mono camera system was constructed to demonstrate the effectiveness of the proposed method. Validation data sets were captured using the dual camera, and the performance of the proposed algorithm was measured through a subjective test on these data sets.
Our results indicate that the proposed method is quite capable of improving image quality, in terms of denoising capability and detail enhancement, while mitigating visual artifacts due to incorrect per-pixel registration. Additionally, our results show that the proposed method outperforms classical guided filter-based denoising and detail transfer-based techniques.
The remainder of this paper is organized as follows. Section 2 describes the most relevant previous works related to denoising and detail transfer for multi-modal images. In Section 3, we present a dual camera system that simultaneously captures color and mono image pairs of the scenes. Section 4 describes the BJND-aware denoising and selective detail transfer algorithm. In Section 5, validation experiments are presented to evaluate the performance of the proposed algorithm. Finally, conclusions are drawn in Section 6.
2. Related work
Image filtering techniques with a guidance image, known as joint or guided filtering, have been successfully applied to image enhancement tasks such as joint resolution upsampling [13,14] and cross-modality denoising [15–17]. The idea behind joint filtering is that the guidance image is less noisy and contains true high-frequency information that can be transferred to the target image. For a given target image, the guidance image can be an image from a different modality [9,15,18], a filtering output from a previous iteration, or the target image itself [16,20].
The main goal of joint filtering is to enhance the quality of the target image, in terms of noise or low spatial resolution, while avoiding the transfer of erroneous structures [17,19]. For example, the bilateral filter can be applied for edge-aware smoothing or joint upsampling. For digital photography with flash and no-flash image pairs, Petschnigg et al. proposed a denoising and detail transfer method that is closely related to our work. They used a joint bilateral filter that references the flash image to denoise the no-flash image.
The basic bilateral filter is an edge-preserving smoothing filter that averages spatially close pixels of similar intensity. It combines a classic low-pass filter with an edge-stopping function that down-weights pixels whose intensity difference is large. Based on the observation that the flash image contains more true high-frequency information than the no-flash image, the joint bilateral filter is used for denoising; that is, the basic bilateral filter is modified to compute the edge-stopping function from the flash image instead of the no-flash image. Furthermore, the detail information of the flash image, which is also extracted with the basic bilateral filter, is transferred to the no-flash image.
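The following Python/NumPy sketch shows how a joint bilateral filter works. It smooths a target image (e.g., the no-flash image) but computes the edge-stopping weights from a guide image (e.g., the flash image). It is a minimal, unoptimized illustration. The function name and the values of radius, sigma_s, and sigma_r are our own assumptions, not parameters from Petschnigg et al.

```python
import numpy as np


def joint_bilateral_filter(target, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Bilateral smoothing of `target` with range weights taken from `guide`.

    target, guide: 2-D float arrays in [0, 1] with the same shape.
    radius, sigma_s, sigma_r: illustrative values, not taken from the paper.
    """
    h, w = target.shape
    t = np.pad(target, radius, mode="edge")
    g = np.pad(guide, radius, mode="edge")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Spatial weight: the classic Gaussian low-pass term.
            w_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            rows = slice(radius + dy, radius + dy + h)
            cols = slice(radius + dx, radius + dx + w)
            # Edge-stopping weight computed from the guide image, so that
            # neighbours across a guide edge are strongly down-weighted.
            w_r = np.exp(-((g[rows, cols] - guide) ** 2) / (2.0 * sigma_r ** 2))
            num += w_s * w_r * t[rows, cols]
            den += w_s * w_r
    return num / den
```

Passing `target` as its own `guide` recovers the basic bilateral filter.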
The guided image filter provides another type of joint filter that assumes a local linear model between the guidance image and the filter output. When the guided filter is used to fuse a color and mono image pair, the target image is the noisy color image and the guidance image is the grayscale mono image. The guided filter is therefore applied to each RGB channel separately, using the same guidance image.
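The local linear model can be sketched with the standard box-filter formulation of the guided filter. This Python/NumPy version is illustrative only. The window radius r = 4 is an assumption; eps = 0.001 matches the ε value reported in Section 5.2. The function `guided_filter_rgb` applies the filter to each RGB channel with a single grayscale guide, as described above.

```python
import numpy as np


def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window, with edge-replicated borders."""
    k = 2 * r + 1
    p = np.pad(img, r, mode="edge")
    # Integral image with a leading row/column of zeros.
    c = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)


def guided_filter(p, I, r=4, eps=1e-3):
    """Filter input p with guidance I under the local model q = a*I + b."""
    mean_I = box_mean(I, r)
    mean_p = box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I ** 2
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)      # larger eps -> smaller a -> smoother output
    b = mean_p - a * mean_I
    # Average the coefficients of all windows covering each pixel.
    return box_mean(a, r) * I + box_mean(b, r)


def guided_filter_rgb(color, guide, r=4, eps=1e-3):
    """Apply guided_filter to each RGB channel with one grayscale guide."""
    return np.stack(
        [guided_filter(color[..., ch], guide, r, eps) for ch in range(color.shape[-1])],
        axis=-1,
    )
```

Under these assumptions, `guided_filter_rgb(C, M)` with a disparity-compensated mono image `M` corresponds to the joint filtering discussed above. Calling `guided_filter(c, c)` on a single channel gives a self-guided filter.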
The guided filter relies on the mono image as an estimator of the color image. However, if the mono reference image is not appropriate (e.g., mis-registered due to disparity errors), the mono pixels will not be accurate estimators of the color pixels. In that case, the filter may not only fail to improve image quality but also introduce visual artifacts. Figure 1 shows an example of such artifacts in denoising and detail transfer using the guided filter with a disparity-compensated mono image. This observation indicates that a new fusion algorithm must be robust to incorrect correspondence information. As shown in Fig. 1(d), it is possible to preserve most of the useful information of the input images without producing visual artifacts.
3. Dual camera system
In this paper, we demonstrate a color-plus-mono dual camera capable of taking better-quality pictures than a single-sensor color camera under low-light illumination. Figure 2 shows the dual camera, which consists of a color camera and a mono camera.
The color camera used in our experiment is the PointGrey Flea3® (model FL3-U3-13S2C-CS). It has a Sony IMX035 CMOS sensor (1/3”, 3.63 μm pixel size) with a maximum image resolution of 1328 x 1048 pixels, and supports auto gain and auto white balance. The mono camera is also a PointGrey Flea3® (model FL3-U3-13S2M-CS). Its specification is identical to that of the color camera except that it lacks a color filter array (CFA).
It is worth noting that the quantum efficiency (QE, a sensor’s ability to convert light into an electrical signal) of a mono sensor tends to be higher than that of a color sensor. This is because the color filter on top of each photodiode of a color sensor absorbs some of the incoming photons. As shown in Table 1, in our dual camera the QE of the mono sensor at 525 nm is 76%, while that of the color sensor at 525 nm is only 54%. Hence, the mono sensor offers better low-light performance than the color sensor.
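As a rough illustration of what this QE gap means, assume shot-noise-limited imaging, in which SNR is proportional to the square root of the number of photoelectrons. Also assume identical pixels, optics, and exposure, and ignore read noise, dark current, and demosaicing. Under these assumptions, the 525 nm figures above give:

```latex
\frac{\mathrm{QE}_{\mathrm{mono}}}{\mathrm{QE}_{\mathrm{color}}} = \frac{0.76}{0.54} \approx 1.41,
\qquad
\frac{\mathrm{SNR}_{\mathrm{mono}}}{\mathrm{SNR}_{\mathrm{color}}} \approx \sqrt{1.41} \approx 1.19
\quad (\approx 1.5\ \mathrm{dB}).
```

This is a back-of-the-envelope estimate at a single wavelength, not a measured SNR.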
Both camera lenses are identical, with a focal length of 6 mm and a horizontal field of view of 42.7°. The baseline between the color and mono cameras is 35 mm, kept as small as possible to minimize pixel disparity.
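The ideal rectified pinhole relation d = f·B/Z shows why even a short baseline produces noticeable disparity. The Python sketch below ignores lens distortion and assumes perfectly rectified images. It converts the stated 6 mm focal length and 3.63 μm pixel pitch into a focal length of about 1653 pixels, then evaluates the disparity for the 35 mm baseline. The function name is hypothetical.

```python
def disparity_px(depth_m, baseline_mm=35.0, focal_mm=6.0, pixel_um=3.63):
    """Ideal rectified pinhole stereo: disparity d = f_px * B / Z, in pixels."""
    focal_px = (focal_mm * 1e-3) / (pixel_um * 1e-6)  # focal length in pixels (~1653)
    return focal_px * (baseline_mm * 1e-3) / depth_m


for depth in (0.5, 1.0, 2.0, 5.0):
    print(f"Z = {depth} m -> d = {disparity_px(depth):.1f} px")
```

Under these assumptions, the disparity is roughly 58 pixels for an object at 1 m and about 12 pixels at 5 m, so disparity compensation cannot be skipped.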
Each camera is connected to a personal computer via a Universal Serial Bus (USB) 3.0 interface. Note also that the image sensors used in our system offer frame synchronization functionality for use in stereo camera systems [21,22]. In our capture configuration, the mono camera is the master and the color camera is the slave, so the color camera is triggered in sync with the mono image capture. Figure 2(c) shows an example of image capture with frame synchronization. Our dual camera ensures that the two frames are synchronized to within 10 milliseconds of each other.
For capture, we used the PointGrey capture software (FlyCapture® 2.10). It performs white balancing, gamma correction, and other nonlinear tone-mapping operations to produce an input color and mono image pair. Note that, when constructing our test database (see Section 5), we used the auto mode provided with the capture software.
To calibrate the dual camera, the OpenCV camera calibration tool is used. Calibration is performed with a large checkerboard calibration target and yields two camera matrices representing the internal and external parameters of each camera. In this paper, the position of the left camera (i.e., the mono camera) was chosen as the origin of the 3D coordinate system. During capture, stereo rectification is performed using the precomputed camera matrices, and the rectified color and mono image pairs are then used as input to the proposed image fusion algorithm.
4. Proposed approach
The rectified color and mono images are fused via an adaptive denoising and selective detail transfer process based on the computation of a BJND model for the image pair. Figure 3 shows the overall procedure of the proposed method.
As shown in Fig. 3, the two images capture the same scene under the same local illumination but with different image sensors, so they must be normalized before further processing. To this end, a well-known histogram matching technique is applied to adjust the histogram of the target color image to match an N-bin histogram of the reference mono image. Here, the target is an RGB color image and the reference is a grayscale image, so each channel of the target image is matched against the single histogram derived from the reference image. In our experiments, a 64-bin histogram was used for the matching, as implemented in the MATLAB® Image Processing Toolbox. Figure 4 shows an example of histogram matching. As shown in Fig. 4(c), the normalized image has higher contrast than the original input image, but its noise level is also amplified.
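The following Python/NumPy sketch performs N-bin histogram matching against a single mono reference. It is not MATLAB's implementation. Instead, it uses a simple discrete CDF lookup: each target bin is mapped to the centre of the first reference bin whose cumulative count reaches that of the target bin. Intensities are assumed to lie in [0, 1], and the function names are hypothetical.

```python
import numpy as np


def match_histogram(target, reference, n_bins=64):
    """Remap `target` so that its n_bins histogram follows that of `reference`."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    t_cdf = np.cumsum(np.histogram(target, bins=edges)[0]) / target.size
    r_cdf = np.cumsum(np.histogram(reference, bins=edges)[0]) / reference.size
    # Lookup table: first reference bin whose CDF reaches each target bin's CDF.
    lut = centers[np.clip(np.searchsorted(r_cdf, t_cdf, side="left"), 0, n_bins - 1)]
    # Bin index of every target pixel (consistent with np.histogram's binning).
    bin_idx = np.clip(np.digitize(target, edges[1:-1]), 0, n_bins - 1)
    return lut[bin_idx]


def match_color_to_mono(color, mono, n_bins=64):
    """Match each RGB channel of `color` against the single mono histogram."""
    return np.stack(
        [match_histogram(color[..., ch], mono, n_bins) for ch in range(color.shape[-1])],
        axis=-1,
    )
```

Because each output value is a bin centre, the result is quantized to n_bins levels. This is one reason why matching can amplify visible noise in dark, low-contrast regions.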
Stereo matching is a computer vision technique that estimates the disparity between two views through correspondence matching. However, stereo matching of real scene images is difficult. Because it involves solving an ill-posed optimization problem (e.g., due to occlusion), the disparity maps estimated even by state-of-the-art techniques typically contain many errors. Stereo matching between multi-spectral or multi-modal image pairs is harder still, and stereo matching in general remains an open research problem.
In this paper, instead of developing a new stereo matching technique, we use a publicly available optical flow-based method that has often been used for disparity remapping of stereoscopic images [26–28]. As seen in Fig. 4(d), the lower-right region (see the ‘Davinci’ box) has severe disparity estimation errors. As noted earlier, such error regions must be handled carefully to mitigate visual artifacts produced by the image fusion algorithm.
4.1 BJND-aware denoising
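For image denoising, two guided filtering operations are first performed as part of the BJND-aware denoising operation, as shown in Fig. 3. The individual guided filtering operation uses the input color image Cinput itself as the guidance image, such that

$$C_{\mathrm{base}} = \mathrm{GF}\left(C_{\mathrm{input}},\, C_{\mathrm{input}}\right),$$

where $\mathrm{GF}(p, I)$ denotes guided filtering of an input image $p$ with a guidance image $I$ [16]. In contrast, the joint guided filtering uses the input mono image Minput as the guidance image, such that

$$C_{\mathrm{joint}} = \mathrm{GF}\left(C_{\mathrm{input}},\, M_{\mathrm{input}}\right).$$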
The joint guided filter relies on the mono image to be an accurate reference for the color image. However, when the reference image is not very reliable, then the filter will fail to produce a visually plausible result. In our dual camera system, the reference mono image requires disparity compensation to denoise the corresponding pixels in the color image. Note that the disparity map estimated by a stereo matching algorithm in real scene images usually includes unreliable disparity pixels. This means that incorrect reference pixels could potentially be used as the target pixels are denoised, causing the generation of severe visual artifacts in the filtered image, as shown in Fig. 1(c). To mitigate the effect of using incorrect pixels as a reference, the proposed method adaptively combines the individual guided filter output and joint guided filter output via the BJND-aware merge, as shown in Fig. 5.
In the human visual system (HVS), the just-noticeable difference (JND) is the visibility threshold above which a human can perceive changes in pixel values. It depends on the luminance and contrast of the local image region (i.e., luminance and contrast masking effects). Recently, several studies have focused on finding the visibility threshold for stereo images. The binocular JND (BJND) relates to the inter-view difference between the left and right views that a human can perceive. It has been empirically demonstrated that a human viewing a stereo image cannot perceive a luminance change in one view if that change is smaller than the BJND. This BJND model also depends on the well-known HVS characteristics of luminance adaptation and contrast masking.
In this paper, the proposed algorithm uses fusion thresholds to determine whether fusion artifacts will be imperceptible, or at least tolerable, to the human visual system when the final fused image is viewed. We assume that if the inter-difference (i.e., the difference in luminance levels between the two images) is larger than the BJND threshold, the fusion algorithm will produce visually noticeable artifacts in the final image, because the difference evokes a perceptible response in the HVS.
Accordingly, the adaptive merging rules for the two filter outputs are derived from the notion of BJND. We use the BJND to set a fusion threshold under which a color image can be denoised with the guidance mono image without introducing severe visual distortions. Pixels whose luminance difference is below the fusion threshold are taken entirely from the joint guided filter output. Pixels above the threshold are computed as a weighted combination of the individual and joint guided filter outputs. Pixels with very high differences are taken from the individual guided filter output only.
1) BJND map computation: To compute a BJND map for a color and mono image pair, we adopt a recently proposed BJND model, summarized below.
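Given a pair of color and mono images with a disparity map associated with the color image, the BJND at a pixel position (x,y) is defined as follows:

$$BJND_c(x,y) = A_{\mathrm{limit}}(bg_m, eh_m)\left[1 - \left(\frac{n_m}{A_{\mathrm{limit}}(bg_m, eh_m)}\right)^{\lambda}\right]^{1/\lambda},$$

where $bg_m$, $eh_m$, and $n_m$ are the background luminance, edge height, and noise amplitude of the mono image at the disparity-compensated position $(x+d,\, y)$; d is the disparity at (x,y); $A_{\mathrm{limit}}$ is the noise-free visibility threshold; and λ is a constant of the model [11]. The background luminance bg is computed by averaging luminance values in a 5x5 region centered at the corresponding pixel position. The edge height eh is calculated with 5x5 Sobel operators at the corresponding pixel position.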
Note that BJNDc for the color image is a function of the background luminance level bgm, the edge height ehm, and the noise amplitude nm at the corresponding pixel position in the mono image. If the mono image has no noise (nm = 0), BJNDc is equivalent to

$$A_{\mathrm{limit}}(bg_m, eh_m) = A_C(bg_m) + K(bg_m)\, eh_m,$$

where $A_C(bg)$ is the visibility threshold due to luminance adaptation and $K(bg)$ is a slope that decreases as bg increases [11]. This expression shows that background luminance affects the BJND, which reflects the luminance masking process. Moreover, if the noise is very small, the BJND in the color image grows linearly with the edge height at the corresponding position in the mono image. As the background luminance increases, the sensitivity of the BJND to edge height decreases.
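The BJND formula can be sketched in Python/NumPy as shown below, working in 8-bit units (0 to 255). The numeric constants for A_C(bg) and K(bg), and λ = 1.25, are our transcription of the model in [11]; verify them against that paper before use. The inputs bg (the 5x5 average described above), eh, and n are taken as precomputed. Clipping n/A_limit to [0, 1] is our own safeguard.

```python
import numpy as np

LAMBDA = 1.25  # exponent of the noise term (our reading of [11]; verify)


def a_c(bg):
    """Luminance-adaptation threshold A_C(bg), bg in 8-bit units."""
    bg = np.asarray(bg, dtype=float)
    return np.where(bg < 48,
                    0.0027 * (bg ** 2 - 96 * bg) + 8.0,
                    0.0001 * (bg ** 2 - 32 * bg) + 1.7)


def k_slope(bg):
    """Edge-height slope K(bg); decreases as background luminance rises."""
    bg = np.asarray(bg, dtype=float)
    return -1e-6 * (0.7 * bg ** 2 + 32 * bg) + 0.07


def a_limit(bg, eh):
    """Noise-free threshold A_limit = A_C(bg) + K(bg) * eh."""
    return a_c(bg) + k_slope(bg) * np.asarray(eh, dtype=float)


def bjnd(bg, eh, n):
    """BJND of the color view from mono-view bg, eh, and noise amplitude n."""
    lim = a_limit(bg, eh)
    ratio = np.clip(np.asarray(n, dtype=float) / lim, 0.0, 1.0)
    return lim * (1.0 - ratio ** LAMBDA) ** (1.0 / LAMBDA)
```

To obtain maps on the [0.0, 1.0] scale used in this paper, the 8-bit result would be divided by 255.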
Figure 6(a) shows an example of a BJND map. The histogram in Fig. 6(b) indicates that the BJND values are typically smaller than 0.06 (i.e., about 15 levels in an 8-bit grayscale image). Note that, in this paper, BJND values are normalized to [0.0, 1.0], where 1.0 corresponds to the full 8-bit grayscale range.
2) Dissimilarity measure: As noted, the guidance mono image does not convey meaningful information to the target color image if the correspondence information is incorrect. We therefore adopt a dissimilarity measure between corresponding pixels to assess the reliability of the correspondence information; a large dissimilarity value indicates poor reliability of the correspondence information (i.e., disparity) of a pixel. In this paper, we use a simple inter-difference measure: the Gaussian-weighted mean absolute difference (MAD) between the color intensity values and the disparity-compensated mono intensity values in a pixel neighborhood. Note that many previous algorithms for stereo image processing [10, 29] have successfully used the absolute difference of intensity values as a dissimilarity measure and compared it with BJND values.
Let a local 11x11 square window be centered at the pixel position (x,y) in the color image Cinput. Let a window of the same size be centered at the disparity-compensated position $(x', y')$ (i.e., $x' = x + d$ and $y' = y$) in the mono image Minput, where d is the disparity at (x,y). The dissimilarity value at (x,y) is then given by

$$D(x,y) = \frac{\sum_{i,j} g(i,j)\,\bigl|C_{\mathrm{gray}}(x+i,\, y+j) - M_{\mathrm{input}}(x'+i,\, y'+j)\bigr|}{\sum_{i,j} g(i,j)}, \qquad -\tfrac{N_w-1}{2} \le i, j \le \tfrac{N_w-1}{2},$$

where $C_{\mathrm{gray}}$ is the grayscale version of Cinput, the Gaussian weighting function $g(i,j)$ [30] is used to mitigate undesirable blocking artifacts, and Nw is the size of the local window (i.e., Nw = 11). Note that, to calculate the MAD value, the color image is first converted into a grayscale image. In our implementation, the rgb2gray() function in MATLAB is used for this conversion; it obtains the luminance value as a weighted sum of the RGB channel values. Also, since pixel values are normalized to [0.0, 1.0], a value of zero represents maximum similarity, while a value of one indicates maximum dissimilarity. Figure 6(c) shows an example of the dissimilarity map.
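The dissimilarity measure can be sketched in Python/NumPy as follows. Several details are assumptions rather than values from the text:

- The Gaussian standard deviation sigma = 2.0 is a hypothetical choice; the text specifies only the 11x11 window.
- The disparity is assumed to map color pixel x to mono pixel x + d, and is rounded to the nearest integer.
- The grayscale weights are approximately the Rec. ITU-R BT.601 luma weights used by MATLAB's rgb2gray().

The explicit per-pixel loop favours readability over speed.

```python
import numpy as np


def rgb2gray(rgb):
    # Approximately the BT.601 luma weights used by MATLAB's rgb2gray().
    return rgb @ np.array([0.2989, 0.5870, 0.1140])


def gaussian_window(n_w=11, sigma=2.0):
    ax = np.arange(n_w) - n_w // 2
    g1 = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    g = np.outer(g1, g1)
    return g / g.sum()  # weights sum to 1, so D stays in [0, 1]


def dissimilarity(color_rgb, mono, disparity, n_w=11, sigma=2.0):
    """Gaussian-weighted MAD between the color window at (x, y) and the
    mono window at the disparity-compensated position (x + d, y)."""
    r = n_w // 2
    gray = rgb2gray(color_rgb)
    h, w = gray.shape
    g = gaussian_window(n_w, sigma)
    cp = np.pad(gray, r, mode="edge")
    mp = np.pad(mono, r, mode="edge")
    d_map = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            # Assumed sign convention x' = x + d, clamped to the image.
            xm = int(np.clip(np.rint(x + disparity[y, x]), 0, w - 1))
            win_c = cp[y:y + n_w, x:x + n_w]
            win_m = mp[y:y + n_w, xm:xm + n_w]
            d_map[y, x] = np.sum(g * np.abs(win_c - win_m))
    return d_map
```

With a correct disparity, D stays near zero even where the image is textured. With a wrong disparity, textured regions produce large D values, which flags them as unreliable for fusion.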
3) Weight map generation: The weight map is computed with a weighting function that combines the BJND with an exponential function of the dissimilarity value of each pixel. An example of the weighting function is shown in Fig. 7.
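The weight value W(x,y) at a pixel (x,y) is determined by the BJND and the dissimilarity value of that pixel. When the dissimilarity does not exceed the BJND, the pixel is assigned entirely to the joint guided filter output. As the dissimilarity exceeds the BJND, the weight shifts exponentially toward the individual guided filter output, at a rate controlled by the parameter α. Figure 6(d) shows an example of the weight map computed for the BJND-aware merge of the individual and joint guided filter outputs.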
4) Weighted combination: After computing the weight map W, the denoised color image is obtained as a per-pixel weighted average of the individual guided filter output Cbase and the joint guided filter output Cjoint, using the weights in W. With the weighting function of Eq. (8), the influence of the individual filter output increases exponentially with dissimilarity whenever the dissimilarity between the corresponding pixels exceeds the BJND.
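The BJND-aware merge can be sketched as shown below. The exact form of Eq. (8) is not reproduced here. Instead, `weight_map` is a hypothetical function with the behaviour described in the text: it is zero where the dissimilarity is at or below the BJND, and rises as 1 − exp(−α(D − BJND)) above it, with α = 30 as in Section 5.2. In this sketch, a weight of 0 selects Cjoint and a weight of 1 selects Cbase; that convention is also an assumption.

```python
import numpy as np


def weight_map(dissim, bjnd, alpha=30.0):
    """Hypothetical weighting function (assumed form, not Eq. (8) itself)."""
    excess = np.maximum(dissim - bjnd, 0.0)  # zero at or below the BJND
    return 1.0 - np.exp(-alpha * excess)     # approaches 1 for large dissimilarity


def bjnd_aware_merge(c_base, c_joint, w):
    """Per-pixel blend: w = 0 keeps the joint output, w = 1 the individual output."""
    w3 = w[..., None] if c_base.ndim == 3 else w  # broadcast over RGB channels
    return w3 * c_base + (1.0 - w3) * c_joint
```

A larger α makes the switch to the individual (self-guided) output more abrupt just above the BJND. This matches the trade-off discussed in Section 5.2.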
4.2 BJND-aware selective detail transfer
As noted, a mono camera typically provides more detail information than a color camera. Thus, detail information is better extracted from the mono image, as shown in Fig. 8. Notably, in the proposed approach, the detail layer is further manipulated to reduce visual artifacts in the resultant images. Finally, the manipulated detail information is transferred to the color image to produce the final output image.
It is important to note that Petschnigg et al. also used detail transfer to move the local variations of the flash image to the no-flash image. They did not transfer detail information in shadow and specular regions caused by the flash, because those regions yield a poor detail estimate. The key difference of our approach is BJND-aware detail transfer, which automatically adjusts the amount of local detail transferred according to local content characteristics and human visual system characteristics (i.e., BJND). This is achieved by the ‘detail manipulation’ process in the proposed method.
1) Extraction of detail layer: We perform individual guided filtering on the mono image (i.e., with Minput serving as both input and guidance image) to obtain a base layer $M_{\mathrm{base}}$, and then extract the detail information. Specifically, the following ratio is computed to extract the detail layer from the mono image:

$$M_{\mathrm{detail}} = \frac{M_{\mathrm{input}} + \delta}{M_{\mathrm{base}} + \delta},$$

a ratio-based formulation related to ratio-image techniques in computer vision [31]. Note that δ is added to both the numerator and denominator of the ratio to avoid division by zero. Additionally, some low signal values in the mono image contain noise that can generate spurious detail, and δ also works to reject these low signal values. In practice, we set δ = 0.02 across all our results, following the precedent of previous works, and compute the ratio for each RGB color channel. Figure 9(b) shows an example of the detail layer, in which local detail variation can be observed. The frequency spectrum of the detail layer is shown in Fig. 10.
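The ratio above is a one-liner, but a short sketch shows why δ matters. The function name is hypothetical. The argument `mono_base` stands for the guided-filtered mono image, and intensities are assumed to lie in [0, 1].

```python
import numpy as np


def detail_layer(mono, mono_base, delta=0.02):
    """Ratio detail layer (M + delta) / (M_base + delta)."""
    mono = np.asarray(mono, dtype=float)
    mono_base = np.asarray(mono_base, dtype=float)
    # delta avoids division by zero and damps ratios in dark, noisy pixels.
    return (mono + delta) / (mono_base + delta)
```

Consider a dark pixel with mono value 0.01 over a base of 0.005. With δ = 0.02, this yields a detail value of 0.03/0.025 = 1.2 instead of the raw ratio 2.0. A bright pixel (0.6 over 0.5) yields 0.62/0.52 ≈ 1.19, close to its raw ratio of 1.2. Thus, δ mainly suppresses spurious detail in low-signal regions.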
2) Detail manipulation: As in the weight map computation, the detail manipulation function in Eq. (11) is defined using the BJND and an exponential function of the dissimilarity value of each pixel.
Figure 9 shows an example of the detail manipulation function, and the result of the manipulated detail transfer layer. In Fig. 9(a), the original detail value is 0.8, and is varied according to the dissimilarity value. Importantly, the original detail value is unchanged if the dissimilarity value is lower than a BJND value (i.e., 0.06 in this case). In addition, it increases exponentially if the dissimilarity value is higher than the BJND.
3) Detail transfer: To transfer the manipulated detail, the denoised color image is multiplied by the manipulated detail layer pixel by pixel. According to Eq. (11), the manipulated detail value of a pixel is one if its dissimilarity value is very large, as shown in Fig. 9(a). In that case, the final pixel is determined entirely by the denoised image, since transferring detail there is highly likely to produce visual artifacts.
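Detail manipulation and transfer can be sketched as follows. Eq. (11) is not reproduced here, so `manipulate_detail` uses a hypothetical form that matches the behaviour described for Fig. 9(a). The detail value is unchanged where the dissimilarity is at or below the BJND. Above it, the value is pulled exponentially toward one (i.e., no detail transfer), with α = 30 assumed. The transfer step multiplies each RGB channel of the denoised image by the manipulated detail.

```python
import numpy as np


def manipulate_detail(detail, dissim, bjnd, alpha=30.0):
    """Hypothetical detail manipulation (assumed form, not Eq. (11) itself)."""
    excess = np.maximum(dissim - bjnd, 0.0)
    # excess = 0 keeps the detail; large excess drives the value to 1.
    return 1.0 - (1.0 - detail) * np.exp(-alpha * excess)


def transfer_detail(c_denoised, detail_manip):
    """Multiply every RGB channel of the denoised image by the detail layer."""
    return c_denoised * detail_manip[..., None]
```

Suppose the detail value is 0.8, BJND = 0.06, and α = 30. Under this assumed form, a dissimilarity of 0.08 gives 1 − 0.2·e^(−0.6) ≈ 0.89.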
Figure 11 shows examples of results obtained with the proposed approach. The original color and mono image pair in Figs. 11(a) and 11(b) was captured under 10 lux lighting. Figure 11(e) shows that the BJND-aware denoising operator yields better denoising performance while improving image sharpness. The selective detail transfer in Fig. 11(f) further improves the detail information, thanks to the sharp mono image. In addition, Fig. 12 shows that the proposed method yields better visual quality, with fewer visual artifacts, than conventional approaches that do not use selective transfer.
5. Experiments and results
5.1 Data sets and visual results
Some current smartphones include commercially produced dual cameras equipped with a pair of color and mono cameras. However, we cannot directly compare the performance of the proposed approach with the proprietary algorithms of these smartphone vendors. They use not only different fusion algorithms but also entirely distinct image processing pipelines embedded in hardware and/or software modules. Therefore, we constructed our own test data set for performance evaluation by capturing sixteen color and mono image pairs in low-light conditions with our dual camera, at a spatial resolution of 1328 x 1048 pixels. The image pairs cover various low light levels (from 10 lux down to 4 lux): seven scenes were captured at 10 lux, six at 6 lux, and three at 4 lux. Note that below 4 lux, scenery is usually not discernible even by the mono camera, so another approach may be required, such as color-plus-infrared cameras [32,33]. The color and mono images and the corresponding disparity maps are available online.
Figures 13 to 15 show examples of test data captured at 6 lux and their processing results. The proposed approach in Figs. 13(d), 14(d), and 15(d) improves image quality in terms of both denoising and sharpness. In particular, characters in the images are much more legible after processing with the proposed method. Notably, such an enhancement in sharpness cannot be achieved by existing techniques that work with a single color image captured by a single camera. It is worth noting that, in our database, applying the same gain to each RGB channel does not cause hue or saturation changes; as can be observed in Fig. 13, the color chart shows no critical saturation changes. Figure 16 shows another test image, captured at 4 lux, and its processing results, illustrating the improvement in image quality under very low light.
5.2 Parameter selection
The important parameters to be tuned are α in the weighting function and ε in the guided filter. These parameters directly affect the final enhanced image, so we describe the impact of their selection below.
The proposed method aims to mitigate noticeable visual artifacts in the final fused image, since they are critical to perceived image quality. This is achieved by BJND-aware denoising and detail transfer. In particular, the proposed BJND-aware detail manipulation prevents unreliable guidance pixels of the mono image from transferring their detail information to the color image. In general, this reduces visual artifacts but sacrifices some detail enhancement. Therefore, there is an inevitable trade-off between visual artifacts and the amount of detail enhancement in such a selective transfer method. In the proposed method, this trade-off can be controlled by adjusting the α parameter of the weighting function in Eq. (8) and Eq. (11).
As α decreases, the slope of the weighting function just above the BJND becomes smaller. This results in more detail transfer but also introduces many visual artifacts, since most of the detail information extracted from the mono image is transferred to the color image. As α increases, the slope of the weighting function becomes steeper, resulting in little detail transfer but also few visual artifacts. This phenomenon can be observed in Fig. 17. As shown in the figure, α = 30 yields plausible visual results on our database.
The parameter ε of the guided filter must also be tuned. Patches with variance much larger than ε are preserved, whereas those with variance much smaller than ε are smoothed. The effect of ε in the guided filter is similar to that of the range variance in the bilateral filter: both parameters determine “what is an edge/a high variance patch that should be preserved”. The amount of detail transferred can therefore be controlled by choosing an appropriate value of ε. As ε increases, the filtered base layer becomes increasingly smoother, and hence more detail is captured in the detail term of Eq. (10). However, visual artifacts near edges in the fused image are also amplified. As shown in Fig. 18(c), the fused image has higher contrast than the others, but its noise level around edges is also amplified. These artifacts are caused by unwanted noise that is amplified by excessive smoothing and/or by unreliable correspondence information around the edges of the guidance image. In practice, we set ε = 0.001 for all our results, which is 0.1% of the total range of color values.
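The effect of ε can be illustrated with the standard gray-scale guided filter of He et al., written here with box means from an integral image. This is a minimal sketch, not the author's MATLAB code: the self-guided setting (guide = input), the additive residual used as the "detail" layer (Eq. (10) may define it differently, e.g., as a ratio), the border clipping of the box window, and the synthetic step-edge test image are assumptions.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window via an integral image; windows clipped at borders."""
    h, w = img.shape
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    y0 = np.clip(np.arange(h) - r, 0, h); y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w); x1 = np.clip(np.arange(w) + r + 1, 0, w)
    s = ii[y1][:, x1] - ii[y0][:, x1] - ii[y1][:, x0] + ii[y0][:, x0]
    area = (y1 - y0)[:, None] * (x1 - x0)[None, :]
    return s / area

def guided_filter(I, p, r, eps):
    """Gray-scale guided filter: local linear model q = a*I + b in each window."""
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I ** 2
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)   # a -> 1 where var_I >> eps (edge kept), a -> 0 where var_I << eps (smoothed)
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)

rng = np.random.default_rng(0)
step = np.zeros((64, 64)); step[:, 32:] = 0.8              # strong edge (window variance up to 0.16)
noisy = step + 0.02 * rng.standard_normal(step.shape)      # weak texture/noise (variance 4e-4)

residuals = []
for eps in (1e-5, 1e-3, 1e-1):
    base = guided_filter(noisy, noisy, r=4, eps=eps)
    residuals.append(float(np.abs(noisy - base).mean()))   # mean magnitude of the detail layer
    print(f"eps={eps:g}: mean |detail| = {residuals[-1]:.4f}")
```

With ε well below the noise variance almost nothing is smoothed; with ε between the noise and edge variances the texture moves into the detail layer while the edge is kept; with ε comparable to the edge variance the edge itself starts to leak into the detail layer, which is where halo-like artifacts can appear.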
A classic Gaussian low-pass filter blurs across all edges and therefore creates strong peaks and valleys in the detail image that cause halos. However, as many previous studies have reported [9,16], the bilateral and guided filters do not smooth across strong edges and thereby reduce halos while still capturing detail. With appropriately tuned guided-filter parameters, annoying halo artifacts are also hard to observe in our visual results.
It is also important to note that although the global parameters are set empirically to control how much detail is transferred over the entire image, the appropriate amount of transfer can depend on local content characteristics. The detail manipulation in the proposed method therefore automatically adjusts the amount of local detail transferred according to local content characteristics and human visual characteristics (i.e., BJND). This is the main characteristic of our approach.
5.3 Quantitative evaluation results
For quantitative validation, an SSIM-based fusion quality metric is adopted to objectively assess the fusion performance of the different methods. Here, we use Yang et al.’s metric [35], which is based on the structural similarity (SSIM) index [30] and combines local SSIM values using a local weight λw.
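As we understand Yang et al.’s formulation [35], the metric averages a per-window score over local windows w: when the two source images are structurally similar in w (SSIM between them of at least 0.75), the score is λw·SSIM(source A, fused) + (1 − λw)·SSIM(source B, fused), with λw given by the relative local variance of A; otherwise it is the larger of the two SSIM values. The sketch below is our own reading of that definition, not the author’s evaluation code; the non-overlapping 8 × 8 windows, the default SSIM constants for intensities in [0, 1], and the function names are simplifying assumptions.

```python
import numpy as np

C1 = (0.01 * 1.0) ** 2  # SSIM stabilizing constants for intensities in [0, 1]
C2 = (0.03 * 1.0) ** 2

def ssim_window(x, y):
    """Single-window SSIM from means, variances, and covariance."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / ((mx**2 + my**2 + C1) * (vx + vy + C2))

def yang_fusion_quality(a, b, f, win=8, threshold=0.75):
    """Fusion quality of f given sources a and b (assumed reading of Yang et al.)."""
    scores = []
    h, w = a.shape
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            wa, wb, wf = (im[i:i + win, j:j + win] for im in (a, b, f))
            s_af, s_bf = ssim_window(wa, wf), ssim_window(wb, wf)
            if ssim_window(wa, wb) >= threshold:
                va, vb = wa.var(), wb.var()
                lam = va / (va + vb) if va + vb > 0 else 0.5  # local weight lambda_w
                scores.append(lam * s_af + (1 - lam) * s_bf)
            else:
                scores.append(max(s_af, s_bf))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
color_y = rng.random((64, 64))  # stand-in for the luminance of the color image
mono = np.clip(color_y + 0.05 * rng.standard_normal((64, 64)), 0, 1)
print(yang_fusion_quality(color_y, mono, 0.5 * (color_y + mono)))
```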
Three different algorithms were compared as follows:
- 2) Joint bilateral filter and detail transfer without flash shadow and specular masks. The author used the publicly available code with default parameter settings.
- 3) Disparity-aware joint guided filter and detail transfer (modified version of  with the guided filter and disparity compensation)
The results are shown in Table 2. They reveal that the proposed method outperforms the competing algorithms in terms of the SSIM-based objective quality metric. Note that the result of method 2) is lower than that of the individual guided filter because it is not based on disparity-aware joint filtering.
5.4 Subjective assessment experiment and results
To demonstrate the effectiveness of the proposed approach, we also conducted a subjective assessment test of the overall improvement in image quality under low light conditions. In this test, we compared the proposed approach with three reference approaches (two baseline approaches and one conventional approach) in terms of the improvement in overall image quality. The first baseline approach (hereafter, hColor) consisted of the color images histogram-matched to the mono images. The second baseline approach (hereafter, Guided) was the individual guided filter that uses only the color image as the guidance image. The conventional approach was the disparity-aware joint guided filtering and detail transfer without our BJND-aware denoising and selective detail transfer (i.e., the third algorithm compared in Section 5.3). Note that we did not directly compare the original color images (captured with the camera) with the images processed by the proposed method, since the improvement in image quality is expected to be obvious.
A total of 20 subjects (7 females) participated in the subjective experiments. Their mean age was 24.00 ± 2.13 years (range, 20–27 years). All subjects had normal or corrected-to-normal vision.
To assess the overall quality of the processed images, a modified version of the double stimulus continuous quality scale (DSCQS) method was used [36, 37]. In each assessment condition, two versions of the same image, version A and version B, were displayed in random order on a 15.6-inch monitor. One version was always the test image processed with the proposed approach, while the other was the reference image processed with either a baseline approach or the conventional approach. A total of 48 pairs of proposed and corresponding reference images were presented to each subject in random order. Version A was presented for 10 seconds, followed by a 3-second rest with a mid-gray image. Version B was then presented for 10 seconds, followed by a 10-second rest during which the subject gave his/her scores. To increase the sensitivity of the subjective judgment, subjects were allowed to switch between the two images until they had formed a mental measure of the image quality (up to 10 seconds).
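From the presentation timings above, a lower bound on each subject’s session length follows directly; it ignores the optional switching time and any instructions or training, which the text does not quantify.

```latex
T_{\text{trial}} \ge \underbrace{10}_{\text{A}} + \underbrace{3}_{\text{gray}} + \underbrace{10}_{\text{B}} + \underbrace{10}_{\text{scoring}} = 33\ \text{s},
\qquad
T_{\text{session}} \ge 48 \times 33\ \text{s} = 1584\ \text{s} \approx 26.4\ \text{min}.
```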
Subjects rated each image on a continuous scale divided into five segments identified by five verbal labels (excellent, good, fair, poor, and bad). For analysis, the continuous ratings were converted into scores in the range [0, 100]. The difference mean opinion score (DMOS) for each test condition was then computed by subtracting the subjective score of the corresponding reference image from that of the test image. Hence, positive DMOS values indicate that the test image (i.e., the proposed approach) provides better image quality than the reference image.
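The scoring steps above can be sketched as follows. The mapping from a mark on the continuous scale to [0, 100] assumes a linear scale whose bottom is “bad” and whose top is “excellent”; the function names and the example marks on a 10-unit scale are hypothetical and are not data from this study.

```python
from statistics import mean

def to_score(mark, scale_length):
    """Map a mark on the continuous scale (0 = bottom of 'bad',
    scale_length = top of 'excellent') linearly to [0, 100]."""
    return 100.0 * mark / scale_length

def dmos(test_scores, ref_scores):
    """DMOS for one test condition: mean over subjects of (test - reference).
    Positive values mean the test image (proposed approach) was rated better."""
    if len(test_scores) != len(ref_scores):
        raise ValueError("need one test and one reference score per subject")
    return mean(t - r for t, r in zip(test_scores, ref_scores))

# Hypothetical marks from three subjects on a 10-unit scale
test = [to_score(m, 10.0) for m in (7.2, 6.8, 8.0)]
ref = [to_score(m, 10.0) for m in (5.5, 5.9, 6.1)]
print(round(dmos(test, ref), 2))
```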
Figure 19 shows the DMOSs for overall image quality between the proposed approach and the two baseline approaches (i.e., Proposed vs hColor and Proposed vs Guided). In the figure, the x-axis represents the image index and the ambient illumination in lux. As shown in the figure, the proposed approach improved the overall image quality of all the test images compared with the two baseline approaches. The average DMOS between the proposed and hColor approaches was 18.16 ± 2.66 (mean ± standard deviation). The average DMOS between the proposed and Guided approaches was 14.69 ± 2.94.
From the figure, we can also observe that the proposed approach outperforms the conventional approach, which does not consider BJND-aware denoising and selective detail transfer. The average DMOS between the proposed and conventional approaches was 12.14 ± 5.61. Note that some test images show only a small improvement because the images processed with the conventional approach already contained few visual artifacts.
The statistical results of the subjective assessment are presented in Table 3. A t-test was conducted under the null hypothesis that the MOS does not change. As shown in the table, the improvement in overall image quality over both baseline approaches was statistically significant (p < 0.001). More importantly, the proposed approach also significantly improved overall image quality compared with the conventional approach. These results reveal that the proposed approach can improve image quality by means of BJND-aware denoising and selective detail transfer.
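A t-test of “no change in MOS” between paired test and reference scores is equivalent to a one-sample t-test of the per-pair differences (the DMOS values) against zero. The sketch below computes that statistic with the standard library only; whether Table 3 pairs the scores per subject or per image, and whether the test is one- or two-sided, is not stated in the text, so the pairing here is an assumption and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(test_scores, ref_scores):
    """Paired t statistic and degrees of freedom for H0: mean(test - ref) = 0."""
    diffs = [t - r for t, r in zip(test_scores, ref_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # mean difference over its standard error
    return t, n - 1

t, df = paired_t([62.0, 70.5, 58.0, 66.0], [48.0, 55.0, 50.5, 47.0])
print(f"t = {t:.2f}, df = {df}")
```

The p-value then follows from a t-distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` for a two-sided test when SciPy is available.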
Experiments were performed on a computer equipped with a dual 3.20 GHz CPU and 4 GB of memory. We used publicly available MATLAB implementations of the guided filter and the bilateral filter. The proposed method takes 63.4 seconds to enhance a color-plus-mono image pair of size 1328 × 1048. The current implementation of the proposed method is not fast enough for real-time applications owing to the inefficient MATLAB implementation of the guided filter; the individual guided filter alone takes 32.7 seconds per image pair in our experiments. In the BJND-aware denoising and detail transfer of the proposed algorithm, the main computational burden lies in the guided filtering and the BJND computation. Encouragingly, the computing time of the proposed method can be dramatically reduced by an efficient implementation of the guided filter. The proposed algorithm uses three guided filtering operations, in Eqs. (1), (2), and (10). The main computational burden in the guided filter is the mean filter with box windows of radius r. As reported in previous studies, this box filter can be computed efficiently in O(N) time, where N is the number of pixels, using the integral image technique or a moving-sum method. Therefore, the two guided filtering operations that do not involve disparity compensation can be computed in O(N) time (i.e., linear time). However, the remaining joint guided filtering operation cannot simply be computed in O(N) time with the integral image technique, because the computation of the covariance in the guided filter requires pixel-based disparity compensation between the color and mono images. A brute-force implementation of this operation takes O(Nr²) time for a kernel radius r; that is, its time complexity depends on both the image size and the kernel radius. Similarly, in the BJND computation, the background luminance bg in Eq. (3) is computed by averaging luminance values in a 5 × 5 region centered at each pixel position. This mean filter can be computed in O(N) time using the integral image technique. However, the edge height eh in Eq. (3) is calculated with 5 × 5 Sobel operators at each corresponding pixel position of the mono image, and a brute-force implementation of this computation also takes O(Nr²) time for a kernel radius r.
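The complexity argument for the background luminance can be checked with a small sketch: a brute-force local mean touches (2r + 1)² pixels per output, O(Nr²) overall, whereas an integral image (summed-area table) yields every window sum from four lookups, O(N) overall. With r = 2 this is the 5 × 5 average used for bg. Edge replication at the image border and the function names are our assumptions; the text does not specify the border handling.

```python
import numpy as np

def local_mean_bruteforce(img, r):
    """O(N r^2): average every (2r+1)x(2r+1) window directly (edge-replicated border)."""
    pad = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = pad[y:y + 2 * r + 1, x:x + 2 * r + 1].mean()
    return out

def local_mean_integral(img, r):
    """O(N): the same local means from an integral image (four lookups per pixel)."""
    pad = np.pad(img, r, mode="edge")
    k = 2 * r + 1
    ii = np.zeros((pad.shape[0] + 1, pad.shape[1] + 1))
    ii[1:, 1:] = pad.cumsum(0).cumsum(1)  # ii[y, x] = sum of pad[:y, :x]
    h, w = img.shape
    s = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return s / (k * k)

rng = np.random.default_rng(0)
mono = rng.random((48, 64))            # stand-in for the mono luminance image
bg = local_mean_integral(mono, r=2)    # 5x5 background luminance
print(np.allclose(bg, local_mean_bruteforce(mono, r=2)))
```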
In our dual camera setup, the color and mono image pair can have large disparities depending on scene depth. Artifacts mainly arise at mismatched corresponding pixels between the color and mono images, because blending very different structures from the two images often results in structural distortions, as observed in Fig. 12. Such artifacts are perceived most strongly in high-contrast regions owing to the characteristics of the human visual system (e.g., artifacts are more annoying in the salient regions of an image according to the visual attention model [12,24]). A new enhancement method that accounts for the perceptual effects of the artifacts through image saliency analysis could therefore be developed, which requires further experiments and study.
Furthermore, the conversion of color to grayscale (i.e., decolorization) is important and should be carefully considered in image enhancement for a dual camera. In the proposed method, decolorization is performed before the dissimilarity measure; hence, it affects the weight map generation in the denoising process and the detail manipulation in the selective detail transfer. Several valuable studies [38,39] have recently been published on decolorization methods that mainly consider contrast preservation and the visual distinctiveness of the converted grayscale image. We conducted an additional experiment to investigate whether state-of-the-art conversion algorithms can improve the performance of the proposed method, replacing the rgb2gray() function used in the proposed method with a state-of-the-art conversion method. As a result, the objective quality of the final enhanced images improved slightly, but not significantly, on our database (the SSIM-based fusion quality metric increased from 0.9352 to 0.9355). The author would like to point out that, in our color-plus-mono camera setup, a new conversion method should also be considered that locally matches the corresponding intensity levels and/or structures between the converted color and mono images, while reducing the information lost when a single color image is converted into grayscale. Alternatively, a new camera calibration tool could be developed for the color-plus-mono camera that constructs, in advance, a mapping table or mapping function between the color and mono images.
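For reference, MATLAB’s rgb2gray() forms a fixed weighted sum of the R, G, and B channels with documented weights of approximately 0.2989, 0.5870, and 0.1140. The sketch below is a Python equivalent of that weighting (not the author’s code) and shows why contrast-preserving decolorization [38,39] is of interest: two clearly different colors can map to the same gray level, so their contrast is lost before the dissimilarity measure is computed. The specific red and green values are chosen only for illustration.

```python
import numpy as np

# Channel weights documented for MATLAB's rgb2gray (Rec. 601 luma, rounded)
RGB2GRAY_WEIGHTS = np.array([0.2989, 0.5870, 0.1140])

def rgb2gray(rgb):
    """Weighted channel sum for float RGB data shaped (..., 3) in [0, 1]."""
    return np.asarray(rgb, dtype=float) @ RGB2GRAY_WEIGHTS

red = [1.0, 0.0, 0.0]
green = [0.0, 0.2989 / 0.5870, 0.0]    # a darker green chosen to match red's gray level
print(rgb2gray(red), rgb2gray(green))  # nearly identical gray values for distinct colors
```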
It is worth further noting that the proposed algorithm is based on detail transfer from the mono image. Therefore, in the camera system configuration, it is preferable for the mono image to provide much more detail information than the corresponding color image. Generally, this means that it would be effective for the mono sensor to have a much higher resolution than the color sensor. However, this does not mean that a very low-resolution color sensor can simply be paired with a high-resolution mono sensor. Because image fusion may not be needed in some usage scenarios (e.g., under sufficient light or by user selection), the color camera should also work as a stand-alone primary camera, which means that the color sensor must have sufficient resolution for these situations.
This paper presented an enhancement method for low-light images captured with a color-plus-mono camera. The proposed algorithm denoises the color image via a BJND-aware guided filtering operation that adaptively merges the individual and joint guided filter outputs according to the BJND calculated for the color and mono image pair. We then selectively transfer the detail information of the mono image after BJND-based manipulation, with the intent of reducing visual artifacts in the final images. The experimental results reveal that BJND-aware denoising and selective detail transfer substantially improve the overall image quality compared with baseline and conventional approaches that do not consider the BJND when fusing color and mono images. Importantly, these results indicate that the reliability of the corresponding pixel information and the tolerable fusion thresholds (such as BJND) should be taken into account when developing image fusion algorithms for multi-spectral and multi-modal images.
National Research Foundation of Korea (2016R1D1A1B03931087).
The author thanks S. J. Kim and H. W. Jang for their help in constructing the experimental system and subjective test.
References and links
1. A. Chakrabarti, W. T. Freeman, and T. Zickler, “Rethinking color cameras,” in Proceedings of International Conference on Computational Photography (IEEE, 2014), pp. 1–8.
2. Pointgrey, “Capturing consistent color,” https://www.ptgrey.com/truecolor.
3. R. S. Blum, Z. Xue, and Z. Zhang, “An overview of image fusion,” in Multi-Sensor Image Fusion and Its Applications (CRC, 2005).
4. A. P. James and B. V. Dasarathy, “Medical image fusion: a survey of the state of the art,” Inf. Fusion 19, 4–19 (2014).
5. Z. Wang, D. Ziou, C. Armenakis, D. Li, and Q. Li, “A comparative analysis of image fusion methods,” IEEE Trans. Geosci. Remote Sens. 43(6), 1391–1402 (2005).
8. P. J. Burt and R. J. Kolczynski, “Enhanced image capture through fusion,” in Proceedings of the 4th International Conference on Computer Vision (IEEE, Berlin, Germany, 1993), pp. 173–182.
9. G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, and K. Toyama, “Digital photography with flash and no-flash image pairs,” ACM Trans. Graph. 23(3), 664–672 (2004).
10. Y. J. Jung, H. G. Kim, and Y. M. Ro, “Critical binocular asymmetry measure for the perceptual quality assessment of synthesized stereo 3D images in view synthesis,” IEEE Trans. Circ. Syst. Video Tech. 26(7), 1201–1214 (2016).
11. Y. Zhao, Z. Chen, C. Zhu, Y. P. Tan, and L. Yu, “Binocular just-noticeable-difference model for stereoscopic images,” IEEE Signal Process. Lett. 18(1), 19–22 (2011).
12. Y. J. Jung, H. Sohn, S. Lee, F. Speranza, and Y. M. Ro, “Visual importance- and discomfort region-selective low-pass filtering for reducing visual discomfort in stereoscopic displays,” IEEE Trans. Circ. Syst. Video Tech. 23(8), 1408–1421 (2013).
13. J. Kopf, M. F. Cohen, D. Lischinski, and M. Uyttendaele, “Joint bilateral upsampling,” ACM Trans. Graph. 26(3), 96 (2007).
14. Q. Yang, R. Yang, J. Davis, and D. Nister, “Spatial-depth super resolution for range images,” in Proceedings of International Conference on Computer Vision and Pattern Recognition (IEEE, 2007), pp. 1–8.
15. Q. Yan, X. Shen, L. Xu, S. Zhuo, X. Zhang, L. Shen, and J. Jia, “Cross-field joint image restoration via scale map,” in Proceedings of International Conference on Computer Vision (IEEE, 2013), pp. 1537–1544.
17. X. Shen, C. Zhou, L. Xu, and J. Jia, “Mutual-structure for joint filtering,” in Proceedings of International Conference on Computer Vision (IEEE, 2015), pp. 3406–3414.
18. E. Eisemann and F. Durand, “Flash photography enhancement via intrinsic relighting,” ACM Trans. Graph. 23(3), 673–678 (2004).
19. Q. Zhang, X. Shen, L. Xu, and J. Jia, “Rolling guidance filter,” in Proceedings of European Conference on Computer Vision (Springer, 2014), pp. 815–830.
20. C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proceedings of International Conference on Computer Vision (IEEE, 1998), pp. 839–846.
21. Pointgrey, “Color USB3 vision,” http://www.ptgrey.com/flea3-13-mp-color-usb3-vision-sony-imx035-camera.
22. Pointgrey, “Mono USB3 vision,” http://www.ptgrey.com/flea3-13-mp-mono-usb3-vision-sony-imx035-camera.
23. Pointgrey, “FlyCapture SDK,” https://www.ptgrey.com/flycapture-sdk.
24. Y. J. Jung, H. Sohn, and Y. M. Ro, “Visual discomfort visualizer using stereo vision and time-of-flight depth cameras,” IEEE Trans. Consum. Electron. 58(2), 246–254 (2012).
25. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis. 47(1), 7–42 (2002).
26. A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imaging Vis. 40(1), 120–145 (2011).
27. H. Sohn, Y. J. Jung, S. Lee, F. Speranza, and Y. M. Ro, “Visual comfort amelioration technique for stereoscopic images: disparity remapping to mitigate global and local discomfort causes,” IEEE Trans. Circ. Syst. Video Tech. 24(5), 745–758 (2014).
28. M. Lang, A. Hornung, O. Wang, S. Poulakos, A. Smolic, and M. Gross, “Nonlinear disparity mapping for stereoscopic 3D,” ACM Trans. Graph. 29(4), 75 (2010).
30. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004).
31. Z. Liu, Y. Shan, and Z. Zhang, “Expressive expression mapping with ratio images,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques (ACM, 2001), pp. 271–276.
32. A. Toet, “Natural colour mapping for multiband night vision imagery,” Inf. Fusion 4(3), 155–166 (2003).
33. X. Bai, F. Zhou, and B. Xue, “Fusion of infrared and visual images through region extraction by using multi scale center-surround top-hat transform,” Opt. Express 19(9), 8444–8457 (2011).
34. CVIPLab, “Color and mono image database,” https://sites.google.com/site/gachoncvip/projects/dual-camera.
35. C. Yang, J. Zhang, X. Wang, and X. Liu, “A novel similarity based quality metric for image fusion,” Inf. Fusion 9(2), 156–160 (2008).
36. ITU-R BT.500–11, “Methodology for the subjective assessment of the quality of television pictures,” (2002).
37. H. Sohn, Y. J. Jung, and Y. Man Ro, “Crosstalk reduction in stereoscopic 3D displays: disparity adjustment using crosstalk visibility index for crosstalk cancellation,” Opt. Express 22(3), 3375–3392 (2014).
38. Y. Song, L. Bao, X. Xu, and Q. Yang, “Decolorization: is rgb2gray () out?” in Proceedings of SIGGRAPH Asia Technical Briefs (ACM, 2013), p. 15.
39. C. Lu, L. Xu, and J. Jia, “Contrast preserving decolorization with perception-based metrics,” Int. J. Comput. Vis. 110(2), 222–239 (2014).