Endoscopic video defogging using luminance blending

Endoscopic video sequences provide surgeons with a direct view of the surgical field and of anatomical targets in the patient during robotic surgery. Unfortunately, these video images are unavoidably hazy or foggy because typical surgical operations such as ablation and cauterisation generate smoke, which prevents surgeons from seeing the surgical field clearly. This Letter aims at removing fog or smoke from endoscopic video sequences to enhance and maintain a direct and clear visualisation of the operating field during robotic surgery. The authors propose a new luminance blending framework that integrates contrast enhancement with visibility restoration for foggy endoscopic video processing. The proposed method was validated on clinical endoscopic videos that were collected from robotic surgery. The experimental results demonstrate that their method provides a promising means to effectively remove fog or smoke from endoscopic video images. In particular, the hybrid visual quality score of the defogged endoscopic images improved from 0.5088, obtained with a compared method, to 0.6475.


1. Introduction:
Interventional endoscopes (e.g. bronchoscope and colonoscope) integrated with video cameras at their distal tips are widely used in minimally invasive surgery. The endoscope provides surgeons with real-time endoscopic video sequences that are shown on medical displays. On the basis of the view of the surgical field in these images, surgeons can directly visualise and examine abnormal tissues and treat or resect tumours in the body.
Unfortunately, the visual quality of endoscopic video images is unavoidably degraded by surgical smoke or fog during robotic surgery. These foggy endoscopic images (Fig. 1) are generally produced by a surgical procedure called cauterisation, which is usually employed to stop bleeding vessels, while other typical operations such as laser ablation can also bring surgical smoke into the surgical field. Such fog or smoke commonly distracts surgeons, who may have to wait idly until the smoke clears, which increases surgical time. Surgical fog also degrades the visualisation of the surgical field from the endoscope and hides structural details (e.g. vessel structures) on the organ surface. This can lead to inappropriate device use and incorrectly targeted tissue, increasing surgical risks, for example during tissue or tumour resection in endoscopic surgery. Therefore, endoscopic video defogging plays an essential role in enhancing and maintaining a clear field of surgical vision, not only for safety by preventing inadvertent injury, but also for improving precision and reducing operative time.
Endoscopic field defogging methods generally consist of hardware- and software-based strategies. While the former use dedicated devices to remove smoke, the latter are algorithmic, i.e. they rely on computational photography techniques. This work develops a new luminance blending strategy for surgical video defogging. It combines a contrast enhancement procedure with a fast visibility recovery method to remove fog or smoke from endoscopic video sequences. We also quantitatively and objectively evaluate the experimental results of our proposed method and of other methods. The main contributions of this work are two-fold: (i) a new luminance blending approach with better performance than other defogging approaches and (ii) an objective image quality metric for quantitative assessment of dehazed images.
The remainder of this Letter is organised as follows. Section 2 briefly reviews work related to current dehazing methods. Our hybrid luminance blending-based dehazing method for vision augmentation is presented in Section 3, followed by the experiment settings in Section 4. Section 5 shows and discusses the validation results, and Section 6 concludes this work.

2. Related work:
Dehazing and defogging techniques for real-world natural images and videos are widely discussed in the computer vision and computational photography literature. Fattal [1] presented a graphical model used to calculate the atmospheric light for haze-free image recovery. The model assumes that scene shading and transmission are locally independent of each other, which is not practical in applications. Tarel and Hautiere [2] introduced a fast visibility restoration strategy based on median filtering, but it usually results in colour distortion and easily fails at the median filtering step, which often introduces null pixels. Moreover, the fast visibility method still requires more efficient computation for real-time processing.
While He et al. [3] proposed dark channel-based atmospheric light and transmission estimation with soft matting, Meng et al. [4] employed a boundary constraint and contextual regularisation to modify this dark channel-based method; in particular, they improved the computational efficiency and skipped the soft matting. Nishino et al. [5] estimated two statistically independent components, the scene albedo and depth, by using a Bayesian defogging model. While this Bayesian method works well, it also results in colour distortion. Ancuti and Ancuti [6] discussed a multi-scale fusion approach that combines white-balanced and linearly transformed images derived from the hazy image. This multi-scale fusion approach generally struggles with inhomogeneous fog because transmission and depth information is lost. While Sulami et al. [7] proposed a reduced formation model that describes image pixels in small patches as lines, which are used to recover the atmospheric light orientation, Galdran et al. [8] presented an improved variational framework using inter-channel contrast in the optimisation. More interestingly, fusion-based defogging is generally recognised as a promising framework to address the disadvantages of various dehazing methods [9].
More recently, deep learning-driven methods have increasingly been developed for single image dehazing. While Ren et al. [10] employed multi-scale convolutional neural networks for single image dehazing, Li et al. [11] proposed the All-In-One Dehazing Network (AOD-NET) to directly create the clean image through a lightweight convolutional neural network instead of separately computing the transmission map and the atmospheric light. Moreover, Ren et al. [12] developed a deep video dehazing method based on semantic segmentation, which can effectively use the abundant information that exists across neighbouring frames for precise dehazing. Liu et al. [13] introduced a simple generic model-agnostic convolutional neural network trained end-to-end to recover clear images from hazy inputs.
This is an open access article published by the IET under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)
Although these methods work well on natural images, removing fog or smoke from surgical endoscopic video images remains challenging for them, particularly in the case of inhomogeneous or thick haze. This work aims to address the problem of hazy images or videos with inhomogeneous or thick haze, particularly foggy endoscopic videos.
3. Approaches: This section details our luminance blending framework for surgical endoscopic video defogging. Our method contains three steps: (i) contrast enhancement, (ii) visibility recovery and filtering, and (iii) luminance blending. Fig. 2 shows the flowchart of our processing, as discussed in the following sections.
3.1. Contrast enhancement: Surgical foggy images have low contrast and limited illumination, especially in hazy regions. The goal of contrast enhancement is to improve the contrast of the less hazy regions of the endoscopic image and to compute the enhanced luminance L(u, v), which is later used to enhance the luminance of the final defogged surgical image.
The contrast enhancement step assumes that (i) most regions of the foggy image consist of hazy pixels that critically affect the mean of the foggy image and (ii) the level of haze in these regions depends on the distance between the atmospheric light and the scene, as discussed in [6]. On the basis of these assumptions, we compute the enhanced luminance L(u, v) by magnifying the difference between the surgical hazy image I(u, v) and its average luminance value l in the three channels c ∈ {r, g, b}
$$L_c(u, v) = \beta \left( I_c(u, v) - l \right), \qquad l = \frac{1}{UV} \sum_{u=1}^{U} \sum_{v=1}^{V} H(u, v) \tag{1}$$
where β is the magnification factor that controls the luminance of the augmented foggy regions and U × V are the width and height of the hazy endoscopic image. The original luminance H(u, v) at each pixel is calculated by [14]
$$H(u, v) = a\, I_r(u, v) + b\, I_g(u, v) + c\, I_b(u, v) \tag{2}$$
where the coefficients are a = 0.299, b = 0.587, and c = 0.114 (in (2), c denotes the blue-channel coefficient rather than the channel index).
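As a minimal sketch of this enhancement step, assuming an image stored as rows of (r, g, b) tuples scaled to [0, 1], the luminance and the magnified difference could be computed as follows. The function names, the default magnification factor of 2.0, and the clipping to [0, 1] are illustrative assumptions and are not taken from the Letter.

```python
# Luma weights quoted in the text for the red, green and blue channels.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def luminance(image):
    """Return H(u, v) for an image given as rows of (r, g, b) tuples in [0, 1]."""
    a, b, c = LUMA_WEIGHTS
    return [[a * r + b * g + c * bl for (r, g, bl) in row] for row in image]


def _clip(x):
    """Keep a value inside [0, 1] (clipping is an assumption of this sketch)."""
    return min(1.0, max(0.0, x))


def enhance_contrast(image, beta=2.0):
    """Magnify each channel's deviation from the mean luminance of the frame.

    beta is the magnification factor; 2.0 is only an illustrative value.
    """
    lum = luminance(image)
    n_pixels = sum(len(row) for row in lum)
    mean_lum = sum(sum(row) for row in lum) / n_pixels
    return [
        [tuple(_clip(beta * (ch - mean_lum)) for ch in pixel) for pixel in row]
        for row in image
    ]
```

Pixels darker than the frame mean map to zero after clipping, so the sketch keeps only the regions that are brighter than the average luminance; a real implementation may handle negative values differently.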

3.2. Visibility recovery and filtering:
A widely used physical imaging model for hazy images is given by Koschmieder's law [3]
$$I(u, v) = F(u, v)\, T(u, v) + A_\infty \left( 1 - T(u, v) \right) \tag{3}$$
where I(u, v) denotes an observed (foggy) image, F(u, v) is the haze-free image (also called scene radiance), and $A_\infty$ indicates the atmospheric light or the sky luminance. The transmission map T(u, v) describes the amount of unscattered light entering the camera and can be computed by
$$T(u, v) = \exp\left( -k\, d(u, v) \right) \tag{4}$$
where k and d(u, v) are the atmosphere's scattering factor and the distance between the camera and the objects in the scene. On the basis of (3), we aim to solve for the haze-free image F(u, v) with $A_\infty$ and T(u, v) unknown. However, following the fast visibility recovery method [2], we do not directly estimate T(u, v) since it is difficult to precisely predict the transmission map, which depends on depth information. To avoid estimating T(u, v), the atmospheric veil X(u, v) is employed [6]
$$X(u, v) = A_\infty \left( 1 - T(u, v) \right) \tag{5}$$
Then, (3) can be rewritten to calculate F(u, v)
$$F(u, v) = \frac{I(u, v) - X(u, v)}{1 - X(u, v) / A_\infty} \tag{6}$$
This requires the atmospheric light $A_\infty$ and the veil X(u, v), for which robust estimates can be obtained much more easily than for the depth and transmission maps in the original formulation (3). The methods used to determine $A_\infty$ and the veil X(u, v) have been discussed in [2]. Here, we skip the technical details of how to estimate $A_\infty$ and X(u, v).
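The inversion of the haze model through the atmospheric veil can be illustrated with a small single-channel sketch. It assumes that the veil and the atmospheric light are already known. In the Letter they are estimated with the method of [2], which is not reproduced here; the veil is instead derived from a synthetic transmission map purely to check the round trip. All function names, the default atmospheric light of 1.0, and the `eps` guard are hypothetical choices.

```python
import math


def transmission(depth, k=1.0):
    """Transmission exp(-k * d) for a 2D list of distances d(u, v)."""
    return [[math.exp(-k * d) for d in row] for row in depth]


def add_haze(scene, trans, a_inf=1.0):
    """Forward haze model I = F * T + A_inf * (1 - T), used only for checking."""
    return [
        [f * t + a_inf * (1.0 - t) for f, t in zip(f_row, t_row)]
        for f_row, t_row in zip(scene, trans)
    ]


def veil_from_transmission(trans, a_inf=1.0):
    """Atmospheric veil X = A_inf * (1 - T); synthetic stand-in for an estimate."""
    return [[a_inf * (1.0 - t) for t in row] for row in trans]


def recover_scene(image, veil, a_inf=1.0, eps=1e-6):
    """Recover F = (I - X) / (1 - X / A_inf) without estimating depth."""
    out = []
    for i_row, x_row in zip(image, veil):
        row = []
        for i, x in zip(i_row, x_row):
            denom = max(1.0 - x / a_inf, eps)  # guard against division by zero
            row.append((i - x) / denom)
        out.append(row)
    return out
```

With an exact veil the recovery reproduces the scene up to rounding; with an estimated veil the result typically contains noise and artefacts, which motivates the subsequent filtering.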
Since the result F(u, v) of the fast visibility recovery usually contains image noise and artefacts, we employ joint bilateral filtering to process F(u, v) and obtain J (u, v).
The bilateral filter is an edge-aware image processing method that denoises an image while preserving edge information [15,16]. The concept of joint bilateral filtering is to apply a spatial filter (here a Gaussian kernel) to a low-resolution image and simultaneously apply a range filter computed on a high-resolution image (here the low- and high-resolution images refer to the recovered image F(u, v) and the original image I(u, v), respectively) [17]
$$J(p) = \frac{1}{K_p} \sum_{q \in N_p} V(p, q)\, F(q) \tag{7}$$
$$V(p, q) = \exp\left( -\frac{\| p - q \|^2}{2\sigma_s^2} \right) \exp\left( -\frac{\| I(p) - I(q) \|^2}{2\sigma_c^2} \right) \tag{8}$$
where p = (u, v), q = $(\hat{u}, \hat{v})$ is a pixel in the region $N_p$ centred at the pixel p, $\sigma_s^2$ and $\sigma_c^2$ are the variances of the spatial and range kernels, and $K_p$ is computed by
$$K_p = \sum_{q \in N_p} V(p, q) \tag{9}$$
which is the normalisation term that guarantees that the weights of all the pixels sum to one.
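The joint bilateral filtering idea described above can be sketched for a single-channel image as follows. This is a plain, unoptimised illustration in the standard Gaussian form, assuming the recovered image as the filtered target and the original image as the guide; the window radius, the two sigma values, and the function name are illustrative assumptions.

```python
import math


def joint_bilateral(target, guide, radius=1, sigma_s=1.0, sigma_c=0.1):
    """Smooth `target` using range weights taken from `guide`.

    Both inputs are 2D lists of floats with the same shape. The spatial
    weight depends on pixel distance, the range weight on guide differences.
    """
    height, width = len(target), len(target[0])
    out = [[0.0] * width for _ in range(height)]
    for u in range(height):
        for v in range(width):
            total, norm = 0.0, 0.0
            for du in range(-radius, radius + 1):
                for dv in range(-radius, radius + 1):
                    uu, vv = u + du, v + dv
                    if not (0 <= uu < height and 0 <= vv < width):
                        continue  # skip neighbours outside the image
                    spatial = math.exp(-(du * du + dv * dv) / (2 * sigma_s ** 2))
                    diff = guide[u][v] - guide[uu][vv]
                    rng = math.exp(-(diff * diff) / (2 * sigma_c ** 2))
                    weight = spatial * rng
                    total += weight * target[uu][vv]
                    norm += weight
            out[u][v] = total / norm  # normalise so the weights sum to one
    return out
```

Because the range weight is computed on the guide, edges present in the original image are preserved even when the target is noisy. The nested loops make this sketch far too slow for video and are kept only for readability.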
3.3. Luminance blending: This step estimates the illumination of the images J(u, v) and L(u, v) and blends it to improve the illumination of the defogged endoscopic surgical image. We convert the images J(u, v) and L(u, v) from the red, green and blue (RGB) colour space to the YCbCr colour space. On their Y (luminance) components, we use recursive filtering [18] to estimate the illumination of J(u, v) and L(u, v) and obtain $G_J(u, v)$ and $G_L(u, v)$. By using the image illumination $G_J(u, v)$ and $G_L(u, v)$, we seek to recognise pixels in hazy regions. So, a weight function $W_K(G_K(u, v))$, K ∈ {J, L}, is defined on the estimated illumination. Note that the weight function $W_K(\cdot)$ (also called the weight matrix) depends on the level of smoke. In the heavy-smoke case, pixels whose foggy intensity lies in the range 16-128 are assigned weight 1. In the thin-smoke case, pixels in the interval [128, 235] are assigned weight 1. The luminance output $O_Y(u, v)$ may not cover the full range of pixel intensities, resulting in a low-contrast image. We therefore apply a linear transformation to stretch its histogram to a specific intensity range [P, Q].
[Fig. 3: Defogged results. a Thin-fog image, b M1 [2], c M2 [3], d M3 [5], e M4 [6], f M5 [4], g M6 [7], h M7 (ours), i Thick-fog image, j M1 [2], k M2 [3], l M3 [5], m M4 [6], n M5 [4], o M6 [7], p M7 (ours)]
4. Experimental settings: We tested about 1200 frames in this Letter. We compare the proposed method with the following approaches: (i) M1, Tarel et al. [2], (ii) M2, He et al. [3], (iii) M3, Nishino et al. [5], (iv) M4, Ancuti and Ancuti [6], (v) M5, Meng et al. [4], (vi) M6, Sulami et al. [7], and (vii) M7, our method.
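Returning to the luminance blending step of Section 3.3, the smoke-dependent weighting and the final histogram stretch can be sketched as below. Only the two intensity intervals come from the text. Assigning weight 0 outside them, the min-max form of the linear stretch, and the default range [16, 235] for [P, Q] are assumptions made for illustration, and the function names are hypothetical.

```python
def smoke_weight(g, heavy_smoke):
    """Binary weight for an illumination value g on the 8-bit luma scale.

    Intervals follow the text: 16-128 for heavy smoke, [128, 235] for thin
    smoke. Returning 0 elsewhere is an assumption of this sketch.
    """
    low, high = (16, 128) if heavy_smoke else (128, 235)
    return 1.0 if low <= g <= high else 0.0


def stretch(luma, p=16.0, q=235.0):
    """Linearly map a 2D list of luma values onto [p, q] (min-max form)."""
    flat = [x for row in luma for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no histogram to stretch.
        return [[p for _ in row] for row in luma]
    scale = (q - p) / (hi - lo)
    return [[p + (x - lo) * scale for x in row] for row in luma]
```

For example, `stretch([[100, 150], [125, 200]])` maps the darkest value to P and the brightest to Q while keeping the ordering of the intermediate values.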
We introduce a naturalness metric that describes how natural surgical images appear, based on a statistical analysis of thousands of images [19]. We also employ the structural similarity index (SSIM) [20] to evaluate structural information in the images. Finally, we define a hybrid quality metric to evaluate defogged endoscopic images, which combines the SSIM, denoted by S, and the naturalness, denoted by N, through a coefficient γ. The coefficient γ is set to 0.6, which was experimentally determined to balance the structural information and naturalness.
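The defining equation of the hybrid metric is not reproduced here. One plausible form, stated purely as an assumption, is a convex combination in which the coefficient 0.6 weights the SSIM and the remainder weights the naturalness. The sketch below illustrates that assumed form and should not be read as the Letter's exact definition; the function name is hypothetical.

```python
def hybrid_quality(ssim, naturalness, gamma=0.6):
    """Assumed convex combination of SSIM (S) and naturalness (N).

    gamma = 0.6 is the coefficient value reported in the text; which term
    it multiplies is an assumption of this sketch.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * ssim + (1.0 - gamma) * naturalness
```

Under this assumed form the score stays within the range spanned by S and N, so a method cannot obtain a high hybrid score from structure alone when its naturalness is poor.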
5. Results and discussion: Fig. 3 visually compares the defogged results for endoscopic images with thin and thick fog. The visual quality of the results suggests that our method works better than the others since it removes fog without introducing obvious colour distortion. In addition, two surgeons manually inspected all the dehazed results and generally judged that our defogging method outperforms the other approaches, since its results appear more natural and colourful and are better than the original foggy images. Table 1 quantitatively compares the objective assessment of the dehazed results obtained from the seven approaches. The quantitative results show that our proposed approach outperforms the other methods. While the average naturalness of M1 and of our proposed method M7 was comparable, the average SSIM was improved from 0.7978 to 0.9275. More interestingly, the average hybrid quality of our method was 0.6475, which was much better than that of the other approaches (M5 provides 0.5088).
This work aims to enhance the visualisation of the surgical field in endoscopic surgery. We developed a new luminance blending defogging algorithm. The experimental results demonstrate that our algorithm outperforms the others in both subjective and objective evaluations. The effectiveness of our algorithm lies in fusing the advantages of the enhancement and restoration dehazing methods. Our method has several potential limitations, including unclear parameter sensitivity, the effectiveness of the enhancement, the quality assessment, and heavy processing time. In addition, although our method works better than the other approaches, it still introduces some colour distortion. These limitations will be further investigated in the future.

6. Conclusion:
We proposed a new luminance blending defogging framework that integrates contrast enhancement, visibility recovery, and joint bilateral filtering to remove smoke from endoscopic videos acquired during robotic surgery. We evaluated our method on endoscopic video sequences acquired from robotic prostatectomy. The experimental results demonstrate the effectiveness of our proposed method, which outperforms the other approaches. In particular, our method improved the hybrid quality of the dehazed results from 0.5088 (M5) to 0.6475.