Light field editing in the gradient domain

This paper presents a new method for light field applications such as content replacement and fusion in the gradient domain. The approach is inspired by successful gradient-domain image and video editing techniques. A necessary and important part of gradient-based solutions is recovering the signal of interest from artificially generated, and typically non-integrable, gradient data. As such, a new algorithm is developed to reconstruct a light field from a given gradient data set. In the algorithm, first, the 4D Haar wavelet decomposition of the light field is obtained from the given gradient data. Then, the light field is obtained from a wavelet synthesis step. This algorithm is intended as a building block for gradient-based light field editing methods, and as such, its performance is analysed on a set of benchmark light field data sets. The proposed reconstruction algorithm is an essential component of the solutions developed here for two light field problems: light field editing and light field fusion. Results show that processing light fields in the gradient domain offers significant advantages over processing in the intensity domain.


INTRODUCTION
Algorithms operating in the gradient domain have long been studied by the computer vision and image-processing research communities [1-6], and with good reason. The gradient of an image encodes local contrast information, to which the human eye is highly sensitive. Recent developments [7] expand the conventional way of modelling light in a scene, by relating intensity not only to the position of a point in the scene but also to the direction from which rays of light arrive at that point.
In this way, a four-dimensional (4D) model called a light field is obtained to depict light from a scene. This model is a simplified version of the plenoptic function [8-10]. The light field scene representation has recently garnered research interest, mainly due to its desirable feature of allowing users to change the focal plane and depth of field of a rendered scene [7, 11-15] or to eliminate the background of a scene by depth filtering, after the photo was taken [16], but also due to many other emerging applications, as outlined in [17]. The richness of information that makes light fields so useful comes, however, with inherent trade-offs and challenges. These include designing and building systems to acquire and store light fields, as well as displays capable of rendering them in meaningful and intuitive ways for day-to-day users. Image-processing problems and the feasibility of their solutions are worthwhile reassessing in a light field framework [17].

This paper addresses the question: Are there light field applications where the gradient domain provides an attractive alternative to the intensity domain? In order to answer this question, first, a 4D integrator was devised and its performance in reconstructing a light field from a set of derivatives was assessed. Then, as an illustration of its use, the integrator was included in two applications, light field editing and light field fusion.
The main developments reported in this work are:
• an algorithm that can be used to reconstruct a light field from given gradient data;
• an evaluation study of this algorithm's performance in reconstructing a set of light fields;
• a new gradient-based algorithm that can be used for light field editing tasks such as content replacement and transparency;
• a new gradient-based fusion algorithm that can be used to merge two light fields captured under different illumination conditions.

RELATED WORK
Light field editing is challenging mainly because light fields have more dimensions than traditional digital images (i.e. four vs. two dimensions). This higher signal dimensionality hinders visualisation and requires more memory than typical (2D) image editing. Recent efforts [18, 19] focussed on understanding users' preferred modes of interacting with light fields, as well as possible workflows. In [18], users had access to a multiview interface and a multifocus interface, with the purpose of performing simple editing tasks on a given light field (e.g. changing the colour or brightness of a certain surface, or painting on an object). In the multiview interface, users could select a specific subaperture image from the light field and were shown how the edit would propagate through the light field. In the multifocus interface, users could select the depth in the light field at which editing tasks were performed. Later versions of these two interfaces [19] incorporated depth selectivity. The consensus was that the preferred interface depends on the task at hand. A patch-based editing framework called PlenoPatch was introduced by Zhang et al. [20]. The editing tasks considered were removing an object from a given light field, modifying its depth, increasing the resolution of the light field, and parallax magnification to increase the depth range of the scene. Their technique used information from a 2D image captured with a digital single-lens reflex camera, with the goal of increasing the resolution of a light field captured with a Lytro [21] camera. An important assumption in their work was that the two cameras depict the scene from the same location. The quality of their results depends on depth estimation.
Assessing the quality of a light field is an open problem. Fu et al. [22] presented a study that compares conventional images and light field images, using an ISO testing standard. Vieira et al. [23] and Viola et al. [24,25] used subjective evaluation and peak signal-to-noise ratio (PSNR) to objectively rank the quality of compressed light field images, by comparing individual images (subaperture images) from a reference light field with individual images from a compressed version of the light field. This paper focusses on developing an algorithm that can be used to reconstruct a light field from gradient data, and the quality of the reconstructed light field is assessed. Then, the algorithm is used in solutions for two light field editing problems: content replacement and light field fusion.

PRELIMINARIES
The approach proposed here to reconstruct a light field (4D signal) from a given gradient was motivated by the connection between the light field gradient components and the 1D Haar wavelet analysis filters. In this section, the Haar wavelet transform and its connection to the signal gradient are reviewed.

Haar filters
The transfer functions of the Haar wavelet filters are [26]

H_L(z) = (1 + z^{-1})/√2,    (1)
H_H(z) = (1 − z^{-1})/√2.    (2)

These filters compute sums and differences of consecutive samples of a given signal, scaled by 1/√2. The Haar wavelet decomposition of a 1D signal is obtained via a process called wavelet analysis, which begins by filtering the signal with the filters given by Equations (1) and (2), followed by subsampling by 2. This splits the signal into two lower-resolution components: an approximation component, obtained by filtering with the low-pass filter H_L(z), and a detail component, obtained by filtering with the high-pass filter H_H(z). The resolution loss is due to subsampling. The filtering and subsampling procedure is then repeated on the approximation component of the decomposition. For a 1D signal with 2^M elements, the full decomposition is obtained after M such steps. The reverse process, which recovers a signal from its wavelet decomposition, is called synthesis, and is done by up-sampling by 2 and filtering. This multiresolution signal representation has been used in signal denoising, compression and editing applications.
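The short sketch below (in Python with NumPy, added here purely for illustration and not taken from the paper) performs one Haar analysis step and the corresponding synthesis step on a 1D signal; the sign convention chosen for the detail coefficients is one possible choice.

```python
import numpy as np

def haar_analysis_1d(x):
    """One level of 1D Haar analysis: pairwise sums and differences, scaled by 1/sqrt(2)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass branch, H_L
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass branch, H_H
    return approx, detail

def haar_synthesis_1d(approx, detail):
    """Inverse of one analysis step: recombine and up-sample by 2."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

x = np.array([4.0, 2.0, 5.0, 5.0])
a, d = haar_analysis_1d(x)
print(np.allclose(haar_synthesis_1d(a, d), x))   # True: perfect reconstruction
```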
The analysis-synthesis process is generalisable to signals of higher dimensions [27]. In general, for a multidimensional signal with n dimensions, one step of the analysis splits the signal into 2^n − 1 detail components and one approximation component. When the signal is not symmetric in size (e.g. for a 2D signal of size 2^M × 2^N, where M ≤ N), there are M resolution levels in a full wavelet decomposition [28]. This number of resolution levels is also the highest possible for a 4D signal of size 2^M × 2^N × 2^P × 2^Q, with M ≤ N, P, Q.

This paper proposes an algorithm in which light fields are reconstructed from gradient data by finding the Haar wavelet decompositions of the said light fields, followed by obtaining the light fields from the decompositions via Haar wavelet synthesis. This approach is motivated by the close connection between the Haar wavelet high-pass filter H_H(z) and the gradient approximation model. This connection is briefly reviewed in the next subsection.
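As a concrete illustration of the multidimensional analysis step described above, the sketch below uses the PyWavelets package (an assumption of this example, not the implementation used in this work) to perform one level of 4D Haar analysis on a toy light field, confirming the split into 15 detail components plus one approximation component and the exact recovery via synthesis.

```python
import numpy as np
import pywt  # PyWavelets, assumed available for this illustration

# A toy 4D "light field": 4 x 4 views of 32 x 32 pixels each.
rng = np.random.default_rng(0)
lf = rng.random((4, 4, 32, 32))

# One level of 4D Haar analysis: 1 approximation + 2^4 - 1 = 15 detail components.
coeffs = pywt.wavedecn(lf, wavelet='haar', level=1)
approx, details = coeffs[0], coeffs[1]
print(approx.shape)   # (2, 2, 16, 16): half the resolution in every dimension
print(len(details))   # 15 detail components (keys 'aaad', 'aada', ..., 'dddd')

# Haar synthesis recovers the original signal (up to floating-point error).
rec = pywt.waverecn(coeffs, wavelet='haar')
print(np.allclose(rec, lf))   # True
```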

Light field gradient
Let Φ be a light field, that is, a 4D signal depicting intensity as a function of four variables d_1, d_2, d_3, d_4. The multiview light field representation is considered, with d_1 and d_2 indexing the views and d_3 and d_4 indexing the points within a view. Let

∇Φ = [Φ_1 Φ_2 Φ_3 Φ_4]^T

be the gradient of the light field. A popular gradient discretisation model approximates each directional gradient component with differences between adjacent signal samples in the considered direction. In other words, each gradient component Φ_i is

Φ_i = Φ(1 − z_i^{-1}),

where z_i^{-1} is the shift operator in the i-th direction and Φ(1 − z_i^{-1}) denotes filtering Φ with a filter with transfer function 1 − z_i^{-1}. With the notation of Section 3.1 in mind, the Haar wavelet analysis high-pass filter H_H(z) can be used to express the components Φ_i of a light field gradient ∇Φ as

Φ_i = √2 Φ H_H(z_i).

It is this connection that motivated the reconstruction technique developed by Hampton et al. in [29], in the context of wavefront reconstruction for adaptive optics. This reconstruction technique was further explored and expanded in image and video processing applications [1, 2], and is exploited in this work in light field applications.
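A minimal sketch of this gradient discretisation for a 4D light field is given below; filtering with 1 − z_i^{-1} amounts to the difference Φ[n] − Φ[n−1] along direction i. The handling of the first sample along each axis is an assumed boundary condition, not taken from the paper.

```python
import numpy as np

def light_field_gradient(lf):
    """Gradient of a 4D light field via adjacent-sample differences.

    Filtering with 1 - z_i^{-1} yields Phi[n] - Phi[n-1] along direction i.
    The first sample along each axis has no left neighbour; setting that
    difference to zero is an assumed boundary condition.
    """
    grads = []
    for axis in range(lf.ndim):
        diff = lf - np.roll(lf, 1, axis=axis)
        idx = [slice(None)] * lf.ndim
        idx[axis] = 0
        diff[tuple(idx)] = 0.0     # suppress the wrap-around term from np.roll
        grads.append(diff)
    return grads                   # [Phi_1, Phi_2, Phi_3, Phi_4]
```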

LIGHT FIELD RECONSTRUCTION FROM GRADIENT
An algorithm is developed here for the reconstruction of a light field from gradient data. In the first step of the algorithm, called analysis, the wavelet decomposition of the light field is obtained from the given gradient. In the second step, called synthesis, the light field is obtained from the wavelet decomposition.
First, the objective is to obtain the wavelet decomposition from the given gradient of the unknown 4D signal. As mentioned in Section 3, each level of the 4D Haar wavelet decomposition comprises 15 detail components and one approximation component; the subsections below describe how these are obtained from the gradient data.

Analysis step -Detail components
The first 15 detail components are found from the given gradient by following the procedure described in the Appendix. The decomposition of the remaining approximation component, Φ^{M−1}_{LLLL}, is found iteratively, by obtaining the 15 detail components at consecutive resolution levels, from level M − 2 down to 0. The procedure to find these coefficients is a 4D extension of the techniques presented in [1, 2] and its details are found in the Appendix.

Analysis step -Approximation component at the lowest resolution
Once all detail components of the Haar wavelet decomposition have been obtained, the lowest resolution approximation component, denoted by Φ^0_{LLLL}, must be found. This component has size 1 × 2^{N−M} × 2^{P−M} × 2^{Q−M} and its elements are proportional to the sums of all signal samples from consecutive hypercubes of size 2^M × 2^M × 2^M × 2^M. For a 2D visualisation of why this is the case, the reader is referred to [28].
The redundancy in the light field is exploited to produce an estimate of Φ^0_{LLLL}. This process begins by extracting the gradient of the central view of the light field from the given gradient. This 2D gradient is then used to reconstruct the central view of the light field using the technique developed in [1]. The central view in the light field is then used to approximate the elements of Φ^0_{LLLL}, by computing the sums of all the elements in consecutive square regions of the central image. This approximation works well when the similarity between light field views is high, as will be illustrated in Section 5.
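The block-sum part of this approximation can be sketched as follows, assuming the central view has already been reconstructed from its 2D gradient (that step is omitted here). The constant relating these sums to the Haar approximation coefficients depends on the normalisation and on the number of views covered by each hypercube, and is deliberately left out of the sketch.

```python
import numpy as np

def block_sums_of_central_view(central_view, M):
    """2^M x 2^M block sums of the (reconstructed) central view.

    Up to a normalisation constant, these sums populate the spatial
    dimensions of the lowest-resolution approximation component
    Phi^0_LLLL, under the assumption that every view of the light field
    closely resembles the central view.
    """
    b = 2 ** M
    h, w = central_view.shape
    # Trim so the view tiles exactly into b x b blocks (a defensive, assumed convention).
    view = central_view[: (h // b) * b, : (w // b) * b]
    blocks = view.reshape(h // b, b, w // b, b)
    return blocks.sum(axis=(1, 3))   # one sum per consecutive square region
```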

Synthesis step
Once the signal wavelet decomposition is obtained from the signal gradient as described in Sections 4.1-4.2, the 4D signal is reconstructed using a modified version of the 4D Haar wavelet synthesis process. The modification introduced in the (standard) wavelet synthesis consists of using a 4D iterative Poisson solver at each resolution level, in order to correct errors and smooth out discontinuities introduced in the reconstruction by approximating the elements of Φ^0_{LLLL}. This is an extension of the approach described in [1, 2], and its impact is studied and visualised in 2D and 3D in [2, 28].
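The role of the Poisson correction can be illustrated with a generic Jacobi iteration for the discrete Poisson equation; the sketch below is only a schematic stand-in for the solver of [1, 2, 28], with periodic boundaries implied by np.roll as an assumption of the sketch.

```python
import numpy as np

def poisson_jacobi_step(phi, div):
    """One Jacobi iteration for the discrete 4D Poisson equation lap(phi) = div.

    phi: current 4D estimate of the signal.
    div: discrete divergence of the target gradient field (same shape as phi).
    """
    neighbour_sum = np.zeros_like(phi)
    for axis in range(phi.ndim):
        neighbour_sum += np.roll(phi, 1, axis=axis) + np.roll(phi, -1, axis=axis)
    # Discrete Laplacian: sum(neighbours) - 2 * ndim * phi = div  =>  solve for phi.
    return (neighbour_sum - div) / (2.0 * phi.ndim)
```

In practice a few such iterations per resolution level suffice to smooth the discontinuities introduced by approximating Φ^0_{LLLL}; the paper's solver and boundary handling may differ from this generic form.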

QUALITY EVALUATION OF RECONSTRUCTED LIGHT FIELDS
The reconstruction algorithm was applied to reconstruct 49 light fields from five light field data sets [18, 30-32], and the findings are reported in this section. Finding an index that ranks the quality of digital images as human evaluators do is a topic of research in itself. Despite many indices having been developed, no index ranks the results of all image processing and computer vision algorithms in complete agreement with human evaluators. The consensus, however, seems to lean towards PSNR and the structural similarity index measure (SSIM) [33], whenever a reference image is available. As such, our approach to assess the ability of the devised algorithm to reconstruct 4D light fields from a gradient data set is based on these two indices. Specifically, we determine the quality of the reconstructed light field by comparing corresponding views from the reconstructed and the original light field, in terms of PSNR and SSIM.
For example, to determine the quality of a reconstructed light field with size 5 × 5 × 256 × 512, each view from the reconstructed light field was compared to its corresponding counterpart from the original light field. The PSNR and SSIM values for the 25 image pairs were computed and a 5 × 5 matrix was generated for each index. The average value of each matrix was then used as an overall quality index for the reconstructed light field.
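A sketch of this per-view evaluation is shown below, using the PSNR and SSIM implementations from scikit-image; the (U, V, H, W) array layout and the [0, 1] data range are assumptions of the sketch, not taken from the paper.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def per_view_quality(lf_ref, lf_rec):
    """Average PSNR and SSIM over corresponding views of two light fields.

    lf_ref, lf_rec: arrays of shape (U, V, H, W) with intensities in [0, 1].
    Returns the mean of the U x V per-view PSNR and SSIM matrices.
    """
    U, V = lf_ref.shape[:2]
    psnr = np.empty((U, V))
    ssim = np.empty((U, V))
    for u in range(U):
        for v in range(V):
            psnr[u, v] = peak_signal_noise_ratio(lf_ref[u, v], lf_rec[u, v],
                                                 data_range=1.0)
            ssim[u, v] = structural_similarity(lf_ref[u, v], lf_rec[u, v],
                                               data_range=1.0)
    return psnr.mean(), ssim.mean()
```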
Three different scenarios are compared. In the first scenario, Φ^0_{LLLL} is approximated using a constant value, and the quality of the reconstructions obtained is depicted by the red contours in Figures 1 and 2. In the second scenario, Φ^0_{LLLL} is approximated as described in Section 4.2 and the quality indices are depicted in green. In the third scenario, the exact values of Φ^0_{LLLL}, computed from the original light field, were used.

FIGURE 1 PSNR results for (b) light fields from database [32], (c) light fields from database [18], (d) light fields from database [31] and (e) light fields from database [30]

FIGURE 2 SSIM results for the same light fields as in Figure 1

The average performance of the three methods with respect to the two indices on each of the studied data sets is shown in Tables 1 and 2. The values shown in each column of Table 2 are the average SSIM values of all points situated on one contour of each graph shown in Figure 2. For Table 1, the values shown in each column are the actual average PSNR values of all points situated on one contour of Figure 1, and not the base-10 logarithmic representation, which was used there only to allow for visualisation. Overall, the results show that the proposed algorithm works well in reconstructing light fields from a given gradient data set.

LIGHT FIELD EDITING
In this section, a gradient-based approach for light field content replacement is presented. Its performance is illustrated on an example that combines content from two versions of the same light field with different spatial resolutions (data set available in [31]).

The algorithm
The editing algorithm proposed here was developed for the purpose of light field content replacement. The following notation is used:
• u and v: the coordinates of a view in a light field;
• x and y: the coordinates of a point in a view.
The brightness value of a considered light field point is denoted Φ(u, v, x, y), in agreement with the notation in Section 3.1, under the convention that d 1 = u, d 2 = v, d 3 = x and d 4 = y.
Given a source light field Φ s and a target light field Φ t the objective is to generate a new light field Φ, by replacing content from a user-specified region of the target light field Φ t with content from the source light field Φ s . As this task is approached in the derivative domain, the proposed algorithm begins by computing the gradients of both light fields.
Let the gradients of the two light fields be denoted by ∇Φ_s = [Φ_{s,u} Φ_{s,v} Φ_{s,x} Φ_{s,y}]^T and ∇Φ_t = [Φ_{t,u} Φ_{t,v} Φ_{t,x} Φ_{t,y}]^T, where directions u and v index the array of images in the multiview representation of the considered light field and x and y index points in a view.
To obtain the gradients of the edited light field in the user-specified region of the source light field, the following procedure was used. First, the magnitude of the source light field gradient was computed:

M(u, v, x, y) = √(Φ_{s,u}² + Φ_{s,v}² + Φ_{s,x}² + Φ_{s,y}²).

Let M_{2D}(x, y) denote this magnitude for one selected view (u, v), and let

b(x, y) = 1 if M_{2D}(x, y) ≥ T, and b(x, y) = 0 otherwise,

where T is a threshold. Selecting the threshold T was driven by the gradient's ability to depict edge information. The best results for the application presented in Section 6.2 were obtained when the threshold was set equal to the sum of the mean and median values of each gradient magnitude view.
Each of the binary masks b was then placed in a 4D signal B (viewed as a 2D array of 2D images), at the location (u, v) that corresponds to the selected view M_{2D}(x, y). The 4D signal B was then used to weight the gradients of the source and target light fields and to generate the gradient of the edited light field:

∇Φ_e = B • ∇Φ_s + (1 − B) • ∇Φ_t,    (10)

where the symbol "•" denotes element-wise multiplication and 1 denotes an all-ones 4D signal. Equation (10) defines the gradient of the edited light field in the user-specified region. Outside of that region, the edited gradient was set equal to the gradient of the target light field. The gradient data set ∇Φ_e was then used to obtain the edited light field, with the help of the reconstruction technique proposed in Section 4.
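A compact sketch of the per-view mask and the blend of Equation (10) is given below, assuming the gradient components are stored as lists of 4D arrays; the names and array layouts are illustrative only.

```python
import numpy as np

def edge_mask(view_grad_magnitude):
    """Binary mask for one view: threshold equals mean + median of the magnitude."""
    T = view_grad_magnitude.mean() + np.median(view_grad_magnitude)
    return (view_grad_magnitude >= T).astype(float)

def combine_gradients(grad_s, grad_t, B):
    """Blend source and target light field gradients with a 4D mask B.

    grad_s, grad_t: lists of four 4D arrays (the gradient components).
    B: 4D array equal to 1 inside the user-specified region, 0 outside.
    Implements grad_e = B * grad_s + (1 - B) * grad_t componentwise.
    """
    return [B * gs + (1.0 - B) * gt for gs, gt in zip(grad_s, grad_t)]
```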

Results
A light field called jellybeans from database [31] was the starting point of the two examples presented in this section. Two modified versions of the given light field were obtained: the first by cropping each view to eliminate border artefacts, and the second by resizing the first to a lower spatial resolution. The dimensions of the two light fields are listed in Table 3. The fifth dimension indexes colour information (RGB representation considered). The objective was to seamlessly insert the jellybeans from the smaller light field (referred to as the source light field) into the larger version of the light field (the target light field), at a specified location (determined by its spatial location in a view), with minimal intra-view artefacts and no inter-view artefacts. The editing task was carried out following the procedure described in Section 6.1, with each channel of the colour light field processed individually. Three iterations of the 4D Poisson solver described in [1, 2, 28] were used at each resolution level during the synthesis step described in Section 4.3. The average brightness value of the central view of the target light field was used to correct the reconstructed light field.
Four of 289 views of the resulting light field are shown in Figure 3. Content from the source light field was inserted in the target light field and can be observed at the bottom of each view. The light field shown in Figure 4 is an illustration of how the two light fields were combined to achieve a transparency effect.

LIGHT FIELD FUSION
In this section, a gradient-based method for multi-exposure light field fusion is presented. Its performance is showcased on an example that combines an over-exposed and an under-exposed version of a light field from the 4D light field data set available in [30].

The algorithm
The fusion algorithm proposed here is based on merging two light fields in the gradient domain. The technique relies on operations on the gradient values of the two given light fields, and has two main steps:
Step 1: A gradient data set is obtained by combining the gradients of the two given light fields.
Step 2: The fused light field is obtained from this gradient data, with the help of the technique proposed in Section 4.
To complete the first step of the algorithm, namely to produce the gradient from which the fused light field is generated, first the magnitudes of the gradients of the given light fields are computed and compared. This comparison produces a 4D index map with the locations of the largest gradient components of each of the two source light fields (largest in magnitude). This map is then used to combine the two gradient data sets, by selecting from each data set the points with the larger gradient magnitude [34]. The final step in the proposed fusion algorithm is obtaining a fused light field from the gradient using the reconstruction algorithm presented in Section 4.
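A minimal sketch of this magnitude-based selection is given below, assuming the gradient components of each exposure are stored as lists of 4D arrays; ties are resolved arbitrarily in favour of the first input, which is an assumption of the sketch.

```python
import numpy as np

def fuse_gradients(grad_a, grad_b):
    """Select, at every light field point, the gradient with the larger magnitude.

    grad_a, grad_b: lists of four 4D arrays (gradient components of the two
    exposures). Returns the fused gradient components and the 4D index map.
    """
    mag_a = np.sqrt(sum(g ** 2 for g in grad_a))
    mag_b = np.sqrt(sum(g ** 2 for g in grad_b))
    use_a = mag_a >= mag_b                                   # 4D index map
    fused = [np.where(use_a, ga, gb) for ga, gb in zip(grad_a, grad_b)]
    return fused, use_a
```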

Results
A light field called dishes from database [30] was used in this example. The size of the original light field is 9 × 9 × 512 × 512. All subaperture images were modified to simulate highly under-exposed and over-exposed light field representations of the scene. The two light fields were fused using the technique proposed in Section 7.1. In Figure 5, the central views and the corresponding spatial gradient components of the luminance channel of these two light fields are shown. The central views of the under-exposed and over-exposed light fields are shown in Figure 5(a) and (d). The spatial derivatives of the central image shown in Figure 5(a) are illustrated in Figure 5(b) and (c), and those of Figure 5(d) in Figure 5(e) and (f). As can be observed from Figure 5, certain details, such as the writing, were largely lost in the under-exposed version of the light field, while others, such as the dish contours, were lost in the over-exposed version. The under- and over-exposed light fields and the obtained fused light field are illustrated by three representative views in Figures 6 and 7, respectively. The views shown on each row of Figures 6 and 7 differ from each other by a small shift in perspective. As Figure 7 reveals, details that were easily missed in the under-exposed or over-exposed versions of the light field (Figure 6) became visible in the fused light field, and inter-view consistency was maintained.

FIGURE 6 Three corresponding views of the under-exposed (top) and over-exposed (bottom) light fields, respectively

FIGURE 7 The corresponding views of the fused light field, obtained by merging the two light fields in Figure 6

CONCLUSIONS
This paper developed a technique to reconstruct a 4D signal from a given gradient data set, specifically for light field applications. The algorithm was tested and shown to perform well on several benchmark data sets. This method was used to develop solutions for two light field applications: light field editing for content replacement and transparency, and light field fusion for better visibility. The examples presented indicate that the proposed approaches give satisfactory results. The reconstruction algorithm from gradient data is expected to be used in other light field applications in the gradient domain.
APPENDIX

To completely determine the first level of the wavelet decomposition, the approximation component Φ^{M−1}_{LLLL} needs to be found. The approach described here for this task is recursive, and entails finding the 15 detail components at consecutive resolution levels, for all t = M − 2 down to 0.
The following notation is introduced, where i, s ∈ {1, 2, 3, 4} and Φ_i is a directional component of the given gradient data. Then, for t = 2 to M, the 15 detail components are given by Equations (A.17)-(A.31). While Equations (A.17)-(A.31) do not appear to change for different t, the change comes from the fact that, at each resolution level, the coefficients Φ^F_i are updated, for all i, s ∈ {1, 2, 3, 4}.