Graph embedded incomplete multi-view clustering method with proximity relation estimation

Multi-view clustering has attracted increasing attention due to its superior performance over single-view clustering. Existing studies rest on the assumption that every sample is present in all views. In reality, however, this is often not the case: real data frequently come with incomplete views, which causes conventional methods to fail. To address this issue, the authors present a novel method, referred to as incomplete multi-view spectral clustering with proximity relation estimation (IMSC_PRE), which exploits the proximity relations of samples in different views to complete their affinity graphs. Unlike existing methods, IMSC_PRE fully excavates the manifold structures of all views to estimate the proximity relations of missing samples in each view, providing a new and well-interpretable strategy for exploiting the proximity information carried by the whole sample set. Experimental results on four datasets demonstrate the advantages of the proposed method in comparison with state-of-the-art methods.


Introduction
Nowadays, many objects to be recognised or detected comprise multiple modalities, owing to the proliferation of diverse data-acquisition techniques. For example, evidence used for real-estate valuation crawled by web spiders may contain environmental images, press comments, economic indicators etc. A given story is often reported in different languages by several newspaper offices, and each language can be regarded as a single view. All of the above cases illustrate that almost everything can be viewed from distinct perspectives. Numerous experiments in recent years have shown that clustering data with multiple modalities is more effective than clustering with a single modality; one palpable reason is that features from distinct channels carry complementary and discrepant information, which enhances performance [1,2].
To this day, extensive studies on multi-view clustering have been carried out [1,3,4]. Kumar et al. [5] introduced a co-regularised constraint to apply the traditional spectral clustering method to multi-view data. By minimising the disagreement of kernel matrices constructed from the low-dimensional representations of data in diverse views, the method learns a common compact representation of the original data. By imposing an l1-norm constraint on the basis matrix learnt from each view, Liu et al. [6] provided a joint non-negative matrix factorisation (NMF) learning framework that renders a meaningful probabilistic interpretation of the common representation matrix. All of the above techniques share the assumption that all modalities contain the whole set of samples; however, this is unattainable in many situations. For instance, a story may not be carried by all the newspaper offices. In other words, no newspaper agency covers all the news, so each essentially provides an incomplete view. Such situations cause the existing methods to fail. The incompleteness of views leads to a massive loss of information, and the different missing rates across views make it difficult to learn a consensus representation, rendering incomplete multi-view clustering a challenging problem.
To deal with the incompleteness challenge, based on NMF, Li et al. [7] presented partial multi-view clustering (PVC) to partition data with two views, in which the representations of paired samples (i.e. the samples present in all views) are forced to be the same.
Remedying the objective function of PVC by appending a graph Laplacian term, the method proposed by Zhao et al. [8] can learn a unified affinity graph over all views for incomplete multi-view clustering. Nevertheless, the above two methods can only handle datasets with two views, and both assume that some samples are present in all views, which restricts their fields of application [9]. Hu and Chen [10] first defined an indicator matrix for every individual view and proposed a weighted matrix factorisation-based incomplete multi-view learning model, which is able to handle clustering cases with more than two views. In [11], missing instances were filled with the average features of the corresponding view, and then a diagonal indicator matrix for each view was set to endow available instances and filled instances with different weights, minimising the negative effect of the missing samples. To some extent, the methods mentioned above can meet the challenge of the incomplete multi-view clustering problem; however, they are all based on NMF, resulting in their inability to handle datasets with non-linear intrinsic structure in the incomplete multi-view clustering task.
In this paper, a novel graph embedded incomplete multi-view clustering method, dubbed incomplete multi-view spectral clustering with proximity relation estimation (IMSC_PRE), is brought forward to address the issue. First, IMSC_PRE constructs an affinity graph with the k-nearest criterion (k-nearest graph) from the available instances of each view. Second, we estimate the proximity relations related to missing instances by averaging the proximity relations from the other views. Third, a co-regularisation term is introduced to obtain a more discriminative and compact representation shared by all views. On account of the fact that the information carried by each individual view may differ significantly, distinct weights are assigned to the learning models of distinct views.
Overall, the proposed method holds three contributions as follows: (i) our method is the first work to address the incompleteness challenge from the angle of learning multiple completed affinity graphs from multiple views under arbitrary incomplete conditions. (ii) Compared with the existing methods, our method can well excavate the manifold structure lying in the multiple views, which is beneficial for grouping data with non-linear structure. (iii) In our method, suitable weights of distinct views are adaptively learnt, which can help the model acquire a compact and discriminative consensus representation. Experimental results conducted on several datasets show that our method significantly outperforms the state-of-the-art methods.

Related work
In this section, we briefly introduce single-view spectral clustering [12]. For data X ∈ R d × n with n samples and d dimensions, we first construct a symmetric similarity graph W ∈ R n × n , whose element W i, j represents the similarity weight of samples x i and x j . The learning model is formulated as follows:

min U tr( U T L U ), s.t. U T U = I, (1)

where U ∈ R n × c is the low-dimensional representation of the original data X, and c is the number of clusters. Under the ratio-cut measure, L ∈ R n × n is the unnormalised Laplacian matrix of the similarity graph W, calculated by L = D − W, where D is a diagonal matrix with its ith diagonal element calculated by D i, i = ∑ j = 1 n W i, j . Under the normalised-cut measure, L ∈ R n × n is the normalised Laplacian matrix of the similarity graph W, calculated by L = I − D −1/2 W D −1/2 , in which D is the same as before.
It is easy to optimise problem (1): the columns of the solution U are the eigenvectors corresponding to the c smallest eigenvalues of L. Then, a traditional clustering method such as k-means is conducted on the representation U to obtain the final clustering result.
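As a concrete illustration, the embedding step of problem (1) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code; the function name and the `normalised` flag are our own:

```python
import numpy as np

def spectral_embedding(W, c, normalised=True):
    """Spectral embedding of a symmetric affinity graph W (n x n):
    the eigenvectors of the Laplacian associated with the c smallest
    eigenvalues, as in problem (1)."""
    d = W.sum(axis=1)
    if normalised:
        # normalised-cut measure: L = I - D^{-1/2} W D^{-1/2}
        inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        L = np.eye(len(d)) - (inv_sqrt[:, None] * W) * inv_sqrt[None, :]
    else:
        # ratio-cut measure: L = D - W
        L = np.diag(d) - W
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    return vecs[:, :c]           # eigenvectors of the c smallest eigenvalues
```

The rows of the returned matrix would then be fed to k-means, as described above.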

Construct complete affinity graphs of distinct views: Let
be the set of instances in the vth view, where n v and d v denote the number of available instances and the dimension of the vth view, respectively. To excavate the local manifold structure of each view, we first construct a k-nearest neighbour graph for each view, denoted as W (v) ∈ R n v × n v , whose elements are acquired as follows [13]:

W i, j (v) = 1, if x i (v) ∈ Ψ x j (v) or x j (v) ∈ Ψ x i (v) ; W i, j (v) = 0, otherwise, (2)

where Ψ x j (v) denotes the k-nearest neighbour set of the instance x j (v) . Owing to the incompleteness of the problem, the size of the individual affinity graph W (v) is less than the total number of samples, i.e. n v < n, where n denotes the total number of samples. Here, we present a novel strategy to complete the affinity graph by fully excavating the intrinsic structures of all views. To enable the affinity graph W (v) to reveal the proximity relations across all samples, we complete the missing relations concerning absent instances according to the proximity relations of all views.
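The k-nearest neighbour graph construction described above, with a binary weighting and Euclidean distances, can be sketched as follows (`knn_graph` is a hypothetical helper, not the authors' code; the paper may use a different weighting scheme):

```python
import numpy as np

def knn_graph(X, k):
    """Binary k-nearest-neighbour affinity graph for one view.

    X: (n_v, d_v) available instances of the view.
    W[i, j] = 1 if x_j is among the k nearest neighbours of x_i
    (or vice versa, to keep W symmetric), else 0."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(D, np.inf)            # exclude self-neighbours
    W = np.zeros((n, n))
    nn = np.argsort(D, axis=1)[:, :k]      # indices of the k nearest neighbours
    rows = np.repeat(np.arange(n), k)
    W[rows, nn.ravel()] = 1.0
    return np.maximum(W, W.T)              # symmetrise: OR of the two conditions
```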
First, we can exploit the following formula to extend each incomplete graph to the common size n:

W̄ (v) = G (v)T W (v) G (v) ∈ R n × n , (3)

where G (v) ∈ R n v × n is the index matrix of the vth view, defined as follows:

G i, j (v) = 1, if the ith available instance of the vth view corresponds to the original sample x j ; G i, j (v) = 0, otherwise, (4)

where { x 1 , …, x n } denotes the set of original instances, including the available instances and missing instances of the vth view.
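The extension to size n via the index matrix G (v) can be sketched as follows (a hypothetical helper, assuming `avail_idx[i]` gives the original sample index of the ith available instance):

```python
import numpy as np

def extend_graph(W_v, avail_idx, n):
    """Embed an n_v x n_v view graph into an n x n graph via the
    index matrix G^(v); rows/columns of absent instances stay zero."""
    n_v = W_v.shape[0]
    G = np.zeros((n_v, n))
    G[np.arange(n_v), avail_idx] = 1.0  # G[i, j] = 1 iff available instance i is sample j
    return G.T @ W_v @ G
```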
In (3), all elements related to the absent instances are filled with 0 in the extended affinity graph W̄ (v) ∈ R n × n of each view. However, when the missing ratio is large, these graphs do not contain sufficient information to guide the clustering, resulting in poor performance. To address the issue, based on the idea that proximity relations between instances are approximately consistent across multiple views, we propose a simple and effective approach that exploits the proximity relations of all views to further complete the graph. Specifically, for each missing instance x i (v) of the vth view (that is to say, the vth view of sample x i is absent), we exploit the following formula to estimate its proximity relations:

W̄ i, : (v) = ( ∑ k = 1 l H i, k W̄ i, : (k) ) / ( ∑ k = 1 l H i, k ), (5)

where W̄ i, : (k) denotes the ith row of the affinity graph W̄ (k) of the kth view, i.e. the proximity relations related to the ith instance in the kth view x i (k) . H ∈ R n × l is a pre-defined matrix denoting the occurrence of the samples in the views, formulated as follows:

H i, k = 1, if sample x i is available in the kth view; H i, k = 0, otherwise. (6)

Formula (5) enables us to estimate the proximity relations in the missing view and excavate complementary information across all views simultaneously. Then, we symmetrise the graph acquired above by exploiting the following formula:

Ŵ (v) = ( W̄ (v) + W̄ (v)T ) / 2. (7)

By now, the affinity graph for each view has been completed by fully excavating the local manifold structures of multiple views. We denote the completed graph as Ŵ (v) .
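The estimation of formula (5) followed by the symmetrisation can be sketched as follows (a minimal sketch; the averaging runs over every view in which the sample is present, as encoded by H):

```python
import numpy as np

def complete_graphs(graphs, H):
    """Complete each view's extended affinity graph using the other views.

    graphs : list of l extended graphs, each (n, n); rows/columns of
             missing instances are zero.
    H      : (n, l) 0/1 occurrence matrix; H[i, k] = 1 iff sample i is
             available in view k.
    For a missing row i of view v, the proximity relations are estimated
    as the average of row i over the views where the sample is present;
    each graph is then symmetrised."""
    n, l = H.shape
    completed = []
    for v in range(l):
        W = graphs[v].copy()
        for i in range(n):
            if H[i, v] == 0:  # sample i is absent in view v
                ks = [k for k in range(l) if H[i, k] == 1]
                W[i, :] = sum(graphs[k][i, :] for k in ks) / len(ks)
        completed.append((W + W.T) / 2)  # symmetrisation step
    return completed
```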

Learn a consensus representation across all views:
After we obtain the completed graphs of all views, we can learn the compact representation of each view by exploiting spectral clustering as follows:

min U (v) tr( U (v)T L (v) U (v) ), s.t. U (v)T U (v) = I, (8)

where U (v) ∈ R n × c is the low-dimensional representation of the vth view, c is the number of clusters, and L (v) is the Laplacian matrix of the vth view, calculated by L (v) = I − D (v) −1/2 Ŵ (v) D (v) −1/2 , where D (v) is a diagonal matrix whose ith diagonal element is calculated by D i, i (v) = ∑ j Ŵ i, j (v) . By virtue of the fact that each view is just one representation of the same sample set, the samples across all views naturally share a consensus representation U * ∈ R n × c [1,14]. To learn the consensus representation U * for clustering, we introduce a co-regularisation term as follows:

min U (v), U * ∑ v ( tr( U (v)T L (v) U (v) ) + λ Γ( U (v) , U * ) ), (9)

where the function Γ( U (v) , U * ) is the disagreement term measuring the disagreement between the consensus representation U * and the representation U (v) , and λ is the trade-off parameter between the disagreement term and the spectral clustering objectives. Specifically, the co-regularisation term enables the method to capture the complementary information across different views, which is beneficial for the clustering performance. Γ( U (v) , U * ) is defined as follows [5]:

Γ( U (v) , U * ) = ∥ K U (v) / ∥ K U (v) ∥ F − K U * / ∥ K U * ∥ F ∥ F 2 , (10)

where K U * and K U (v) are the kernel graphs of the representations U * and U (v) , respectively. Inspired by [5,15], we exploit the linear kernel to construct the kernel graph, i.e. K( x i , x j ) = x i T x j ; thus, we have K U = UU T and ∥ K U ∥ F 2 = c. By overlooking the constant scaling factor, we can rewrite (10) as follows:

Γ( U (v) , U * ) = − tr( U (v) U (v)T U * U *T ). (11)

Overall, the objective function to learn the consensus low-dimensional representation of all views can be presented as the following formula:

min { U (v) }, U * ∑ v ( tr( U (v)T L (v) U (v) ) − λ tr( U (v) U (v)T U * U *T ) ), s.t. U (v)T U (v) = I, U *T U * = I. (12)
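Under the linear kernel, the objective combining the per-view spectral terms with the rewritten co-regularisation disagreement can be evaluated directly (a sketch; `coreg_objective` is our own name, not the authors' code):

```python
import numpy as np

def coreg_objective(U_views, U_star, laplacians, lam):
    """Per-view spectral terms minus the lambda-weighted agreement
    tr(U^(v) U^(v)^T U* U*^T) with the consensus representation."""
    val = 0.0
    for U, L in zip(U_views, laplacians):
        val += np.trace(U.T @ L @ U)                        # spectral clustering term
        val -= lam * np.trace(U @ U.T @ U_star @ U_star.T)  # co-regularisation term
    return val
```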

Endow different weights to distinct views and overall objective function:
Taking into consideration the fact that the discriminative powers of different views may be extremely unequal, due to the distinct missing rates and feature meanings, we assign a weight to each view to represent its significance, which can be formulated as follows:

min α ∑ v = 1 l ( α (v) ) r Υ (v) , s.t. ∑ v = 1 l α (v) = 1, α (v) ≥ 0, (13)

where the symbol Υ (v) refers to the objective function of the vth view, α (v) is adopted to weight the distinct significance of different views, and r > 1 is the hyper-parameter smoothing the contributions. By endowing each view with a suitable weight, the proposed method is able to learn a more compact and discriminative representation. Overall, the objective function is formulated as follows:

min { U (v) }, U *, α ∑ v = 1 l ( α (v) ) r ( tr( U (v)T L (v) U (v) ) − λ tr( U (v) U (v)T U * U *T ) ), s.t. U (v)T U (v) = I, U *T U * = I, ∑ v = 1 l α (v) = 1, α (v) ≥ 0. (14)

Model optimisation
The constrained problem (14) is difficult to solve directly; thus, we update the unknown variables iteratively to reach a local optimum.

Step 1: Update the consensus representation U * : With U (v) and α (v) fixed, problem (14) concerning U * can be written as follows:

max U * tr( U *T ( ∑ v = 1 l ( α (v) ) r U (v) U (v)T ) U * ), s.t. U *T U * = I. (15)

This is a typical eigenvalue decomposition problem, and the optimal solution for U * is the set of eigenvectors corresponding to the c biggest eigenvalues of the matrix ∑ v = 1 l ( α (v) ) r U (v) U (v)T .

Step 2: Update the representation U (v) : By fixing U * and α (v) , the objective function concerning U (v) can be rewritten as follows:

max U (v) tr( U (v)T ( λ U * U *T − L (v) ) U (v) ), s.t. U (v)T U (v) = I. (16)

Equation (16) is also a typical eigenvalue decomposition problem, and the set of eigenvectors corresponding to the first c biggest eigenvalues of the matrix λ U * U *T − L (v) is selected as the optimal solution for the variable U (v) .
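Steps 1 and 2 each reduce to one symmetric eigendecomposition and can be sketched as follows (a minimal sketch; the function names are ours):

```python
import numpy as np

def top_eigvecs(M, c):
    """Eigenvectors of a symmetric matrix M for its c largest eigenvalues."""
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return vecs[:, -c:]

def update_U_star(U_views, alphas, r, c):
    # Step 1: U* <- top-c eigenvectors of sum_v (alpha_v)^r U^(v) U^(v)^T
    M = sum((a ** r) * (U @ U.T) for a, U in zip(alphas, U_views))
    return top_eigvecs(M, c)

def update_U_view(L_v, U_star, lam, c):
    # Step 2: U^(v) <- top-c eigenvectors of lam * U* U*^T - L^(v)
    return top_eigvecs(lam * U_star @ U_star.T - L_v, c)
```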
Step 3: Update the weights α (v) , v = 1, …, l : The optimisation problem concerning α (v) degrades to the following formula:

min α ∑ v = 1 l ( α (v) ) r Υ (v) , s.t. ∑ v = 1 l α (v) = 1, α (v) ≥ 0, (17)

where Υ (v) = tr( U (v)T L (v) U (v) ) − λ tr( U (v) U (v)T U * U *T ). The solution to (17) is formulated as follows [2]:

α (v) = ( 1 / Υ (v) ) 1 / (r − 1) / ∑ k = 1 l ( 1 / Υ (k) ) 1 / (r − 1) . (18)

Algorithm 1 (Fig. 1) summarises the whole optimisation procedure. After the consensus representation U * is acquired, the k-means algorithm is applied to U * to obtain the final clustering result. Notably, two kinds of incomplete multi-view datasets are constructed as follows: (i) For the BUAA and Caltech7 datasets, we construct several incomplete two-view datasets with 10, 30, 50, 70, and 90% randomly selected samples as paired samples; half of the unpaired samples then discard the first view, and the other half discard the second view. (ii) For the BBCSport and 3 Sources datasets, we randomly remove 10, 30, and 50% of the samples from each view, ensuring that no sample loses all views simultaneously.
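The closed-form weight update of Step 3 is one line (a sketch assuming the per-view objective values Υ (v) are positive; negative values would require extra care):

```python
import numpy as np

def update_alpha(upsilon, r):
    """Step 3 closed form: alpha_v is proportional to
    (1 / Upsilon_v)^(1/(r-1)), normalised so the weights sum to one."""
    w = (1.0 / np.asarray(upsilon, dtype=float)) ** (1.0 / (r - 1))
    return w / w.sum()
```

Views with a smaller objective value (better fit) thus receive a larger weight, with r controlling how sharply the weights concentrate.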
To assess the performance of each method, three metrics were adopted in our experiments, i.e. the clustering accuracy (ACC), normalised mutual information (NMI), and purity (PUR) [21].
There are several points worth noting: (i) For the compared methods, excluding BSV and Concat, a grid-search strategy is performed to acquire the best results, which are reported afterwards. (ii) For fairness, we run all the compared methods several times on the used datasets and report the mean clustering results.

Experimental results and analysis
Experimental results of the distinct methods on the two kinds of incomplete multi-view datasets are displayed in Tables 2 and 3, in which the best results are shown in bold. We make the following observations. (i) In the experiments, BSV and Concat perform worse than the others in most circumstances. They both fill in the missing instances with the average features of the corresponding views, which harms the performance. In contrast with this strategy, our method, which estimates the proximity relations related to absent instances, achieves good performance, further illustrating its rationality. (ii) From the experimental results, we find that GPMVC and our method always perform better than PVC, which may be attributed to the graph-embedding technique that GPMVC and the proposed method utilise. This indicates that learning the intrinsic structure helps to achieve better performance. (iii) For all datasets, our method outperforms the others in most situations, especially on the 3 Sources dataset. These phenomena strongly demonstrate the effectiveness of our estimation of missing proximity relations in the affinity graph from the proximity relations of all views. Fig. 2 shows the trend of PUR with the trade-off parameter λ on the 3 Sources dataset with 10 and 30% missing rates. It can be observed that the clustering performance indicators are relatively stable over certain intervals, which means that our method is insensitive to the hyper-parameters to some extent. Fig. 3 illustrates the trend of the objective function value with the optimisation iterations on the BUAA dataset with a 50% paired rate and the BBCSport dataset with a 30% missing rate. It can be observed from the figures that the objective function is non-increasing and decreases rapidly to the local optimum in the first several iterations. This indicates that the proposed method holds a good convergence property.

Conclusion
In this paper, we present a novel co-regularised spectral clustering method with proximity relation estimation for the arbitrary incomplete multi-view problem. Different from the existing methods, it is the first work to excavate the proximity relations of all views to estimate the proximity relations of missing samples in each view during the affinity graph construction period. The co-regularisation term helps the method acquire a consensus representation. We also endow each view with a suitable weight, which is adaptively learnt to achieve a discriminative and compact representation of the original data, followed by the k-means clustering algorithm. Experiments on two types of incomplete multi-view datasets demonstrate the effectiveness of our method in comparison with some state-of-the-art methods. In the future, we will focus on exploring effective incomplete multi-view clustering methods that can handle large-scale datasets.