Color image segmentation based on a convex K-means approach

Image segmentation is a fundamental and challenging task in image processing and computer vision. The color image segmentation is attracting more attention due to the color image provides more information than the gray image. In this paper, we propose a variational model based on a convex K-means approach to segment color images. The proposed variational method uses a combination of $l_1$ and $l_2$ regularizers to maintain edge information of objects in images while overcoming the staircase effect. Meanwhile, our one-stage strategy is an improved version based on the smoothing and thresholding strategy, which contributes to improving the accuracy of segmentation. The proposed method performs the following steps. First, we specify the color set which can be determined by human or the K-means method. Second, we use a variational model to obtain the most appropriate color for each pixel from the color set via convex relaxation and lifting. The Chambolle-Pock algorithm and simplex projection are applied to solve the variational model effectively. Experimental results and comparison analysis demonstrate the effectiveness and robustness of our method.

The Mumford-Shah (MS) model [18] is one of the most popular and influential variational segmentation models. The MS model proposed an energy minimization problem which approximates the true solution by finding optimal piecewise smooth approximations. However, it is very challenging to find or approximate its minimizer due to the MS model is nonconvex. It is worth noting that Cai et al. proposed a two-stage strategy, i.e., smoothing and thresholding (SaT), for image segmentation [25]. In the first stage, a convex optimization problem based on the MS model is solved to get a smooth approximation u of the given image f . This stage could be regarded as the process of smoothing. Then in the second stage, they used the K-means algorithm or other clustering algorithms to obtain thresholds for segmentation, which can be seen as the process of thresholding. The segmentation stage is independent of the optimization problem in the first stage, one can change the number of phases K or thresholds ρ without resolving the optimization problem.
Since the SaT strategy is a flexible methodology for image segmentation, it has been adopted and followed by many papers [26], [19], [20], [27], [28], [29], [30], [31], [21]. And the strategy was extended to images corrupted by Poisson or Gamma noises [32], color images [1] and classification for high-dimensional data and point clouds. For example, literature [1] proposed a three-stage method, i.e., Smoothing, Lifting and Thresholding, achieving a good result for segmenting images corrupted by noise. In [21], Cai et al. revealed the connection between image restoration and segmentation by using the SaT strategy.
In the above segmentation methods based on the SaT strategy, thresholding is recognized as an effective method of image segmentation. A great number of thresholding techniques have been widely used for grayscale and color image segmentation. Furthermore, some clustering algorithms have been used by many image segmentation models to determine thresholds for segmentation, such as the fuzzy clustering [23], [24], [33], [5], [34] and the K-means clustering [1], [25], [26], [32], [35], [36], etc. In [23], a thresholding method based on fuzzy set was proposed to deal with medical images, and the method got good performance visually and numerically. Jia et al. [34] presented a robust self-sparse fuzzy clustering algorithm for solving over-segmentation.
The K-means method is popular due to its simple principle and high calculation rate. In 2017, Condat proposed two methods for image segmentation based on a convex approach to the K-means method. The paper [2] proposed a definition of discrete total variation (TV) and applied it to the segmentation of color images. Meanwhile, the method [37] showed a convex formulation of data clustering and image segmentation based on the "Potts" model (see Section II-A for details), convex relaxation and lifting. However, the K centroids are pre-assigned by the author before the segmentation takes place. The above methods [37], [2] introduced a large number of candidates which did not improve the image quality enough to be worth the extra computing cost, although avoided dealing with a non-convex problem. Furthermore, models in [37], [2] are both based on (a) Given image (b) SLaT [1] (c) Proposed TV regularization, which could cause the staircase effect, especially for noisy images. For the SaT strategy, the segmentation is carried in the thresholding stage and it uses the smoothed image obtained in the smoothing stage. The SaT strategy is quite important, and it can keep main features of the image while clear out the noise and meaningless features so that in all cases it can obtain good segmentation results. Besides, the K-means method has been widely used in segmentation method to determine the thresholds for the thresholding stage in the SaT strategy. However, Figure 1(b) is the result of SLaT [1] which used classical K-means framework to perform the segmentation. We can see that the pixels around the stalks are wrongly classified, there are many "red dots" around it (see the marked area).
Thus the goals of this paper are summarized as follows: a. to avoid solving non-convex and combinatorial problem for the sake of stability; b. to reduce the staircase effect in the model based on the TV regularization; c. to inherit the advantages of the SaT strategy especially the smoothing stage; d. to adopt a new strategy for segmentation to determine the most appropriate color for each pixel.
In summary, we propose a new method to achieve the above goals. The proposed method combines convex relaxation based on the K-means method and lifting. We perform an improved convex relaxation for the classical Kmeans method and transform the problem into an assignment problem by the technique of lifting [1], [38], [37], [2], [39]. On the one hand, we add a square of the first-order derivative term based on the variational method [37] to reduce the staircase effect. The segmentation performance can be well guaranteed by the combination of l 1 and l 2 regularizers. On the other hand, we use the strategy of lifting to determine the most appropriate color for each pixel from the color set, which could overcome the drawback of the classical K-means method in color image segmentation.
The organization of this paper is as follows. In Section 2, we give the basic knowledge used throughout the paper. In Section 3, we propose our model for image segmentation and present the Chambolle-Pock algorithm to solve the suggested model. In Section 4, we present numerical experiments and some comparisons to show the superiority of our one-stage method. Conclusions are drawn in Section 5.

A. Potts model
Firstly, we give some definitions for the classical K-means method in image processing. The data f = (f n ) n∈Ω is a 2D image with values f n , where domain Ω = {1, ···, N 1 }×{1, ·· ·, N 2 }, and n = N 1 × N 2 . The classical K-means problem minimizes the sum of Euclidean distance from every points f n to the nearest centroid c k , aiming at partitioning N points of R d into K groups, i.e., it solves the following problem to partition points into different clusters Secondly, we consider a general problem based on the classical K-means problem. Given an integer K ≥ 2, one wants to partition Ω into K regions Ω k (so K k=1 Ω k = Ω and Ω k Ω k = ∅, for all k = k ), and to find the corresponding centroids c k ∈ R d , so as to per (Ω k ) .
(2) (a) Three-phases In the above model (2), the first term is the data fidelity term, the second term is the penalization of the sum of the region perimeter (the region perimeter is per (Ω k ), and per denotes the perimeter). The parameter λ is to control the level of spatial regularization, which has been adopted in the NP-hard piecewise constant Mumford-Shah problem [40], [18] to obtain spatial homogeneity by penalizing the sum of the region perimeter. The model is named as "Potts" (Penalization of the region perimeter) model. The classical K-means problem is the special case of the equation (2), then we can partition the point cloud f = (f n ) n∈Ω whose c k are the means. Please see [37], [2] for more details.

B. Hill-climbing method
Traditional hill-climbing segmentation [41], [42] is a simple, fast algorithm that clusters colors of an image without any hand-tuning of parameters. Li et al. made a few modifications to get a more stable hill-climbing procedure [20]. Moreover, the most significant advantage of hill-climbing is that it does not need prior knowledge of the number of clusters or the content of the given image and can detect the number of segments. As papers [43], [20] mentioned, the number of peaks is equal to the number of clusters, i.e., the number of segments (K). This process is quite simple to implement: we just group the matrix f into a 3D histogram, then the hill-climbing algorithm attempts to locate the peaks on the color histogram [20]. For more details, see [43], [41], [20], [42] and references therein.

C. The smoothed Potts model
As discussed in the above subsection, we first use the hillclimbing procedure to detect the segments (K) in advance and then we use the K-means algorithm to get the set Σ = c k ∈ [0, 1] 3 , where k = 1, . . . , K of K ≥ 2 colors c k , denoting the triplets of R, G, B values. We can use the MATLAB K-means function "kmeans" to get the c k . Here, it is worth mentioning that there is no doubt that the color set could be specified by human or the K-means method. We can choose the most suitable color for each pixel by the segmentation accuracy to get a finer result.
We can define the segmented RGB color image u, whose R, G, B values at every pixel is one of the c k , close to f (also known as piecewise constant MS problem).
In the segmented image u, every element u n is one of the centroids c k , thus we can use the lifting strategy [1], [38], [37], [2], [39] to reformulate the problem (2) as an equivalent assignment problem, then the unknown variable is the assignment array z indexed by k = 1, . . . , K with one more dimension than f . This reformulation meets the goal (d): to adopt a new strategy for segmentation to determine the most appropriate color for each pixel.
The assignment array z has the following properties: a. The assignment array z belongs to the set A of binary assignment vectors. b. For every n ∈ Ω and k = 1, . . . , K, if u n = c k then z k,n ∈ A Ω is equal to 1 and to 0 else, i.e., we want to choose a c k from the color set to represent the pixel of the n ∈ Ω. c. Each vector z :,n = (z k,n ) K k=1 with elements in 0, 1 and its sum is 1.
Based on the above properties, we can get u by a simple summation u n = K k=1 z k,n c k , ∀n ∈ Ω. (3) The TV of the indicator function of set (1 inside, 0 outside) is equal to the perimeter of that set [38]. We can rewrite the second term in (2) as Here we have some discrete form of the TV, and ∇z 1 denotes the discrete total variation of z, i.e., with ∇ x and ∇ y corresponding to the discrete derivative operators in the x-direction and ydirection, respectively.
As we know, image processing models based on TV regularization may preserve important edge information, but the staircase effect may be introduced. To reduce the staircase effect [25], we add a square of the first-order derivative in (4). In short, we denote The effect of the square of the first-order derivative term can be seen in Figure 2.
As we can see, when we set µ = 0, model 5 becomes the model of [2]. Comparing Figure 2(b) with (c), we can find the (b) has some unwanted tiny structures (see the marked area). We add the square of the first-order derivative term, which is a smooth term. This term works and smoothes out the individual dots, see Figure 2(c). More precisely, Figure  2(b) produces unwanted tiny structures as the method of [2] based on TV regularization; Figure 2(c) can eliminate the unwanted tiny structures near the boundary by introducing the square of the first-order derivative term. So, it is of great significance to add this smooth term. This modification meets the goal (b) and (c): to reduce the staircase effect and inherit the advantages of the smoothing stage.

Images
K-means Human Figure 6

III. THE PROPOSED MODEL
Now, we propose our model as follows where z, w is defined in Section II-C. The set A is nonconvex, so the problem (6) is non-convex. Here we perform a convex relaxation, replacing A with its convex hull (the simplex ∆), which is the set of vectors with nonnegative elements whose sum is 1. This strategy meets the goal (a): to avoid solving non-convex and combinatorial problem for the sake of stability, then we reformulate the problem (6) as follows Since model (7) is convex, there are many methods to solve it such as the split-Bregman algorithm [46], which is used widely to solve many problems based on TV regularization and the alternating direction method of multipliers (ADMM) [47], [48], [49]. Specially, the Chambolle-Pock algorithm [50] proves a convergence rate.
In the following, we use the Chambolle-Pock algorithm to solve (7), first we introduce new variable v = ∇z, then we consider the following primal-dual optimization problem Then the Chambolle-Pock algorithm is defined through the following iterations z (k+1) = arg min z − ∇z, q (k+1) + z, w + ı ∆ (z) Since the objective functions (9) is quadratic, the update of q can be computed efficiently The solution of (11) can be easily obtained by applying the soft thresholding operator. Denoting (a) FRC [44] (b) DPP [45] (c) SLaT [1] (d) LC [2] (e) RSSFC [34] (f) Proposed Fig. 7. The segmentation results of Three-shapes in Figure 6(a). We add the Gaussian noise with meanμ = 0, variance σ 2 = 0.1.
Denoting z k 0 = z k − τ (w + div q (k+1) ), we have then where P ∆ is the simplex projection, which could be computed efficiently by the code from https://lcondat. github.io/.
Our one-stage method has the following three steps: 1. First, we use the hill-climbing procedure to detect the number of segments (K) and use the K-means algorithm to get K colors c k . 2. Second, we solve our one-stage model (merging the process of smoothing and segmentation) via the Chambolle-Pock algorithm to get the assignment array z.
3. Third, we get the segmented image u via simple summation. Then the whole process of our one-stage strategy image segmentation method for clean or noisy images can be summarized by the following Algorithm 1.
All the experiments were performed under MATLAB R2020a and Windows 10(x64) on a PC with an Intel Core (TM) i7 10750H CPU at 2.60GHz and 16.0GB of memory.
To quantitative comparison of the segmentation performance, we use one objective indicator, i.e., the segmentation accuracy (SA) [34]. The SA is computed as follows: where the S i represents a segmentation result, the G i denotes the corresponding Ground Truth, the k is the number of clusters and n is the total number of pixels of images.

A. Segmentation of synthetic images
In this subsection, we show the effectiveness of our method on some simple segmentation tasks. These images have obvious different colors and have a reasonable criterion of the segmentation, e.g., we cut the objects from the background. Meanwhile, we compare the robustness of our method with the SLaT on different noise levels. We use the MATLAB "imnoise" command to add Gaussian noise.
As aforementioned, the color set could be determined by the K-means method or human. In examples of Three-phases and Six-phases, we use the above two methods to specify the color set. The runtime comparisons show in Table I between the color set specified by the K-means method and human (including the time taken in deciding the number of colors). We can learn that the runtime of the color set specified by K-means method faster than human, and two methods can arrow the same segmentation accuracy. Thus, we use the K-means method to choose the color set in the following examples.
We can see that the SLaT has misclassified colors. What's more, our method gets more smooth and clear results than the LC and DPP. We can observe that our method is robust for different levels of noise. Example 2: Six-phases segmentation. Figure 10 shows the segmentation results of the six-phase synthetic image consisted of five overlapping circles with different colors. It is natural for us to cut the five overlapping circles with different colors from the background. From the results, we can observe that the FRC, DPP and RSSFC fail for the case of the image with Gaussian noise. The FRC is unable to tell the five colors, and the DPP can not keep the shape of the black and green circle and the RSSFC fails to maintain the edges of all circles. Although the SLaT gets a relatively good result, it misclassifies the color of the area between the two neighbor circles. Our method is better than the LC as it not only tells the five different colors (see Figure 3(d)) but also preserves the shape of the circle.

B. Segmentation of real-world images
In this subsection, we test our method on six real-world images to show that our method can get a comparable segmentation result to the CKC without complex calculation. The choice of six real-world images is according to the comparison methods [44], [45], [1], [2], [37], [34].
Example 3: Airplane segmentation. We consider a common and simple task of segmenting an image into two parts: foreground (object) and background regions. We use the airplane image to show the superior performance of our method on this task. The SLaT, LC and our method cut the plane from the sky (background) and segment the image into two reasonable parts. Although this is a simple example, the DPP Figure 11(b) has a poor performance on this task. The FRC and RSSFC do not clearly cut the airplane out as there is a small area that is not separated into the background.
Example 4: Hill segmentation. Figure 12 shows the segmentation results of the hill. The FRC and RSSFC fail to separate the hills as an entirety, the DPP is somewhat under-segmented that some small features have vanished, and the LC does not cut the hill clearly out. The SLaT and our method get relatively better results.
Example 5: Flowers segmentation. Figure 4(c) consists of four flowers with red, yellow and green color, see Figure  5(c). There is no doubt that we want to cut the flowers with different colors from the sky (background). As we can see from Figure 13, for the image with Gaussian noise, the FRC fails to distinguish the colors. The DPP does not make out the objects in the flowers. The SLaT misclassifies the area around the receptacle of the first and third flowers, see the red "dots". The LC produces some yellow "dots". The RSSFC misclassifies the second flower. Our method gets a relatively good result.
In the above examples, we find that the FRC, FRC and RSSFC perform badly, the SLaT and LC are much superior to the former methods. In the following examples, we mainly compare our method with the SLaT, LC, RSSFC and CKC for three color images shown in [37] including the Parrot, the Ladybug and the Sunflowers.
Here, we emphasize that the method [37] specified the M (279 in [37]) candidates which takes its values in only K among the M candidates (K M ). The M colors are shown in Figure 14. Meanwhile, the method [37] has shown that it is far superior to a two-step strategy: first estimates the K colors using quantization and then solves the segmentation problem restricted to these M = K colors. For convenience, we call this as Coupling Quantization and Segmentation (CQaS) method [37]. Although [37] proposed an accelerated algorithm and the segmentation results are good, the computational time is still long, as shown in Table  II. In the next subsections, we use the examples used in the [37] to show that our simple modification could get a comparable result to the CKC without complex calculation. For the sake of simplicity, we set K = 6 in the following examples.
Example 6: Parrot segmentation. In Figure 15, the CQaS and RSSFC have a poor performance obviously. The LC gets a more smooth result than the former method, but it yields a different color. The SLaT undercuts the image, and the area of the eye has been removed. Although our method yields a wrong area in the background, it gets a relatively good result at the object (parrot). What's more, our method with a suitable color set gets a comparable good result to the CKC.
Example 7: Ladybug segmentation. In this example, Figure 5(e) shows the palette used. Figure 16 shows the segmentation results of the ladybug. The SLaT, LC, CKC, CQaS and our method all have a good result. Here, we emphasize that our method is less complicated than the CKC and achieves a comparable good result to the CKC.
Example 8: Sunflowers segmentation. The candidate of the palette in this example is shown in Figure 5(f). Sunflowers have some flowers and the green area at the down of the image, we just want to detect the flowers and take the green area as an entirety. The CQaS overcuts the sky for the given image, and the RSSFC undercuts the sky for the given image. The SLaT, LC, CKC and our method all have a good result.
Here, we highlight that our method is less complicated than the CKC and achieves a comparable good result to the CKC, as shown in Table II. V. CONCLUSION In this paper, we propose a new variational segmentation model based on an improved K-means method. We first give a color set determined by human or the K-means method. Then we use a variational model to select the most appropriate colors by SA for each pixel from the color set to get a finer result. The experiments show the effectiveness and robustness of our proposed method. Compared with some state-of-the-art methods, our method has a well running speed while ensuring higher segmentation accuracy. Recently, deep learning has attracted wide attention. In future work, we intend to use data-driven deep learning to train the regularization and combine the variational model for image segmentation.