C-CNNLoc: Constrained CNN for robust indoor localization with building boundary

To enable accurate and reliable indoor localization in a multi-building environment, a novel constrained convolutional neural network (CNN)-based indoor localization system (C-CNNLoc) is proposed using a WiFi fingerprinting approach. The proposed network has a sequential structure that first classifies the building and then estimates the user's location coordinate within the pre-detected building. Furthermore, the location accuracy is improved by introducing a new loss function that incorporates a penalty term associated with the building boundary. Experimental results illustrate that the proposed method outperforms the existing solutions in terms of the average distance error. The gain arises because the approach, tailored to multi-building indoor localization with the sequential structure, tends to correct outliers, that is, predicted location coordinates that lie outside the buildings.

✉ Email: wjshin@ajou.ac.kr
Introduction: Indoor localization techniques have been actively studied because of the rapidly growing demand for indoor location-based services such as pedestrian navigation or emergency alert services [1]. Among these techniques, WiFi fingerprinting has become one of the most practical approaches due to the ubiquity of WiFi infrastructure in indoor environments [2]. This technique consists of two phases: the offline phase and the online phase. In the offline phase, a fingerprint database is constructed by collecting the received signal strength intensity (RSSI) values from different wireless access points (WAPs) at pre-known locations and labelling them with the location information (e.g. building information and location coordinates). In the online phase, to estimate a user's location, the closest match in the fingerprint database is found using the RSSI values collected by the user's device.
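The two phases can be illustrated with a minimal sketch. It assumes a Euclidean distance in RSSI space as the matching rule, which the text does not specify; the function name `nearest_fingerprint` and all RSSI values are hypothetical.

```python
import math

def nearest_fingerprint(database, query):
    """Return the location label of the stored fingerprint closest to `query`.

    database: list of (rssi_vector, location) pairs collected in the offline phase.
    query:    RSSI vector measured in the online phase, same WAP order as the database.
    """
    def distance(a, b):
        # Euclidean distance in RSSI space (an assumed similarity measure).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    _, best_location = min(database, key=lambda entry: distance(entry[0], query))
    return best_location

# Toy offline database: three WAPs, two surveyed positions (made-up values).
database = [
    ([-40, -70, -90], ("building A", (10.0, 5.0))),
    ([-85, -45, -60], ("building B", (42.0, 17.5))),
]
print(nearest_fingerprint(database, [-42, -72, -88]))
```

DNN-based methods replace this explicit lookup with a learned mapping from the RSSI vector to the location.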
With the development of deep learning, deep neural networks (DNNs) have been proposed for WiFi fingerprinting. Specifically, a convolutional neural network (CNN)-based indoor localization system with WiFi fingerprinting (CNNLoc) has been introduced [3]. However, DNNs, including CNNLoc [3], tend to predict a large number of location coordinates outside the buildings (referred to as outliers in this letter). This is mainly because DNNs are generally trained only to minimise the average distance error between the predicted and actual location coordinates, with no consideration of the indoor setting. To address this issue, we propose a constrained CNN-based indoor localization system (C-CNNLoc) that is tailored to a multi-building environment with WiFi fingerprinting. The proposed C-CNNLoc has a sequential structure that first recognizes the building in which a user is located and then estimates the user's location coordinate within the pre-detected building boundary. In particular, it is trained using a newly defined loss function composed of the traditional loss function, which reduces the average distance error, and a penalty term associated with the building boundary. Specifically, the building boundary can be approximately represented by either a convex or concave polygon, as shown in Figure 1; therefore, we introduce two types of constraints. First, a constraint for the convex polygon is introduced; it is then used to define a constraint for the concave polygon, which is composed of a set of convex polygons. Experimental results show that the average distance error is substantially reduced by adopting the sequential structure of C-CNNLoc; moreover, it can be further reduced by correcting outliers with the new loss function. As a result, C-CNNLoc reduces the average distance error by 2.883 m compared with CNNLoc [3].
Database: We adopt the UJIIndoorLoc database, the official database used in the International Conference on Indoor Positioning and Indoor Navigation (IPIN) 2015 [4]. UJIIndoorLoc is a multi-building and multi-floor indoor localization database that includes three buildings in Universitat Jaume I, Spain. The database includes RSSI values obtained from 520 WAPs located at various positions within the three buildings and the user's location information (e.g. building information and location coordinate). The RSSI values are represented as negative integers ranging from −104 dBm (extremely poor signal) to 0 dBm, while the positive value 100 denotes that a WAP was not detected. Note that most of the RSSI values in each WiFi fingerprint are filled with the positive value 100, since the user's device can detect only a small subset of the 520 WAPs at each location. This is because one WAP covers only a small area and its signal cannot propagate across long distances due to wall loss and interference between neighbouring WAPs inside the buildings. The UJIIndoorLoc database contains training and validation sets. However, it does not contain a test set, which was provided only to the participants at IPIN 2015 [5]. Therefore, in [3], a new validation set was extracted from the training set, and the existing validation set was used as a test set to evaluate the localization accuracy. The specific statistics of the database are shown in Table 1. #Train, #Valid, and #Test are the numbers of data points in the training, validation, and test sets, respectively.
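The sparsity of a fingerprint can be made concrete with a short sketch. The sentinel value 100 and the count of 520 WAPs come from the description above; the helper names and the example RSSI readings are hypothetical.

```python
NOT_DETECTED = 100  # sentinel used in the database for an undetected WAP

def detected_waps(fingerprint):
    """Return {wap_index: rssi_dbm} for the WAPs that were actually detected."""
    return {i: rssi for i, rssi in enumerate(fingerprint) if rssi != NOT_DETECTED}

def sparsity(fingerprint):
    """Fraction of entries that carry the not-detected sentinel."""
    return sum(1 for rssi in fingerprint if rssi == NOT_DETECTED) / len(fingerprint)

# Made-up fingerprint over 520 WAPs in which only three WAPs were detected.
fingerprint = [NOT_DETECTED] * 520
fingerprint[7], fingerprint[130], fingerprint[402] = -61, -88, -97

print(detected_waps(fingerprint))
print(round(sparsity(fingerprint), 4))
```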

Outline of CNNLoc:
The structure of CNNLoc [3] is a cascade of a stacked autoencoder (SAE) and a CNN, as illustrated in Figure 2. The SAE can effectively reduce the dimension of sparse data consisting mainly of insignificant values (e.g. the positive value 100 in RSSI data) while preserving their necessary features [6]. Thus, the necessary features of the input RSSI data can be effectively extracted using the SAE, which makes the DNN easier to train. The output of the SAE is fed to the CNN as input, and the location coordinates (longitude and latitude) are estimated after the convolutional operations in the CNN. Meanwhile, the parameters of CNNLoc [3], w (weights and biases), can be optimized by minimizing the loss function in (1):

$$\min_{w} \; L(w) \tag{1}$$

where L(w) is a mean square error that measures the extent to which the predicted values deviate from the target values. The details of CNNLoc are omitted here for the sake of brevity and can be found in [3].

Constrained-CNNLoc: As shown in Figure 3, the proposed Constrained-CNNLoc (C-CNNLoc) has a sequential structure in which the user's location is estimated in two steps: (i) building detection and (ii) location coordinate estimation. In the first step, the building is detected using a high-accuracy building classification model [3]. Subsequently, based on the detected building information, one of the three location estimation models, each tailored to estimate the location coordinate in one building, is selected. In the second step, the location coordinate within the building boundary is estimated. This sequential procedure can improve the accuracy of indoor localization; however, its accuracy is still limited by outliers, that is, predicted location coordinates that lie outside the buildings. To tackle this issue, we impose a constraint on the output of C-CNNLoc.
Specifically, a soft constraint is considered by adding an extra penalty term to the loss function in (1) to keep the predicted location coordinates inside the pre-detected building. In the training procedure, the parameters of C-CNNLoc, w (weights and biases), can be optimized by minimizing the objective function in (2):

$$\min_{w} \; L(w) + \lambda\, C(w) \tag{2}$$
where C(w) is the constraint to be enforced on C-CNNLoc, and λ ∈ [0, ∞) is a hyper-parameter that controls the relative contributions of the loss function L(w) and the constraint C(w). It is worth mentioning that setting λ to zero reduces (2) to (1) (no constraint), whereas an excessively large value of λ results in L(w) being neglected. Next, we introduce the details of C(w) associated with a building boundary. As shown in Figure 1, most of the building boundaries can be approximately represented by either a convex or concave polygon. Therefore, we introduce two types of constraints, one for each kind of polygon: the constraint of the convex polygon and the constraint of the concave polygon.
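The role of λ in (2) can be sketched numerically. This is a minimal illustration under stated assumptions: the mean square error is taken per sample over (x, y) pairs, C(w) is averaged over the batch, and `toy_constraint` is a hypothetical stand-in for the boundary penalty introduced next. None of these details are confirmed by the text.

```python
def mse(predicted, target):
    """Mean square error over a batch of (x, y) coordinates (per-sample average)."""
    n = len(predicted)
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(predicted, target)) / n

def penalised_loss(predicted, target, constraint, lam):
    """L(w) + lam * C(w), with C(w) averaged over the batch (an assumption)."""
    penalty = sum(constraint(p) for p in predicted) / len(predicted)
    return mse(predicted, target) + lam * penalty

def toy_constraint(point):
    # Hypothetical penalty: how far x exceeds 10, zero otherwise.
    return max(0.0, point[0] - 10.0)

predicted = [(9.0, 1.0), (12.0, 1.0)]
target = [(9.0, 2.0), (10.0, 1.0)]

print(penalised_loss(predicted, target, toy_constraint, lam=0.0))  # plain L(w)
print(penalised_loss(predicted, target, toy_constraint, lam=0.5))  # L(w) plus penalty
```

With λ = 0 the value coincides with the plain mean square error; increasing λ raises the cost of every prediction that violates the constraint.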
Assuming that there is a convex polygon with four vertices in a 2D plane, as shown in Figure 4, the convex polygon has four edges that constitute a boundary. The vertices of the convex polygon are denoted as Q_i(x_i, y_i), i ∈ {1, 2, 3, 4}, and the predicted coordinates from a neural network are denoted as P(x, y).
By connecting P(x, y) to each of the four vertices, four triangles are generated. We can determine the four triangular areas using both Q_i(x_i, y_i), i ∈ {1, 2, 3, 4}, and P(x, y) [7]. When P(x, y) is within the boundary, the sum of the four triangular areas is always equal to the area of the convex polygon. By contrast, when P(x, y) is outside the boundary, the sum of the four triangular areas is strictly greater than the area of the convex polygon. Notably, the farther the predicted coordinate P(x, y) lies from the convex polygon, the greater the sum of the four triangular areas. We define the constraint of the convex polygon C(w) as the difference between the sum of the four triangular areas and the area of the convex polygon; therefore, C(w) is always non-negative, and it equals zero if and only if P(x, y) is within the boundary. As shown in Algorithm 1, this example can be generalized to the constraint of a convex polygon with N vertices. In Algorithm 1, mod is the modulo operation (e.g. mod(4, 7) = 4), and we assume that the vertices Q_i(x_i, y_i), i ∈ {1, 2, …, N}, are ordered either clockwise or counter-clockwise. Based on Equation (2), L(w) is minimized subject to a soft boundary constraint in the training of C-CNNLoc. In particular, the soft-constrained model imposes a penalty proportional to the amount by which the constraint is violated. As a result, the outliers (predicted coordinates outside the building) tend to be corrected by C-CNNLoc.
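The area-difference construction described above can be sketched in a few lines. This is an illustrative sketch rather than a reproduction of Algorithm 1: the function names are hypothetical, vertices are indexed from zero, and the polygon area is computed with the shoelace formula, which is an assumed choice.

```python
def triangle_area(p, a, b):
    """Area of the triangle formed by point p and the edge (a, b)."""
    return abs((a[0] - p[0]) * (b[1] - p[1]) - (b[0] - p[0]) * (a[1] - p[1])) / 2.0

def polygon_area(vertices):
    """Shoelace formula; vertices must be ordered clockwise or counter-clockwise."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # modulo wraps the last vertex to the first
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def convex_constraint(p, vertices):
    """Sum of the N triangle areas minus the polygon area (zero when p is inside)."""
    n = len(vertices)
    triangles = sum(triangle_area(p, vertices[i], vertices[(i + 1) % n])
                    for i in range(n))
    return triangles - polygon_area(vertices)

square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(convex_constraint((2.0, 2.0), square))  # inside the square
print(convex_constraint((6.0, 2.0), square))  # outside, 2 units from the edge
print(convex_constraint((8.0, 2.0), square))  # outside, 4 units from the edge
```

For this square, the value is zero at the interior point and grows as the point moves away from the boundary, which is the behaviour the soft constraint relies on.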

Algorithm 1 Constraint of a convex polygon
Next, we focus on the constraint of a concave polygon. Note that a concave polygon can be decomposed into a set of simple convex polygons (e.g. triangles, convex quadrilaterals) by drawing line segments across the concave polygon. Using this property, we extract the boundaries of buildings A, B, and C of Universitat Jaume I, Spain, and decompose them into convex polygons, as shown in Figure 1. Then, we formulate a separate constraint for each convex polygon using Algorithm 1. As mentioned before, the constraint of a convex polygon is always greater than or equal to zero; thus, if the predicted coordinate is within one of the convex polygons, the corresponding constraint is zero, whereas the others are greater than zero. However, if the predicted coordinate is outside the concave polygon, and hence outside every convex polygon that constitutes it, all the constraints of the convex polygons are greater than zero. In this regard, the constraint of the concave polygon, C(w), is defined in terms of the constraints C_k(w), k ∈ {1, 2, …, N}, where N is the number of convex polygons that constitute the concave polygon and C_k(w) denotes the constraint of the kth convex polygon. It is worth mentioning that C_k(w) is zero if the predicted coordinate is within the kth convex polygon and greater than zero when the predicted coordinate is outside it.
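One aggregation that is consistent with the properties stated above is to take the minimum of the per-polygon constraints: it is zero when the prediction lies in any convex piece and positive when it lies outside all of them. The sketch below uses this minimum purely as an assumption for illustration; the text does not confirm that this is the rule used. The L-shaped example and all names are hypothetical.

```python
def convex_constraint(p, vertices):
    """Sum of triangle areas (p with each edge) minus the polygon area."""
    n = len(vertices)
    triangle_sum = 0.0
    shoelace = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        triangle_sum += abs((x1 - p[0]) * (y2 - p[1]) - (x2 - p[0]) * (y1 - p[1])) / 2.0
        shoelace += x1 * y2 - x2 * y1
    return triangle_sum - abs(shoelace) / 2.0

def concave_constraint(p, convex_pieces):
    """Assumed aggregation: the smallest constraint over the convex pieces."""
    return min(convex_constraint(p, piece) for piece in convex_pieces)

# L-shaped (concave) outline split into two rectangles.
pieces = [
    [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)],  # lower bar
    [(0.0, 2.0), (2.0, 2.0), (2.0, 4.0), (0.0, 4.0)],  # upper-left block
]
print(concave_constraint((1.0, 3.0), pieces))  # inside the upper-left block
print(concave_constraint((3.0, 3.0), pieces))  # in the notch, outside both pieces
```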

Experimental evaluation:
We conduct experiments to verify the effectiveness of C(w) (the penalty term that depends on the amount of violation). We use the publicly accessible UJIIndoorLoc database [4], the details of which are provided in the Database section. The parameter values of the proposed C-CNNLoc are presented in Table 2, and the specific structures of both the SAE and CNN are provided in [3]. All the experiments are implemented with Python 3.6.10, Keras 2.2.2, and Tensorflow-gpu 1.10.0.
To tune the hyper-parameter λ in (2), we perform a grid search on the validation set, which is strictly separated from the test set. We train a separate neural network for each value of λ and identify the one that yields the lowest average distance error. Accordingly, the hyper-parameters λ_A, λ_B, and λ_C are determined to be 0.005, 0.45, and 0.05, respectively, where λ_k, k ∈ {A, B, C}, represents the hyper-parameter of the location coordinate estimation model of building k. The average distance errors and the number of outliers on the validation set for each λ are shown in Table 3. In the second column of Table 3, λ = 0 indicates that the C-CNNLoc structure is trained using only the loss function L(w) (mean square error) without any constraint C(w), which results in the largest number of outliers. The higher the value of λ, the greater the impact of C(w), which effectively reduces the average number of outliers. However, the average distance errors of Buildings A, B, and C increase when λ_A > 0.005, λ_B > 0.45, and λ_C > 0.05, respectively. These results imply a trade-off between the number of outliers and the average distance error because, based on (2), the influence of C(w) on the optimized weights grows with λ at the expense of L(w). Meanwhile, the optimized value of λ can vary depending on many factors, for example, the manner in which the concave polygon is divided into convex polygons and the edge lengths of the convex polygons.
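The per-building grid search can be outlined as follows. This is a sketch under assumptions: `validation_error` is a hypothetical callable that would train a model with the given λ and return its average distance error on the validation set, and the synthetic error curve used here is invented only to make the example runnable; it does not reproduce Table 3.

```python
def grid_search(candidates, validation_error):
    """Return the candidate with the lowest validation error and all results."""
    results = {lam: validation_error(lam) for lam in candidates}
    best = min(results, key=results.get)
    return best, results

def synthetic_error(lam):
    # Stand-in for "train with this lambda, then evaluate on the validation set".
    return (lam - 0.1) ** 2

best_lam, results = grid_search([0.0, 0.1, 0.5, 1.0], synthetic_error)
print(best_lam)
```

In practice the search would be run once per building, giving one selected value each for λ_A, λ_B, and λ_C.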
As shown in Figure 5, we plot the location coordinates predicted by CNNLoc [3] using the test set (left) to reveal the major limitation of DNNs, including CNNLoc [3], which is their tendency to estimate location coordinates outside the building (outliers). Further, to clearly show the impact of C(w), the proposed solution to this problem, we plot the test results of the proposed C-CNNLoc with λ_A = λ_B = λ_C = 1 (right), values larger than the optimized ones determined in Table 3. A comparison of the left and right sides of Figure 5 reveals that CNNLoc [3] predicted a large number of location coordinates outside the building, whereas the proposed C-CNNLoc did not, owing to the newly defined constraint C(w). Meanwhile, some location coordinates predicted using C-CNNLoc converge on an arbitrary point or line because L(w) is largely ignored when training C-CNNLoc with such a dominant C(w).
Next, to demonstrate the superiority of the proposed C-CNNLoc in terms of the average distance error, we compare it with two models, namely, a fully connected neural network (FCNN) and CNNLoc [3]. The FCNN contains three hidden layers with 128, 64, and 32 neurons; it achieves the lowest average distance error among the fully connected neural networks we tested, such as an FCNN composed of two hidden layers with 128 and 64 neurons. The average distance errors of the FCNN and CNNLoc [3] are 12.841 and 11.753 m, respectively, as shown in Figure 6. The lower error of CNNLoc [3] indicates that the necessary features of the input RSSI data are successfully extracted using the SAE. Subsequently, we evaluate the performance of the proposed C-CNNLoc, which, with its newly defined constraints and structural changes, is an improvement on CNNLoc [3]. The proposed C-CNNLoc with λ = 0 yields an average distance error of 9.129 m; this is a significant improvement over the conventional CNNLoc [3] and is attributable to the sequential structure. Furthermore, to reduce the number of outliers and enhance the quality of indoor localization, we impose the newly defined constraint. With the optimized λ values (λ_A = 0.005, λ_B = 0.45, λ_C = 0.05), C-CNNLoc achieves an average distance error of 8.870 m, the lowest among the compared models.

FIG. 6 Comparison of average distance errors between proposed C-CNNLoc and other models
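The two quantities reported in the evaluation, the average distance error and the number of outliers, can be computed as in the following sketch. It assumes planar coordinates in metres and an `is_inside` predicate supplied by the caller; the rectangular footprint and the sample points are made up.

```python
import math

def average_distance_error(predicted, actual):
    """Mean Euclidean distance between predicted and actual coordinates."""
    distances = [math.hypot(p[0] - a[0], p[1] - a[1])
                 for p, a in zip(predicted, actual)]
    return sum(distances) / len(distances)

def count_outliers(predicted, is_inside):
    """Number of predictions that fall outside the building footprint."""
    return sum(1 for p in predicted if not is_inside(p))

def in_footprint(point):
    # Hypothetical 10 m x 10 m rectangular footprint.
    return 0.0 <= point[0] <= 10.0 and 0.0 <= point[1] <= 10.0

predicted = [(3.0, 4.0), (0.0, 0.0), (12.0, 1.0)]
actual = [(0.0, 0.0), (0.0, 0.0), (12.0, 4.0)]

print(average_distance_error(predicted, actual))
print(count_outliers(predicted, in_footprint))
```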
Conclusion: In this letter, we have proposed a novel deep learning-based indoor localization method that enhances the accuracy of indoor positioning by adopting a sequential structure and reducing the number of outliers. Further, a newly defined constraint associated with the building boundary has been imposed on C-CNNLoc as a soft constraint. Experimental results have shown that the proposed method effectively reduces the number of outliers, resulting in improved localization accuracy compared with that of the conventional methods. However, the value of λ has only been found approximately, using a grid search that requires extensive experiments. To address this issue, future work will focus on finding the optimal value of λ analytically, both to reduce the number of experiments and to determine λ more accurately.