Low-complexity sphere decoding for MIMO-SCMA systems

Multiple-input multiple-output sparse code multiple access, a non-trivial integration of sparse code multiple access and multiple-input multiple-output techniques, can achieve high spectrum efficiency and massive user connectivity. However, this integration also increases the complexity of signal detection. Here, the signal detection problem of multiple-input multiple-output sparse code multiple access is transformed into a tree search problem, and sphere decoding is used to detect the signal. By setting the initial radius to positive infinity, sphere decoding achieves optimal maximum likelihood performance, but its complexity remains high. To further reduce the complexity of sphere decoding, a block-wise sorted QR decomposition algorithm is proposed. Based on block-wise sorted QR decomposition, the improved sphere decoding, namely block-wise sorted QR decomposition-sphere decoding, makes the tree search more efficient. Since only the detection order of the users' signals is changed, block-wise sorted QR decomposition-sphere decoding maintains the optimal maximum likelihood performance. Simulation results and complexity analysis show that block-wise sorted QR decomposition-sphere decoding achieves optimal performance and that both hard-output and soft-output block-wise sorted QR decomposition-sphere decoding have much lower complexity than the joint message passing algorithm. Furthermore, given the same signal-to-noise


INTRODUCTION
Future wireless communication systems demand high spectral efficiency and massive connectivity. To meet these demands, non-orthogonal multiple access (NOMA), which improves spectral efficiency and accommodates massive connectivity by sharing resources among multiple users, has been proposed and has been a research focus for several years [1]. Sparse code multiple access (SCMA) is a typical multicarrier NOMA scheme that can be efficiently combined with orthogonal frequency division multiple access (OFDMA) subcarriers [2]. In SCMA, the information bits of different users are directly mapped to sparse multi-dimensional codewords. Thanks to the sparsity and the carefully constructed codebooks, SCMA can achieve overloaded transmission with high reliability.
Based on the joint message passing algorithm (JMPA), many improvements have been made [14][15][16][17] to further reduce the detection complexity of MIMO-SCMA. In particular, Gaussian approximation MPA (GA-MPA) [15][16][17] achieves a complexity that is linear in the number of users. However, all of these methods suffer non-negligible performance loss. Sphere decoding (SD), which was first introduced in [18] for lattice code decoding, has been widely studied in communication systems, for example for MIMO detection [19], channel code decoding [20], and SCMA detection [21,22]. If the initial radius is set to positive infinity, then, through tree search and tree pruning, SD achieves optimal maximum likelihood (ML) performance with much lower complexity than an exhaustive ML search. However, when SD is used for SCMA detection, the rank-deficient channel matrix results in a partial exhaustive search, which keeps the complexity high [23].
Here, based on SD, we propose an optimal low-complexity detection approach for MIMO-SCMA systems. To the best of the authors' knowledge, this paper is the first to investigate SD for MIMO-SCMA systems. Owing to the multiple antennas at the receiver, the rank-deficiency problem of the SCMA channel matrix can be avoided. Moreover, it is well known that the complexity of SD in MIMO detection can be reduced by sorted QR decomposition (SQRD). However, there is one major difference between MIMO detection and MIMO-SCMA detection. While the estimated signals of the MIMO system consist of several one-dimensional complex symbols, the estimated signals of the MIMO-SCMA system consist of several multi-dimensional complex symbols. As a result, the channel matrix of the MIMO system can be permuted arbitrarily when performing SQRD, whereas the channel matrix in MIMO-SCMA cannot: an arbitrary permutation would interleave the elements of different multi-dimensional complex symbols with each other, which makes SD infeasible. To keep the order of the elements of each multi-dimensional complex symbol unchanged while reducing the number of nodes traversed in SD, we propose a block-wise sorted QR decomposition (BSQRD) algorithm that is closely related to the nature of MIMO-SCMA. In BSQRD, the columns of the channel matrix are grouped into several blocks, and only the order of the blocks can be permuted, while the order of the columns within each block is not allowed to change. We call the SD that is based on BSQRD BSQRD-SD. Since the tree search of BSQRD-SD is more efficient than that of SD and only the detection order of the estimated signal is changed, BSQRD-SD has the same optimal ML performance with lower complexity.
Simulation results and complexity analysis show that BSQRD-SD achieves optimal performance, while the complexity of hard-output BSQRD-SD can be as low as 9.4% of that of JMPA and the complexity of soft-output BSQRD-SD can be as low as 24.6% of that of JMPA. Furthermore, it is worth noting that, under the same signal-to-noise ratio (SNR), the complexity of BSQRD-SD decreases as the number of receiving antennas increases, whereas the complexity of JMPA increases linearly with it.
[…] and $M$ is the codebook size of SCMA. The codeword $x_j$ is a sparse vector with $d_v$ non-zero entries. Let $s_j \in \mathcal{S}_j$, with cardinality $|\mathcal{S}_j| = M$, be the $d_v$-dimensional complex symbol composed of the non-zero entries of $x_j$. Then $x_j$ can be denoted as […] Let $h_{j,n_r} = [h_{j1,n_r}, h_{j2,n_r}, \ldots, h_{jN,n_r}]^T$ be the channel vector between user $j$ and the $n_r$th receiving antenna of the BS. The channel between the transmitter and receiver is assumed to be i.i.d. flat Rayleigh fading. Thus, each channel gain in $h_{j,n_r}$ is complex Gaussian distributed according to $\mathcal{CN}(0, 1)$. Then, at the BS, the received signal of the $n_r$th antenna, $y_{n_r} \in \mathbb{C}^{N \times 1}$, can be written as $y_{n_r} = \sum_{j=1}^{J} H_{j,n_r} s_j + n_{n_r} = H_{n_r} s + n_{n_r}$, where $H_{n_r} = [H_{1,n_r}, \ldots, H_{J,n_r}]$, […], and $n_{n_r} \sim \mathcal{CN}(0, \sigma^2 I)$ denotes the complex additive white Gaussian noise (AWGN), each element of which has zero mean and variance $\sigma^2$. Consequently, the total received signal of the $N_r$ receiving antennas is given by $y = Hs + n$, (2) where […] Since there are $J$ users and the BS has $N_r$ antennas, the MIMO-SCMA system size is represented as $J \times N_r$. Similar to traditional MIMO systems, which require more receiving antennas than users so that the channel matrix has full column rank, we suppose that in

DETECTION PROBLEM FORMULATION
We assume that perfect channel information is available at the BS. Then, according to (2), the optimal ML detection for MIMO-SCMA is $\hat{s} = \arg\min_{s} \|y - Hs\|^2$, (3) where […] Then, left-multiplying (2) by $Q^H$, we get $\tilde{y} = Rs + \tilde{n}$, where $\tilde{y} = Q^H y$ and $\tilde{n}$ is the first $d_v J$ entries of $Q^H n$. Since multiplying by a unitary matrix does not change the statistical properties of the noise, $\tilde{n}$ is still complex AWGN with zero mean and variance $\sigma^2$. Thus, the ML detection of (3) can be reformulated as $\hat{s} = \arg\min_{s} \|\tilde{y} - Rs\|^2$. (5)
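The equivalence between the metric in (3) and the metric in (5) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a thin QR decomposition from NumPy, random complex matrices in place of a real MIMO-SCMA channel, and hypothetical helper names (`reduced_model`, `metric_offset`). It illustrates that the two squared distances differ only by a term that does not depend on $s$, so both have the same minimiser.

```python
import numpy as np

def reduced_model(H, y):
    """Return (R, y_tilde) with H = Q R (thin QR) and y_tilde = Q^H y."""
    Q, R = np.linalg.qr(H, mode="reduced")
    return R, Q.conj().T @ y

def metric_offset(H, y, s):
    """||y - H s||^2 - ||y_tilde - R s||^2; this should not depend on s."""
    R, y_tilde = reduced_model(H, y)
    full = np.linalg.norm(y - H @ s) ** 2
    reduced = np.linalg.norm(y_tilde - R @ s) ** 2
    return full - reduced

def crandn(rng, *shape):
    """Complex standard normal samples (illustrative helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(0)
rows, cols = 16, 12          # assumed sizes, e.g. N_r * N = 16 and d_v * J = 12
H = crandn(rng, rows, cols)  # stand-in for the stacked channel matrix
y = crandn(rng, rows)
s_a, s_b = crandn(rng, cols), crandn(rng, cols)

# The two offsets agree up to rounding error, whatever s is.
print(metric_offset(H, y, s_a), metric_offset(H, y, s_b))
```

The constant offset is the energy of the part of $y$ that lies outside the column space of $H$, which no choice of $s$ can reduce.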

M-ary tree model for MIMO-SCMA detection
Similar to MIMO detection [24], the ML detection problem of MIMO-SCMA formulated in (5) can be converted into a weighted tree search problem. The tree model is shown in Figure 2, and the tree has the following important properties: 1. The tree has $Jd_v + 1$ layers. Layers 1 to $Jd_v$ correspond to rows 1 to $Jd_v$ of $R$ and to the entries of the symbol vector $s = s_{1:J}$ […] After initialising $\mathrm{PM}(s_{J+1:J}) = 0$, the Euclidean distance $d(s) = \|\tilde{y} - Rs\|^2$ in (5) can be computed recursively as […] with $d(s) = \mathrm{PM}(s_{1:J})$. Then, according to [25], SD based on the Schnorr-Euchner (SE) enumeration strategy [26] can be applied to solve (5). SD constrains the search to the nodes that lie inside the sphere with centre $\tilde{y}$ and radius $C$. SE-based SD traverses the tree depth first, visiting the children of each node in ascending order of their branch metrics, which leads to efficient pruning of the tree. The radius $C$ is adaptively updated to $\sqrt{d(s)}$ once a leaf node $s$ with $d(s) < C^2$ has been reached. The performance of SD depends on the initial radius. If the initial radius is set to $+\infty$, at least one leaf node is guaranteed to be found, and since the radius only ever shrinks to the distance of the best leaf found so far, the ML solution is never pruned. Thus, SD always achieves optimal ML detection.
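The search described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the $d_v$ layers that belong to one user are collapsed into a single $M$-ary branching step, the function name `sphere_decode` and the `codebooks` argument (one array of $M$ candidate $d_v$-dimensional symbols per user) are hypothetical, and candidates are simply sorted by branch metric to mimic SE enumeration. The squared radius starts at $+\infty$ and shrinks whenever a better leaf is reached.

```python
import numpy as np

def sphere_decode(R, y_tilde, codebooks):
    """Depth-first sphere decoding over user blocks (illustrative sketch).

    R         : (J*dv, J*dv) upper-triangular matrix
    y_tilde   : (J*dv,) transformed received vector
    codebooks : list of J arrays, each (M, dv), candidate symbols per user
    Returns (list of chosen candidate indices, squared distance).
    """
    J = len(codebooks)
    dv = codebooks[0].shape[1]
    best = {"d": np.inf, "idx": None}      # squared radius C^2 starts at +inf
    s = np.zeros(J * dv, dtype=complex)    # partial symbol vector
    idx = [0] * J

    def search(j, pm):
        rows = slice(j * dv, (j + 1) * dv)
        lo = (j + 1) * dv
        # Remove the contribution of the blocks that are already decided.
        resid = y_tilde[rows] - R[rows, lo:] @ s[lo:]
        cand = codebooks[j]
        # Row m of (cand @ A.T) equals A @ cand[m] for the diagonal block A.
        diff = resid[None, :] - cand @ R[rows, rows].T
        bm = np.sum(np.abs(diff) ** 2, axis=1)   # branch metrics
        for m in np.argsort(bm):                 # ascending branch metric
            new_pm = pm + bm[m]
            if new_pm >= best["d"]:
                break        # the remaining candidates are even worse: prune
            s[rows] = cand[m]
            idx[j] = int(m)
            if j == 0:       # leaf reached: shrink the radius
                best["d"] = new_pm
                best["idx"] = list(idx)
            else:
                search(j - 1, new_pm)

    search(J - 1, 0.0)       # start from the last block (bottom rows of R)
    return best["idx"], best["d"]
```

Because pruning only discards partial paths whose metric already reaches the best leaf distance found so far, the returned leaf matches an exhaustive search over all $M^J$ candidates, which is easy to confirm on a small random example.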

SD based on block-wise sorted QR decomposition
A common approach to reducing the complexity of SD without sacrificing performance in traditional MIMO detection is to reorder the columns of $H$, that is, to apply the QR decomposition to $HP$, where $P$ is a permutation matrix. Tree pruning is more efficient if the main diagonal entries of $R$ obtained from the decomposition of $HP$ are in descending order along the detection order, that is, from the last row to the first.
The basic idea for reducing the complexity of SD in MIMO-SCMA detection is similar to the above approach. However, since the symbols in $s = [s_1^T, s_2^T, \ldots, s_J^T]^T$ are multi-dimensional, $H$ cannot be permuted arbitrarily. Otherwise, the entries of the symbols $s_j$ ($1 \le j \le J$) would be interleaved with each other and SD could not be applied. To avoid interleaving the elements of the multi-dimensional symbols, we first divide $H$ into $J$ blocks by grouping every $d_v$ columns of $H$ as in (7), and then allow only block-wise permutation of $H$.
Note that in SD, the nodes at layer $l$ are pruned if the path metric […] where $C$ is the radius of the sphere in the current search. Thus, according to (6), we can see that only the item […] where $z_k$, $R_{k,k}$, and $s_k$ are the $k$th element of $z$ in (9), the $(k, k)$th element of $R$, and the $k$th element of $s$, respectively.
Proof. Since $H$ can be decomposed into the form $H = QR$, where $Q = [q_1, q_2, \ldots, q_{d_v J}] \in \mathbb{C}^{N_r N \times d_v J}$ is column orthogonal and $R$ is upper triangular, we can get […] Meanwhile, in MIMO-SCMA, the columns of $V$ are orthogonal. Thus, the columns of each block of (7) are also orthogonal, that is, […] (11) are orthogonal. The orthogonality leads to […] in descending order. Since, in the QR decomposition, the $R_{k,k}$ ($1 \le k \le Jd_v$) are calculated in the order $k = 1, 2, \ldots, Jd_v$, the basic idea for putting the quantities indexed $J, J-1, \ldots, 1$ in descending order is to minimise those indexed $1, 2, \ldots, J$, in that order, when performing the QR decomposition. Now, following the fact that after permutation of $H$, […] Thus, the first $d_v$ rows of $R$ are computed. Next, […] and the second $d_v$ rows of $R$ are obtained, and so on. The details of BSQRD are shown in Algorithm 1.
We call the SD based on BSQRD BSQRD-SD. Once the channel matrix $H$ has been decomposed as $HP = QR$ (where $P$ is the permutation matrix obtained by permuting the columns of the identity matrix according to the column order vector $p$), the detection problem is reformulated as $\hat{z} = \arg\min_{z \in \mathcal{Z}} \|\tilde{y} - Rz\|^2$, where $z = P^T s$ is the permuted version of the symbol vector $s$ (since $Hs = (HP)(P^T s)$) and $\mathcal{Z}$ is the set that contains all possible values of $z$. Thus, in BSQRD-SD, to obtain a symbol vector in the same order as $\hat{s}$, we need to reorder the detected symbol vector according to $\hat{s} = P\hat{z}$. Note that since BSQRD-SD only changes the order of the estimated signal, it achieves the same optimal ML performance as SD if the initial radius is set to $+\infty$.
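The block-wise permutation constraint can be illustrated with a small sketch. It is not Algorithm 1 from the paper: the selection rule used here (at each step, pick the remaining block whose residual columns have the smallest total energy) is an assumption chosen for illustration, the function name `block_sorted_qr` is hypothetical, and full column rank is assumed. What the sketch does preserve is the structure described above: whole blocks of $d_v$ columns are reordered, the column order inside each block is untouched, and the result satisfies $HP = QR$.

```python
import numpy as np

def block_sorted_qr(H, dv):
    """Block-wise sorted modified Gram-Schmidt (illustrative sketch).

    Returns Q, R, p with H[:, p] == Q @ R, where p reorders whole blocks
    of dv columns and keeps the column order inside every block.
    """
    n_rows, n_cols = H.shape
    J = n_cols // dv
    W = H.astype(complex)                  # residual columns (astype copies)
    Q = np.zeros((n_rows, n_cols), dtype=complex)
    T = np.zeros((n_cols, n_cols), dtype=complex)  # coefficients, original column order
    remaining = list(range(J))
    p = []
    for _ in range(J):
        # Assumed sorting rule: smallest residual block energy first.
        energy = [np.linalg.norm(W[:, b * dv:(b + 1) * dv]) ** 2 for b in remaining]
        b = remaining.pop(int(np.argmin(energy)))
        for c in range(b * dv, (b + 1) * dv):      # fixed order inside the block
            k = len(p)
            T[k, c] = np.linalg.norm(W[:, c])      # assumes full column rank
            Q[:, k] = W[:, c] / T[k, c]
            p.append(c)
            todo = np.array([x for x in range(n_cols) if x not in p], dtype=int)
            T[k, todo] = Q[:, k].conj() @ W[:, todo]
            W[:, todo] -= np.outer(Q[:, k], T[k, todo])
    p = np.array(p)
    return Q, T[:, p], p                   # reorder columns of T to get R
```

With the order vector `p`, the permuted symbol vector is `z = s[p]`, and a detected `z_hat` is mapped back to the original user order by assigning `s_hat[p] = z_hat`.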

Complexity analysis
In this subsection, we analyse the computational complexity of the proposed BSQRD-SD. The complexity cost is quantified in complex floating point operations (flops), where each multiplication is counted as three flops and each addition is counted as one flop.
Overall, the complexity of BSQRD-SD consists of the operations for performing BSQRD and the tree search. Comparing BSQRD, shown in Algorithm 1, with the modified Gram-Schmidt algorithm for traditional QRD, we find that only the column sorting […] [21].
For the tree search of BSQRD-SD, when the nodes at layer […] Suppose that $N_l$ nodes are visited at layers $(l-1)d_v + 1$ to $ld_v$; then the overall complexity cost of BSQRD-SD is given by […] The complexity analysis results of SD, BSQRD-SD, and JMPA are shown in Table 1. The flops of Log-Max JMPA are $M^{d_f} N_r N (4d_f + 3) N_{\mathrm{iter}}$ [27], where $d_f$ denotes the number of users that occupy the same physical resource element in SCMA (for example, an OFDMA subcarrier) and $N_{\mathrm{iter}}$ denotes the number of iterations used in JMPA. Note that although the complexity expressions of SD and BSQRD-SD are the same, the number of visited nodes at each layer, $N_l$, is different, and thus the complexity is different. Since the number of visited nodes at each layer is a random variable that depends on the channel, the average complexity costs of SD and BSQRD-SD are obtained by Monte Carlo simulation.
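As a quick worked example of the JMPA flop count quoted above, the sketch below evaluates the expression for one parameter set. It assumes that the flattened term "M d f" in the expression denotes the power $M^{d_f}$, which is the usual exponential dependence of message passing on $d_f$; the function name `jmpa_flops` is hypothetical.

```python
def jmpa_flops(M, d_f, N_r, N, N_iter):
    """Flops of Log-Max JMPA, reading the expression as M**d_f * N_r * N * (4*d_f + 3) * N_iter."""
    return M ** d_f * N_r * N * (4 * d_f + 3) * N_iter

# Example: M = 4, d_f = 3, N_r = 3, N = 4, N_iter = 4
print(jmpa_flops(4, 3, 3, 4, 4))   # 64 * 3 * 4 * 15 * 4 = 46080

# The count grows linearly with the number of receiving antennas N_r.
print([jmpa_flops(4, 3, n_r, 4, 4) for n_r in (3, 4, 5, 6)])
```

Under this reading the cost is fixed for a given configuration, whereas the SD-based counts depend on the number of visited nodes and therefore on the channel and the SNR.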

SIMULATION RESULTS
In this section, we provide simulation results and numerical complexity results that follow from the analysis in Section 3.3 to evaluate the bit error rate (BER) performance and complexity of the proposed detection approach for MIMO-SCMA. The SCMA codebook designed in [28], where $J = 6$, $d_v = 2$, $N = 4$, and the codebook size $M = 4, 8, 16$, is used. Figure 3a-c compares the hard-output BER performance of ML, JMPA, SD, GA-MPA, and the proposed BSQRD-SD with different codebook sizes in a 6 × 3 MIMO-SCMA system. Note that since the complexity of ML detection is too high, we omit the ML simulation when the codebook size $M > 4$ and take the optimal performance of SD as the benchmark. First, consistent with the earlier analysis that BSQRD-SD achieves the optimal ML performance, we observe that the BER performance of BSQRD-SD is identical to that of ML and SD in these figures, which further verifies this conclusion. Meanwhile, it can also be observed that JMPA achieves almost the same performance as ML and SD. However, GA-MPA suffers significant performance loss, especially for larger codebook sizes, for example, $M = 8$ and $M = 16$.
In Figure 4, we compare the average flops cost of JMPA, SD, and BSQRD-SD (hard output). In our simulation, the system settings of JMPA are $d_f = 3$ and $N_{\mathrm{iter}} = 4$. From the figure, we can clearly see that BSQRD-SD has lower complexity than JMPA and SD. Meanwhile, it is worth noting that the complexity of SD-based methods is sensitive to the SNR. Specifically, in the high-SNR region, SD and BSQRD-SD require almost the same number of flops, which is only 9.4% of that of JMPA, whereas in the low-SNR region, for example, at 0 dB, the flops of SD and BSQRD-SD are 35.1% and 25.9% of those of JMPA, respectively.
To show the reduced complexity of BSQRD-SD compared with SD, Figure 5 compares the average number of visited nodes of SD and BSQRD-SD. From the figure, we can see that the proposed BSQRD algorithm effectively lowers the number of nodes visited during the tree search in SD, which in turn reduces the complexity.
As the number of visited nodes of soft-output SD and BSQRD-SD increases considerably, Figure 6 shows the average flops cost of soft-output SD, BSQRD-SD, and JMPA. From the figure, it is interesting to see that in the low-SNR region (e.g. 0 dB), the complexity of SD is larger than that of JMPA, while the complexity of BSQRD-SD is still lower than that of JMPA. That is to say, without the proposed BSQRD algorithm, the complexity of SD can exceed that of JMPA. Specifically, in the high-SNR region, the flops of SD and BSQRD-SD are 48.9% and 24.6% of those of JMPA, respectively, whereas in the low-SNR region, for example, at 0 dB, the flops of SD and BSQRD-SD are 114.6% and 78.9% of those of JMPA, respectively. Figure 7a-c compares the BER performance of SD, JMPA, GA-MPA, and the proposed BSQRD-SD with different codebook sizes in the LDPC coded MIMO-SCMA system. The LDPC code has length 648 and rate 5/6, as used in the IEEE 802.11 standard. It can be observed that BSQRD-SD always has the same BER performance as SD and JMPA, regardless of the codebook size. However, due to the error caused by the Gaussian approximation, GA-MPA suffers non-negligible performance loss, and the performance gap widens as the codebook size increases.
To show the relationship between the complexity of the different detection approaches and the number of receiving antennas, we further present the flops cost of JMPA, SD, and BSQRD-SD with different numbers of receiving antennas ($N_r = 3, 4, 5, 6$) in Figure 8. The SNR is fixed at 8 dB. It can be seen from the figure that, as $N_r$ increases, the complexity of JMPA increases linearly, while the computation overheads of SD and BSQRD-SD slightly decrease. This is because the complexity of SD is related to its performance: the better the BER performance, the fewer nodes have to be visited during the tree search (i.e. the lower the complexity). Increasing the number of receiving antennas leads to a larger diversity gain. Thus, SD-based methods have lower complexity when the number of receiving antennas increases. Since massive MIMO has become a trend and more antennas are being equipped at the BS, the proposed BSQRD-SD can be a very promising detection approach for MIMO-SCMA systems.
To further analyse the BER performance of the proposed algorithm in the scenario where the BS is equipped with large-scale antenna arrays, performance evaluations are carried out in 6 × 32 and 6 × 64 MIMO-SCMA systems. As shown in Figure 9, the performance of BSQRD-SD is similar to that of JMPA. This result, along with the complexity advantage illustrated before, indicates that BSQRD-SD has more potential than JMPA in this scenario. Moreover, Figure 9 also illustrates the impact of channel estimation errors on the proposed algorithm. In our simulation, the noisy channel estimate is modelled as $H_{\mathrm{est}} = H + e$, where $e$ is the channel estimation error, which is independent of $H$ and is complex Gaussian distributed according to $\mathcal{CN}(0, 1)$. Three cases, that is, $e = 5\%$, $10\%$, and $20\%$, are considered and compared with the perfect channel, that is, $e = 0\%$. As shown in the figure, when $e$ is less than 10%, the performance of BSQRD-SD under the imperfect channel is almost the same as that under the perfect channel. When $e = 20\%$, the proposed algorithm suffers only a slight performance loss. In summary, the performance of the proposed algorithm under an imperfect channel is consistent with that of JMPA and is robust to channel estimation errors.

CONCLUSIONS
In this paper, SD has been investigated for detecting signals in the multiuser MIMO-SCMA system. Since the detection problem of SD is equivalent to the ML detection problem and the tree search of SD is more efficient than exhaustive ML search, SD achieves optimal performance with lower complexity. Moreover, we proposed the BSQRD algorithm for the QR decomposition of the MIMO-SCMA channel matrix. Based on BSQRD, we proposed a low-complexity SD-based detection approach, namely BSQRD-SD, for the MIMO-SCMA system. Simulation and analysis results show that both hard-output and soft-output BSQRD-SD have lower complexity than SD and traditional JMPA while achieving optimal ML performance. Meanwhile, simulation results also show that the complexity of BSQRD-SD decreases as the number of receiving antennas increases, which makes it a very suitable detection approach for future massive MIMO-SCMA systems.