Low-complexity demapping algorithm for two-dimensional non-uniform constellations in ATSC 3.0

The Advanced Television Systems Committee (ATSC) finalized its next-generation digital terrestrial broadcast standard, known as ATSC 3.0, in 2016. In order to enhance spectral efficiency, the standard employs two-dimensional non-uniform constellations (2D-NUCs) for signal mapping. As a result, a 2D demapper, whose computational complexity is much higher than that of one-dimensional (1D) demappers, has to be designed for the receiver. In this paper, a suboptimal low-complexity 2D-demapping method is proposed for ATSC 3.0. The proposed method takes advantage of the 2D-NUC feature that constellation points with the same most significant bits tend to be located in the same subregion. Simulation results show that, compared with conventional Max-Log-MAP 2D demapping, the proposed method achieves at least 74% complexity reduction with negligible performance loss. Moreover, compared with another 2D demapper specially designed for ATSC 3.0, the proposed method dispenses with high-complexity exponential and logarithmic operations at a small performance cost of at most 0.14 dB.


INTRODUCTION
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Communications published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

The Advanced Television Systems Committee (ATSC) finalized its next-generation digital terrestrial television standard, known as ATSC 3.0, in 2016 [1]. In the standard, non-uniform constellations (NUCs) [2-4], instead of the conventional uniform quadrature-amplitude modulation (QAM) constellations, are used to enhance spectral efficiency. There are two types of NUCs in ATSC 3.0: one-dimensional NUCs (1D-NUCs) and two-dimensional NUCs (2D-NUCs). Both are independently optimized for each code rate based on its target signal-to-noise ratio (SNR) [5]. 1D-NUCs are designed by relaxing the uniformity constraint of conventional QAM constellations. In other words, a 1D-NUC with modulation order M is composed of two 1D non-uniform pulse amplitude modulations (PAMs) with modulation order √M: one for the in-phase component and the other for the quadrature component. Since the in-phase and quadrature components of the constellation points are independent, the demapping process of 1D-NUCs can be accomplished by applying 1D demappers for √M points [6, 7] to the in-phase and quadrature components separately. To further increase spectral efficiency, 2D-NUCs are designed by removing the independence constraint of 1D-NUCs. Since the in-phase and quadrature components of 2D-NUCs are no longer independent, the demapping process cannot be decomposed into two independent 1D demappers. Instead, a 2D demapper for M points, which takes the in-phase and quadrature components into account simultaneously, is required. Note that one 2D demapper for M points has significantly higher complexity than two 1D demappers for √M points [8-10], especially for high-order modulation.
Therefore, the demapping process for high-order 2D-NUCs becomes one of the bottlenecks in ATSC 3.0 receiver implementation.
In straightforward 2D demapping, the Euclidean distances from the received signal to all constellation points are required to compute the demapper-output log-likelihood ratio (LLR). To reduce the computational complexity of 2D demappers, most research has focused on decreasing the number of constellation points (and thus distances) included in the LLR computation. There are three main approaches. The quadrant search reduction (QSR) method was first proposed in [11-13] for the 2D demapper in second-generation terrestrial digital video broadcasting (DVB-T2) [14] and was also employed in [15, 16] for M-ary phase-shift keying (PSK), uniform QAM, and digital video broadcasting via satellite second-generation extensions (DVB-S2X) systems. The QSR method exploits the symmetry of the constellation to collect subsets of points which provide most of the information in the LLR computation. Only distances to the constellation points in the subsets are included in the LLR computation. Thus, the method is highly dependent on the shape and mapping of the constellation applied. However, the constellations studied in [11-13, 15, 16] are different from those employed in ATSC 3.0. The second approach, called condensed symbols reduction (CSR) [17, 18], takes advantage of the condensation feature that some constellation points in 2D-NUCs appear in clusters, as shown in Figure 1a,b. As the distances to the points in a cluster are almost identical, only one point in each cluster is included in the LLR computation. The CSR method is especially useful for 2D-NUCs designed for low and medium code rates, where the condensation feature is more prominent. Fuentes et al. [10] employed the QSR method together with the CSR method to decrease the number of constellation points included in the Log-MAP 2D-demapping algorithm. Fuentes' method significantly reduces 2D-demapping complexity with a performance degradation of less than 0.1 dB for ATSC 3.0.
The concept of the third approach is to consider only constellation points in the neighbourhood of the received signal. Barrueco et al. [19] proposed an adaptive-subregion demapping algorithm which defines an adaptive-size square region around the received signal based on the noise power and then considers only the constellation points inside that region. The adaptive-subregion demapping algorithm, together with CSR, further reduces the number of points in the LLR computation, but at the same time imposes computational overhead for identifying all constellation points inside the subregion. As the size and location of the subregion are adaptive, the amount of this overhead is output sensitive. In contrast, Fuentes' method requires a fixed amount of computation, which is preferred in hardware implementation.
In this paper, we propose a low-complexity 2D-demapping method for ATSC 3.0, in which a QSR method is employed together with CSR to reduce the computational complexity. In the proposed method, the signal plane is divided into subregions. For each subregion, a subset which contains all nearest constellation points for received signals in the subregion is found. The subsets are then further downsized by the CSR method. Thus, when a signal is received, the LLR is computed by using only the distances to the constellation points in the corresponding subset. Compared with the Log-MAP algorithm, the Max-Log-MAP algorithm [20,21], and Fuentes' method, the proposed 2D-demapping method is shown to have a significantly lower complexity. Moreover, the subsets in the proposed method, unlike those in the adaptive-subregion demapping algorithm [19], are determined beforehand. Therefore, the proposed method has fixed complexity and is suitable for hardware implementation. Finally, simulation results show that compared with the Max-Log-MAP algorithm, the complexity reduction is achieved with negligible impact on bit error rate (BER) performance.
The outline of the paper is as follows. The proposed method is described in Section 2. In Section 3, the BER performance of the proposed method is compared with those of the Log-MAP algorithm, the Max-Log-MAP algorithm, and Fuentes' method for additive white Gaussian noise (AWGN) channel and independent and identically distributed (i.i.d.) Rayleigh fading channel. Complexity comparisons are presented in Section 4. Finally, conclusions are drawn in Section 5.

PROPOSED DEMAPPING METHOD
Low-density parity-check (LDPC) codes with code rates ranging from 2/15 to 13/15 were adopted for the physical layer specification of ATSC 3.0 [22]. For each code rate, three 2D-NUCs with modulation orders 16, 64, and 256 are designed based on the target SNR. We will focus on the 2D-NUCs with the highest modulation order, 2D-256NUCs, in the rest of the paper, since they incur the highest demapping complexity. Figure 1 shows 2D-256NUCs for different code rates.

In the Log-MAP algorithm, the optimal maximum-likelihood (ML) LLR is computed for each coded bit b as

$$\mathrm{LLR}(b)=\ln\sum_{s\in S_{b=0}}\exp\left(-\frac{|y-s|^{2}}{\sigma_{h}^{2}}\right)-\ln\sum_{s\in S_{b=1}}\exp\left(-\frac{|y-s|^{2}}{\sigma_{h}^{2}}\right),\qquad(1)$$

where y and σ_h² are the received signal and the noise variance after channel equalization, respectively, and S_{b=0} and S_{b=1} are the sets of constellation points with b = 0 and b = 1, respectively. Fuentes' demapping method is based on (1) with both S_{b=c} replaced by their own subsets. It is noted that both the Log-MAP algorithm and Fuentes' method require a large number of high-complexity exponential and logarithmic operations and thus are resource intensive. One popular way to solve this problem is to approximate both summations in (1) by their most significant terms, which removes the exponential and logarithmic operations. The resulting Max-Log-MAP algorithm computes the Max-log LLR for each coded bit as

$$\mathrm{LLR}(b)\approx\frac{1}{\sigma_{h}^{2}}\left(\min_{s\in S_{b=1}}|y-s|^{2}-\min_{s\in S_{b=0}}|y-s|^{2}\right),\qquad(2)$$

which is a good approximation of the ML LLR, especially at high SNRs. No exponential or logarithmic operations are required in the Max-Log-MAP algorithm. However, to obtain the Max-log LLRs for all coded bits, the demapper still has to compute the Euclidean distances from the received signal to all constellation points. In this paper, we propose a demapping method which significantly reduces the number of distances computed.

2D-NUCs in ATSC 3.0 are quadrant symmetric, and the quadrant of a constellation point is determined by the two most significant bits, b_0 and b_1.
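As an illustration, the two LLR computations (1) and (2) can be sketched in Python/NumPy for an arbitrary bit-labelled constellation. The function names and the tiny QPSK-like test constellation are illustrative only and are not part of the paper or of ATSC 3.0.

```python
import numpy as np

def log_map_llr(y, points, bits, sigma2):
    """Exact LLRs per Equation (1): the log-ratio of two sums of Gaussian
    likelihood terms, one over points with bit b = 0, one with b = 1."""
    metrics = -np.abs(y - points) ** 2 / sigma2      # one exponent per point
    llrs = np.empty(bits.shape[1])
    for b in range(bits.shape[1]):
        # logaddexp.reduce is a numerically stable log-sum-exp
        llrs[b] = (np.logaddexp.reduce(metrics[bits[:, b] == 0])
                   - np.logaddexp.reduce(metrics[bits[:, b] == 1]))
    return llrs

def max_log_llr(y, points, bits, sigma2):
    """Max-log approximation per Equation (2): each sum is replaced by its
    dominant term, so only squared distances and minima are needed."""
    d2 = np.abs(y - points) ** 2
    llrs = np.empty(bits.shape[1])
    for b in range(bits.shape[1]):
        llrs[b] = (d2[bits[:, b] == 1].min() - d2[bits[:, b] == 0].min()) / sigma2
    return llrs
```

At moderate-to-high SNR the two outputs nearly coincide, which is the approximation the Max-Log-MAP algorithm relies on.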
Thus, the bit-LLRs of a received signal y = I + jQ outside the first quadrant are the same as those of y′ = |I| + j|Q| (in the first quadrant) for all bits, except for the signs of the bit-LLRs for b_0 and b_1. As a result, we only have to consider received signals located in the first quadrant. Let S denote the set of the 64 constellation points in the first quadrant. The set S can be further partitioned into four equal-size subsets, S_00, S_01, S_10, and S_11, based on bits b_2 and b_3. It is noted that the constellation points of each subset tend to group together. Therefore, it is easy to find two lines L_1(I, Q) and L_2(I, Q) that partition the first quadrant into four regions, each of which contains one subset of constellation points. Let R_00, R_01, R_10, and R_11 denote the regions containing S_00, S_01, S_10, and S_11, respectively. The subsets S_00, S_01, S_10, S_11, the lines L_1(I, Q) and L_2(I, Q), and the regions R_00, R_01, R_10, and R_11 are plotted in Figure 2 for code rates 4/15, 8/15, and 12/15. In the following, we will use received signals located inside R_01 to illustrate the proposed method.
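The quadrant folding just described can be sketched as follows. Which of b_0/b_1 is tied to which axis is an assumption made purely for illustration; the actual bit-to-quadrant labelling is fixed by the standard.

```python
def fold_to_first_quadrant(y):
    """Fold a received signal into the first quadrant.

    Returns the folded signal y' = |I| + j|Q| together with the LLR sign
    flips for b0 and b1. The convention used here (b0 tied to the
    quadrature sign, b1 to the in-phase sign) is an illustrative
    assumption, not the ATSC 3.0 labelling.
    """
    sign_b0 = 1.0 if y.imag >= 0 else -1.0
    sign_b1 = 1.0 if y.real >= 0 else -1.0
    y_folded = complex(abs(y.real), abs(y.imag))
    return y_folded, sign_b0, sign_b1
```

A demapper would compute all eight LLRs for the folded signal and then multiply the b_0 and b_1 LLRs by the returned signs.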
Denote the constellation point that attains the minimum in (2) for a received signal y by

$$\hat{s}_{b=c}(y)=\arg\min_{s\in S_{b=c}}|y-s|^{2}.$$

In other words, ŝ_{b=c}(y) is the nearest constellation point with bit b equal to c. Let S_{b=c}(R_01) be the set of constellation points each of which is the nearest point ŝ_{b=c}(y) for some received signal y in R_01. We first consider the Max-log LLRs for bits b_4, b_5, b_6, and b_7. Note that subset S_01 contains 16 constellation points corresponding to all possible combinations of bits b_4, b_5, b_6, and b_7. Thus, for most of the received signals y in R_01, the nearest constellation point ŝ_{b=c}(y) can be found in subset S_01 for b = 4, 5, 6, 7. The exception occurs only if the received signal y lies near the boundary of R_01. In such cases, by the quadrant symmetry of the constellation, it can easily be proved that the nearest constellation point ŝ_{b=c}(y) is still a constellation point in the first quadrant, that is, ŝ_{b=c}(y) ∈ S. However, it may be an element of the set S ∖ S_01 and lie near the boundary of R_01. Figure 3a shows S_{b_6=0}(R_01) and S_{b_6=1}(R_01) for code rate 12/15, which confirms the analysis above. Note that R_01 is the upper-left region in the first quadrant and S_01 is the set of 16 points inside it.

Next, consider the Max-log LLRs for bits b_2 and b_3. Note that for every constellation point in S_01, b_2 = 0 and b_3 = 1. Therefore, for most of the received signals y in R_01, the nearest constellation points ŝ_{b_2=1}(y) and ŝ_{b_3=0}(y) are elements of the sets S_10 ∪ S_11 and S_00 ∪ S_10, respectively, and lie near the boundary of R_01. Figure 3b shows S_{b_2=0}(R_01) and S_{b_2=1}(R_01) for code rate 12/15. By following a similar analysis, we find that ŝ_{b_0=0}(y) and ŝ_{b_1=0}(y) are constellation points of the set S, that is, in the first quadrant, and are located inside or near the boundary of R_01, whereas ŝ_{b_0=1}(y) and ŝ_{b_1=1}(y) are constellation points in the fourth and second quadrants located near the x-axis and y-axis, respectively.
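The nearest-point search ŝ_{b=c}(y) underlying (2) can be sketched as follows; the helper name is hypothetical, and the small 4-PAM labelling in the test is illustrative only.

```python
import numpy as np

def nearest_point_per_bit(y, points, bits):
    """Compute s_hat_{b=c}(y): for each bit position b and value c, the
    constellation point nearest to y among the points whose bit b equals c
    (the argmin inside the max-log LLR of Equation (2))."""
    d2 = np.abs(y - points) ** 2
    nearest = {}
    for b in range(bits.shape[1]):
        for c in (0, 1):
            idx = np.flatnonzero(bits[:, b] == c)
            nearest[(b, c)] = points[idx[np.argmin(d2[idx])]]
    return nearest
```

Collecting these nearest points over all received signals y in a region R yields exactly the sets S_{b=c}(R) described above.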
Figure 3c shows S_{b_0=0}(R_01) and S_{b_0=1}(R_01) for code rate 12/15. From the constructive method above, it is easy to see that S_{b=c}(R_01) is a subset of S_{b=c} for all b and c. Moreover, the Max-log LLR value remains unchanged if the constellation point sets in (2) are replaced by their subsets S_{b=c}(R_01). Therefore, calculating the Max-log LLR using S_{b=c}(R_01) decreases the demapping complexity with no performance degradation (compared to the Max-Log-MAP algorithm). As for how many distances are required to calculate the Max-log LLRs for all eight bits, only the Euclidean distances to the constellation points in the union S(R_01) = ⋃_{b,c} S_{b=c}(R_01) are required. Figure 3d shows S(R_01) for code rate 12/15. From the figure, it is obvious that the number of distance calculations in the proposed method is much smaller than 256.

The number of constellation points in S_{b=c}(R_01) can be further reduced by the CSR method. If there exist clusters of constellation points in S_{b=c}(R_01), it suffices for the demapper to calculate a single distance from the received signal to a representative point in each cluster and use it to approximate the distances to the other points in the same cluster. By retaining only the representative point of each cluster and discarding the rest, the CSR method decreases the number of points in S_{b=c}(R_01) and thus the demapping complexity. However, the improvement comes at the price of BER performance degradation induced by the distance approximation. Barrueco et al. showed that, to achieve a balance between demapping complexity and BER performance, the distance between any two points in a cluster should be limited to less than 0.1 [19]. In this paper, we adopt 0.05 as the distance limit. It is noted that under this limit, no cluster forms in the 2D-256NUCs for code rates 12/15 and 13/15. With a slight abuse of notation, S_{b=c}(R_01) and S(R_01) are used to denote the subsets after CSR in the rest of the paper unless specified otherwise.
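A minimal sketch of the CSR step, assuming a greedy grouping and the cluster centroid as the representative point; the paper prescribes neither the grouping order nor the choice of representative, so both are assumptions.

```python
import numpy as np

def condense(points, limit=0.05):
    """CSR sketch: greedily group points whose pairwise distance is below
    `limit` and keep one representative per cluster. Using the centroid
    as representative is an assumption; the method only requires a single
    point per cluster."""
    clusters = []
    for p in points:
        for cluster in clusters:
            # join a cluster only if p is within `limit` of all its members
            if all(abs(p - q) < limit for q in cluster):
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return np.array([np.mean(cluster) for cluster in clusters])
```

With the 0.05 limit adopted here, only genuinely condensed points merge; isolated points pass through unchanged.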
Until now, we have used received signals located in R_01 as an example to illustrate the proposed method. By the same argument, we can obtain the constellation sets S_{b=c}(⋅) and the unions S(⋅) for the other three regions, R_00, R_10, and R_11. Figure 4 shows S_{b=0}(R_00), S_{b=1}(R_00), and S(R_00) for code rate 8/15. Note that R_00 is the lower-left region in the first quadrant and S_00 is the set of 16 points inside it. By an argument similar to the one for R_01, it follows that, before CSR, all 16 constellation points in S_00 are elements of either S_{b=0}(R_00) or S_{b=1}(R_00). From the figure, it is observed that after applying CSR, only 7 of these 16 constellation points remain in S(R_00). In fact, the number of points in S(R_00) is reduced from 40 to 30 by CSR. Table 1 lists the boundary lines L_1(I, Q) and L_2(I, Q) and the numbers of constellation points in S(R_00), S(R_01), S(R_10), and S(R_11) for all code rates.
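Putting the pieces together, the proposed demapper can be sketched as a region lookup followed by a subset-restricted max-log computation. The boundary-line callables and the sign-to-region association below are illustrative placeholders, since the actual lines are those of Table 1.

```python
import numpy as np

def locate_region(y, L1, L2):
    """Select the subregion of the first quadrant containing y from the
    signs of the two boundary lines. L1 and L2 are callables
    (I, Q) -> signed value; mapping sign patterns to (i, j) region labels
    is an illustrative convention."""
    return (int(L1(y.real, y.imag) >= 0), int(L2(y.real, y.imag) >= 0))

def subset_max_log_llr(y, subsets, sigma2):
    """Max-log LLRs computed only over precomputed per-region subsets:
    subsets[(b, c)] holds the candidate points S_{b=c}(R) for the region
    in which y lies."""
    num_bits = max(b for b, _ in subsets) + 1
    llrs = np.empty(num_bits)
    for b in range(num_bits):
        d0 = np.min(np.abs(y - subsets[(b, 0)]) ** 2)
        d1 = np.min(np.abs(y - subsets[(b, 1)]) ** 2)
        llrs[b] = (d1 - d0) / sigma2
    return llrs
```

Because the subsets are determined offline, the per-signal work is just two line evaluations plus the (much smaller) subset distance search.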

PERFORMANCE EVALUATION
A bit-interleaved coded modulation (BICM) module compliant with ATSC 3.0 is used in the simulation, which provides results comparable with ATSC 3.0 systems for the AWGN channel and the i.i.d. Rayleigh fading channel. A block length of 64,800 bits is adopted for the LDPC code, and the 2D-256NUCs for the different code rates are adopted for modulation. In the receiver, perfect channel estimation is assumed, and the sum-product algorithm [23] with a maximum of 50 decoding iterations is employed by the LDPC decoder. Finally, the SNR values in the subsequent results are given by E_s/N_0, where E_s is the average energy of a 2D-NUC signal and N_0/2 is the power spectral density of the AWGN. Figures 5 and 6 show the BER performance of the Log-MAP and Max-Log-MAP algorithms. It is observed that the Max-Log-MAP algorithm performs worse than the Log-MAP algorithm. Moreover, the higher the code rate is, the smaller the performance loss becomes. Since a high code rate is meant to be used at a high target SNR, this observation agrees with the fact that the Max-log LLR is a good approximation of the ML LLR at high SNR.
We then compare the performance of the proposed method with the Max-Log-MAP algorithm. The performance losses (in dB, measured at BER 10⁻⁴) of the proposed method are listed in Table 2 for all code rates, where Δ_{A,ML} and Δ_{A,Max} denote the performance losses relative to the Log-MAP and Max-Log-MAP algorithms, respectively, for the AWGN channel, and Δ_{R,ML} and Δ_{R,Max} those for the i.i.d. Rayleigh fading channel. Note that for code rate 12/15, no cluster forms under the CSR distance limit, so the LLR value calculated by the Max-Log-MAP algorithm is the same as that calculated by the proposed method, which therefore achieves the same performance as the Max-Log-MAP algorithm. The same also holds true for code rate 13/15, as shown in Table 2. In contrast, for the 2D-256NUCs with code rates less than or equal to 11/15, the CSR method becomes applicable to some constellation points, and the proposed method suffers a slight performance loss (see Table 2). Upon analysis, it is found that the performance loss is affected by two factors: one is the number of constellation points appearing in clusters; the other is the noise variance σ_h² at the target SNR. On the one hand, as the code rate decreases, the number of constellation points appearing in clusters increases (which can be observed from Figure 1), and thus more constellation points are substituted by representative points; consequently, the performance deteriorates. On the other hand, as the code rate decreases, the target SNR decreases and thus the noise variance increases, which compensates for the performance loss caused by the CSR method, as shown by (5). This observation is supported by Figures 5-8 and Table 2, where, as the code rate decreases, the performance loss first tends to increase from code rate 11/15 to 5/15 and then decreases for code rates below 5/15. It is noted that although the proposed method suffers a performance loss compared to the Max-Log-MAP algorithm for code rates less than or equal to 11/15, the loss is negligible (less than 0.02 dB for the AWGN channel and 0.04 dB for the i.i.d. Rayleigh fading channel).
Next, we compare the performance of the proposed method with Fuentes' method [10]. The performance losses (in dB measured at BER 10 −4 ) of Fuentes' method are listed in Table 2 for all code rates. From Table 2, it is observed that the proposed method achieves a similar performance as Fuentes' method for high code rates (from 10/15 to 13/15). In contrast, for medium and low code rates, the proposed method is outperformed by Fuentes' method. However, the performance loss (less than 0.14 dB for i.i.d. Rayleigh fading channel) is chiefly inherited from the Max-Log-MAP algorithm.
Finally, we compare the performance of the proposed method with the Log-MAP algorithm (which computes the optimal ML LLR). It is observed that the performance loss is more significant for i.i.d. Rayleigh fading channel compared with AWGN channel. However, even for i.i.d. Rayleigh fading channel, the proposed method suffers a performance loss less than 0.12 dB for high code rates (from 10/15 to 13/15) and less than 0.24 dB for other code rates.

DEMAPPING COMPLEXITY
In this section, we compare the computational complexity of the proposed method with those of the Log-MAP algorithm, the Max-Log-MAP algorithm, and Fuentes' method. The complexities are evaluated in terms of the number of distance calculations N_D, the number of additions N_A, the number of multiplications N_M, and the number of exponential and logarithmic operations N_E. Note that as the complexity of one comparison operation is similar to that of one addition, each comparison is counted as an addition and included in N_A. From (1), it is obvious that in order to compute the ML LLRs for all eight bits, the Log-MAP algorithm requires 256 distance calculations, 8 × 256 additions, 2 × 256 multiplications (comprising 256 multiplications for the squares of distances and 256 divisions), and 264 exponential and logarithmic operations. Fuentes' method is similar to the Log-MAP algorithm except that each S_{b=c} in (1) is replaced by its subset. Thus, the sizes of the subsets, which depend on the signal constellation, determine the computational complexity. The numbers of operations N_D, N_A, N_M, and N_E of Fuentes' method are listed in Table 2. It is observed that Fuentes' method indeed significantly decreases the complexity of the Log-MAP algorithm. However, both the Log-MAP algorithm and Fuentes' method employ a large number of exponential and logarithmic operations, which are more expensive than distance calculations, additions, and multiplications. In contrast, both the Max-Log-MAP algorithm and the proposed method require no exponential or logarithmic operations. From (2), the numbers of operations of the Max-Log-MAP algorithm and the proposed method can be derived; they are also listed in Table 2. Note that for the proposed method, the number of distance calculations equals |S(R_00)|, |S(R_01)|, |S(R_10)|, or |S(R_11)|, depending on the location of the received signal. The value of N_D in the table is the maximum among the four. The same rule also applies to N_A and N_M.
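For reference, the Log-MAP operation tally quoted above can be reproduced by a trivial helper mirroring this section's accounting (comparisons folded into N_A are not modelled separately):

```python
def log_map_op_counts(M=256, num_bits=8):
    """Operation counts of the straightforward Log-MAP demapper for an
    M-point constellation, matching the tally in the text."""
    return {
        "distances": M,             # |y - s|^2 for every constellation point
        "additions": num_bits * M,  # accumulating both sums for each bit
        "multiplications": 2 * M,   # M squarings plus M divisions
        "exp_log": M + num_bits,    # M exponentials + one log-ratio per bit
    }
```

Evaluating it at the defaults gives the 256/2048/512/264 figures quoted for the 2D-256NUC case.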
Comparing the proposed method with the Max-Log-MAP algorithm, we find that the proposed method drastically reduces the number of operations required, at the nominal cost of computing L_1(I, Q) and L_2(I, Q). Specifically, for both N_D and N_A, it achieves a reduction ranging from 77% at the highest code rate to 96% at the lowest code rate; for N_M, the reduction ranges from 74% at the highest code rate to 93% at the lowest code rate. Finally, we compare the proposed method with Fuentes' method. The proposed method not only is free of exponential and logarithmic operations but also shows a reduction in both N_D and N_A ranging from 25% to 50% and a reduction in N_M ranging from 43% to 67%.