Analysing digital predistortion technique for computation-efficient power amplifier linearisation in the presence of measurement noise

Noise is an inherent part of a transceiver system and becomes more severe in low-cost systems. The power amplifier (PA) further amplifies this noise along with the signal to be propagated to the receiver. The conventional approach to digital predistortion (DPD) assumes an ideal transceiver system while extracting data for generation of the predistortion function. As a result, a performance limitation arises due to residual signal–noise interaction. This study presents the accuracy and implementation issues of DPD on a low-cost transceiver with lower bit resolution in the presence of transceiver noise. Different model architectures, as well as processing algorithms, are compared in terms of numerical stability of the solution (condition number of the observation matrix), efficient field-programmable gate array (FPGA) implementation (dispersion of coefficients), in-band model performance (normalised mean square error), and out-of-band model performance (adjacent channel power ratio). The simulation results are tested on an FPGA- and direct-conversion-transceiver-based platform using a PA. A long-term evolution signal with 64 quadrature amplitude modulation is used for performance evaluation. The suitability of the various polynomial models for fixed-point implementation and the memory size required for implementing the DPD model are further established.


INTRODUCTION
Higher data rate communication is achieved using complex envelope techniques such as long-term evolution (LTE) and LTE-Advanced by incorporating higher-order quadrature amplitude modulation schemes [1]. However, such signals impose a strict linearity requirement on the radio-frequency power amplifier (PA) due to their high peak-to-average power ratio. Operating the PA in the saturation region yields the highest efficiency but produces distortions; hence, it is essential to compensate for such distortions using the well-established technique of digital predistortion (DPD) [2][3][4][5].
Linearisation using DPD can be achieved using a direct or indirect learning methodology. Figure 1 shows the popular implementation of DPD using the indirect learning architecture (ILA) [6]. The possibility of identifying a post-distorter that is equivalent to the predistorter is mathematically established in [7] for Volterra-based models. This equivalence allows finding the PA inverse in an 'indirect' way using linear regression. The behavioural model extraction procedure [8] eventually cancels out the distortion of the PA. The signal z(n) is nonlinear with respect to g(n); however, z(n) is linear with respect to x(n). Therefore, the predistortion process is also called the linearisation process [7].
As can be perceived from Figure 1, the ILA scheme has two parallel processes. Process 1 is 'predistorter training or learning', where the inverse model is identified based on the available measured input and output of the transmitter system. The identification process for the inverse model is shown in Figure 2 [8]. The first step is to acquire the baseband complex input and the PA's output signal using an appropriate measurement setup; afterwards, the data is de-embedded to the PA reference plane by normalising the PA output with the small-signal gain. Because the signal propagates from baseband to the PA through the transmitter and receiver system, a propagation delay occurs, introducing a mismatch between the input and output data samples. Hence, the next step is to estimate this delay and compensate for it. Once the delay is compensated, a suitable inverse model (behavioural model/digital predistorter) is identified. This model's accuracy is validated using established norms such as the normalised mean square error and the adjacent channel error power ratio. As can be observed from Figure 1, the channel is not part of the problem statement, and the characteristics of the system do not change very rapidly. The 'training process' is therefore generally carried out in a digital signal processor (DSP) in offline mode at intermittent intervals.
The second process is the 'application of the inverse model'. Once the model is identified, each input data sample is applied to the 'inverse model'. This 'predistorted data' is then processed via the transmitter. Due to the expansion characteristic of the inverse model, the predistorted data is divided by a constant value, which ensures that the maximum power of the input signal x(n) and the predistorted signal g(n) are the same. This ensures that the maximum value of the signal processed through the digital-to-analog converter (DAC) remains constant and the DAC of the transmitter is not overdriven. Due to the need for continuous data processing at a high rate, the model is generally implemented in a field-programmable gate array (FPGA) in a fixed-point calculation environment. The model's coefficients are updated intermittently to account for slow changes in the transmitter/PA characteristics due to thermal effects. An accurate inverse model for DPD depends on the transmitter and feedback receiver performance and accuracy; hence, it should be free from transceiver noise, I/Q imbalance, and DC offset. Impairments such as I/Q imbalance and DC offset [9][10][11] are well explored in the linearisation field, while the impact of noise is much less explored. The transceiver noise reduces the overall system signal-to-noise ratio (SNR), limiting the modelling performance of DPD and, ultimately, the linearisation performance. Sources of noise in the transceiver include the quantisation noise of the data converters and the inherent noise of the PA and low-noise amplifiers. This impact may not be quantifiable on a very high-end transceiver, but it is more prominent on low-cost systems. It should be noted that DPD techniques are proposed to reduce the distortion introduced by the transmitter path only; therefore, noise or any property of the channel is not affected by any DPD technique.
The aim is to provide a DPD technique with better numerical stability and lower complexity for low-cost transceivers utilising lower-bit DACs/analog-to-digital converters (ADCs) and subject to other measurement noise.
Low-cost transceivers (transmitter and feedback receiver) using lower precision/lower sampling rates also require high numerical stability and lower implementation complexity. The Volterra series is the most comprehensive model for capturing the non-linearity and memory effects of a PA [12]. However, the number of parameters in the Volterra series increases very rapidly with the non-linearity order and memory depth. Popular modelling methods derived from the Volterra series are the memory polynomial (MP) model [2,13], the dynamic deviation reduction (DDR) model, and the generalised MP (GMP) model [14]. The DDR and GMP models, containing advanced and delayed memory terms, are shown to have incrementally better performance than the MP model at the cost of a higher number of coefficients [14]. The DDR, GMP, and MP models are based on the Vandermonde matrix (polynomial order increasing for each column). Vandermonde matrices are known to become ill-conditioned as the order of the polynomial increases. As the MP model is a subset of the GMP and DDR models, it has also been established that the latter models often have a more ill-conditioned observation data matrix than the MP model [13][14][15].
During the 'DPD application' process, in fixed-point arithmetic implementation on lower-bit FPGAs, the DPD performance reduces sharply compared to floating-point implementation. Polynomial models such as the orthogonal MP (OMP) model have been proposed as a solution to numerical stability problems [15]. Besides, techniques such as principal component analysis (PCA) [6,16] and independent component analysis (ICA) [17] have been investigated as solutions for improving numerical stability. Lower-order polynomial models, such as B-spline and cubic-spline models, provide a better basis function for effective implementation on FPGAs [18,19]. Although the numerical stability of these models is better than that of the Volterra-based models, their performance is still expected to deteriorate in the presence of transceiver noise.
Multiscale PCA has been proposed in the literature related to signal and image processing for the denoising process [20][21][22]. The multiscale PCA technique is proposed for the MP model in [23] for combating the noise in the system. It provides encouraging results, specifically for out-of-band performance, where the PA output is more impacted by system noise. However, in the presence of noise, the issue of the model's numerical stability becomes significant. This work establishes that a noisy feedback path for DPD impacts the characteristics of the signal for which the DPD was trained. A numerically unstable model compromises the generalisation capability and may lead to limited linearisation performance. Moreover, it is also desirable for any low-cost setup to have a lower digital computational burden, supported by limited DSP resources.
Therefore, this work investigates prominent polynomial-based model topologies, algorithms for enhancing the numerical stability of the least-squares (LS) algorithm while targeting the problem of system noise, and low-cost fixed-point implementation on a practical, low-cost transceiver setup. Through comparative analysis, it is reported for the first time that denoising, along with numerically stable basis-function models, improves performance on a lower-resolution system, leading to lower memory consumption for DPD application. The study is structured as follows: Section 2 describes the prominent behavioural modelling topologies within the polynomial framework for PA non-linearity. Section 3 describes the algorithms to enhance numerical stability. The measurement setup for proof-of-concept is described in Section 4. Section 5 presents a comparative analysis in terms of (a) in-band DPD model performance, (b) out-of-band DPD model performance, (c) robustness of DPD models for different measurement noise levels, (d) robustness of DPD models in a low-precision implementation environment, and (e) memory requirement for application of DPD. Section 6 showcases the experimental comparison results. The conclusion is provided in Section 7.

BEHAVIOURAL MODELS
As discussed in the previous section, the ILA scheme requires 'inverse modelling' for DPD identification. The first requirement of the DPD application is the selection of an appropriate model topology. A polynomial-based model with time-delay terms lends itself easily to FPGA implementation. Therefore, the popular models having a polynomial basis are considered as follows:

MP model
The MP model is the most popular DPD model and is a simplified form of the Volterra series. The output signal from the MP model can be represented as [2]

z(n) = Σ_{q=0}^{M} Σ_{i=1}^{N} c_{iq} x(n−q) |x(n−q)|^{i−1}    (1)

where M and N are the memory depth and the non-linearity order of the PA, respectively, c_{iq} is the model coefficient, and x(n) is the input signal.
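As a concrete sketch, the observation matrix implied by Equation (1) can be built in a few lines of NumPy. The function name and the zero-padding of the first samples are our own illustrative choices, not taken from the paper:

```python
import numpy as np

def mp_basis(x, N, M):
    """Build the memory-polynomial observation matrix.

    Columns are x(n-q)*|x(n-q)|^(i-1) for q = 0..M and i = 1..N,
    matching z(n) = sum_q sum_i c_iq x(n-q)|x(n-q)|^(i-1).
    """
    x = np.asarray(x)
    cols = []
    for q in range(M + 1):
        xq = np.roll(x, q)      # delayed copy x(n-q)
        xq[:q] = 0              # zero-pad the start instead of wrapping around
        for i in range(1, N + 1):
            cols.append(xq * np.abs(xq) ** (i - 1))
    return np.column_stack(cols)

# Toy complex baseband signal: (M+1)*N columns are produced.
rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
W = mp_basis(x, N=5, M=3)
print(W.shape)  # (64, 20)
```

The first column (q = 0, i = 1) is the input signal itself, so the linear gain term is always present in the basis.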

GMP model
The GMP model includes additional delay terms to improve the performance of the MP model. The output signal from the GMP model can be represented as [14]

z(n) = Σ_{k=0}^{N_a−1} Σ_{l=0}^{L_a−1} a_{kl} x(n−l)|x(n−l)|^k + Σ_{k=1}^{N_b} Σ_{l=0}^{L_b−1} Σ_{m=1}^{M_b} b_{klm} x(n−l)|x(n−l−m)|^k + Σ_{k=1}^{N_c} Σ_{l=0}^{L_c−1} Σ_{m=1}^{M_c} c_{klm} x(n−l)|x(n−l+m)|^k    (2)

where N_a L_a, N_b L_b M_b, and N_c L_c M_c are the numbers of coefficients for the aligned signal and envelope (the memory-polynomial part), for the signal and lagging envelope, and for the signal and leading envelope, respectively. N_a and L_a are the non-linearity order and the number of delay taps for the aligned signal; Section 4.2 describes the time alignment of the input and output signals. N_b and N_c represent the non-linearity orders of the lagging and leading envelope terms, L_b and L_c are the numbers of delay taps for the aligned signal, and M_b and M_c are the numbers of delay taps for the lagging and leading envelopes.
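A hedged sketch of the GMP observation matrix, following the three term groups described above (the `delay` helper and its zero-padding convention are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def delay(x, d):
    """Shift x by d samples (positive d delays, negative d advances), zero-padded."""
    y = np.zeros_like(x)
    if d >= 0:
        y[d:] = x[:len(x) - d] if d > 0 else x
    else:
        y[:d] = x[-d:]
    return y

def gmp_basis(x, Na, La, Nb, Lb, Mb, Nc, Lc, Mc):
    """Columns for the three GMP term groups: aligned, lagging, leading envelopes."""
    cols = []
    for l in range(La):                     # aligned signal and envelope
        xl = delay(x, l)
        for k in range(Na):
            cols.append(xl * np.abs(xl) ** k)
    for l in range(Lb):                     # signal with lagging envelope
        xl = delay(x, l)
        for m in range(1, Mb + 1):
            env = np.abs(delay(x, l + m))
            for k in range(1, Nb + 1):
                cols.append(xl * env ** k)
    for l in range(Lc):                     # signal with leading envelope
        xl = delay(x, l)
        for m in range(1, Mc + 1):
            env = np.abs(delay(x, l - m))
            for k in range(1, Nc + 1):
                cols.append(xl * env ** k)
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
W = gmp_basis(x, Na=3, La=2, Nb=2, Lb=2, Mb=1, Nc=2, Lc=2, Mc=1)
print(W.shape)  # (64, 14): Na*La + Nb*Lb*Mb + Nc*Lc*Mc columns
```

Setting Mb = Mc = 0 removes the lagging/leading groups and recovers the plain MP basis, which is the subset relationship noted in the Introduction.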

OMP model
When the PA is highly non-linear and exhibits strong memory effects, the Volterra and its derivative polynomial models require higher non-linearity order and memory depth to represent the PA behavioural model. Such modelling leads to an ill-conditioned data matrix, limiting the generalisation capability of the solution and the effectiveness of the DPD solution. Also, the ill-conditioned data matrix leads to a high dispersion of the coefficients, which further increases the implementation complexity on FPGA in a fixed-point environment [24,25].
An MP model of limited order is already established in the literature to represent the gains of a PA over the entire input envelope range, as given by Equation (1) [18,26]. A higher-order polynomial is required to mimic a highly non-linear and saturated PA. The extraction of such models can lead to a numerically ill-conditioned regression matrix, and in the case of sparse data, a highly oscillatory solution is obtained [27]. Hence, the coefficients become extremely sensitive to the data [18], and frequent updates in the system are required even if there is a small error in the interpretation of the input data. The OMP was therefore proposed to enforce orthogonality between the columns of the observation matrix of the MP model. The output of the OMP model is given as [15]

z(n) = Σ_{q=1}^{N} c_q ψ_q(x(n))    (3)

where N and c_q are the non-linearity order and the coefficients of the model, and ψ_q are the orthogonal polynomial basis functions constructed in [15] so that the columns of the observation matrix become orthogonal.

Cubic-spline (CS) model
These issues of a high-order polynomial can also be mitigated using a CS instead of the MP and other regular polynomials. A piece-wise structure is more useful in comparison to high-order polynomials, since the lower-order polynomial pieces do not exhibit oscillations like the Runge phenomenon [28]. The matrix obtained using piece-wise polynomials has a smaller condition number, making the system additionally robust to errors. The CS coefficients have a local scope, that is, when an estimated value is used for a specific knot, the error introduced will only create a local distortion in the shape of the spline, which does not spread to the entire input range [19,29]. This results in a more robust extraction and is used for successively linearising data after the training is performed. Moreover, the replacement of the high-order polynomial with lower-order splines allows effective FPGA implementation. The spline is built on lower-order piece-wise polynomials, which provide interpolation between a set of points known as control points. This approach allows using a lower non-linearity order and memory depth. The memory model representing the spline-based PA model can be given as [18]

z(n) = Σ_{m=0}^{M} Σ_{k=1}^{K_s} c_{mk} u_k(n−m)    (4)

where

u_k(n) = s_k(|x(n)|) x(n)    (5)

and the basis function s_k is the truncated piece-wise cubic

s_k(|x(n)|) = (|x(n)| − |x_k(n)|)³ for |x(n)| ≥ |x_k(n)|, and 0 otherwise    (6)

where x_k(n) is the value of x(n) at knot k, K_s is the number of knots in the spline, and c_mk are the complex coefficients. Equation (6) represents the CS, which is a piece-wise polynomial of order three. The CS in Equation (6) is continuous at the knots where the spline pieces connect; this continuity is also maintained at the knots for the first and second derivatives [18]. The observation matrix contains the monomials given by Equation (6).
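For illustration, a generic truncated-power cubic-spline basis over a normalised envelope can be sketched as follows. This is a common textbook construction chosen to show the locality property; it is not necessarily the exact spline formulation of [18]:

```python
import numpy as np

def cs_basis(u, knots):
    """Truncated-power cubic-spline columns over a normalised envelope u in [0, 1].

    Columns 1, u, u^2, u^3 plus one (u - k)^3_+ term per interior knot; an
    error in one knot's coefficient only distorts the curve locally above k.
    """
    cols = [np.ones_like(u), u, u ** 2, u ** 3]
    for k in knots:
        cols.append(np.clip(u - k, 0.0, None) ** 3)
    return np.column_stack(cols)

u = np.linspace(0.0, 1.0, 200)
B = cs_basis(u, knots=[0.25, 0.5, 0.75])
print(B.shape)  # (200, 7)
```

Each truncated-power column is identically zero below its knot, which is precisely why a coefficient error at one knot cannot spread to the entire input range.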

ALGORITHMS TO ENHANCE NUMERICAL STABILITY OF THE DPD MODELS
Equations (1) to (4) can be represented in matrix form as

Z = W B    (7)

where B is a vector containing the model coefficients and W is the observation data matrix. The LS method is used to compute B as

B = (W^H W)^{−1} W^H Z    (8)

In this work, we have used the Moore-Penrose pseudoinverse for the implementation of least squares for all the DPD models, which is based on the singular value decomposition method.
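The LS extraction can be sketched in a few lines; `np.linalg.pinv` computes the Moore-Penrose pseudoinverse through a singular value decomposition, matching the approach described above (the matrix sizes below are toy values):

```python
import numpy as np

# Toy extraction: solve Z = W B via the Moore-Penrose pseudoinverse.
rng = np.random.default_rng(1)
W = rng.standard_normal((100, 8)) + 1j * rng.standard_normal((100, 8))
B_true = rng.standard_normal(8) + 1j * rng.standard_normal(8)
Z = W @ B_true

B_hat = np.linalg.pinv(W) @ Z      # least-squares coefficient estimate
print(np.allclose(B_hat, B_true))  # True in this noiseless, well-conditioned case
```

With measurement noise added to Z, B_hat no longer matches B_true exactly, and the estimate's sensitivity to that noise grows with the condition number of W, which motivates the stabilisation techniques discussed next.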
The implementation of the above-mentioned polynomial-based models, such as the GMP, MP, OMP, and CS models, depends on the observation matrix's size and numerical properties. There are several popular dimension reduction techniques, such as ICA [30], t-distributed stochastic neighbour embedding [31], PCA, and so forth. This work investigates the impact of the following two established techniques on the performance of the DPD models presented in Section 2.

PCA
The PCA technique has been utilised extensively for DPD application as one of the dimensionality reduction methods, using the principal components as a new set of variables [6]. The extracted principal components are orthogonal. The correlation matrix of W can be written as

R = W^H W = V Λ V^H    (9)

where V = [v_1, v_2, …, v_{(M+1)(N+1)}] denotes the matrix of eigenvectors and Λ = diag(λ_1, λ_2, λ_3, …, λ_{(M+1)(N+1)}) denotes the diagonal matrix of eigenvalues. The new length of the data matrix is selected using the weight of the eigenvalues of the data variance. Since the first K eigenvectors capture all significant variations of the data, the residual components are discarded. The new principal components are created as

P = W V_K    (10)

where V_K contains the K leading eigenvectors. The new data matrix is further denoted as

W_PCA = P    (11)

Hence,

Z = W_PCA C    (12)

where C = [c_1, c_2, c_3, …, c_L]^T is the coefficient vector of the PCA-MP model. Similarly, PCA can be implemented with GMP and OMP to form GMP-PCA and OMP-PCA, respectively.
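A minimal sketch of this reduction (eigendecomposition of the correlation matrix, then projection onto the K leading eigenvectors); the function name and toy sizes are our own:

```python
import numpy as np

def pca_reduce(W, K):
    """Project W onto its first K principal components.

    Eigendecompose the correlation matrix R = W^H W, sort the eigenvalues
    in descending order, keep the K leading eigenvectors, and project.
    """
    R = W.conj().T @ W
    eigvals, eigvecs = np.linalg.eigh(R)            # returned in ascending order
    V_K = eigvecs[:, np.argsort(eigvals)[::-1][:K]]  # K largest-eigenvalue vectors
    return W @ V_K

rng = np.random.default_rng(2)
W = rng.standard_normal((200, 12))
P = pca_reduce(W, K=4)
print(P.shape)  # (200, 4)
```

The columns of the reduced matrix are mutually orthogonal by construction, which is what improves the conditioning of the subsequent LS solve.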

Multiscale PCA (MSPCA)
Wavelet-based MSPCA has been proposed in [23] to further allow denoising of the observation data matrix. Wavelets are very useful for signal analysis and noise removal in several areas due to their capability to divide a given function into components at different scales. The wavelet transform (WT) concentrates signal features in a limited number of large-magnitude wavelet coefficients [32]. The small-value wavelet coefficients are typically noise and can be removed without affecting the signal quality using a proper threshold level. Figure 3(a) shows the steps involved in the denoising of a signal. WT decomposition is performed by choosing a mother wavelet and the desired level of decomposition. Thresholding is performed at each level with a hard or soft decision. In the inverse WT (IWT), wavelet reconstruction is performed and filtered to obtain the denoised signal. The MSPCA approach is implemented in the following steps [33,34]:

1. Acquire the observation data matrix using the CS model as in Equation (4) and normalise each column of the observation matrix.
2. Compute the wavelet decomposition of every column of the observation matrix, G_m W (m = 1, …, K), along with the scaling-function coefficient matrix H_K W, where K is the number of scales, G and H represent the filters, and W is the observation data matrix, as shown in Figure 3(b).
3. Apply PCA to the wavelet and scaling-function coefficient matrices to choose the relevant principal components, and determine the control limits of T² and Q at each scale. The T² statistics encapsulate the variation captured by the PCA model, and the Q statistics encapsulate the residual space generated by the PCA model.
4. Reconstruct the coefficient vector and the approximated data matrix from the selected scales' scores using the IWT.
5. Apply PCA to the reconstructed observation data matrix and retain suitable principal components.
These steps are summarised in Figure 3(b). In this case, we have used three levels and the Daubechies least-asymmetric wavelet; the number of retained principal components is selected using the Kaiser rule [35], keeping two principal components both for the PCA approximations and for the final PCA, and one principal component for the details at each level. The results can thereby be improved by suppressing noise, because the details at the initial levels are composed essentially of noise with small contributions from the signal. The selection of a mother wavelet function is a crucial step in the WT, as it governs denoising, component separation, coefficient reconstruction, and feature extraction from the signal in the time and frequency domains, depending on the requirements. The common standard families of WT basis functions are Haar, Coiflets, Daubechies, and Symlets. The Daubechies wavelet is a good choice for denoising signals [35,36]. The Haar wavelet, which is a symmetric special case of the Daubechies wavelet, is useful for edge detection. Hence, we have selected the Daubechies wavelet for our work. The decomposition is performed by successive low-pass filters (LPFs) and high-pass filters (HPFs), as shown in Figure 4(a) [20][21][22].
The signal sequence r[n] is passed through successive LPFs and HPFs. The HPF produces the detail coefficients k_1[n], and the LPF produces the coarse approximation coefficients l_1[n] at the first level. The output is then passed through a further LPF and HPF, and this process continues until the anticipated level of decomposition is achieved. The maximum number of levels depends on the length of the signal and can be written as

P_max = ⌊log_2(U)⌋    (13)

where U is the length of the signal and P_max is the maximum number of levels. To reconstruct the original signal, the IWT is used as shown in Figure 4(b). After the necessary processing, the coefficients obtained in the decomposition are passed through the LPF and HPF for the reconstruction of the signal. In [37], the author presents wavelet-based soft and hard thresholds to denoise the data. The soft and hard thresholds are given as

η_soft(r) = sign(r)(|r| − λ) for |r| ≥ λ, and 0 otherwise;  η_hard(r) = r for |r| ≥ λ, and 0 otherwise    (14)

where r is a wavelet coefficient and λ is the threshold level, given as

λ = (median(|d_j|)/0.6745) √(2 ln C)    (15)

where d_j denotes the coefficients at level j and C is the number of wavelet coefficients. These two types of wavelet threshold can be used depending on the characteristics of the noise. Here, the threshold is applied to d_j, which corresponds to the noise component. Through iterative experimentation, we found that the Daubechies least-asymmetric wavelet with four vanishing moments and third-level decomposition is suitable to mitigate the noise using the hard threshold.
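The decompose-threshold-reconstruct loop can be illustrated with a self-contained Haar wavelet, chosen here only because it is compact to implement in plain NumPy; the paper itself uses the Daubechies least-asymmetric wavelet, and the toy signal below is our own:

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_idwt(a, d):
    """Inverse of one Haar level."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise(x, levels=3):
    """Decompose, hard-threshold the detail coefficients, reconstruct.

    lambda = sigma*sqrt(2 ln C), with sigma estimated from the median
    absolute finest-level detail coefficient (the rule quoted above).
    """
    details, a = [], x
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    sigma = np.median(np.abs(details[0])) / 0.6745
    lam = sigma * np.sqrt(2 * np.log(len(x)))
    details = [np.where(np.abs(d) > lam, d, 0.0) for d in details]  # hard threshold
    for d in reversed(details):
        a = haar_idwt(a, d)
    return a

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 256)
clean = np.sin(2 * np.pi * 2 * t)
noisy = clean + 0.2 * rng.standard_normal(256)
out = denoise(noisy)
print(np.mean((out - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True
```

Most noise detail coefficients fall below the threshold and are zeroed, while the smooth signal survives in the approximation, so the reconstruction has a lower error than the noisy input.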

EXPERIMENTAL SETUP
For measurement and preliminary validation of the proposed CS-MSPCA, the experimental setup is shown in Figure 5.

Time-delay compensation
Before the alignment of the signals, the sampling rates of the transmitted and captured baseband signals should be matched. The transmitter baseband signal is down-sampled to 153.6 MSPS so that it matches the receiver sampling rate. The delay between the baseband signals is compensated using a frequency-domain cross-correlation method [38]. The frequency-domain cross-correlation can be written as

R(x ∗ z) = X*(f) Z(f)    (16)

where R represents the Fourier transform. Equation (16) can be re-formulated as

R(x ∗ z) = S e^{j(2πf τ + φ)}    (17)

where S is the gain, f is the frequency, τ is the time delay, and φ is the phase difference between the output and input signals. When arg(R(x ∗ z)) is plotted versus frequency, τ is the slope of the linear curve and φ is its intercept with the y-axis. The value of the cross-correlation in Equation (17) is maximum when 2πf τ + φ = 0; therefore, the time-adjusted signal is

ẑ(n) = R^{−1}[Z(f) e^{−j(2πf τ + φ)}]    (18)

Figure 6(a) shows arg(R(x ∗ z)) versus frequency for the output signal. It can be perceived that there is a defined linear region with a constant slope in the frequency band where the signal is present, as shown in Figure 6(b). This slope provides the group delay of the signal. The initial phase is given by the phase value at f = 0. After compensation, no delay remains in the phase, as both the group delay and the initial phase are calculated accurately using a linear fit capable of providing interpolation.
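A sketch of this frequency-domain delay estimate: the phase of the cross-spectrum is linear in frequency, so a least-squares line fit recovers the delay. The signal, sampling rate, and integer delay below are toy values of our own choosing:

```python
import numpy as np

fs = 100.0                        # assumed sampling rate (Hz)
n = 1024
rng = np.random.default_rng(4)
x = rng.standard_normal(n)
z = np.roll(x, 7)                 # received copy, delayed by 7 samples

cross = np.conj(np.fft.fft(x)) * np.fft.fft(z)  # frequency-domain cross-correlation
phase = np.unwrap(np.angle(cross))
f = np.fft.fftfreq(n, d=1 / fs)

half = slice(1, n // 2)           # fit the positive-frequency half only
slope, phi = np.polyfit(f[half], phase[half], 1)
tau = -slope / (2 * np.pi)        # phase-line slope gives the group delay
print(round(tau * fs))  # 7
```

Because the phase is fitted with a continuous line rather than read off bin-by-bin, the same procedure also interpolates fractional (sub-sample) delays, which is the property noted above.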

DPD model performance metrics
The impact of transceiver noise on modelling performance is studied by investigating two cases. In the first case, we utilise the measurement setup shown in Section 4 and the signals/PA described in Sections 4.1/4.2. The inverse characteristic of the PA is modelled using the various models. The in-band performance is measured in terms of the normalised mean square error (NMSE), given as

NMSE = 10 log_10 ( Σ_{n=1}^{S} |y_meas(n) − y_est(n)|² / Σ_{n=1}^{S} |y_meas(n)|² )    (19)

where S is the sample size, y_meas(n) is the measured output, and y_est(n) is the estimated output. The out-of-band performance is measured in terms of the adjacent channel error power ratio (ACEPR), given as

ACEPR = 10 log_10 ( ∫_{f_c+Δf−B/2}^{f_c+Δf+B/2} |E(f)|² df / ∫_{f_c−B/2}^{f_c+B/2} |Y_meas(f)|² df )    (20)

where f_c is the carrier frequency, B is the bandwidth, and Δf is the frequency offset from the carrier frequency. E(f) is the discrete Fourier transform (DFT) of the error e(n) = y_meas(n) − y_est(n), and Y_meas(f) is the DFT of the measured output y_meas(n). Figure 7(a) shows that the NMSE curves of the MP, MP-PCA, and MP-MSPCA models are almost the same at a memory depth of M = 3 and non-linearity orders of N = 1 to N = 12. It is to be noted that the out-of-band component of the signal is purely distortion and is closer to the noise floor. Figure 7(b) shows that MP-MSPCA provides a significant improvement of up to 6 dB compared to the MP model for out-of-band performance in terms of ACEPR. For the second case, we introduced additive white Gaussian noise (AWGN) at an SNR of 30 dB to the original signal. From Figure 7(a), deterioration in the model NMSE performance can be observed; however, AWGN-MP-MSPCA provides -36 dB NMSE compared to -29 dB NMSE for the AWGN-MP and AWGN-MP-PCA models. A similar improvement trend can be observed in Figure 7(b), where AWGN-MP-MSPCA provides -47 dB ACEPR compared to -36 dB ACEPR for the MP and MP-PCA models. Although Figure 7 showcases the advantage of denoising, it is essential to test the algorithm on a range of practical SNR values to assess the influence of SNR.
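The two metrics can be sketched directly from their definitions. The discrete ACEPR below replaces the integrals with sums over FFT bins, and the channel masks are our own simplification for baseband data:

```python
import numpy as np

def nmse_db(y_meas, y_est):
    """In-band metric: 10*log10( sum|y_meas - y_est|^2 / sum|y_meas|^2 )."""
    e = np.abs(np.asarray(y_meas) - np.asarray(y_est)) ** 2
    return 10 * np.log10(np.sum(e) / np.sum(np.abs(y_meas) ** 2))

def acepr_db(y_meas, y_est, fs, offset, bw):
    """Out-of-band metric: adjacent-channel error power over in-band power.

    `offset` is the adjacent channel's centre relative to the carrier; the
    integrals of the definition become sums over the matching FFT bins.
    """
    n = len(y_meas)
    f = np.fft.fftfreq(n, d=1 / fs)
    E = np.abs(np.fft.fft(np.asarray(y_meas) - np.asarray(y_est))) ** 2
    Y = np.abs(np.fft.fft(y_meas)) ** 2
    adj = np.abs(f - offset) <= bw / 2        # adjacent-channel bins
    inband = np.abs(f) <= bw / 2              # in-band bins around the carrier
    return 10 * np.log10(np.sum(E[adj]) / np.sum(Y[inband]))

# Sanity check: a 10% amplitude error leaves 1% of the power in the error
# signal, which is -20 dB NMSE.
y = np.ones(100, dtype=complex)
print(round(nmse_db(y, 0.9 * y)))  # -20
```

Both metrics are ratios against the measured output power, so they are insensitive to the overall gain normalisation applied during de-embedding.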
Therefore, we have tested it for a range of SNR values (0 to 40 dB) at a fixed non-linearity order of 4 and memory depth of 3. The non-linearity order is selected based on Figure 7, which shows that the performance is optimum at a non-linearity order of 4. Figure 8(a) shows the inverse modelling performance in terms of NMSE for MP, MP-MSPCA, OMP, OMP-MSPCA, CS, and CS-MSPCA. The MP, OMP, and CS inverse modelling performances decay as the SNR is reduced, while with MSPCA, in all cases, the inverse modelling shows better performance (more than 5 dB improvement in NMSE) than the other models, as it is able to denoise. Figure 8(b) shows the inverse modelling performance in terms of ACEPR for MP, MP-MSPCA, OMP, OMP-MSPCA, CS, and CS-MSPCA, where the use of MSPCA yields approximately 20 dB improvement at lower SNR values and more than 10 dB at higher SNR. In the case of the noisy signal, the application of the wavelet-based MSPCA method to the observation matrix of the MP model provides more than 5 dB improvement in in-band error and more than 11 dB improvement in out-of-band error. The issue of a numerically stable solution in a noisy measurement system is discussed further below. Figure 9 shows the in-band modelling performance of the different models in terms of NMSE as the non-linearity order changes, at a memory depth of four (M = 4) with the non-linearity order varying from 1 to 12; Figure 10 depicts the corresponding out-of-band performance. It is to be noted that although the CS-MSPCA model provides the best performance both in-band and out-of-band, this improvement is not significant on its own. However, when this performance is combined with model stability and lower dispersion of coefficients, it leads to a lower implementation cost.

Metrics for efficient DPD implementation

Condition number
If the observation data matrix obtained in modelling is ill-conditioned (high condition number), it leads to numerical instability. The condition number measures the worst-case loss of precision: roughly b digits of precision are lost when solving the system if the condition number of the matrix is 10^b [39]. A high condition number also makes the pseudoinverse calculation of the matrix very sensitive to minor changes in the data; PCA mitigates this by restricting the solution to the retained principal subspace. The condition number is defined as the ratio of the maximum eigenvalue (λ_max) to the minimum eigenvalue (λ_min) of a matrix and can be written as

Cond. no. = λ_max / λ_min    (21)

The condition number of the MP model increases with the non-linearity order, and in a fixed-point FPGA implementation this translates into increased inverse-modelling inaccuracy. OMP models are established in the literature for better conditioning of polynomial-based models [15]. The condition numbers of the MP and OMP models are shown in Figure 11 (condition number versus non-linearity order for inverse modelling) for a memory depth of four. Since the final model is achieved after pruning the non-linearity order, the observation matrix of the MP model is ill-conditioned. The MP-PCA model has a better condition number than the MP model by retaining only the significant principal components. The OMP model has better stability than MP, but its condition number increases at higher non-linearity orders. MP-MSPCA can provide a much lower condition number. However, CS-MSPCA shows the lowest condition number because of the denoising and the coefficient reduction, as only the first significant eigenvalues are selected. Here, OMP-PCA has a lower condition number than OMP, and CS-PCA has a condition number equivalent to that of CS-MSPCA.
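The effect described here is easy to reproduce: monomial (Vandermonde-style) columns yield a huge condition number, while an orthogonalised basis spanning the same subspace does not. This is a sketch with toy sizes, not the paper's exact matrices:

```python
import numpy as np

# Vandermonde-style observation matrix: columns 1, u, ..., u^11.
u = np.linspace(0, 1, 64)
W = np.vander(u, 12, increasing=True)
print(np.linalg.cond(W) > 1e5)            # True: severely ill-conditioned

# Orthogonalising the columns (the idea behind OMP-style bases) restores a
# condition number of ~1 without changing the spanned subspace.
Q, _ = np.linalg.qr(W)
print(abs(np.linalg.cond(Q) - 1) < 1e-6)  # True
```

Note that `np.linalg.cond` uses the singular-value ratio; for the correlation matrix W^H W of Equation (9) the eigenvalue ratio of Equation (21) is the square of this value, so the two conventions rank models identically.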

FIGURE 12 Dispersion of coefficients versus non-linearity order for inverse modelling

The CS-based models show the highest reduction in the dispersion of coefficients. In the case of CS, a memory depth of two is selected.

Computation complexity
The computational complexity depends on the number of coefficients in the model. It increases with the non-linearity order and memory depth. Therefore, coefficient reduction techniques such as PCA are useful to reduce the size of the observation matrix. Table 1 compares the number of coefficients for the different models, where a non-linearity order of six and a memory depth of three are used, while in the case of the CS model, a non-linearity order of two and a memory depth of four are used. The CS model allows the use of a lower non-linearity order because of its piece-wise properties. Table 1 shows that the CS-MSPCA requires the fewest coefficients in a noisy environment. It is worth noting that while inverse model extraction can be done intermittently, DPD application is a continuous process: every transmitted data sample is processed through the DPD, so any saving in computational cost is multiplied many times over.

Robustness for fixed-point implementation
For the implementation of the DPD algorithm in the FPGA's fixed-point environment, bit resolution is a crucial aspect. It should be noted that even if the model extraction is carried out in the DSP using floating-point calculation, the extracted DPD model should be applied in the FPGA for faster processing and uninterrupted transmission. Also, the application of the DPD algorithm requires a designated memory space in the FPGA. The memory size can be defined as [40]

Memory size = Bit resolution × Matrix size    (23)

Model performance deteriorates for the 16-bit fixed-point implementation. However, the CS-MSPCA model outperforms the other models significantly when the system has noise and limited memory space. Table 1 shows the corresponding memory size for the models under investigation. By processing the signal at 16-bit resolution, the memory size requirement is halved compared to the other models. The models are thus compared in terms of in-band and out-of-band distortion mitigation capability, memory resource requirement, the capability to perform in a low-precision fixed-point calculation environment, and robustness in the presence of measurement noise.
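Equation (23) can be wrapped in a small helper to see the effect of bit resolution; the factor of two for complex I/Q storage is our own assumption, not stated in the text:

```python
# Equation (23): memory = bit resolution x matrix size (number of stored
# values). Complex coefficients store I and Q separately (assumed here).
def memory_size_bits(bit_resolution, matrix_size, complex_valued=True):
    return bit_resolution * matrix_size * (2 if complex_valued else 1)

# Moving from 32-bit to 16-bit fixed point halves the memory requirement.
print(memory_size_bits(16, 24) * 2 == memory_size_bits(32, 24))  # True
```

Because the matrix size term multiplies the bit resolution, models that need fewer coefficients (such as CS-MSPCA in Table 1) compound the savings obtained from the lower resolution.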