Parallel wideband digital up-conversion architecture with efficiency

Abstract: Owing to the limited processing speed of current digital signal processing devices, the frequency and bandwidth of signals generated with the digital up-conversion technique have long been a bottleneck. Interpolating the baseband signal places a heavy load on the subsequent filtering and mixing operations. In this study, an efficient parallel architecture is presented that moves the interpolation behind the filtering and mixing, and adopts parallel numerically controlled oscillator (NCO) arrays and poly-phase low-pass filter arrays to realise the required orthogonal mixing signal and prototype filter. The parallel NCO decomposition principle and its mathematical derivation are elaborated, and the realisation of the poly-phase filter arrays is also discussed. The proposed post-interpolation architecture effectively relieves the filtering and mixing load, in terms of both operation rate and computational complexity, which benefits the hardware implementation. Finally, a verification test of signal up-conversion with 400 MHz bandwidth is carried out to confirm the validity of the architecture.


Introduction
Compared with the digital down-conversion (DDC) technique (e.g. [1]), which moves the IF wideband signal to baseband and reduces the data rate, the digital up-conversion (DUC) technique operates in the reverse direction. DUC is a critical technique in signal generation, especially for advanced arbitrary waveform generation (e.g. [2]). The basic idea is to convert the baseband signal to a specific carrier and increase the data rate.
Generally, common DUC processing cascades interpolation, a low-pass filter, a cascaded integrator comb (CIC) filter and tuning mixing (e.g. [3]), which leads to high processing-speed requirements and large resource consumption for the digital computation. It is therefore difficult to implement DUC with a high carrier frequency and wide bandwidth on hardware platforms such as ASICs, DSPs and FPGAs.
Taking a comprehensive view of the above hardware platforms, rapidly developing FPGA devices turn out to be the more capable choice for hardware signal processing applications. Both their logic resources and their computation speed have improved dramatically, and their IO speed in particular has grown considerably, which makes it feasible to investigate wideband DUC implementation.
At present, many studies propose various parallel DDC structures (e.g. [4,5]) to ease realisation, which motivates us to do the same for the DUC technique. This paper focuses on the parallel decomposition of several DUC units according to classic conversion theory, and proposes an efficient parallel architecture that consists of numerically correlated NCO arrays, poly-phase filter arrays and a post-interpolator. Given that each branch signal originates from the same input baseband signal, the detailed convolution operation of the filtering is also elaborated. The decreased operation speed and reduced computational complexity make the presented DUC architecture much easier to apply on hardware platforms.

Classic DUC system
For a given baseband signal, the classic block diagram of a DUC system (e.g. [6]) is shown in Fig. 1. The baseband signal is modulated onto a complex carrier signal to suit the DAC conversion and complete RF transmission. The first-level interpolation, designed according to the required output signal frequency and bandwidth, greatly increases the data rate and imposes considerable computational load and complexity on the following low-pass filter (LPF) and NCO. The higher the output signal frequency and bandwidth, the harder the system is to run.
Suppose I(n) and Q(n) represent the baseband in-phase and quadrature signals, respectively, and the output frequency of the NCO is ω_0. Then, the output signal y(n) can be expressed as where the LPF, NCO and DAC modules all work at the same high rate of f_DAC. Let the interpolation factor be K; the baseband input signal should then be limited to f_DAC/(2K) (e.g. [7]), which also guides the design of the low-pass filter h(n) so as to avoid spectrum overlapping.
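The classic chain described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation behind Fig. 1: the function names (`zero_stuff`, `fir`, `classic_duc`) are hypothetical, the CIC stage is omitted, and the sign convention, taking the real part of the filtered signal multiplied by e^(−jω_0 n), is an assumption of the sketch. Every operation after `zero_stuff` runs at the high rate f_DAC, which is the burden the paragraph points out.

```python
import math


def zero_stuff(x, K):
    """Interpolate by K: insert K - 1 zeros after every input sample."""
    out = [0.0] * (len(x) * K)
    out[::K] = x
    return out


def fir(x, h):
    """Causal FIR filter: y[n] = sum_i h[i] * x[n - i], same length as x."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def classic_duc(i_bb, q_bb, h, K, w0):
    """Interpolate, low-pass filter and mix; all stages run at the output rate.

    w0 is the carrier in radians per output sample. The sign convention
    (real part of (I + jQ) * exp(-j*w0*n)) is an assumption of this sketch.
    """
    i_f = fir(zero_stuff(i_bb, K), h)
    q_f = fir(zero_stuff(q_bb, K), h)
    return [i_f[n] * math.cos(w0 * n) + q_f[n] * math.sin(w0 * n)
            for n in range(len(i_f))]


# Constant I input, zero Q, K = 2 and a two-tap hold filter:
# the filtered I channel is constant, so the output is the carrier itself.
y = classic_duc([1.0] * 4, [0.0] * 4, [1.0, 1.0], 2, 0.25 * math.pi)
```

The list `y` holds K times as many samples as the input, and both the filter and the mixer have to process every one of them.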

Parallel decomposition principle
An equivalent structure of the classic DUC architecture is depicted in Fig. 2. Given a baseband complex signal x(n) with sampling rate f_s and a required output carrier ω_0, the mixing sequence generated by the NCO needs to be e^(−jω_0 n), where ω_0 = 2πf_0/(Kf_s) and K is the interpolation factor that raises the output data rate to Kf_s.
Then, consider the following equivalent transformation model in Fig. 4, where What is more, there is also an equivalent transformation model, depicted in Fig. 6 (e.g. [10]), which means that the mixing operation can be placed before the interpolation, as long as one sample is extracted from every K samples of the mixing sequence.
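The last statement, that mixing can be moved in front of the interpolation if the mixing sequence is decimated by K, can be checked numerically. The sketch below assumes zero-stuffing interpolation and the mixing sequence e^(−jω_0 n); the function names are hypothetical and chosen for illustration only.

```python
import cmath


def zero_stuff(x, K):
    """Interpolate by K: insert K - 1 zeros after every input sample."""
    out = [0j] * (len(x) * K)
    out[::K] = x
    return out


def mix_after_interp(x, K, w0):
    """Interpolate first, then mix at the high rate K*f_s."""
    up = zero_stuff(x, K)
    return [up[m] * cmath.exp(-1j * w0 * m) for m in range(len(up))]


def mix_before_interp(x, K, w0):
    """Mix at the low rate f_s with every Kth sample of the mixing sequence."""
    lo_decimated = [cmath.exp(-1j * w0 * (n * K)) for n in range(len(x))]
    return zero_stuff([x[n] * lo_decimated[n] for n in range(len(x))], K)


x = [complex(n, -n) for n in range(1, 9)]
a = mix_after_interp(x, 4, 0.7)
b = mix_before_interp(x, 4, 0.7)
max_err = max(abs(p - q) for p, q in zip(a, b))
```

The two outputs agree because the inserted zeros stay zero whatever they are multiplied by, so only the mixing samples at indices nK ever matter.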
Taking a closer look at Figs. 5 and 6, a new parallel K-channel decomposition of the DUC architecture emerges, as shown in Fig. 7, where L_k(n) can be regarded as the parallel decomposition of the original mixing signal L(n) and operates at the decreased rate of f_s.
The output y(n) turns out to be the combination of K branch signals under a certain rule, as shown in Fig. 7. Each path originates from the same input signal source, and passes through a poly-phase filter derived from h(n) and a decomposed tuning sequence derived from L(n).
It is easy to see that the output y(n) runs at the rate of Kf_s, as it fetches data points sequentially from y_{K−1}(n) to y_0(n), working like a switch. This operation can, however, be executed at the IO ports of the hardware platform. On FPGA devices, the inferred switch can be built from D-type flip-flops alone (e.g. [11]), which is entirely feasible.
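The equivalence between the classic chain and the K-branch structure can be illustrated with a small numerical sketch. It assumes zero-stuffing interpolation, the mixing sequence L(n) = e^(−jω_0 n), poly-phase components e_k(n) = h(nK + k), sub-sequences L_k(n) = L(nK + k), and a commutator that places y_k(n) at output index nK + k. The branch numbering in Fig. 7 may follow a different order, and all function names are hypothetical.

```python
import cmath
import math


def fir(x, h):
    """Causal FIR filter: y[n] = sum_i h[i] * x[n - i], same length as x."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def classic_duc(x, h, K, w0):
    """Reference chain: zero-stuff by K, filter with h, mix, all at rate K*f_s."""
    up = [0j] * (len(x) * K)
    up[::K] = x
    v = fir(up, h)
    return [v[m] * cmath.exp(-1j * w0 * m) for m in range(len(v))]


def parallel_duc(x, h, K, w0):
    """K branches running at f_s, interleaved by a commutator at the output."""
    y = [0j] * (len(x) * K)
    for k in range(K):
        e_k = h[k::K]                  # poly-phase component e_k(n) = h(nK + k)
        r_k = fir(x, e_k)              # filtering at the low rate f_s
        for n in range(len(x)):
            l_k = cmath.exp(-1j * w0 * (n * K + k))   # sub-NCO L_k(n) = L(nK + k)
            y[n * K + k] = r_k[n] * l_k               # commutator position
    return y


x = [cmath.exp(1j * 0.3 * n * n) for n in range(16)]   # short complex chirp
h = [1.0, 3.0, 5.0, 7.0, 7.0, 5.0, 3.0, 1.0]           # toy 8-tap prototype
w0 = 2 * math.pi * 0.13
ref = classic_duc(x, h, 4, w0)
par = parallel_duc(x, h, 4, w0)
max_err = max(abs(a - b) for a, b in zip(ref, par))
```

In `parallel_duc` no filter or mixer ever sees more than one sample per input sample; only the final interleaving step runs at Kf_s, which is the part the text assigns to the IO ports.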

Parallel NCO decomposition
As depicted in (4), the original NCO tuning sequence L(n) has been decomposed into several sub-sequences. We now examine the relationship between these sub-sequences in order to realise them.
The decomposed tuning sequence derived from L(n) can be re-expressed as where L_{Q,k}(n) and L_{I,k}(n) denote the cosine and sine parts of the sub-NCOs, respectively. Equation (5) also shows that the sub-sequences vary as the relationship between f_0 and f_s changes. In every case, each branch tuning sequence operates at the same rate, namely f_s rather than Kf_s, which greatly eases the hardware implementation.
Digging into the NCO decomposition process, a fixed phase deviation of π/2 is found between L_{Q,k}(n) and L_{I,k}(n) of each branch. We can also find that the initial phase of each NCO-branch sequence changes linearly as k varies. Where From (6), a fixed phase deviation between every two adjacent branches can be calculated as According to the previous derivation, the parallel DUC architecture in real mode can be depicted as in Fig. 8.
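One way to see these properties is the short derivation below. It assumes that the branch sequences are obtained by taking every Kth sample of L(n) = e^(−jω_0 n) starting at offset k, an indexing assumption that may differ from the numbering used in (4) to (6). The symbol φ_k is introduced here for illustration only.

```latex
\begin{align}
L_k(n) &= L(nK + k) = e^{-j\omega_0 (nK + k)}
        = e^{-j\left(2\pi \frac{f_0}{f_s} n + \varphi_k\right)},
        \qquad k = 0, 1, \dots, K-1, \\
\varphi_k &= \omega_0 k = \frac{2\pi f_0}{K f_s}\, k, \\
\Delta\varphi &= \varphi_{k+1} - \varphi_k = \frac{2\pi f_0}{K f_s}.
\end{align}
```

Under this assumption every branch is an oscillator of frequency f_0 clocked at f_s, its cosine and sine parts are in quadrature, and the initial phase φ_k grows linearly with k in steps of Δφ.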
Clearly, each branch poly-phase filter e_k(n) has only 1/K of the taps of the prototype filter h(n) and runs at a data rate of f_s instead of Kf_s. Correspondingly, we do not need to generate one NCO sequence at the high data rate of Kf_s, but K pairs of sine and cosine sequences, all operating at f_s, to mix with the post-filtered I(n) and Q(n), respectively. Suppose the outputs of each branch after mixing are y_{Qk}(n) and y_{Ik}(n), then we can get Thus, the final output signal y(n) can be expressed as

Poly-phase filter realisation
As proposed above, the parallel DUC architecture involves K poly-phase filters, one on each signal path, which reduces both the number of tap coefficients and the data rate of these paths. Usually, special interpolation filters are adopted to simplify the realisation, such as half-band filters (e.g. [12]), cascaded filters (e.g. [13]) and so on. Here, however, we discuss the general filter realisation without loss of generality.
Suppose the designed prototype filter h(n) has N tap coefficients and N mod K = 0, then the length of each poly-phase filter e_k(n) decreases to N/K. Let r_k(n) represent the result of passing the original signal x(n) through each e_k(n), as shown in Fig. 7. We can get According to the coefficient symmetry property of the FIR filter (e.g. [14]), which means The r_{K−1−k}(n) expression can be derived as Let m′ = N/K − m − 1, i.e. replace m, m ∈ [0, N/K − 1] ∩ ℤ, with N/K − m′ − 1, where m′ also belongs to [0, N/K − 1] ∩ ℤ; the r_{K−1−k}(n) expression can then be rewritten as
Comparing (10) and (13), it is easy to see that the results of the kth and (K−1−k)th poly-phase filter paths come from the same coefficient group of the original prototype filter, applied with different delays to the input signal, as shown in Fig. 9. Fig. 9 shows that the input signal is delayed first and then multiplied by the filter coefficients. If we apply the idea illustrated in Fig. 4 to this structure, the delay operation can be placed after the multiplication, as shown in Fig. 10, which reduces the number of multipliers by nearly half.
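The coefficient sharing between paths k and K−1−k can be illustrated as follows. The sketch assumes a symmetric prototype, h(n) = h(N−1−n), with N mod K = 0 and poly-phase components e_k(m) = h(mK + k); the function names are hypothetical, and `paired_branches` is only one possible way to arrange "multiply first, delay afterwards", not necessarily the structure of Fig. 10.

```python
def polyphase(h, K):
    """Split the prototype h into K components e_k(m) = h(mK + k)."""
    return [h[k::K] for k in range(K)]


def fir(x, h):
    """Causal FIR filter: y[n] = sum_i h[i] * x[n - i], same length as x."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def paired_branches(x, e_k):
    """Return (r_k, r_{K-1-k}) from one shared set of products e_k[m] * x[n].

    Multiplying first and delaying afterwards lets both branches reuse the
    same len(e_k) multipliers; only the delay given to each product differs.
    """
    M = len(e_k)
    prod = [[c * s for s in x] for c in e_k]      # prod[m][n] = e_k[m] * x[n]

    def delay_and_sum(order):
        return [sum(prod[order[d]][n - d] for d in range(M) if n - d >= 0)
                for n in range(len(x))]

    r_k = delay_and_sum(list(range(M)))                    # product m delayed by m
    r_mirror = delay_and_sum(list(range(M - 1, -1, -1)))   # product m delayed by M-1-m
    return r_k, r_mirror


h = [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1]   # symmetric toy prototype, N = 12
K = 4
e = polyphase(h, K)                        # each component has N/K = 3 taps
x = [3, -1, 4, 1, -5, 9, 2, -6]
r0, r3 = paired_branches(x, e[0])          # branches 0 and K-1 share multipliers
```

For this symmetric prototype, `e[K - 1 - k]` is `e[k]` reversed, so two branches need only one set of N/K multipliers between them.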

Verification
In this section, a verification test is carried out to confirm the efficiency and validity of the proposed parallel DUC technique. The input is a baseband chirp signal covering BW = 200 MHz with a sample rate of f_s = 500 MSPS; its time-domain waveform and frequency spectrum are shown in Fig. 11.
When the baseband signal is up-converted, the signal bandwidth around the carrier doubles, i.e. to 400 MHz. Suppose the required carrier frequency is f_0 = 258 MHz; the highest output frequency is then 258 + 200 = 458 MHz, so the total output data rate needs to be at least 916 MSPS according to the Nyquist-Shannon sampling theorem. Let us make it 1 GSPS, which DACs currently on the market can satisfy. The interpolation factor then turns out to be K = 2, since 1 GSPS / 500 MSPS = 2.
Thus, two NCO units are needed as After Nyquist folding (258 MHz aliases to 500 − 258 = 242 MHz), the above four sequences x_{LQ0}, x_{LI0}, x_{LQ1}, x_{LI1} all have the same output frequency of 242 MHz at a 500 MHz clock rate, which is much easier to implement in hardware, for example with the NCO IP module on FPGA devices.
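The numbers quoted for this test can be reproduced with a few lines of arithmetic. The helper names below are hypothetical, and `branch_phase_step` additionally assumes that branch k takes the samples L(nK + k) of the original mixing sequence.

```python
import math


def min_output_rate(f0, bw_baseband):
    """Nyquist rate for a real output whose highest frequency is f0 + bw_baseband."""
    return 2 * (f0 + bw_baseband)


def folded_frequency(f, clock):
    """Apparent frequency of a tone at f when it is generated at rate `clock`."""
    r = f % clock
    return min(r, clock - r)


def branch_phase_step(f0, K, fs):
    """Initial-phase difference between adjacent branch NCOs, in radians (assumed indexing)."""
    return 2 * math.pi * f0 / (K * fs)


# Values from the verification test, in MHz / MSPS.
F0, BW, FS, F_DAC = 258, 200, 500, 1000
K = F_DAC // FS
rate = min_output_rate(F0, BW)
f_branch = folded_frequency(F0, FS)
step = branch_phase_step(F0, K, FS)
```

Here `rate` evaluates to 916 and `f_branch` to 242, matching the figures in the text, while `step` is 0.516π rad under the stated indexing assumption.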
The prototype low-pass filter can be designed according to interpolation theory; its pass-band should be at most 250 MHz at a 1000 MHz sample rate. Parameters such as attenuation and ripple can be adjusted easily. Applying the four decomposed NCO signals and the filter coefficients to the parallel DUC architecture shown in Fig. 8, we obtain the waveform and spectrum of the output signal shown in Fig. 12.
The above results show that the proposed parallel DUC architecture completes the signal spectrum conversion effectively and operates at a relatively low speed, which greatly benefits hardware implementation.

Conclusion
The DUC technique plays a key role in signal generation. Although hardware processing speed is growing rapidly, the frequency and bandwidth of the generated signal are still limited by the speed of components such as FPGAs and DACs. The parallel DUC architecture proposed in this paper alleviates this limitation to some extent. On the one hand, it reduces the system processing speed. On the other hand, it simplifies the digital computation, requiring fewer filter tap coefficients and fewer multipliers. The verification test confirms the feasibility and validity of the proposed method.