Single channel pipelined variable-length FFT processor design

In this study, a single channel pipelined variable-length fast Fourier transform (FFT) processor design is presented. Digital signal processing is now used in many fields, and the FFT, its most basic algorithm, is widely applied in radar signal processing, remote sensing, image processing, communications and other areas. A high real-time, high-throughput FFT processor design method based on field-programmable gate array (FPGA) platforms is proposed to meet the needs of synthetic aperture radar imaging. The authors synthesised the design with ISE on the Xilinx FPGA platform, using the xc6vlx760ff1760-1 as the target chip, and verified its correctness by ModelSim simulation combined with Matlab.


Introduction
The fast Fourier transform (FFT) is one of the basic algorithms of digital signal processing. FFT processors are widely used in fields such as radar signal processing, satellite remote sensing, image processing and communications.
With the constant development of the technologies applied in these fields, higher requirements are placed on FFT computation [1]. For example, to meet the strict specifications of ultra-wideband communication systems, high-throughput data must be processed with low latency [2]. Pipeline architectures have an inherent advantage over other efficient hardware structures [3]. In long-term evolution applications, transform sizes ranging from 12 points up to 1296 points are required. The traditional methods for $2^n$-point FFTs are difficult to apply to a general mixed-radix FFT design [4]. The radix-$2^k$ algorithm is suitable for pipeline architectures. As the radix becomes higher, the number of complex multipliers required decreases [5]. The multipath delay feedback (MDF) and multipath delay commutator (MDC) architectures are adopted for applications that require higher throughput and better real-time performance [6]. Mixed-radix algorithms make the processor more flexible for a variety of transform sizes [7]. However, many traditional FFT implementations cannot meet these requirements, so how to implement the FFT more quickly and flexibly remains a research topic. For example, a 128- to 2048-point FFT processor for the 3GPP-LTE standard is implemented in [6]. A design methodology for power and area minimisation of flexible FFT processors is proposed by analysing the degree of parallelism and radix factorisation. In [8], a variable-length MDC-based FFT processor for multiple-input multiple-output orthogonal frequency division multiplexing systems is presented; the supported transform sizes are 2048, 1024, 512 and 128. In [9], a high-speed, low-complexity modified radix-$2^5$ 512-point FFT processor is designed for wireless personal area network applications using eight pipelined data paths. Another 512-point MDF FFT processor is designed in [10]. The radix-$2^4$-$2^2$-$2^3$ algorithm is devised to reduce the complexity of twiddle factor multiplication [1].
FFT computation is the core of synthetic aperture radar (SAR) imaging algorithms. High timeliness and large throughput are the two main challenges of a real-time imaging system. A practical implementation must handle large amounts of data, and the radar system places strict demands on real-time performance, which makes engineering realisation very challenging. Choosing and matching the hardware system of the SAR therefore becomes an unavoidable problem. In most cases, the signal is processed using digital signal processing (DSP) chips. However, as technology develops, the performance requirements of SAR increase and the functions to be performed by the DSP become more complicated, so it is very difficult to meet the requirements with a DSP-based design. At the same time, as field-programmable gate array (FPGA) technology matures and its advantages in implementing digital signal processing become evident, FPGA-based designs for SAR imaging are becoming more common.
To meet the demands of digital signal processing in SAR imaging, this paper studies a high real-time, high-throughput design method based on the FPGA platform, and completes the design of a single channel pipelined variable-length FFT processor with large data throughput.

Principle of algorithm
The discrete Fourier transform (DFT) of an $N$-point input sequence $x(n)$ is defined as
$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \quad W_N = e^{-j2\pi/N},$$
where the time-domain index $n$, the frequency-domain index $k$ and the point number $N$ are decomposed as follows [11]:
$$N = LM, \quad n = Lm + l, \quad k = Mj + i, \quad l, j = 0, \ldots, L-1, \quad m, i = 0, \ldots, M-1.$$
That is, the $N$-point FFT is represented in a two-dimensional form.
Substituting this decomposition into the DFT formula gives the FFT expression
$$X(Mj + i) = \sum_{l=0}^{L-1} \left[ W_N^{li}\, X_l(i) \right] W_L^{lj},$$
where $X_l(i)$ is
$$X_l(i) = \sum_{m=0}^{M-1} x(Lm + l)\, W_M^{mi}.$$
The derivation above decomposes a large $N$-point FFT into $L$ separate $M$-point FFTs followed by radix-$L$ butterflies. Each $M$-point FFT is calculated separately, its results are multiplied by the corresponding twiddle factors, and the last level is then completed by the radix-$L$ butterfly, whose output is the final result. This is the basic principle of the mixed-radix algorithm. In general, a composite number $N$ can be written as $N = r_1 r_2 \cdots r_m$. In particular, when $r_1 = r_2 = \cdots = r_m = r$, an $N$-point FFT can be implemented through $m$ levels of $r$-point DFTs. This paper adopts the radix-$2^3$ algorithm as the main framework, that is, three levels of the radix-2 algorithm are used as a basic operation module, and the number of points this module can compute is a power of $2^3$. This kind of algorithm improves data throughput, but on its own it can only compute some $2^k$-point FFTs. Therefore, on the basis of the radix-$2^3$ algorithm, the radix-$2^2$ algorithm and the radix-2 algorithm are also used, so that an FFT can be computed for an arbitrary value of $k$ (up to the maximum number of points of this design). The way the three algorithms are combined is shown in Fig. 1. The radix-$2^3$ algorithm includes the radix-$2^2$ algorithm and the radix-2 algorithm.
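As a minimal sketch of the decomposition principle described in this section, the following Python code computes an $N$-point DFT as $L$ separate $M$-point DFTs, a twiddle factor multiplication and a final radix-$L$ butterfly, and can be compared against a direct DFT. The index mapping $n = Lm + l$, $k = Mj + i$ and the function names `dft` and `dft_two_step` are assumptions made for illustration; this is a floating-point software model, not the hardware design of this paper.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used here as the reference result."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def dft_two_step(x, L, M):
    """N = L*M point DFT built from L separate M-point DFTs (illustrative)."""
    N = L * M
    if len(x) != N:
        raise ValueError("len(x) must equal L * M")
    # Step 1: an M-point DFT of each decimated subsequence x(L*m + l).
    sub = [dft(x[l::L]) for l in range(L)]
    X = [0j] * N
    for i in range(M):
        # Step 2: multiply by the twiddle factors W_N^(l*i).
        t = [sub[l][i] * cmath.exp(-2j * cmath.pi * l * i / N)
             for l in range(L)]
        # Step 3: radix-L butterfly, i.e. an L-point DFT across index l.
        for j in range(L):
            X[M * j + i] = sum(
                t[l] * cmath.exp(-2j * cmath.pi * l * j / L)
                for l in range(L)
            )
    return X
```

For any factorisation $N = LM$, `dft_two_step(x, L, M)` should agree with `dft(x)` up to floating-point rounding.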

General design of the processor
The processor designed in this paper consists of three stages. According to the number of points to be calculated, a data selector selects the corresponding computation path (Figs. 2-4).
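One possible way to map a transform size to a computation path is sketched below. The greedy rule (use as many radix-$2^3$ stages as possible, then one radix-$2^2$ or radix-2 stage for the remainder) and the function name `plan_stages` are hypothetical; the paper does not state the exact selection rule used by its data selector.

```python
def plan_stages(n_points):
    """Split N = 2^k into radix-8 stages plus at most one radix-4 or radix-2 stage.

    Hypothetical greedy rule for illustration only.
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("transform size must be a power of two >= 2")
    k = n_points.bit_length() - 1      # N = 2^k
    stages = [8] * (k // 3)            # radix-2^3 stages
    remainder = k % 3
    if remainder:
        stages.append(1 << remainder)  # 4 if remainder is 2, 2 if remainder is 1
    return stages
```

Under this assumed rule, a 1024-point transform ($k = 10$) uses three radix-$2^3$ stages and one radix-2 stage, while a 2048-point transform ($k = 11$) uses three radix-$2^3$ stages and one radix-$2^2$ stage.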

CSD constant factor multiplier design
CSD stands for canonic signed digit. A signed-digit representation writes a number with the digits $-1$, 0 and $+1$; it is not unique, so a number can be expressed in many ways. CSD encoding is the special signed-digit encoding with the fewest non-zero digits, in which no two adjacent digits are non-zero. The probability of a digit being zero is close to 66% (compared with 50% in two's complement code). The advantage of this coding method is that it reduces the number of non-zero digits in the represented value, so the multiplier occupies a small area and operates at high speed. The radix-$2^3$ algorithm needs to multiply by the constant factor $W_8^1 = \frac{\sqrt{2}}{2}(1 - j)$, so we implement it with a CSD constant multiplier. Because the real and imaginary parts of $W_8^1$ have equal magnitude, only one type of CSD multiplier needs to be designed. The two's complement code of $\sqrt{2}/2$ is 0.101101010000 and its CSD code is $1.0\bar{1}0\bar{1}01010000$, where $\bar{1}$ denotes the digit $-1$. Since the multiplication of the data by the constant factor is a complex operation that includes additions and subtractions, a truncation occurs here, and completing this CSD multiplication takes a delay of 6 clock cycles. Fig. 5 shows a block diagram of the CSD multiplier.
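The CSD recoding and the shift-and-add multiplication it enables can be sketched in Python as follows. The function names `to_csd` and `csd_multiply` are illustrative, the coefficient is treated as an unsigned integer with 12 fractional bits, and truncation is modelled by an arithmetic right shift; the 6-cycle pipelining of the actual multiplier is not modelled.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0, +1) of a non-negative integer, MSB first."""
    digits = []
    while value != 0:
        if value & 1:
            # +1 when value % 4 == 1, -1 when value % 4 == 3, so the next
            # digit is always zero (no two adjacent non-zero digits).
            d = 2 - (value & 3)
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits[::-1] or [0]


def csd_multiply(sample, digits, frac_bits):
    """Multiply an integer sample by a CSD constant using shifts and adds.

    The result is truncated by dropping `frac_bits` fractional bits.
    """
    acc = 0
    top = len(digits) - 1
    for pos, d in enumerate(digits):
        if d:
            acc += d * (sample << (top - pos))  # one adder or subtractor
    return acc >> frac_bits
```

For example, `to_csd(0b101101010000)` yields the digits 1, 0, -1, 0, -1, 0, 1, 0, 1, 0, 0, 0, 0, and `csd_multiply(1000, digits, 12)` returns 707, which approximates $1000 \cdot \sqrt{2}/2$.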

Single delay feedback (SDF) module
The SDF module is composed of a radix-2 butterfly unit and a RAM, as shown in Fig. 4. It is the core of the entire FFT processor design and carries a large share of the computation. "Pipelined" means that data flow through the processor continuously, like water, without interruption. When an incoming sample does not yet have a partner to be calculated with, it is stored in the RAM. Once the sample it is waiting for arrives, the two samples are sent to the radix-2 butterfly unit for calculation.
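The store-then-compute behaviour of one SDF stage can be illustrated with a small behavioural model. This is a sketch under assumptions: the function name `sdf_stage` is hypothetical, the RAM is modelled as a FIFO of `delay` words initialised to zero, the butterfly emits the sum immediately and feeds the difference back into the memory, and twiddle multiplication and word-length effects are omitted.

```python
from collections import deque


def sdf_stage(stream, delay):
    """Behavioural model of one radix-2 SDF stage (illustrative only)."""
    fifo = deque([0] * delay)  # models the feedback RAM
    out = []
    for t, sample in enumerate(stream):
        stored = fifo.popleft()
        if (t // delay) % 2 == 0:
            # First half of a block: the partner has not arrived yet, so the
            # input is stored and the memory content is passed to the output.
            fifo.append(sample)
            out.append(stored)
        else:
            # Second half: the partner has arrived, so the butterfly runs.
            out.append(stored + sample)  # sum is emitted immediately
            fifo.append(stored - sample)  # difference is emitted one half later
    return out
```

Feeding the eight samples 0 to 7 followed by four flushing zeros through `sdf_stage(..., 4)` produces four zeros of latency, then the sums 4, 6, 8, 10, then the differences -4, -4, -4, -4, with one output per input sample and no stall.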

Accuracy comparison of results
Fig. 6 compares the fixed-point processing results (FPGA Result Amplitude) with the double-precision floating-point processing results (Matlab Result Amplitude) for several typical numbers of points. Table 1 compares the resources occupied by the ISE IP core with those of the designed FFT processor when processing 1K points of data (the word length is 24 bits). … the partial fixed-point processing based on SystemC [12]. Fig. 7f is the imaging result of typical single-precision floating-point processing based on MATLAB. We take the floating-point imaging result as the accurate reference for comparison.

Point target imaging quality evaluation
For a point target, the peak sidelobe ratio (PSLR), integrated sidelobe ratio (ISLR) and spatial resolution (RES) are commonly adopted to evaluate the imaging quality [13]. Table 2 compares the imaging quality assessment results.
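As a rough illustration of how PSLR and ISLR can be computed from a one-dimensional cut through a point target response, consider the sketch below. It assumes the common convention that the main lobe extends to the first local minimum on each side of the peak; the function name `point_target_metrics` is hypothetical, and this is not necessarily the evaluation procedure used for Table 2.

```python
import math


def point_target_metrics(power):
    """Return (PSLR, ISLR) in dB for a list of impulse-response power samples."""
    peak = max(range(len(power)), key=power.__getitem__)
    # Walk outwards from the peak to the first local minimum on each side.
    left = peak
    while left > 0 and power[left - 1] < power[left]:
        left -= 1
    right = peak
    while right < len(power) - 1 and power[right + 1] < power[right]:
        right += 1
    main = power[left:right + 1]
    side = power[:left] + power[right + 1:]
    pslr = 10 * math.log10(max(side) / power[peak])  # highest sidelobe vs peak
    islr = 10 * math.log10(sum(side) / sum(main))    # sidelobe vs mainlobe energy
    return pslr, islr
```

For an ideal, unweighted sinc-shaped response, this kind of measurement gives a PSLR of roughly -13 dB, which is a convenient sanity check before applying it to real imaging results.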

Conclusion
FFT is one of the most important algorithms in digital signal processing. To meet the high-performance and real-time requirements of modern applications, hardware designers have been trying to achieve an efficient architecture for FFT computation. This paper introduces the design of a single channel pipelined variable-length FFT processor. Because the FFT is the most basic algorithm in digital signal processing, FFT processors will inevitably be required to deliver higher throughput and lower processing delay, which makes FFT processor research more meaningful. This paper builds on a single channel pipelined mixed-radix FFT processor architecture to further optimise the realisation of large-point FFT processors. Designing a better processor architecture requires further study of both FFT algorithms and processor architectures.