FPGA Implementation of Reconfigurable Pulse-Shaping FIR Interpolation Filter with Carry Select Adder

[1] Student, [2] Assistant Professor,
Dept. of E&C, M S Ramaiah Institute of Technology, Bengaluru-560054
[1] devikaaraani@gmail.com, [2] lakshmi.s@msrit.edu

Abstract: This paper presents a Field Programmable Gate Array (FPGA) implementation of a reconfigurable Root-Raised Cosine (RRC) Finite Impulse Response (FIR) filter, of the kind widely used in Digital Up Converters (DUC), in which all the adders are replaced by modified Carry-Select Adders (CSLA). The proposed filter can be reconfigured with one of three interpolation factors (4, 6, and 8, for 25-, 37-, and 49-tap filters respectively) and one of two roll-off factors. The basic element of the filter is the multiplier, which in this work is designed using the 2-bit Binary Common Sub-expression Elimination (BCSE) algorithm. Together these form a two-step optimization technique for designing a reconfigurable VLSI architecture of the interpolation filter that reduces the number of slices required. The maximum operating frequency is analyzed using the Xilinx Synthesis Tool (XST) and Cadence software.

Index Terms— BCSE, Cadence, CSLA, DUC, FIR, Interpolation factor, Roll-off factor, RRC, VLSI, XST.

I. INTRODUCTION

Digital filters are a vital part of wireless communication because they offer a better signal-to-noise ratio (SNR). Their precise reproducibility allows performance levels that are difficult to obtain with analog filters. A Finite Impulse Response (FIR) filter can implement almost any frequency response digitally. FIR filters are an important part of many communication applications, where they reduce noise and enhance specific signal features; they are also inherently stable. Because the impulse response is finite, the filter can be realized without feedback. FIR filters are designed to meet the given specifications with the least redundancy.

In a data transmission system, the pulses must be shaped so that they do not interfere with one another at the optimal sampling point. The rectangular pulse occupies a large bandwidth and has an unbounded frequency response, so it is unsuitable for modern transmission systems. The raised cosine and root raised cosine pulses are well suited to band-limited data transmission: they limit the bandwidth, decay quickly, and provide zero crossings at the pulse sampling instants. Pulse shaping is an important spectral processing technique in wireless communication for making a signal fit its frequency band. Pulse shaping filters reduce both intersymbol interference (ISI) and adjacent channel interference.

Software Defined Radio (SDR) aims at a single terminal device capable of supporting multiple wireless communication standards. Pulse shaping filters are therefore widely used in modern wireless systems such as SDR to transmit or receive the signal within a specific channel bandwidth. They also decrease the Bit Error Rate (BER) and increase the data transfer rate. Among the available pulse shaping filters, Root Raised Cosine (RRC) filters are the most widely used because of their high ISI rejection and strong band-limiting. Wireless communication standards such as IS-95, Universal Mobile Telecommunications System (UMTS), and Wideband Code Division Multiple Access (WCDMA) adopt the RRC filter as the channel filter for its ability to reduce BER by disallowing timing jitter at the sampling instant. However, different standards require different sampling rates and roll-off factors for the RRC filter. To support all of these standards in a single device, a reconfigurable RRC filter with reduced power and area consumption is needed [3]. This has motivated the power and area analysis, presented in this paper, of the reconfigurable RRC filter with modified adders in the existing architecture.

Many researchers have proposed architectures for low-power, low-area, and low-complexity reconfigurable channel filters for data rate conversion in SDR systems. Lin et al. [6] proposed a modified Canonical Signed Digit (CSD) technique-based finite-impulse-response (FIR) filter to reduce power consumption. However, the power reduction is achieved by compromising the speed of operation, which makes this design unsuitable. In the Common Sub-expression Elimination (CSE) technique [7], multiplications between the constant coefficients and the inputs are performed by shift-and-add operations. A low-complexity architecture based on the Binary CSE (BCSE) algorithm has been proposed in [8] and [9]. This algorithm consumes less hardware and power than the CSD-CSE method by using a common constant/programmable shift-and-add block.

However, the constant shift multiplication-based FIR filter design proposed in [9] uses a redundant adder in the multiplier block. This additional hardware consumes more area and power and makes the design unsuitable for SDR systems, where low power and low area consumption are the key concerns.

II. BCSE ALGORITHM

Low-power, high-speed FIR filters with a minimum number of adders are required in wireless communication. Among the approaches for reducing the number of adders in the multipliers of FIR filters, CSE techniques produce the best hardware reduction, since they deal with the multiplication of one variable (the input signal) by several constants (the coefficients) [6]. The goal of CSE is to identify multiple occurrences of identical bit patterns in the CSD representation of the coefficients and eliminate these redundant multiplications [5]. However, the CSD-based CSE method in [4] has the drawback that the symmetry of the FIR filter coefficients cannot be completely exploited when the bits in Vertical Common Sub-expressions (VCS) are of opposite sign. As a result, additional adders are required to obtain the symmetric part of the coefficients when more than one VCS with bits of opposite sign exists [3].

So, in our work, we use the CSE method with a binary representation of the filter coefficients, which does not destroy the symmetry of the coefficients. It provides an efficient constant multiplier and is thus applicable to reconfigurable FIR filters with low complexity [9]. According to the BCSE algorithm, a total of $2n$ $(n \geq 1)$ binary common sub-expressions (BCSs) can be formed out of an n-bit binary word, and the number of adders required to generate the partial products for an n-bit BCS is $2(n-1)+1$. The shift-and-add multiplication of the input (X) by a coefficient value (H) [5] can be written as

$$X \cdot H = \frac{X}{2} + \frac{X}{8} + \frac{X}{16} + \frac{X}{32} + \frac{X}{64} + \frac{X}{128} + \frac{X}{256} \tag{1}$$

The partial product generated for the 2-bit BCS 11 is

$$X_1 = X + \frac{X}{2} \tag{2}$$

Substituting (2) in (1),

$$X \cdot H = \frac{X}{2} + \frac{X_1}{8} + \frac{X_1}{32} + \frac{X_1}{128} \tag{3}$$

Each term on the right-hand side of (3) corresponds to one partial product generated by the 2-bit BCSE algorithm; for the coefficient word length used in the filter there are eight such partial products (shown as M7-M0 in Fig. 1). These are summed by the Multiplier Adder Tree (MAT) (shown as A1-A7 in Fig. 7), giving the product according to (3). This BCSE method can be formulated as a low-complexity solution for realizing fixed-coefficient, application-specific filters.
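As a numerical check of the substitution, the following Python sketch evaluates the product once term by term as in (1) and once by reusing X1 from (2) for each pair of adjacent nonzero coefficient bits. It assumes the coefficient H = 0.10111111 (binary) implied by the shifts in (1); the function names are illustrative and are not taken from the original design.

```python
from fractions import Fraction

def direct_shift_add(x):
    """Equation (1): one shifted copy of X per nonzero coefficient bit."""
    shifts = [1, 3, 4, 5, 6, 7, 8]           # assumed H = 0.10111111 in binary
    return sum(Fraction(x, 2 ** s) for s in shifts)

def bcse_shift_add(x):
    """Reuse X1 = X + X/2 (2-bit BCS '11') for every adjacent pair of ones."""
    x1 = x + Fraction(x, 2)                   # equation (2): the only extra adder
    return Fraction(x, 2) + x1 / 8 + x1 / 32 + x1 / 128

x = 200
assert direct_shift_add(x) == bcse_shift_add(x) == Fraction(191 * x, 256)
```

Both forms give 191X/256, but the second needs four additions (one to form X1 and three to combine the four terms) instead of the six additions of the term-by-term sum.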

Fig. 1: Architecture of the Reconfigurable RRC Filter.

III. PROPOSED METHOD

We have implemented an existing reconfigurable RRC filter architecture that can be configured as a 25-, 37-, or 49-tap filter, with interpolation factors of 4, 6, and 8 respectively, and with two different roll-off factors (0.22 and 0.35). The reconfigurable RRC filter architecture consists of four major modules, viz. a data generator (DG), a coefficient generator (CG), a coefficient selector (CS), and a final accumulation unit (FA). The architecture of the RRC FIR filter design is shown in Fig. 1 [2]. In the proposed design, we use the existing RRC filter architecture and replace all of its adders with the modified Carry Select Adder (CSLA) shown in Fig. 10. We use this CSLA because it is more power-efficient than other adders [10]. The maximum operating frequency and the number of slices of this modified filter architecture are then analyzed.
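The three filter lengths are consistent with a filter that spans six symbol periods, since 6 x 4 + 1 = 25, 6 x 6 + 1 = 37, and 6 x 8 + 1 = 49. Under that assumption (the span is not stated explicitly above), the six coefficient sets can be sketched in Python from the standard closed-form RRC impulse response. This is only an illustration: it is not the FDATool design flow used in this work, the taps are left unscaled and unquantized, and `rrc_taps` is a hypothetical helper name.

```python
import math

def rrc_taps(interp, beta, span=6):
    """Root-raised-cosine taps: `interp` samples per symbol, `span` symbols long."""
    half = span * interp // 2
    taps = []
    for n in range(-half, half + 1):
        t = n / interp                        # time in symbol periods
        if n == 0:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif math.isclose(abs(4.0 * beta * t), 1.0):
            # Removable singularity at t = +/- 1/(4*beta)
            h = (beta / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(math.pi / (4.0 * beta))
                + (1.0 - 2.0 / math.pi) * math.cos(math.pi / (4.0 * beta)))
        else:
            num = (math.sin(math.pi * t * (1.0 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1.0 + beta)))
            h = num / (math.pi * t * (1.0 - (4.0 * beta * t) ** 2))
        taps.append(h)
    return taps

for interp in (4, 6, 8):
    for beta in (0.22, 0.35):
        print(interp, beta, len(rrc_taps(interp, beta)))
```

Each coefficient set is symmetric about its centre tap, which is the property the binary (rather than CSD) coefficient representation is meant to preserve.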

A. Data generation (DG) block

This block takes the 16-bit filter input (RRCIN) and a 2-bit interpolation select line (INTP SEL). Based on INTP SEL, one of the clock signals CLK4, CLK6, and CLK8 (for the 25-, 37-, and 49-tap filters respectively) is selected for sampling the input data by a factor of 4, 6, or 8.

B. Coefficient Generator (CG) block

The filter coefficients are generated according to the specifications using the Filter Design and Analysis Tool (FDATool) in MATLAB. The coefficient values are assigned directly in the Verilog code [1]. The data flow diagram of the CG block for programmable coefficient sets is shown in Fig. 2 [2]. The inputs to the First Coding Pass (FCP) block are two sets of filter coefficients differing only in roll-off factor (0.22 and 0.35). Inside the FCP block, three coding pass (CP) blocks run in parallel for the three different interpolation factors.

The outputs of the FCP block are three sets of coded coefficients, which are passed to the Second Coding Pass (SCP) block. The FCP block diagram is shown in Fig. 3 [2]. In the SCP block, the common terms present vertically across these three coded coefficient sets are identified and coded accordingly. The architecture of the SCP block is shown in Fig. 4 [2]. In the Partial Product Generator (PPG) block, the shift-and-add method is used to generate the partial products for the multiplication of the input data (RRCIN) by the coded coefficients. Because the 2-bit BCSE algorithm is used, the filter architecture considers the 2-bit BCSs 00 to 11. Of these four BCSs, only the pattern 11 requires an adder. This reduces the hardware and improves the speed of the multiplication. The architecture of the PPG block is given in Fig. 5 [2]. Depending on the coded coefficients, the multiplexer unit selects the appropriate data generated by the PPG unit. The architecture of the multiplexer and addition unit used in the CG block is shown in Fig. 6 [2]. The addition unit sums all the outputs of the PPG block after they have passed through the eight multiplexer units.

Fig. 3: Architecture for First Coding Pass (FCP).

Fig. 4: Architecture for Second Coding Pass (SCP).

Fig. 5: Architecture for shift and add method in PPG.
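The PPG, multiplexer, and addition steps described above can be modelled behaviourally in a few lines of Python. The sketch below is an assumption-laden illustration rather than the Verilog used in this work: it treats the coefficient as an unsigned 16-bit integer (so the group weights are left shifts, whereas a fractional coefficient would use right shifts), and the names `ppg` and `bcse2_multiply` are hypothetical.

```python
def ppg(x):
    """Partial products for the four 2-bit BCSs; only pattern 11 needs an adder."""
    return {0b00: 0, 0b01: x, 0b10: x << 1, 0b11: (x << 1) + x}

def bcse2_multiply(x, coeff, width=16):
    """Multiply x by an unsigned width-bit coefficient, two bits at a time."""
    if width % 2 or not 0 <= coeff < (1 << width):
        raise ValueError("coeff must fit in an even number of bits")
    table = ppg(x)                               # computed once, shared by all muxes
    mux_out = []
    for g in range(width // 2):                  # eight groups for a 16-bit word
        select = (coeff >> (2 * g)) & 0b11       # two coefficient bits drive one mux
        mux_out.append(table[select] << (2 * g)) # fixed shift = weight of the group
    return sum(mux_out)                          # addition unit: width/2 - 1 adders

assert bcse2_multiply(1234, 0xBF5F) == 1234 * 0xBF5F
```

With a 16-bit coefficient the model produces eight multiplexer outputs that are combined by seven additions, plus the single adder inside the PPG for the pattern 11.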

C. Coefficient Selector (CS) block
The CS block delivers the proper data to the final accumulation block according to the selected interpolation factor. It takes its input from the CG block. The architecture of the CS block is shown in Fig. 7 [2].

D. Final Accumulation unit
The final accumulation block consists of a chain of six adders and six registers, as there are seven sub-filters, as shown in Fig. 8 [1].
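One reading that is consistent with these counts, though not confirmed by the source, is that an interpolating filter with 6L + 1 taps and interpolation factor L never needs more than seven products per output sample, because the remaining products multiply stuffed zeros. Seven products are combined by six additions. The Python sketch below checks this against a direct zero-stuff-and-filter reference; all function names are illustrative.

```python
def taps_per_phase(num_taps, factor):
    """Count the coefficients h[p], h[p+factor], ... used by output phase p."""
    return [len(range(p, num_taps, factor)) for p in range(factor)]

def interpolate_direct(x, h, factor):
    """Reference model: insert factor-1 zeros after each sample, then filter."""
    up = []
    for sample in x:
        up.extend([sample] + [0] * (factor - 1))
    return [sum(h[i] * up[m - i] for i in range(len(h)) if m - i >= 0)
            for m in range(len(up))]

def interpolate_polyphase(x, h, factor):
    """Same output, skipping every product whose data sample is a stuffed zero."""
    y = []
    for m in range(len(x) * factor):
        k, p = divmod(m, factor)                 # newest input index, output phase
        acc = 0
        for j, i in enumerate(range(p, len(h), factor)):
            if k - j >= 0:
                acc += h[i] * x[k - j]           # at most seven products per output
        y.append(acc)
    return y

h = list(range(1, 26))                           # placeholder 25-tap coefficients
x = [3, -1, 4, 1, -5, 9, 2, -6]
assert interpolate_direct(x, h, 4) == interpolate_polyphase(x, h, 4)
print(taps_per_phase(25, 4), taps_per_phase(37, 6), taps_per_phase(49, 8))
```

For all three configurations the largest phase uses seven coefficients and the others use six.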

E. Carry Select adder (CSLA)
There are 14 adders in total in the filter design, and in the proposed design all of them are replaced by the modified CSLA shown in Fig. 10. The CSLA is one of the fastest adders and is used in many data-processing processors to perform fast arithmetic functions. It reduces the carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum [10]. The regular carry select adder architecture, shown in Fig. 9, has higher area and power consumption. The analysis of the regular and modified CSLA is tabulated in Table 1. The modified CSLA architecture consists of 5 stages (groups) of Ripple Carry Adders (RCA) and Binary-to-Excess-1 Converters (BEC), with the number of input bits increasing by 1 bit in successive stages.
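The selection principle can be illustrated with a bit-level Python model of a BEC-based carry select adder. This is a sketch under stated assumptions, not the circuit of Fig. 10: the group widths (2, 2, 3, 4, 5) are those of a common 16-bit square-root CSLA and are assumed here, and every group is given a BEC for uniformity even though the first group, whose carry-in is known, would not need one in hardware.

```python
def ripple_carry_add(a_bits, b_bits, carry_in):
    """Ripple-carry adder on LSB-first bit lists; returns (sum_bits, carry_out)."""
    sum_bits, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry

def binary_to_excess_1(bits):
    """BEC: add 1 to an LSB-first bit list, keeping the same length."""
    out, carry = [], 1
    for bit in bits:
        out.append(bit ^ carry)
        carry &= bit
    return out

def csla_bec_add(a, b, groups=(2, 2, 3, 4, 5)):
    """Add two unsigned integers of sum(groups) bits; returns (sum, carry_out)."""
    width = sum(groups)
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry, pos = [], 0, 0
    for size in groups:
        # One RCA per group, always evaluated with carry-in = 0.
        s0, c0 = ripple_carry_add(a_bits[pos:pos + size], b_bits[pos:pos + size], 0)
        # The BEC derives the carry-in = 1 result from {c0, s0}, replacing a second RCA.
        plus_one = binary_to_excess_1(s0 + [c0])
        s1, c1 = plus_one[:size], plus_one[size]
        # Multiplexer: the incoming carry selects between the two candidates.
        chosen, carry = (s1, c1) if carry else (s0, c0)
        result.extend(chosen)
        pos += size
    return sum(bit << i for i, bit in enumerate(result)), carry

total, carry_out = csla_bec_add(0xABCD, 0x1234)
assert total + (carry_out << 16) == 0xABCD + 0x1234
```

Replacing the second ripple-carry adder of each group with a BEC is what distinguishes the modified structure from the regular CSLA, which computes both candidates with full adders.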

The maximum operating frequency and the number of slices of the modified architecture are then analyzed.

IV. RESULTS AND DISCUSSION
The proposed design has been implemented on an XC3S400AN field-programmable gate array (FPGA) device using the Xilinx ISE 14.7 EDA tool. The synthesis results (Table 2) show that the proposed architecture reduces the number of slices by 88.64% relative to the existing design and improves the speed by 8.71% compared with the filter implemented in [2]. The maximum frequency analysis is also carried out in 180 nm technology, and the results are tabulated in Table 3.

Table 1: Comparison of results obtained for Regular and Modified Carry Select Adders using 180nm technology

<table>
<thead>
<tr>
<th></th>
<th>Cells</th>
<th>Area (µm²)</th>
<th>Total Power (nW)</th>
<th>Memory Usage (Kb)</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGULAR CSLA</td>
<td>53</td>
<td>574</td>
<td>20607572</td>
<td>169948</td>
</tr>
<tr>
<td>MODIFIED CSLA</td>
<td>46</td>
<td>413</td>
<td>18209230</td>
<td>168924</td>
</tr>
</tbody>
</table>

Table 2: Comparison of results in FPGA Platform

<table>
<thead>
<tr>
<th>Device</th>
<th>Filter Length</th>
<th>Max. Freq. (MHz)</th>
<th>No. of Slices</th>
</tr>
</thead>
<tbody>
<tr>
<td>XC3S400AN</td>
<td></td>
<td>69.74</td>
<td>1729</td>
</tr>
<tr>
<td></td>
<td>With CSA</td>
<td>76.10</td>
<td>540</td>
</tr>
<tr>
<td></td>
<td>With CSLA</td>
<td>75.514</td>
<td>667</td>
</tr>
</tbody>
</table>

Table 3: Comparison of results obtained from Cadence

<table>
<thead>
<tr>
<th></th>
<th>Arrival Time (ps)</th>
<th>Max. Frequency (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Without CSLA</td>
<td>6751</td>
<td>148.126</td>
</tr>
<tr>
<td>With CSLA</td>
<td>4651</td>
<td>215.007</td>
</tr>
</tbody>
</table>
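The two columns of Table 3 are related by a simple reciprocal, assuming that the reported arrival time is the critical-path delay and therefore the minimum clock period:

$$f_{\max} = \frac{1}{t_{\text{arrival}}}, \qquad \frac{1}{6751\ \text{ps}} \approx 148.13\ \text{MHz}, \qquad \frac{1}{4651\ \text{ps}} \approx 215.01\ \text{MHz}$$

These values agree with the tabulated maximum frequencies to within rounding.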

By using the modified carry select adder in the filter architecture, the maximum operating frequency of the filter is increased by 36.83%. Thus the filter design with this adder structure is faster and more efficient than the existing structure.

V. CONCLUSION

This work addresses the problems encountered in designing a reconfigurable RRC filter by proposing a two-step optimization technique that makes the filter more efficient and improves the maximum operating frequency of the design. There is also a large reduction in the number of slices when the filter is implemented on an FPGA platform. The proposed design with the modified carry select adder appears well suited to next-generation multi-standard reconfigurable DUCs in SDR systems.

REFERENCES


