当前位置:文档之家› 关于DSP的FFT库函数的说明,包含详细使用方法

关于DSP的FFT库函数的说明,包含详细使用方法

FFT Library

Module user’s Guide

C28x Foundation Software

IMPORTANT NOTICE

Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgement, including those pertaining to warranty, patent infringement, and limitation of liability.

TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in accordance with TI’s standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

Customers are responsible for their applications using TI components.

In order to minimize risks associated with the customer’s applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such products or services might be or are used. TI’s publication of information regarding any third party’s products or services does not constitute TI’s approval, license, warranty or endorsement thereof.

Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations and notices. Representation or reproduction of this information with alteration voids all warranties provided for an associated TI product or service, is an unfair and deceptive business practice, and TI is not responsible or liable for any such use.

Resale of TI’s products or services with statements different from or beyond the parameters stated by TI for that products or service voids all express and any implied warranties for the associated TI product or service, is an unfair and deceptive business practice, and TI is not responsible nor liable for any such use.

Also see: Standard Terms and Conditions of Sale for Semiconductor Products.

https://www.doczj.com/doc/ed9678990.html,/sc/docs/stdterms.htm

Mailing Address:

Texas Instruments

Post Office Box 655303

Dallas, Texas 75265

Copyright ?2002, Texas Instruments Incorporated

Trademarks

TMS320 is the trademark of Texas Instruments Incorporated.

All other trademark mentioned herein are property of their respective companies Acronyms

xDAIS : eXpress DSP Algorithm Interface Standard

IALG : Algorithm interface defines a framework independent interface for the creation of algorithm instance objects

STB: Software Test Bench

QMATH: Fixed Point Mathematical computation

CcA : C-Callable Assembly

FIR : Finite Impulse Response Filter

IIR : Infinite Impulse Response Filter

FFT : Fast Fourier Transform

1. Complex FFT (1)

1.1. Introduction (1)

1.2. Decimation in Time FFT (2)

1.3. Spectal leakage in FTT (3)

1.4. Complex FFT computation flow (5)

1.4.1. Shuffling the input data in bit reversed order (6)

1.4.2. Windowing (8)

1.4.3. Zeroing the Imaginary Part (8)

1.4.4. Radix-2 DIT FFT computation (8)

1.4.5. Magnitude square computation (13)

1.5. Twiddle Factor (14)

1.6. Interpreting the FFT results (14)

2. Complex FFT Module (19)

2.1. CFFT32: 32-bit complex FFT (21)

3. Real FFT (27)

3.1. Introduction (27)

3.2. Real FFT Computation flow (32)

3.2.1. Packing and shuffling (33)

3.2.2. Windowing (34)

3.2.3. N-point complex FFT computation (34)

3.2.4. Split function computation (34)

3.2.5. Magnitude square computation (36)

3.3. Twiddle Factors (36)

4. Real FFT Modules (39)

4.1. RFFT32: 32-bit real FFT (41)

5. Bit reversal utilities and acquisition modules (47)

5.1. CFFT32_ACQ: Acquisition module for complex FFT (49)

5.2. RFFT32_ACQ: Acquisition module for Real FFT (53)

5.3. CFFT32_brev1: Bit reversal utility for complex FFT (57)

5.4. CFFT32_brev2: Bit reversal utility for complex FFT (59)

5.5. RFFT32_brev: Bit reversal utility for real FFT (61)

C28x F F T B E N C H M A R K S:32-b i t I m p l e m e n t a t i o n

32-b i t R e a l F F T

F F T s i z e

E x e c u t i o n C y c l e s

C a s e1:T F(Q31)C a s e2:T F(Q30)C a s e3:T F(Q30)&O T P 128 6509 6763 7017

256 14756 15394 16032

512 33081 34615 36149

1024 73422 77004 80536

32-b i t C o m p l e x F F T

128 11159 11671 12183

256 25901 27181 28461

512 59075 62146 65217

1024 132823 139991 147159

Notes:

CASE 1: Twiddle factor is in Q31 format and placed in internal memory (Zero wait state access) CASE 2: Twiddle factor is in Q30 format and placed in internal memory (Zero wait state access) CASE 3: Twiddle factor is in Q30 format and placed in OTP memory (1-wait state access)

?Texas Instruments Inc., May 2002

1.1. Introduction

Fast Fourier Transforms are an efficient class of algorithms for the digital computation of the N-point fourier transform (DFT). In general, their input sequences are assummed to be complex. The complex DFT takes two N point time domain signals and creates two N point frequency domain signals.

The term "FFT" refers to the entire set of efficient Discrete Fourier Transform (DFT) algorithms. The general form of DFT is [1]:

()()kn

N N n w n x k X ×=∑?=1

; k=0...N-1 (1)

Where, the twiddle factor, is defined as:

N nk j kn

N

e w /2π?=

(2)

We observe that for each value of

k , direct computation of ()k X involves N complex

multiplication and 1?N complex additions. Consequently to compute all N values of DFT

requires 2N complex multiplications and 2

N -N complex additions. The FFT algorithms

main duty is complexity reduction by way of decomposing the DFTs into smaller DFTs in a recursive manner. The most popular decomposition method is radix-2 decomposition. Here, the entire DFT sequence is decomposed into two smaller DFTs, which is further divided into two smaller DFTs, and so on. The recursion ends when the smallest size DFT is reached, a two-point -DFT called "butterfly”. This method significantly reduces the complexity of DFT. The reduced complexity of the radix-2 FFT algorithms is ()()N N 2

log

2× complex multiplications

and

()()N N 2log × complex additions.

Table 1. Comparison of Computational Complexity for direct DFT verses FFT.

Direct Computation

Radix-2 FFT

Number of FFT Points Complex Multiplication Complex Additions Complex Multiplication Complex

Additions 128 16,384 16,256 448

896 256 65,536 65280 1,024

2048 512 262,144 261,632

2,304

4608

Complex FFT Frequency Domain Time Domain Real Part 0 N-1 Real Part 0 N-1 Imaginary Part 0 N-1 Imaginary Part 0 N-1

1.2. Decimation in Time FFT

In the decimation in time FFT, the DFT decomposition is performed on the sequence ()

n x , which is considered as the time domain sequence. Its flow graph for eight point sequences is shown in Figure 1.

Figure 1. FFT Flow Graph for N=8 points

This flow graph immediately suggests the SW implementation of the algorithm. There are three nested loops: (a) the stage loop (b) the group loop and (c) the butterfly loop. As Figure 1 shows, there are

()N v 2log = stages (the outer loop), each stage m, has m v ?2groups and

each group has 1

2?m butterflies.

Other important considerations are:

(a) Input sequence is shuffled in a bit reversal order

(b) Twiddle factors within each group are decimated version of the twiddle factors sequence

0N w , 1N w , 2N w , (1)

2?N N w

Observe that the basic computation performed at every stage, as illustrated in Figure 1, is to take two complex numbers say the pair

)(b a ,, and multiply b by r

N w , and then add and

subtract the product from

a to form two new complex numbers )(B A ,. This basic

computation is called a butterfly because the flow graph resembles a butterfly.

Group

Butterfly

Once a butterfly operation is performed on a pair of complex numbers

)(b a , to produce

)(B A ,, there is no need to save the input pair )(b a ,. Hence we can store the result )(B A , in the same location as )(b a ,. Consequently, we require fixed amount of storage, namely

N 2 storage locations, in order store the results (N Complex number) of the computations at each stage. Since the same N 2 storage locations are used throughout the computation of the N point DFT, we say that the computations are done in place.

A second observation is concerned with the order of the input data, note that the input samples are shuffled in bit reversal form. With the input data sequence stored in bit-reversed order and butterfly computation performed in-place, the resulting DFT sequence ()k X is

obtained in natural order.

1.3. Spectral Leakage in FFT

Spectral leakage is generally present when dealing with practical signals, and may lead to problems o f interpretation. When the only frequency components present are an integer multiple of the first harmonic of the DFT, then all of the leakage components fall at the nulls of the sinc function. However, when at least one of the frequency components falls m idway between two bins, then spectral leakage occurs. It results in a smaller peak response, plus a whole series of undesirable side lobe responses corresponding to the side lobe peaks in the spectrum of the rectangular window. To reduce the spectral leakage it is common practice to use a different window function from the rectangular window – one that has a more suitable spectrum with lower side lobes.

In practice it is desirable to have a narrow main lobe and a low side lobe level. It is also important to realize that the two cannot be achieved simultaneously and a practical trade-off between the two must be tolerated. In addition to the rectangular window there are also other window functions, such as the Hanning, Hamming, Blackman and Kaiser.

Figure 3. Spectrum leakage exemplified

1.4. Complex FFT computation flow

The FFT modules provided in C28x foundation s/w use radix-2 decimation in time, inplace computation algorithm. Computation flow of N point complex FFT is exemplified in the following subsections in five phases.

1. Shuffling the N data samples in bit-reversed order

2. Windowing the N point data sequence (OPTIONAL)

3. Zeroing the Imaginary Part of complex inputs.

4. N Point Radix-2 FFT Computation

5. Magnitude Square Computation (OPTIONAL)

1.4.1. Shuffling the input data in bit-reversed order

The Radix-2 complex FFT algorithm transforms the N -complex inputs representing information in time domain in to another N -complex ouptut representing the information in frequency domain. The computation buffer should have N -complex inputs in bit reversed order so that the output at the end of the computation is in natural order. The computation buffer is also called as in place computation buffer or shortly “ipcb” because of the fact that the FFT computation are done in place.

To fecilitate the user to store the data in bit reversel order in the computation buffer, we have provided three options viz.,

1. Bit reversing the in-order data stored in contiguous memory location

2. Bit reversing the in-order data stored in alternate mormoy location

3. Storing the data directly in bit reversed order while acquiring the samples

Option 1: Bit reversing the in order data stored in contiguous memory locations.

The CFFTxx_brev1 utility reads the N -point in-order real data samples stored in contiguous memory locations and writes it as N -complex data in bit revered order. The real data sample occupies the real part of the complex number and imaginary part will be zeroed before invoking the FFT computation routine. In-place bit reversal is performed when the source pointer and destination pointer is equal. This utility requires the destination array to be aligned to N ×2long words for 32-bit input data or N ×2 words for 16-bit input data.

Option 2: Bit reversing the in-order data stored in alternate mormoy location

The CFFTxx_brev2 utility reads the N -point in-order real data samples stored in alternate memory locations and writes it as N -complex data in bit revered order. The real data sample occupies the real part of the complex number and imaginary part will be zeroed before invoking the FFT computation routine. In-place bit reversal is performed when the source pointer and destination pointer is equal. This utility requires the destination array to be aligned to N ×2long words for 32-bit input data or N ×2 words for 16-bit input data.

Option 3: Bit Reversed Data acquisition module for Complex FFT (CFFTxx_ACQ)

The CFFTxx_ACQ module acquires the N real data samples and stores it as N -point complex data sequence in bit revered order. The real data samples occupy the real part of the complex number and imaginary part will be zeroed before invoking the FFT computation routine. This module requires the destination array to be aligned to N ×2long words for 32-bit input data or N ×2 words for 16-bit input data. The detailed information about the module usage is given in the CFFTxx_ACQ module documentation.

1.4.

2. Windowing

To avoid the leakage effect illustrated in the earlier sections, it would be required to window the input signal before carrying out the FFT computation. The CFFTxx_win function obtains the winptr and ipcbptr from the FFT module handle, windows the bit reversed data sequence in the computation buffer.

1.4.3. Zeroing the Imaginary Part

The N point complex FFT takes N complex data as the input. In many real time applications, the data sequence to be processed are real valued. Hence the imaginary part of the complex number must be zeroed before invoking the computation function. The CFFTxx_izero function zeros the imaginary part of complex inputs.

1.4.4. Radix-2 DIT FFT computation

The FFT algorithm converts a sampled complex -valued function of time into a sampled complex-valued function of frequency. Most of the time, we want to operate on real-valued functions, so we set all the imaginary parts of the input to zero. Figure 7, shows

how the complex input sequence

()n x and output sequence ()k X are organized in the

computation buffer for 8 point FFT. The N point complex FFT should have the computation buffer size of N 2 to store real and imaginary part of the input data.

A complete eight-point DIT FFT is illustrated graphically in Figure 7. Each pair of arrows represents a butterfly. Notice that the entire FFT computation is made up of butterflies organized in different patterns, called groups and stages. The first stage consists of four groups of one butterfly each. The second stage has two groups of two butterflies, and the last has one g roup of four butterflies. Every stage contains N/2 (four) butterflies. Each butterfly has two input points, called the dual node and the primary node. The spacing between the nodes in the sequence is called the dual-node spacing. Associated with each butterfly is a twiddle factor whose exponent depends on the group and stage of the butterfly.

Notice that whereas the output sequence is sequentially ordered, the input sequence is not. This is an effect of repeatedly dividing the input sequence into sub-sequences of even and odd samples. It is possible to perform an FFT using input and output sequences in other orders, but these approaches generally complicate addressing in the FFT program and can require a different butterfly. We have opted to scramble the input sequence of the DIT FFT because this approach uses twiddle factors in sequential order, produces the output sequence in sequential order, and requires a relatively simple butterfly.

The characteristics of an N -point DIT FFT with bit-reversed inputs are summarized below.

It is clear that the Last stage of the

N point radix-2 FFT contains one group with 2N

butterfly. The twiddle factor sequence required for the last stage to perform butterfly computation is {

0N w , 1N w ,2N w ,3N w , 4N w ,5N w , (1)

2

?N N

w }. The twiddle factors that are required within the groups of the remaining stages are decimated version of the above

mentioned twiddle factor sequence. Twiddle factors are obtained for FFT computation by looking up in a table containing the needed twiddle factors.

The twiddle factor can be divided into real and imaginary parts because, ()()

N r j N r e

jWI WR w N

r

j r N

πππ2sin 2cos 2?==?=? , Where 12:0?=N r (3)

Hence, the twiddle factors are initialized in memory as SIN and COS values. Storing the above twiddle factor would require N locations, that is 2N samples containing half

cycle of SIN and

2

N samples containing half cycle of COS as shown in figure 8a. Instead we store N 43samples containing 4

3 cycle of SIN as shown in figure 8b to

conserve the memory.

These twiddle factor values are assembled into “FFTtf” section, which are loaded into the non-volatile memory. The length of the table is

4

3 of the FFT length that you want to use. Note that the 32-bit FFT implementation uses the 32-bit twiddle factor and 16-bit FFT implementation uses the 16-bit twiddle factor.

A generalized butterfly flow graph is shown in Figure 9. The primary node sample is represented as i r jP P P += and similarly the duel node samples is represented as

i r jQ Q Q +=. The twiddle factor (r N w ) for butterfly computation is represented as i r jW W W ?=.

The dual node (i r jQ Q Q

+=) is multiplied by the twiddle factor i r jW W W ?=. The

result of this multiplication is added to the primary node i r jP P P += to produce '''i r jP P P += and subtracted from the primary node to produce '''i r jQ Q Q +=.

Equation (4) through (7) calculate the real and imaginary parts of the butterfly outputs, it requires 4 multiplication and 6 add/sub operations. In order to avoid the overflow in the butterfly outputs and also to distribute the

N 1

scaling (required in DFT) evenly across

the stages, the butterfly outputs are scaled by 2

1. The generic butterfly is implemented as

macro (“BFLY” Macro).

The butterfly produces two complex outputs that become butterfly inputs in the next stage of the FFT. Because each stage has the same number of butterflies (N/2), the number of butterfly inputs and outputs remains the same from one stage to the next. An “in-place” implementation writes each butterfly output over the corresponding butterfly input ('

P &

'Q overwrites P & Q ) for each butterfly in a stage. In an in-place implementation, the

FFT results end up in the same memory range as the original inputs.

The twiddle factor sequence for the first stage is 2

N

r N w ×, where 0:0=r

1. 0=r

à

012

0j w jW W W N

N

i r ?==?=×

The twiddle factor sequence for the second stage

4

N

r N

w ×, where

1:0=r

1. 0=r à 014

0j w jW W W N

N i r ?==?=×

2. 1=r

à

4

1N

N

w ×=N

N

j e

42π?=()()102sin 2cos j j ?=?ππ

The first two stages do not need any multiplication as the real part and imaginary part of the twiddle factors are either “0” or “1”. Hence the first two stage need not have to use the general butterfly macro which consumes more number of cycles to execute. To enhance the execution speed we combined the first two stages and implemented as Radix-4. This radix-4 macro takes blocks of 4 complex input and computes 4 complex output for the stage 3 of the FFT algorithm. Equation (8) through (15) calculate the real and imaginary parts of the Radix-4 butterfly outputs and it is implemented as macro (“COMBO” Macro).

1

In order to avoid the overflow in the butterfly outputs and also to distribute the N scaling (required in DFT) evenly across the stages, the butterfly outputs are scaled by

1.

4

The twiddle factor sequence for the third stage

8

N

r N

w ×, where

3:0=r

1. 0=r à ()()010sin 0cos 8

0j j w jW W W N

N i r ?=?==?=×

2. 1=r à 8

1N

N w ×=N

N

j e

82π?=()()707.0707.04sin 4cos j j ?=?ππ

3. 2=r à 8

2N

N w ×=N

N

j e 822π?=()()102

sin 2cos j j ?=?ππ 4. 3=r

à

8

3N

N

w ×=N

N

j e

832π?=((707.0707.04

3sin 43cos j j ??=?ππ

From the above twiddle factors, it is obvious that 2nd

/4th

twiddle factors require

multiplication and 1st /3rd

twiddle factor does not need the multiplication in the butterfly

calculation. Moreover we can enhance the execution speed for 2nd and 4th

twiddle factors, as the absolute value of real and imaginary parts are same. Hence the third stage is implemented in an unrolled loop using dedicated macro for each of the above 4 twiddle factors viz., ZEROI, PIBY4I, PIBY2I and P3BY4I macro.

In summery,

? The first three stages of FFT computation are unrolled.

? First two stages are combined and implemented using radix-4 butterfly.

? Third stage is implemented using dedicated macro for the individual twiddle factors. ? The remaining stages are implemented in a loop using generic BFLY macro.

? Twiddle factors are assembled in “FFTtf” section that should be loaded into non-volatile memory. The size of the twiddled factor table is

4

3 of the FFT length. Note that the 32-bit FFT implementation uses the 32-bit twiddle factor and 16-bit FFT implementation uses the 16-bit twiddle factor.

1.4.5. Magnitude Square Computation

The magnitude square computation routine obtains the Magnitude Square of the complex FFT output and stores back the result either in the computation buffer or in a dedicated array as commanded by the magptr element of the complex FFT module. ()()()k jX K X k X i r += ()()()222

k X k X k X i r +=

In the case of in-place magnitude computation, magptr element of the FFT module should point to the computation buffer and the N magnitude square outputs will overwrite the first N locations in the computation buffer. If the magnitude square outputs are to be stored in a separate array, then magptr element of FFT module should point to that array. The size of the array to hold the magnitude outputs is equal to the FFT length.

相关主题
相关文档 最新文档