Fast Fourier transform
From Academic Kids

A fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. FFTs are of great importance to a wide variety of applications, from digital signal processing to solving partial differential equations to algorithms for quickly multiplying large integers. This article describes the algorithms, of which there are many; see discrete Fourier transform for properties and applications of the transform.
Let x_{0}, ..., x_{n-1} be complex numbers. The DFT is defined by the formula
 <math> f_j = \sum_{k=0}^{n-1} x_k e^{-{2\pi i \over n} jk }
\qquad j = 0,\dots,n-1. </math>
Evaluating these sums directly would take O(n^{2}) arithmetical operations (see Big O notation). An FFT is an algorithm to compute the same result in only O(n log n) operations. In general, such algorithms depend upon the factorization of n, but (contrary to popular misconception) there are O(n log n) FFTs for all n, even prime n.
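As a point of comparison, the direct evaluation can be sketched in a few lines of Python. This is only an illustration: the function name `naive_dft` is hypothetical, and the code assumes the negative-exponent convention e^{-2πi jk/n} for the forward transform.

```python
import cmath

def naive_dft(x):
    """Evaluate f_j = sum_k x_k * exp(-2*pi*i*j*k/n) directly.

    Two nested loops over n terms each give O(n^2) operations.
    The sign of the exponent is an assumed convention.
    """
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

# A unit impulse at index 0 transforms to a constant spectrum of ones.
spectrum = naive_dft([1, 0, 0, 0])
```

Each of the n outputs needs a sum over n inputs, which is where the quadratic cost comes from.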
Since the inverse DFT is the same as the DFT, but with the opposite sign in the exponent and a 1/n factor, any FFT algorithm can easily be adapted for it as well.
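One common way to make this adaptation, sketched here under the assumption that the forward transform uses the negative exponent, is to conjugate the input, apply the forward transform, conjugate the result, and divide by n. The names `dft` and `inverse_dft` are hypothetical, and the direct O(n^2) `dft` merely stands in for any forward FFT routine.

```python
import cmath

def dft(x):
    """Direct forward DFT (stand-in for any forward FFT routine)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

def inverse_dft(f, forward=dft):
    """Inverse DFT built from a forward transform.

    Conjugating before and after the forward transform flips the sign
    of the exponent; dividing by n supplies the 1/n factor.
    """
    n = len(f)
    conjugated = [complex(v).conjugate() for v in f]
    return [v.conjugate() / n for v in forward(conjugated)]

# Round trip: the inverse of the forward transform recovers the input.
signal = [1 + 2j, -3, 0.5j, 4]
recovered = inverse_dft(dft(signal))
```

Because `forward` is a parameter, the same wrapper works unchanged with a fast implementation.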
The Cooley-Tukey algorithm
Main article: Cooley-Tukey FFT algorithm.
By far the most common FFT is the Cooley-Tukey algorithm. This is a divide-and-conquer algorithm that recursively breaks down a DFT of any composite size n = n_{1}n_{2} into many smaller DFTs of sizes n_{1} and n_{2}, along with O(n) multiplications by complex roots of unity traditionally called twiddle factors.
This method (and the general idea of an FFT) was popularized by a publication of J. W. Cooley and J. W. Tukey in 1965, but it was later discovered that those two authors had independently reinvented an algorithm known to Carl Friedrich Gauss around 1805 (and subsequently rediscovered several times in limited forms).
The best-known use of the Cooley-Tukey algorithm is to divide the transform into two pieces of size <math>n / 2</math> at each step; this form is therefore limited to power-of-two sizes, but any factorization can be used in general (as was known to both Gauss and Cooley/Tukey). These are called the radix-2 and mixed-radix cases, respectively (and other variants have their own names as well). Although the basic idea is recursive, most traditional implementations rearrange the algorithm to avoid explicit recursion. Also, because the Cooley-Tukey algorithm breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other algorithm for the DFT, such as those described below.
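The radix-2 case can be illustrated with a short recursive sketch. It is written for clarity rather than speed (practical implementations usually avoid explicit recursion, as noted above), the name `fft_radix2` is hypothetical, and the negative-exponent convention for the forward transform is assumed.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (illustrative sketch).

    Splits the input into even- and odd-indexed halves, transforms each
    recursively, and recombines them with twiddle factors.
    """
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # DFT of size n/2 over even-indexed inputs
    odd = fft_radix2(x[1::2])    # DFT of size n/2 over odd-indexed inputs
    half = n // 2
    out = [0j] * n
    for j in range(half):
        # Twiddle factor: a complex root of unity multiplying the odd part.
        t = cmath.exp(-2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + t
        out[j + half] = even[j] - t
    return out

# fft_radix2([1, 2, 3, 4]) is approximately [10, -2+2j, -2, -2-2j]
result = fft_radix2([1, 2, 3, 4])
```

Each level of recursion performs O(n) work in the recombination loop, and there are log2(n) levels, giving O(n log n) overall.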
Other FFT algorithms
Main articles: Prime-factor FFT algorithm, Bruun's FFT algorithm, Rader's FFT algorithm, Bluestein's FFT algorithm.
There are other FFT algorithms distinct from Cooley-Tukey. For <math>n = n_1n_2</math> with coprime <math>n_1</math> and <math>n_2</math>, one can use the Prime-Factor (Good-Thomas) algorithm (PFA), based on the Chinese Remainder Theorem, to factorize the DFT similarly to Cooley-Tukey but without the twiddle factors. The Rader-Brenner algorithm is a Cooley-Tukey-like factorization but with purely imaginary twiddle factors, reducing multiplications at the cost of increased additions and reduced numerical stability. Algorithms that recursively factorize the DFT into smaller operations other than DFTs include the Bruun and QFT algorithms. (The Rader-Brenner and QFT algorithms were proposed for power-of-two sizes, but it is possible that they could be adapted to general composite <math>n</math>. Bruun's algorithm applies to arbitrary even composite sizes.) Bruun's algorithm, in particular, is based on interpreting the FFT as a recursive factorization of the polynomial <math>z^n-1</math>, here into real-coefficient polynomials of the form <math>z^m-1</math> and <math>z^{2m} + az^m + 1</math>. Another polynomial viewpoint is exploited by the Winograd algorithm, which factorizes <math>z^n-1</math> into cyclotomic polynomials; these often have coefficients of 1, 0, or −1, and therefore require few (if any) multiplications, so Winograd can be used to obtain minimal-multiplication FFTs and is often used to find efficient algorithms for small factors. Indeed, Winograd showed that the DFT can be computed with only <math>O(n)</math> multiplications, leading to a proven achievable lower bound on the number of irrational multiplications for power-of-two sizes; unfortunately, this comes at the cost of many more additions, a tradeoff no longer favorable on modern processors with hardware multipliers. In particular, Winograd also makes use of the PFA as well as an algorithm by Rader for FFTs of prime sizes.
Rader's algorithm, exploiting the existence of a generator for the multiplicative group modulo prime <math>n</math>, expresses a DFT of prime size <math>n</math> as a cyclic convolution of (composite) size <math>n-1</math>, which can then be computed by a pair of ordinary FFTs via the convolution theorem (although Winograd uses other convolution methods). Another prime-size FFT is due to L. I. Bluestein, and is sometimes called the chirp-z algorithm; it also re-expresses a DFT as a convolution, but this time of the same size (which can be zero-padded to a power of two and evaluated by radix-2 Cooley-Tukey FFTs, for example), via the identity <math>jk = -(j-k)^2/2 + j^2/2 + k^2/2</math>.
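Bluestein's approach can be sketched as follows, using the identity jk = -(j-k)^2/2 + j^2/2 + k^2/2 to turn the DFT into a convolution with a "chirp" sequence. This is a minimal illustration under stated assumptions: the forward transform uses the negative exponent, the convolution is zero-padded to a power of two and evaluated with a simple recursive radix-2 FFT, and the names `fft_pow2`, `ifft_pow2`, and `bluestein_dft` are hypothetical.

```python
import cmath

def fft_pow2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft_pow2(x[0::2]), fft_pow2(x[1::2])
    out = [0j] * n
    for j in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * j / n) * odd[j]
        out[j], out[j + n // 2] = even[j] + t, even[j] - t
    return out

def ifft_pow2(f):
    """Inverse of fft_pow2 via conjugation and a 1/n factor."""
    n = len(f)
    return [v.conjugate() / n for v in fft_pow2([u.conjugate() for u in f])]

def bluestein_dft(x):
    """DFT of arbitrary length n (including prime n) as a convolution."""
    n = len(x)
    if n == 0:
        return []
    # chirp[k] = exp(-pi*i*k^2/n); k^2 is reduced mod 2n to limit rounding error.
    chirp = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    m = 1
    while m < 2 * n - 1:          # padded size: power of two >= 2n - 1
        m *= 2
    a = [x[k] * chirp[k] for k in range(n)] + [0j] * (m - n)
    b = [0j] * m                  # conjugate chirp, wrapped for negative lags
    b[0] = chirp[0].conjugate()
    for k in range(1, n):
        b[k] = b[m - k] = chirp[k].conjugate()
    # Convolution theorem: pointwise product of the two transforms.
    conv = ifft_pow2([p * q for p, q in zip(fft_pow2(a), fft_pow2(b))])
    return [chirp[j] * conv[j] for j in range(n)]

# Length 7 is prime, so radix-2 Cooley-Tukey alone would not apply.
out7 = bluestein_dft([1, 2 - 1j, 0.5, -3j, 4, 1j, -2])
```

The padded length m is at most about 4n, so the three power-of-two FFTs keep the total cost at O(n log n) for any n.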
FFT algorithms specialized for real and/or symmetric data
In many applications, the input data for the DFT are purely real, in which case the outputs satisfy the symmetry
 <math>f_{n-j} = f_j^*,</math>
and efficient FFT algorithms have been designed for this situation (see e.g. Sorensen, 1987). One approach consists of taking an ordinary algorithm (e.g. Cooley-Tukey) and removing the redundant parts of the computation, saving roughly a factor of two in time and memory. Alternatively, it is possible to express an even-length real-input DFT as a complex DFT of half the length (whose real and imaginary parts are the even/odd elements of the original real data), followed by O(n) post-processing operations.
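The half-length approach described above can be sketched as follows. The code packs the even- and odd-indexed real samples into the real and imaginary parts of a complex sequence, transforms it once, and then separates and recombines the two half-size spectra. The names `dft` and `real_dft_via_half_length` are hypothetical, the direct `dft` merely stands in for any complex FFT, and the negative-exponent convention is assumed.

```python
import cmath

def dft(x):
    """Direct complex DFT (stand-in for any complex FFT routine)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

def real_dft_via_half_length(x, complex_dft=dft):
    """Return outputs f_0 .. f_{n/2} of the DFT of real, even-length x.

    The remaining outputs follow from the symmetry f_{n-j} = conj(f_j).
    """
    n = len(x)
    if n == 0 or n % 2:
        raise ValueError("length must be even and nonzero")
    h = n // 2
    # Pack: real part = even-indexed samples, imaginary part = odd-indexed.
    z = [complex(x[2 * k], x[2 * k + 1]) for k in range(h)]
    Z = complex_dft(z)
    out = []
    for j in range(h + 1):
        zj = Z[j % h]
        zc = Z[(h - j) % h].conjugate()
        e = (zj + zc) / 2        # spectrum of the even-indexed samples
        o = (zj - zc) / 2j       # spectrum of the odd-indexed samples
        out.append(e + cmath.exp(-2j * cmath.pi * j / n) * o)
    return out

half_spectrum = real_dft_via_half_length([1.0, -2.0, 3.5, 0.0, 4.0, 2.5])
```

The post-processing loop is the O(n) step; only one complex transform of length n/2 is required.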
It was once believed that real-input DFTs could be more efficiently computed by means of the discrete Hartley transform (DHT), but it was subsequently argued that a specialized real-input DFT algorithm (FFT) can typically be found that requires fewer operations than the corresponding DHT algorithm (FHT) for the same number of inputs. Bruun's algorithm (above) is another method that was initially proposed to take advantage of real inputs, but it has not proved popular.
There are further FFT specializations for the cases of real data that have even/odd symmetry, in which case one can gain another factor of (roughly) two in time and memory and the DFT becomes the discrete cosine/sine transform(s) (DCT/DST). Instead of directly modifying an FFT algorithm for these cases, DCTs/DSTs can also be computed via FFTs of real data combined with O(n) pre/post processing.
Accuracy and approximations
All of the FFT algorithms discussed so far compute the DFT exactly (in exact arithmetic, i.e. neglecting floating-point errors). A few "FFT" algorithms have been proposed, however, that compute the DFT approximately, with an error that can be made arbitrarily small at the expense of increased computations. Such algorithms trade the approximation error for increased speed or other properties. For example, an approximate FFT algorithm by Edelman et al. (1999) achieves lower communication requirements for parallel computing with the help of a fast-multipole method. A wavelet-based approximate FFT by Guo and Burrus (1996) takes sparse inputs/outputs (time/frequency localization) into account more efficiently than is possible with an exact FFT. Another algorithm for approximate computation of a subset of the DFT outputs is due to Shentov et al. (1995). Only the Edelman algorithm works equally well for sparse and non-sparse data, however, since it is based on the compressibility (rank deficiency) of the Fourier matrix itself rather than the compressibility (sparsity) of the data.
Even the "exact" FFT algorithms have errors when finite-precision floating-point arithmetic is used, but these errors are typically quite small; most FFT algorithms, e.g. Cooley-Tukey, have excellent numerical properties. The upper bound on the relative error for the Cooley-Tukey algorithm is O(ε log n), compared to O(ε n^{3/2}) for the naïve DFT formula (Gentleman and Sande, 1966), where ε is the machine floating-point relative precision. In fact, the root mean square (rms) errors are much better than these upper bounds, being only O(ε √(log n)) for Cooley-Tukey and O(ε √n) for the naïve DFT (Schatzman, 1996). These results, however, are very sensitive to the accuracy of the twiddle factors used in the FFT (i.e. the trigonometric function values), and it is not unusual for incautious FFT implementations to have much worse accuracy, e.g. if they use inaccurate trigonometric recurrence formulas. Some FFTs other than Cooley-Tukey, such as the Rader-Brenner algorithm, are intrinsically less stable.
In fixed-point arithmetic, the finite-precision errors accumulated by FFT algorithms are worse, with rms errors growing as O(√n) for the Cooley-Tukey algorithm (Oppenheim & Schafer, 1975). Moreover, even achieving this accuracy requires careful attention to scaling in order to minimize the loss of precision, and fixed-point FFT algorithms involve rescaling at each intermediate stage of decompositions like Cooley-Tukey.
To verify the correctness of an FFT implementation, rigorous guarantees can be obtained in O(n log n) time by a simple procedure checking the linearity, impulse-response, and time-shift properties of the transform on random inputs (Ergün, 1995).
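The three properties named above can be checked with a short script. This is an informal sketch in the spirit of that procedure, not the exact algorithm or guarantees of the cited paper. The function names, the tolerance, and the number of trials are all assumptions, and the negative-exponent convention is assumed for the time-shift check.

```python
import cmath
import random

def check_fft(fft, n, trials=5, tol=1e-9, seed=0):
    """Check impulse response, linearity and time shift of `fft` at size n."""
    rng = random.Random(seed)

    def rand_complex():
        return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

    def close(u, v):
        return all(abs(a - b) <= tol * n for a, b in zip(u, v))

    # Impulse response: a unit impulse at index 0 transforms to all ones.
    if not close(fft([1 + 0j] + [0j] * (n - 1)), [1 + 0j] * n):
        return False
    for _ in range(trials):
        x = [rand_complex() for _ in range(n)]
        y = [rand_complex() for _ in range(n)]
        a, b = rand_complex(), rand_complex()
        fx, fy = fft(x), fft(y)
        # Linearity: fft(a*x + b*y) == a*fft(x) + b*fft(y).
        mixed = fft([a * p + b * q for p, q in zip(x, y)])
        if not close(mixed, [a * p + b * q for p, q in zip(fx, fy)]):
            return False
        # Time shift: delaying x cyclically by one sample multiplies
        # output j by exp(-2*pi*i*j/n).
        shifted = fft(x[-1:] + x[:-1])
        phase = [fx[j] * cmath.exp(-2j * cmath.pi * j / n) for j in range(n)]
        if not close(shifted, phase):
            return False
    return True

def reference_dft(x):
    """Direct DFT used here as the implementation under test."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def wrong_sign_dft(x):
    """Deliberately broken transform (opposite sign in the exponent)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

passes = check_fft(reference_dft, 8)
fails = check_fft(wrong_sign_dft, 8)
```

Each check costs only a constant number of transforms plus O(n) arithmetic, so the whole test runs in O(n log n) time when the routine under test is itself an FFT. The wrong-sign transform passes the impulse and linearity checks and is caught only by the time-shift check, which is why all three properties are tested.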
References
 James W. Cooley and John W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput. 19, 297–301 (1965).
 Carl Friedrich Gauss, "Nachlass: Theoria interpolationis methodo nova tractata," Werke band 3, 265–327 (Königliche Gesellschaft der Wissenschaften, Göttingen, 1866). See also M. T. Heideman, D. H. Johnson, and C. S. Burrus, "Gauss and the history of the fast Fourier transform," IEEE ASSP Magazine 1 (4), 14–21 (1984).
 P. Duhamel and M. Vetterli, "Fast Fourier transforms: a tutorial review and a state of the art," Signal Processing 19, 259–299 (1990).
 W. M. Gentleman and G. Sande, "Fast Fourier transforms—for fun and profit," Proc. AFIPS 29, 563–578 (1966).
 H. Guo, G. A. Sitton, and C. S. Burrus, "The Quick Discrete Fourier Transform," Proc. IEEE Conf. Acoust. Speech and Sig. Processing (ICASSP) 3, 445–448 (1994).
 H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus, "Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust. Speech Sig. Processing ASSP-35, 849–863 (1987).
 A. Edelman, P. McCorquodale, and S. Toledo, "The future fast Fourier transform?" SIAM J. Sci. Computing 20, 1094–1114 (1999).
 H. Guo and C. S. Burrus, "Fast approximate Fourier transform via wavelets transform," Proc. SPIE Intl. Soc. Opt. Eng. 2825, 250–259 (1996).
 O. V. Shentov, S. K. Mitra, U. Heute, and A. N. Hossen, "Subband DFT. I. Definition, interpretations and extensions," Signal Processing 41 (3), 261–277 (1995).
 James C. Schatzman, "Accuracy of the discrete Fourier transform and the fast Fourier transform," SIAM J. Sci. Comput. 17 (5), 1150–1166 (1996).
 A. V. Oppenheim and R. Schafer, Digital Signal Processing (Englewood Cliffs, NJ: Prentice-Hall, 1975).
 Funda Ergün, "Testing multivariate linear functions: Overcoming the generator bottleneck," Proc. 27th ACM Symposium on the Theory of Computing, 407–416 (1995).
External links
 Links to FFT code and information online (http://www.fftw.org/links.html)
 Online documentation, links, book, and code (http://www.jjj.de/fxt/)