Primefactor FFT algorithm
From Academic Kids

The Primefactor algorithm (PFA), also called the GoodThomas algorithm (1958/1963), is a fast Fourier transform (FFT) algorithm that reexpresses the discrete Fourier transform (DFT) of a size n = n_{1}n_{2} as a twodimensional n_{1} by n_{2} DFT, but only for the case where n_{1} and n_{2} are relatively prime. These smaller transforms of size n_{1} and n_{2} can then be evaluated by applying PFA recursively or by using some other FFT algorithm.
PFA should not be confused with the mixedradix generalization of the popular CooleyTukey algorithm, which also subdivides a DFT of size n = n_{1}n_{2} into smaller transforms of size n_{1} and n_{2}. The latter algorithm can use any factors (not necessarily relatively prime), but it has the disadvantage that it also requires extra multiplications by roots of unity called twiddle factors, in addition to the smaller transforms. On the other hand, PFA has the disadvantages that it only works for relatively prime factors (e.g. it is useless for poweroftwo sizes) and that it requires a more complicated reindexing of the data based on the Chinese Remainder Theorem (CRT). Note, however, that PFA can be combined with mixedradix CooleyTukey, with the former factorizing n into relatively prime components and the latter handling repeated factors.
PFA is also closely related to the nested Winograd FFT algorithm, where the latter performs the decomposed n_{1} by n_{2} transform via more sophisticated twodimensional convolution techniques. Some older papers therefore also call Winograd's algorithm a PFA FFT.
(Although the PFA is distinct from the CooleyTukey algorithm, it is interesting to note that Good's 1958 work on the PFA was cited as inspiration by Cooley and Tukey in their famous 1965 paper, and there was initially some confusion about whether the two algorithms were different. In fact, it was the only prior FFT work cited by them, as they were not then aware of the earlier research by Gauss and others.)
Contents 
Algorithm
Recall that the DFT is defined by the formula:
 <math>f_j = \sum_{k=0}^{n1} x_k e^{\frac{2\pi i}{n} jk }
\qquad j = 0,\dots,n1. <math>
The PFA involves a reindexing of the input and output arrays, which when substituted into the DFT formula transforms it into two nested DFTs (a twodimensional DFT).
Reindexing
Suppose that n = n_{1}n_{2}, where n_{1} and n_{2} are relatively prime. In this case, we can define a bijective reindexing of the input k and output j by:
 <math>k = k_1 n_2 + k_2 n_1 \mod n,<math>
 <math>j = j_1 n_2^{1} n_2 + j_2 n_1^{1} n_1 \mod n,<math>
where n_{1}^{1} denotes the multiplicative inverse of n_{1} modulo n_{2} and viceversa for n_{2}^{1}; the indices j_{a} and k_{a} run from 0,...,n_{a}1 (for a = 1, 2). These inverses only exist for relatively prime n_{1} and n_{2}, and that condition is also required for the first mapping to be bijective.
This reindexing of k is called the Ruritanian mapping (also Good's mapping), while this reindexing of j is called the CRT mapping. The latter refers to the fact that j is the solution to the Chinese remainder problem j = j_{1} mod n_{1} and j = j_{2} mod n_{2}.
(One could instead use the Ruritanian mapping for the output j and the CRT mapping for the input k, or various intermediate choices.)
A great deal of research has been devoted to schemes for evaluating this reindexing efficiently, ideally inplace, while minimizing the number of costly modulo (remainder) operations (Chan, 1991, and references).
DFT reexpression
The above reindexing is then substituted into the formula for the DFT, and in particular into the product jk in the exponent. Because e^{2πi} = 1, this exponent is evaluated modulo n: any n_{1}n_{2} = n cross term in the jk product can be set to zero. (Similarly, f_{j} and x_{k} are implicitly periodic in n, so their subscripts are evaluated modulo n.) The remaining terms give:
 <math>f_{j_1 n_2^{1} n_2 + j_2 n_1^{1} n_1} =
\sum_{k_1=0}^{n_11} \left( \sum_{k_2=0}^{n_21} x_{k_1 n_2 + k_2 n_1} e^{\frac{2\pi i}{n_2} j_2 k_2 } \right) e^{\frac{2\pi i}{n_1} j_1 k_1 }.
<math>
The inner and outer sums are simply DFTs of size n_{2} and n_{1}, respectively.
(Here, we have used the fact that n_{1}^{1}n_{1} is unity when evaluated modulo n_{2} in the inner sum's exponent, and viceversa for the outer sum's exponent.)
References
 I. J. Good, "The interaction algorithm and practical Fourier analysis," J. R. Statist. Soc. B 20 (2), 361372 (1958). Addendum, ibid. 22 (2), 373375 (1960).
 L. H. Thomas, "Using a computer to solve problems in physics," in Applications of Digital Computers (Ginn: Boston, 1963).
 P. Duhamel and M. Vetterli, "Fast Fourier transforms: a tutorial review and a state of the art," Signal Processing 19, 259299 (1990).
 S. C. Chan and K. L. Ho, "On indexing the primefactor fast Fourier transform algorithm," IEEE Trans. Circuits and Systems 38 (8), 951953 (1991).