Understanding Digital Signal Processing, Chapter 4
The Fast Fourier Transform (FFT) is a fast implementation of the DFT. It's not an approximation! The DFT scales with \(O(N^2)\), but the FFT scales with \(O(N \log_2 N)\). The most popular implementation of the FFT is the radix-2 algorithm, but it's limited to input sizes that are powers of two, \(N = 2^k\).
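Here's a quick numpy sanity check (mine, not from the book) that the FFT really is exact: a naive \(O(N^2)\) DFT straight from the definition and numpy's FFT agree to floating-point precision.

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) DFT straight from the definition."""
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # DFT matrix
    return W @ x

x = np.random.randn(1024)                          # N = 2^10, radix-2 friendly
assert np.allclose(naive_dft(x), np.fft.fft(x))    # exact up to rounding
```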
When deciding on sampling parameters, you should sample fast enough that there are no significant components near or beyond \(f_s/2\) (low-pass filters can also be used). You also have to sample long enough for the desired resolution. Keep in mind that the resolution is \(\Delta f = f_s/N\), and choose \(N\) as the next larger power of two that gives you your desired resolution.
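Picking \(N\) for a given sampling rate and target resolution might look like this (a sketch with made-up numbers):

```python
import numpy as np

fs = 48_000.0      # sampling rate in Hz (made-up example)
df_target = 10.0   # desired frequency resolution in Hz

# Resolution is df = fs / N, so we need at least fs / df_target samples;
# round up to the next power of two for the radix-2 FFT.
N = 1 << (int(np.ceil(fs / df_target)) - 1).bit_length()
print(N, fs / N)   # 8192 samples -> actual df ~ 5.86 Hz
```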
Once you have a sample, common processing steps on the time series include removing the mean value to kick out the (possibly overwhelmingly large) DC component of the spectrum. Secondly, you might apply a window function to reduce leakage - at the cost of losing resolution and overall signal. If the number of samples is not a power of two, you can finally zero-pad the data. FFT results can be enhanced by averaging or by after-the-fact windowing.
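In numpy, that preprocessing chain might look like this (my sketch; the Hann window is just one common choice):

```python
import numpy as np

def preprocess(x):
    """Mean removal, windowing, zero-padding - in that order."""
    x = x - np.mean(x)                    # kick out the DC component
    x = x * np.hanning(len(x))            # window to reduce leakage
    N = 1 << (len(x) - 1).bit_length()    # next power of two
    return np.pad(x, (0, N - len(x)))     # zero-pad for the radix-2 FFT

X = np.fft.fft(preprocess(np.random.randn(1000)))   # 1000 samples -> 1024 bins
```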
Note that the interpretation of DFTs depends on whether the original signal was real-valued or complex-valued. DFTs of real-valued signals are symmetric, so only computing the first half suffices. For complex-valued signals, all FFT bins are independent and you should look at all of them! When calculating signal amplitudes, you have to divide by a factor of \(N/2\) for real-valued signals and by \(N\) for complex-valued signals, and you also have to divide out a processing loss factor if you used a window function. (These factors can be looked up in the literature.)
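As a concrete example (my own numbers, not from the book): recovering the amplitude of a real sine from a Hann-windowed FFT, where the processing loss factor is the window's coherent gain of roughly 0.5.

```python
import numpy as np

fs, N = 1024.0, 1024
t = np.arange(N) / fs
x = 2.0 * np.sin(2 * np.pi * 100 * t)    # real signal, amplitude 2, exactly on bin 100

win = np.hanning(N)
X = np.fft.fft(x * win)

# Real-valued signal: divide by N/2, then divide out the window's
# processing loss (coherent gain = mean of the window, ~0.5 for Hann).
coherent_gain = win.sum() / N
amp = np.abs(X[:N // 2]) / (N / 2) / coherent_gain
print(amp.max())                          # ~2.0, at the 100 Hz bin
```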
The normalized power spectrum is calculated by normalizing the peak power to 1. In this case you don't have to worry about scale factors. It is always given in dB, which helps identify low peaks next to big ones.
\[X_\mathrm{dB}(m) = 10 \log_{10}\left( \frac{\left| X(m) \right|^2}{\left| X(m) \right|_\mathrm{max}^2}\right) = 20 \log_{10}\left( \frac{\left| X(m) \right|}{\left| X(m) \right|_\mathrm{max}}\right)\]

The idea of the FFT algorithm is to avoid costly complex multiplications. In a plain DFT you often repeat the same multiplication over and over. In the FFT, you split the sum of the DFT into two sums over odd and even indices, pull a phase factor (called the twiddle factor) out of the odd-indexed sum, and realize (after some algebraic magic) that you are now performing two \(N/2\)-point DFTs. You can do this over and over again, until you are doing 1-point DFTs. The data flows in a "butterfly" pattern. I have to admit I did not read this in detail - it gets boring quite fast.
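The whole recursion fits in a few lines of numpy (a minimal sketch of the decimation-in-time idea described above, assuming a power-of-two length, not the book's optimized iterative version):

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 FFT; assumes len(x) is a power of two."""
    N = len(x)
    if N == 1:
        return x                          # a 1-point DFT is the sample itself
    even = fft_radix2(x[0::2])            # N/2-point DFT over even indices
    odd = fft_radix2(x[1::2])             # N/2-point DFT over odd indices
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    # The butterfly: combine the two half-size DFTs.
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

x = np.random.randn(8) + 0j
assert np.allclose(fft_radix2(x), np.fft.fft(x))
```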
The bottleneck in FFT implementations used to be the complex multiplications, but newer processors can do these in a single cycle (the author claims). The remaining shuffling of the input sequence ("bit reversal") is currently more of a bottleneck than the multiplications, but there are some tricks to speed this up. For example, if you are performing a convolution (multiplying two FFTs and then doing an IFFT), then with clever algorithm design you don't need to shuffle anything.
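The convolution trick in numpy terms (a sketch of the use case, not of the shuffle-free algorithm design itself): multiplying two FFTs and inverse-transforming gives the circular convolution.

```python
import numpy as np

a, b = np.random.randn(1024), np.random.randn(1024)

# Circular convolution via the FFT: multiply the spectra, transform back.
fast = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

# Direct O(N^2) circular convolution, as a check.
slow = np.array([np.sum(a * np.roll(b[::-1], k + 1)) for k in range(1024)])
assert np.allclose(fast, slow)
```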
There are several FFT implementations that can be broadly categorized as decimation-in-frequency (better for complex inputs) and decimation-in-time (better for real inputs). There is also "constant-geometry", which is best suited for hardware implementations. Read the second half of chapter 4 for these gnarly details, if you ever need them.