Here is an interesting variation on the idea behind the Dolby Spectral Processor.
The Dolby Model 740 Spectral Processor is a 3-band version, derived from Ray Dolby's earlier work on Dolby A noise reduction. What he found was that splitting the sound into a few spectral bands, hard limiting each band at a very low threshold, applying substantial gain to the limited outputs, and then adding the result back to the original signal produces a compression curve shaped like a ski-jump: nearly unity gain at high levels (so that loud, impulsive sounds don't produce substantial overshoots), increasingly compressive at lower levels, and finally uniform amplification at levels below the threshold.
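Here is a minimal numeric sketch of that ski-jump static curve for a single band, written in Python/numpy since Kyma itself is a graphical environment: hard-limit the band level at a low threshold, apply gain, and add it back to the dry band. The -40 dB threshold and +12 dB gain are only assumptions for illustration, not Dolby's values.

```python
import numpy as np

threshold = 10 ** (-40 / 20)   # assumed sidechain limiting threshold (-40 dBFS)
gain      = 10 ** ( 12 / 20)   # assumed make-up gain on the limited sidechain (+12 dB)

level_db  = np.arange(-80.0, 1.0, 10.0)              # input band levels, dBFS
level_lin = 10 ** (level_db / 20)

sidechain = gain * np.minimum(level_lin, threshold)  # hard limit, then boost
out_db    = 20 * np.log10(level_lin + sidechain)     # sum back with the dry band

for i_db, o_db in zip(level_db, out_db):
    print(f"in {i_db:6.1f} dB -> out {o_db:6.1f} dB (net gain {o_db - i_db:+5.1f} dB)")
```

Running this shows roughly +14 dB of gain well below the threshold, tapering smoothly to near unity gain at 0 dBFS, which is the ski-jump shape described above.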
So I decided to try my hand at implementing this description in the frequency domain, where we might have even better control and far fewer problems with phase cancellation at band edges. That's what this Sound does.
I use a short FFT size (256 samples) to try to limit the amount of temporal smearing in the processed sound. This is a stereo processor with two identical processing chains, one for the left channel and one for the right. No attempt has been made (so far) at proper stereo linking, so this is more properly a dual-mono processor.
In one chain, the sound is converted to the frequency domain with an FFT. That stream of complex-valued amplitudes is then analyzed in two ways. First I need the amplitude of each spectral component, obtained with the SqrtMagnitude Sound. This amplitude is compared with the user-settable Threshold and limited by taking the minimum of the spectral amplitude and the Threshold. The minimum is actually found by taking the max of each value subtracted from 1 (both are positive values), and then subtracting that max from 1 again to recover the minimum.
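For what it's worth, here is that min-from-max trick as a small Python/numpy sketch; the function name limited_magnitude is mine, not a Kyma name.

```python
import numpy as np

def limited_magnitude(mag, threshold):
    # min(mag, threshold) built from a max, the way the Sound does it; valid
    # because both the spectral magnitudes and the threshold lie in [0, 1].
    return 1.0 - np.maximum(1.0 - mag, 1.0 - threshold)

print(limited_magnitude(np.array([0.02, 0.10, 0.50, 0.90]), threshold=0.25))
# -> [0.02 0.1  0.25 0.25]
```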
Now I want a limiter response with something like a 50 ms release to smooth out rapid changes. To do that I send the minimum value just found through a delay equal to half the FFT size, but with feedback. The input gain is 1/17 and the feedback is 1 - 1/17. The reason for this pair of values is that 50 ms corresponds to about 2205 samples at a 44.1 kHz sample rate, but the FFT size is 256, so the frequency scans repeat at 128-sample intervals and there are only about 17 such intervals in 50 ms. This recirculating delay line therefore acts like the equivalent of 128 separate AmplitudeFollowers, each with Attack and Release set to 50 ms.
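A rough Python model of what that feedback delay does per bin: a one-pole smoother clocked once per analysis frame for each spectral bin. The class name and interface are mine, purely for illustration.

```python
import numpy as np

class FrameSmoother:
    """One-pole smoother per spectral bin, ticked once per FFT frame.

    With a 128-sample hop at 44.1 kHz, 17 frames span roughly 50 ms, so an
    input gain of 1/17 and feedback of 16/17 approximate a 50 ms
    attack/release on each bin's limited magnitude.
    """
    def __init__(self, n_bins=128, frames_per_release=17):
        self.g_in = 1.0 / frames_per_release      # 1/17 input gain
        self.g_fb = 1.0 - self.g_in               # 16/17 feedback
        self.state = np.zeros(n_bins)

    def __call__(self, limited_mag):
        # Call once per FFT frame with the limited per-bin magnitudes.
        self.state = self.g_in * limited_mag + self.g_fb * self.state
        return self.state
```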
The output of that recirculating delay line is then used as a multiplier on the I and Q carriers derived from the spectral phase through an ArcTangent Sound. When limiting, it isn't enough simply to feed the limited amplitude back through an inverse FFT. Instead we need the now-limited amplitude combined with the original spectral phase that was present in the signal at that frequency. I do this by feeding the ArcTangent output through a pair of Wavetables, one holding a Sine and the other a Cosine.
However, because the Wavetable accepts inputs ranging from -1 to 1, you can see by looking at the wavetables that these Sine and Cosine tables are upside down. In other words, they need to be multiplied by -1 somehow, in addition to being multiplied by the appropriate amplitude derived from the limiter. That is done by using a -1/17 input scaling on the recirculating delay lines.
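In plain math, the whole phase-restoration step amounts to the following sketch. In the actual Sound the sign flip and the 1/17 smoothing gain are folded into that -1/17 scaling, and the sine/cosine come from inverted wavetables driven by the ArcTangent; here arctan2 and numpy's cos/sin stand in for those blocks.

```python
import numpy as np

def relimit_bins(spectrum, limited_mag):
    # Recombine the limited (and smoothed) per-bin magnitude with each bin's
    # original phase, so only the amplitude is changed by the limiter.
    phase = np.arctan2(spectrum.imag, spectrum.real)
    return limited_mag * (np.cos(phase) + 1j * np.sin(phase))
```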
Now that we have limited the amplitudes and restored the phase, we can send the result back through an inverse FFT to be reconstructed in the time domain. The result is broadband limiting. What I want to do is selectively add the limited signal, with gain, back to the original signal, to give a frequency-selective enhancement to the sound. I do that by sending the limited portion through a Kyma 7-band GraphicEQ Sound to select the relative amplitudes in those octave bands, and then push that result through the user-settable Gain.
Once limited, EQ'd, and gained, I add this result back to the original signal. BUT!! All the FFT processing and the Graphic EQ impose an unknown but large delay on the processed signal relative to the original. I need to time-align the two before adding them together. Since I don't know offhand how many samples of delay are introduced between the forward and inverse FFTs and the GraphicEQ, I send the original signal through an identical chain of processing, sans all the limiting. That way both the original and processed signals are delayed by an equal amount at the output. That's why each of the left/right channels has one forward FFT, two inverse FFTs, and two GraphicEQ Sounds.
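Here is a rough single-channel Python/numpy model of the whole chain, mainly to show the latency-matching trick: the dry path goes through the same FFT and inverse-FFT blocks with the limiting bypassed, so both paths emerge with the same delay before they are summed. The GraphicEQ stage is omitted, and the threshold and gain numbers are arbitrary stand-ins.

```python
import numpy as np

N, HOP = 256, 128
WIN = np.hanning(N)                     # window on the input side only

def stft_chain(x, bin_fn):
    """FFT -> per-bin processing -> inverse FFT, 50% overlap-add."""
    y = np.zeros(len(x))
    for start in range(0, len(x) - N, HOP):
        X = np.fft.rfft(WIN * x[start:start + N])
        y[start:start + N] += np.fft.irfft(bin_fn(X))
    return y

def limiter(X, threshold=0.02):
    mag = np.abs(X) / (N / 2)           # rough magnitude normalisation
    g = np.minimum(1.0, threshold / np.maximum(mag, 1e-12))
    return X * g                        # limit magnitude, keep phase

x   = 0.1 * np.random.randn(44100)      # any test signal
dry = stft_chain(x, lambda X: X)        # bypassed chain: same latency as wet
wet = stft_chain(x, limiter)            # limited sidechain
out = dry + 4.0 * wet                   # user-settable sidechain gain
# (out = dry - wet would be the "background remover" variant mentioned below)
```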
The result is very interesting. I have some presets already set up for you to try. You can monitor the sidechain as well as bypass the processing altogether. By emphasizing the midband frequencies and attenuating the very high and very low frequencies, you end up enhancing the ambience already present in the incoming sound.
Doing the opposite, by enhancing the highs and lows, and dipping the midband, you give the impression of extended frequency response.
But just treating all sidechain frequencies equally also gives an interesting result.
I was surprised by how clean the 128-band limiter actually sounds. I expected severe phase noise due to inaccuracies in the phase reconstruction using the ArcTangent and the two wavetables, but these actually perform quite respectably. When you listen to the sidechain you will hear a bit of buzziness. That comes about because the intense portions of the sound (the fundamentals) have been squashed in amplitude relative to their higher harmonics. With all partials now more uniform in amplitude, the sound takes on a buzzier character.
--
DavidMcClain - 24 Mar 2004
By the way... along the way to discovering that the wavetables were upside down, I noticed another very interesting use for this kind of processing... By subtracting the sidechain processed signal from the original, you have a kind of "background remover" processor. You can be frequency selective about this removal by adjusting the graphic EQ. But any sounds below threshold are removed from the original. This might be a useful cleanup technique for some... e.g., removing street noise from sidewalk interviews?
By doing everything with linear phase, i.e., using FFTs to process in the frequency domain, and the Kyma GraphicEQ, which is an FIR filter, you avoid problems in adding and subtracting the processed signal from the original. Doing all of this with more conventional IIR filters would lead to nasty phase discrepancies between the processed signal and the original, and that would lead to incomplete cancellation of the background, for example, even where both signals had the same amplitude. Unless you match the phase between them, you will have residual signal coming out, as the arithmetic below illustrates.
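A quick bit of arithmetic on why the phase match matters: with equal amplitudes and a phase error of theta between the two signals, the residual after subtraction is |1 - e^(j*theta)| = 2*sin(theta/2) of the original level. The Python snippet below just tabulates that formula for a few example phase errors.

```python
import numpy as np

# Residual left after subtracting two equal-amplitude signals that differ
# only by a phase error theta: |1 - exp(j*theta)| = 2*sin(theta/2).
for deg in (1, 5, 10, 30, 90):
    residual = 2 * np.sin(np.radians(deg) / 2)
    print(f"{deg:3d} deg phase error -> residual at {20 * np.log10(residual):6.1f} dB")
```

Even a 10 degree phase error leaves the residual only about 15 dB below the original, which is why linear-phase (FFT/FIR) processing is important for this cancellation trick.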
--
DavidMcClain - 24 Mar 2004
... Actually... I just tried a version without the recirculating delay line for smoothing out the amplitude processing. I think it actually sounds better without that delay. More crisp.
When you use the version with the recirculating delay line, you hear a sound somewhat reminiscent of granular processing. That also has its nice and interesting points, so I left that original version in there for you to play around with. You can try changing the input scaling and feedback values to enhance that granular feel.
--
DavidMcClain - 25 Mar 2004
Here is another version, to go along with the others. I thought about what we are trying to do here: multiband limiting. That means that for spectral amplitudes above threshold you want to multiply by the ratio Threshold/Amplitude. Another ratio, with the denominator dependent on the input sound... ARCTAN!!
Four times the ArcTan block of an input ratio is everywhere slightly larger than the input, for input values between zero and one, but the difference is only about 20% at its greatest. So this might be a pretty good estimate (considering how sloppy we can be with human perception) for the ratio itself.
Right on! It sounds just fine to me, without needing all the complexity of recovering the spectral phase. We simply multiply both real and imaginary components as they are by this ratio and voila!
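A Python/numpy sketch of this simpler version follows. Note the assumption that the ArcTan block's output is atan(x) normalized by pi, so that four times its output equals 1 at an input of 1; the exact Kyma scaling may differ, so treat this as showing the shape of the approximation only.

```python
import numpy as np

r = np.linspace(0.001, 1.0, 1000)        # the ratio threshold / magnitude
approx = 4.0 * np.arctan(r) / np.pi      # "four times the ArcTan block"
print("max absolute error:", np.max(np.abs(approx - r)))      # about 0.09
print("max relative error:", np.max(np.abs(approx - r) / r))  # about 0.27, near r = 0

def limit_bins(X, threshold):
    mag = np.abs(X)
    ratio = np.minimum(1.0, threshold / np.maximum(mag, 1e-12))
    gain = 4.0 * np.arctan(ratio) / np.pi   # approximate the ratio itself
    return X * gain                         # scales Re and Im alike, so the
                                            # spectral phase is left untouched
```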
--
DavidMcClain - 25 Mar 2004
David, I find this Sound, and almost all other FFT-processed Sounds to be "buzzy". I believe this to be an artifact of the windowing or FFT length. It is quite noticeable as the audio level drops. I have heard this in almost all of the FFT Sounds you have posted over the years, as well as ones I develop myself. Is this just me, or is there something I'm missing? (fwiw - I run my system at 44.1kHz.)
--
BillMeadows - 28 Mar 2004
Hi Bill,
The FFT Windowing ought to be imperceptible, with sidebands ranging well below -60 dB relative to the carrier levels. The buzziness you hear may be due to other causes:
1. There is a certain amount of "time-aliasing" going on here, because each FFT block processes N samples and treats them as though they occurred at the time of the middle sample, and each successive block is displaced by half the block size. This tends to smear out time events like drum strikes.
2. When dealing with things like "Spectral Enhancement" we are basically amplifying all low level signals by a considerable amount (e.g., 12-25 dB) and noise which was unnoticeable before now gets elevated to perceptible levels.
3. The spectral enhancer provided here performs hard limiting at a low threshold and then amplifies the result by a considerable amount. The result is that the highest harmonics are now almost equal in level to the fundamentals, making the sidechain very buzzy-sounding. When you add that sidechain to the main signal the result is an overall increase in buzziness. One possible solution is to filter the sidechain so that no frequencies above 2-4 kHz get passed significantly. But buzziness is part of the nature of this kind of processing.
4. If you use FFT windowing, be sure to apply it to only the input or only the output, but not both. The Kyma library contains many instances of FFT processing where a window is applied at both ends. This is incorrect, and it results in ring modulation of the signal with a carrier whose frequency is (SignalProcessor sampleRate / FFTSize) Hz. For a 44.1 kHz sample rate and 256-sample FFTs this carrier is around 172 Hz.
I generally prefer to do my windowing on the input side, because this keeps down the spectral amplitudes of the higher frequencies. No windowing on the input is equivalent to a Rect window, and it produces high levels of high frequencies due to the abrupt temporal cutoff at the edges of each block. In principle, once you reassemble the output these excess spectral components ought to cancel each other out, since successive blocks will have the opposite phase in these high-frequency components. But if you apply any spectral shaping in the frequency domain, that cancellation can no longer happen completely. So I prefer to diminish the high-frequency excess ahead of my spectral processing. (See the sketch after this list for a quick way to check the both-ends windowing artifact.)
5. If you do limiting or compression inside the frequency domain (between the FFT in/out blocks) you are applying that compression to the average signal over the duration of the FFT block size. This means the attenuation will be based not on the most significant signal levels that occur during that time, but rather on all of the signal levels averaged out. Hence, the compressive attenuation is likely to be less than you might like.
6. Furthermore, when compressing/limiting in the frequency domain, each block represents an average over the duration of that block. Hence, for sounds occurring earlier than the midpoint, it is as though there is no lookahead in operation. For example, if the FFT size is 256 at a sample rate of 44.1 kHz, then each block occupies about 5.8 ms. Many applications of compression/limiting would prefer an attack time down around 0-1 ms, so you can see that doing compression/limiting in the frequency domain is a bit like running an old-style compressor without any lookahead. The attacks sneak past the clamping action, and this should increase the buzziness of the sound as well.
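Regarding point 4 above, here is a quick Python/numpy check of the both-ends windowing artifact, assuming a Hann window at 50% overlap (a plausible stand-in; the window Kyma actually uses may differ). Overlap-adding the analysis window once per hop gives an essentially flat gain, while overlap-adding the window squared (window applied at both the input and the output) leaves a strong periodic gain ripple, i.e. the ring-modulation-like artifact.

```python
import numpy as np

N, HOP = 256, 128
w = np.hanning(N)                       # Hann analysis window
frames = 400                            # a little over one second at 44.1 kHz

def overlap_add(win):
    out = np.zeros(frames * HOP + N)
    for k in range(frames):
        out[k * HOP : k * HOP + N] += win
    return out[N:-N]                    # keep only the fully overlapped region

once  = overlap_add(w)                  # window applied at one end only
twice = overlap_add(w ** 2)             # window applied at both ends

print("one-sided window:  gain ripple = %.4f" % (once.max() - once.min()))
print("two-sided window:  gain ripple = %.4f" % (twice.max() - twice.min()))
```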
When I do a straight FFT-in to FFT-out, with windowing applied at the input side only, I do not get the buzzing you describe. It sounds really quite good, except for the slight temporal smearing that occurs. Making the FFT block size small improves the time-aliasing at the expense of spectral resolution. A block size of 256 samples at a 44.1 kHz sample rate is a good compromise, leaving a 6 ms temporal smear along with a spectral resolution of about 172 Hz.
In principle, spectral shaping in the frequency domain should produce identical results to using an FIR filter in the time domain. Do you hear buzziness coming out of the Kyma Graphic EQ? If so, perhaps this has to do with the "precursor" effects of FIR filters.
IIR filters cannot produce any output until they are fed an input signal, but FIR filters will appear to produce output ahead of the arrival of the input signal. That preliminary output is called "precursors". These are generally at a very low amplitude and occur only during the front half of the FIR length (or FFT block size). Low levels occurring 2 ms before the arrival of an event are generally imperceptible, but perhaps you do hear them?
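For what it's worth, a small Python/numpy illustration of those precursors, using a generic linear-phase windowed-sinc lowpass (the tap count and cutoff are arbitrary, not the GraphicEQ's actual design): the symmetric impulse response emits low-level output ahead of its main peak, whereas an IIR filter produces nothing before its input arrives.

```python
import numpy as np

SR, TAPS, FC = 44100, 255, 0.05            # arbitrary linear-phase lowpass
n = np.arange(TAPS) - TAPS // 2
h = 2 * FC * np.sinc(2 * FC * n) * np.hamming(TAPS)   # windowed-sinc FIR

peak = int(np.argmax(np.abs(h)))           # main peak at the centre tap
for ms in (1.0, 2.0):
    k = int(round(ms * 1e-3 * SR))
    level = np.max(np.abs(h[:peak - k + 1])) / np.abs(h[peak])
    print(f"output {ms:.0f} ms or more before the main peak: "
          f"{20 * np.log10(level):6.1f} dB re. the peak")
```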
Hope these answers help...
--
DavidMcClain - 30 Mar 2004
We never did resolve the buzziness... Mr McClain's precision limiter is one of the best digital limiters - his crescendo enhancer in combo with the precision limiter could be a killer mastering tool, but for this buzz!
If you just run a sine wave through 'Crescendo Enhancement' or through 'FFTDolbySpectralProcessor' at about a LogFreq of 48, you'll hear the buzzing loud and clear - it seems strange that David never checked this?!? If anyone can figure out how to solve it, 3 years and many updates down the road, it would be much appreciated...
--
CristianVogel - 04 Sep 2007