How.ConvolutionInKyma
Question (or Task)

There has been a lot of discussion about this lately, and I've looked into the math a bit, but I need expert advice. Could someone please explain (or post a Sound to demonstrate) how to do convolution in Kyma?

In particular, I am interested in the "impulse response" type reverbs that I am hearing about. From my reading of the math, there is no way these programs can be fast enough to convolve 6 seconds of impulse response in real time. Several people have suggested that they may only use convolution for the first few milliseconds, then switch to something else.

Any insights would be appreciated.

-- BillMeadows - 25 Mar 2004

Solution(s)

Convolution

You can break down any audio signal into a set of separate signals, each of which is very simple. For example, imagine you have a signal whose samples are:

x0 x1 x2 x3 x4 x5 x6 ...

This could be broken down into a set of simple signals:

x0  0  0  0  0  0  0 ...
 0 x1  0  0  0  0  0 ...
 0  0 x2  0  0  0  0 ...
 0  0  0 x3  0  0  0 ...
 0  0  0  0 x4  0  0 ...
 0  0  0  0  0 x5  0 ...
 0  0  0  0  0  0 x6 ...
...

If you were to mix (add) these signals together, you would get your original audio signal back. If you look at these signals, they are simply delayed and scaled impulses.

Now, let's say you have a recording of the impulse response of a room (or other acoustic space). The impulse response of a room is a recording of the sound in the room after an impulse has been played into it. That is, if the input to the room is:

1  0  0  0  0  0  0 ...

the impulse response is the sequence of samples recorded at a certain point in the room:

r0 r1 r2 r3 r4 r5 r6 ...

To determine the room response to your complex signal, you can simply mix (add) the room responses to each of the simple signals we had earlier:

x0*r0 x0*r1 x0*r2 x0*r3 x0*r4 x0*r5 x0*r6 ...
  0   x1*r0 x1*r1 x1*r2 x1*r3 x1*r4 x1*r5 ...
  0     0   x2*r0 x2*r1 x2*r2 x2*r3 x2*r4 ...
  0     0     0   x3*r0 x3*r1 x3*r2 x3*r3 ...
...

Each response is just a scaled and delayed version of the impulse response (because each of the signals is a scaled and delayed impulse).
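
Written out, each output sample of this mix is a sum of products (the convolution sum). For example, reading down the fourth column above, output sample y3 is:

y3 = x0*r3 + x1*r2 + x2*r1 + x3*r0

and in general, yn = x0*rn + x1*r(n-1) + ... + xn*r0.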

As you can see, the mix formula gains more terms as you get further into the recording. The number of terms increases until there are as many terms as there are samples in the impulse response. For example, a 3 second impulse response (something like an auditorium) at a 44.1 kHz sample rate would have 132300 samples. This method of performing the reverberation would then require 132300 multiplications and additions per output sample, or about 5,834,430,000 multiplications and additions per second (something like 11 giga-instructions per second, requiring 22 gigabytes per second of memory bandwidth).
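
To make the arithmetic concrete, here is a minimal sketch of this direct method (in plain Python with numpy, just for illustration, not Kyma code); the nested loop is exactly the multiply-and-add count estimated above:

import numpy as np

def convolve_direct(x, r):
    # One output sample for every input sample plus the ring-out tail.
    y = np.zeros(len(x) + len(r) - 1)
    for n in range(len(y)):
        # Add every product x[k]*r[n-k] that overlaps output sample n.
        for k in range(max(0, n - len(r) + 1), min(n + 1, len(x))):
            y[n] += x[k] * r[n - k]
    return y

Once past the start-up transient, the inner loop runs len(r) times per output sample, which for a 132300-sample impulse response at 44100 samples per second gives the 5.8 billion multiply-adds per second figure above.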

There are some tricks you can apply using the FFT to reduce this computational cost by about a factor of 2000. The most direct method would introduce a delay of at least the length of the impulse response; more clever methods can reduce this delay dramatically but at a tremendous programming cost.
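
As a sketch of the simplest FFT trick (again plain Python, not a Kyma Sound): zero-pad both signals to a common FFT length so that the circular convolution computed by the FFT equals the linear convolution, multiply the two spectra bin by bin, and transform back:

import numpy as np

def convolve_fft(x, r):
    n = len(x) + len(r) - 1           # length of the linear convolution
    nfft = 1 << (n - 1).bit_length()  # round up to a power of two
    X = np.fft.rfft(x, nfft)          # rfft zero-pads to nfft automatically
    R = np.fft.rfft(r, nfft)
    return np.fft.irfft(X * R, nfft)[:n]

Done in one shot like this, the whole input is needed before any output can be produced, which is the delay mentioned above; real-time implementations instead process the input in blocks and partition the impulse response, trading latency against the programming cost noted.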

Alternative Methods

In general, people use convolution to combine the spectral characteristics of one signal with those of another signal. Using convolution to impose the characteristics of a room response on an audio signal to create artificial reverberation is just one special case of this more general goal.

For the general case of cross-synthesis, researchers have taken several different approaches to extracting spectral characteristics from one signal and applying them to another, for example, vocoding.
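
As one illustration of that idea, here is a crude vocoder-flavored sketch (plain Python with numpy; the frame size, hop, and normalization are my own assumptions for illustration, not how any particular Kyma Sound works): take short windowed FFT frames of both signals, scale the carrier's spectrum by the modulator's magnitude spectrum, and overlap-add the results:

import numpy as np

def cross_synthesize(carrier, modulator, frame=1024, hop=256):
    n = min(len(carrier), len(modulator))
    out = np.zeros(n)
    win = np.hanning(frame)
    for start in range(0, n - frame, hop):
        C = np.fft.rfft(carrier[start:start + frame] * win)
        M = np.fft.rfft(modulator[start:start + frame] * win)
        # The modulator contributes its magnitude (spectral envelope);
        # the carrier keeps its own phase and harmonic structure.
        mags = np.abs(M) / (np.abs(M).max() + 1e-12)
        out[start:start + frame] += np.fft.irfft(C * mags, frame) * win
    return out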

For the special case of reverberation, researchers starting with Manfred Schroeder (and continuing to the present) have examined room impulse responses and concluded that they are organized (roughly) as discrete early reflections followed by a dense reverb tail whose amplitude falls off exponentially. Many reverberation simulations are based on this organization. The audio signal goes through a set of delays and attenuators to simulate the early reflections (the first and second reflections of the audio signal off the closest surfaces). These early reflections are mixed and fed into a series of resonators to simulate the reverb tail.
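
A bare-bones sketch of that structure (plain Python; the delay lengths and gains are arbitrary illustration values, not tuned to any real room): a few taps for the early reflections feeding parallel comb filters and series allpasses for the tail:

import numpy as np

def comb(x, delay, feedback):
    # Recirculating delay line: each trip around the loop is one echo.
    y = np.zeros(len(x))
    buf = np.zeros(delay)
    for n in range(len(x)):
        i = n % delay
        y[n] = buf[i]
        buf[i] = x[n] + feedback * y[n]
    return y

def allpass(x, delay, g):
    # Schroeder allpass: adds echo density with a flat frequency response.
    y = np.zeros(len(x))
    buf = np.zeros(delay)
    for n in range(len(x)):
        i = n % delay
        v = x[n] + g * buf[i]
        y[n] = buf[i] - g * v
        buf[i] = v
    return y

def reverb(x, sr=44100):
    # Early reflections: a few scaled, delayed copies of the input.
    early = np.zeros(len(x))
    for seconds, gain in [(0.013, 0.7), (0.023, 0.5), (0.037, 0.35)]:
        d = int(seconds * sr)
        early[d:] += gain * x[:-d]
    # Dense tail: parallel combs mixed into series allpasses.
    tail = sum(comb(early, d, 0.82) for d in (1422, 1491, 1557, 1617))
    for d in (225, 556):
        tail = allpass(tail, d, 0.7)
    return x + 0.3 * tail

Each of the numbers above (tap times, feedback amounts, comb and allpass lengths) is one of the individually tunable parameters discussed in the next paragraph.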

One advantage to these "physical model" type approaches is that there are several parameters (the strength and delay time of each early reflection, the resonant frequency, bandwidth and response type of the resonators) that can be individually tuned to alter the apparent acoustic space. Another advantage is that these methods are relatively cheap in terms of computational costs. A major disadvantage is that it can be time-consuming to find parameter settings that sound like specific rooms.

Things to try in Kyma

For general cross-synthesis, check the Kyma Sound Library, especially:

and in the Prototypes:

For reverberation effects, look in the Kyma Sound Library for examples of the early reflections/reverb tail models (all found in Reverb.kym in the Effects Processing folder):

HalVerb: Synthesizes the reverb tail only, using Allpass reverberators
Xpanded EuVerb: Synthesizes the reverb tail only, using Allpass and HarmonicResonator reverberators
10 tap input w/Stereo Rev: Combines simple early reflections with the EuVerb reverb tail

(Reverb.kym also has a number of alternative reverb-like effects.)

Also, search the Prototypes (Ctrl+B) for "verb" to find several prototypes for reverberation effects.

To see an algorithm for convolution, take a look at Convolution.txt, a text file that will perform a convolution of a pair of monaural samples. I whipped it up in a hurry, so it is not particularly efficient, has no error checking, and stops at the end of the file instead of allowing the reverberation to ring out. To use it, open the file in Kyma, select all of the text (Ctrl+A), and then evaluate the text (Ctrl+Y).

-- KurtHebel - 26 Mar 2004

Thanks Kurt. You confirmed my suspicions about this. I have achieved some interesting results by doing complex multiplication of the outputs of two FFT blocks and then taking the iFFT, but this approach is limited by the number of points in the FFT.
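
The standard way around that limit is overlap-add (sketched here in plain Python, as an illustration rather than a Kyma patch): chop the input into blocks, FFT-convolve each block with the impulse response using an FFT just long enough for one block plus the response, and add the overlapping tails into the output, so the input can be arbitrarily long:

import numpy as np

def convolve_overlap_add(x, r, block=4096):
    n = block + len(r) - 1             # linear length of one block's result
    nfft = 1 << (n - 1).bit_length()
    R = np.fft.rfft(r, nfft)           # transform the impulse response once
    y = np.zeros(len(x) + len(r) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        conv = np.fft.irfft(np.fft.rfft(seg, nfft) * R, nfft)
        m = len(seg) + len(r) - 1      # valid part of this block's result
        y[start:start + m] += conv[:m]
    return y

The impulse response still has to fit in a single FFT here; partitioning the impulse response into blocks as well is the more clever (and more laborious) method Kurt alluded to above.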

I'll try your text routine as soon as possible; it could be just the ticket. Convolution is useful for many interesting audio effects besides simple reverb: take the impulse response of the inside of a washing machine, convolve it with a tuba, and hear a tuba in a washing machine!

-- BillMeadows - 26 Mar 2004

Try it on two very short samples first; it is quite slow! (If you need to interrupt it, try Ctrl+U, which should interrupt any running Smalltalk program.)

-- KurtHebel - 27 Mar 2004

I just wanted to add another way of describing how convolution works for those who (like me) are not that up on maths.

CONVOLUTION WITHOUT THE MATHS.

Imagine putting a speaker at the front and a mic at the back of a large hall, playing a single positive-going click (one sample wide, 1/44100 of a second) through the speaker, and recording the result with the mic. The recording will be the impulse response of the hall, speaker, and mic setup, and will be a representation of how the sound reached the mic after bouncing around the walls and slowly dying away.

Let's say this impulse response lasted four seconds and we put this recording into a sample player (with all the notes set to the same original pitch); then we could play the sound of the reverb in that hall, but only if our sound source was a single click. This doesn’t seem very useful at all.

Now imagine that the source sound was a click at half the level; we wouldn't need to go back and record a new impulse, because we could just play the same sample at half level to hear what the reverb would sound like.

Now if the source was a negative-going click we could just play the whole sample inverted.

Now what if the source signal was two clicks one second apart, with the first at full level and the second at half level? Then we could play the sample twice at the two different levels, but we would need a sample player with a polyphony of at least two, as the first sample would still be playing when the second (half-level) sample starts.

Now if our source signal was ten clicks at different levels and polarities, we could emulate the reverb with a sample player having a polyphony of ten, as long as we started the samples at the right time and made sure that their levels match the clicks.

But who wants to hear the reverb of clicks? What we want is the reverb of real sound, but real digital sound can be considered a stream of clicks (44100 clicks per second at a 44.1 kHz sample rate) that sit next to each other with no gaps between them.

So if we had a sample player with a polyphony of 44100, and we started up each poly sample one after the other (as if the source clicks now lay next to each other with no gaps between them), and we set the level of each poly sample to match each individual sample of the input signal, then we could make the reverb of any input signal, as long as it was no longer than one second. The clicks stop being clicks when they represent the individual sample steps of an evenly flowing signal.

If we had a sample player with a polyphony of 4 * 44100 (176400) we could put any signal in with no time limit. This is because after 4 seconds the first poly sample would have finished playing and would then become available to be reused for the next incoming click (or sample).

Note that all 176400 of these poly samples can be playing at the same time (albeit staggered by one sample each), so you can see how convolution can be very processor intensive. This big sample player doesn't need to play samples at different pitches, and every voice uses the same impulse waveform, but it does need a different level and/or polarity for each poly sample.
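
This sample-player picture translates directly into code (a plain Python sketch; the "polyphony" lives implicitly in the output buffer): every input sample starts a voice, i.e. adds a scaled copy of the impulse response into the output at its own start time:

import numpy as np

def convolve_sample_player(x, r):
    y = np.zeros(len(x) + len(r) - 1)
    for n, xn in enumerate(x):
        # Start a voice: a copy of the impulse response, scaled by this
        # input sample's level and polarity, beginning at time n.
        y[n:n + len(r)] += xn * r
    return y

This produces exactly the same output as the convolution sum earlier on this page; the polyphonic-voice view and the sum-of-products view are two ways of writing the same mix.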

Hope this helps someone.

-- PeteJohnston - 29 Jun 2004

Well done, Pete. Thanks.

-- BillMeadows - 29 Jun 2004

Just a note for newer members: while not exactly the convolution method discussed here, the CrossFilter Sound (released a few years ago and now part of the Prototype library) achieves a similar result.

-- BenPhenix - 22 Feb 2010

WebForm
Question: How do I do convolution in Kyma?
Keywords: convolution