Listening: Sound Stream as a Clock 26 Nov 2003 By: Laura Tedeschini-Lalli
International Journal of Modern Physics B
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=%22Laura+Tedeschini-Lalli%22&btnG=Google+Search
Our act of "listening" has the ability to extract patterns, assigning them, in real time, to several different time-scales... We claim that this hierarchical storage, in turn, affects our ability to correctly synchronize events in a sound stream... (The processing of the audio samples was done on a Kyma-Capybara system in the Dept. of Mathematics, Università Roma Tre.)
Discussion:
Anyone who has ever tried to remove a click from an audio recording has experienced the phenomenon described in this paper by mathematician/musician Laura Tedeschini-Lalli. We do not always correctly hear or remember the exact time placement of a click (or other disturbance) in the signal. Our misplacement of the disturbance is not, however, random. In fact, Dr. Tedeschini-Lalli maintains, we tend to assign the click to the nearest time point at which nothing new or unexpected is going on: the points she characterizes as having low entropy. It is almost as if our brains are kept busy during the high-entropy (high-information) segments of the signal, so we delay dealing with the click until we reach a pause, or a point in the signal where everything is behaving just as we expect it will. As a result, we truly believe that the click happened later than it really did.
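The mechanism described above can be sketched with a toy model. This is my own illustration, not the paper's actual procedure: estimate how "surprising" each symbol in a stream is, then snap a click to the nearest low-surprisal position. The add-one-smoothed bigram model, the median threshold, and the `snap_click` helper are all hypothetical choices made for the sketch.

```python
import math
from collections import defaultdict

def bigram_surprisal(stream):
    """Surprisal (bits) of each symbol given its predecessor, estimated
    from the stream itself with add-one smoothing."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(stream, stream[1:]):
        counts[prev][cur] += 1
    vocab = len(set(stream))
    surprisal = [0.0]  # first symbol has no context; treat as unsurprising
    for prev, cur in zip(stream, stream[1:]):
        total = sum(counts[prev].values()) + vocab
        p = (counts[prev][cur] + 1) / total
        surprisal.append(-math.log2(p))
    return surprisal

def snap_click(click_pos, surprisal):
    """Hypothesis: a click is perceived at the nearest position whose
    surprisal is at or below the stream's median (a low-entropy point)."""
    median = sorted(surprisal)[len(surprisal) // 2]
    low = [i for i, s in enumerate(surprisal) if s <= median]
    return min(low, key=lambda i: abs(i - click_pos))
```

On a repetitive stream such as `"ababababxabababab"`, the unexpected `x` is the highest-surprisal position, so a click placed there gets reported at a neighboring, more predictable point instead.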
This was an exciting paper to read, and one that inspired me to think about other applications (like music and sound design). I thought that her idea of placing "errors" in the signal was a really clever way to "probe" the way people process and understand a stream of audio. The fact that we mis-hear where those errors occur is convincing evidence that we process those signals hierarchically. It's a beautiful result. The only thing I wish she had included was a measure of the entropy based on the sounds or the phonemes, rather than on the printed letters of the words. Probably this has a lot to do with the incongruities of English spelling as compared with Italian (where there is a good correspondence between letters and sounds). That would be my "request" for a future paper: a graph of prediction and entropy for the spoken (rather than the written) phrases.
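For what it's worth, a letter-based entropy measure of the kind discussed above can be approximated directly from raw text. A unigram version (my simplification; the paper's model may well be richer) is just the Shannon entropy of the letter distribution:

```python
import math
from collections import Counter

def letter_entropy(text):
    """Shannon entropy, in bits per letter, of the unigram letter
    distribution of `text` (case-folded, non-letters ignored)."""
    letters = [c.lower() for c in text if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())
```

A phoneme-based version would run the same computation over a phonemic transcription rather than the spelling; for Italian the two would nearly coincide, while for English they can diverge considerably, which is exactly the incongruity noted above.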
For me, the paper inspired some questions on designing sound for picture. For example, if you place a "sound-effect" during some black frames, does it have more of an impact than if you had placed it during some frames with lots of unpredictable visual activity? If you place a sound-effect with a picture that has lots of visual activity (followed by a black screen), would the audience think that the sound had occurred during the black frames?