C. Jim Cook wrote:
> This is a general problem that I'm sure many people have thought
> about for years. While it seems like it should be easy, it doesn't
> work out that way. Why? Here's a relatively easy explanation...
Hmmm... The explanation may be easy, but it is essentially wrong. I'd
give a more technical, less "digestible", but more accurate explanation.
> Suppose you are listening to two flutes. The sound of a flute can
> be represented as a sine wave.
This is a very rough model for a flute. If it were a single sine wave,
it would be very easy to track even polyphonic music, for example by
using spectral analysis (e.g., by FFT, the Fast Fourier Transform).
The real problem is that most natural sounds consist of several (up
to 30 or 40) sine waves, called partials (although usually only some of
them are prominent). The frequencies of the partials are usually
integer multiples of a base or fundamental frequency, thus they are
often called "harmonics".
[ The mathematician defines harmonics as an integer times the
[ fundamental frequency, so only those partials which are exactly
[ 2 or 3 or 4, etc., times the fundamental are really "harmonics."
[ Percussion instruments generate overtones which deviate from the
[ mathematician's harmonic series. (The piano tuner adjusts the
[ "stretch" -- the deviation from the harmonic series -- for the
[ best sound of a given piano.) If the overtones aren't a true
[ harmonic progression they are simply called overtones or partials.
[ -- Robbie
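To illustrate the point about spectral analysis, here is a minimal Python sketch (assuming NumPy is available). It compares a pure 440 Hz sine wave with a hypothetical "rich" tone made of five harmonics whose amplitudes fall off as 1/k. The function name `strongest_peaks`, the sample rate, and the amplitudes are choices made for this illustration only and do not model any real instrument.

```python
import numpy as np

def strongest_peaks(signal, sample_rate, count):
    """Return the `count` strongest spectral peak frequencies in Hz, ascending."""
    # A Hann window reduces leakage from the edges of the analysis frame.
    magnitude = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    # Keep only local maxima so the side bins of a strong peak are not counted.
    inner = magnitude[1:-1]
    is_peak = (inner > magnitude[:-2]) & (inner > magnitude[2:])
    peak_bins = np.nonzero(is_peak)[0] + 1
    strongest = peak_bins[np.argsort(magnitude[peak_bins])[-count:]]
    # Convert bin indices to Hz.
    return sorted(float(b) * sample_rate / len(signal) for b in strongest)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second, so FFT bins are 1 Hz apart

# A single sine wave: one clear peak, trivial to track.
pure = np.sin(2 * np.pi * 440 * t)

# A hypothetical tone with five harmonics of 440 Hz (amplitudes 1/k).
rich = sum(np.sin(2 * np.pi * 440 * k * t) / k for k in range(1, 6))

print(strongest_peaks(pure, sample_rate, 1))
print(strongest_peaks(rich, sample_rate, 5))
```

The pure tone yields a single peak, while the rich tone yields one peak per partial. With several such tones sounding at once, the peaks have to be sorted out note by note, which is where the trouble starts.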
The "pitch" of the tone usually corresponds to the base frequency,
but in some instruments the partial at the base frequency may be
missing or may not be the strongest partial. Sometimes partials are
missing (e.g., only the "odd" or the "even" harmonics are present), or
the partials are not even harmonic -- e.g., the piano has "stretched"
partials, and it has several partials that are unrelated to the base
frequency. Analysis of the time domain signal is even less hopeful.
Thus detecting the pitch of a single instrument is in itself a problem.
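The missing-fundamental case can be sketched in Python as well (assuming NumPy). The tone below is hypothetical: it contains only harmonics 2, 3 and 4 of 220 Hz, with amplitudes chosen arbitrarily. Taking the greatest common divisor of the peak frequencies works only because this synthetic example is perfectly harmonic and noise-free; it is not offered as a practical pitch detector.

```python
import math
from functools import reduce
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second, so FFT bins are 1 Hz apart

# Hypothetical tone: harmonics 2, 3 and 4 of 220 Hz, no energy at 220 Hz itself.
tone = (1.0 * np.sin(2 * np.pi * 440 * t)
        + 0.7 * np.sin(2 * np.pi * 660 * t)
        + 0.5 * np.sin(2 * np.pi * 880 * t))

magnitude = np.abs(np.fft.rfft(tone * np.hanning(len(tone))))

# Naive approach: call the strongest spectral peak "the pitch".
strongest_hz = int(np.argmax(magnitude))  # bin index equals Hz here

# Keep local maxima that reach at least 10% of the strongest peak.
inner = magnitude[1:-1]
is_peak = ((inner > magnitude[:-2]) & (inner > magnitude[2:])
           & (inner > 0.1 * magnitude.max()))
partials_hz = [int(b) + 1 for b in np.nonzero(is_peak)[0]]

# For exact harmonics, the greatest common divisor gives the implied fundamental.
implied_fundamental_hz = reduce(math.gcd, partials_hz)

print(strongest_hz, partials_hz, implied_fundamental_hz)
```

The strongest peak sits at 440 Hz, an octave above the 220 Hz fundamental implied by the spacing of the partials. With stretched or inharmonic partials, as on a piano, even the common-divisor trick breaks down.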
The other problem is detecting the beginning and ending of the notes.
Most methods that have accurate resolution for determining the exact
frequency of the harmonics tend to "smear" the transient behavior.
For a single voice [an isolated note] examining the time domain
(e.g., the instantaneous power of the signal) would help, but for
polyphonic signals this method is almost hopeless.
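The "smearing" follows from simple arithmetic: for a plain FFT frame, the bin spacing in Hz is the reciprocal of the frame length in seconds. A short Python sketch, with function names and example values chosen only for illustration:

```python
def window_tradeoff(sample_rate_hz, window_samples):
    """Return (FFT bin spacing in Hz, window duration in seconds)."""
    bin_spacing_hz = sample_rate_hz / window_samples
    window_duration_s = window_samples / sample_rate_hz
    return bin_spacing_hz, window_duration_s

def semitone_gap_hz(frequency_hz):
    """Distance in Hz from a note to the next equal-tempered semitone above it."""
    return frequency_hz * (2 ** (1 / 12) - 1)

# Example values (assumptions): CD-rate audio and a 4096-sample frame.
spacing, duration = window_tradeoff(44100, 4096)
gap_at_a2 = semitone_gap_hz(110.0)  # A2 = 110 Hz

print(f"bin spacing {spacing:.2f} Hz, frame length {duration * 1000:.0f} ms")
print(f"semitone gap at 110 Hz: {gap_at_a2:.2f} Hz")
```

Here a frame of roughly 93 ms gives bins about 10.8 Hz apart, which is wider than the 6.5 Hz or so separating A2 from the next semitone. Lengthening the frame sharpens the frequency estimate but blurs the moment at which a note starts or stops.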
In polyphonic music the biggest problem is that the partials coming
from the different voices "overlap" (partials coming from different
voices may have the same, or almost the same frequency), thus it is
difficult to identify which partial belongs to which voice. Some
partials may interfere with each other, causing fluctuation in the
spectrum. (By the way, this is what makes the sound of acoustic
instruments so appealing.) If there are different instruments
playing at the same time, as in an orchestra, the problem is even
harder, since besides detecting the notes, the instrument playing
each note must be identified, too.
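A small Python sketch of the overlap problem, assuming ideal harmonic partials and the approximate equal-tempered frequencies of C4 and G4 (A4 = 440 Hz). The helper names, the number of partials, and the 5 Hz tolerance are arbitrary choices for this illustration.

```python
def harmonics(fundamental_hz, count):
    """Ideal harmonic partials: integer multiples of the fundamental."""
    return [fundamental_hz * k for k in range(1, count + 1)]

def near_collisions(f1_hz, f2_hz, count=8, tolerance_hz=5.0):
    """List (partial of note 1, partial of note 2, difference in Hz) for close pairs."""
    found = []
    for i, a in enumerate(harmonics(f1_hz, count), start=1):
        for j, b in enumerate(harmonics(f2_hz, count), start=1):
            if abs(a - b) <= tolerance_hz:
                found.append((i, j, abs(a - b)))
    return found

c4_hz = 261.63  # approximate equal-tempered C4
g4_hz = 392.00  # approximate equal-tempered G4

for i, j, diff in near_collisions(c4_hz, g4_hz):
    print(f"partial {i} of C4 vs partial {j} of G4: {diff:.2f} Hz apart")
```

The 3rd partial of C4 and the 2nd partial of G4 land less than 1 Hz apart, so a spectrum shows them as one fluctuating peak; two sine waves that close together beat at a rate equal to their frequency difference.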
If that still does not seem difficult enough, the structure of the
partials of an instrument's sound depends on many parameters, most
notably the pitch and the "loudness" of the note. Moreover, the noise
and reverberation often present in real recordings make even the
simplest analysis almost impossible.
> However, back in high school, we learned (and mostly forgot) that
> sine A + sine B = cosine of (A X B)
I hope this formula is not taught (or is quickly forgotten) in the US. ;-)
> Great. Since a cosine is just a sine 180 degrees out of phase ...
The same applies here. A cosine wave is a sine wave 90 degrees out of
phase.
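For reference, the standard sum-to-product identity and the sine/cosine phase relation, written in LaTeX notation (textbook trigonometry, not taken from the quoted post):

```latex
\sin A + \sin B = 2 \sin\!\left(\frac{A+B}{2}\right) \cos\!\left(\frac{A-B}{2}\right),
\qquad
\cos x = \sin\!\left(x + 90^\circ\right)
```

When A and B are close in frequency, the cosine factor varies slowly and acts as an amplitude envelope on the sine factor, which is the beating heard between two nearly equal partials.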
If anyone is further interested in WAV-to-MIDI conversion, I'd be glad
to discuss it more deeply.
Zoli Janosy
[ Zoli co-authored a paper about transcribing Welte Mignon rolls
[ which was published by the Audio Engineering Society. He holds
[ an MS degree in electrical engineering and now works at a
[ telecommunications company in Budapest, Hungary. -- Robbie