WAV to MIDI; The Physics
By matt@Physics.usyd.edu.au, forwarded by Claus Kucher
[ Editor's note: The following article was forwarded to us by Claus, but is indexed separately in the Digest header. ]
In article <3269AE02.2455@webworldinc.com>, Christopher Weare <cweare@webworldinc.com> wrote:
> > AND THAT'S THE KEY!... "A SINGLE INSTRUMENT"
> >
> > AFTER ALL THIS DISCUSSION, I STILL CAN NOT FATHOM HOW A DEVICE (BEING
> > HARDWARE/SOFTWARE) CAN ACCURATELY TRACK MULTIPLE INSTRUMENTS, CONVERT
> > THEM TO A MIDI NOTE (OK... I CAN SEE DOING THAT VIA FFT) .... Geeeee...
> > MAYBE ONLY ONE INSTRUMENT AT A TIME IS POSSIBLE BUT .. A
> > MULTI-INSTRUMENT CONVERSION... I THINK NOT !!!!!
>
> Consider this: A human can do it. In time, machines will be able to do
> it reliably. There are already several attempts that have varying
> degrees of success and extracting the note info from multi instrument
> recordings. None are yet robust enough to survive as a commercial
> product, but it is only a matter of time. There is no fundamental
> reason why a "machine" implementation would never be able to solve this
> problem.
I've done quite a bit of signal analysis and processing as part of my physics degree, so let's think about this one....
To track a single instrument you need to pick out the fundamental frequency. Not a problem: Fourier transform it and pick out the lowest-frequency peak that stands clear of the noise. Call this a note. If you want, you can probably work out some sort of correlation between the amplitude of the original wave and MIDI velocity or volume information. Note that you are throwing out all the things that make the instrument unique, i.e. which of the higher harmonics are present, and their relative strengths.
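As a rough sketch of the single-instrument case, the following Python uses only the standard library. The function names, the 10% peak threshold, and the linear amplitude-to-velocity mapping are all assumptions made for illustration. It will also be fooled by sounds whose fundamental is weak or missing.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for a short demo)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags


def lowest_significant_peak(mags, threshold_ratio=0.1):
    """Lowest-frequency local maximum above threshold_ratio * strongest bin.

    threshold_ratio is an assumed tuning knob, not a standard value.
    """
    floor = threshold_ratio * max(mags)
    for k in range(1, len(mags) - 1):
        if mags[k] >= floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            return k
    return None


def freq_to_midi(freq_hz):
    """Nearest MIDI note number, with A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def detect_note(samples, sample_rate):
    """Return (midi_note, velocity) for one analysis window, or None."""
    k = lowest_significant_peak(magnitude_spectrum(samples))
    if k is None:
        return None
    freq = k * sample_rate / len(samples)
    # Assumed mapping: peak amplitude in [0, 1] scaled linearly to 1..127.
    velocity = max(1, min(127, round(127 * max(abs(x) for x in samples))))
    return freq_to_midi(freq), velocity


# A 440 Hz tone with two weaker overtones, 0.1 s at 8 kHz.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)
        + 0.25 * math.sin(2 * math.pi * 880 * i / SAMPLE_RATE)
        + 0.125 * math.sin(2 * math.pi * 1320 * i / SAMPLE_RATE)
        for i in range(800)]
note, velocity = detect_note(tone, SAMPLE_RATE)
print(note, velocity)
```

The overtones at 880 Hz and 1320 Hz are measured and then ignored, which is exactly the information loss mentioned above.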
Now add a second instrument. Say we have a violin and a flute. Both instruments will have a fundamental frequency and a pile of overtones. If the music is in any way tuneful, a lot of them will probably overlap. Now the human ear can detect these instruments individually, but only because they are sounds we recognise. (What if somebody created an instrument that sounded just like a flute and a violin playing in unison? You'd pick it as two instruments.)
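To see how quickly overtones collide, here is a small sketch that lists which harmonics of two notes land on nearly the same frequency. It treats each note as an ideal harmonic series (integer multiples of the fundamental). The 0.5% tolerance and the function names are assumptions for the example.

```python
def harmonics(f0, count):
    """First `count` harmonics of an ideal harmonic series on f0 (Hz)."""
    return [f0 * n for n in range(1, count + 1)]


def overlapping_partials(f0_a, f0_b, count=8, tolerance=0.005):
    """Pairs (i, j) where harmonic i of note A sits within `tolerance`
    (as a fraction of its frequency) of harmonic j of note B."""
    pairs = []
    for i, fa in enumerate(harmonics(f0_a, count), start=1):
        for j, fb in enumerate(harmonics(f0_b, count), start=1):
            if abs(fa - fb) <= tolerance * fa:
                pairs.append((i, j))
    return pairs


A4 = 440.0
E5 = A4 * 2 ** (7 / 12)  # a fifth above A4 in equal temperament

print(overlapping_partials(A4, E5))   # violin on A4, flute on E5
print(len(overlapping_partials(A4, A4)))  # unison: every harmonic collides
```

For the fifth, the third harmonic of A4 (1320 Hz) and the second harmonic of E5 (about 1318.5 Hz) fall within a couple of hertz of each other, so a single spectral peak there carries energy from both instruments.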
Okay.. so we need to tell our computer what a violin sounds like, and what a flute sounds like, by giving it a signature to work with. We know that certain harmonics will be present in each instrument, and depending on how it's played they will appear in certain ratios. We could even get the computer to analyse a section of music, looking for similarities in wave patterns to define a verse, or a chorus, or a middle eight or whatever, and then analyse the Fourier transforms of a number of tiny sections within the music to determine what instruments are present, then apply this knowledge to the music to extract the actual notes.
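One simple way to picture a signature is as a list of relative harmonic strengths, compared against an observed set of strengths by cosine similarity. The sketch below does that. The numbers in `SIGNATURES` are made up for illustration (they are not measurements of real instruments), and cosine similarity is just one plausible choice of comparison.

```python
import math

# Hypothetical signatures: relative strength of harmonics 1..4.
# Illustrative values only, not measured from real instruments.
SIGNATURES = {
    "flute": [1.0, 0.3, 0.1, 0.05],
    "violin": [1.0, 0.8, 0.6, 0.5],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two harmonic-strength vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(observed):
    """Name of the stored signature most similar to the observed strengths."""
    return max(SIGNATURES,
               key=lambda name: cosine_similarity(observed, SIGNATURES[name]))


print(classify([1.0, 0.25, 0.12, 0.04]))
print(classify([0.9, 0.7, 0.65, 0.4]))
```

Because the comparison ignores overall loudness, the same signature matches a quiet note and a loud one. A real instrument's ratios shift with how it is played, so a single fixed vector per instrument is a considerable simplification.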
It is definitely not a simple "run it through" program, but an iterative series of fits to the data. Basically what you do is apply a model (I have these model instruments playing these model notes) and change it over time to better fit the output you have. (This is a common practice in all sciences that fit models to data.)
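A toy version of such an iterative fit might look like the sketch below. Each candidate (instrument, note) pair gets a template spectrum, and the loop repeatedly adjusts one loudness per template to shrink the leftover residual. The update rule is coordinate descent for non-negative least squares, chosen here only because it is short. The signature values, bin numbers, and function names are all hypothetical.

```python
def template(signature, f0_bin, n_bins):
    """Spectrum of one model note: harmonic h goes into bin f0_bin * h."""
    vec = [0.0] * n_bins
    for h, strength in enumerate(signature, start=1):
        if f0_bin * h < n_bins:
            vec[f0_bin * h] += strength
    return vec


def fit_gains(observed, templates, sweeps=200):
    """Adjust one non-negative gain per template to reduce the residual."""
    gains = [0.0] * len(templates)
    residual = list(observed)
    for _ in range(sweeps):
        for j, t in enumerate(templates):
            energy = sum(x * x for x in t)
            if energy == 0.0:
                continue
            # Best change to this gain with all other gains held fixed.
            step = sum(r * x for r, x in zip(residual, t)) / energy
            new_gain = max(0.0, gains[j] + step)  # loudness cannot be negative
            delta = new_gain - gains[j]
            residual = [r - delta * x for r, x in zip(residual, t)]
            gains[j] = new_gain
    return gains, residual


# Hypothetical signatures (relative harmonic strengths), not measured data.
FLUTE = [1.0, 0.3, 0.1, 0.05]
VIOLIN = [1.0, 0.8, 0.6, 0.5]
N_BINS = 40

candidates = [("flute", 4), ("flute", 6), ("violin", 4), ("violin", 6)]
templates = [template(FLUTE if name == "flute" else VIOLIN, f0, N_BINS)
             for name, f0 in candidates]

# Synthetic "recording": violin on bin 4 at 0.7 plus flute on bin 6 at 0.4.
observed = [0.7 * v + 0.4 * f for v, f in zip(templates[2], templates[1])]

gains, residual = fit_gains(observed, templates)
for (name, f0), g in zip(candidates, gains):
    print(name, f0, round(g, 3))
```

On this noise-free input the fit assigns essentially all the energy to the two notes that were actually mixed in, even though their harmonics share bin 12. Real recordings add noise, detuning, and signatures that vary with playing style, which is where the heavy number crunching comes in.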
Basically, it's not a trivial problem, and given that your average piece of music contains several instruments playing several thousand notes, quite a lot of number crunching needs to be done to fit the model to the sound.
The human brain does all this in real time .. incredible.
Note the method described is essentially a brute force method. The algorithm could probably be refined, but somebody is gonna have to implement the brute force method first.
Matt
-- Plan: To retain the childlike enjoyment for the simple things in life, while acquiring the maturity to fully appreciate them.
(Message sent Mon 21 Oct 1996, 05:49:46 GMT, from time zone GMT.)