Mechanical Music Digest

Mechanical Music Digest™ Archives

April 1999

1999.04.22

WAV-to-MIDI Converter Programs
By Zoltán Jánosy

C. Jim Cook wrote:

> This is a general problem that I'm sure many people have thought
> about for years.  While it seems like it should be easy, it doesn't
> work out that way.  Why?  Here's a relatively easy explanation...

Hmmm...  The explanation may be easy, but it is essentially wrong.  I'd
give a more technical, less "digestible", but more accurate explanation.

> Suppose you are listening to two flutes.  The sound of a flute can
> be represented as a sine wave.

This is a very rough model for a flute.  If it were a single sine wave,
it would be very easy to track even polyphonic music, for example by
using spectral analysis (e.g., by FFT, the Fast Fourier Transform).
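
To illustrate the easy case described above, here is a minimal sketch
in Python (standard library only).  It assumes two "flutes" that really
are pure sine waves at 440 Hz and 660 Hz, an 8000 Hz sample rate, and a
naive DFT standing in for the FFT.  All names and values are chosen for
illustration and are not from the original discussion.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz (assumed for this example)
N = 800             # samples, so each DFT bin is SAMPLE_RATE / N = 10 Hz wide


def dft_magnitudes(signal):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT computes the same values faster)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def strongest_frequencies(signal, count):
    """Return the frequencies (Hz) of the `count` largest spectral bins."""
    mags = dft_magnitudes(signal)
    bins = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * SAMPLE_RATE / len(signal) for k in bins)


# Two idealized "flutes": one pure sine wave each.
two_flutes = [
    math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
    + math.sin(2 * math.pi * 660 * t / SAMPLE_RATE)
    for t in range(N)
]

detected = strongest_frequencies(two_flutes, 2)
print(detected)
```

With pure sine waves the two largest bins are exactly the two notes.
The rest of this article explains why real instruments do not behave
this nicely.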

The real problem is that most natural sounds consist of several (up
to 30 or 40) sine waves, called partials (although usually only some of
them are prominent).  The frequencies of the partials are usually
integer multiples of a base or fundamental frequency, thus they are
often called "harmonics".

 [ The mathematician defines harmonics as an integer times the
 [ fundamental frequency, so only those partials which are exactly
 [ 2 or 3 or 4, etc., times the fundamental are really "harmonics."
 [ Percussion instruments generate overtones which deviate from the
 [ mathematician's harmonic series.  (The piano tuner adjusts the
 [ "stretch" -- the deviation from the harmonic series -- for the
 [ best sound of a given piano.)  If the overtones aren't a true
 [ harmonic progression they are simply called overtones or partials.
 [ -- Robbie
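
The difference between true harmonics and "stretched" partials can be
sketched numerically.  The example below assumes the commonly used
stiff-string model f_n = n * f0 * sqrt(1 + B * n^2); the inharmonicity
coefficient B = 0.0004 and the 220 Hz base frequency are illustrative
guesses, not measurements of any particular piano.

```python
import math


def harmonic_partials(f0, count):
    """Ideal harmonic series: exact integer multiples of f0."""
    return [n * f0 for n in range(1, count + 1)]


def stretched_partials(f0, count, b):
    """Stiff-string model: f_n = n * f0 * sqrt(1 + b * n^2).

    Here f0 is the fundamental the string would have without stiffness,
    and b is the inharmonicity coefficient (assumed value, see below).
    """
    return [n * f0 * math.sqrt(1.0 + b * n * n) for n in range(1, count + 1)]


F0 = 220.0   # Hz, illustrative
B = 0.0004   # illustrative inharmonicity coefficient, not a measured value

harmonic = harmonic_partials(F0, 8)
stretched = stretched_partials(F0, 8, B)

# Deviation of each stretched partial from the harmonic series, in cents.
deviation_cents = [1200 * math.log2(s / h) for h, s in zip(harmonic, stretched)]

for n, (h, s, c) in enumerate(zip(harmonic, stretched, deviation_cents), start=1):
    print(f"partial {n}: harmonic {h:7.1f} Hz, stretched {s:7.1f} Hz, +{c:.1f} cents")
```

Under this model every partial is sharp of its harmonic position, and
the deviation grows with the partial number.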

The "pitch" of the tone usually corresponds to the base frequency,
but in some instruments the partial at the base frequency may be
missing, or it may not be the strongest partial.  Sometimes partials are
missing (e.g., only the "odd" or the "even" harmonics are present), or
the partials are not even harmonic -- e.g., the piano has "stretched"
partials, and it has several partials that are unrelated to the base
frequency.  Analysis of the time domain signal is even less promising.
Thus detecting the pitch of even a single instrument is a problem in
itself.
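
A small sketch of the missing-fundamental case: the partial list below
is hypothetical (frequencies in Hz mapped to relative amplitudes), and
the greatest-common-divisor trick is only one simple heuristic, assumed
here for illustration.  It works for exactly harmonic partials and
would fail on stretched or unrelated partials such as those of a piano.

```python
from functools import reduce
from math import gcd

# Hypothetical measured partials: frequency in Hz -> relative amplitude.
# The partial at the base frequency (200 Hz) is absent.
partials = {400: 1.0, 600: 0.7, 800: 0.4}

# Naive guess: take the strongest partial as the pitch.
strongest = max(partials, key=partials.get)

# Heuristic: the pitch is the largest frequency of which every partial
# is an integer multiple (only valid for exactly harmonic partials).
implied_fundamental = reduce(gcd, partials.keys())

print("strongest partial:", strongest, "Hz")
print("implied fundamental:", implied_fundamental, "Hz")
```

The strongest partial is an octave above the pitch implied by the
spacing of the partials, so simply picking the largest spectral peak
gives the wrong note.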

The other problem is detecting the beginning and ending of the notes.
Most methods that have accurate resolution for determining the exact
frequency of the harmonics tend to "smear" the transient behavior.
For a single voice [an isolated note] examining the time domain
(e.g., the instantaneous power of the signal) would help, but for
polyphonic signals this method is almost hopeless.
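
The "smearing" can be put into numbers.  For a DFT-based analysis the
spacing between frequency bins is the sample rate divided by the window
length, while the window duration is the inverse of that, so sharpening
one blurs the other.  The sketch below assumes a 44100 Hz sample rate
and uses "one bin per semitone gap" as a rough rule of thumb, not an
exact resolution criterion.

```python
SAMPLE_RATE = 44100  # Hz (assumed)


def bin_spacing_hz(window_samples):
    """Frequency spacing between adjacent DFT bins."""
    return SAMPLE_RATE / window_samples


def window_duration_s(window_samples):
    """Length of the analysis window in seconds."""
    return window_samples / SAMPLE_RATE


rows = [(n, bin_spacing_hz(n), window_duration_s(n))
        for n in (256, 1024, 4096, 16384)]

for n, spacing, duration in rows:
    print(f"{n:6d} samples: {spacing:8.2f} Hz per bin, {duration * 1000:7.1f} ms window")

# Gap between a low A (55 Hz) and the next semitone in equal temperament.
A1 = 55.0
semitone_gap = A1 * (2 ** (1 / 12) - 1)

# Rule of thumb: the window must last about 1 / gap seconds to tell them apart.
min_window_s = 1.0 / semitone_gap
print(f"gap {semitone_gap:.2f} Hz -> window of about {min_window_s:.2f} s")
```

A window long enough to separate neighboring bass notes lasts about a
third of a second, which is longer than many of the notes whose
beginnings and endings are to be located.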

In polyphonic music the biggest problem is that the partials coming
from the different voices "overlap" (partials coming from different
voices may have the same, or almost the same frequency), thus it is
difficult to identify which partial belongs to which voice.  Some
partials may interfere with each other, causing fluctuation in the
spectrum.  (By the way, this is what makes the sound of acoustic
instruments so appealing.)  If there are different instruments
playing at the same time, as in an orchestra, the problem is even
harder, since besides detecting the notes, each instrument must be
identified, too.
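
As an illustration of overlapping partials, the sketch below lists the
partials of two notes a fifth apart (C4 and G4) that fall within 2 Hz
of each other.  It assumes equal temperament with A4 = 440 Hz, exactly
harmonic partials, and an arbitrary 2 Hz tolerance; none of these
choices come from the original discussion.

```python
def note_frequency(semitones_from_a4):
    """Equal-tempered frequency, relative to A4 = 440 Hz (assumed tuning)."""
    return 440.0 * 2 ** (semitones_from_a4 / 12)


C4 = note_frequency(-9)  # about 261.63 Hz
G4 = note_frequency(-2)  # about 392.00 Hz


def close_partials(f_a, f_b, count=8, tolerance_hz=2.0):
    """Find (n, m, difference) where partial n of f_a nearly equals partial m of f_b."""
    pairs = []
    for n in range(1, count + 1):
        for m in range(1, count + 1):
            diff = abs(n * f_a - m * f_b)
            if diff <= tolerance_hz:
                pairs.append((n, m, diff))
    return pairs


overlaps = close_partials(C4, G4)
for n, m, diff in overlaps:
    print(f"C4 partial {n} ({n * C4:.2f} Hz) vs G4 partial {m} ({m * G4:.2f} Hz): "
          f"{diff:.2f} Hz apart")
```

Partials this close together cannot be assigned to one voice or the
other from the spectrum alone, and their small frequency difference
shows up as a slow fluctuation (beating) in the combined amplitude.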

As if that were not difficult enough, the structure of the partials
of an instrument's sound depends on many parameters, most notably the
pitch and the "loudness" of the note.  Moreover, the noise and
reverberation often present in real recordings make even the simplest
analysis almost impossible.

> However, back in high school, we learned (and mostly forgot) that
> sine A + sine B = cosine of (A X B)

I hope this formula is not taught (or is quickly forgotten) in the US.  ;-)
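
For reference, the standard sum-to-product identity is
sin A + sin B = 2 sin((A + B)/2) cos((A - B)/2).  The short check below
compares it numerically with the quoted formula; the function names and
test angles are arbitrary choices made for this example.

```python
import math


def sum_of_sines(a, b):
    return math.sin(a) + math.sin(b)


def product_form(a, b):
    """Standard identity: 2 * sin((a + b) / 2) * cos((a - b) / 2)."""
    return 2.0 * math.sin((a + b) / 2.0) * math.cos((a - b) / 2.0)


def quoted_formula(a, b):
    """The formula as quoted above: cosine of (A x B)."""
    return math.cos(a * b)


angles = [0.0, 0.3, 1.0, 2.5, -1.7]  # radians, arbitrary test values

# Largest disagreement between the sum and the standard product form.
max_error = max(abs(sum_of_sines(a, b) - product_form(a, b))
                for a in angles for b in angles)

# A single counterexample is enough to rule out the quoted formula.
counterexample = (sum_of_sines(0.0, 0.0), quoted_formula(0.0, 0.0))

print("max error of standard identity:", max_error)
print("A = B = 0: sin A + sin B =", counterexample[0],
      "but cos(A x B) =", counterexample[1])
```

The product form also shows why two tones of nearly equal frequency
beat: the sum is a tone at the average frequency whose amplitude varies
slowly with the cosine of half the frequency difference.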

> Great.  Since a cosine is just a sine 180 degrees out of phase ...

Same applies here.  A cosine wave is a sine wave 90 degrees out of
phase.
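
The phase relationship is easy to check numerically.  This is a minimal
sketch with arbitrarily chosen test angles: shifting a sine by 90
degrees gives a cosine, while shifting it by 180 degrees only inverts it.

```python
import math

angles = [0.0, 0.4, 1.3, 2.9, -0.8]  # radians, arbitrary test values

# sin(x + 90 degrees) should equal cos(x).
shift_90_error = max(abs(math.sin(x + math.pi / 2) - math.cos(x)) for x in angles)

# sin(x + 180 degrees) should equal -sin(x), an inverted sine rather than a cosine.
shift_180_error = max(abs(math.sin(x + math.pi) + math.sin(x)) for x in angles)

print("90 degree shift vs cosine:", shift_90_error)
print("180 degree shift vs inverted sine:", shift_180_error)
```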

If anyone is further interested in WAV-to-MIDI conversion, I'd be glad
to discuss it more deeply.

Zoli Janosy

 [ Zoli co-authored a paper about transcribing Welte Mignon rolls
 [ which was published by the Audio Engineering Society.  He holds
 [ a MS degree in electrical engineering and now works at a tele-
 [ communications company in Budapest, Hungary.   -- Robbie

(Message sent Thu 22 Apr 1999, 09:48:22 GMT, from time zone GMT+0200.)

Key Words in Subject: Converter, Programs, WAV-to-MIDI

Unless otherwise noted, all opinions are those of the individual authors and may not represent those of the editors. Compilation copyright 1995-2025 by Jody Kravitz.
