WAV to MIDI: A Review
By Claus Kucher
The discussion about "Analog --> MIDI conversion" in this group has, as I see it, reached the following state (I hope I did not miss anything):
Robbie Rhodes wrote in MMD 95.11.01 (at that time still called the "Automatic Music Digest"):
> Subject: Re: Genetics & Analog-to-MIDI
> I received a note from Artis Wodehouse inquiring how to analyze > a live-recorded phonograph performance for the statistics of the > note- and chord-attack timing. She would like to properly "randomize" > the timing of a heavily-edited piano roll performance in order to > make it sound more like a live performance. > > Artis referred me to two recent articles which I'm now reviewing: > > [1] J. Berger, R. Coifman and M. Goldberg, "Removing Noise from Music > Using Local Trigonometric Bases and Wavelet Packets," J. Audio > Eng. Soc., Vol 42, No. 10, pp. 808-818 (October 1994) > > [2] J. Berger and C. Nichols, "Bhrams at the Piano: an Analysis of > Data from the Brahms Cylinder," Leonardo Music Journal, Vol. 4, > pp. 23-30 (1994) > > Author Jonathan Berger is a composer and music professor and also > directs the Center for Studies in Music Technology at Yale University. > The JAES paper [1] uses a "cost function" to evaluate the progress > of a noise-removing algorithm, and steer its progress in the proper > direction. > > The second article [2] describes how this method, and others, were > applied to an audio recording of Johannes Brahms, himself, playing > a segment of his "Hungarian Dance No. 1" at the piano. > > The authors created a synthesized sound file from their analyzed data > (I'm not sure if it was .wav. or MIDI) and "overlaid" it with the > original Brahms sound record to test their analysis results. > > Artis's desire is rather thought-provoking. She is currently editing > the set of 12 piano rolls recorded in 1923 by Jelly Roll Morton, and adding > realistic velocities/expression in preparation for playback on a MIDI > solenoid reproducing piano. I presume that, at the current stage of editing, > the timing is very metronomic, and even with expression and accents, it > still sounds a little too good, i.e., "unreal". Simply introducing a > little random noise into the MIDI file timing just make the song sound > "live-played" sounds like an umpteenth-generation piano roll copy, because > the random timing is truly random. > > Howzat again? Yes: true random noise on the time base doesn't sound right, > either. Therefore, the original performer was NOT random in his timing, > and there must be some sort of correlation with the musical figure he's > performing. > > Artis would welcome constructive advice on this task. Any ideas out there?
In MMD 95.11.12 Zoltan Janosy replied to Robbie Rhodes, discussing "WAV to MIDI conversion", and John D. Rhodes contributed the following:
> a. I assume the objective is to create sheet music of the original
> performance, not simply produce a combination of MIDI notes/voices
> which "sounds like" the original.  So let's get the "sounds-like" out
> of the way with sampled notes, and concentrate on combining the notes
> to produce the target timing and harmony.  [No: the major objective
> is something like MIDI control of a dynamic waveform synthesizer.
> --Rob]
>
> b. I think you should be able to extract samples of *individual*
> notes from a disk recording.  Take advantage of the slight attack
> discrepancies on simultaneous notes.  Once you have the attack/decay
> signature isolated for a given note, you should be able to remove it
> from a chord (within reason) to assist in isolating the signature of
> other notes.
>
> Further thoughts:
> Computer scientists (like some manufacturing automation engineers I
> work with) sometimes produce overly complex solutions by attempting
> to program/automate everything.  Some problems (e.g. pattern
> recognition) are extremely tough to find programmatic solutions for;
> but humans (in fact, most animals) are *extremely* good at pattern
> recognition -- in more than the visual domain, I should add.  So,
> swallow some inventor's pride, save lots of time, and put a human in
> the loop.
>
> For example, Rob, you and brother Doug can listen to a piano
> recording, play it by ear, and write out the notation.  You are
> bringing far more to bear on the problem than simple aural analysis.
> You are including: how many notes can one hand play, what is the
> intended harmony, how would one voice this, etc.  Use your knowledge
> to supplement (and teach) the program.
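John's point (b) -- peeling an individual note's signature out of a chord -- can be sketched in a few lines. The following is my own rough Python illustration (the NumPy calls and names are my assumptions, not anything from the original post), assuming you already have an isolated note sample and a guess at its onset position:

    import numpy as np

    def subtract_note(chord, sample, onset):
        """Remove one note's attack/decay signature from a chord recording.

        `chord` and `sample` are 1-D arrays of audio samples; `onset` is
        the index in `chord` where the note is believed to start.  The
        residual should make the remaining notes easier to isolate.
        """
        n = min(len(sample), len(chord) - onset)  # clip if the sample runs past the end
        seg = chord[onset:onset + n]
        s = sample[:n]
        gain = np.dot(seg, s) / np.dot(s, s)      # least-squares amplitude fit
        residual = chord.copy()
        residual[onset:onset + n] -= gain * s
        return residual

In practice the piano's dependence on velocity and sympathetic resonance limits how cleanly this works, which is exactly why John suggests keeping a human in the loop.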
In MMD 95.11.15 Robbie Rhodes wrote the following:
> My nephew, Brad, is studying for a doctorate at MIT in the artificial
> intelligence arena, and is a research assistant in the MIT Media Lab.
> This letter addresses Artis Wodehouse's problem of recovering the
> note timing in a recorded piano performance, and suggests analysis
> with matched filters (nowadays a fairly mature process).
> Personally, I wouldn't use GA's for this problem, at least with the
> genotype being a MIDI string (the phenotype being the .wav file).
> GA's are usually used when there are a bunch of different parameters
> interacting with each other in intractable ways.  That's not the case
> here, since a note or chord doesn't affect the rest of the music
> after it stops resonating.  (If you're interested in more about
> genetic algorithms or genetic programming, ask me -- I just deleted a
> huge discourse on how you might use them on this problem before
> deciding it wasn't the right approach.)
>
> Since it's been a long time since I've done any signals stuff, I
> bounced the problem off a few people around the lab, and the best
> approach seems to be to use a bank of filters, each finding where a
> different note appears in the music.  Here's the gist, and I'm sure
> my Dad can fill in any details or correct mistakes.
>
> For each voice and note, take a few samples from whatever output
> program you'll be using.  These samples should be of different
> velocities and durations, and you'll want to get some samples with
> just the onset, while others have the full note.  You'll also want to
> capture the reverb afterwards.  Then create a matched filter
> consisting of the onset of just that note starting at time 0.  (My
> office mate and I have been looking over our old signals textbook,
> and it looks like this is done by reversing the sample in time,
> creating a filter with that impulse response, and then convolving
> that with the music.)  This should create a signal with peaks where
> the center of the sample matches; use that and the length of the
> sample to find where the sample should start.
>
> Once you've got the start of the note and where it was, you can play
> with different durations and velocities to get the closest match.
> The inner product of the sample with the music over the same period
> of time should give a number indicating how good a fit the choice is.
> After that you'll need to go through by hand and clean it up, but
> that should do most of it.  It'll probably work less well with
> non-percussion instruments, since there's a lot more information than
> just note, velocity, and duration, but it should still give a good
> approximation.
>
> This is almost certainly already being done, at least in the
> laboratory if not in commercial products yet.  Steve Mann recommended
> several references in wavelet theory to look at.  If these aren't a
> good match, chances are something they reference will be.
>
> R. Wilson, A.D. Calway, and E.R.S. Pearson.  A generalized wavelet
> transform for Fourier analysis: the multiresolution Fourier transform
> and its application to image and audio signal analysis.  IEEE Trans.
> on Information Theory, 38(2):674-690, March 1992.
>
> C.E. Heil and D.F. Walnut.  Continuous and discrete wavelet
> transforms.  SIAM Review, 31(4):628-666, 1989.
>
> S.G. Mallat.  A theory for multiresolution signal decomposition: the
> wavelet representation.  IEEE Trans. on Patt. Anal. and Mach.
> Intell., 11(7):674-693, 1989.
>
> I. Daubechies.  The wavelet transform, time-frequency localization
> and signal analysis.  IEEE Trans. on Inf. Theory, 36(5):961-1005,
> 1990.
>
> G. Strang.  Wavelets and dilation equations: a brief introduction.
> SIAM Review, 31(4):614-627, 1989.
>
> B.C.J. Moore.  An Introduction to the Psychology of Hearing.
> Academic Press, second edition, 1982.
>
> I'd also recommend checking out the sound and media group at the MIT
> Media Lab: <http://sound.media.mit.edu/>.  They seem to have projects
> along these lines.
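Brad's recipe translates almost line for line into code. Below is my own Python sketch (the NumPy/SciPy calls are modern conveniences I am assuming, not anything the original post specified): time-reverse the note sample, convolve it with the music, and read onset candidates off the correlation peaks.

    import numpy as np
    from scipy.signal import fftconvolve

    def matched_filter_onsets(music, note_sample, threshold=0.6):
        """Find likely start positions of `note_sample` within `music`.

        Convolving the music with the time-reversed sample is the same
        as cross-correlating with the sample itself, so with
        mode="valid" the output index i scores the alignment where the
        sample starts at music[i].
        """
        kernel = note_sample[::-1]                 # time-reversed sample
        score = fftconvolve(music, kernel, mode="valid")
        peak = np.max(np.abs(score))
        if peak > 0:
            score = score / peak                   # normalize so the threshold is relative
        # Keep local maxima above the threshold as onset candidates.
        return [i for i in range(1, len(score) - 1)
                if score[i] > threshold
                and score[i] >= score[i - 1]
                and score[i] >= score[i + 1]]

The "inner product ... indicating how good a fit the choice is" in Brad's letter is exactly the score value at each peak; running the same test with samples of different velocity and duration, and keeping the best-scoring one, does the rest.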
The discussion resumed half a year later (MMD 96.07.22), when John Tuttle asked the group for a "program that converts .wav files to MIDI files??" and Jody Kravitz added the following editor's note:
> Converting single pitches to MIDI can be done pretty easily,
> but in the context of this group I'm sure you're interested
> in taking a .wav file of a recorded piano (or band) piece and
> converting it to a MIDI form of the performance.
>
> I consider this the "Holy Grail" of Artificial Intelligence
> research as applied to music.  This is _NOT_ a solved problem.
> The more complex the musical input (more instruments), the
> harder it gets.  I have several recordings that were taken
> from 78's by the Smithsonian that I'd like to have MIDI scores
> of.  I figure that some day I'll bring the .wav files to
> someone who knows how to transcribe music by hand and pay them
> to do it.  I have some ideas about how to write software to
> simplify the task, but the essential "computer" is still the
> human.
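For the single-pitch case Jody calls easy, "easy" looks roughly like this: a plain autocorrelation pitch estimator mapped to the nearest MIDI note number. Again, this is my own illustrative Python sketch, not something from the digest:

    import numpy as np

    def frame_to_midi(frame, sample_rate=44100, fmin=50.0, fmax=2000.0):
        """Estimate the pitch of one monophonic audio frame by
        autocorrelation and return the nearest MIDI note number
        (69 = A4 = 440 Hz).  The frame must be at least
        sample_rate/fmin samples long."""
        frame = frame - np.mean(frame)
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo = int(sample_rate / fmax)           # shortest plausible period, in samples
        hi = int(sample_rate / fmin)           # longest plausible period, in samples
        lag = lo + int(np.argmax(ac[lo:hi]))   # strongest periodicity in range
        freq = sample_rate / lag
        return int(round(69 + 12 * np.log2(freq / 440.0)))

The gap between this and the polyphonic, multi-instrument problem Jody describes is exactly why he calls the latter a "Holy Grail".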
and S.K. Goodman replied:
> I have been transcribing material from 78's and cylinders to paper
> for years.  By the time one edits MIDI files from .wav files, it
> would just be simpler to do a straight "by ear" transcription direct
> to MIDI.  This is exactly what I am doing right now, making a full
> 88-note piano roll arrangement from the U.S. Marine Band playing
> several movements of Sousa's syncopated compositions from his suites.
>
> An example of how clunky converting input to MIDI can be, other than
> by keystroke or mouse, is the amount of editing that a scanned score
> (such as one generated by Midiscan software) requires to make it a
> viable MIDI file.  I hope the situation improves.
This is where the discussion stands as of October 1996 -- so perhaps it will be spurred on by the attached article from Matt (The Lost Soul).
I think it could be of interest, and not only for the sleeping programmers, technicians, musicians, players and other related folks, to enter the hall of fame by bringing the most-wanted software for converting analog audio to MIDI data to the rest of the world ;-)
Claus Kucher
(Message sent Wed 23 Oct 1996, 11:48:04 GMT, from time zone GMT+0200.)