Analog to MIDI
By Larry Smith
Mr. Pleatman:
>> How difficult would it be to write a program to >> convert played music do midi?
>In my humble opinion, conversion from pitch to midi is very difficult.
No question it's a difficult problem. However, if the folks on this list don't mind, I've been noodling around an idea for an algorithm that _might_ be able to do general analog to midi conversions. I've been designing the algorithm in my spare time, but I haven't tried implementing it, and I'm sure it is not going to be easy, but it has the potential - in theory - to do the job. But this is deep computer science, so the other folks feel free to skip this one, and if anyone wants to talk about it, we can take it to direct email.
Basically, the germ of the idea is simple: use a genetic algorithm. I think the internal representation of a midi sequence would make a reasonable "chromosome" for such an algorithm. We would start with a fairly large number of chromosomes, as many as we could, at least a few hundred, initially made up randomly, and load these into the "paddock".
The evaluator is the heart and soul of the genetic algorithm. The basic genetic mechanisms outlined above make up the _machinery_ of evolution, but the success evaluator is what enables us to direct it. It is the computer's equivalent of "survival of the fittest". It will "execute" the generated midi sequences over and over as they are completed by the mating routine below, and compare the .wav outputs with the original .wav input of the target. It must generate some rating that indicates how good a match the two inputs are, and that rating must be weighted - correct length, +10, standard deviation of amplitudes * 2 - this is the tough part. All these measurements are built into the evaluator. A really sophisticated evaluator would isolate sequences and rate them individually (Opening: 10, middle -2, ending 3, for example). When it is done, the rating is tacked onto the tested sequence and it is returned to the "paddock" for mating and production of new offspring sequences.
Sequences are selected for mating by scanning the paddock and mating the highest ratings with other high ratings chosen at random. These new "offspring" sequences will replace the lowest-rated sequences in the paddock. The simplest way to do this "mating" would be to select a split point at random and transfer the head of one to the tail of the other or vice versa. A more powerful way would be to use the evaluation algorithm discussed above to assign success weights to individual segments and select high-weighted segments for crossover (that is, select a pair to mate and construct the offspring from the highest-rated segments of the two parents), but the algorithm should converge on a solution even without that, though it may take a long time to run. We must not forget to allow for a tiny amount of mutation - a one-in-a-thousand or ten-thousand chance of zapping a random number into a sequence.
This will have to stew for a while. It would be wise to set it up so that it can be interrupted and save its state to the hard drive and be able to reload and pick up just where it left off, so you can use the computer during the day and let the algorithm run all night, every night. Periodically, you could play the top half-dozen scorers and see how close you are to where you want to be. Eventually the program will max out a bunch of selections that together make up the "best" sequence it can get within the limitations of the system's hardware, and the average evaluation score will rise until it maxes out. The algorithm could detect that and exit, though checking it every evening would not be a bad idea since it may spend quite a few thousand generations building up the score for the whole paddock even after it has maxed out one "champion" individual. Alternatively, one could simply exit as soon as a specimen having a certain maximal score is evaluated.
As we like to say in the Unix world, "you are not expected to understand this" =) Genetics are a way of stirring random numbers and getting useful results from them. Given that humans are, in fact, only about ten million generations from the teeny little rodent-like mammals that lived with the last of the dinosaurs, it really isn't that large a leap from random noise to a specific sequence of midi events that emulates a particular .wav form. In theory, this algorithm could be rigged to do arrangements of very different instruments - you can play anything you like into it, and then restrict what voices the test sequences in the paddock can use, and the algorithm should create a "best match" using just those voices.
How good and how fast this system would be, I cannot say. Genetics have been used to solve some very, very intractable problems, and they are one of our last, best hopes for Artificial Intelligence. But they are not trivial things to code up - and the _math_ that is behind these things! People have gotten PhD's just explaining _why_ thse algorithms work as well as they do, as fast as they do. Most programs I've seen using genetics are getting good approximations of solutions within a couple dozen generations, and rarely more than a couple hundred. A computer could evaluate that many in less than an hour. But the tough part is the evaluator. Solve that, and I guess Darwin can do the rest for you, it's a natural law.
In any case, I've been working on this as one way to create new music box arrangements. My synthesizer has a music box voice, so I figured that feeding in a recorded music box would give it a good chance to find a perfect match and create a midi version of the same tune. But there is no theoretical limitation on this, you could set up a kazoo as the only available output voice and play the Philadelphia Philharmonic into the input and get a rendition of whatever it was for the kazoo. Contrarywise, you could leave the output voices unrestricted and, in theory, get a midi version of what was played. BUT - you will almost certainly get "artifacts" - if an instrument makes the right noise, the algorithm won't care if it's "weird" - that is, genetics won't care if one note from a cello is called for in the middle of the second fiddle's solo, so long as it "sounds" right - this may be fine for automated music, but if you want perfection you will need to edit the output by hand for this kind of sillyness (and if you think that's not true to life, let me tell you about this thing called an "appendix" =)
This is getting overlong. If Jody thinks it's okay for the list, he will append a note to that effect, and if anyone wants to discuss this and there _isn't_ a note, drop me some email. This _bears_ on auto-music, but the topic itself is deep hacking.
Oh, one other thing: for a good explanation of evolution, what it is and why it works, read Richard Dawkins, "The Blind Watchmaker" - it's an easy read and not too much math for ordinary mortals. And for the more adventurous among you, a good introduction to genetic algorithms and machinery of genetic selection and mating, you read "Genetic Algorithms" - I don't recall the author - but it's the best reference to how to build this kind of stuff. What _out_for_the_math_ - when they start explaining "schema" and nature of genetic coding for spacial searches you can quickly lose your depth - but you only need that chapter if you want to understand _why_ it works, the algorithm itself, with the exception of the evaluator as noted above, is fairly trivial, a very straightforward simulation of the genetic machinery in every living thing. Recommended.
regards, Larry Smith
|
(Message sent Mon 9 Oct 1995, 15:24:39 GMT, from time zone GMT-0400.) |
|
|