MMD
Archives
October 1995
1995.10.10
Prev Next
Analog to MIDI
By Larry Smith
Mr. Pleatman:

>> How difficult would it be to write a program to
>> convert played music do midi?

>In my humble opinion, conversion from pitch to midi is very difficult.

No question it's a difficult problem.  However, if the folks on
this list don't mind, I've been noodling around an idea for an
algorithm that _might_ be able to do general analog to midi conversions.
I've been designing the algorithm in my spare time,
but I haven't tried implementing it, and I'm sure it is not going
to be easy, but it has the potential - in theory - to do the job.
But this is deep computer science, so the other folks feel free
to skip this one, and if anyone wants to talk about it, we can
take it to direct email.

Basically, the germ of the idea is simple: use a genetic algorithm.
I think the internal representation of a midi sequence would make
a reasonable "chromosome" for such an algorithm.  We would start
with a fairly large number of chromosomes, as many as we could, at
least a few hundred, initially made up randomly, and load these into
the "paddock".

The evaluator is the heart and soul of the genetic algorithm.  The
basic genetic mechanisms outlined above make up the _machinery_ of
evolution, but the success evaluator is what enables us to direct
it.  It is the computer's equivalent of "survival of the fittest".
It will "execute" the generated midi sequences over and over as they
are completed by the mating routine below, and compare the .wav outputs
with the original .wav input of the target.  It must generate
some rating that indicates how good a match the two inputs are, and
that rating must be weighted - correct length, +10, standard deviation
of amplitudes * 2 - this is the tough part.  All these measurements
are built into the evaluator.  A really sophisticated evaluator would
isolate sequences and rate them individually (Opening: 10, middle -2,
ending 3, for example).  When it is done, the rating is tacked onto
the tested sequence and it is returned to the "paddock" for mating
and production of new offspring sequences.

Sequences are selected for mating by scanning the paddock and mating
the highest ratings with other high ratings chosen at random.  These
new "offspring" sequences will replace the lowest-rated sequences in
the paddock.  The simplest way to do this "mating" would be to select
a split point at random and transfer the head of one to the tail of
the other or vice versa. A more powerful way would be to use the evaluation
algorithm discussed above to assign success weights to individual
segments and select high-weighted segments for crossover (that is, select
a pair to mate and construct the offspring from the highest-rated segments
of the two parents), but the algorithm should converge on a solution even
without that, though it may take a long time to run.  We must not forget
to allow for a tiny amount of mutation - a one-in-a-thousand
or ten-thousand chance of zapping a random number into a sequence.

This will have to stew for a while.  It would be wise to set it up so
that it can be interrupted and save its state to the hard drive and be
able to reload and pick up just where it left off, so you can use the
computer during the day and let the algorithm run all night, every
night.  Periodically, you could play the top half-dozen scorers and
see how close you are to where you want to be.  Eventually the program
will max out a bunch of selections that together make up the "best"
sequence it can get within the limitations of the system's hardware,
and the average evaluation score will rise until it maxes out.  The
algorithm could detect that and exit, though checking it every evening
would not be a bad idea since it may spend quite a few thousand
generations building up the score for the whole paddock even after it
has maxed out one "champion" individual.  Alternatively, one could
simply exit as soon as a specimen having a certain maximal score is
evaluated.

As we like to say in the Unix world, "you are not expected to understand
this" =)  Genetics are a way of stirring random numbers and getting useful
results from them.  Given that humans are, in fact, only about ten
million generations from the teeny little rodent-like mammals that lived
with the last of the dinosaurs, it really isn't that large a leap from
random noise to a specific sequence of midi events that emulates a particular
.wav form.  In theory, this algorithm could be rigged to do arrangements
of very different instruments - you can play anything you like into it,
and then restrict what voices the test sequences in the paddock can use,
and the algorithm should create a "best match" using just those voices.

How good and how fast this system would be, I cannot say.  Genetics
have been used to solve some very, very intractable problems, and
they are one of our last, best hopes for Artificial Intelligence.
But they are not trivial things to code up - and the _math_ that is
behind these things!  People have gotten PhD's just explaining _why_
thse algorithms work as well as they do, as fast as they do.  Most
programs I've seen using genetics are getting good approximations of
solutions within a couple dozen generations, and rarely more than
a couple hundred.  A computer could evaluate that many in less than
an hour.  But the tough part is the evaluator.  Solve that, and I
guess Darwin can do the rest for you, it's a natural law.

In any case, I've been working on this as one way to create new
music box arrangements.  My synthesizer has a music box voice, so
I figured that feeding in a recorded music box would give it a good
chance to find a perfect match and create a midi version of the same
tune.  But there is no theoretical limitation on this, you could set
up a kazoo as the only available output voice and play the Philadelphia
Philharmonic into the input and get a rendition of whatever it was for
the kazoo.  Contrarywise, you could leave the output voices unrestricted
and, in theory, get a midi version of what was played.  BUT - you will
almost certainly get "artifacts" - if an instrument makes the right
noise, the algorithm won't care if it's "weird" - that is, genetics
won't care if one note from a cello is called for in the middle of the
second fiddle's solo, so long as it "sounds" right - this may be fine
for automated music, but if you want perfection you will need to edit
the output by hand for this kind of sillyness (and if you think that's
not true to life, let me tell you about this thing called an "appendix"
=)

This is getting overlong.  If Jody thinks it's okay for the list, he
will append a note to that effect, and if anyone wants to discuss this
and there _isn't_ a note, drop me some email.  This _bears_ on auto-music,
but the topic itself is deep hacking.

Oh, one other thing: for a good explanation of evolution, what it is
and why it works, read Richard Dawkins, "The Blind Watchmaker" - it's
an easy read and not too much math for ordinary mortals.  And for the
more adventurous among you, a good introduction to genetic algorithms
and machinery of genetic selection and mating, you read "Genetic
Algorithms" - I don't recall the author - but it's the best reference
to how to build this kind of stuff.  What _out_for_the_math_ - when
they start explaining "schema" and nature of genetic coding for spacial
searches you can quickly lose your depth - but you only need that chapter
if you want to understand _why_ it works, the algorithm itself, with
the exception of the evaluator as noted above, is fairly trivial, a very
straightforward simulation of the genetic machinery in every living
thing.  Recommended.

regards,
Larry Smith
(Message sent Mon, 09 Oct 95 11:24:39 -0400 , from time zone -0400.)
Key Words in Subject: Analog, MIDI