Hi everyone, I've been following the discussion about converting audio
into a MIDI file, and I'd like to point out some things from my experience.
Some of it has been said, so I may repeat it partially, but this gives
you a more complete picture.
The audio-to-MIDI project can be divided into three parts:
1. Getting the audio into your program,
2. Doing the actual conversion,
3. Editing the resulting MIDI file.
Point 1 (getting the audio into your program) is the least of your
problems. There are many ways to do it, but you should be able to get
a first rate wave file, depending on your source. If it is a
professional recording on CD or MP3, etc., you can import and convert it
to a wave file without any loss compared to the source. If the source
is an old cassette or vinyl recording, the result will depend very much
on the quality of those. You can try editing the resulting wave by
removing noise (don't take this "remove" too seriously) or enhancing
certain frequencies, and so on.
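As a rough illustration of how easy point 1 is, here is a small Python
sketch using only the standard library. It writes a test tone and reads
it back as numeric samples ready for analysis; the file name, sample
rate, and tone parameters are assumptions made up for this example, not
anything from the discussion:

```python
import math
import struct
import wave

RATE = 44100  # CD-quality sample rate (an assumption for this sketch)

def write_test_tone(path, freq=440.0, seconds=1.0):
    """Write a mono 16-bit sine tone so the example has audio to read."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(RATE)
        n = int(RATE * seconds)
        w.writeframes(b"".join(
            struct.pack("<h", int(30000 * math.sin(2 * math.pi * freq * i / RATE)))
            for i in range(n)))

def read_samples(path):
    """Read a mono 16-bit WAV back into floats in the range [-1, 1]."""
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    return [s / 32768.0 for (s,) in struct.iter_unpack("<h", raw)]
```

Real source material would of course come from a CD rip or a cleaned-up
cassette transfer rather than a synthesized tone, but the reading side
is the same.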
Point 2 (doing the actual conversion) is a completely different matter.
A computer does things by the numbers. It looks for a certain
condition that is "true", or true within a certain range, and responds
to it with some action.
Real life audio, on the other hand, is a very complex mixture of
sounds. Humans are very clever in breaking this mixture into different
parts for processing in the brain. There are many situations that one
does not really understand what is said, but one still grasps the idea
of the situation.
For example, if you enter a party room with a lot of noise, and your
friend on the other side of the room shouts to you, "Do you want a beer?"
you probably will not understand what he is saying, but you will still
grasp the question by understanding the situation and the obvious
gestures (he points to his bottle of beer!).
The same goes for a piece of music. If you look at a [visual image of
a] fragment of audio, it will be very hard (or impossible) to say what
this sound consists of -- it's just a lot of frequencies on top of each
other. If I were to change the attack characteristics of a piano note
into the attack characteristics of a violin, I could probably fool you
into believing that you are hearing a (strange kind of) violin instead
of a piano.
Why is it that the ear and brain can understand this mixture of
frequencies? Well, we have a big library of sounds (and how particular
instruments sound) and musical situations in our head, and out of the
puddle of sound we can "understand" a certain musical line played by an
instrument. If a saxophone is playing a downward movement, we will
anticipate (and partly create) this line in our brain. In other words,
we actually make things up that are not really there (since all the
frequencies piled on top of each other cancel out a lot of useful
information).
I have no doubt that, in the future, computers will be able to do it,
but then we are talking about really "intelligent" computers, having a
lot of background information and libraries -- a computer program that has
"learned" how each different instrument in the piece sounds, and how
music theory works, a computer program that can follow the musical
phrases played by each instrument and that can fill in the blanks when
the mixture is so overloaded that it contains just about all
frequencies.
The programs that are available today only look at frequencies [the
audio power spectrum], and try to determine the fundamental frequency
out of the harmonic content of a note. A chord played on a piano will
probably already be too much, since the fundamentals of the higher notes
fall near the overtones of the lower notes, and the program is simply
told to disregard the harmonics. So when we say "solo", we actually
mean monophonic -- one note at a time. A piano played in chords is
polyphonic, and cannot be considered "solo".
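To make concrete what such programs do for monophonic material, here is
a bare-bones Python sketch of one common approach: estimating the
fundamental of a single frame by autocorrelation. It is an illustration
of the general technique, not any particular product's algorithm, and
the frame size and rates are made up for the example:

```python
import math

def estimate_pitch(samples, rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental of one monophonic frame by picking the
    autocorrelation lag (candidate period) with the strongest
    self-similarity."""
    best_lag, best_corr = 0, 0.0
    for lag in range(int(rate / fmax), int(rate / fmin) + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return rate / best_lag if best_lag else None

# One frame of a pure 440 Hz sine at an 8 kHz rate; the estimate
# lands within a few Hz of 440.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(1024)]
```

Feed it a piano chord instead of a single note and the periods of the
different fundamentals and overtones all compete for the best lag,
which is exactly where such simple methods break down.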
Point 3 has its own problems. Say that you solve point 2. Then you
will end up with the same as if a piano player played a piece into a
MIDI recording software program. If he did this freely (not following
the metronome of the recording device), then the notes will not be in
the right place in terms of the bar lines. The reason is that the
tempo of the piece and the tempo of the MIDI file [during recording]
are different. To solve this problem requires a lot of thinking on
its own, particularly if the piece has tempo changes.
If we want to get the music placed on the bar lines, we can use tools
such as "Stretch/Shrink", "Slide", "Quantize", etc., which is very
time-consuming work.
But we must make some decisions. Are we going to put each note on the
exact spot (thereby forcing the result into a more exact tempo and
losing the human feel)? Or are we going to settle for the notes being
"near" the bar lines, inserting tempo changes all over the place to
keep the feel as close to the original as possible?
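One common sequencer trick for splitting that difference is a quantize
with an adjustable strength. The sketch below is a made-up minimal
version (the function and parameter names are mine, not from any
particular program):

```python
def quantize(onsets, grid=0.25, strength=1.0):
    """Pull note onset times (in beats) toward the nearest grid line.
    strength=1.0 lands exactly on the grid (good for a printed score);
    smaller values keep some of the human timing."""
    return [t + strength * (round(t / grid) * grid - t) for t in onsets]

played = [0.02, 0.26, 0.51, 0.99]        # slightly loose human timing
exact = quantize(played, strength=1.0)   # on the bar: 0.0, 0.25, 0.5, 1.0
loose = quantize(played, strength=0.5)   # halfway there: keeps some feel
```

The "exact" result is what you would want for a score; the partial
strength is one way of keeping the feel without scattering tempo
changes through the file.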
Most of the time you would want an "exact" (on the bar) version if you
want a printed score, and a more human version for performance.
Well, folks, point 3 gives a little insight into the problems we face
in making automated music. And as we here at home in Herentals get
closer to completion of our "humanizing" efforts of our organ pipes,
accordion and percussion instruments, the problems just keep increasing.
Before, it was only a matter of entering and editing the notes. Now,
once you have the notes, you have only just started. Making even one
long note sound as if played by a human requires many controllers.
In a few weeks we are performing our second opera in Belgium and
Holland with our latest organ. Each rank of pipes has its own
computer-controlled pressure device, as do the two accordions.
Some of the MIDI files for this instrument are over 2 MB in size. You
could easily say that a MIDI file for a pressure-controlled organ is 50
times bigger than a normal MIDI file. And this is without the many
thousands of vibrato and chorus controller messages that are
automatically generated by extra software.
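A size ratio of that order is easy to believe with some back-of-the-
envelope arithmetic. The numbers below (update rate, number of ranks,
bytes per event) are illustrative assumptions of mine, not the Decap
instrument's actual figures:

```python
def note_bytes(notes, bytes_per_event=4):
    """An ordinary note costs two events (note-on, note-off), each
    roughly a delta-time byte plus a 3-byte MIDI message."""
    return notes * 2 * bytes_per_event

def cc_stream_bytes(seconds, updates_per_sec, streams, bytes_per_event=4):
    """A continuous pressure curve costs one controller message per
    update, per rank, for the whole piece."""
    return seconds * updates_per_sec * streams * bytes_per_event

# A 3-minute piece with 2000 notes: 2000 * 2 * 4 = 16,000 bytes.
# Pressure at 50 updates/s on 12 ranks: 180 * 50 * 12 * 4 = 432,000
# bytes -- 27 times the note data, before any vibrato or chorus streams.
```

Push the update rate or the number of controlled streams a little
higher and a factor of 50 over a plain note file arrives quickly.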
Tony & Frank Decap, DECAP Herentals, Belgium
[ Frank and Tony create all of the music played on the marvelous big
[ MIDI-controlled Decap dance hall organs that they build in Herentals.
[ -- Robbie