MMD > Archives > October 1996 > 1996.10.26 > 08


WAV to MIDI; The Physics
By matt@Physics.usyd.edu.au

 [ Editor's note: The following article was forwarded to us by Claus,
 [ but is indexed separately in the Digest header.

In article <3269AE02.2455@webworldinc.com>,
Christopher Weare  <cweare@webworldinc.com> wrote:

> > AND THAT'S THE KEY!... "A SINGLE INSTRUMENT"
> >
> > AFTER ALL THIS DISCUSSION, I STILL CAN NOT FATHOM HOW A DEVICE (BEING
> > HARDWARE/SOFTWARE) CAN ACCURATELY TRACK MULTIPLE INSTRUMENTS, CONVERT
> > THEM TO A MIDI NOTE (OK... I CAN SEE DOING THAT VIA FFT) .... Geeeee...
> > MAYBE ONLY ONE INSTRUMENT AT A TIME IS POSSIBLE BUT .. A
> > MULTI-INSTRUMENT CONVERSION... I THINK NOT !!!!!
>
> Consider this:  A human can do it.  In time, machines will be able to do
> it reliably.  There are already several attempts that have varying
> degrees of success at extracting the note info from multi-instrument
> recordings.  None are yet robust enough to survive as a commercial
> product, but it is only a matter of time.  There is no fundamental
> reason why a "machine" implementation would never be able to solve this
> problem.

I've done quite a bit of signal analysis and processing as part of my
physics degree, so let's think about this one....

To track a single instrument you need to pick out the fundamental
frequency. Not a problem: Fourier transform the signal and pick out the
lowest peak. Call this a note. If you want, you can probably work out
some sort of correlation between the amplitude of the original wave and
MIDI velocity or volume information. Note that you are throwing out all
the things that make the instrument unique, i.e. which of the higher
harmonics are present, and their relative strengths.
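That single-instrument scheme can be sketched in a few lines. This is a
modern illustration, not anything from the original post; the
amplitude-to-velocity mapping in particular is a crude assumption of mine,
and a real tracker would hunt for the fundamental rather than simply the
strongest peak:

```python
import numpy as np

def freq_to_midi(f):
    """Convert a frequency in Hz to the nearest MIDI note number."""
    return int(round(69 + 12 * np.log2(f / 440.0)))

def detect_note(samples, rate):
    """Pick the strongest spectral peak and call it the note.

    Caveat: the strongest peak is not always the fundamental, so this
    is the naive version of the idea described above.
    """
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    peak = freqs[np.argmax(spectrum)]
    # Crude assumption: map peak sample amplitude straight to velocity.
    velocity = min(127, int(np.max(np.abs(samples)) * 127))
    return freq_to_midi(peak), velocity

# One second of a 440 Hz sine: concert A, i.e. MIDI note 69.
rate = 44100
t = np.arange(rate) / rate
note, vel = detect_note(0.8 * np.sin(2 * np.pi * 440 * t), rate)
```

Everything the FFT reveals about the overtones is discarded here, which
is exactly the information loss the paragraph above points out.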

Now add a second instrument. Say we have a violin and a flute. Both
instruments will have a fundamental frequency and a pile of overtones.
If the music is in any way tuneful, a lot of them will probably overlap.
Now the human ear can detect these instruments individually, but only
because they are sounds we recognise. (What if somebody created an
instrument that sounded just like a flute and a violin playing in
unison? You'd pick it as two instruments.)
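The overlap claim is easy to check numerically. The sketch below (my own
illustration; the 20-cent tolerance is an arbitrary choice) counts
near-coincident harmonics for a violin playing A4 and a flute playing E5,
a perfect fifth apart:

```python
import numpy as np

def harmonics(f0, n=10):
    """First n harmonics of a fundamental f0 (Hz)."""
    return f0 * np.arange(1, n + 1)

def overlaps(f0_a, f0_b, cents=20, n=10):
    """Count harmonic pairs closer than `cents`: near-coincident peaks
    that a naive peak-picker cannot attribute to either instrument."""
    count = 0
    for ha in harmonics(f0_a, n):
        for hb in harmonics(f0_b, n):
            if abs(1200 * np.log2(ha / hb)) < cents:
                count += 1
    return count

# Violin A4 (440 Hz) and flute E5 (659.25 Hz, equal temperament).
shared = overlaps(440.0, 659.25)
```

With ten harmonics each, three pairs land within 20 cents of one another
(every 3:2 coincidence of the fifth), so even this consonant two-note
chord already produces spectral peaks with ambiguous ownership.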

Okay.. so we need to tell our computer what a violin sounds like, and
what a flute sounds like, by giving it a signature to work with. We
know that certain harmonics will be present in each instrument, and
depending on how it's played they will appear in certain ratios. We
could even get the computer to analyse a section of music, looking for
similarities in wave patterns to define a verse, or a chorus, or a
middle eight or whatever, and then analyse the Fourier transforms of a
number of tiny sections within the music to determine what instruments
are present, then apply this knowledge to the music to extract the
actual notes.
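One minimal form of such a signature is a vector of relative harmonic
strengths, matched against the measured spectrum by correlation. The
profiles below are made-up numbers purely for illustration, not measured
instrument data:

```python
import numpy as np

# Hypothetical signatures: relative strengths of the first five harmonics.
SIGNATURES = {
    "violin": np.array([1.0, 0.7, 0.5, 0.4, 0.3]),
    "flute":  np.array([1.0, 0.2, 0.1, 0.05, 0.02]),
}

def harmonic_amplitudes(samples, rate, f0, n=5):
    """Measure the spectral amplitude at each harmonic of f0."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    amps = np.array([spectrum[np.argmin(np.abs(freqs - k * f0))]
                     for k in range(1, n + 1)])
    return amps / amps.max()

def classify(samples, rate, f0):
    """Pick the signature with the highest cosine similarity to the
    measured harmonic amplitudes."""
    amps = harmonic_amplitudes(samples, rate, f0)
    def score(sig):
        return np.dot(amps, sig) / (np.linalg.norm(amps) * np.linalg.norm(sig))
    return max(SIGNATURES, key=lambda name: score(SIGNATURES[name]))

# Synthesize a "violin" A4 from its own signature and identify it.
rate, f0 = 44100, 440.0
t = np.arange(rate) / rate
tone = sum(a * np.sin(2 * np.pi * k * f0 * t)
           for k, a in enumerate(SIGNATURES["violin"], start=1))
name = classify(tone, rate, f0)
```

A real system would need many signatures per instrument (the ratios
change with pitch, dynamics, and playing technique), which is why the
article proposes learning them from the recording itself.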

It is definitely not a simple "run it through" program, but an
iterative series of fits to the data. Basically, what you do is apply a
model ("I have these model instruments playing these model notes") and
change it over time to better fit the output you have. (This is a
common practice in all sciences that fit models to data.)
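A single step of that model-fitting loop can be sketched as a linear
least-squares problem: given a guess at which notes are sounding, solve
for the instrument levels that best explain the observed spectrum. The
harmonic profiles and Gaussian peak shapes here are assumptions of mine
for the sake of a runnable example:

```python
import numpy as np

# Hypothetical instrument models: relative harmonic strengths.
VIOLIN = np.array([1.0, 0.7, 0.5, 0.4, 0.3])
FLUTE = np.array([1.0, 0.2, 0.1, 0.05, 0.02])

def spectrum_template(f0, profile, freqs, width=3.0):
    """Model spectrum: Gaussian peaks at the harmonics of f0."""
    s = np.zeros_like(freqs)
    for k, a in enumerate(profile, start=1):
        s += a * np.exp(-0.5 * ((freqs - k * f0) / width) ** 2)
    return s

def fit_mixture(observed, freqs):
    """One linear step of the iterative fit: assuming the notes are
    violin A4 and flute E5, solve for the two instrument levels that
    best explain the observed spectrum."""
    A = np.column_stack([spectrum_template(440.0, VIOLIN, freqs),
                         spectrum_template(659.25, FLUTE, freqs)])
    levels, *_ = np.linalg.lstsq(A, observed, rcond=None)
    return levels

freqs = np.linspace(0, 4000, 4000)
# Synthesize a "recording": violin at level 1.0, flute at level 0.5.
observed = (1.0 * spectrum_template(440.0, VIOLIN, freqs)
            + 0.5 * spectrum_template(659.25, FLUTE, freqs))
levels = fit_mixture(observed, freqs)
```

The full problem is nonlinear, because the note guesses themselves must
also be revised; this step would sit inside an outer loop that proposes
notes, fits levels, and keeps whatever reduces the residual.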

Basically, it's not a trivial problem, and given that your average
piece of music contains several instruments playing several thousand
notes, quite a lot of number crunching needs to be done to fit the
model to the sound.

The human brain does all this in real time .. incredible.

Note the method described is essentially a brute-force method.  The
algorithm could probably be refined, but somebody is gonna have to
implement the brute-force method first.

Matt

--
Plan:
To retain the childlike enjoyment for the simple things in life,
while acquiring the maturity to fully appreciate them.

(Message sent 21 Oct 1996 05:49:46 GMT , from time zone +0000.)

Key Words in Subject:  MIDI, Physics, WAV