

Recovering Timing from Old Rolls
By Brad Rhodes

Interesting ideas -- thanks for forwarding them.

Personally, I wouldn't use GAs for this problem, at least with the
genotype being a MIDI string (the phenotype being the .wav file).  GAs
are usually used when there are a bunch of different parameters interacting
with each other in intractable ways.  That's not the case here, since a note
or chord doesn't affect the rest of the music after it stops resonating.
(If you're interested in more about genetic algorithms or genetic
programming, ask me -- I just deleted a huge discourse on how you might use
them on this problem before deciding it wasn't the right approach.)

Since it's been a long time since I've done any signals stuff, I bounced
the problem off of a few people around the lab and the best approach seems
to be to use a bank of filters, each finding where a different note appears
in the music.  Here's the gist, and I'm sure my Dad can fill in any details
or correct mistakes.

For each voice and note, take a few samples from whatever output program
you'll be using.  These samples should be of different velocities and
durations, and you'll want to get some samples with just the onset, while
others have the full note.  You'll also want to capture the reverb
afterwards.  Then create a matched filter consisting of the onset of just
that note starting at time 0.  (My office mate & I have been looking over
our old signals textbook, and it looks like this is done by reversing the
sample in time, creating a filter with that impulse response, and then
convolving that with the music.)  This should produce a signal with peaks
wherever the sample matches the music; use the peak position and the length
of the sample to find where the note should start.
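
Here's a rough sketch of that matched-filter step in Python with NumPy,
mostly to pin down the bookkeeping.  The function name and the synthetic
440 Hz test tone are my own placeholders, not anything from a real
transcription program.  It assumes the music and the note sample are mono
arrays at the same sample rate.  With a full (untrimmed) convolution the
peak lands on the last sample of the match, so you back up by the sample
length minus one to get the onset; other alignment conventions shift the
peak.

```python
import numpy as np

def find_onset(music, template):
    """Return the index in `music` where `template` lines up best."""
    # Matched filter: the impulse response is the template reversed in time.
    impulse_response = template[::-1]
    # Convolving with the reversed template is the same as sliding the
    # template along the music and taking an inner product at each offset.
    response = np.convolve(music, impulse_response, mode="full")
    peak = int(np.argmax(response))
    # In "full" mode the peak marks the END of the match, so step back
    # by the template length minus one to recover the start.
    return peak - (len(template) - 1)

# Synthetic check: bury a decaying 440 Hz burst in a little noise.
rate = 8000
t = np.arange(200) / rate
note = np.exp(-40 * t) * np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
recording = 0.01 * rng.standard_normal(2000)
recording[300:500] += note
print(find_onset(recording, note))
```

A real filter bank would run one such filter per note and voice and keep
every peak above some threshold instead of only the single largest one.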

Once you know which note it is and where it starts, you can play with
different durations and velocities to get the closest match.  The inner
product of the sample with the music over the same period of time should
give a number indicating how good a fit the choice is.  After that you'll
need to go through by hand and clean it up, but that should do most of it.
It'll probably work less well with non-percussion instruments since there's
a lot more information than just note, velocity, and duration, but it
should still give a good approximation.
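
The fitting step might look something like the sketch below (Python with
NumPy again; the function name and the dictionary of candidates are
hypothetical).  One liberty taken: it uses a normalized inner product
rather than the raw one, since otherwise the loudest candidate always
wins, and it compares every candidate over the same window by padding the
shorter ones with silence.  The catch is that normalizing hides overall
loudness, so picking a velocity this way relies on the recorded samples
actually sounding different at different velocities.

```python
import numpy as np

def score_candidates(music, candidates, start):
    """Score each candidate sample against the music beginning at `start`.

    `candidates` maps a label (say, a duration) to a sample array.
    Returns a dict of label -> normalized inner product (1.0 is a perfect fit).
    """
    # Compare everything over the same stretch of time: the longest candidate.
    window = max(len(c) for c in candidates.values())
    segment = np.zeros(window)
    available = music[start:start + window]
    segment[:len(available)] = available

    scores = {}
    for label, cand in candidates.items():
        padded = np.zeros(window)
        padded[:len(cand)] = cand  # silence after the candidate ends
        denom = np.linalg.norm(segment) * np.linalg.norm(padded)
        scores[label] = float(np.dot(segment, padded) / denom) if denom > 0 else 0.0
    return scores
```

Given the scores, `max(scores, key=scores.get)` picks the best-fitting
candidate, and a low best score is a hint that the spot needs cleaning up
by hand.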

This is almost certainly already being done, at least in the laboratory if
not in commercial products yet.  Steve Mann recommended several references in
wavelet theory to look at.  If these aren't a good match, chances are
something they reference will be.

R. Wilson, A.D. Calway, and E.R.S. Pearson.  A generalized wavelet transform
for Fourier analysis: the multiresolution Fourier transform and its
application to image and audio signal analysis.  IEEE Trans. on Information
Theory, 38(2):674-690, March 1992.

C.E. Heil and D.F. Walnut.  Continuous and discrete wavelet transforms.
SIAM Review, 31(4):628-666, 1989.

S.G. Mallat.  A theory for multiresolution signal decomposition: The
wavelet representation.  IEEE Trans. on Patt. Anal. and Mach. Intell.,
11(7):674-693, 1989.

I. Daubechies.  The wavelet transform, time-frequency localization and
signal analysis.  IEEE Trans. on Inf. Theory, 36(5):961-1005, 1990.

G. Strang.  Wavelets and dilation equations: A brief introduction.  SIAM
Review, 31(4):614-627, 1989.

B.C.J. Moore.  An introduction to the psychology of hearing.  Academic
Press, second edition, 1982.

I'd also recommend checking out the sound and media group at the MIT Media
Lab: <http://sound.media.mit.edu/>.  They seem to have projects along these
lines.

Brad



(Message sent Wed, 15 Nov 95 00:28:17 PST, from time zone -0800.)
