Recovering Timing from Old Rolls
By Brad Rhodes, forwarded by Robbie Rhodes
Interesting ideas -- thanks for forwarding them.
Personally, I wouldn't use genetic algorithms (GAs) for this problem, at least not with the genotype being a MIDI string and the phenotype being the .wav file. GAs are usually used when many different parameters interact with each other in intractable ways. That's not the case here, since a note or chord doesn't affect the rest of the music after it stops resonating. (If you're interested in more about genetic algorithms or genetic programming, ask me -- I just deleted a huge discourse on how you might use them on this problem before deciding it wasn't the right approach.)
Since it's been a long time since I've done any signal processing, I bounced the problem off a few people around the lab, and the best approach seems to be a bank of filters, each one finding where a different note appears in the music. Here's the gist; I'm sure my Dad can fill in any details or correct mistakes.
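A minimal sketch of the filter-bank idea follows. It is written in plain Python so it runs without extra libraries. It assumes each note's onset template is already available as a list of floats. The dictionary `bank`, the note labels, and the `threshold` value are hypothetical. Each filter is computed as a sliding inner product, which gives the same numbers as filtering with the time-reversed template.

```python
import math


def correlate(signal, template):
    """Sliding inner product of `template` against `signal`.

    out[i] is the dot product of the template with signal[i:i + len(template)],
    i.e. the matched-filter output indexed by candidate start position.
    Only positions where the template fits entirely inside the signal are kept.
    """
    m = len(template)
    return [
        sum(t * signal[i + k] for k, t in enumerate(template))
        for i in range(len(signal) - m + 1)
    ]


def detect_notes(signal, bank, threshold):
    """Run one matched filter per note and return candidate onset indices.

    `bank` maps a note label to its onset template (a hypothetical layout).
    Scores are divided by the template's energy, so an exact, equally loud
    copy of the template scores 1.0. `threshold` is a tuning knob.
    """
    onsets = {}
    for note, template in bank.items():
        energy = sum(t * t for t in template)
        scores = [c / energy for c in correlate(signal, template)]
        hits = []
        for i, s in enumerate(scores):
            left = scores[i - 1] if i > 0 else -math.inf
            right = scores[i + 1] if i + 1 < len(scores) else -math.inf
            # Keep local maxima that clear the threshold.
            if s >= threshold and s >= left and s > right:
                hits.append(i)
        onsets[note] = hits
    return onsets
```

Here is a toy case. Two four-sample templates, `C4 = [1.0, -1.0, 1.0, -1.0]` and `E4 = [1.0, 1.0, -1.0, -1.0]`, are placed at indices 2 and 9 of an otherwise silent 16-sample signal. With a threshold of 0.8, `detect_notes` returns `{"C4": [2], "E4": [9]}`. Real templates overlap far more than these, so in practice the threshold would need tuning.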
For each voice and note, take a few samples from whatever output program you'll be using. These samples should cover different velocities and durations: some should contain just the onset, while others contain the full note. You'll also want to capture the reverb afterwards. Then create a matched filter from the onset of just that note, starting at time zero. (My office mate and I have been looking over our old signals textbook, and it looks like this is done by reversing the sample in time, creating a filter with that impulse response, and then convolving that filter with the music.) This should produce an output with peaks where the center of the sample lines up with a matching passage in the music; use the peak positions and the length of the sample to find where each note starts.
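The reverse-and-convolve procedure can be sketched directly, again in plain Python. The function names are illustrative. The sketch assumes the music and the sample are lists of floats at the same sampling rate.

It uses a full convolution, so the output is longer than the input. In that indexing, a sample that starts at index `s` peaks at `s + len(sample) - 1`, at the end of the alignment rather than its center. The sketch therefore subtracts that offset.

```python
def convolve(x, h):
    """Full discrete convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for k, hv in enumerate(h):
            y[i + k] += xv * hv
    return y


def matched_filter_onset(music, sample):
    """Locate the best match of `sample` in `music` with a matched filter.

    The filter's impulse response is the sample reversed in time. In the
    full convolution output, a perfect alignment with a sample starting at
    index s peaks at index s + len(sample) - 1, so that offset is removed.
    Returns (estimated start index, peak value).
    """
    h = sample[::-1]  # time-reversed sample used as the impulse response
    y = convolve(music, h)
    peak = max(range(len(y)), key=y.__getitem__)
    return peak - (len(sample) - 1), y[peak]
```

For example, `sample = [0.2, 1.0, -0.6, 0.3]` can be embedded at index 5 of a silent 12-sample signal. The raw peak then sits at index 8 with value 1.49 (the sample's energy), and `matched_filter_onset` reports a start of 5. If the output is cropped to stay centered on the input, as a "same"-length convolution does, the peak lands at the sample's center instead, which matches the description above. The start is then found by subtracting half the sample length.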
Once you know which note was played and where it starts, you can try different durations and velocities to find the closest match. The inner product of the sample with the music over the same stretch of time should give a number indicating how good a fit each choice is. After that you'll need to go through by hand and clean up the result, but this should do most of the work. It'll probably work less well with non-percussion instruments, since their sound carries much more information than just note, velocity, and duration, but it should still give a good approximation.
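The sketch below scores candidate samples at an onset that has already been found. The `candidates` mapping and its (velocity, duration) keys are hypothetical.

The sketch departs from the message in one way. A raw inner product grows with the sample's energy, so it tends to favor the loudest and longest candidate. Subtracting half the sample's energy avoids this. Because ||x - s||^2 = ||x||^2 - 2<x, s> + ||s||^2, and ||x||^2 is the same for every candidate, maximizing <x, s> - ||s||^2 / 2 is the same as minimizing the squared error.

```python
def fit_score(music, onset, sample):
    """Score how well `sample`, placed at `onset`, explains the music.

    The first term is the plain inner product. Subtracting half the
    sample's energy (an added assumption) makes maximizing this score
    equivalent to minimizing the squared error between music and sample.
    """
    segment = music[onset:onset + len(sample)]
    dot = sum(a * b for a, b in zip(segment, sample))  # zip stops at end of music
    energy = sum(b * b for b in sample)
    return dot - 0.5 * energy


def best_candidate(music, onset, candidates):
    """Return the key of the candidate sample with the highest fit score.

    `candidates` maps hypothetical (velocity, duration) labels to samples.
    """
    return max(candidates, key=lambda key: fit_score(music, onset, candidates[key]))
```

In a toy check, the music holds `[1.0, 0.5, 0.25, 0.125]` starting at index 2. The candidates are that decay at half, equal, and double amplitude, each cut to two or four samples. `best_candidate` picks the equal-amplitude, four-sample version. A raw inner product would have picked the double-amplitude one.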
This is almost certainly already being done, at least in the laboratory if not in commercial products yet. Steve Mann recommended several references in wavelet theory to look at. If these aren't a good match, chances are something they reference will be.
R. Wilson, A.D. Calway, and E.R.S. Pearson. A generalized wavelet transform for Fourier analysis: the multiresolution Fourier transform and its application to image and audio signal analysis. IEEE Trans. on Information Theory, 38(2):674-690, March 1992.
C.E. Heil and D.F. Walnut. Continuous and discrete wavelet transforms. SIAM Review, 31(4):628-666, 1989.
S.G. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 11(7):674-693, 1989.
I. Daubechies. The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. on Information Theory, 36(5):961-1005, 1990.
G. Strang. Wavelets and dilation equations: a brief introduction. SIAM Review, 31(4):614-627, 1989.
B.C.J. Moore. An Introduction to the Psychology of Hearing. Academic Press, second edition, 1982.
I'd also recommend checking out the sound and media group at the MIT Media Lab: <http://sound.media.mit.edu/>. They seem to have projects along these lines.
(Message sent Wed 15 Nov 1995, 08:28:17 GMT, from time zone GMT-0800.)