Introduction

Converting a digital waveform, such as a wave file, into a list of which instruments are playing which notes, as in a midi file, is beyond today's technology once the music involves more than one note at a time on a single instrument. It is also beyond most people's abilities: for music of any complexity it requires perfect pitch and a great deal of musical skill.

Everybody knows that accurate speech recognition is very difficult to achieve, although the available software is slowly getting better. Now imagine trying to recognise several speakers at once, with a wide range of accents. Speech recognition also serves a commercially valuable market, so a lot of time and money goes into research in that field. The market for music recognition is much smaller, so it will never receive the same level of development.

Wave Goodbye was designed for 'decoding' music played on a single instrument that may sound more than one note at a time, such as the piano. After a little experimentation with the parameter settings, and with a couple of caveats, it performs quite well. If you try to decode a full orchestral work, you may get something recognisable out after tweaking the parameters, but it will not be very good.

In common with all wave-to-midi conversion software, there are two areas where Wave Goodbye's performance is weaker:

Recognising low notes. Two adjacent low notes are only a few hertz apart in frequency, so it is difficult for the software to tell them apart. The presence of other notes played at the same time, even much higher up, can make the job much harder. Wave Goodbye is generally accurate down to about an octave below middle C. Low-note accuracy should be improved a little in the next version.

Finding very short notes. Most software of this type works by splitting the sound into very short segments and finding the frequencies present within each one. However, the shorter each segment is, the harder it is to recognise low notes, so a compromise must be made. Wave Goodbye uses segments of 2048 samples (at 44100Hz this is about 1/20 second). In addition, part of the noise removal filtering discards any frequency band that is seen during only one segment. This means that, to be recognised, a note must be present in at least two consecutive segments. If a note starts partway through a segment, its frequencies will be weak within that segment and may be removed by the amplitude filtering routine. The same applies to the segment in which the note ends. A note lasting at least three segment lengths (about 0.14 seconds at 44100Hz) always covers two complete consecutive segments however it is aligned. It will therefore be recognised as long as it is not too quiet relative to any other notes being played at the same time. Notes shorter than this may be dropped.
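
A little arithmetic illustrates why low notes are the harder case. The Python sketch below assumes equal temperament with the A above middle C at 440Hz. It compares the gap between adjacent semitones with the spacing of frequency bins in a plain FFT of one 2048-sample segment at 44100Hz, which is about 21.5Hz. It only illustrates why the problem grows towards the bottom of the range. It is not a description of how Wave Goodbye analyses each segment, and the function names are made up for the example.

```python
# Illustrative only: compares semitone spacing with the bin spacing of a plain
# FFT over one 2048-sample segment at 44100 Hz. This is an assumption for
# illustration, not a description of Wave Goodbye's own analysis.

SAMPLE_RATE = 44100   # samples per second
SEGMENT = 2048        # samples per analysis segment


def note_frequency(midi_note: int) -> float:
    """Equal-tempered frequency in Hz, with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)


def semitone_gap(midi_note: int) -> float:
    """Distance in Hz from this note up to the next semitone."""
    return note_frequency(midi_note + 1) - note_frequency(midi_note)


bin_width = SAMPLE_RATE / SEGMENT  # spacing between FFT bins, about 21.5 Hz

for name, midi in [("C2", 36), ("C3", 48), ("C4 (middle C)", 60), ("C5", 72), ("C6", 84)]:
    gap = semitone_gap(midi)
    print(f"{name:14s} {note_frequency(midi):7.1f} Hz, next semitone {gap:5.1f} Hz away "
          f"({gap / bin_width:.2f} bins)")
```

An octave below middle C, adjacent semitones are less than 8Hz apart, about a third of a bin. Two octaves above middle C they are nearly three bins apart.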

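The noise filter described above, which removes frequency bands seen during only one segment, might be sketched as follows. This is a hypothetical reconstruction. The data layout (a set of detected band indices for each segment) and the function name remove_isolated_bands are assumptions made for illustration, and Wave Goodbye's actual filter may work differently.

```python
# Hypothetical sketch of a persistence filter: a frequency band detected in a
# segment is kept only if the same band is also detected in a neighbouring
# segment. The data layout (one set of band indices per segment) is assumed.
from typing import List, Set


def remove_isolated_bands(frames: List[Set[int]]) -> List[Set[int]]:
    """Drop any band that appears in a segment but in neither neighbour."""
    kept: List[Set[int]] = []
    for i, bands in enumerate(frames):
        prev_bands = frames[i - 1] if i > 0 else set()
        next_bands = frames[i + 1] if i + 1 < len(frames) else set()
        kept.append({b for b in bands if b in prev_bands or b in next_bands})
    return kept


# Band 40 and band 7 each appear in only one segment and are removed as noise;
# band 12 appears in two consecutive segments and survives.
detections = [{12}, {12, 40}, set(), {7}]
print(remove_isolated_bands(detections))  # → [{12}, {12}, set(), set()]
```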

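A small calculation checks the three-segment rule. The sketch assumes back-to-back, non-overlapping segments of 2048 samples at 44100Hz; the text does not say whether Wave Goodbye overlaps its segments. It counts how many complete segments a note covers as the note's start time slides across one segment. The function names are hypothetical.

```python
# Sketch: how many complete analysis segments does a note cover?
# Assumes back-to-back, non-overlapping segments of 2048 samples at 44100 Hz.
import math

SAMPLE_RATE = 44100
SEGMENT = 2048
SEGMENT_SECONDS = SEGMENT / SAMPLE_RATE  # about 0.046 s, roughly 1/20 second


def complete_segments(start: float, duration: float, seg: float = SEGMENT_SECONDS) -> int:
    """Number of segments lying entirely inside [start, start + duration]."""
    first = math.ceil(start / seg)               # first segment boundary at or after the note starts
    last = math.floor((start + duration) / seg)  # last boundary at or before the note ends
    return max(0, last - first)


def worst_case_segments(duration: float, steps: int = 1000) -> int:
    """Fewest complete segments over many alignments of the note within one segment."""
    return min(complete_segments(i * SEGMENT_SECONDS / steps, duration) for i in range(steps))


print(f"segment length: {SEGMENT_SECONDS:.4f} s")
print("note of 2 segments, worst case:", worst_case_segments(2 * SEGMENT_SECONDS))
print("note of 3 segments, worst case:", worst_case_segments(3 * SEGMENT_SECONDS))
```

A note two segments long can cover only one complete segment if it starts just after a boundary, so it can be lost. A note three segments long (about 0.139 seconds) always covers at least two, which is enough to survive a filter that needs the same band in two consecutive segments.
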
Page last updated 25th November 2000