Introduction
Conversion of a digital waveform, such as a wave file, to a
list of which instruments are playing which notes, as in a MIDI
file, is beyond today's technological capabilities once the music
goes beyond one note at a time on a single instrument. It is also
beyond most people's capabilities, requiring perfect pitch and
a great deal of musical ability if the music is of any complexity.
Everybody knows that accurate speech recognition is very difficult
to achieve, but the available software is slowly getting better.
Now imagine trying to recognise several speakers at once, with
a wide range of accents; decoding music with several notes
sounding together poses a similar problem. In addition, speech
recognition is a commercially valuable market, so a lot of time
and money goes into this field of research. Music recognition is
a much smaller market, so it will never achieve the same level of
development.
Wave Goodbye was designed for 'decoding' music played on a
single instrument but which may contain more than one note at
a time, such as piano music. With a little experimentation on
the parameter settings, and with a couple of caveats, it performs
quite well. If you try to decode a full orchestral work you may
get something recognisable out at the end after tweaking the parameter
settings, but it will not be very good.
As with all wave-to-MIDI conversion software packages, there are
two areas where the performance of Wave Goodbye is weaker:
Recognising low notes. Because the frequencies of two
adjacent low notes are very close together, it is difficult for
the software to tell them apart. The presence of other notes being
played at the same time, even much higher up, can make the job
much harder. Wave Goodbye is generally accurate to about an octave
below middle C. Low note accuracy should be improved a little
in the next version.
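The following short Python sketch illustrates why low notes are
harder to separate. It assumes equal temperament with A = 440 Hz
and an FFT-style analysis of the 2048-sample segments described
in this section; neither assumption is taken from Wave Goodbye's
actual code, and the function names are hypothetical. Real software
can estimate frequencies more finely than one bin, so the figures
show the trend rather than a hard limit.

```python
# Illustrative arithmetic only. Equal temperament (A4 = 440 Hz) and an
# FFT-style analysis are assumptions, not details of Wave Goodbye itself.
SAMPLE_RATE = 44100      # samples per second
SEGMENT_SAMPLES = 2048   # segment length quoted in this section


def note_frequency(midi_note):
    """Frequency in Hz of a MIDI note number in equal temperament."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def semitone_gap(midi_note):
    """Gap in Hz between a note and the semitone directly above it."""
    return note_frequency(midi_note + 1) - note_frequency(midi_note)


# Spacing in Hz between the frequency bins of a transform over one segment.
bin_width = SAMPLE_RATE / SEGMENT_SAMPLES

for name, midi_note in [("C2", 36), ("C3", 48), ("C4 (middle C)", 60), ("C5", 72)]:
    print(f"{name}: {note_frequency(midi_note):7.2f} Hz, "
          f"next semitone {semitone_gap(midi_note):5.2f} Hz higher, "
          f"bin width {bin_width:.2f} Hz")
```

The gap between neighbouring notes halves with every octave you go
down, while the bin width stays the same, so low notes crowd into
fewer and fewer bins.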
Finding very short notes. Most software of this type
works by splitting the sound into very short pieces and finding
the frequencies present within each segment. However, the shorter
each piece is made the harder it is to recognise low notes, and
so a compromise must be made. Wave Goodbye takes segments of 2048
samples. (At 44100 Hz this is about 0.046 seconds, a little under
1/20 second.) In addition, part of the noise removal filtering
removes any frequency bands which are only seen during one segment.
This means that the shortest possible recognised note must be
present in two consecutive segments. If the note starts partway
through a segment then its frequency presence within that segment
will be weak and may be removed by the amplitude filtering routine.
Likewise for the end of the note. In general a note which lasts at
least three times the segment length (about 0.14 seconds) will
always be recognised as long as it
is not too quiet with respect to any other notes that are being
played at the same time. If the music contains notes shorter than
this then they may be dropped.
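The timing figures above, and the effect of removing bands seen in
only one segment, can be sketched in Python as follows. This is a
minimal illustration, not Wave Goodbye's actual filter: the function
name, the use of sets of band numbers, and the sample data are all
hypothetical.

```python
SAMPLE_RATE = 44100      # samples per second
SEGMENT_SAMPLES = 2048   # samples per analysis segment

segment_seconds = SEGMENT_SAMPLES / SAMPLE_RATE   # roughly 0.046 s
safe_note_seconds = 3 * segment_seconds           # roughly 0.14 s


def drop_single_segment_bands(presence):
    """Remove bands that appear in one segment only.

    presence is a list with one entry per segment; each entry is the
    set of frequency-band numbers detected in that segment (a
    hypothetical representation chosen for this sketch). A band is
    kept only if the previous or the next segment also contains it.
    """
    filtered = []
    for i, bands in enumerate(presence):
        previous = presence[i - 1] if i > 0 else set()
        following = presence[i + 1] if i + 1 < len(presence) else set()
        filtered.append({b for b in bands if b in previous or b in following})
    return filtered


# Band 40 lasts three segments, band 52 only one, band 64 two.
segments = [{40}, {40, 52}, {40}, set(), {64}, {64}]
print(f"segment length: {segment_seconds:.4f} s")
print(f"reliably detected note length: {safe_note_seconds:.2f} s")
print(drop_single_segment_bands(segments))
```

In the sample data band 52 is discarded because it is present in a
single segment, which is what happens to a note too short to span
two consecutive segments.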