Bibliography


Cemgil, T., P. Desain, and B. Kappen. 2000. Rhythm quantization for transcription. Computer Music Journal 24 (2): 60–76.

A method for producing not only note sequences but complete digital scores from a vocal audio performance, by quantizing the performed rhythms onto a metrical grid.



Ghias, A., J. Logan, D. Chamberlin, and B. C. Smith. 1995. Query by humming: musical information retrieval in an audio database. In Proceedings of ACM Multimedia 95, 231–36.

One of the first forays into query-by-humming. Ghias et al. provide a database-searching algorithm driven by audio queries, and formulate the conversion of fundamental frequencies into MIDI pitches.
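The F0-to-MIDI conversion they formulate follows the standard equal-tempered mapping; a minimal sketch, assuming the common convention A4 = 440 Hz = MIDI note 69 (the paper's exact constants may differ):

```python
import math

def hz_to_midi(f0: float) -> int:
    """Round a fundamental frequency in Hz to the nearest MIDI note number,
    using the equal-tempered mapping with A4 = 440 Hz = MIDI note 69."""
    return round(69 + 12 * math.log2(f0 / 440.0))

print(hz_to_midi(440.0))   # A4 -> 69
print(hz_to_midi(261.63))  # middle C -> 60
```

Rounding to the nearest integer note is what makes the mapping usable for database lookup: small intonation errors in the hummed query collapse onto the same pitch symbol.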



Clarisse, L., J. Martens, M. Lesaffre, B. De Baets, H. De Meyer, and M. Leman. 2002. An auditory model based transcriber of singing sequences. In Third International Conference on Music Information Retrieval: ISMIR 2002, 116–23.

This singing transcription method models the human cochlea, and is implemented for the purpose of query-by-humming. The approach also includes a standard RMS-based note onset estimator.
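The RMS onset idea can be sketched briefly. This is a generic baseline, not the paper's implementation: the frame size, hop, and threshold below are arbitrary illustrative choices. An onset is flagged wherever frame energy rises above the threshold from below.

```python
import numpy as np

def rms_onsets(x, sr, frame=512, hop=256, threshold=0.05):
    """Simple RMS-based onset detector (illustrative parameters only).
    Returns onset times, in seconds, where frame energy first rises
    above the threshold."""
    rms = np.array([np.sqrt(np.mean(x[i:i + frame] ** 2))
                    for i in range(0, len(x) - frame, hop)])
    above = rms > threshold
    # Rising edges: frames above the threshold whose predecessor was below
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    return onsets * hop / sr

# One second of silence followed by a tone yields a single onset near t = 1 s.
sr = 8000
t = np.arange(sr) / sr
x = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 220.0 * t)])
print(rms_onsets(x, sr))  # -> [0.96]
```

Energy-based detection of this kind is cheap but fragile for singing, where legato note transitions need not involve an energy dip, which is why it serves only as one cue among several in transcription systems.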



Haus, G., and E. Pollastri. 2001. An audio front-end for query-by-humming systems. In Proceedings of the International Symposium on Music Information Retrieval: ISMIR, 116–23.

This QBH system accounts for tuning. Tuning is assumed to be constant across a vocal snippet, so the most common deviation of F0 from the nearest tempered pitch is applied as the tuning offset for all notes.
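The idea can be sketched as follows. This is a paraphrase of the approach, not Haus and Pollastri's exact procedure: each F0 is compared with the nearest equal-tempered pitch (assuming A4 = 440 Hz), and the most common deviation, rounded to whole cents, becomes the snippet-wide offset.

```python
import math
from collections import Counter

def tuning_offset_cents(f0s):
    """Estimate a single tuning offset, in cents, for a vocal snippet,
    as the most common deviation of each F0 from the nearest
    equal-tempered pitch (A4 = 440 Hz)."""
    deviations = []
    for f0 in f0s:
        semitones = 69 + 12 * math.log2(f0 / 440.0)
        deviations.append(round(100 * (semitones - round(semitones))))
    return Counter(deviations).most_common(1)[0][0]

# Notes sung uniformly 20 cents sharp yield an offset of +20 cents.
c4 = 440.0 * 2 ** (-9 / 12)
print(tuning_offset_cents([f * 2 ** (20 / 1200) for f in (440.0, c4)]))  # -> 20
```

Subtracting this offset before rounding to the nearest semitone keeps a consistently sharp or flat singer from being transcribed a semitone away from the intended notes.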



de Cheveigne, A., and H. Kawahara. 2002. YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America 111 (4): 1917–30.

One of the most popular autocorrelation-based algorithms. YIN replaces raw autocorrelation with a cumulative mean normalized difference function, which makes it less sensitive to octave errors and yields a robust method for F0 estimation.
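YIN's core steps can be sketched compactly. This sketch covers the difference function, the cumulative mean normalized difference, and the absolute-threshold lag pick; it omits the full algorithm's parabolic interpolation and best-local-estimate refinements, and the threshold value below is just the paper's suggested ballpark.

```python
import numpy as np

def yin_f0(x, sr, fmin=80.0, fmax=500.0, threshold=0.1):
    """Sketch of YIN's core steps: difference function, cumulative mean
    normalized difference, and absolute-threshold lag selection."""
    tau_min = int(sr / fmax)
    tau_max = int(sr / fmin)
    w = len(x) - tau_max  # analysis window length
    # Difference function d(tau)
    d = np.array([np.sum((x[:w] - x[tau:tau + w]) ** 2)
                  for tau in range(tau_max + 1)])
    # Cumulative mean normalized difference d'(tau); d'(0) = 1 by definition
    dprime = np.ones_like(d)
    cum = np.cumsum(d[1:])
    dprime[1:] = d[1:] * np.arange(1, tau_max + 1) / np.where(cum == 0, 1, cum)
    # First lag whose d' dips below the threshold, refined to the local minimum
    for tau in range(tau_min, tau_max):
        if dprime[tau] < threshold:
            while tau + 1 < tau_max and dprime[tau + 1] < dprime[tau]:
                tau += 1
            return sr / tau
    # No dip below threshold: fall back to the global minimum
    tau = tau_min + int(np.argmin(dprime[tau_min:tau_max]))
    return sr / tau

sr = 8000
t = np.arange(2048) / sr
print(yin_f0(np.sin(2 * np.pi * 200.0 * t), sr))  # -> 200.0
```

The normalization is what suppresses octave errors: plain autocorrelation also peaks at multiples of the true period, whereas d' stays near 1 at short lags and only dips at genuine period candidates.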



Klapuri, A., T. Viitaniemi, and A. Eronen. 2003. Probabilistic models for the transcription of single-voice melodies. Tampere University of Technology: 59–63.

An early application of probabilistic models to the transcription problem. They incorporate musical context, such as key signature information, into their probabilistic methods.



Maher, R., and J. W. Beauchamp. 1994. Fundamental frequency estimation of musical signals using a two-way mismatch procedure. Journal of the Acoustical Society of America 95 (4): 2254–63.

An intuitive approach to fundamental frequency estimation. For each trial F0, an error function is computed by predicting the harmonics it implies and comparing them against the partials observed in the signal; the mismatch is measured in both directions.
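A simplified version of the two-way error can be sketched as below. The published error additionally weights each mismatch by partial amplitude and frequency; this sketch uses plain normalized frequency distances to show the two-directional structure only.

```python
import numpy as np

def twm_error(f0, measured_freqs, n_harmonics=8):
    """Simplified two-way mismatch error for a trial F0 (unweighted sketch;
    the full procedure also uses amplitude- and frequency-dependent weights)."""
    predicted = f0 * np.arange(1, n_harmonics + 1)
    measured = np.asarray(measured_freqs, dtype=float)
    # Predicted-to-measured: each predicted harmonic vs its nearest partial
    e_pm = sum(np.min(np.abs(measured - p)) / p for p in predicted)
    # Measured-to-predicted: each partial vs its nearest predicted harmonic
    e_mp = sum(np.min(np.abs(predicted - m)) / m for m in measured)
    return e_pm + e_mp

# The trial F0 matching the partials' true fundamental minimizes the error.
partials = [220.0, 440.0, 660.0, 880.0]
errors = {f0: twm_error(f0, partials, n_harmonics=4) for f0 in (110.0, 220.0, 330.0)}
best = min(errors, key=errors.get)
print(best)  # -> 220.0
```

Measuring the mismatch in both directions is the key design choice: the predicted-to-measured term alone would favor subharmonics (every harmonic of 220 Hz is also predicted by a 110 Hz trial), while the reverse term penalizes trial F0s that leave observed partials unexplained.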



McNab, R., L. Smith, and I. Witten. 1995. Signal processing for melody transcription. Department of Computer Science, University of Waikato: 4.

Provides a time-adaptive tuning algorithm that tracks the user's current tuning context, allowing for drift in tuning over the course of a singing performance.
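The spirit of the approach can be sketched with a running tuning offset. The update rule below (a simple exponential moving average of per-note deviations, with an arbitrarily chosen alpha) stands in for McNab et al.'s exact scheme; the point is that the pitch grid shifts with the singer, so slow drift does not push notes across semitone boundaries.

```python
import math

def adaptive_transcribe(f0s, alpha=0.3):
    """Transcribe F0s to MIDI notes while tracking a drifting tuning
    reference (illustrative exponential-moving-average update, not the
    paper's exact rule)."""
    offset = 0.0  # running tuning offset, in semitones
    notes = []
    for f0 in f0s:
        semitones = 69 + 12 * math.log2(f0 / 440.0)
        note = round(semitones - offset)
        notes.append(note)
        # Pull the tuning reference toward this note's observed deviation
        offset = (1 - alpha) * offset + alpha * (semitones - note)
    return notes

# A chromatic scale drifting 6 cents sharp per note stays on the intended
# notes, even after the accumulated drift exceeds half a semitone.
drift = [440.0 * 2 ** (i * 1.06 / 12) for i in range(12)]
print(adaptive_transcribe(drift))  # -> [69, 70, 71, ..., 80]
```

A naive rounding of the same sequence would misclassify every note once the accumulated drift passes 50 cents, which is exactly the failure mode a time-adaptive scheme avoids.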



Ryynanen, M. 2006. Singing transcription. In Signal Processing Methods for Music Transcription, ed. A. Klapuri and M. Davy, 361–90. New York: Springer.

A comprehensive overview of singing transcription systems and the challenges they face.



Ryynanen, M., and A. Klapuri. 2004. Modelling of note events for singing transcription. In Proceedings of IEEE Workshop on Statistical and Perceptual Audio Processing: SAPA, 216–21.

Ryynanen and Klapuri use probabilistic models, together with extracted metrical accents, to better predict where note onsets occur in singing transcription. Their method is comprehensive, using an HMM to find the attack, sustain, and silence states for any given note.
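Decoding such a note model amounts to running the Viterbi algorithm over the state sequence. The sketch below is a generic three-state illustration with made-up transition and emission probabilities; in the actual system the observation likelihoods come from pitch, voicing, and accent features, not the toy numbers shown here.

```python
import numpy as np

def viterbi(log_trans, log_emit):
    """Most likely HMM state path. log_trans: (S, S) log transition
    matrix indexed [from, to]; log_emit: (T, S) per-frame state
    log-likelihoods."""
    T, S = log_emit.shape
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0] = log_emit[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans   # cand[i, j]: i -> j
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# States: 0 = attack, 1 = sustain, 2 = silence (toy probabilities).
log_trans = np.log(np.array([[0.10, 0.80, 0.10],
                             [0.05, 0.90, 0.05],
                             [0.25, 0.05, 0.70]]))
# Five frames whose observations favor attack, then sustain, then silence.
log_emit = np.log(np.array([[0.80, 0.10, 0.10],
                            [0.20, 0.70, 0.10],
                            [0.10, 0.80, 0.10],
                            [0.10, 0.20, 0.70],
                            [0.05, 0.05, 0.90]]))
print(viterbi(log_trans, log_emit))  # -> [0, 1, 1, 2, 2]
```

The decoded attack-to-sustain-to-silence path is then read off as a note with an onset (entry into attack) and an offset (entry into silence), which is how the state machine turns frame-level features into note events.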
