CHIEN Yu-Ren, Hsin-Min WANG, Shyh-Kang JENG
Simulated Formant Modeling of Accompanied Singing Signals for Vocal Melody Extraction
Edition: Proceedings of the 9th Sound and Music Computing Conference (SMC12), Copenhagen, Denmark, pp. 33–40
Type of media: Article
Comment: Yu-Ren Chien, Graduate Institute of Communication Engineering, National Taiwan University, Taiwan
Hsin-Min Wang, Institute of Information Science, Academia Sinica, Taiwan
Shyh-Kang Jeng, Department of Electrical Engineering, National Taiwan University, Taiwan
This paper deals with the task of extracting vocal melodies from accompanied singing recordings. The task is challenging because instrumental sounds tend to interfere with extraction of the desired vocal melody, especially when the singing voice is not predominant among the sound sources. Existing methods in the literature are either rule-based or statistical: rule-based methods have difficulty taking full advantage of human voice characteristics, whereas statistical approaches typically require large-scale data collection and labeling efforts. In this work, extraction is based on a model of the input signal that integrates acoustic-phonetic knowledge and real-world data under a probabilistic framework. The resulting vocal pitch estimator is simple, determined by a small set of parameters and a small amount of data. Tested on a publicly available dataset, the proposed method achieves a transcription accuracy of 76%.
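To make the notion of a vocal pitch estimator concrete, the sketch below implements a generic harmonic-summation pitch scorer, a common baseline for melody extraction. It is an illustration only, not the paper's formant-based probabilistic model: the function name `pitch_salience`, the candidate grid, and the synthetic test signal are all assumptions introduced here.

```python
import numpy as np

def pitch_salience(frame, sr, f0_candidates, n_harmonics=5):
    """Score each candidate F0 by summing spectral magnitude at its
    harmonics (harmonic summation; a generic baseline, not the
    formant model proposed in the paper)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    scores = []
    for f0 in f0_candidates:
        # Sum magnitude at the bins nearest each harmonic of the candidate.
        idx = [np.argmin(np.abs(freqs - h * f0))
               for h in range(1, n_harmonics + 1)]
        scores.append(spectrum[idx].sum())
    return np.array(scores)

# Synthetic "voiced" frame: 220 Hz fundamental with decaying harmonics.
sr = 16000
t = np.arange(2048) / sr
frame = sum((0.8 ** h) * np.sin(2 * np.pi * 220 * h * t)
            for h in range(1, 6))

candidates = np.arange(100.0, 400.0, 5.0)
best = candidates[np.argmax(pitch_salience(frame, sr, candidates))]
print(best)  # → 220.0
```

In an accompanied recording, salience peaks from instruments compete with the vocal peak, which is precisely the interference the paper addresses by modeling voice-specific spectral structure rather than raw harmonic energy alone.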