Multitask learning for fundamental frequency estimation in music
By R. Bittner
Rachel Bittner (Spotify) presents to ADASP about a multi-task deep learning architecture that jointly predicts outputs for multi-f0, melody, vocal and bass line estimation.
Fundamental frequency (f0) estimation from polyphonic music includes the tasks of multiple-f0, melody, vocal, and bass line estimation. Historically these problems have been approached separately, and until recently little work had used learning-based approaches. We present a multi-task deep learning architecture that jointly predicts outputs for multi-f0, melody, vocal and bass line estimation and is trained using a large, semi-automatically annotated dataset. We provide evidence of the usefulness of the recently proposed Harmonic CQT for fundamental frequency estimation. Finally, we show that the multitask model outperforms its single-task counterparts, and that the addition of synthetically generated training data is beneficial.
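For readers unfamiliar with the Harmonic CQT, the short sketch below illustrates the idea behind it: several constant-Q transforms are stacked, each starting at a minimum frequency scaled by a harmonic factor h, so that the same bin index across channels lines up a candidate f0 with its harmonics. The function name and all parameter values (`fmin`, `n_bins`, `bins_per_octave`, the set of harmonics) are illustrative assumptions, not values given in the abstract, and the code computes only the bin center frequencies, not the transform itself.

```python
def hcqt_center_frequencies(fmin=32.7, n_bins=360, bins_per_octave=60,
                            harmonics=(0.5, 1, 2, 3, 4, 5)):
    """Center frequency (Hz) of every bin in every harmonic channel.

    All defaults are assumed, illustrative values. Channel h is a
    constant-Q frequency grid whose lowest bin sits at h * fmin.
    """
    return {
        h: [h * fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
        for h in harmonics
    }


freqs = hcqt_center_frequencies()

# Pick one bin on the base (h = 1) channel: two octaves above fmin.
k = 2 * 60
f0 = freqs[1][k]

# The same bin index on the other channels gives the harmonics of f0,
# so a model looking across channels at bin k sees f0, 2*f0, 3*f0, ...
partials = [freqs[h][k] for h in (1, 2, 3, 4, 5)]
```

Because harmonically related frequencies share a bin index across channels, a convolutional model with small filters can pick up harmonic structure locally, which is one plausible reason the representation suits f0 estimation.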
Rachel is a Research Scientist at Spotify in New York City, and recently completed her Ph.D. at the Music and Audio Research Lab at New York University under Dr. Juan P. Bello. Previously, she was a research assistant at NASA Ames Research Center working with Durand Begault in the Advanced Controls and Displays Laboratory. She did her master’s degree in math at NYU’s Courant Institute, and her bachelor’s degree in music performance and math at UC Irvine. Her research interests are at the intersection of audio signal processing and machine learning, applied to musical audio. Her dissertation work applied machine learning to various types of fundamental frequency estimation.