On probing heads for benchmarking self-supervised speech representation models
By Salah Zaiem
Each month, the lab proposes invited talks for its partners.
This month, Salah Zaiem, a final-year PhD student in the audio group (ADASP) of Télécom Paris, will give a presentation entitled On probing heads for benchmarking self-supervised speech representation models.
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance using only small amounts of annotated data. The high number of proposed approaches has fostered the rise of extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most benchmarks rely on a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness of such benchmarking results to changes in the decoder architecture. Interestingly, varying the architecture of the downstream decoder leads to significant variations in the leaderboards of most tasks. More concerningly, the study reveals that benchmarking with limited decoders may cause a counterproductive increase in the sizes of the developed SSL models.
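To make the benchmarking setup concrete, below is a minimal sketch of what a decoding (probing) head does: a small trainable classifier is fit on top of frozen representations. Random features stand in for the frozen SSL embeddings here; this is an illustrative toy, not the actual pipeline or decoder family studied in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen SSL features: in practice these would come from a
# pretrained speech encoder with gradients disabled.
n, dim, n_classes = 200, 16, 3
W_true = rng.normal(size=(dim, n_classes))
X = rng.normal(size=(n, dim))  # "frozen" representations
y = np.argmax(X @ W_true + 0.1 * rng.normal(size=(n, n_classes)), axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Linear probing head: a single affine map trained with cross-entropy;
# only W and b are updated, the features X are never modified.
W = np.zeros((dim, n_classes))
b = np.zeros(n_classes)
onehot = np.eye(n_classes)[y]
for _ in range(500):
    p = softmax(X @ W + b)
    grad = p - onehot  # gradient of cross-entropy w.r.t. logits
    W -= 0.1 * (X.T @ grad) / n
    b -= 0.1 * grad.mean(axis=0)

accuracy = (np.argmax(X @ W + b, axis=1) == y).mean()
```

The point the talk examines is that rankings obtained with one such head (e.g. this linear probe) may change when the head is replaced by a deeper or differently structured decoder.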
Salah Zaiem is currently working toward the Ph.D. degree at Télécom Paris, France, supervised by Slim Essid and Titouan Parcollet. His research focuses on understanding and motivating the design choices in self-supervised learning pipelines for speech.