New Master internship position with ADASP: Unsupervised data selection for knowledge distillation of self-supervised speech models.
By S. Zaiem

Our group is hiring a Master's intern to work on the topic “Unsupervised data selection for knowledge distillation of self-supervised speech models”.

Important information

  • Start date: April/May 2023
  • Duration: 5 to 6 months
  • Place of work: Palaiseau (Paris outskirts), France
  • Remuneration: 600€/month
  • Supervisors: Salah Zaiem, Slim Essid
  • Contact: salah.zaiem@telecom-paris.fr

Problem statement and context

Knowledge distillation is the process of transferring knowledge from a large teacher model to a smaller student model, in order to reduce inference time and computational cost. In this internship, we want to explore training data selection for distilling self-supervised speech representation models. Proper distillation requires access to the full dataset used to train the teacher model, which is problematic for two reasons:

  • The training data may not be publicly available, for instance when it is private or when the data handling applied during training is not fully documented.
  • Training datasets are very large, which makes distillation costly.

We want to explore techniques from unsupervised data selection in order to better choose the training data used for distillation.
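To make the setup concrete, here is a minimal sketch of a feature-level distillation step in PyTorch. The toy teacher/student modules and the L1 + cosine objective are illustrative assumptions (loosely inspired by DistilHuBERT-style approaches), not the method to be developed during the internship.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def distillation_loss(student_feats, teacher_feats):
        # L1 distance plus a log-sigmoid cosine-similarity term between
        # student and teacher frame-level representations
        # (shapes: batch x frames x feature_dim).
        l1 = F.l1_loss(student_feats, teacher_feats)
        cos = F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()
        return l1 - torch.log(torch.sigmoid(cos))

    # Toy stand-ins: in practice the teacher would be a pre-trained
    # self-supervised speech encoder (e.g. WavLM) and the student a smaller network.
    teacher = nn.Sequential(nn.Linear(80, 768), nn.GELU(), nn.Linear(768, 768))
    student = nn.Sequential(nn.Linear(80, 256), nn.GELU(), nn.Linear(256, 768))

    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
    features = torch.randn(4, 100, 80)   # stand-in for a batch of speech frames

    with torch.no_grad():
        targets = teacher(features)      # the teacher is frozen during distillation
    loss = distillation_loss(student(features), targets)
    loss.backward()
    optimizer.step()

The data-selection question studied in the internship amounts to choosing which subset of speech to feed through such a distillation step, since the full teacher training set may be unavailable or too costly to process.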

References

Chen, S., Wang, C., Chen, Z., Wu, Y., Liu, S., Chen, Z., Li, J., Kanda, N., Yoshioka, T., Xiao, X., Wu, J., Zhou, L., Ren, S., Qian, Y., Qian, Y., Wu, J., Zeng, M., Yu, X., & Wei, F. (2021). WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. IEEE Journal of Selected Topics in Signal Processing, 16(6), 1505–1518. https://doi.org/10.1109/JSTSP.2022.3188113

Lee, Y., Jang, K., Goo, J., Jung, Y., & Kim, H. (2022). FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2022-September, 3588–3592. https://doi.org/10.48550/arxiv.2207.00555

Lu, Z., Wang, Y., Zhang, Y., Han, W., Chen, Z., & Haghani, P. (2022). Unsupervised Data Selection via Discrete Speech Representation for ASR. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2022-September. https://arxiv.org/abs/2204.01981

Candidate profile

  • The candidate is currently finishing an M2 (Master's) degree in Data Science, Machine Learning, Signal Processing, or Speech/Audio/Music Processing.
  • Strong Python skills and good theoretical and practical knowledge of deep learning (with PyTorch) are required.