New Master internship position with ADASP: Improving Urban Sound Event Detection with Unsupervised Source Separation
By F. Angulo
Our group is hiring a Master intern on the topic “Improving Urban Sound Event Detection with Unsupervised Source Separation”.
Important information
- Date: March/April 2023
- Duration: 5 to 6 months
- Place of work: Palaiseau (Paris outskirts), France
- Remuneration: 600€/month
- Supervisors: Florian Angulo, Slim Essid, Geoffroy Peeters
- Contact: florian.angulo@telecom-paris.fr
Problem statement and context
Sound event detection (SED) consists of automatically predicting which sound classes occur in an audio recording and when. It has promising applications in urban noise pollution monitoring [1]. SED in real conditions involves complex soundscapes with overlapping sound events and varying sound levels. Several works have proposed using unsupervised sound separation to improve SED systems under these challenging conditions [2]. Recently, this approach was successfully applied to bird sound event detection [3]. This motivates us to investigate how well it carries over to general-purpose urban sound event detection.
Proposal description
In this internship, linked to a PhD thesis project, we propose to:
- Re-implement and tune an efficient unsupervised urban sound source separation system based on state-of-the-art deep neural networks [4] and training strategies [5].
- Use this system jointly with (provided) SED systems on multiple urban sound datasets [1] and analyse its performance in comparison with SED systems applied directly to the audio (without separating sources), as in [3].
- Explore self-training strategies to improve the joint separation-classification system [6][7].
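To make the comparison in the second item concrete, here is a minimal sketch of how a separation front end could be combined with an SED back end. It is only an illustration under stated assumptions: the names `combine_max`, `detect_with_separation`, `separate` and `sed` are hypothetical, the separator and SED models are passed in as plain callables, and taking the element-wise maximum over the separated sources (optionally together with the original mixture) is one possible combination rule, not necessarily the one used in [3] or the one that will be retained in the internship.

```python
from typing import Callable, List, Sequence

# Frame-level predictions: probs[frame][class] is a probability in [0, 1].
Probs = List[List[float]]
Audio = List[float]


def combine_max(prob_maps: Sequence[Probs]) -> Probs:
    """Element-wise maximum over several [frame][class] probability maps."""
    n_frames = len(prob_maps[0])
    n_classes = len(prob_maps[0][0])
    return [
        [max(p[t][c] for p in prob_maps) for c in range(n_classes)]
        for t in range(n_frames)
    ]


def detect_with_separation(
    mixture: Audio,
    separate: Callable[[Audio], List[Audio]],  # hypothetical separator
    sed: Callable[[Audio], Probs],             # hypothetical SED model
    include_mixture: bool = True,
) -> Probs:
    """Run SED on each separated source and merge the predictions.

    With include_mixture=True, the unseparated mixture is also scored,
    so the merged output is never below the mixture-only prediction.
    """
    inputs = list(separate(mixture))
    if include_mixture:
        inputs.append(mixture)
    return combine_max([sed(x) for x in inputs])
```

The baseline in this sketch is simply `sed(mixture)`, so both systems can be evaluated with the same metrics on the same recordings.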
[1] Cartwright et al. “SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context.” DCASE, 2020.
[2] Turpault et al. “Improving Sound Event Detection In Domestic Environments Using Sound Separation.” DCASE, 2020.
[3] Denton et al. “Improving Bird Classification with Unsupervised Sound Separation.” ICASSP, 2022.
[4] Tzinis et al. “Compute and Memory Efficient Universal Sound Source Separation.” Journal of Signal Processing Systems, 2022.
[5] Wisdom et al. “Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation.” WASPAA, 2021.
[6] Pishdadian et al. “Finding Strength in Weakness: Learning to Separate Sounds With Weak Supervision.” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019.
[7] Zhang et al. “Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation.” CVPR, 2022.
Candidate profile
- Candidates should currently be completing an M2 degree in data science, machine learning, signal processing, or speech/audio/music processing.
- Strong skills in Python and good theoretical and practical knowledge of deep learning (using PyTorch) are required.