Internship proposal: neuro-steered speech source separation

By G. Cantisani

Our group is hiring an intern in machine learning for audio signal processing.

Context

Télécom Paris was founded in July 1878 and is one of the top four engineering schools in France for training general engineers. Recognized for its close ties with businesses, this public graduate school ensures excellent employment prospects in all industries and is considered the number-one engineering school for digital technology. With its top-level innovative teaching, Télécom Paris is at the center of a unique innovation ecosystem, drawing on the interaction and cross-disciplinary nature of the school’s academic programs, interdisciplinary research, two business incubators and its campuses (Paris and Sophia Antipolis – EURECOM). Its LTCI laboratory has been accredited by HCERES as an outstanding unit in the field of digital sciences for its international reputation and exceptional number of initiatives supporting the socio-economic world and industry, as well as for its great contribution to teaching.

A founding member of Institut Polytechnique de Paris and an IMT (Institut Mines-Télécom) school, Télécom Paris is positioned as the college for digital innovation on the Paris-Saclay Campus.

The Information Processing and Communication Laboratory (LTCI) is Télécom Paris’ in-house research laboratory. Since January 2017, it has continued the work previously carried out by the CNRS joint research unit of the same name. The LTCI was created in 1982 and is known for its extensive coverage of topics in the field of information and communication technologies. The LTCI’s core subject areas are computer science, networks, data science, signal and image processing and digital communications. The laboratory is also active in issues related to systems engineering and applied mathematics.

The internship will be hosted by Télécom Paris’ Audio Data Analysis and Signal Processing (ADASP) group, a subgroup of the Statistics, Signal Processing and Machine Learning (S²A) team, within the Images, Data & Signals (IDS) department.

Internship topic

Listeners with hearing loss struggle to follow a conversation in a multi-speaker environment. Current hearing aids can suppress ambient noise but still struggle with multiple speakers, where it is not trivial to determine which speaker should be enhanced and which should be suppressed.

Auditory attention is the cognitive mechanism that allows humans to focus on a sound source of interest. This allows the brain to extract and process high-level sound content effectively and efficiently. Auditory attention decoding (AAD) aims at determining which sound source a person is “focusing on” just by analyzing the listener’s brain response. Previous AAD studies based on continuous electroencephalographic (EEG) signals have shown that neural activity tracks dynamic changes in the audio stimulus and can be successfully used to decode selective attention to a speaker [1].
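To make the idea concrete, here is a minimal sketch of correlation-based attention decoding, in the spirit of the stimulus-reconstruction approaches used in the AAD literature. Everything in it is an illustrative assumption rather than the method of any cited paper: the data are synthetic, the function names fit_decoder and decode_attention are hypothetical, and the decoder is a plain ridge regression without the time lags that practical decoders apply to the EEG.

```python
import numpy as np

def fit_decoder(eeg, envelope, ridge=1.0):
    """Ridge regression mapping EEG (samples x channels) to a speech envelope."""
    gram = eeg.T @ eeg + ridge * np.eye(eeg.shape[1])
    return np.linalg.solve(gram, eeg.T @ envelope)

def decode_attention(eeg, envelopes, weights):
    """Pick the candidate envelope that best correlates with the reconstruction."""
    reconstruction = eeg @ weights
    scores = [np.corrcoef(reconstruction, env)[0, 1] for env in envelopes]
    return int(np.argmax(scores)), scores

# Synthetic stand-in for real recordings (assumption): two speaker envelopes,
# and an "EEG" that linearly tracks speaker A plus sensor noise.
rng = np.random.default_rng(0)
n_samples, n_channels = 2000, 16
env_a = np.abs(rng.standard_normal(n_samples))
env_b = np.abs(rng.standard_normal(n_samples))
mixing = rng.standard_normal(n_channels)
eeg = np.outer(env_a, mixing) + 0.5 * rng.standard_normal((n_samples, n_channels))

# Train the decoder on the first part, decode attention on held-out samples.
train, test = slice(0, 1500), slice(1500, None)
weights = fit_decoder(eeg[train], env_a[train])
attended, scores = decode_attention(eeg[test], [env_a[test], env_b[test]], weights)
print("decoded speaker index:", attended)
```

In a neuro-steered separation system, the candidate envelopes would come from the separated sources rather than from clean reference signals, which is part of what makes the problem challenging.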

The internship’s goal is to develop new machine learning algorithms that automatically separate the speaker of interest using the listener’s EEG response [2], [3]. The candidate will be expected to:

Select a proper dataset and study the relevant literature
Develop new algorithms and run experiments
Perform evaluations and compare with the state of the art
Communicate research progress and document results

[1] J. A. O’Sullivan, et al. Attentional selection in a cocktail party environment can be decoded from single-trial EEG. Cerebral Cortex, 25(7):1697–1706, 2014.

[2] E. Ceolini, et al. Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception. NeuroImage, 2020.

[3] G. Cantisani, et al. Neuro-steered music source separation with EEG-based auditory attention decoding and contrastive-NMF. HAL preprint, 2020.

Requirements

Knowledge of machine learning and signal processing
Oral and written proficiency in English
Programming: Python for scientific programming, LaTeX
Soft skills (communication, problem solving)
Knowledge of optimisation, EEG, and audio signal processing will be appreciated.

Place of work

Palaiseau (Paris outskirts), France

Duration

6 months

Contact

Giorgia Cantisani

Slim Essid

Gaël Richard
