Giorgia Cantisani's PhD defense
By G. Cantisani

Giorgia Cantisani will soon defend her PhD thesis, entitled “Neuro-steered Music Source Separation”.

The defense will take place on Monday, December 13, 2021, at 4 pm at Télécom Paris.

The jury is composed as follows:

  • Ms Isabelle BLOCH, Professor, Télécom Paris (President)

  • Mr Alexandre GRAMFORT, Senior Research Scientist, Inria (Reviewer)

  • Mr Shihab SHAMMA, Professor, ENS (Reviewer)

  • Ms Elaine CHEW, Senior CNRS Researcher, IRCAM (Examiner)

  • Ms Blair KANESHIRO, Adjunct Professor, Stanford University (Examiner)

  • Mr Slim ESSID, Professor, Télécom Paris (Thesis co-director)

  • Mr Gaël RICHARD, Professor, Télécom Paris (Thesis director)

  • Mr Alexey OZEROV, Senior research scientist, Interdigital R&D, France (Invited guest)

The presentation will be in English.


In this PhD thesis, we address the challenge of integrating Brain-Computer Interfaces (BCI) and music technologies in the specific application of music source separation, the task of isolating the individual sound sources that are mixed in the audio recording of a musical piece. This problem has been investigated for decades, but never considering BCI as a possible way to guide and inform separation systems. Specifically, we explored how the neural activity characterized by electroencephalographic (EEG) signals reflects information about the attended instrument and how it can be used to inform a source separation system. First, we studied the problem of EEG-based auditory attention decoding of a target instrument in polyphonic music, showing that the EEG tracks musically relevant features which are highly correlated with the time-frequency representation of the attended source and only weakly correlated with that of the unattended one. Second, we leveraged this “contrast” to inform an unsupervised source separation model based on a novel non-negative matrix factorisation (NMF) variant, named contrastive-NMF (C-NMF), and automatically separate the attended source. Unsupervised NMF is a powerful approach in applications with no or limited training data, as is the case when neural recordings are involved. Indeed, the available music-related EEG datasets are still costly and time-consuming to acquire, precluding fully supervised deep learning approaches. Thus, in the last part of the thesis, we explored alternative learning strategies to alleviate this problem. Specifically, we propose to adapt a state-of-the-art music source separation model to a specific mixture using the time activations of the sources derived from the user's neural activity. This paradigm can be referred to as one-shot adaptation, as it acts on the target song instance only.
We conducted an extensive evaluation of the proposed systems on the MAD-EEG dataset, which was specifically assembled for this study, obtaining encouraging results, especially in difficult cases where non-informed models struggle.
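To give a flavour of the EEG-informed NMF idea described above, the toy sketch below factorises a mixture spectrogram with standard multiplicative updates and softly nudges the activations of the components assigned to the attended source toward an EEG-derived temporal envelope. This is a simplified illustration under assumed names (`informed_nmf`, the `alpha` nudge), not the actual C-NMF objective or update rules from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def informed_nmf(V, n_components, eeg_envelope, n_target=1,
                 alpha=0.1, n_iter=200, eps=1e-9):
    """Toy EEG-informed NMF sketch (NOT the thesis's C-NMF updates).

    V            : non-negative mixture spectrogram, shape (freq, time)
    eeg_envelope : non-negative time activation derived from EEG, shape (time,)
    n_target     : number of components tied to the attended source
    alpha        : strength of the soft EEG-alignment nudge
    """
    F, T = V.shape
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, T)) + eps
    e = eeg_envelope / (eeg_envelope.max() + eps)   # normalise to [0, 1]
    for _ in range(n_iter):
        # standard multiplicative updates for the Euclidean NMF objective
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # soft nudge: pull the target components' activations toward the
        # (rescaled) EEG envelope, keeping everything non-negative
        scale = H[:n_target].max(axis=1, keepdims=True) + eps
        H[:n_target] = (1 - alpha) * H[:n_target] + alpha * scale * e
    target = W[:, :n_target] @ H[:n_target]          # attended-source estimate
    return W, H, target

# tiny synthetic example: two sources with distinct spectra and activations,
# where the "EEG envelope" is taken to be the target's true activation
F, T = 32, 100
a1 = (np.sin(np.linspace(0, 6 * np.pi, T)) + 1) / 2  # target activation
a2 = rng.random(T)                                   # interferer activation
w1, w2 = rng.random((F, 1)), rng.random((F, 1))      # source spectra
V = w1 @ a1[None, :] + w2 @ a2[None, :]
W, H, target = informed_nmf(V, n_components=2, eeg_envelope=a1)
```

In the thesis, the coupling between the EEG and the decomposition is instead expressed as a contrastive term in the factorisation objective; the nudge used here only mimics its effect of biasing some components toward the attended source's temporal activity.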