Resources for the paper: Neuro-steered music source separation with EEG-based auditory attention decoding and contrastive-NMF

By G. Cantisani

In our paper (Cantisani et al., 2021), we present a novel neuro-steered music source separation framework. In particular, we propose an unsupervised nonnegative matrix factorisation (NMF) variant, named Contrastive-NMF (C-NMF), that separates a target instrument guided by the user’s selective auditory attention to that instrument, which is tracked in their electroencephalographic (EEG) response to music.

The paper

Cantisani, G., Essid, S., & Richard, G. (2021, June). Neuro-steered music source separation with EEG-based auditory attention decoding and contrastive-NMF. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/ICASSP39728.2021.9413841

Demo

The experiments are designed to evaluate whether the EEG information helps the separation process (C-NMF-e). To verify that any improvement is due to the EEG and not merely to the model’s discriminative capacity, we compare against two baselines: the blind NMF (NMF), and the C-NMF given meaningless side information (C-NMF-r). As the models are entirely unsupervised, the factorised components need to be assigned to each source before applying the Wiener filter. In the two baselines, the components are clustered according to their MFCC similarity. The C-NMF-e does not need this step, as the EEG information automatically identifies and gathers the target instrument components.
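To make the blind baseline more concrete, here is a minimal sketch of an NMF followed by a Wiener-like soft mask. It is not the C-NMF from the paper: the contrastive, EEG-driven term is left out, the multiplicative updates for the Euclidean cost are an assumption, and the MFCC clustering is replaced by a hypothetical hand-picked list `target_idx` of components assigned to the target. The function names and the toy spectrogram are also only illustrative.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-10, seed=0):
    """Blind NMF, V ~ W @ H, with multiplicative updates (Euclidean cost)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps   # spectral templates
    H = rng.random((n_components, n_frames)) + eps  # activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def wiener_mask(W, H, target_idx, eps=1e-10):
    """Soft mask: share of the model explained by the target components."""
    V_target = W[:, target_idx] @ H[target_idx, :]
    V_all = W @ H
    return V_target / (V_all + eps)

# Toy nonnegative "spectrogram" standing in for a real mixture (assumption).
rng = np.random.default_rng(1)
V = rng.random((64, 100))

W, H = nmf(V, n_components=8)
# Hypothetical assignment: components 0-2 belong to the target instrument.
mask = wiener_mask(W, H, target_idx=[0, 1, 2])
target_estimate = mask * V  # masked mixture magnitude
```

In the baselines, the assignment that `target_idx` stands for would come from the MFCC-based clustering, whereas in the C-NMF-e it is determined by the EEG information.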

The figure compares the Signal-to-Distortion Ratio (SDR) obtained with the different methods for each instrument in the dataset. For all the instruments except the Bass, our model performs significantly better than both the blind NMF and the C-NMF-r. The large spread in SDR observed for the Bass and the Drums reflects the high variability across subjects.
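For readers unfamiliar with the metric, the following sketch computes an SDR in its simplest form, as the energy ratio in dB between the reference source and the estimation error. This is an assumption made for illustration: standard separation evaluation toolkits use a more elaborate definition that tolerates some filtering of the reference, so the values below are not directly comparable with those reported in the paper. The function name `sdr_db` is hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Basic SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero for a perfect estimate.
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))

# Toy example: the estimate is the reference with a 10% amplitude error.
t = np.arange(1000) / 1000.0
reference = np.sin(2 * np.pi * 5 * t)
estimate = 1.1 * reference
print(sdr_db(reference, estimate))  # about 20 dB
```

A higher SDR means the estimate is closer to the reference source.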

The audio examples are taken from the test set described in the paper. For the proposed model, C-NMF-e, we provide several examples corresponding to different subjects and different spatial renderings.

Vocals

Mix