Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixture Representation for Blind Source Separation
By M. Fontaine

M. Fontaine talks to ADASP about Gaussian scale mixture representation for blind source separation


This article describes an audio blind source separation (BSS) method called GSM-FastMNMF. It extends the multichannel nonnegative matrix factorization with a frequency jointly-diagonalizable spatial model, a.k.a. FastMNMF, by using the Gaussian scale mixture (GSM) probability distribution to represent various heavy-tailed distributions. As two instances of GSM-FastMNMF, we present $\beta$-FastMNMF, based on the super-Gaussian distribution, and GH-FastMNMF, based on the generalized hyperbolic distribution. The latter encompasses the original Gaussian FastMNMF, an existing variant based on the Student’s $t$ distribution, and a newly proposed variant based on the normal inverse Gaussian (NIG) distribution. To optimize GSM-FastMNMF, we follow the expectation-maximization (EM) framework to derive a computationally efficient expectation computation (E-step) for each of the $\beta$-FastMNMF and GH-FastMNMF variants, and a common multiplicative update rule (M-step) for both. We demonstrate that heavy-tailed extensions such as NIG outperform the other variants for speaker separation.
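To make the GSM idea concrete, here is a minimal sketch, not taken from the talk, of how one heavy-tailed distribution arises as a Gaussian scale mixture. It relies on the standard result that a zero-mean Gaussian whose variance follows an inverse-gamma law with shape and rate both equal to $\nu/2$ has a Student’s $t$ marginal with $\nu$ degrees of freedom. The function names `sample_student_t_via_gsm` and `excess_kurtosis` are hypothetical, and the example uses real scalars, whereas the method above works with complex multichannel spectrograms.

```python
import numpy as np


def sample_student_t_via_gsm(nu, size, rng):
    """Draw Student's t samples as a Gaussian scale mixture (illustrative helper)."""
    # Latent variance: inverse-gamma(nu/2, nu/2), obtained as the reciprocal
    # of a gamma variable with shape nu/2 and rate nu/2 (i.e. scale 2/nu).
    variance = 1.0 / rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)
    # Conditionally on the latent variance, each sample is zero-mean Gaussian.
    return np.sqrt(variance) * rng.standard_normal(size)


def excess_kurtosis(x):
    """Empirical excess kurtosis; close to 0 for Gaussian data, positive for heavy tails."""
    centered = x - x.mean()
    return np.mean(centered**4) / np.mean(centered**2) ** 2 - 3.0


rng = np.random.default_rng(0)
gsm_samples = sample_student_t_via_gsm(nu=10.0, size=200_000, rng=rng)
gauss_samples = rng.standard_normal(200_000)

print("excess kurtosis, GSM (Student's t):", excess_kurtosis(gsm_samples))
print("excess kurtosis, Gaussian:", excess_kurtosis(gauss_samples))
```

The mixture samples show a clearly positive excess kurtosis, while the plain Gaussian samples stay near zero. Choosing a different law for the latent variance yields other heavy-tailed members of the GSM family.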


After a PhD at Inria Nancy Grand-Est entitled “alpha-stable process for signal processing”, Mathieu Fontaine began a postdoc at the RIKEN Artificial Intelligence Project (AIP) in October 2019 and became a guest at Kyoto University. His interests are mainly in audio signal processing, including but not limited to speech enhancement, speaker separation, source localization, and music source separation, using heavy-tailed probabilistic models and/or deep Bayesian networks.