Robust Speech Separation with Independent Vector Analysis

By Robin Scheibler

Robin Scheibler, Senior Researcher at LINE Corporation, Tokyo, Japan, will introduce his work on robust speech separation algorithms and audio toolboxes.

Abstract

Independent Vector Analysis (IVA) is a statistical method for multichannel blind source separation that is particularly suited to the separation of convolutional mixtures, such as those of audio signals. IVA relies on two key components: the statistical independence of the sources and an assumed model for the sources to separate. Traditional source models, such as super-Gaussian or non-negative low-rank models, give good but somewhat limited performance. In this talk, we will describe recent progress in including rich models learnt from data into IVA. The resulting algorithm preserves the powerful independence assumption while leveraging the ability of neural networks to model complex sources such as speech. It has interesting properties: it is agnostic to the number of sources to separate, generally light on the number of parameters, and robust to mismatch between training and test data. We will describe the general algorithm and some recent extensions to unsupervised learning, with some help from signal processing, as well as joint training with an ASR backend. In a brief second part of the talk, I will describe pyroomacoustics, a Python package for the simulation of room acoustics, as well as torchiva, a recently released PyTorch toolbox for IVA.
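
To make the two components concrete, here is a minimal NumPy sketch of one classic form of IVA: the auxiliary-function update with iterative projection and a spherical Laplace source model, that is, one of the traditional super-Gaussian models mentioned above. It is not the learned neural source model presented in the talk, and it does not use the torchiva or pyroomacoustics APIs. The function names (auxiva_laplace, make_toy_mixture, leakage) and the synthetic frequency-domain mixture are assumptions made for illustration only.

```python
import numpy as np


def auxiva_laplace(X, n_iter=50, eps=1e-10):
    """Illustrative AuxIVA with a spherical Laplace source model.

    X: complex array of shape (n_freq, n_chan, n_frames), e.g. an STFT.
    Returns (Y, W) with Y[f] = W[f] @ X[f]; W has shape (n_freq, n_chan, n_chan).
    """
    n_freq, n_chan, n_frames = X.shape
    W = np.tile(np.eye(n_chan, dtype=complex), (n_freq, 1, 1))
    Y = X.copy()
    for _ in range(n_iter):
        for k in range(n_chan):
            # Source model: one activation per frame, shared by all frequencies.
            r = np.sqrt(np.sum(np.abs(Y[:, k, :]) ** 2, axis=0))
            weights = 1.0 / (2.0 * np.maximum(r, eps))
            # Weighted covariance V[f] = mean_n weights[n] * x[f, n] x[f, n]^H.
            V = np.einsum("n,fmn,fln->fml", weights, X, X.conj()) / n_frames
            # Iterative projection: w = (W V)^{-1} e_k, then normalise.
            e_k = np.zeros((n_freq, n_chan, 1), dtype=complex)
            e_k[:, k, 0] = 1.0
            w = np.linalg.solve(W @ V, e_k)
            scale = np.einsum("fmi,fml,fli->f", w.conj(), V, w).real
            w /= np.sqrt(np.maximum(scale, eps))[:, None, None]
            # Row k of W[f] is w^H, so that y_k = w^H x.
            W[:, k, :] = w[:, :, 0].conj()
            Y[:, k, :] = np.einsum("fm,fmn->fn", W[:, k, :], X)
    return Y, W


def make_toy_mixture(n_freq=16, n_src=2, n_frames=2000, seed=0):
    """Hypothetical test data: sources whose power envelope is shared across bins."""
    rng = np.random.default_rng(seed)
    power = rng.exponential(size=(1, n_src, n_frames)) ** 2
    shape = (n_freq, n_src, n_frames)
    S = np.sqrt(power) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
    # One random instantaneous mixing matrix per frequency bin.
    mix_shape = (n_freq, n_src, n_src)
    A = np.eye(n_src) + 0.5 * (
        rng.standard_normal(mix_shape) + 1j * rng.standard_normal(mix_shape)
    )
    return A @ S, A


def leakage(W, A):
    """Mean ratio of non-dominant to dominant source gain in each output."""
    G = np.abs(W @ A)
    dominant = G.max(axis=2)
    return float(np.mean((G.sum(axis=2) - dominant) / dominant))
```

A possible use is `X, A = make_toy_mixture()` followed by `Y, W = auxiva_laplace(X)`, then comparing `leakage(W, A)` with the leakage of the identity demixing matrices. In the approach described in the abstract, the hand-written weighting derived from the Laplace model would, roughly speaking, be replaced by a source model learnt from data, while the independence-based update of the demixing matrices is kept.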

Biography

My name is Robin Scheibler and I am a researcher focusing on data and signals. My research interests have taken me from proving theorems to literally jumping in a cold mountain lake to sample its water, checking for the presence of arsenic. When I work on a problem, I want to take it from the idea all the way to the implementation into a physical prototype.

Currently, I live in Tokyo and work on multichannel speech enhancement and blind source separation at LINE Corporation.

I am the main contributor to Pyroomacoustics, a Python package for room simulation and audio array processing. The package lets the user easily create acoustic scenarios and simulate the propagation between sources and microphones.

I got my B.Sc., M.Sc., and Ph.D. from EPFL in Switzerland. I was lucky enough to be a member of the awesome Audiovisual Communications Laboratory from 2012 to 2017. Between 2017 and 2020, I was a post-doctoral researcher at Ono Laboratory, working on sound-to-light sensors called Blinkies. Before all that, I worked for NEC in Japan and IBM Research in Switzerland.

In my free time, I like to create electronic gadgets. I am an early member of SAFECAST Japan, where I build mobile Geiger counters to monitor the fallout from the Fukushima Dai-Ichi nuclear accident. I also co-founded the "biodesign for the real-world" project.
