Generative Modeling at Dolby
By Joan Serrà

Joan Serrà presents an overview of research on generative modeling for audio at Dolby Laboratories.


The Applied AI team at Dolby Laboratories has been working on generative modeling since 2018, before it became mainstream. In this talk, I’ll provide an overview of the main works we have published on generative modeling for audio signals. These include now-obvious or benchmark generative modeling problems, such as text-to-audio or image-to-audio, but also problems for which generative modeling was previously thought to be inappropriate or suboptimal, such as audio coding or speech enhancement.


Joan Serrà has led the Applied AI team at Dolby Laboratories since 2022, where he performs research on machine learning with application to audio and multimodal analysis/synthesis. Joan did an MSc and PhD in machine learning for audio at the Music Technology Group of Universitat Pompeu Fabra (2006-2011) and a postdoc in artificial intelligence at IIIA-CSIC (2011-2015). After that, he joined Telefónica R&D as a machine learning researcher (2015-2019) and, later, Dolby Laboratories as an AI staff researcher (2019-2022). Joan has had research stays at the Max Planck Institute for the Physics of Complex Systems (2010) and the Max Planck Institute for Computer Science (2011). He has been involved in several research projects funded by national and European institutions and has co-authored over 100 publications, many of them highly cited and/or in top-tier venues. He occasionally acts as a reviewer or area chair for some of those venues (provided the articles are free to access and free of charge) and gives talks and lectures on subjects of his interest (lately, mostly related to deep learning and generative modeling).

More on the speaker’s website.