Deep Generative Models for Audio Applications
By Yuki Mitsufuji

Yuki Mitsufuji presents the work of the Music Foundation Model Team at Sony AI.

Abstract

The Music Foundation Model Team at Sony AI is responsible for the building blocks of foundation models (deep generative modeling and multimodal pretraining) and for developing technologies for the generation, restoration, and compression of music and cinematic media. I will introduce our recent work accepted at top venues, including ICASSP, ICLR, and ICML, and show several demos that were made available when commercial products powered by our technologies were released.

Biography

Yuki Mitsufuji, PhD, holds dual roles at Sony, leading two departments (Creative AI Lab and Music Foundation Model Team), and is a specially appointed associate professor at the Tokyo Institute of Technology, where he lectures on generative models. He is an IEEE Senior Member and serves on the IEEE AASP Technical Committee for 2023–2026. He chaired “Diffusion-based Generative Models for Audio and Speech” at ICASSP 2023 and will co-chair “Generative Semantic Communication: How Generative Models Enhance Semantic Communications” at ICASSP 2024.

More on the speaker’s website.