Machine listening in a carbon-constrained world
By Vincent Lostanlen

Vincent Lostanlen will present his work on green machine listening models.


The climate emergency compels us to rethink how we conduct machine listening research. Against the “business as usual” narrative, we must anticipate new demands, new constraints, and new risks. Yet, the current literature tends to oversimplify the problem by reducing it to mere CPU/GPU usage. In this talk, I will explain why the quest towards Green(er) IT must involve three important concepts: life-cycle assessment (LCA), rebound effects, and intermittency. Then, I will present a personal perspective on our duty as audio-focused scientists in a changing world. First, I will stress the importance of documenting the role of sound in ecological processes such as migration, species assemblages, and ecosystem decline. Second, I will describe an early-stage prototype of an eco-acoustic sensor which operates intermittently, without wires or batteries, by harvesting solar energy and performing sound event detection on-device. I will show that the quest for Green(er) IT has implications at the architectural level for deep neural networks, and will outline a roadmap towards kilobyte-sized machine listening models performing audio content analysis in real time on FPGAs or general-purpose microcontroller hardware. I will conclude by provoking a renewed discussion on the digital music technologies of the future: what kinds of auditory human-machine interactions will we afford?


I am a scientist (chargé de recherche) at CNRS, the French national center for scientific research. I belong to a research unit named LS2N, which stands for “Nantes Laboratory of Digital Sciences”. My office is located on the campus of the “Centrale Nantes” engineering school. I am also a visiting scholar at New York University.

The main goal of my research is to build an artificial intelligence of sounds. I aim to further the understanding, the integration, and the democratization of machine listening systems. I also strive to expand the scope of audio content analysis beyond the use case of speech recognition, thus encompassing music, animal vocalizations, and urban sounds.

Our auditory system is highly complex, which calls for the development of scientific protocols at various levels of abstraction: from closed-form physical models of sound synthesis to qualitative studies of musical interactions involving both humans and machines. At an intermediate level, I give a pivotal role to convolutional operators in the time–frequency domain, such as scattering transforms and deep convolutional networks.

More on the speaker’s website.