Simplifying the Creation of Spoken-word Content
By G. Mysore
Gautham Mysore talks to us about research aimed at allowing people to easily create high-quality content without the need to become an audio engineer.
Spoken-word content such as podcasts, radio stories, audiobooks, vlogs, and lecture videos is very prevalent these days. However, creating high-quality content can be quite challenging, especially for novices. High-quality recording equipment and recording environments are expensive and can be difficult to set up and use. Voice editing tools can have a steep learning curve. Finally, people often have limited voice acting skills, which affects the production quality. We therefore aim to dramatically simplify this process to allow people to easily create high-quality content without the need to become an audio engineer. In this talk, I will present our work in this space and a number of open problems. These include variations of classical speech processing problems like speech enhancement and synthesis, as well as new problems at the intersection of signal processing, machine learning, and HCI.
Gautham Mysore is a principal scientist and head of the Audio Research Group at Adobe Research in San Francisco. He is also an Adjunct Professor at Stanford University in the Center for Computer Research in Music and Acoustics (CCRMA). His research involves developing new machine learning and signal processing algorithms for a wide variety of real-world audio applications. Gautham received his Ph.D. (CCRMA), M.A. (CCRMA), and M.S. (Electrical Engineering) from Stanford University. He was previously a visiting researcher at the Gatsby Computational Neuroscience Unit at University College London. He has co-authored over 60 papers and 35 patents. He has been a member of the IEEE technical committee on Audio and Acoustic Signal Processing, and was a technical program co-chair of WASPAA 2017.