Weakly Supervised Representation Learning for Audio-Visual Scene Analysis
By S. Parekh

We propose a novel multimodal framework that instantiates multiple instance learning for audio-visual representation learning.
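For readers unfamiliar with multiple instance learning (MIL), the following is a minimal, generic sketch of the idea and not the architecture from the paper. A video is treated as a "bag" of instances (for example, visual regions or audio segments), only a video-level label is available, and the bag score for each class is taken as the maximum over instance scores. The function names `bag_scores` and `fuse_modalities`, the toy score values, and the summation used to combine the two modalities are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bag_scores(instance_scores):
    """Max-pool per-instance class scores into one bag-level score per class.

    instance_scores: list of instances, each a list with one score per class.
    """
    num_classes = len(instance_scores[0])
    return [max(inst[c] for inst in instance_scores) for c in range(num_classes)]


def fuse_modalities(visual_bag, audio_bag):
    """Hypothetical late fusion: add the bag-level scores of both modalities."""
    return [v + a for v, a in zip(visual_bag, audio_bag)]


# Toy scores for two classes (assumed values, not results from the paper).
visual_instances = [[0.2, -1.0], [1.5, -0.3], [-0.4, 0.1]]  # 3 visual regions
audio_instances = [[0.9, -2.0], [-0.1, 0.4]]                # 2 audio segments

video_scores = fuse_modalities(
    bag_scores(visual_instances), bag_scores(audio_instances)
)
video_probs = [sigmoid(s) for s in video_scores]
print(video_scores, video_probs)
```

Because the bag score is a maximum over instances, training with only video-level labels pushes up the score of the most responsible region or segment, which is what lets a weakly supervised model localize the event as a by-product.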

The paper

(Parekh et al., 2019): Parekh, S., Essid, S., Ozerov, A., Duong, N. Q. K., Pérez, P., & Richard, G. (2019). Weakly Supervised Representation Learning for Audio-Visual Scene Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing. https://hal.telecom-paris.fr/hal-02399993

Note

The demo is also available on the author’s website.