Weakly Supervised Representation Learning for Audio-Visual Scene Analysis
By S. Parekh
We propose a novel multimodal framework that instantiates multiple instance learning for audio-visual representation learning.
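As a rough illustration of the multiple instance learning idea, the sketch below treats a video as two "bags" of instances (visual region proposals and audio segments) that share only a clip-level label. This is not the architecture from the paper: the max pooling over instances, the averaging of the two modalities, and all names and numbers here are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bag_scores(instance_logits):
    """Pool per-instance class logits into one bag-level probability per class.

    instance_logits: list of instances, each a list of per-class logits.
    Max pooling (an assumption here) encodes the standard MIL premise that
    a bag is positive for a class if at least one instance is positive.
    """
    num_classes = len(instance_logits[0])
    return [
        sigmoid(max(inst[c] for inst in instance_logits))
        for c in range(num_classes)
    ]


def fuse(visual_logits, audio_logits):
    """Hypothetical late fusion: average the two modalities' bag probabilities."""
    v = bag_scores(visual_logits)
    a = bag_scores(audio_logits)
    return [(vi + ai) / 2.0 for vi, ai in zip(v, a)]


# Made-up logits: 3 visual region proposals, 2 audio segments, 2 event classes.
visual = [[-2.0, 0.5], [3.0, -1.0], [-0.5, -0.5]]
audio = [[2.0, -3.0], [-1.0, -2.0]]

probs = fuse(visual, audio)
print(probs)
```

Only the clip-level probabilities would be compared against the weak label during training, while the per-instance logits remain available afterwards to indicate which regions or segments drove the prediction.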
The paper
(Parekh et al., 2019): Parekh, S., Essid, S., Ozerov, A., Duong, N. Q. K., Pérez, P., & Richard, G. (2019). Weakly Supervised Representation Learning for Audio-Visual Scene Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing. https://hal.telecom-paris.fr/hal-02399993
Note
The demo is also available on the author’s website.