Guiding Audio Source Separation by Video Object Information
By S. Parekh

Some separation results computed using the proposed methods:

  • LS: NMF + Non-negative least squares (NNLS)
  • spLS: NMF + Sparse NNLS
  • JLS Rand: Joint NMF and sparse NNLS with random initialization
  • JLS NMF: Joint NMF and sparse NNLS, with W and H initialized from the NMF of the audio mixture
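To make the simplest variant (LS) concrete, here is a minimal sketch of an "NMF + NNLS" pipeline. This page does not spell out the formulation, so several things below are assumptions: that each source comes with a nonnegative motion feature per time frame (the hypothetical `motion` array), that NNLS regresses that feature onto the NMF activations, and that each component goes to the source with the largest coefficient. The NNLS step uses simple multiplicative updates rather than an exact active-set solver, and all function names are illustrative. The sparse and joint (JLS) variants are not covered.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def nmf(V, n_components, n_iter=200, seed=0):
    """Factor nonnegative V (freq x time) as W @ H with multiplicative
    updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_components)) + EPS
    H = rng.random((n_components, n_time)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def nnls_mu(A, b, n_iter=500):
    """Approximate min_x ||b - A x||^2 subject to x >= 0, for nonnegative
    A and b, using multiplicative updates (not an exact NNLS solver)."""
    x = np.ones(A.shape[1])
    Atb = A.T @ b
    AtA = A.T @ A
    for _ in range(n_iter):
        x *= Atb / (AtA @ x + EPS)
    return x


def separate(V, motion, n_components=8):
    """Split mixture spectrogram V using per-source motion features.

    motion: (n_sources, n_time) nonnegative array (hypothetical input).
    Returns soft-masked source spectrograms and the component owners.
    """
    W, H = nmf(V, n_components)
    # Assumption: one NNLS problem per source, explaining its motion
    # feature as a nonnegative combination of the activations in H.
    coeffs = np.stack([nnls_mu(H.T, m) for m in motion])
    # Assumption: each component belongs to the source that weights it most.
    owner = coeffs.argmax(axis=0)
    full = W @ H + EPS
    estimates = []
    for j in range(motion.shape[0]):
        keep = owner == j
        part = W[:, keep] @ H[keep, :]
        estimates.append(V * part / full)  # soft mask applied to the mixture
    return np.stack(estimates), owner


# Synthetic stand-ins for a magnitude spectrogram and two motion features.
rng = np.random.default_rng(1)
V = rng.random((20, 50)) + 0.1
motion = rng.random((2, 50))
estimates, owner = separate(V, motion, n_components=6)
```

Because the soft masks of all sources sum to (almost exactly) one, the estimated spectrograms add back up to the mixture, which is a convenient sanity check on any implementation of this kind.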

To play the audio, click the link in square brackets [].

The particularly challenging and interesting case of same-instrument mixtures is highlighted in blue below. Motion information plays a crucial role in such cases.

 

EEP MoCap Dataset

Examples                              Separation Method  Estimated Sources
P1: violin + cello [Mixture]          LS                 [violin] [cello]
                                      spLS               [violin] [cello]
                                      JLS Rand           [violin] [cello]
                                      JLS NMF            [violin] [cello]
P1: violin + viola [Mixture]          LS                 [violin] [viola]
                                      spLS               [violin] [viola]
                                      JLS Rand           [violin] [viola]
                                      JLS NMF            [violin] [viola]
violin (P1) + violin (P2) [Mixture]   LS                 [violin P1] [violin P2]
                                      spLS               [violin P1] [violin P2]
                                      JLS Rand           [violin P1] [violin P2]
                                      JLS NMF            [violin P1] [violin P2]
P1: violin + viola + cello [Mixture]  LS                 [violin] [viola] [cello]
                                      spLS               [violin] [viola] [cello]
                                      JLS Rand           [violin] [viola] [cello]
                                      JLS NMF            [violin] [viola] [cello]

 

URMP Video Set

Examples                                       Separation Method  Estimated Sources
violin 1 + violin 2 [Mixture]                  LS                 [violin 1] [violin 2]
                                               spLS               [violin 1] [violin 2]
                                               JLS Rand           [violin 1] [violin 2]
                                               JLS NMF            [violin 1] [violin 2]
violin 2 + viola [Mixture]                     LS                 [violin 2] [viola]
                                               spLS               [violin 2] [viola]
                                               JLS Rand           [violin 2] [viola]
                                               JLS NMF            [violin 2] [viola]
violin 1 + viola + cello [Mixture]             LS                 [violin 1] [viola] [cello]
                                               spLS               [violin 1] [viola] [cello]
                                               JLS Rand           [violin 1] [viola] [cello]
                                               JLS NMF            [violin 1] [viola] [cello]
violin 1 + violin 2 + viola + cello [Mixture]  LS                 [violin 1] [violin 2] [viola] [cello]
                                               spLS               [violin 1] [violin 2] [viola] [cello]
                                               JLS Rand           [violin 1] [violin 2] [viola] [cello]
                                               JLS NMF            [violin 1] [violin 2] [viola] [cello]