Karim Ibrahim's PhD defense
By S. Zaiem

Karim Ibrahim will soon defend his PhD thesis, entitled “Informed Audio Source Separation with Deep Learning in Limited Data Settings”.

The defense will take place on Thursday, December 16, 2021, at 9:30am at Télécom Paris:

19 place Marguerite Perey, F-91120 Palaiseau. Room: Amphi 4

The jury is composed as follows:

  • Mr. Talel ABDESSALEM, Télécom Paris (President)
  • Mr. Markus SCHEDL, Johannes Kepler University Linz (Reviewer)
  • Mr. Jean-François PETIOT, École Centrale de Nantes (Reviewer)
  • Mr. Kyogu LEE, Seoul National University (Examiner)
  • Ms. Elena CABRIO, Université Côte d’Azur, Inria, CNRS, I3S (Examiner)
  • Mr. Geoffroy PEETERS, Télécom Paris (Thesis director)
  • Ms. Elena EPURE, Deezer (Thesis co-director)
  • Mr. Gaël RICHARD, Télécom Paris (Invited member, thesis co-director)

The presentation will be in English.


The exponential growth of online services and user data has changed how we interact with various services and how we explore and select new products. Hence, there is a growing need for methods that recommend the appropriate items to each user. In the case of music, it is especially important to recommend the right items at the right moment. It is well documented that the context, i.e. the listening situation of the users, strongly influences their listening preferences. Accordingly, increasing attention has been paid to developing context-aware recommendation systems. State-of-the-art approaches are sequence-based models that aim to predict the tracks of the next session using the available contextual information. However, these approaches lack interpretability and operate in a hit-or-miss fashion, leaving no room for user involvement. Additionally, few previous approaches have studied how the audio content relates to these situational influences, and even fewer have used the audio content to provide contextual recommendations.

In this dissertation, we study the potential of using primarily the audio content to disambiguate listening situations, providing a pathway towards interpretable, situation-based recommendations.

First, we study the potential listening situations that influence or change the listening preferences of users. We developed a semi-automated approach to link the tracks listened to with the listening situation, using playlist titles as a proxy. Through this approach, we were able to collect datasets of music tracks labelled with their situational use. We then studied the use of music auto-taggers to identify potential listening situations from the audio content. These studies led to the conclusion that the situational use of a track is highly user-dependent. Hence, we extended the music auto-taggers to a user-aware model that makes personalized predictions. Our studies showed that including the user in the loop significantly improves the performance of situation prediction. This user-aware music auto-tagger enabled us to tag a given track with its potential situational use for a given user, based on the track's audio content and the user's listening history.

Finally, to successfully employ this approach for a recommendation task, we needed a separate method to predict the potential current situations of a given user. To this end, we developed a model that predicts the situation from the data transmitted by the user’s device to the service and from the demographic information of the given user. Our evaluations show that the models can successfully learn to discriminate the potential situations and rank them accordingly. By combining the two models, the auto-tagger and the situation predictor, we developed a framework that generates situational sessions in real time and proposes them to the user. This framework provides an alternative pathway to recommending situational sessions, alongside the primary sequential recommendation system deployed by the service. It is interpretable, and it addresses the cold-start problem, since tracks are recommended based on their content.