In the deliverable ‘D3.2 – Predictive analytics and recommendation framework v2’, Thomas Lidy, Adrian Lecoutre and Khalil Boulkenafet from Musimap, and Manos Schinas, Christos Koutlis and Symeon Papadopoulos from CERTH, presented the work conducted in our project to meet the requirements related to predictive analytics and recommendations, aiming to produce popularity-oriented results for artists, tracks and genres.
Regarding track popularity estimation and prediction, the team explained that overall track popularity is estimated from the following sources: Deezer rank, Spotify popularity, views and likes on YouTube, and global airplay counts provided by our partner BMAT. The team compared k-Nearest Neighbors (kNN) against Long Short-Term Memory (LSTM) deep learning models and concluded that LSTM performed slightly better, but at a dramatically higher computational cost; they therefore decided to use the kNN implementation instead. More importantly, the authors explained that, since both approaches rely on machine learning models trained on historical crawled signals, they should in theory become more accurate over time. In general, the different analyses have shown that a simple trend or seasonality can be easily predicted for a relatively small time prediction interval. Nevertheless, the authors point out that ‘the difficulty can rise significantly when the source signal is almost static (such as in the case of Spotify popularity), the predicted target goes farther in the future, or when an unexpected trend appears without any detectable pattern in the previous days of history in the data’. As the authors mention, ‘we have studied that a multivariate LSTM approach can potentially overcome these issues when more data becomes available. In the current implementation the kNN approach is used to predict a track’s popularity up to 21 days into the future, based on up to 28 days of history.’
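To make the kNN idea concrete, here is a minimal sketch of window-based nearest-neighbour forecasting. Only the 28-day history and the 21-day horizon come from the deliverable; the Euclidean distance, the value of `k`, the plain averaging of neighbours and the function names `make_windows` and `knn_forecast` are assumptions made for illustration, not the project's actual implementation.

```python
import numpy as np

HISTORY_DAYS = 28  # from the text: up to 28 days of history
HORIZON_DAYS = 21  # from the text: up to 21 days into the future


def make_windows(series, history=HISTORY_DAYS, horizon=HORIZON_DAYS):
    """Slice one daily series into (past window, future window) pairs."""
    series = np.asarray(series, dtype=float)
    n = len(series) - history - horizon + 1
    if n <= 0:
        return np.empty((0, history)), np.empty((0, horizon))
    past = np.stack([series[i:i + history] for i in range(n)])
    future = np.stack(
        [series[i + history:i + history + horizon] for i in range(n)]
    )
    return past, future


def knn_forecast(train_past, train_future, query, k=5):
    """Average the futures of the k past windows closest to `query`.

    Assumption: Euclidean distance on raw values and an unweighted mean.
    """
    query = np.asarray(query, dtype=float)
    distances = np.linalg.norm(train_past - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return train_future[nearest].mean(axis=0)


# Synthetic signal with weekly seasonality (illustrative data only).
days = np.arange(200)
signal = 50 + 10 * np.sin(2 * np.pi * days / 7)
past, future = make_windows(signal[:150])
forecast = knn_forecast(past, future, signal[150:178])
print(forecast.round(2))
```

On a purely seasonal signal like this one, the closest historical windows share the query's phase, which is why a simple seasonality is easy to predict. A nearly static signal or an unprecedented trend gives the neighbours little to match on.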
For artist popularity, our team proposed a non-linear aggregation method to combine diverse sources of popularity information, such as Spotify followers, YouTube views and Last.fm playcounts. The method leverages the geometrical shapes formed by the normalized metric values obtained for each artist and combines them by computing a fraction in which the numerator corresponds to the artist under study and the denominator corresponds to the best possible case, i.e. the most popular possible artist. The results showed that this method outperformed both the most natural choice, a simple average, and other non-linear metric aggregation methods in terms of correlation, rank correlation and rank distance with the ground truth.
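One possible reading of this shape-based aggregation is a radar-chart polygon: each normalized metric is plotted on its own axis, and the artist's polygon area is divided by the area of the polygon for an artist scoring 1.0 on every metric. The sketch below follows that reading only. The choice of a radar polygon, the equally spaced axes and the names `polygon_area` and `popularity_score` are assumptions; the summary above does not specify the exact shape used in the deliverable.

```python
import math


def polygon_area(values):
    """Area of the radar-chart polygon spanned by `values`.

    Assumption: one axis per metric, axes equally spaced around the origin.
    Each pair of neighbouring axes contributes a triangle of area
    0.5 * a * b * sin(angle between the axes).
    """
    n = len(values)
    if n < 3:
        raise ValueError("a polygon needs at least three metrics")
    wedge = 0.5 * math.sin(2 * math.pi / n)
    return wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))


def popularity_score(values):
    """Artist polygon area divided by the best-case polygon area."""
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("metrics must be normalized to [0, 1]")
    return polygon_area(values) / polygon_area([1.0] * len(values))


# Hypothetical normalized metrics, e.g. followers, views, playcounts, streams.
artist = [0.5, 0.5, 0.5, 0.5]
print(popularity_score(artist), sum(artist) / len(artist))
```

Under this reading the score is non-linear: an artist at 0.5 on every metric scores 0.25 rather than the 0.5 a simple average would give. Note that this particular construction depends on the order in which metrics are assigned to axes, which a real implementation would need to address.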
Turning to the impact of events, such as an album release, a TV show appearance or an interview, on an artist’s popularity level, it was notable that no significant changes were observed after the events in popularity metrics such as YouTube views/subscribers and Last.fm playcounts, whereas changes were observed in streaming activity (Spotify, iTunes and Deezer streams). The FuturePulse team compared two different methods that estimate the level of impact an event has on future popularity values/streaming activity. In addition, the segmented linear regression method performed well, accurately identifying the changes that follow an event.
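As an illustration of how a segmented linear regression can quantify an event's impact, the sketch below fits one straight line to the days before the event and another to the days after it, then reports the change in slope and the jump in level at the event day. Fixing the breakpoint at the known event date, using ordinary least squares and the name `event_impact` are assumptions for this example; the deliverable's exact formulation is not described in this summary.

```python
import numpy as np


def event_impact(streams, event_day):
    """Fit separate lines before and after `event_day` and compare them.

    Assumption: the breakpoint is the known event date, and each segment
    is fitted independently with ordinary least squares.
    """
    streams = np.asarray(streams, dtype=float)
    days = np.arange(len(streams))
    if event_day < 2 or len(streams) - event_day < 2:
        raise ValueError("need at least two days on each side of the event")

    # np.polyfit with degree 1 returns (slope, intercept).
    slope_before, icpt_before = np.polyfit(
        days[:event_day], streams[:event_day], 1
    )
    slope_after, icpt_after = np.polyfit(
        days[event_day:], streams[event_day:], 1
    )

    # Evaluate both lines at the event day to measure the level shift.
    level_before = slope_before * event_day + icpt_before
    level_after = slope_after * event_day + icpt_after
    return {
        "slope_change": slope_after - slope_before,
        "level_jump": level_after - level_before,
    }


# Synthetic daily streams with an event on day 15 (illustrative data only).
d = np.arange(30)
streams = np.where(d < 15, 100 + 2 * d, 150 + 5 * (d - 15))
print(event_impact(streams, event_day=15))
```

A positive `level_jump` indicates an immediate lift in streams at the event, while a positive `slope_change` indicates faster growth afterwards.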
Last but not least, for the estimation of genre popularity and growth, the authors elaborated on how they tackled the problem of data sparsity. By analysing genre occurrences in Spotify artists using a graph embedding technique, our team identified sub-genre associations between genres. As they explained, ‘that information is then used to count genre appearances in music charts’. However, although the first results obtained when these associations were considered were promising, both the identified associations and the generated popularity scores still need further evaluation.
As for future activities, the authors mentioned the following goals:
- Analyze playlists and develop a methodology to detect similar playlists based on co-listening patterns, content similarity and music genres
- Update track popularity estimation and prediction by adding more sources, e.g. country-wise airplays, charts, playlists and Spotify analytics data
- Investigate multivariate approaches (e.g. LSTM) again with more data available
- Update and evaluate genre popularity estimation, by using genre associations
- Combine in a principled way the several artist popularity estimations developed separately for each of the use cases
- Let track popularity estimation and artist popularity estimation inform each other
You can read D3.2 – Predictive analytics and recommendation framework here.
Source: Deliverable D3.2 Predictive analytics and recommendation framework v2 -
Authors: Thomas Lidy (MMAP), Adrian Lecoutre (MMAP), Khalil Boulkenafet (MMAP), Manos Schinas (CERTH), Christos Koutlis (CERTH), Symeon Papadopoulos (CERTH) - Contributor/s: Vasiliki Gkatziaki (CERTH), Emmanouil Krasanakis (CERTH), Polychronis Charitidis (CERTH) - Deliverable Lead Beneficiary: MMAP