In the deliverable ‘D3.2 – Predictive analytics and recommendation framework v2’, Thomas Lidy, Adrian Lecoutre, Khalil Boulkenafet, Manos Schinas, Christos Koutlis and Symeon Papadopoulos presented the work conducted in our project to meet the requirements related to predictive analytics, aiming to produce popularity-oriented results for artists, tracks and genres.
Regarding track popularity estimation and prediction, the team described that overall track popularity is estimated from the following sources: Deezer rank, Spotify popularity, views and likes on YouTube, and global airplay counts provided by our partner BMAT. The team compared k-Nearest Neighbors (kNN) against Long Short-Term Memory (LSTM) deep learning models and concluded that LSTM performed slightly better, but at a dramatically higher computational cost; they therefore decided to use the kNN implementation instead. More importantly, the authors explained that, since both models are trained on historical crawled signals, they will theoretically become more accurate over time. In general, the analyses showed that a simple trend or seasonality can be easily predicted for a relatively short prediction interval. Nevertheless, the authors point out that ‘the difficulty can rise significantly when the source signal is almost static (such as in the case of Spotify popularity), the predicted target goes farther in the future, or when an unexpected trend appears without any detectable pattern in the previous days of history in the data’. As the authors mention, ‘we have studied that a multivariate LSTM approach can potentially overcome these issues when more data becomes available’. In the current implementation, the kNN approach is used to predict a track’s popularity up to 21 days into the future, based on up to 28 days of history.
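To make the kNN idea more concrete, here is a minimal sketch of nearest-neighbour forecasting on a single popularity signal. It is not the deliverable's implementation: the function name `knn_forecast`, the Euclidean distance, the choice of `k` and the plain averaging of neighbour continuations are all assumptions made for illustration. Only the 28-day history and 21-day horizon come from the text.

```python
import numpy as np

HISTORY_DAYS = 28  # from the text: up to 28 days of history
HORIZON_DAYS = 21  # from the text: up to 21 days into the future


def knn_forecast(series, query, k=3, history=HISTORY_DAYS, horizon=HORIZON_DAYS):
    """Forecast the `horizon` values that follow `query` by averaging the
    continuations of the k most similar past windows in `series`.

    Hypothetical helper: distance metric and averaging are assumptions.
    """
    series = np.asarray(series, dtype=float)
    query = np.asarray(query, dtype=float)
    if query.shape != (history,):
        raise ValueError("query must contain exactly `history` values")
    n_windows = len(series) - history - horizon + 1
    if n_windows < k:
        raise ValueError("not enough history to find k neighbours")
    # Every past window of `history` days that is followed by `horizon` known days.
    windows = np.stack([series[i:i + history] for i in range(n_windows)])
    futures = np.stack(
        [series[i + history:i + history + horizon] for i in range(n_windows)]
    )
    # Euclidean distance between the query window and each past window.
    distances = np.linalg.norm(windows - query, axis=1)
    nearest = np.argsort(distances)[:k]
    # The forecast is the mean continuation of the k nearest windows.
    return futures[nearest].mean(axis=0)


# Synthetic signal with a weekly pattern, standing in for crawled data.
demo_series = np.tile(np.arange(1, 8), 20)
demo_forecast = knn_forecast(demo_series, demo_series[-HISTORY_DAYS:])
```

On a signal with clear seasonality like this synthetic one, the nearest windows repeat the pattern and the forecast follows it. An almost static signal or an unexpected trend gives the neighbours little to work with, which matches the difficulties the authors describe.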
For artist popularity, our team proposed a non-linear aggregation method to combine diverse sources of popularity information, such as Spotify followers, YouTube views and Last.fm playcounts. The method leverages geometrical shapes formed by the normalized metric values obtained for each artist and combines them by computing a fraction in which the numerator corresponds to the artist under study and the denominator corresponds to the best possible case, i.e. the most popular possible artist. The results showed that this method outperformed both the most natural choice, a simple average, and other non-linear metric aggregation methods in terms of correlation, rank correlation and rank distance with the ground truth.
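One way to picture such a shape-based fraction is the sketch below. It rests on a strong assumption: the text does not say which geometrical shape is used, so the sketch takes the polygon that the normalized metrics would form on a radar chart and divides its area by the area of the polygon in which every metric equals 1. The function name `polygon_area_score` is hypothetical.

```python
import numpy as np


def polygon_area_score(normalized_metrics):
    """Ratio between the radar-chart polygon area of one artist and the area
    of the best possible case (all metrics equal to 1).

    Assumption: the radar-chart polygon is an illustrative choice of shape,
    not necessarily the one used in the deliverable.
    """
    v = np.asarray(normalized_metrics, dtype=float)
    if v.ndim != 1 or v.size < 3:
        raise ValueError("need at least three metrics to form a polygon")
    if np.any(v < 0) or np.any(v > 1):
        raise ValueError("metrics must be normalized to [0, 1]")
    angle = 2 * np.pi / v.size  # angle between two adjacent axes
    # The polygon is a fan of triangles, one between each pair of adjacent axes.
    area = 0.5 * np.sin(angle) * np.sum(v * np.roll(v, -1))
    best_area = 0.5 * np.sin(angle) * v.size
    return area / best_area


balanced = polygon_area_score([0.5, 0.5, 0.5, 0.5])
lopsided = polygon_area_score([1.0, 0.0, 1.0, 0.0])
```

Both example artists have a simple average of 0.5, yet the area ratio separates them, which illustrates why an aggregation of this kind is non-linear. Note that in this particular sketch the result depends on the order in which the metrics are placed around the chart.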
Turning to the impact of events, such as an album release, TV show appearance or interview, on an artist’s popularity level, it was notable that no significant changes were observed after the events in popularity metrics such as YouTube views/subscribers and Last.fm playcounts, whereas changes were observed in streaming activity (Spotify, iTunes, Deezer streams). The FuturePulse team compared two different methods that estimate the level of impact an event has on future popularity values/streaming activity. The segmented linear regression method showed good performance, accurately identifying the upcoming changes after an event.
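As an illustration of segmented linear regression in this setting, the sketch below fits one line to the days before an event and another to the days after it, then reports the change in slope and the jump in level at the event. The function name `event_impact`, the assumption that the event day is known in advance and the use of exactly two segments are choices made for this example, not details taken from the deliverable.

```python
import numpy as np


def event_impact(values, event_index):
    """Fit separate least-squares lines before and after `event_index` and
    return the change in slope and the level jump at the event.

    Hypothetical helper: assumes a known event day and two segments.
    """
    values = np.asarray(values, dtype=float)
    days = np.arange(len(values))
    if event_index < 2 or len(values) - event_index < 2:
        raise ValueError("each segment needs at least two observations")
    pre_slope, pre_intercept = np.polyfit(
        days[:event_index], values[:event_index], 1
    )
    post_slope, post_intercept = np.polyfit(
        days[event_index:], values[event_index:], 1
    )
    # Level jump: post-event fit at the event day minus the pre-event trend
    # extrapolated to the same day.
    level_change = (post_slope * event_index + post_intercept) - (
        pre_slope * event_index + pre_intercept
    )
    return {"slope_change": post_slope - pre_slope, "level_change": level_change}


# Synthetic daily streams: steady growth, then a jump and faster growth
# from day 10 onwards (e.g. after an album release).
before = 100 + 2 * np.arange(10)
after = 150 + 5 * np.arange(10)
impact = event_impact(np.concatenate([before, after]), event_index=10)
```

A large level jump or slope change would point to streaming activity reacting to the event, while values near zero would correspond to the metrics where no significant change was observed.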
Last but not least, for the estimation of genre popularity and growth, the authors elaborated on how they tackled the problem of data sparsity. By analysing genre occurrences in Spotify artists using a graph embedding technique, our team identified sub-genre associations between genres. As they explained, ‘that information is then used to count genre appearances in music charts’. However, although the first results obtained when these associations were considered were promising, the identified associations and the resulting popularity scores still needed further evaluation.
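The counting step can be pictured with the sketch below, which credits a chart entry both to its own genres and to any broader genre they are associated with. The `parent_of` mapping, the genre names and the rule of counting each genre at most once per chart entry are hypothetical; the deliverable derives its associations from a graph embedding and its exact counting scheme is not described here.

```python
from collections import Counter


def count_genre_appearances(chart_entries, parent_of):
    """Count chart appearances per genre, crediting each sub-genre's
    appearance to its associated broader genres as well.

    `chart_entries` is a list of genre lists, one per chart entry.
    `parent_of` maps a sub-genre to its broader genre (hypothetical data).
    """
    counts = Counter()
    for genres in chart_entries:
        credited = set()  # each genre is counted once per chart entry
        for genre in genres:
            current = genre
            # Walk up the association chain; stopping at already credited
            # genres also guards against cycles in the mapping.
            while current is not None and current not in credited:
                credited.add(current)
                current = parent_of.get(current)
        counts.update(credited)
    return counts


# Illustrative associations and chart entries.
associations = {"trap": "hip hop", "drill": "hip hop"}
chart = [["trap"], ["drill", "hip hop"], ["rock"]]
genre_counts = count_genre_appearances(chart, associations)
```

Pooling sub-genre appearances into broader genres in this way is one route to easing data sparsity, since rarely charting sub-genres still contribute to the count of the genre they belong to.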
As for future activities, the authors mentioned the following goals:
- Analyze playlists and develop a methodology to detect similar playlists based on co-listening patterns, content similarity and music genres
- Update track popularity estimation and prediction by adding more sources, e.g. country-wise airplays, charts, playlists, Spotify analytics data, etc.
- Investigate multivariate approaches (e.g. LSTM) again with more data available
- Update and evaluate genre popularity estimation, by using genre associations
- Combine in a principled way the several artist popularity estimations developed separately for each of the use cases
- Co-inform track popularity estimation and artist popularity estimation mutually
You can read the D3.2 – Predictive analytics and recommendation framework here.
Deliverable D3.2 Predictive analytics and recommendation framework v2 - Authors:
Thomas Lidy (MMAP), Adrian Lecoutre (MMAP), Khalil Boulkenafet (MMAP), Manos Schinas (CERTH), Christos Koutlis (CERTH), Symeon Papadopoulos (CERTH) - Contributor/s: Vasiliki Gkatziaki (CERTH), Emmanouil Krasanakis (CERTH), Polychronis Charitidis (CERTH) - Deliverable Lead Beneficiary: MMAP