Tempo and genre are two intertwined aspects of music: genres are often associated with rhythm patterns that are played in specific tempo ranges.
In the paper ‘Extending deep rhythm for tempo and genre estimation using complex convolutions, multitask learning and multi-input network’, FuturePulse partner Hadrien Foroughmand (IRCAM - Sorbonne Université) and Geoffroy Peeters (LTCI - Télécom Paris - Institut Polytechnique de Paris) present three main extensions of the Deep Rhythm network for tempo estimation and genre classification.
The authors propose the use of a complex-HCQM representation as input to a complex convolutional neural network. As they point out, this improves tempo estimation in terms of Acc, but surprisingly not in terms of Acc1 and Acc2, nor in terms of genre classification.
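In practice, a complex convolution is often emulated with two real-valued convolutions combined through the complex product, as in deep complex networks. Below is a minimal PyTorch sketch of such a layer, assuming the complex-HCQM arrives as separate real and imaginary planes; the class name, channel counts and input shape are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution emulated with two real-valued convolutions.

    For complex input x = x_re + i*x_im and complex weights W = W_re + i*W_im,
    the complex product yields:
        y_re = W_re * x_re - W_im * x_im
        y_im = W_re * x_im + W_im * x_re
    """

    def __init__(self, in_channels, out_channels, kernel_size, padding=0):
        super().__init__()
        self.conv_re = nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding)
        self.conv_im = nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding)

    def forward(self, x_re, x_im):
        y_re = self.conv_re(x_re) - self.conv_im(x_im)
        y_im = self.conv_re(x_im) + self.conv_im(x_re)
        return y_re, y_im

# Toy usage on a complex-HCQM-like input (batch, channel, freq, time);
# the 86x64 shape is made up for the example.
x_re = torch.randn(1, 1, 86, 64)
x_im = torch.randn(1, 1, 86, 64)
layer = ComplexConv2d(1, 8, kernel_size=3, padding=1)
y_re, y_im = layer(x_re, x_im)
print(y_re.shape, y_im.shape)  # torch.Size([1, 8, 86, 64]) twice
```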
Furthermore, in order to better take into account the interdependencies between tempo and genre, Hadrien Foroughmand and Geoffroy Peeters introduce a multi-input network, in which a VGG-like network with a mel-spectrogram input is added alongside Deep Rhythm to represent timbre information. Most importantly, they showed that this improves performance on both tasks.
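To make the multi-input idea concrete, here is a hedged PyTorch sketch of a two-branch network: a placeholder rhythm branch standing in for Deep Rhythm on the HCQM, and a small VGG-like timbre branch on the mel-spectrogram, fused by concatenating their embeddings. The `MultiInputNet` name, layer sizes and input shapes are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiInputNet(nn.Module):
    """Two-branch network: a rhythm branch (a stand-in for Deep Rhythm on the
    HCQM) and a VGG-like timbre branch on the mel-spectrogram, fused by
    concatenating their embeddings before classification."""

    def __init__(self, n_genres=10):
        super().__init__()
        # Rhythm branch: placeholder for the Deep Rhythm CNN.
        self.rhythm = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Timbre branch: a small VGG-like stack (conv-conv-pool).
        self.timbre = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(16 + 16, n_genres)

    def forward(self, hcqm, melspec):
        # Fuse the two embeddings by concatenation, then classify.
        z = torch.cat([self.rhythm(hcqm), self.timbre(melspec)], dim=1)
        return self.classifier(z)

# Toy usage with made-up input shapes.
net = MultiInputNet()
hcqm = torch.randn(2, 1, 86, 64)       # rhythm-oriented input
melspec = torch.randn(2, 1, 128, 256)  # mel-spectrogram input
print(net(hcqm, melspec).shape)        # torch.Size([2, 10])
```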
Last but not least, a multi-task output is presented, in which tempo and genre are estimated jointly. As the authors mention, their experiments with Oracle frame predictions show that there is still room to improve the tempo estimation.
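A multi-task output can be sketched as a shared embedding feeding two classification heads, trained with the sum of the per-task cross-entropy losses. The head sizes below (including the number of tempo classes) and the unweighted loss sum are illustrative assumptions, not the values used in the paper.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """A shared embedding feeds two heads: tempo (as classes) and genre."""

    def __init__(self, embed_dim=32, n_tempo_classes=256, n_genres=10):
        super().__init__()
        self.tempo_head = nn.Linear(embed_dim, n_tempo_classes)
        self.genre_head = nn.Linear(embed_dim, n_genres)

    def forward(self, z):
        return self.tempo_head(z), self.genre_head(z)

# One joint training step: the total loss is the sum of the per-task
# cross-entropies (equal weighting, assumed here for simplicity).
head = MultiTaskHead()
z = torch.randn(4, 32)                 # shared embeddings for a batch of 4
tempo_y = torch.randint(0, 256, (4,))  # tempo class targets
genre_y = torch.randint(0, 10, (4,))   # genre targets
tempo_logits, genre_logits = head(z)
loss = nn.functional.cross_entropy(tempo_logits, tempo_y) \
     + nn.functional.cross_entropy(genre_logits, genre_y)
loss.backward()
```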
One direction for future work is to apply an attention mechanism on top of the Deep Rhythm network to automatically select the temporal segments that correspond to the global tempo ground-truth annotation.
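One plausible reading of this direction, sketched below purely as an assumption, is a learned softmax attention that pools frame-level embeddings so the network can up-weight the frames most consistent with the global tempo; this code does not reflect any published implementation.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Learned softmax attention over time: frames whose content matches the
    global tempo should receive larger weights in the pooled embedding."""

    def __init__(self, embed_dim=32):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)  # one scalar score per frame

    def forward(self, frames):                        # (batch, time, embed_dim)
        w = torch.softmax(self.score(frames), dim=1)  # (batch, time, 1)
        return (w * frames).sum(dim=1)                # (batch, embed_dim)

# Toy usage: pool 50 frame embeddings per track into one global embedding.
pool = AttentionPooling()
frames = torch.randn(2, 50, 32)
print(pool(frames).shape)  # torch.Size([2, 32])
```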
You can read the paper ‘Extending deep rhythm for tempo and genre estimation using complex convolutions, multitask learning and multi-input network’ here.