Tempo and genre are two interleaved aspects of music: genres are often associated with rhythm patterns that are played in specific tempo ranges.
In the paper ‘Extending deep rhythm for tempo and genre estimation using complex convolutions, multitask learning and multi-input network’, FuturePulse partner Hadrien Foroughmand (IRCAM - Sorbonne Université) and Geoffroy Peeters (LTCI - Télécom Paris - Institut Polytechnique de Paris) present three main extensions of the Deep Rhythm network for tempo estimation and genre classification.
The authors propose the use of a complex-HCQM representation as input to a complex convolutional neural network. As they point out, this improves tempo Acc but, surprisingly, not tempo Acc1 and Acc2, nor genre classification.
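The core of a complex convolution is that the real and imaginary channels are convolved jointly, following the complex product rule (a + ib)(c + id) = (ac − bd) + i(ad + bc). A minimal 1-D NumPy sketch (an illustration of the general technique, not the paper's actual implementation) looks like this:

```python
import numpy as np

def complex_conv1d(x_re, x_im, w_re, w_im):
    """1-D complex convolution built from four real convolutions:
    (x_re + i*x_im) * (w_re + i*w_im)
      real part: x_re*w_re - x_im*w_im
      imag part: x_re*w_im + x_im*w_re
    """
    conv = lambda a, b: np.convolve(a, b, mode="valid")
    y_re = conv(x_re, w_re) - conv(x_im, w_im)
    y_im = conv(x_re, w_im) + conv(x_im, w_re)
    return y_re, y_im

# Sanity check against NumPy's native complex convolution.
x = np.array([1.0 + 2.0j, 3.0 - 1.0j, 0.0 + 0.5j, 2.0 + 0.0j])
w = np.array([0.5 - 0.5j, 1.0 + 1.0j])
y_re, y_im = complex_conv1d(x.real, x.imag, w.real, w.imag)
assert np.allclose(y_re + 1j * y_im, np.convolve(x, w, mode="valid"))
```

In a deep network each of the four real convolutions becomes a standard real-valued convolution layer sharing the same kernels across the two identities above.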
Furthermore, in order to better take into account the interdependencies between tempo and genre, Foroughmand and Peeters introduce a multi-input network in which a VGG-like network with a mel-spectrogram input is added alongside Deep Rhythm to represent timbre information. Most importantly, they show that this improves performance on both tasks.
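The multi-input idea amounts to late fusion of two branches. In the toy NumPy sketch below, the linear-plus-ReLU `branch` function and the feature sizes are hypothetical stand-ins for the Deep Rhythm and VGG-like convolutional stacks:

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w):
    """Stand-in for a convolutional branch: linear map + ReLU."""
    return np.maximum(x @ w, 0.0)

# Hypothetical flattened input sizes: 128 for the HCQM, 256 for the mel-spectrogram.
hcqm_feat = branch(rng.normal(size=(1, 128)), rng.normal(size=(128, 64)))  # Deep Rhythm branch
mel_feat = branch(rng.normal(size=(1, 256)), rng.normal(size=(256, 64)))   # VGG-like timbre branch

# Late fusion: concatenate the two embeddings before the classification heads.
fused = np.concatenate([hcqm_feat, mel_feat], axis=1)
print(fused.shape)  # (1, 128)
```

The fused embedding then feeds the classification layers, so rhythm and timbre information are combined before any decision is made.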
Last but not least, a multitask output is presented, in which tempo and genre are estimated jointly. As the authors mention, ‘with the Oracle frame prediction, we showed that there is still room to improve the tempo estimation’.
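A multitask output can be sketched as a shared embedding feeding two softmax heads whose cross-entropies are combined into one training loss. The class counts and the weighting `alpha` below are hypothetical choices for illustration, not values from the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multitask_loss(shared, w_tempo, w_genre, tempo_y, genre_y, alpha=0.5):
    """Joint loss: weighted sum of tempo and genre cross-entropies."""
    ce = lambda p, y: -np.mean(np.log(p[np.arange(len(y)), y]))
    loss_tempo = ce(softmax(shared @ w_tempo), tempo_y)
    loss_genre = ce(softmax(shared @ w_genre), genre_y)
    return alpha * loss_tempo + (1 - alpha) * loss_genre

rng = np.random.default_rng(1)
shared = rng.normal(size=(4, 32))     # batch of 4 shared embeddings
w_tempo = rng.normal(size=(32, 256))  # e.g. 256 tempo classes
w_genre = rng.normal(size=(32, 10))   # e.g. 10 genre classes
loss = multitask_loss(shared, w_tempo, w_genre,
                      tempo_y=np.array([3, 120, 60, 200]),
                      genre_y=np.array([0, 5, 2, 9]))
```

Because the two heads share the same embedding, gradients from each task shape a common representation, which is how the interdependency between tempo and genre is exploited.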
One avenue of future work is to apply an attention mechanism on top of the Deep Rhythm network to automatically select the temporal segment corresponding to the global tempo ground-truth annotation.
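One common form such a mechanism could take (a hypothetical sketch, not the authors' design) is attention pooling: score each temporal frame, normalise the scores with a softmax, and pool the frames with those weights so the network can emphasise the segment that matches the global tempo annotation:

```python
import numpy as np

def attention_pool(frame_feats, w):
    """Weight T frame embeddings by attention scores and pool them.
    frame_feats: (T, D) per-frame features; w: (D,) scoring vector."""
    scores = frame_feats @ w
    scores -= scores.max()                      # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    pooled = weights @ frame_feats              # (D,) segment-aware summary
    return pooled, weights

rng = np.random.default_rng(2)
feats = rng.normal(size=(20, 16))               # 20 temporal frames, 16-dim features
pooled, weights = attention_pool(feats, rng.normal(size=16))
```

The attention weights sum to one, so frames judged irrelevant to the global tempo contribute little to the pooled representation.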
You can read the full paper, ‘Extending deep rhythm for tempo and genre estimation using complex convolutions, multitask learning and multi-input network’.