In the publication “An Analysis of the Effect of Data Augmentation Methods: Experiments for a Musical Genre Classification Task”, our FuturePulse project partners Rémi Mignot (IRCAM Lab – CNRS – Sorbonne Université, Paris) and Geoffroy Peeters (LTCI – Télécom Paris – Institut Polytechnique de Paris) provide an in-depth analysis of two data augmentation methods: sound transformations and sound segmentation.
Supervised machine learning relies on the availability of large datasets of annotated data. This is crucial, since small datasets usually lead to overfitting when training high-dimensional machine-learning models. As the authors explain, “since the manual annotation of such large datasets is a long, tedious and expensive process, another possibility is to artificially increase the size of the dataset - this is known as data augmentation”.
Specifically, the sound transformations method turns a music track into a set of new tracks by applying processes such as pitch-shifting, time-stretching or filtering, while the sound segmentation method splits a long sound signal into a set of shorter time segments.
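To make the two methods concrete, here is a minimal sketch in Python. It assumes mono signals stored as NumPy arrays; the helper names (`segment`, `lowpass`, `pitch_shift_resample`) and the parameter values are illustrative choices for this post, not the transformations or settings used in the publication.

```python
import numpy as np

def segment(signal, sr, seg_seconds=30.0):
    """Sound segmentation: split a long signal into fixed-length segments.
    Trailing samples that do not fill a whole segment are dropped."""
    seg_len = int(seg_seconds * sr)
    n_segments = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

def lowpass(signal, kernel_size=5):
    """A crude moving-average low-pass filter, as one example of a
    'filtering' transformation."""
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(signal, kernel, mode="same")

def pitch_shift_resample(signal, factor=1.05):
    """A naive pitch shift by linear-interpolation resampling:
    factor > 1 raises the pitch (and shortens the signal)."""
    idx = np.arange(0, len(signal), factor)
    return np.interp(idx, np.arange(len(signal)), signal)

# Toy example: 90 seconds of noise standing in for a music track.
sr = 22050
track = np.random.randn(90 * sr)
segments = segment(track, sr)                      # three 30-second segments
augmented = [lowpass(s) for s in segments] + \
            [pitch_shift_resample(s) for s in segments]
print(len(segments), len(augmented))               # 3 6
```

In practice each augmented version keeps the genre label of the original track, so the training set grows without any extra manual annotation.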
Testing both of these methods on a genre classification task, the authors report that the experiments have shown ‘their ability to significantly improve the results for small datasets when it is not possible to annotate more examples manually’.
Some key highlights:
- Among the tested segmentation durations, it proved preferable to use segments of 30 seconds both during training and testing, rather than longer durations. However, depending on the descriptors used, ‘segments shorter than that may not provide meaningful representations, and they did not provide improvements with the method tested’.
- The evaluation results confirmed the benefit of applying transformations to the training examples. It was also observed that there was no benefit in employing more than a few transformed versions of each example.
- The robustness of models trained with transformations was shown experimentally.
- The results of individual transformations (without a chain) showed an overfitting problem that biases the models toward the transformation type used during training. As the authors point out, “using different transformations in series (a chain) does not improve classification of original sounds (compared to results using individual transformations), but it significantly improves the overall robustness, i.e. when applying the model to transformed or degraded sounds.”
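The idea of applying transformations in series can be sketched as follows. The specific transformations (a gain change and added white noise) and their parameter values are hypothetical examples chosen for illustration, not the chain evaluated in the publication.

```python
from functools import reduce
import numpy as np

def gain(x, db=-3.0):
    """Level change by a given number of decibels."""
    return x * 10 ** (db / 20)

def add_noise(x, snr_db=30.0):
    """Add white noise at a given signal-to-noise ratio (in dB)."""
    rms = np.sqrt(np.mean(x ** 2))
    noise_rms = rms / 10 ** (snr_db / 20)
    return x + np.random.randn(len(x)) * noise_rms

def chain(x, transforms):
    """Apply several transformations in series (a 'chain')."""
    return reduce(lambda s, t: t(s), transforms, x)

x = np.random.randn(22050)          # one second of toy audio
y = chain(x, [gain, add_noise])     # one chained augmented version
```

Each pass through `chain` with a different set of transformations yields another augmented training example from the same original sound.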
The main contribution of this work is to demonstrate experimentally the benefit of these methods, used alone or together, during training and/or testing.
Rémi Mignot and Geoffroy Peeters furthermore demonstrate their use in improving robustness to potentially unknown sound degradations. Last but not least, by analyzing these results, they provide good-practice recommendations.
You can read the publication “An Analysis of the Effect of Data Augmentation Methods: Experiments for a Musical Genre Classification Task” here.