Publications
Publications by category, in reverse chronological order. Generated by jekyll-scholar.
2026
- [International Conf.] TensLoRA: Tensor Alternatives for Low-Rank Adaptation. In 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2026
Low-Rank Adaptation (LoRA) is widely used to efficiently adapt Transformers by adding trainable low-rank matrices to attention projections. While effective, these matrices are considered independent for each attention projection (Query, Key, and Value) and each layer. Recent extensions have considered joint, tensor-based adaptations, but only in limited forms and without a systematic framework. We introduce TensLoRA, a unified framework that aggregates LoRA updates into higher-order tensors and models a broad family of tensor-based low-rank adaptations. Our formulation generalizes existing tensor-based methods and enables mode-specific compression rates, allowing parameter budgets to be tailored according to the modality and task. Experiments on vision and language benchmarks reveal that the tensor construction directly impacts performance, sometimes outperforming standard LoRA under similar parameter counts.
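The aggregation idea can be pictured with a small numpy sketch (all shapes, names, and the tensor layout here are illustrative assumptions, not the paper's implementation): independent per-layer, per-projection LoRA updates are stacked into one 4th-order tensor, which a tensor decomposition can then compress with mode-specific ranks.

```python
import numpy as np

# Hypothetical dimensions: L layers, P projections (Q, K, V), model dim d, LoRA rank r.
L, P, d, r = 12, 3, 64, 4

rng = np.random.default_rng(0)
# Standard LoRA: one independent (B, A) factor pair per layer and per projection.
A = rng.normal(size=(L, P, r, d))   # down-projections
B = rng.normal(size=(L, P, d, r))   # up-projections

# Each individual LoRA update is the matrix product B @ A.
updates = np.einsum("lpdr,lprk->lpdk", B, A)  # shape (L, P, d, d)

# Aggregating all updates into a single 4th-order tensor exposes joint
# structure across layers and projections, instead of treating each
# (layer, projection) pair independently.
print(updates.shape)  # (12, 3, 64, 64)
```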
@inproceedings{marmoret2026tenslora,
  title = {TensLoRA: Tensor Alternatives for Low-Rank Adaptation},
  author = {Marmoret, Axel and Bensaid, Reda and Lys, Jonathan and Gripon, Vincent and Leduc-Primeau, Fran{\c{c}}ois},
  booktitle = {2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = {2026},
  month = may,
  organization = {IEEE},
  url = {https://arxiv.org/abs/2509.19391},
}
- [Preprint] Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers. arXiv preprint arXiv:2602.14760. Submitted to EUSIPCO 2026, 2026
Large Language Models (LLMs) are trained with next-token prediction, implemented in autoregressive Transformers via causal masking for parallelism. This creates a subtle misalignment: residual connections tie activations to the current token, while supervision targets the next token, potentially propagating mismatched information if the current token is not the most informative for prediction. In this work, we empirically localize this input-output alignment shift in pretrained LLMs, using decoding trajectories over tied embedding spaces and similarity-based metrics. Our experiments reveal that the hidden token representations switch from input alignment to output alignment deep within the network. Motivated by this observation, we propose a lightweight residual-path mitigation based on residual attenuation, implemented either as a fixed-layer intervention or as a learnable gating mechanism. Experiments on multiple benchmarks show that these strategies alleviate the representation misalignment and yield improvements, providing an efficient and general architectural enhancement for autoregressive Transformers.
@article{lys2026residual,
  title = {Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers},
  author = {Lys, Jonathan and Gripon, Vincent and Pasdeloup, Bastien and Marmoret, Axel and Mauch, Lukas and Cardinaux, Fabien and Hacene, Ghouthi Boukli},
  journal = {arXiv preprint arXiv:2602.14760},
  year = {2026},
}
- [Preprint] Inner Loop Inference for Pretrained Transformers: Unlocking Latent Capabilities Without Training. arXiv preprint arXiv:2602.14759. Submitted to EUSIPCO 2026, 2026
Deep Learning architectures, and in particular Transformers, are conventionally viewed as a composition of layers. These layers are often the sum of two contributions: a residual path that copies the input, and the output of a Transformer block. As a consequence, the inner representations (i.e., the inputs of these blocks) can be interpreted as iterative refinement of a propagated latent representation. Under this lens, many works suggest that the inner space is shared across layers, meaning that tokens can be decoded at early stages. Mechanistic interpretability goes even further by conjecturing that some layers act as refinement layers. Following this path, we propose inference-time inner looping, which prolongs refinement in pretrained off-the-shelf language models by repeatedly re-applying a selected block range. Across multiple benchmarks, inner looping yields modest but consistent accuracy improvements. Analyses of the resulting latent trajectories suggest more stable state evolution and continued semantic refinement. Overall, our results suggest that additional refinement can be obtained through simple test-time looping, extending computation in frozen pretrained models.
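The looping idea can be sketched on a toy residual block (the block, its weights, and the loop count below are hypothetical stand-ins, not the paper's models): at test time, the same frozen block is simply re-applied to prolong the refinement of the latent state.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=d)

# Toy residual "block": h -> h + f(h), standing in for a pretrained
# Transformer block with frozen weights.
W = 0.05 * rng.normal(size=(d, d))
def block(h):
    return h + np.tanh(W @ h)

# Standard inference applies the block once; inner looping re-applies the
# same block k times at inference, extending computation without training.
def inner_loop(h, k):
    for _ in range(k):
        h = block(h)
    return h

once = block(x)             # normal forward pass through the block
looped = inner_loop(x, 3)   # the same frozen block re-applied three times
```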
@article{lys2026inner,
  title = {Inner Loop Inference for Pretrained Transformers: Unlocking Latent Capabilities Without Training},
  author = {Lys, Jonathan and Gripon, Vincent and Pasdeloup, Bastien and Marmoret, Axel and Mauch, Lukas and Cardinaux, Fabien and Hacene, Ghouthi Boukli},
  journal = {arXiv preprint arXiv:2602.14759},
  year = {2026},
}
2025
- [National Conf.] AutoMashup: Automatic Music Mashups Creation. Marine Delabaere*, Léa Miqueu*, Michael Moreno*, Gautier Bigois*, and 7 more authors. In GRETSI'25 - XXXe Colloque Francophone de Traitement du Signal et des Images, 2025
We introduce AutoMashup, a system for automatic mashup creation based on source separation, music analysis, and compatibility estimation. We propose using COCOLA to assess compatibility between separated stems and investigate whether general-purpose pretrained audio models (CLAP and MERT) can support zero-shot estimation of track pair compatibility. Our results show that mashup compatibility is asymmetric — it depends on the role assigned to each track (vocals or accompaniment) — and that current embeddings fail to reproduce the perceptual coherence measured by COCOLA. These findings underline the limitations of general-purpose audio representations for compatibility estimation in mashup creation.
@inproceedings{delabaere2025automashup,
  title = {{AutoMashup: Automatic Music Mashups Creation}},
  author = {Delabaere, Marine and Miqueu, L{\'e}a and Moreno, Michael and Bigois, Gautier and Duong, Hoang and Fernandez, Ella and Manent, Flavie and Salgado-Herrera, Maria and Pasdeloup, Bastien and Farrugia, Nicolas and Marmoret, Axel},
  booktitle = {{GRETSI'25 - XXXe Colloque Francophone de Traitement du Signal et des Images}},
  year = {2025},
  url = {https://hal.science/hal-05191030},
}
- [National Conf.] Raining Words : Les modèles d'ASR peuvent-ils retranscrire les sous-genres de Metal ? Bastien Pasdeloup* and Axel Marmoret*. In GRETSI'25 - XXXe Colloque Francophone de Traitement du Signal et des Images, 2025
Extreme vocal styles in Metal music are known for their intensity, vocal saturation, and low intelligibility. In this paper, we evaluate the ability of recent Automatic Speech Recognition (ASR) models to transcribe such vocals in an out-of-distribution (OOD) setting. We assess five state-of-the-art ASR models using two types of data: isolated extreme vocal recordings and full metal songs from various subgenres. We also apply source separation techniques to extract vocals from the music tracks. Our results show that these models struggle to accurately transcribe extreme vocals, especially in cases of severe vocal distortion or atypical prosody.
@inproceedings{pasdeloup2025raining,
  title = {{Raining Words : Les modèles d'ASR peuvent-ils retranscrire les sous-genres de Metal ?}},
  author = {Pasdeloup, Bastien and Marmoret, Axel},
  booktitle = {{GRETSI'25 - XXXe Colloque Francophone de Traitement du Signal et des Images}},
  year = {2025},
  url = {https://hal.science/hal-05191118},
}
- [National Conf.] Apprentissage par transfert pour la détection et la classification automatiques de grandes baleines dans l'océan austral. In GRETSI'25 - XXXe Colloque Francophone de Traitement du Signal et des Images, 2025
Automatic detection and classification of cetacean vocalizations in passive acoustic recordings are complex tasks. While convolutional neural networks are widely used, their generalization is often constrained by scarce annotated data and high recording variability (geographic, temporal, equipment-related, etc.). Transfer learning, leveraging larger pretrained networks, offers a potential solution. In this context, we investigated the Perch encoder and evaluated it using metrics ensuring a fair comparison.
@inproceedings{jean2025apprentissage,
  title = {{Apprentissage par transfert pour la détection et la classification automatiques de grandes baleines dans l'océan austral}},
  author = {Jean-Labadye, Lucie and Dubus, Gabriel and Cazau, Dorian and Farrugia, Nicolas and Marmoret, Axel and Adam, Olivier},
  booktitle = {{GRETSI'25 - XXXe Colloque Francophone de Traitement du Signal et des Images}},
  year = {2025},
  url = {https://hal.science/hal-05201458},
}
2023
- [Journal] Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm. Axel Marmoret*†, Jérémy E Cohen‡, and Frédéric Bimbot†. Transactions of the International Society for Music Information Retrieval, Nov 2023
Music Structure Analysis (MSA) is a Music Information Retrieval task consisting of representing a song in a simplified, organized manner by breaking it down into sections typically corresponding to 'chorus', 'verse', 'solo', etc. In this work, we extend an MSA algorithm called the Correlation Block-Matching (CBM) algorithm, introduced by Marmoret et al. (2020, 2022b). The CBM algorithm is a dynamic programming algorithm that segments self-similarity matrices, which are a standard description used in MSA and in numerous other applications. In this work, self-similarity matrices are computed from the feature representation of an audio signal, and time is sampled at the bar scale. This study examines three different standard similarity functions for the computation of self-similarity matrices. Results show that, in optimal conditions, the proposed algorithm achieves a level of performance competitive with supervised state-of-the-art methods while only requiring knowledge of bar positions. In addition, the algorithm is made open-source and is highly customizable.
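As a rough illustration of the kind of self-similarity matrix the CBM algorithm segments, here is a minimal sketch assuming cosine similarity on synthetic barwise features (the similarity function and data are illustrative, not the paper's code):

```python
import numpy as np

def cosine_self_similarity(features):
    """Self-similarity matrix from barwise features.

    features: array of shape (n_bars, n_features), one row per bar.
    Returns the (n_bars, n_bars) matrix of pairwise cosine similarities.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-12)
    return normed @ normed.T

# Two repeated sections (AABB) yield visible blocks along the diagonal,
# which a segmentation algorithm can then detect.
bars = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
S = cosine_self_similarity(bars)
```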
@article{marmoret2023barwise,
  title = {Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm},
  author = {Marmoret, Axel and Cohen, J{\'e}r{\'e}my E and Bimbot, Fr{\'e}d{\'e}ric},
  journal = {Transactions of the International Society for Music Information Retrieval},
  volume = {6},
  number = {1},
  pages = {167--185},
  doi = {10.5334/tismir.167},
  month = nov,
  year = {2023},
  url = {https://transactions.ismir.net/articles/10.5334/tismir.167},
}
- [International Conf.] Convolutive block-matching segmentation algorithm with application to music structure analysis. Axel Marmoret*†, Jérémy E Cohen‡, and Frédéric Bimbot†. In 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
Music Structure Analysis (MSA) consists of representing a song in sections (such as 'chorus', 'verse', 'solo', etc.), and can be seen as the retrieval of a simplified organization of the song. This work presents a new algorithm devoted to MSA, called the Convolutive Block-Matching (CBM) algorithm. In particular, the CBM algorithm is a dynamic programming algorithm applied to autosimilarity matrices, a standard tool in MSA. In this work, autosimilarity matrices are computed from the feature representation of an audio signal, and time is sampled at the bar scale. We study three different similarity functions for the computation of autosimilarity matrices. We report that the proposed algorithm achieves a level of performance competitive with that of supervised state-of-the-art methods on 3 of 4 metrics, while being fully unsupervised.
@inproceedings{marmoret2023convolutive,
  title = {Convolutive block-matching segmentation algorithm with application to music structure analysis},
  author = {Marmoret, Axel and Cohen, J{\'e}r{\'e}my E and Bimbot, Fr{\'e}d{\'e}ric},
  booktitle = {2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year = {2023},
  organization = {IEEE},
  doi = {10.1109/WASPAA58266.2023.10248174},
  url = {https://hal.science/hal-03834996},
}
2022
- [International Conf.] Semi-Supervised Convolutive NMF for Automatic Music Transcription. Haoran Wu, Axel Marmoret†, and Jérémy E Cohen‡. In Proceedings of the 19th Sound and Music Computing Conference, 2022
Automatic Music Transcription, which consists of transforming an audio recording of a musical performance into a symbolic format, remains a difficult Music Information Retrieval task. In this work, which focuses on piano transcription, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization (CNMF). In the semi-supervised setting, only a single recording of each individual note is required. We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and slightly worse than state-of-the-art supervised deep learning methods, while suffering from generalization issues.
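The semi-supervised idea can be sketched with plain (non-convolutive) NMF: a dictionary of per-note spectral templates, built from the isolated-note recordings, stays fixed, and only the activations are updated. All data, dimensions, and the Euclidean loss below are illustrative assumptions; the convolutive extension is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed dictionary W: one spectral template per note, obtained beforehand
# from isolated-note recordings (the only supervision this setting needs).
n_bins, n_notes, n_frames = 32, 4, 50
W = np.abs(rng.normal(size=(n_bins, n_notes)))

# Synthetic mixture spectrogram V built from known ground-truth activations.
H_true = np.abs(rng.normal(size=(n_notes, n_frames)))
V = W @ H_true

# Multiplicative updates for H only (W stays fixed): the classic Euclidean
# NMF rule H <- H * (W^T V) / (W^T W H), which preserves nonnegativity.
H = np.abs(rng.normal(size=(n_notes, n_frames)))
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)

# H now estimates which note is active in which frame (the transcription).
rel_err = np.linalg.norm(W @ H - V) / np.linalg.norm(V)
```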
@inproceedings{wu2022semi,
  title = {Semi-Supervised Convolutive NMF for Automatic Music Transcription},
  author = {Wu, Haoran and Marmoret, Axel and Cohen, J{\'e}r{\'e}my E},
  booktitle = {Proceedings of the 19th Sound and Music Computing Conference},
  year = {2022},
  doi = {10.5281/zenodo.6798192},
  url = {https://hal.science/hal-03608497},
}
- [International Conf.] Barwise Compression Schemes for Audio-Based Music Structure Analysis. Axel Marmoret*, Jérémy E Cohen†, and Frédéric Bimbot*. In Proceedings of the 19th Sound and Music Computing Conference, 2022
Music Structure Analysis (MSA) consists of segmenting a music piece into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure using a dynamic programming algorithm. This work explores both low-rank approximation models, such as Principal Component Analysis and Nonnegative Matrix Factorization, and “piece-specific” Auto-Encoding Neural Networks, with the objective of learning latent representations specific to a given song. Such approaches rely on neither supervision nor annotations, which are well known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for 3s tolerance) on the RWC-Pop dataset, showcasing the importance of barwise compression processing for MSA.
@inproceedings{marmoret2022barwise,
  title = {Barwise Compression Schemes for Audio-Based Music Structure Analysis},
  author = {Marmoret, Axel and Cohen, J{\'e}r{\'e}my E and Bimbot, Fr{\'e}d{\'e}ric},
  booktitle = {{Proceedings of the 19th Sound and Music Computing Conference}},
  year = {2022},
  doi = {10.5281/zenodo.6798330},
  url = {https://hal.science/hal-03600873},
}
- [National Conf.] Nonnegative tucker decomposition with beta-divergence for music structure analysis of audio signals. In XXVIIIème Colloque Francophone de Traitement du Signal et des Images (GRETSI 2022), 2022
Nonnegative Tucker Decomposition (NTD), a tensor decomposition model, has received increased interest in recent years because of its ability to blindly extract meaningful patterns in tensor data. Nevertheless, existing algorithms to compute NTD are mostly designed for the Euclidean loss. On the other hand, NTD has recently proven to be a powerful tool in Music Information Retrieval. This work proposes a Multiplicative Updates algorithm to compute NTD with the beta-divergence loss, often considered a better loss for audio processing. We notably show how to implement the multiplicative rules efficiently using tensor algebra, a naive approach being intractable. Finally, we show on a Music Structure Analysis task that unsupervised NTD fitted with the beta-divergence loss outperforms earlier results obtained with the Euclidean loss.
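For reference, the beta-divergence family mentioned above can be written in a few lines of numpy. The unified formula and its special cases (Euclidean, Kullback-Leibler, Itakura-Saito) are standard; this sketch is not the paper's implementation and assumes strictly positive inputs.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Beta-divergence d_beta(x | y), summed over all entries.

    beta = 2: half squared Euclidean distance; beta = 1: Kullback-Leibler;
    beta = 0: Itakura-Saito, a common choice for audio spectrograms.
    Assumes x and y have strictly positive entries.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    if beta == 1:
        return np.sum(x * np.log(x / y) - x + y)
    if beta == 0:
        return np.sum(x / y - np.log(x / y) - 1)
    return np.sum((x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1))
                  / (beta * (beta - 1)))

# For any beta, the divergence is nonnegative and zero iff x == y.
```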
@inproceedings{marmoret2022nonnegative,
  title = {Nonnegative tucker decomposition with beta-divergence for music structure analysis of audio signals},
  author = {Marmoret, Axel and Voorwinden, Florian and Leplat, Valentin and Cohen, J{\'e}r{\'e}my E and Bimbot, Fr{\'e}d{\'e}ric},
  booktitle = {XXVIII{\`e}me Colloque Francophone de Traitement du Signal et des Images (GRETSI 2022)},
  year = {2022},
  publisher = {GRETSI - Groupe de Recherche en Traitement du Signal et des Images},
  number = {001-0233},
  pages = {933--936},
  url = {https://hal.science/hal-03409508},
}
- [Ph.D. Thesis] Unsupervised machine learning paradigms for the representation of music similarity and structure. Axel Marmoret*. Université de Rennes. Segmentation code can be found here, low-rank factorization code can be found here, and the specific methods for compressing music barwise can be found here, Dec 2022
Musical structure, defined as a simplified representation of the organization of a song, is an important musicological concept, but hard to estimate automatically. This thesis presents new methods to automatically estimate the structural segmentation of a song, focusing the study of music at the bar scale. By developing a new segmentation algorithm (called “CBM”) and by comparing several unsupervised compression schemes (from linear and multilinear algebra to neural networks), the paradigms introduced in this thesis achieve segmentation performance that outperforms unsupervised state-of-the-art methods and is nearly on par with the global state of the art, obtained with supervised machine learning algorithms. In particular, as the methods described in this thesis are unsupervised, the estimates do not rely on annotated data, lowering the bias related to ambiguity and subjectivity (inherent to musical structure) while limiting the loss in performance compared to the best supervised methods. In addition, some of the methods studied in this thesis (in particular Nonnegative Tucker Decomposition) allow the automatic extraction of interpretable parts of a song, which may be used for tasks other than the estimation of structure, and contribute to the development of interpretable machine and deep learning algorithms, a major field of research nowadays.
@phdthesis{marmoret2022unsupervised,
  title = {Unsupervised machine learning paradigms for the representation of music similarity and structure},
  author = {Marmoret, Axel},
  year = {2022},
  month = dec,
  school = {Universit{\'e} de Rennes},
  url = {https://hal.science/tel-03937846},
}
2021
- [Preprint] Polytopic Analysis of Music. Axel Marmoret*, Jérémy E Cohen*, and Frédéric Bimbot*. arXiv preprint arXiv:2212.11054, 2021
Structural segmentation of music refers to the task of finding a symbolic representation of the organisation of a song, reducing the musical flow to a partition of non-overlapping segments. Under this definition, the musical structure may not be unique, and may even be ambiguous. One way to resolve that ambiguity is to see this task as a compression process, and to consider the musical structure as the optimization of a given compression criterion. In that viewpoint, C. Guichaoua developed a compression-driven model for retrieving the musical structure, based on the "System and Contrast" model and on polytopes, which are extensions of n-hypercubes. We present this model, which we call "polytopic analysis of music", along with a new dedicated open-source toolbox called MusicOnPolytopes (in Python). This model is also extended to the use of the Tonnetz as a relation system. Structural segmentation experiments are conducted on the RWC Pop dataset. Results show improvements over those previously presented by C. Guichaoua.
@article{marmoret2022polytopic,
  title = {Polytopic Analysis of Music},
  author = {Marmoret, Axel and Cohen, J{\'e}r{\'e}my E and Bimbot, Fr{\'e}d{\'e}ric},
  journal = {arXiv preprint arXiv:2212.11054},
  year = {2021},
}
2020
- [International Conf.] Uncovering audio patterns in music with nonnegative Tucker decomposition for structural segmentation. In ISMIR 2020 - 21st International Society for Music Information Retrieval, Oct 2020
Recent work has proposed the use of tensor decomposition to model repetitions and to separate tracks in loop-based electronic music. The present work investigates further the ability of Nonnegative Tucker Decomposition (NTD) to uncover musical patterns and structure in pop songs in their audio form. Exploiting the fact that NTD tends to express the content of bars as linear combinations of a few patterns, we illustrate the ability of the decomposition to capture and single out repeated motifs in the corresponding compressed space, which can be interpreted from a musical viewpoint. The resulting features also turn out to be efficient for structural segmentation, leading to experimental results on the RWC Pop data set which potentially challenge state-of-the-art approaches that rely on extensive example-based learning schemes.
@inproceedings{marmoret2020uncovering,
  title = {Uncovering audio patterns in music with nonnegative Tucker decomposition for structural segmentation},
  author = {Marmoret, Axel and Cohen, J{\'e}r{\'e}my E and Bertin, Nancy and Bimbot, Fr{\'e}d{\'e}ric},
  booktitle = {ISMIR 2020-21st International Society for Music Information Retrieval},
  year = {2020},
  month = oct,
  pages = {788--794},
  url = {https://hal.science/hal-02928733v1},
}
2019
- [Master's Thesis] Multi-Channel Automatic Music Transcription Using Tensor Algebra. Axel Marmoret*, Nancy Bertin*, and Jérémy E Cohen*. arXiv preprint arXiv:2107.11250, 2019
Music is an art perceived in unique ways by every listener, arising from acoustic signals. In the meantime, standards such as musical scores exist to describe it. Even if humans can perform this transcription, it is costly in terms of time and effort, even more so with the explosion of information following the rise of the Internet. In that sense, research is driven in the direction of Automatic Music Transcription. While this task is considered solved in the case of single notes, it remains open when notes are superposed, forming chords. This report aims at developing some of the existing techniques for Music Transcription, particularly matrix factorization, and at introducing the concept of multi-channel automatic music transcription. This concept is explored with mathematical objects called tensors.
@article{marmoret2019multi,
  title = {Multi-Channel Automatic Music Transcription Using Tensor Algebra},
  author = {Marmoret, Axel and Bertin, Nancy and Cohen, J{\'e}r{\'e}my E},
  journal = {arXiv preprint arXiv:2107.11250},
  year = {2019},
  url = {https://hal.science/hal-03301448},
}