This Notebook is associated with the ICASSP2022 submission, presenting audio outputs of the Nonnegative Tucker Decomposition (NTD) when optimizing different loss functions. In particular, the three evaluated loss functions are three special cases of the more general $\beta$-divergence:

More details about our algorithm can be found in the ICASSP submission (which is presumably what brought you to this page). Audio signals are obtained by applying the Griffin-Lim algorithm to the STFT.
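As a quick reminder of the $\beta$-divergence mentioned above, here is a minimal NumPy sketch of its usual elementwise definition. The function name `beta_divergence` is ours and purely illustrative (it is not the nn_fac API), and the values $\beta = 2, 1, 0$ singled out in the comments are simply the classical named cases (half squared Euclidean distance, Kullback-Leibler and Itakura-Saito); please refer to the submission for the exact losses evaluated.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of elementwise beta-divergences d_beta(x | y), for positive arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 1:  # Kullback-Leibler divergence
        return float(np.sum(x * np.log(x / y) - x + y))
    if beta == 0:  # Itakura-Saito divergence
        return float(np.sum(x / y - np.log(x / y) - 1))
    # General case; beta = 2 gives half the squared Euclidean distance
    num = x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1)
    return float(np.sum(num / (beta * (beta - 1))))
```

For instance, `beta_divergence([3.0], [1.0], 2)` evaluates $(3 - 1)^2 / 2$.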

This notebook will present signals, showing results of:

Note though that signals representing songs will be limited to the first 16 bars, in order to limit the size of this HTML page.

We stress that, while the audio signals are listenable, they are not of professional musical quality, due either to inaccuracies in the decomposition or to the phase-estimation algorithm that we use (Griffin-Lim). Improving the reconstruction of these signals could be the subject of future work.

In the meantime, we believe that these audio examples illustrate the potential and the outputs of the NTD well, and allow for a qualitative evaluation of the differences between the loss functions.


Let's start by importing external libraries (which are installed automatically if you used pip install; otherwise, you should install them manually).

And now, let's import the nn_fac and MusicNTD code: nn_fac contains the Nonnegative Factorization methods, and MusicNTD contains everything else (data manipulation, segmentation, etc.) associated with NTD for music segmentation:

Next, we need to load the song to decompose. We used Come Together by The Beatles as an example, but feel free to choose any song you'd like (in WAV format, though)!

NB: this comment only applies if you're running the Notebook, and not reading the HTML, as the HTML is static.


Let's compute the STFT of the song:

and then form the tensor-spectrogram of this STFT:
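To give an idea of what forming the tensor-spectrogram involves, here is a minimal NumPy sketch, assuming that bar boundaries are already known as frame indices. The function `barwise_tensor`, its arguments and the (frequency, time-in-bar, bar) mode ordering are illustrative assumptions, not the MusicNTD implementation.

```python
import numpy as np

def barwise_tensor(spectrogram, bar_frames, samples_per_bar=96):
    """Stack bars into a (n_freq, samples_per_bar, n_bars) tensor.

    spectrogram: array of shape (n_freq, n_frames).
    bar_frames: list of (start, end) frame indices, one pair per bar
                (end is exclusive).
    """
    slices = []
    for start, end in bar_frames:
        # Pick equally spaced frames inside the bar
        idx = np.linspace(start, end, samples_per_bar, endpoint=False).astype(int)
        slices.append(spectrogram[:, idx])
    # Bars are stacked along a new last axis
    return np.stack(slices, axis=-1)
```

Every bar ends up with the same number of time samples, whatever its actual duration, which is what allows bars to be stacked into a tensor.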

We reconstruct the song from the unfolded tensor spectrogram. Hence, the song will be reconstructed from the 96 chosen samples per bar.
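Unfolding can be pictured as putting the bars back one after another along the time axis. The sketch below assumes a tensor with modes ordered as (frequency, time-in-bar, bar); this ordering and the name `unfold_to_spectrogram` are assumptions made for illustration only.

```python
import numpy as np

def unfold_to_spectrogram(tensor):
    """Concatenate bars in time: (n_freq, n_samples, n_bars) -> (n_freq, n_bars * n_samples)."""
    n_freq, n_samples, n_bars = tensor.shape
    # Move the bar axis before the in-bar axis, then flatten both into one time axis
    return tensor.transpose(0, 2, 1).reshape(n_freq, n_bars * n_samples)

# Toy tensor: 2 frequency bins, 3 samples per bar, 4 bars
toy_tensor = np.arange(24).reshape(2, 3, 4)
toy_spectrogram = unfold_to_spectrogram(toy_tensor)
```

Column `b * n_samples + s` of the result holds sample `s` of bar `b`.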

To reconstruct the song, the algorithm needs the hop length of the STFT. As bars can be of different lengths, we compute the median hop length over the different bars, and apply it to all bars in our song.
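The median hop length can be sketched as follows, assuming bars are given as (start, end) times in seconds and that each bar holds 96 samples. The function name and the rounding down to an integer number of audio samples are illustrative choices, not necessarily those of the notebook's code.

```python
import statistics

def median_hop_length(bars, sampling_rate, samples_per_bar=96):
    """Median, over bars, of the hop length (in audio samples) implied by each bar."""
    hops = [
        (end - start) * sampling_rate / samples_per_bar
        for start, end in bars
    ]
    # Round down to an integer number of audio samples
    return int(statistics.median(hops))

# Two 2-second bars and one slightly longer bar, at 44.1 kHz
example_hop = median_hop_length([(0.0, 2.0), (2.0, 4.0), (4.0, 6.5)], 44100)
```

Using the median rather than the mean keeps a few unusually long or short bars from skewing the hop length applied to the whole song.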

Now, let's recreate the signal from the barwise STFT, in order to study the reconstruction quality of the Griffin-Lim algorithm. We limit the song to a certain number of bars (so as not to overload the final HTML file).
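For readers curious about what Griffin-Lim does, here is a bare-bones NumPy sketch of the classical iteration: start from a random phase, then alternate between going back to the time domain and re-imposing the known magnitude. It is not the implementation used to produce the audio on this page; the helper names, the Hann window and the absence of padding are simplifying assumptions.

```python
import numpy as np

def stft(x, n_fft, hop):
    """Complex STFT with a Hann window and no padding: shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)

def istft(spec, n_fft, hop):
    """Inverse STFT by windowed overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[1]
    length = n_fft + hop * (n_frames - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=0)
    for i in range(n_frames):
        signal[i * hop : i * hop + n_fft] += frames[:, i] * window
        norm[i * hop : i * hop + n_fft] += window**2
    return signal / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_fft, hop, n_iter=32, seed=0):
    """Estimate a signal whose STFT magnitude is close to `magnitude`."""
    rng = np.random.default_rng(seed)
    # Random initial phase
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        # Keep only the phase of the re-analyzed signal, and re-impose the magnitude
        phase = np.exp(1j * np.angle(stft(signal, n_fft, hop)))
    return istft(magnitude * phase, n_fft, hop)
```

Since the phase is only estimated, the output generally differs from the original signal even when the magnitude is exact, which is one source of the artifacts audible in the examples.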

Let's hear it: