import musicntd.scripts.hide_code as hide


From padding to subdivision¶

As mentioned in the first notebook, previous experiments zero-padded every bar of the tensor that was shorter than the longest bar of the song.

This fix is not satisfactory, as it creates artificial zero-valued regions at the end of most slices of the tensor.

Description of the subdivision method¶

Instead, we decided to oversample the chromagram (32-sample hop) and then select the same number of frames in each bar. Previously, frames were equally spaced across all bars, which produced tensor slices of unequal sizes (before padding). Now, every bar-chromagram contains the same number of frames, which is a parameter to be set. Within each bar-chromagram, frames are almost* equally spaced, but the gap between two consecutive frames can differ from one bar to another.

We call subdivision of bars the number of frames selected in each bar. The next part of this notebook tries to identify a good value for this parameter.

Concretely, consider the chromagram of a particular bar, starting at time $t_0$ and ending at time $t_1$. This chromagram contains $n = (t_1 - t_0 + 1) \cdot \frac{s_r}{32}$ frames, with $s_r$ the sampling rate. Given a subdivision $sub$, we select the frames at indexes $\left\{ k \cdot \frac{n}{sub} : k \in \{0, 1, \dots, sub - 1\} \right\}$. As indexes need to be integers, each of these values is rounded.

*almost, because of the rounding operation presented above
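
The frame selection described above can be sketched as follows. This is a minimal illustration, not the code used in `musicntd`: the helper name `select_frame_indexes`, the clipping of the last index, and the bar lengths in the example are assumptions made for the sake of the example.

```python
import numpy as np

def select_frame_indexes(n_frames, subdivision):
    """Return `subdivision` frame indexes spread over a bar of `n_frames` frames.

    Hypothetical helper: indexes are round(k * n / sub) for k = 0, ..., sub - 1.
    """
    k = np.arange(subdivision)
    indexes = np.rint(k * n_frames / subdivision).astype(int)
    # Safety clip (an assumption): rounding must never point past the last frame.
    return np.minimum(indexes, n_frames - 1)

# Two bars of different lengths (arbitrary values) yield the same number of frames.
short_bar = select_frame_indexes(n_frames=2500, subdivision=96)
long_bar = select_frame_indexes(n_frames=2800, subdivision=96)
print(len(short_bar), len(long_bar))

# The gap between consecutive selected frames differs from one bar to the other,
# and varies slightly inside a bar because of the rounding.
print(np.unique(np.diff(short_bar)), np.unique(np.diff(long_bar)))
```

With this scheme, every bar contributes a slice of identical size to the tensor, so no padding is required.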

Setting the subdivision parameter¶

We will test three values for the subdivision parameter:

96 (24 frames per beat, for a bar of 4 beats),
128 (32 frames per beat, for a bar of 4 beats),
192 (48 frames per beat, for a bar of 4 beats).

We test the segmentation on the entire RWC Popular dataset, with MIREX10 annotations, and with several ranks (16, 24, 32, 40) for $H$ and $Q$.

Note that, following the conclusion of Notebook 2, $W$ is now fixed to the $12 \times 12$ identity matrix.

# Define the type of annotations
annotations_type = "MIREX10"
ranks_rhythm = [16,24,32,40]
ranks_pattern = [16,24,32,40]

Subdivision 96¶

Fixed ranks¶

Below are the segmentation results with the subdivision fixed to 96, for the different rank values, on the RWC Pop dataset.

Results are computed with tolerances of 0.5 seconds and 3 seconds, respectively.

zero_five_nine, three_nine = hide.compute_ranks_RWC(ranks_rhythm,ranks_pattern, W = "chromas", annotations_type = annotations_type,
                                                  subdivision=96, penalty_weight = 1)

c:\users\amarmore\desktop\projects\phd main projects\on git\code\tensor factorization\musicntd\autosimilarity_segmentation.py:43: RuntimeWarning: invalid value encountered in true_divide
  this_array = np.array([list(i/np.linalg.norm(i)) for i in this_array.T]).T
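
For reference, the tolerance-based scores reported in this notebook (true positives, false positives, false negatives, precision, recall, F measure) can be sketched as follows. This is a simplified illustration that assumes a greedy one-to-one matching between estimated and annotated boundaries; it is not the evaluation code called by `hide.compute_ranks_RWC`, and standard toolkits may match boundaries differently. The function name `boundary_scores` and the toy boundaries are hypothetical.

```python
def boundary_scores(reference, estimated, tolerance):
    """Score estimated boundaries (in seconds) against reference ones.

    Simplified sketch: each estimated boundary is greedily matched to the closest
    still-unmatched reference boundary lying within `tolerance` seconds.
    """
    reference = sorted(reference)
    estimated = sorted(estimated)
    matched = [False] * len(reference)
    true_pos = 0
    for est in estimated:
        best = None
        for i, ref in enumerate(reference):
            if matched[i] or abs(ref - est) > tolerance:
                continue
            if best is None or abs(ref - est) < abs(reference[best] - est):
                best = i
        if best is not None:
            matched[best] = True
            true_pos += 1
    false_pos = len(estimated) - true_pos
    false_neg = len(reference) - true_pos
    precision = true_pos / len(estimated) if estimated else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return true_pos, false_pos, false_neg, precision, recall, f_measure

# Toy boundaries (in seconds), evaluated with both tolerances used in this notebook.
annotated = [0.0, 10.0, 20.0, 30.0]
found = [0.3, 12.0, 20.4]
strict = boundary_scores(annotated, found, tolerance=0.5)
loose = boundary_scores(annotated, found, tolerance=3)
print(strict)
print(loose)
```

In this toy case, the boundary estimated at 12 seconds is a false positive at 0.5 seconds but a true positive at 3 seconds, which is why scores at 3 seconds are systematically at least as high.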

Oracle ranks¶

In this condition, we only keep, for each song, the ranks leading to the highest F measure.

In that sense, it is an optimistic upper bound on the metrics.

hide.printmd("**At 0.5 seconds:**")
best_chr_zero_five = hide.best_f_one_score_rank(zero_five_nine)
hide.printmd("**At 3 seconds:**")
best_chr_three = hide.best_f_one_score_rank(three_nine)
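
The oracle condition can be sketched as below. This is an illustration under assumptions, not the implementation of `hide.best_f_one_score_rank` (which is not shown here): the function name `oracle_f_measure`, the array layout (songs, first rank, second rank), and the toy scores are all hypothetical.

```python
import numpy as np

def oracle_f_measure(f_measures):
    """Average, over songs, of the best F measure reachable with any rank pair.

    `f_measures` is assumed to have shape (n_songs, n_first_ranks, n_second_ranks).
    Returns the oracle average and, per song, the grid position of the best pair.
    """
    f_measures = np.asarray(f_measures, dtype=float)
    n_songs = f_measures.shape[0]
    flat = f_measures.reshape(n_songs, -1)
    best_flat = flat.argmax(axis=1)
    best_positions = np.stack(np.unravel_index(best_flat, f_measures.shape[1:]), axis=1)
    return flat.max(axis=1).mean(), best_positions

# Toy scores for 2 songs and a 2 x 2 grid of ranks.
toy_scores = np.array([
    [[0.5, 0.6], [0.4, 0.3]],
    [[0.2, 0.1], [0.7, 0.5]],
])
oracle_mean, oracle_positions = oracle_f_measure(toy_scores)

# Best average obtainable with a single rank pair shared by all songs.
fixed_mean = toy_scores.mean(axis=0).max()
print(oracle_mean, fixed_mean)
```

Because the best rank pair is chosen independently for each song, the oracle average can never be lower than the best fixed-rank average, which is what makes it an upper bound.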

Below is the distribution of the optimal ranks in the "oracle ranks" condition, i.e. the distribution of the ranks for $H$ and $Q$ that result in the highest F measure for each song.

hide.plot_3d_ranks_study(zero_five_nine, ranks_rhythm, ranks_pattern)

Below is the histogram of the F measures obtained with the oracle ranks.

hide.plot_f_mes_histogram(zero_five_nine)

Finally, here are the 5 worst songs in terms of F measure in this condition.

hide.return_worst_songs(zero_five_nine, 5)

[('77.wav', 0.2857),
 ('51.wav', 0.359),
 ('98.wav', 0.4),
 ('63.wav', 0.4242),
 ('50.wav', 0.4324)]

Subdivision 128¶

Fixed ranks¶

Below are the segmentation results with the subdivision fixed to 128, for the different rank values, on the RWC Pop dataset.

Results are computed with tolerances of 0.5 seconds and 3 seconds, respectively.

zero_five_cent, three_cent = hide.compute_ranks_RWC(ranks_rhythm,ranks_pattern, W = "chromas", annotations_type = annotations_type,
                                                  subdivision=128, penalty_weight = 1)


Oracle ranks¶

In this condition, we only keep, for each song, the ranks leading to the highest F measure.

In that sense, it is an optimistic upper bound on the metrics.

hide.printmd("**At 0.5 seconds:**")
best_chr_zero_five = hide.best_f_one_score_rank(zero_five_cent)
hide.printmd("**At 3 seconds:**")
best_chr_three = hide.best_f_one_score_rank(three_cent)

Below is the distribution of the optimal ranks in the "oracle ranks" condition, i.e. the distribution of the ranks for $H$ and $Q$ that result in the highest F measure for each song.

hide.plot_3d_ranks_study(zero_five_cent, ranks_rhythm, ranks_pattern)

Below is the histogram of the F measures obtained with the oracle ranks.

hide.plot_f_mes_histogram(zero_five_cent)

Finally, here are the 5 worst songs in terms of F measure in this condition.

hide.return_worst_songs(zero_five_cent, 5)

[('77.wav', 0.2857),
 ('51.wav', 0.359),
 ('71.wav', 0.2778),
 ('63.wav', 0.375),
 ('50.wav', 0.3784)]

Subdivision 192¶

Fixed ranks¶

Below are the segmentation results with the subdivision fixed to 192, for the different rank values, on the RWC Pop dataset.

Results are computed with tolerances of 0.5 seconds and 3 seconds, respectively.

zero_five_hunnine, three_hunnine = hide.compute_ranks_RWC(ranks_rhythm,ranks_pattern, W = "chromas", annotations_type = annotations_type,
                                                  subdivision=192, penalty_weight = 1)


Oracle ranks¶

In this condition, we only keep, for each song, the ranks leading to the highest F measure.

In that sense, it is an optimistic upper bound on the metrics.

hide.printmd("**At 0.5 seconds:**")
best_chr_zero_five = hide.best_f_one_score_rank(zero_five_hunnine)
hide.printmd("**At 3 seconds:**")
best_chr_three = hide.best_f_one_score_rank(three_hunnine)

Below is the distribution of the optimal ranks in the "oracle ranks" condition, i.e. the distribution of the ranks for $H$ and $Q$ that result in the highest F measure for each song.

hide.plot_3d_ranks_study(zero_five_hunnine, ranks_rhythm, ranks_pattern)

Below is the histogram of the F measures obtained with the oracle ranks.

hide.plot_f_mes_histogram(zero_five_hunnine)

Finally, here are the 5 worst songs in terms of F measure in this condition.

hide.return_worst_songs(zero_five_hunnine, 5)

[('51.wav', 0.3333),
 ('77.wav', 0.2857),
 ('71.wav', 0.2857),
 ('63.wav', 0.4118),
 ('34.wav', 0.4375)]

Conclusion¶

We did not find the differences in segmentation results between the three subdivisions to be significant.

We therefore concluded that the three tested subdivisions were equally satisfactory for our experiments, and we decided to continue with the subdivision 96 only: as the smallest tested value, it reduces computation time and complexity.

96 also has the advantage (compared to 128) of being divisible by both 3 and 4, which are the most common numbers of beats per bar in Western pop music (even if, for now, we have restricted our study to music with 4 beats per bar).
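
The divisibility argument can be checked with a few lines of arithmetic. The helper `frames_per_beat` below is a hypothetical name introduced for this check only.

```python
def frames_per_beat(subdivision, beats_per_bar):
    """Number of frames per beat, or None when the bar does not split evenly."""
    quotient, remainder = divmod(subdivision, beats_per_bar)
    return quotient if remainder == 0 else None

for subdivision in (96, 128, 192):
    for beats_per_bar in (3, 4):
        print(subdivision, beats_per_bar, frames_per_beat(subdivision, beats_per_bar))
```

Among the three tested values, only 128 fails to split evenly into 3 beats.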

	Results at 0.5 seconds	True Positives	False Positives	False Negatives	Precision	Recall	F measure
Rank Q:16	Rank H:16	8.4700	5.7900	10.3400	0.5977	0.4566	0.5092
	Rank H:24	8.8400	5.5000	9.9700	0.6171	0.4780	0.5312
	Rank H:32	8.7100	5.6300	10.1000	0.6136	0.4707	0.5247
	Rank H:40	8.9400	5.6600	9.8700	0.6164	0.4827	0.5336
Rank Q:24	Rank H:16	9.0400	5.9100	9.7700	0.6063	0.4889	0.5339
	Rank H:24	9.6700	5.9700	9.1400	0.6246	0.5213	0.5601
	Rank H:32	9.7000	5.8800	9.1100	0.6270	0.5232	0.5632
	Rank H:40	9.5800	6.2300	9.2300	0.6103	0.5173	0.5530
Rank Q:32	Rank H:16	9.8100	6.2300	9.0000	0.6180	0.5275	0.5618
	Rank H:24	9.8400	6.5100	8.9700	0.6023	0.5252	0.5535
	Rank H:32	10.1100	6.0400	8.7000	0.6266	0.5450	0.5767
	Rank H:40	9.8400	6.2800	8.9700	0.6117	0.5274	0.5594
Rank Q:40	Rank H:16	9.2800	6.7700	9.5300	0.5776	0.4984	0.5270
	Rank H:24	9.7000	6.6900	9.1100	0.5907	0.5180	0.5450
	Rank H:32	9.7200	6.9000	9.0900	0.5856	0.5217	0.5452
	Rank H:40	10.0000	6.5700	8.8100	0.6063	0.5361	0.5621

	Results at 3 seconds	True Positives	False Positives	False Negatives	Precision	Recall	F measure
Rank Q:16	Rank H:16	10.8000	3.4600	8.0100	0.7712	0.5812	0.6522
	Rank H:24	10.9900	3.3500	7.8200	0.7787	0.5938	0.6641
	Rank H:32	10.8300	3.5100	7.9800	0.7676	0.5818	0.6515
	Rank H:40	11.0200	3.5800	7.7900	0.7672	0.5932	0.6593
Rank Q:24	Rank H:16	11.3000	3.6500	7.5100	0.7642	0.6100	0.6690
	Rank H:24	11.7500	3.8900	7.0600	0.7641	0.6316	0.6813
	Rank H:32	11.6900	3.8900	7.1200	0.7590	0.6282	0.6785
	Rank H:40	11.7300	4.0800	7.0800	0.7510	0.6316	0.6775
Rank Q:32	Rank H:16	11.8600	4.1800	6.9500	0.7465	0.6363	0.6780
	Rank H:24	11.9200	4.4300	6.8900	0.7357	0.6393	0.6746
	Rank H:32	12.1500	4.0000	6.6600	0.7559	0.6534	0.6931
	Rank H:40	12.1400	3.9800	6.6700	0.7604	0.6527	0.6938
Rank Q:40	Rank H:16	11.8300	4.2200	6.9800	0.7422	0.6352	0.6740
	Rank H:24	12.1600	4.2300	6.6500	0.7475	0.6515	0.6869
	Rank H:32	12.1000	4.5200	6.7100	0.7349	0.6489	0.6804
	Rank H:40	12.3300	4.2400	6.4800	0.7505	0.6613	0.6945

	Results at 0.5 seconds	True Positives	False Positives	False Negatives	Precision	Recall	F measure
Rank Q:16	Rank H:16	8.5900	5.6000	10.2200	0.6063	0.4638	0.5183
	Rank H:24	8.6600	5.7600	10.1500	0.6006	0.4676	0.5182
	Rank H:32	8.9800	5.3700	9.8300	0.6293	0.4837	0.5384
	Rank H:40	8.9800	5.4700	9.8300	0.6230	0.4849	0.5376
Rank Q:24	Rank H:16	9.3900	5.8700	9.4200	0.6156	0.5069	0.5477
	Rank H:24	9.4000	6.1300	9.4100	0.6082	0.5076	0.5459
	Rank H:32	9.3900	6.1500	9.4200	0.6053	0.5055	0.5439
	Rank H:40	9.5100	6.1100	9.3000	0.6096	0.5123	0.5500
Rank Q:32	Rank H:16	9.6600	6.2600	9.1500	0.6075	0.5202	0.5526
	Rank H:24	10.1100	6.0100	8.7000	0.6257	0.5425	0.5747
	Rank H:32	9.8800	6.4000	8.9300	0.6072	0.5317	0.5599
	Rank H:40	9.9500	6.3800	8.8600	0.6092	0.5352	0.5627
Rank Q:40	Rank H:16	9.5400	6.3300	9.2700	0.6008	0.5121	0.5450
	Rank H:24	9.5700	6.6700	9.2400	0.5857	0.5134	0.5401
	Rank H:32	9.8400	6.3600	8.9700	0.6073	0.5280	0.5559
	Rank H:40	9.8200	6.9900	8.9900	0.5878	0.5231	0.5455

	Results at 3 seconds	True Positives	False Positives	False Negatives	Precision	Recall	F measure
Rank Q:16	Rank H:16	10.7800	3.4100	8.0300	0.7715	0.5816	0.6535
	Rank H:24	10.9300	3.4900	7.8800	0.7697	0.5891	0.6575
	Rank H:32	10.8600	3.4900	7.9500	0.7687	0.5845	0.6532
	Rank H:40	11.0100	3.4400	7.8000	0.7719	0.5940	0.6618
Rank Q:24	Rank H:16	11.5500	3.7100	7.2600	0.7648	0.6227	0.6762
	Rank H:24	11.5200	4.0100	7.2900	0.7502	0.6195	0.6690
	Rank H:32	11.5300	4.0100	7.2800	0.7463	0.6194	0.6682
	Rank H:40	11.7800	3.8400	7.0300	0.7629	0.6330	0.6831
Rank Q:32	Rank H:16	11.8100	4.1100	7.0000	0.7485	0.6342	0.6764
	Rank H:24	11.9200	4.2000	6.8900	0.7436	0.6390	0.6790
	Rank H:32	12.1100	4.1700	6.7000	0.7512	0.6496	0.6876
	Rank H:40	12.1000	4.2300	6.7100	0.7476	0.6491	0.6858
Rank Q:40	Rank H:16	11.8500	4.0200	6.9600	0.7523	0.6380	0.6803
	Rank H:24	11.9800	4.2600	6.8300	0.7421	0.6430	0.6795
	Rank H:32	11.9700	4.2300	6.8400	0.7452	0.6432	0.6792
	Rank H:40	12.1300	4.6800	6.6800	0.7312	0.6501	0.6783

	Results at 0.5 seconds	True Positives	False Positives	False Negatives	Precision	Recall	F measure
Rank Q:16	Rank H:16	8.5000	5.6200	10.3100	0.6042	0.4599	0.5138
	Rank H:24	8.6800	5.7400	10.1300	0.6037	0.4691	0.5196
	Rank H:32	8.8100	5.5400	10.0000	0.6141	0.4744	0.5267
	Rank H:40	8.9000	5.7100	9.9100	0.6086	0.4791	0.5278
Rank Q:24	Rank H:16	9.4600	5.8400	9.3500	0.6224	0.5109	0.5534
	Rank H:24	9.3200	5.9800	9.4900	0.6094	0.5027	0.5437
	Rank H:32	9.5500	6.1700	9.2600	0.6121	0.5154	0.5524
	Rank H:40	9.4900	6.1300	9.3200	0.6088	0.5119	0.5498
Rank Q:32	Rank H:16	9.6300	6.1200	9.1800	0.6099	0.5190	0.5540
	Rank H:24	9.8100	6.4500	9.0000	0.6090	0.5259	0.5565
	Rank H:32	9.9700	6.3300	8.8400	0.6077	0.5346	0.5617
	Rank H:40	9.7800	6.3800	9.0300	0.6030	0.5270	0.5559
Rank Q:40	Rank H:16	9.7400	6.3500	9.0700	0.6048	0.5214	0.5528
	Rank H:24	9.8700	6.5100	8.9400	0.6058	0.5269	0.5560
	Rank H:32	9.6700	6.5600	9.1400	0.5944	0.5215	0.5481
	Rank H:40	9.8700	6.8800	8.9400	0.5892	0.5313	0.5515

	Results at 3 seconds	True Positives	False Positives	False Negatives	Precision	Recall	F measure
Rank Q:16	Rank H:16	10.6800	3.4400	8.1300	0.7712	0.5767	0.6494
	Rank H:24	10.9600	3.4600	7.8500	0.7729	0.5918	0.6598
	Rank H:32	10.8700	3.4800	7.9400	0.7671	0.5843	0.6525
	Rank H:40	11.0300	3.5800	7.7800	0.7660	0.5947	0.6592
Rank Q:24	Rank H:16	11.6800	3.6200	7.1300	0.7742	0.6298	0.6848
	Rank H:24	11.6500	3.6500	7.1600	0.7655	0.6264	0.6797
	Rank H:32	11.6600	4.0600	7.1500	0.7499	0.6268	0.6736
	Rank H:40	11.8400	3.7800	6.9700	0.7666	0.6379	0.6881
Rank Q:32	Rank H:16	11.6900	4.0600	7.1200	0.7507	0.6274	0.6741
	Rank H:24	11.8400	4.4200	6.9700	0.7379	0.6344	0.6725
	Rank H:32	12.2300	4.0700	6.5800	0.7549	0.6538	0.6910
	Rank H:40	12.0200	4.1400	6.7900	0.7454	0.6460	0.6835
Rank Q:40	Rank H:16	12.0700	4.0200	6.7400	0.7564	0.6467	0.6877
	Rank H:24	12.0800	4.3000	6.7300	0.7454	0.6463	0.6828
	Rank H:32	11.9400	4.2900	6.8700	0.7384	0.6426	0.6778
	Rank H:40	12.1800	4.5700	6.6300	0.7343	0.6548	0.6823