Automatic Karaoke
Could we create karaoke tracks automatically?
This project was an attempt at automatically creating karaoke tracks directly from a song’s audio signal.
The core idea was quite simple: use state-of-the-art (SOTA) deep learning models for three key steps:
- Perform source separation to isolate the vocals from the accompaniment (we used Demucs);
- Automatically transcribe the lyrics (we used Whisper);
- Align the transcription with the audio (this is where we failed, because we were never able to install forced-alignment algorithms).
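Had the alignment step worked, its output would still need to be written in a format that karaoke players understand. The sketch below shows one way to do that with the LRC lyrics format. It assumes the alignment produces Whisper-style segments: a dict with a `start` time in seconds, the `text`, and optionally a `words` list with per-word `start` times. The helper names `lrc_time` and `segments_to_lrc` are hypothetical, and the per-word `<mm:ss.xx>` tags follow the "enhanced LRC" convention, whose exact syntax varies between players.

```python
from typing import Dict, List


def lrc_time(seconds: float, brackets: str = "[]") -> str:
    """Format a time in seconds as mm:ss.xx wrapped in the given brackets."""
    centis = round(seconds * 100)           # work in hundredths of a second
    minutes, centis = divmod(centis, 6000)  # 6000 hundredths = 1 minute
    secs, centis = divmod(centis, 100)
    return f"{brackets[0]}{minutes:02d}:{secs:02d}.{centis:02d}{brackets[1]}"


def segments_to_lrc(segments: List[Dict]) -> str:
    """Build karaoke lyrics from aligned segments (hypothetical helper).

    Each segment is assumed to look like a Whisper segment with word
    timestamps: {"start": float, "text": str,
                 "words": [{"word": str, "start": float}, ...]}.
    If "words" is present, every word gets its own <mm:ss.xx> tag
    (enhanced LRC); otherwise only the line start time is written.
    """
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:  # skip empty segments, e.g. instrumental gaps
            continue
        line = lrc_time(seg["start"])  # line-level [mm:ss.xx] tag
        words = seg.get("words")
        if words:
            # One <mm:ss.xx> tag before each word so the player can highlight it.
            line += " ".join(
                lrc_time(w["start"], "<>") + " " + w["word"].strip() for w in words
            )
        else:
            line += text
        lines.append(line)
    return "\n".join(lines)


segments = [
    {"start": 12.0, "text": " Hello world",
     "words": [{"word": " Hello", "start": 12.0}, {"word": " world", "start": 12.5}]},
    {"start": 65.25, "text": " Second line"},
]
print(segments_to_lrc(segments))
```

Writing plain text keeps the output independent of any particular player or GUI; a future interface could read the same file to highlight each line, or each word, as the instrumental track plays.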
Overall, it was fun and interesting. I supervised the project, which Mathis Fajeau carried out as part of his coursework at IMT Atlantique. You can find the developments here.
I would like to revive this project in the future to:
- Build a working pipeline after all;
- Make a nice GUI and deploy it online.
For now, this remains on my TODO list.