Music genre classification using MIDI and audio features

Zehra Cataltepe*, Yusuf Yaslan, Abdullah Sonmez

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

65 Citations (Scopus)


We report our findings on using MIDI files and audio features derived from MIDI, separately and in combination, for MIDI music genre classification. We use McKay and Fujinaga's 3-root and 9-leaf genre data set. To compute distances between MIDI pieces, we use normalized compression distance (NCD). NCD uses the compressed length of a string as an approximation to its Kolmogorov complexity and has previously been used for music genre and composer clustering. We also convert the MIDI pieces to audio and use the resulting audio features to train different classifiers. Classifiers that use MIDI alone or audio from MIDI alone achieve much lower accuracies than those reported by McKay and Fujinaga, who used a number of domain-based MIDI features rather than NCD for their classification. Combining the MIDI and audio-from-MIDI classifiers improves accuracy, which approaches but remains below that of McKay and Fujinaga. The best root genre accuracies achieved using MIDI, audio, and their combination are 0.75, 0.86, and 0.93, respectively, compared to 0.98 for McKay and Fujinaga. Successful classifier combination requires diversity among the base classifiers. We achieve diversity by using a certain number of seconds of the MIDI file, different sample rates and sizes for the audio file, and different classification algorithms.
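As an illustration of the distance measure named in the abstract, the following is a minimal sketch of NCD using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The abstract does not say which compressor the authors used; `zlib` is an assumption here, and the function names `compressed_len` and `ncd` are hypothetical.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length of the zlib-compressed input, used as a rough stand-in
    for Kolmogorov complexity (zlib is an assumption, not the paper's choice)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share very little.
    """
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical stand-ins for the raw bytes of three MIDI files.
    piece_a = b"C4 E4 G4 C5 " * 200
    piece_b = b"C4 E4 G4 C5 " * 180 + b"D4 F4 A4 D5 " * 20
    piece_c = bytes(range(256)) * 10
    print("a vs b:", ncd(piece_a, piece_b))
    print("a vs c:", ncd(piece_a, piece_c))
```

In practice the inputs would be the byte contents of MIDI files (for example, read with `open(path, "rb").read()`), and the resulting pairwise distances could feed a distance-based classifier such as nearest neighbors. Because real compressors have finite windows and fixed overheads, the values are only approximations and can fall slightly outside the range 0 to 1.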

Original language: English
Article number: 36409
Journal: Eurasip Journal on Advances in Signal Processing
Publication status: Published - 2007

