Journal article

Evaluating Musical Predictions with Multiple Versions of a Work

Abstract

The widespread use of music content analysis tools illustrates the need for diverse evaluation techniques to ensure their accuracy, robustness, reliability, and quality. This is particularly challenging for features that predict musical properties whose values cannot be independently verified. Here we propose a new method for evaluating such tools that does not rely on a priori knowledge of correct outcomes (i.e., “ground truth”). Instead, it examines many versions of a single composition, comparing predictions of musical properties expected to be relatively stable across recordings (mode, number of note events) with those expected to vary (tempo, timbre). This makes it possible to assess the efficacy of feature extraction even when correct answers are unknown (or unknowable). As a proof of concept, we applied this approach to 17 commercially available recordings of J. S. Bach's 24 preludes from the Well-Tempered Clavier (Book 1) using three popular music content analysis tools, comparing how feature predictions varied across the 17 versions of each prelude (17 × 24 = 408 data points for each feature extracted). We find significant differences between tools in the variation of mode predictions, as well as more variation in predictions of mode than in predictions of the number of note events. This offers a way to compare predictions (whether between features or between tools) that is particularly valuable in the absence of ground truth. Other potential applications include parameter optimization, algorithm selection, and benchmarking procedures.
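The core idea, judging a tool by how much its predictions for an expected-stable property vary across versions of the same work, can be sketched as below. This is a minimal illustration, not the authors' analysis: the abstract does not specify the variation statistic used, so the measures here (fraction of versions disagreeing with the most common mode prediction, and the coefficient of variation for note-event counts) are assumptions, as are the function names and the toy data, which are invented for illustration and are not results from the study. Note that the two measures sit on different scales, so comparing them directly is only a rough heuristic.

```python
from collections import Counter
from statistics import mean, pstdev


def mode_disagreement(predictions):
    """Assumed measure for a categorical feature: fraction of versions whose
    predicted mode differs from the most common prediction for the same work.
    0.0 means every version received the same label."""
    counts = Counter(predictions)
    majority_count = counts.most_common(1)[0][1]
    return 1 - majority_count / len(predictions)


def coefficient_of_variation(values):
    """Assumed measure for a numeric feature: population standard deviation
    divided by the mean. Assumes a positive mean (e.g. note-event counts)."""
    return pstdev(values) / mean(values)


def mean_across_works(per_work_predictions, measure):
    """Average a per-work variation measure over all works.
    per_work_predictions maps a work ID to one prediction per version."""
    return mean(measure(preds) for preds in per_work_predictions.values())


# Hypothetical toy data: 2 works x 4 versions
# (the study used 24 preludes x 17 recordings = 408 predictions per feature).
tool_a_mode = {
    "prelude_1": ["major", "major", "major", "major"],
    "prelude_2": ["minor", "minor", "major", "minor"],
}
tool_b_mode = {
    "prelude_1": ["major", "minor", "major", "minor"],
    "prelude_2": ["minor", "minor", "minor", "minor"],
}
note_events = {
    "prelude_1": [600, 610, 590, 600],
    "prelude_2": [700, 700, 700, 700],
}

if __name__ == "__main__":
    # Lower values mean more consistent predictions across versions.
    print(mean_across_works(tool_a_mode, mode_disagreement))  # 0.125
    print(mean_across_works(tool_b_mode, mode_disagreement))  # 0.25
    print(mean_across_works(note_events, coefficient_of_variation))
```

Because mode is expected to stay constant across recordings of the same prelude, a tool whose mode predictions disagree more across versions (tool_b in this toy example) is flagged as less reliable, without any need to know the correct mode.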

Authors

Swierczek K; Schutz M

Journal

Music & Science, Vol. 8

Publisher

SAGE Publications

Publication Date

November 1, 2025

DOI

10.1177/20592043251384138

ISSN

2059-2043
