A data-driven approach to pitch spelling and key signature estimation
By F. Foscarin

F. Foscarin presents his work on pitch spelling and key signature estimation.


We present a data-driven approach to the joint estimation of pitch spelling and key signature from a MIDI file. Both pieces of information are fundamental to producing a fully-fledged musical score and are used in other MIR tasks such as harmonic analysis, section identification and search in a digital music library. We employ a synchronous sequence-to-sequence model, explore different architectures and techniques such as approximated self-attention and conditional random fields, and give a musicological interpretation of the techniques used. The model requires only a few input features that are straightforward to extract from all kinds of MIDI files or other symbolic encodings, and it is robust against input noise. It can be used with pretrained parameters or, with a proposed data-augmentation procedure, trained from a relatively small dataset, which makes it easy to employ in a music processing pipeline. Compared to other state-of-the-art approaches, it generalizes across different music styles while still obtaining high accuracy. Cross-validated on a reference dataset, it makes around half as many pitch spelling errors as those approaches and obtains comparable results on the key signature estimation task.
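
To illustrate why pitch spelling is an estimation problem at all, the following minimal sketch lists the note names that a single MIDI pitch number could stand for. It is not the model described above; the function name `candidate_spellings` and its `max_accidentals` parameter are hypothetical and chosen only for this example.

```python
# Illustration only: enumerate enharmonic spellings of a MIDI pitch.
# This is NOT the data-driven model from the abstract; all names are hypothetical.

# Diatonic letter names and their pitch classes (semitones above C).
STEP_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Accidental symbols indexed by chromatic alteration in semitones.
ACCIDENTALS = {-2: "bb", -1: "b", 0: "", 1: "#", 2: "##"}


def candidate_spellings(midi_pitch, max_accidentals=1):
    """Return the note names (without octave) that match a MIDI pitch.

    max_accidentals limits how far a letter may be altered:
    1 allows single sharps/flats, 2 also allows double sharps/flats.
    """
    if not 0 <= midi_pitch <= 127:
        raise ValueError("MIDI pitch must be in the range 0..127")
    if max_accidentals not in (0, 1, 2):
        raise ValueError("max_accidentals must be 0, 1 or 2")

    pitch_class = midi_pitch % 12
    spellings = []
    for step, base in STEP_TO_PC.items():
        for alter in range(-max_accidentals, max_accidentals + 1):
            if (base + alter) % 12 == pitch_class:
                spellings.append(step + ACCIDENTALS[alter])
    return spellings


if __name__ == "__main__":
    # MIDI 61 is ambiguous between C sharp and D flat.
    print(candidate_spellings(61))
```

A MIDI file stores only the number 61, so choosing between the candidates requires musical context; that choice is the part the approach presented here estimates, jointly with the key signature.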