Dataset

The official dataset of the challenge

To support model development, we have curated the SongEval dataset, an open-source benchmark containing 2,399 full-length generated songs (approximately 140.3 hours of audio) spanning a variety of genres and languages. Each song is annotated along five aesthetic dimensions:

  • Overall Coherence
  • Memorability
  • Naturalness of Vocal Breathing and Phrasing
  • Clarity of Song Structure
  • Overall Musicality
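
As a rough illustration of how one song's annotations along these five dimensions might be held in code, the sketch below uses a small Python dataclass. The class name, field names, and score values are hypothetical and chosen only for this example; they are not taken from the SongEval release, whose actual file layout and rating scale should be checked against the dataset itself.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass(frozen=True)
class AestheticScores:
    """Hypothetical container for one song's five annotated dimensions."""

    overall_coherence: float
    memorability: float
    vocal_breathing_and_phrasing: float
    song_structure_clarity: float
    overall_musicality: float

    def average(self) -> float:
        # Unweighted mean over the five dimensions (an illustrative
        # aggregate, not a metric defined by the dataset).
        return mean(getattr(self, f.name) for f in fields(self))


# Made-up values, for demonstration only.
example = AestheticScores(3.0, 2.5, 4.0, 3.5, 3.0)
print(example.average())
```

Keeping the dimensions as separate named fields, rather than collapsing them into a single number up front, lets a model be trained or evaluated per dimension as well as on an aggregate.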

The dataset covers a wide range of genres, including Pop, Rock, Jazz, Hip-hop, and Classical, among others. It includes songs in both English and Mandarin Chinese, making it a diverse resource for training models. For more details, please refer to the dataset paper.