Automatic Song Aesthetics Evaluation Challenge

A competition aimed at fostering the development of automatic models that can predict human aesthetic ratings of generated songs.

News and Updates

December 14, 2025: We have added a comprehensive "Detailed Analysis" section, including scoring methodologies and performance visualizations for both tracks.

December 4, 2025: We have updated the ranking results in the Leaderboard section below.

November 15, 2025: We have updated the formula and thresholds for calculating Top-Tier Accuracy; detailed information is available in the "Evaluation" section below.

November 10, 2025: We have sent the test set and submission instructions to all successfully registered teams via email. These are also available on the respective track pages. Kindly note that the final submission deadline is 23:59, November 20, 2025 (AoE).

Call for Participation

With the rapid growth of generative music models, such as song generation (composing melodies, lyrics, harmonies, and vocals), we are entering an exciting new era of personalized music, virtual artists, and multimedia content creation. Despite these advancements, the evaluation of the aesthetic quality of generated music remains a challenge. Traditional metrics like pitch accuracy and signal clarity fall short of capturing the complex emotional and artistic dimensions of music that matter most to listeners. This challenge aims to create a benchmark for assessing the aesthetic quality of automatically generated songs. Participants will develop models that predict human ratings of songs based on musicality, emotional engagement, vocal expressiveness, and overall enjoyment.

Join us to push the boundaries of song aesthetics evaluation and contribute to the future of generative music!

Challenge Overview

The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge is designed to foster the development of models that can predict human aesthetic ratings of full-length generated songs. The focus is on how well generated songs align with human perceptions of musicality, emotional depth, and vocal expressiveness. Participants will be tasked with developing models that predict subjective ratings from audio inputs.

Objective: Create models that can predict human ratings of aesthetic quality in songs, including dimensions like overall musicality, emotional engagement, and vocal expressiveness.

Track Settings

The competition consists of two tracks:

Track 1: Overall Musicality Score Prediction

Participants must predict a single holistic aesthetic score for each song, representing an overall musical impression of the song’s artistic quality.

Track 2: Fine-Grained Aesthetic Dimension Prediction

Participants must predict five specific aesthetic dimensions (Coherence, Memorability, Naturalness, Clarity, and Musicality) for each song.

Evaluation

Each track will use the following correlation-based metrics:

  • Linear Correlation Coefficient (LCC)
  • Spearman’s Rank Correlation Coefficient (SRCC)
  • Kendall’s Rank Correlation Coefficient (KATU)
  • Top-Tier Accuracy (TTA)

All metrics except TTA are computed at both the system level and the utterance level; TTA is computed at the utterance level only.
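
For reference, the three correlation metrics can be computed with standard SciPy routines. The sketch below is a minimal illustration, not the official scoring code: it assumes ground-truth and predicted scores arrive as dictionaries keyed by utterance ID, and the system_of helper is a hypothetical stand-in for however utterances are grouped into systems in the released data.

```python
# Illustrative computation of the correlation metrics at both levels.
# `truth` and `pred` map utterance IDs to scores; how an utterance maps
# to its generating system is an assumption (see the placeholder below).
from collections import defaultdict

import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr


def system_of(utt_id: str) -> str:
    """Hypothetical helper: derive a system name from an utterance ID."""
    return utt_id.split("_")[0]  # assumption, not the official ID scheme


def correlation_metrics(truth: dict, pred: dict) -> dict:
    utts = sorted(truth)
    t = np.array([truth[u] for u in utts])
    p = np.array([pred[u] for u in utts])

    # Utterance level: correlate the raw per-song scores.
    utt_level = {
        "LCC": pearsonr(t, p)[0],
        "SRCC": spearmanr(t, p)[0],
        "KATU": kendalltau(t, p)[0],
    }

    # System level: average each system's scores, then correlate the means.
    sys_truth, sys_pred = defaultdict(list), defaultdict(list)
    for u in utts:
        sys_truth[system_of(u)].append(truth[u])
        sys_pred[system_of(u)].append(pred[u])
    systems = sorted(sys_truth)
    ts = np.array([np.mean(sys_truth[s]) for s in systems])
    ps = np.array([np.mean(sys_pred[s]) for s in systems])
    sys_level = {
        "LCC": pearsonr(ts, ps)[0],
        "SRCC": spearmanr(ts, ps)[0],
        "KATU": kendalltau(ts, ps)[0],
    }
    return {"utt": utt_level, "sys": sys_level}
```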

Top-Tier Accuracy Calculation Rules & Thresholds

Quantification Method: Top-Tier Accuracy is uniformly measured using the F1 score; a minimal computation sketch is given after the threshold list below.

  • F1 Score Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)

  • Brief Definition: Precision = True Positives / (True Positives + False Positives); Recall = True Positives / (True Positives + False Negatives)

Top-Tier Song Thresholds:

  • Track 1 (Overall Musicality): Score ≥ 4.0
  • Track 2 (Subdivided Aesthetic Dimensions):
    • Coherence ≥ 4.0
    • Memorability ≥ 3.75
    • Naturalness ≥ 4.0
    • Clarity ≥ 3.75
    • Musicality ≥ 4.0
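
Given the thresholds above, Top-Tier Accuracy reduces to binarizing both reference and predicted scores at the threshold and computing the F1 score. A minimal sketch in plain Python (scikit-learn's f1_score would serve equally well):

```python
# Minimal sketch of Top-Tier Accuracy: F1 over thresholded scores.
def top_tier_accuracy(truth, pred, threshold=4.0):
    """Label scores >= threshold as top-tier, then compute F1."""
    t = [s >= threshold for s in truth]
    p = [s >= threshold for s in pred]
    tp = sum(a and b for a, b in zip(t, p))
    fp = sum(b and not a for a, b in zip(t, p))
    fn = sum(a and not b for a, b in zip(t, p))
    if tp == 0:  # avoid division by zero when there are no true positives
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Example: Memorability in Track 2 uses the 3.75 threshold.
print(top_tier_accuracy([4.1, 3.6, 3.9], [4.0, 3.7, 3.8], threshold=3.75))
```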

Submission Instructions

Final results must be submitted via the Google Form: ICASSP2026 ASAE Challenge Final Submission Form.

Prediction files

Each participating team must submit prediction files in SCP format: two files for Track 1 (one per test set) and one file for Track 2.

  • Track 1 submission format:

    Each line should contain: utt score

  • Track 2 submission format:

    Each line should contain: utt score1 score2 score3 score4 score5

Please ensure that:

  • The utt IDs exactly match those in the provided test sets.
  • All predicted scores are in a valid numeric format.
  • The files are named track1_set1_pred.scp, track1_set2_pred.scp, and track2_pred.scp, respectively.
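
A quick self-check can catch most formatting problems before submission. The sketch below is illustrative: test_ids.txt is a hypothetical file with one expected utt ID per line (in practice each test set has its own ID list), and the checks mirror the three requirements above.

```python
# Illustrative pre-submission check for an SCP prediction file.
def check_scp(scp_path: str, id_path: str, n_scores: int) -> None:
    with open(id_path) as f:  # id_path is a hypothetical ID-list file
        expected = {line.strip() for line in f if line.strip()}
    seen = set()
    with open(scp_path) as f:
        for lineno, line in enumerate(f, 1):
            fields = line.split()
            assert len(fields) == 1 + n_scores, f"line {lineno}: wrong column count"
            utt, scores = fields[0], fields[1:]
            assert utt in expected, f"line {lineno}: unknown utt ID {utt!r}"
            assert utt not in seen, f"line {lineno}: duplicate utt ID {utt!r}"
            seen.add(utt)
            for s in scores:
                float(s)  # raises ValueError on non-numeric scores
    missing = expected - seen
    assert not missing, f"{len(missing)} utt IDs missing from {scp_path}"


check_scp("track1_set1_pred.scp", "test_ids.txt", n_scores=1)  # Track 1: utt score
check_scp("track2_pred.scp", "test_ids.txt", n_scores=5)       # Track 2: utt + 5 scores
```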

System description (2 pages)

  • Each team must submit a two-page system description (excluding references) summarizing their method, model architecture, training strategy, and any relevant implementation details.

  • The format should follow the official ICASSP paper template (available on the ICASSP 2026 website).

Submission of both the prediction files and the system description is required; missing either will result in the team's removal from the challenge ranking.

Baseline System

The competition provides a baseline system built upon SongEval. The baseline toolkit uses an aesthetic evaluation model trained on SongEval, enabling automatic scoring of generated songs across five perceptual dimensions that closely align with professional musicians’ judgments.

The validation IDs used by the baseline are listed in the val_ids.txt file.

This baseline serves as a reproducible and extensible starting point, helping participants better benchmark their systems and ensuring fair comparison across different approaches.

Leaderboard

Track 1: Overall Musicality Score Prediction

Rank Team Name Score
1🏆 Hachimi 0.575
2🏆 BAL-RAE 0.556
3🏆 qualifier 0.529
4 HyperCritic 0.518
5 Baseline 0.510
6 yyyf 0.507
6 LoveAImusic 0.507
8 LeVo 0.503
9 Ah3Dui 0.497
9 Niuguangshuo 0.497
11 nbu 0.496
12 mi-whu 0.476
13 BHE-AIM 0.469
14 Harmonics 0.438
15 Team_Mingda 0.429
16 MAIL 0.426
17 IITJVision 0.425
18 PIRL 0.424
19 DYME 0.388

Track 2: Fine-Grained Aesthetic Dimension Prediction

Rank Team Name Score
1🏆 LeVo 0.655
2🏆 HyperCritic 0.604
3 Team Resonance 0.598
4 mi-whu 0.596
5 BAL-RAE 0.589
6 Baseline 0.574
7 yyyf 0.573
8 LoveAImusic 0.568
9 MAIL 0.567
10 Niuguangshuo 0.563
11 Ah3Dui 0.553
11 PIRL 0.553
13 Harmonics 0.525
14 DYME 0.501
15 nk_hlt_group 0.499
16 Hachimi 0.493
17 nbu 0.484

Note: 🏆 indicates teams invited to submit ICASSP 2-page papers.

Detailed Analysis

Track 1 Scoring Methodology

The final score for Track 1 is derived from two test sets (Set 1 and Set 2). Each set is evaluated at both the utterance (UTT) and system (SYS) levels.

  • Metric Calculation per Set

For each dataset (Set 1 and Set 2), we first calculate the composite metrics for each team. Since the TTA metric exists only at the UTT level, it is used directly. For the other metrics (LCC, SRCC, KATU), we average the scores from the SYS and UTT levels:

$$\begin{aligned} \text{LCC}_{avg} &= \frac{\text{LCC}_{sys} + \text{LCC}_{utt}}{2} \\ \text{SRCC}_{avg} &= \frac{\text{SRCC}_{sys} + \text{SRCC}_{utt}}{2} \\ \text{KATU}_{avg} &= \frac{\text{KATU}_{sys} + \text{KATU}_{utt}}{2} \end{aligned}$$
  • Score Calculation per Set

The score for a specific set is the average of these four metrics:

$$Score_{set} = \frac{\text{LCC}_{avg} + \text{SRCC}_{avg} + \text{KATU}_{avg} + \text{TTA}_{utt}}{4}$$
  • Final Track 1 Score

The final score for Track 1 is a weighted average of the scores from Set 1 (Easy) and Set 2 (Hard), with a 2:8 ratio:

$$\text{Final Score (Track 1)} = 0.2 \times Score_{set1} + 0.8 \times Score_{set2}$$
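
The aggregation above can be expressed compactly in code. The sketch below assumes each set's metrics are collected into a dict with "sys" and "utt" sub-dicts (TTA present only under "utt"); it illustrates the formulas, not the official scoring script.

```python
# Illustrative Track 1 aggregation, following the formulas above.
def score_per_set(m: dict) -> float:
    """m = {"sys": {"LCC", "SRCC", "KATU"}, "utt": {..., "TTA"}} -> set score."""
    avgs = [(m["sys"][k] + m["utt"][k]) / 2 for k in ("LCC", "SRCC", "KATU")]
    return (sum(avgs) + m["utt"]["TTA"]) / 4


def track1_final(set1: dict, set2: dict) -> float:
    """Weighted 2:8 average of Set 1 (Easy) and Set 2 (Hard)."""
    return 0.2 * score_per_set(set1) + 0.8 * score_per_set(set2)
```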

Track 2 Scoring Methodology

Track 2 evaluation involves five dimensions: Coherence, Naturalness, Memorability, Clarity, and Musicality. The calculation proceeds in three steps:

  • Average Calculation per Metric across Dimensions

For each dimension (Coherence, Naturalness, Memorability, Clarity, Musicality), we first calculate the average of LCC, SRCC, and KATU at both UTT and SYS levels:

$$\begin{aligned} \text{LCC}_{avg\_dim} &= \frac{\text{LCC}_{sys\_dim} + \text{LCC}_{utt\_dim}}{2} \\ \text{SRCC}_{avg\_dim} &= \frac{\text{SRCC}_{sys\_dim} + \text{SRCC}_{utt\_dim}}{2} \\ \text{KATU}_{avg\_dim} &= \frac{\text{KATU}_{sys\_dim} + \text{KATU}_{utt\_dim}}{2} \end{aligned}$$

TTA is used directly from the UTT level, as it is only available there.

  • Calculate Final Score for Each Dimension

For each dimension, we then calculate the average of LCC, SRCC, KATU, and TTA:

$$\text{Final Score}_{dim} = \frac{\text{LCC}_{avg\_dim} + \text{SRCC}_{avg\_dim} + \text{KATU}_{avg\_dim} + \text{TTA}_{utt\_dim}}{4}$$
  • Overall Track 2 Final Score

Finally, we calculate the overall Track 2 score by averaging the final scores of all five dimensions:

$$\text{Final Score (Track 2)} = \frac{1}{5}\sum_{d=1}^{5} \text{Final Score}_{d}$$
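
In code, Track 2 reuses the same per-set aggregation for every dimension and then takes a plain average, as in the minimal sketch below (same assumed metric layout as the Track 1 sketch):

```python
# Illustrative Track 2 aggregation over the five aesthetic dimensions.
DIMENSIONS = ("Coherence", "Naturalness", "Memorability", "Clarity", "Musicality")


def dimension_score(m: dict) -> float:
    """Average the SYS/UTT-averaged LCC, SRCC, KATU plus utterance-level TTA."""
    avgs = [(m["sys"][k] + m["utt"][k]) / 2 for k in ("LCC", "SRCC", "KATU")]
    return (sum(avgs) + m["utt"]["TTA"]) / 4


def track2_final(metrics_by_dim: dict) -> float:
    """metrics_by_dim maps each dimension name to its metric dict."""
    return sum(dimension_score(metrics_by_dim[d]) for d in DIMENSIONS) / len(DIMENSIONS)
```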

Note: The LCC, SRCC, and KATU scores reported in this detailed analysis are averages of the utterance-level (UTT) and system-level (SYS) evaluations. For more detailed results, please refer to the results folder.

Timeline

  • September 01, 2025: Registration opens
  • September 10, 2025: Train set and baseline system release
  • November 10, 2025: Test set release
  • November 20, 2025: Results and system description submission deadline
  • January 07, 2026 (extended from December 07, 2025): ICASSP2026 GC paper submission deadline (invited only)
  • January 21, 2026 (extended from January 11, 2026): ICASSP2026 GC paper acceptance notification
  • January 28, 2026 (extended from January 18, 2026): ICASSP2026 GC camera-ready paper submission deadline
  • May 4-8, 2026: ICASSP 2026 in Barcelona, Spain

Organizers

The challenge is organized by a distinguished team of researchers:

  • Lei Xie, Northwestern Polytechnical University (China)
  • Hao Liu, Shanghai Conservatory of Music (China)
  • Wenwu Wang, University of Surrey (United Kingdom)
  • Wei Xue, Hong Kong University of Science and Technology (Hong Kong, China)
  • Shuai Wang, Nanjing University (China)
  • Yui Sudo, SB Intuition (Japan)
  • Ting Dang, University of Melbourne (Australia)
  • Haohe Liu, Meta (United Kingdom)
  • Hexin Liu, Nanyang Technological University (Singapore)
  • Xiangyu Zhang, University of New South Wales (Australia)
  • Jingyao Wu, Massachusetts Institute of Technology (United States of America)
  • Hao Shi, SB Intuition (Japan)
  • Jixun Yao, Northwestern Polytechnical University (China)
  • Huixin Xue, Shanghai Conservatory of Music (China)
  • Ziqian Ning, Northwestern Polytechnical University (China)
  • Ruibin Yuan, Hong Kong University of Science and Technology (Hong Kong, China)
  • Guobin Ma, Northwestern Polytechnical University (China)
  • Yuxuan Xia, Northwestern Polytechnical University (China)

Contact

For any inquiries, please contact: yaojx@mail.nwpu.edu.cn

You are also welcome to join our WeChat group.