Emotional Intelligence Track Results

1. Task Dimensions & Evaluation Methodology

The final score for this challenge track combines automated metrics with human evaluation to ensure the results are rigorous and objective.

Evaluation Environment: The automated evaluation uses the Qwen/Qwen3-Omni-30B-A3B-Instruct model, deployed entirely in a local environment for scoring. For the detailed evaluation prompts, please refer to the challenge guidelines.
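
For illustration, a locally deployed judge model of this kind is often queried through an OpenAI-compatible endpoint (e.g., one served by vLLM). The sketch below assumes such a setup; the endpoint address, the prompt wording, and the helper name are illustrative and do not describe the official evaluation pipeline.

```python
# Hedged sketch: scoring one response with a locally served judge model.
# Assumes an OpenAI-compatible server (e.g., vLLM) hosts the model at
# http://localhost:8000/v1; endpoint, prompt, and names are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def judge_score(dimension: str, transcript: str, response: str) -> int:
    """Ask the judge model for an integer score in [1, 5] on one dimension."""
    completion = client.chat.completions.create(
        model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
        messages=[
            {
                "role": "system",
                "content": f"Rate the response on {dimension} from 1 (worst) "
                           "to 5 (best). Reply with the number only.",
            },
            {
                "role": "user",
                "content": f"Dialogue transcript:\n{transcript}\n\n"
                           f"Submitted response:\n{response}",
            },
        ],
        temperature=0.0,  # deterministic scoring
    )
    return int(completion.choices[0].message.content.strip())
```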

Task 1: Emotional Trajectory Detection

  • Dimension 1: Accuracy_Completeness
  • Dimension 2: Depth_Granularity
  • Dimension 3: Added_Value

Task 2: Emotional Reasoning

  • Dimension 1: Information_Integration
  • Dimension 2: Insight_RootCause
  • Dimension 3: Clarity_Logic

Task 3: Empathy Assessment

  • Dimension 1: Textual_empathy_insight
  • Dimension 2: Vocal_empathy_congruence
  • Dimension 3: Audio_quality_naturalness

| Task Dimension | Evaluation Method | Evaluation Tool / Team |
| --- | --- | --- |
| Task 1: Emotional Trajectory Detection | Automated Evaluation | Qwen/Qwen3-Omni-30B-A3B-Instruct |
| Task 2: Emotional Reasoning | Automated Evaluation | Qwen/Qwen3-Omni-30B-A3B-Instruct |
| Task 3: Empathy Assessment - Dimension 1 | Automated Evaluation | Qwen/Qwen3-Omni-30B-A3B-Instruct |
| Task 3: Empathy Assessment - Dimensions 2 & 3 | Human Evaluation | 20 Human Evaluators |

⚠️ Rules on Violations & Anomalies

To maintain fairness and validity, the following rules strictly apply:

  1. Language Mismatch: For both the Chinese and English test sets, if the language of a submitted response does not match the language of the input audio, that sample is automatically assigned the minimum score (1 point); a sketch of this check appears after this list.
  2. Human Evaluation Team: The evaluation for Dimensions 2 and 3 of Task 3 is conducted by human evaluators organized by Beijing AIShell Co., Ltd. The composition and qualifications of the evaluators are detailed below:
  • Number of Evaluators: 20 in total.
  • Language Groups: 10 for the Chinese evaluation group, 10 for the English evaluation group.
  • Experience & Education: All evaluators possess a university Bachelor’s degree or higher and have over six months of relevant data-annotation/subjective-evaluation experience.
  • Language Proficiency: The Chinese evaluation group is composed of native Mandarin speakers; evaluators assigned to the English test set possess fluent English proficiency.
  • Demographics:
    • Gender Distribution: 13 female, 7 male.
    • Age Distribution: Average age 24.2 years (Range: 22–27 years).
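
For reference, the sketch below illustrates how the language-mismatch rule (rule 1) could be applied in code. It uses langdetect as one possible language identifier; the function and variable names are hypothetical, and the organizers' actual tooling is not specified here.

```python
# Hedged sketch of the language-mismatch rule (rule 1 above): a mismatched
# response language forces the sample to the minimum score of 1 point.
# langdetect is one possible detector; the official tooling is unspecified.
from langdetect import detect

MIN_SCORE = 1.0  # floor score assigned on a language mismatch

def apply_language_rule(response_text: str, expected_lang: str, score: float) -> float:
    """Keep `score` unless the detected response language disagrees with the
    test-set language; `expected_lang` is 'zh' or 'en'."""
    detected = detect(response_text)  # e.g., 'en' or 'zh-cn'
    return score if detected.startswith(expected_lang) else MIN_SCORE
```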

2. Final Scores and Ranking

The Final Score for each team is calculated as the weighted sum of scores from the respective task dimensions:

$$\text{Final Score(zh/en)} = (\text{Task-1-Avg} \times 0.2) + (\text{Task-2-Avg} \times 0.2) + (\text{Task-3-D1} \times 0.1) + (\text{Task-3-D2} \times 0.25) + (\text{Task-3-D3} \times 0.25)$$

$$\text{Final Score} = (\text{Final Score(zh)} \times 0.5) + (\text{Final Score(en)} \times 0.5)$$
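
As a sanity check on the weighting, the short Python sketch below reproduces the aggregation; the dictionary keys and function names are our own. It is applied here to TeleAI's Chinese-set scores from the tables that follow.

```python
# Minimal sketch of the final-score aggregation defined above.
# Weights come from the formula; key and function names are illustrative.
WEIGHTS = {
    "task1_avg": 0.20,
    "task2_avg": 0.20,
    "task3_d1": 0.10,
    "task3_d2": 0.25,
    "task3_d3": 0.25,
}

def final_score(scores: dict[str, float]) -> float:
    """Weighted sum over the five scored components for one language."""
    return sum(weight * scores[key] for key, weight in WEIGHTS.items())

def overall_score(zh: float, en: float) -> float:
    """Equal-weight average of the Chinese and English final scores."""
    return 0.5 * zh + 0.5 * en

# Example: TeleAI's Chinese-set scores from the results table below.
teleai_zh = {
    "task1_avg": 4.98,
    "task2_avg": 4.96,
    "task3_d1": 3.87,
    "task3_d2": 3.69,
    "task3_d3": 3.87,
}
print(final_score(teleai_zh))        # ~4.265; the table reports 4.26 after rounding
print(overall_score(4.26, 4.27))     # ~4.27, consistent with TeleAI's overall score
```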

Chinese Test Set Results

| Team | Task-1-D1 | Task-1-D2 | Task-1-D3 | Task-1-Avg | Task-2-D1 | Task-2-D2 | Task-2-D3 | Task-2-Avg | Task-3-D1 | Task-3-D2 (Human Score) | Task-3-D3 (Human Score) | Final Score (zh) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BJTU_Unisound_team | 4.89 | 4.95 | 4.95 | 4.93 | 4.57 | 4.54 | 4.73 | 4.61 | 3.92 | 3.73 | 3.81 | 4.18 |
| HDTLAB | 4.28 | 3.95 | 4.44 | 4.22 | 4.25 | 4.42 | 5.00 | 4.56 | 3.74 | 3.00 | 3.37 | 3.72 |
| IUSpeech | 3.53 | 3.19 | 3.57 | 3.43 | 3.15 | 3.15 | 4.78 | 3.69 | 3.40 | 2.78 | 3.13 | 3.24 |
| Lingcon insight | 3.53 | 3.08 | 3.47 | 3.36 | 3.33 | 3.07 | 4.76 | 3.72 | 3.38 | 2.89 | 3.25 | 3.29 |
| SenseDialog | 2.40 | 2.41 | 2.41 | 2.41 | 5.00 | 5.00 | 5.00 | 5.00 | 4.96 | 3.62 | 3.76 | 3.82 |
| TeleAI | 4.95 | 5.00 | 5.00 | 4.98 | 4.91 | 4.96 | 5.00 | 4.96 | 3.87 | 3.69 | 3.87 | 4.26 |
| NJU-TencentHY | 4.83 | 4.96 | 4.96 | 4.92 | 5.00 | 5.00 | 5.00 | 5.00 | 4.10 | 3.50 | 3.74 | 4.20 |
| Baseline | 3.23 | 3.15 | 3.25 | 3.21 | 2.95 | 2.61 | 4.18 | 3.25 | 3.28 | 3.00 | 3.31 | 3.20 |

English Test Set Results

| Team | Task-1-D1 | Task-1-D2 | Task-1-D3 | Task-1-Avg | Task-2-D1 | Task-2-D2 | Task-2-D3 | Task-2-Avg | Task-3-D1 | Task-3-D2 (Human Score) | Task-3-D3 (Human Score) | Final Score (en) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BJTU_Unisound_team | 4.48 | 4.65 | 4.65 | 4.59 | 4.88 | 4.89 | 4.95 | 4.91 | 4.13 | 3.96 | 3.72 | 4.23 |
| HDTLAB | 4.41 | 4.07 | 4.76 | 4.41 | 4.22 | 4.40 | 5.00 | 4.54 | 3.74 | 3.74 | 3.59 | 4.00 |
| IUSpeech | 2.35 | 2.00 | 2.20 | 2.18 | 2.10 | 1.94 | 2.91 | 2.32 | 2.15 | 3.61 | 3.54 | 2.90 |
| Lingcon insight | 1.77 | 1.75 | 1.84 | 1.79 | 1.84 | 1.63 | 2.51 | 1.99 | 2.25 | 3.10 | 3.09 | 2.53 |
| SenseDialog | 4.92 | 4.95 | 4.95 | 4.94 | 4.84 | 4.84 | 4.84 | 4.84 | 4.91 | 3.87 | 3.56 | 4.30 |
| TeleAI | 4.93 | 4.97 | 4.97 | 4.96 | 5.00 | 5.00 | 5.00 | 5.00 | 3.84 | 3.89 | 3.69 | 4.27 |
| NJU-TencentHY | 4.69 | 4.99 | 4.99 | 4.89 | 5.00 | 5.00 | 5.00 | 5.00 | 4.18 | 3.92 | 3.61 | 4.28 |
| Baseline | 2.13 | 1.92 | 2.04 | 2.03 | 2.03 | 1.89 | 2.74 | 2.22 | 2.19 | 2.70 | 2.81 | 2.45 |

Final Scores and Ranking

| Team | Final Score | Ranking |
| --- | --- | --- |
| TeleAI* | 4.27 | 1 |
| NJU-TencentHY* | 4.24 | 2 |
| BJTU_Unisound_team* | 4.21 | 3 |
| SenseDialog | 4.06 | 4 |
| HDTLAB | 3.86 | 5 |
| IUSpeech | 3.07 | 6 |
| Lingcon insight | 2.91 | 7 |
| Baseline | 2.82 | 8 |

*: invited to submit ICASSP 2-page papers.

3. ICASSP 2-Page Papers

According to the official Grand Challenge rules, the top 3 teams in this track (TeleAI, NJU-TencentHY, and BJTU_Unisound_team) are invited to submit papers (2 pages of main content plus an additional page for references) to the ICASSP 2026 Grand Challenge Track.

  • Specific submission instructions will be emailed separately to the qualifying teams.