Emotional Intelligence Track Results
1. Task Dimensions & Evaluation Methodology
The final score for this challenge track combines automated metrics with human evaluation to ensure objective and reliable results.
Evaluation Environment: The automated evaluation uses the Qwen/Qwen3-Omni-30B-A3B-Instruct model, deployed locally for scoring. For the detailed evaluation prompts, please refer to the challenge guidelines.
Task 1: Emotional Trajectory Detection
- Dimension 1: Accuracy_Completeness
- Dimension 2: Depth_Granularity
- Dimension 3: Added_Value
Task 2: Emotional Reasoning
- Dimension 1: Information_Integration
- Dimension 2: Insight_RootCause
- Dimension 3: Clarity_Logic
Task 3: Empathy Assessment
- Dimension 1: Textual_empathy_insight
- Dimension 2: Vocal_empathy_congruence
- Dimension 3: Audio_quality_naturalness
| Task Dimension | Evaluation Method | Evaluation Tool / Team |
|---|---|---|
| Task 1: Emotional Trajectory Detection | Automated Evaluation | Qwen/Qwen3-Omni-30B-A3B-Instruct |
| Task 2: Emotional Reasoning | Automated Evaluation | Qwen/Qwen3-Omni-30B-A3B-Instruct |
| Task 3: Empathy Assessment - Dimension 1 | Automated Evaluation | Qwen/Qwen3-Omni-30B-A3B-Instruct |
| Task 3: Empathy Assessment - Dimensions 2 & 3 | Human Evaluation | 20 Human Evaluators |
⚠️ Rules on Violations & Anomalies

To maintain fairness and validity, the following rules strictly apply:
- Language Mismatch: For both the Chinese and English test sets, if the language of a submitted response does not match the input audio, that sample will be automatically assigned the minimum score (1 point).
- Human Evaluation Team: The evaluation for Dimensions 2 and 3 of Task 3 is conducted by human evaluators organized by Beijing AIShell Co., Ltd. The composition and qualifications of the evaluators are detailed below:
- Number of Evaluators: 20 in total.
- Language Groups: 10 for the Chinese evaluation group, 10 for the English evaluation group.
- Experience & Education: All evaluators possess a university Bachelor’s degree or higher and have over six months of relevant data-annotation/subjective-evaluation experience.
- Language Proficiency: The Chinese evaluation group is composed of native Mandarin speakers. Evaluators assigned to the English test set possess fluent English proficiency.
- Demographics:
- Gender Distribution: 13 female, 7 male.
- Age Distribution: Average age 24.2 years (Range: 22–27 years).
2. Final Scores and Ranking
The Final Score for each team is calculated as the weighted sum of scores from the respective task dimensions:
$$\text{Final Score(zh/en)} = (\text{Task-1-Avg} \times 0.2) + (\text{Task-2-Avg} \times 0.2) + (\text{Task-3-D1} \times 0.1) + (\text{Task-3-D2} \times 0.25) + (\text{Task-3-D3} \times 0.25)$$

$$\text{Final Score} = (\text{Final Score(zh)} \times 0.5) + (\text{Final Score(en)} \times 0.5)$$
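As a minimal sketch, the weighting above can be expressed in code (function names are illustrative; note that the table scores below are rounded to two decimals, so recomputing from them may differ from the published Final Scores in the last digit):

```python
def track_score(task1_avg, task2_avg, t3_d1, t3_d2, t3_d3):
    """Per-language Final Score: weighted sum over the task dimensions."""
    return (task1_avg * 0.2 + task2_avg * 0.2
            + t3_d1 * 0.1 + t3_d2 * 0.25 + t3_d3 * 0.25)

def final_score(zh_score, en_score):
    """Overall Final Score: equal-weight average of the zh and en scores."""
    return 0.5 * zh_score + 0.5 * en_score

# A team scoring 5.0 on every dimension in both languages gets 5.0 overall,
# since the dimension weights (0.2 + 0.2 + 0.1 + 0.25 + 0.25) sum to 1.
print(final_score(track_score(5, 5, 5, 5, 5), track_score(5, 5, 5, 5, 5)))  # 5.0
```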
Chinese Test Set Results
| Team | Task-1-D1 | Task-1-D2 | Task-1-D3 | Task-1-Avg | Task-2-D1 | Task-2-D2 | Task-2-D3 | Task-2-Avg | Task-3-D1 | Task-3-D2 (Human Score) | Task-3-D3 (Human Score) | Final Score (zh) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BJTU_Unisound_team | 4.89 | 4.95 | 4.95 | 4.93 | 4.57 | 4.54 | 4.73 | 4.61 | 3.92 | 3.73 | 3.81 | 4.18 |
| HDTLAB | 4.28 | 3.95 | 4.44 | 4.22 | 4.25 | 4.42 | 5.00 | 4.56 | 3.74 | 3.00 | 3.37 | 3.72 |
| IUSpeech | 3.53 | 3.19 | 3.57 | 3.43 | 3.15 | 3.15 | 4.78 | 3.69 | 3.40 | 2.78 | 3.13 | 3.24 |
| Lingcon insight | 3.53 | 3.08 | 3.47 | 3.36 | 3.33 | 3.07 | 4.76 | 3.72 | 3.38 | 2.89 | 3.25 | 3.29 |
| SenseDialog | 2.40 | 2.41 | 2.41 | 2.41 | 5.00 | 5.00 | 5.00 | 5.00 | 4.96 | 3.62 | 3.76 | 3.82 |
| TeleAI | 4.95 | 5.00 | 5.00 | 4.98 | 4.91 | 4.96 | 5.00 | 4.96 | 3.87 | 3.69 | 3.87 | 4.26 |
| NJU-TencentHY | 4.83 | 4.96 | 4.96 | 4.92 | 5.00 | 5.00 | 5.00 | 5.00 | 4.10 | 3.50 | 3.74 | 4.20 |
| Baseline | 3.23 | 3.15 | 3.25 | 3.21 | 2.95 | 2.61 | 4.18 | 3.25 | 3.28 | 3.00 | 3.31 | 3.20 |
English Test Set Results
| Team | Task-1-D1 | Task-1-D2 | Task-1-D3 | Task-1-Avg | Task-2-D1 | Task-2-D2 | Task-2-D3 | Task-2-Avg | Task-3-D1 | Task-3-D2 (Human Score) | Task-3-D3 (Human Score) | Final Score (en) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BJTU_Unisound_team | 4.48 | 4.65 | 4.65 | 4.59 | 4.88 | 4.89 | 4.95 | 4.91 | 4.13 | 3.96 | 3.72 | 4.23 |
| HDTLAB | 4.41 | 4.07 | 4.76 | 4.41 | 4.22 | 4.40 | 5.00 | 4.54 | 3.74 | 3.74 | 3.59 | 4.00 |
| IUSpeech | 2.35 | 2.00 | 2.20 | 2.18 | 2.10 | 1.94 | 2.91 | 2.32 | 2.15 | 3.61 | 3.54 | 2.90 |
| Lingcon insight | 1.77 | 1.75 | 1.84 | 1.79 | 1.84 | 1.63 | 2.51 | 1.99 | 2.25 | 3.10 | 3.09 | 2.53 |
| SenseDialog | 4.92 | 4.95 | 4.95 | 4.94 | 4.84 | 4.84 | 4.84 | 4.84 | 4.91 | 3.87 | 3.56 | 4.30 |
| TeleAI | 4.93 | 4.97 | 4.97 | 4.96 | 5.00 | 5.00 | 5.00 | 5.00 | 3.84 | 3.89 | 3.69 | 4.27 |
| NJU-TencentHY | 4.69 | 4.99 | 4.99 | 4.89 | 5.00 | 5.00 | 5.00 | 5.00 | 4.18 | 3.92 | 3.61 | 4.28 |
| Baseline | 2.13 | 1.92 | 2.04 | 2.03 | 2.03 | 1.89 | 2.74 | 2.22 | 2.19 | 2.70 | 2.81 | 2.45 |
Final Scores and Ranking
| Team | Final Score | Ranking |
|---|---|---|
| TeleAI* | 4.27 | 1 |
| NJU-TencentHY* | 4.24 | 2 |
| BJTU_Unisound_team* | 4.21 | 3 |
| SenseDialog | 4.06 | 4 |
| HDTLAB | 3.86 | 5 |
| IUSpeech | 3.07 | 6 |
| Lingcon insight | 2.91 | 7 |
| Baseline | 2.82 | 8 |
*: invited to submit ICASSP 2-page papers.
3. ICASSP 2-Page Papers
According to the official grand challenge rules, the top 3 teams in this track (TeleAI, NJU-TencentHY, and BJTU_Unisound_team) are invited to submit papers (2 pages of main content plus an additional page for references) to the ICASSP 2026 Grand Challenge Track.
- Specific submission instructions will be emailed separately to the qualifying teams.