Human-like Spoken Dialogue Systems Challenge

The challenge aims to promote systematic, real-world evaluation of next-generation dialogue systems and advance the field toward truly human-like interaction.

News and Updates

December 13, 2025: We are pleased to announce the challenge results: Track 1 Results, Track 2 Results

November 17, 2025: Upon review, anomalies were identified in a small number of dialogue data samples. To ensure the accuracy and reliability of the evaluation results, the following items will be excluded from the final evaluation:

  • Track 1: task2_4_en_0002_0014 and items with dialogue_id prefixes task2_3_zh_0015, task2_4_zh_0015, and task2_5_zh_0015.
  • Track 2: test-003126.wav, clean-000899.wav

November 17, 2025: The test set and submission rules have been released. Please refer to the page for each track for detailed information.

November 11, 2025: Due to the delayed release of the training data, the release of the test data was also postponed. We expect to release the test set around November 17. Please stay tuned.

October 29, 2025: We have released the baseline model; you can download the model files from the pages for the two tracks.

October 10, 2025: We have sent training and dev dataset access instructions and other relevant information for this challenge via email to all successfully registered teams.

If you have not received the email, please:

  • Check the email address you used for registration and confirm that it is an institutional email rather than a personal one (e.g., QQ, Gmail, or 163). We only sent emails to valid institutional addresses.
  • Check your email’s spam or subscription folder.

Challenge Call

Have you been following the recent buzz around the impressive performance of next-generation voice dialogue models like GPT-4o, Doubao, and the newly released GPT-Realtime? They are not only lightning-fast and expressive but also enable seamless multimodal interactions, making conversations feel remarkably human.

From the traditional “clunky AI” to today’s “AI assistant,” the evolution of voice dialogue systems has been nothing short of astonishing. But just how far are we from achieving truly “natural human-machine dialogue”? While current voice models excel in technical metrics, they still lack a certain “human touch.” They may recognize single emotions like “happiness” or “sadness,” but struggle to truly understand the complexity of our emotional changes or empathize with our situations. They may engage in fluent one-on-one exchanges, yet become flustered in real-world interaction scenarios such as interruptions, overlapping speech, or group chats. This is the “uncanny valley” that current voice dialogue systems struggle to cross.

To break through this bottleneck and advance technology toward truly “human-like” interaction, a coalition of institutions—including Northwestern Polytechnical University, Nanjing University, The Chinese University of Hong Kong, Huawei Technologies Co., Ltd., and AISHELL—has jointly launched the HumDial (Human-like Spoken Dialogue Systems) Challenge! We believe a truly intelligent dialogue system must not only “understand clearly, reason logically, and express coherently” but also possess the ability to interact seamlessly with humans in real, emotionally complex environments.

The inaugural HumDial2026 Challenge will be held at ICASSP 2026, a premier conference for speech research, and will focus on two core challenges:

  • Emotional Intelligence: Moving beyond simplistic emotion labeling, this track will test a model’s ability to accurately understand context-dependent emotions, provide empathetic responses, conduct in-depth reasoning, and dynamically track emotional shifts—empowering AI to truly understand and connect with users.
  • Full-Duplex Interaction: Breaking free from rigid turn-based exchanges, this track will evaluate a system’s ability to handle interruptions, overlapping speech, real-time feedback, and natural conversational rhythms, helping AI learn to communicate more naturally.

We will not only introduce brand-new evaluation dimensions but also release exclusive, finely annotated datasets of real-world scenarios for each track. If you’re passionate about “human-like” dialogue systems and eager to shape the future of next-generation voice interaction, we welcome you to follow and register for the challenge! Let’s work together to turn AI into a warm, emotionally aware communication partner.

Registration

Teams can register via the Google form: https://docs.google.com/forms/d/e/1FAIpQLSdRrlfqrhh8QhOxtKMr03AxnnX14md_EwFuIuMt-Hf4fhhARA/viewform?usp=header

Reminder! Please use your institutional or corporate email address to register, and avoid using personal email accounts.

Timeline

  • August 20, 2025: Registration opens
  • September 29, 2025: Release of training set, validation set, and baseline system
  • November 10, 2025: Release of test set
  • November 25, 2025: Submission deadline
  • January 07, 2026 (updated from December 07, 2025): Deadline for submitting 2-page papers to ICASSP 2026 (invited teams only)
  • January 21, 2026 (updated from January 11, 2026): Notification of acceptance for 2-page ICASSP 2026 papers
  • January 28, 2026 (updated from January 18, 2026): Submission of camera-ready papers
  • May 4–8, 2026: ICASSP 2026 Conference, Barcelona, Spain

Official Competition Rules

1. Resource Usage Policy

To ensure the fairness, integrity, and transparency of the competition, all participating teams must strictly adhere to the following regulations.

1.1 Resource Definitions

  • Internal Resources: Official datasets, baseline models, and accompanying documentation directly provided by the organizers.
  • External Resources: Any resources not provided by the organizers, including but not limited to external data, pre-trained models, open-source libraries, and third-party API services.

1.2 External Resource Usage

External Data

  • Must be publicly available datasets. This includes any data that researchers or groups can obtain through public channels (e.g., official websites, academic data platforms, open-source communities) via direct download or a standard application process.
  • The use of any private, non-public, or access-restricted proprietary datasets is strictly prohibited.

External Pre-trained Models

  • Must be publicly available, open-source models.
  • The use of any open-source pre-trained models available through public channels (such as Hugging Face, GitHub, or official model repositories) is permitted. Submissions must be accompanied by clear version information for all external models used.

1.3 Resource Declaration Requirement

In the final technical report, participating teams must provide a clear and complete list of all resources used (both internal and external), detailing how each was applied.

2. Competition Dataset Usage Policy

Train Set: Participants may use the official training subset provided.

  • Standard data augmentation techniques (e.g., adding noise, pitch shifting, speed variation) on the official training set are permitted; see the sketch after this list.
  • Supplementary training with external public datasets is allowed, provided they are in full compliance with Section 1.2.
  • If generating synthetic data (e.g., using Text-to-Speech), the underlying model (e.g., the TTS model) must itself be a publicly available, open-source model compliant with Section 1.2.
  • All data augmentation methods, synthesis processes, and external data sources must be thoroughly documented in the final technical report.
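
For concreteness, below is a minimal Python sketch of two of the permitted augmentation techniques: additive noise at a target SNR and waveform-level speed perturbation. It is not part of the official baseline; the file names, SNR value, and speed factor are illustrative assumptions, and it assumes mono-channel audio.

    # augment_example.py - illustrative sketch only; not part of the official baseline.
    # Assumes mono audio; "input.wav" is a placeholder path, not an official dataset file.
    import numpy as np
    import soundfile as sf

    def add_noise(speech, snr_db):
        """Mix white Gaussian noise into a mono signal at the given SNR (in dB)."""
        noise = np.random.randn(len(speech))
        speech_power = np.mean(speech ** 2) + 1e-12
        noise_power = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
        return speech + scale * noise

    def change_speed(speech, factor):
        """Speed perturbation by linear resampling (factor > 1 shortens the clip)."""
        old_idx = np.arange(len(speech))
        new_idx = np.arange(0.0, len(speech) - 1, factor)
        return np.interp(new_idx, old_idx, speech)

    if __name__ == "__main__":
        audio, sr = sf.read("input.wav")
        sf.write("noisy_15db.wav", add_noise(audio, snr_db=15.0), sr)
        sf.write("speed_1.1x.wav", change_speed(audio, factor=1.1), sr)

Pitch shifting and other effects can be applied in the same spirit; whatever is used must be documented in the technical report as noted above.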

Dev Set: May be used for model performance evaluation and debugging.

Test Set: The competition leaderboard will be based on model performance on the test set. The organizers will provide a public test set for participants to validate their models. However, the final ranking will be determined by a comprehensive evaluation based on performance on both the public test set and a private (hidden) test set, where results must be successfully reproduced by the organizers.

3. Submission Requirements

To ensure fairness and reproducibility, all teams must submit a complete and self-contained submission package that is independently runnable by the specified deadline.

3.1 Submission Package Contents

The submission package must include all of the following:

  • Results File: The model inference output, formatted as specified (e.g., submission.json).
  • Model Files: Complete model weights, configuration files, and all necessary dependencies.
  • Docker Image: A Docker image with a pre-configured environment, supporting one-click inference execution.
  • Technical Documentation: A detailed README.md file that clearly explains how to use the provided code and model files to reproduce the submitted results.

3.2 Source Code Requirements

  • Reproduction Script: A one-click startup script (e.g., run.sh) must be provided to execute the complete inference pipeline and generate the final result file(s).
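
As a rough illustration (not the official template), run.sh can be as small as a single command that launches a Python entry point such as the hypothetical infer.py below. The paths, the stand-in model loader, and the submission.json layout shown here are assumptions; follow the official format specification for each track.

    # infer.py - hypothetical one-click inference entry point, e.g. invoked by run.sh
    # via "python infer.py". Paths, the stand-in model, and the output layout are
    # illustrative assumptions; follow the official submission format specification.
    import json
    from pathlib import Path

    def load_model(model_dir):
        """Load the packaged model weights; replaced here by a trivial stand-in."""
        return lambda wav_path: "placeholder response for " + Path(wav_path).name

    def main(test_dir="data/test", model_dir="model", out_path="submission.json"):
        model = load_model(model_dir)
        results = {}
        for wav in sorted(Path(test_dir).glob("*.wav")):   # one entry per test item
            results[wav.stem] = model(str(wav))            # run inference
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(results, f, ensure_ascii=False, indent=2)

    if __name__ == "__main__":
        main()

Packaging this script together with the model files inside the Docker image described below is what enables the required one-click reproduction.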

3.3 Docker Image Specifications

  • Environment Consistency: The Docker image must contain all necessary environments, software libraries, and dependencies to ensure seamless execution in the evaluation environment.
  • One-Click Inference: The Docker image must contain an executable script that, when run, automatically completes the entire inference process and generates the output file in the specified format.
  • Detailed Guide: Specific instructions for building, using, and submitting the Docker image will be provided with the release of the baseline implementation.

4. Oversight and Interpretation

4.1 Right of Interpretation

The organizers reserve the final right to interpret all rules of this competition. The organizers may adjust or supplement the rules as necessary during the competition, and any changes will be communicated to all teams in a timely manner.

4.2 Fairness and Authenticity Verification

To ensure the integrity of the competition, the organizers reserve the right to audit all submissions. If requested, teams are required to provide further technical details to facilitate this review.

4.3 Handling of Violations

If a team engages in any of the following actions, the organizers reserve the right to unilaterally disqualify the team, revoke any awards, and reclaim all prizes and monetary bonuses:

  • The submission contains falsified or fabricated information.
  • A serious violation of competition rules or submission requirements has occurred.
  • Engaging in any form of cheating.

Organizers

The challenge is organized by a distinguished team of researchers:

  • Lei Xie, Professor, Northwestern Polytechnical University
  • Shuai Wang, Associate Professor, Nanjing University
  • Haizhou Li, Professor, Chinese University of Hong Kong
  • Eng Siong Chng, Professor, Nanyang Technological University
  • Hung-yi Lee, Professor, National Taiwan University
  • Chao Zhang, Assistant Professor, Tsinghua University
  • Guangzhi Sun, Junior Research Fellow, University of Cambridge
  • Xixin Wu, Assistant Professor, Chinese University of Hong Kong
  • Longshuai Xiao, Huawei Technologies
  • Zihan Zhang, Huawei Technologies
  • Xinsheng Wang, Soul AI Lab
  • Hui Bu, AISHELL
  • Xin Xu, AISHELL
  • Zhixian Zhao, Northwestern Polytechnical University
  • Hongfei Xue, Northwestern Polytechnical University
  • Xuelong Geng, Northwestern Polytechnical University
  • GuoJian Li, Northwestern Polytechnical University
  • Shuiyuan Wang, Northwestern Polytechnical University

Contact

For any inquiries, please contact the organizers. You are also welcome to join our WeChat group.

Are you ready?

Get started with Track 1