The challenge aims to promote systematic, real-world evaluation of next-generation dialogue systems and advance the field toward truly human-like interaction.
December 13, 2025: We are pleased to announce the challenge results: Track 1 Results, Track 2 Results
November 17, 2025: Upon review, anomalies were identified in a small number of dialogue data samples. To ensure the accuracy and reliability of the evaluation results, the following items will be excluded from the final evaluation:
November 17, 2025: The test set and submission rules have been released. Please refer to the page for each track for detailed information.
November 11, 2025: Because the training data was released late, the release of the test data has also been postponed. We expect to release the test set around November 17. Please stay tuned.
October 29, 2025: We have released the baseline models; the model files can be downloaded from each track's page.
October 10, 2025: We have emailed access instructions for the training and dev datasets, along with other relevant challenge information, to all successfully registered teams.
If you have not received the email, please:
Have you been following the recent buzz around the impressive performance of next-generation voice dialogue models like GPT-4o, Doubao, and the newly released GPT-Realtime? They are not only lightning-fast and expressive but also enable seamless multimodal interactions, making conversations feel remarkably human.
From the traditional “clunky AI” to today’s “AI assistant,” the evolution of voice dialogue systems has been nothing short of astonishing. But just how far are we from achieving truly “natural human-machine dialogue”? While current voice models excel in technical metrics, they still lack a certain “human touch.” They may recognize single emotions like “happiness” or “sadness,” but struggle to truly understand the complexity of our emotional changes or empathize with our situations. They may engage in fluent one-on-one exchanges, yet become flustered in real-world interaction scenarios such as interruptions, overlapping speech, or group chats. This is the “uncanny valley” that current voice dialogue systems struggle to cross.
To break through this bottleneck and advance technology toward truly “human-like” interaction, a coalition of institutions—including Northwestern Polytechnical University, Nanjing University, The Chinese University of Hong Kong, Huawei Technologies Co., Ltd., and AISHELL—has jointly launched the HumDial (Human-like Spoken Dialogue Systems) Challenge! We believe a truly intelligent dialogue system must not only “understand clearly, reason logically, and express coherently” but also possess the ability to interact seamlessly with humans in real, emotionally complex environments.
The inaugural HumDial2026 Challenge will be held at ICASSP 2026, a premier conference for speech research, and will focus on two core challenges:
We will not only introduce brand-new evaluation dimensions but also release exclusive, finely annotated datasets of real-world scenarios for each track. If you’re passionate about “human-like” dialogue systems and eager to shape the future of next-generation voice interaction, we welcome you to follow and register for the challenge! Let’s work together to turn AI into a warm, emotionally aware communication partner.
Teams can register via the Google form: https://docs.google.com/forms/d/e/1FAIpQLSdRrlfqrhh8QhOxtKMr03AxnnX14md_EwFuIuMt-Hf4fhhARA/viewform?usp=header
Reminder! Please use your institutional or corporate email address to register, and avoid using personal email accounts.
To ensure the fairness, integrity, and transparency of the competition, all participating teams must strictly adhere to the following regulations.
External Data
External Pre-trained Models
In the final technical report, participating teams must provide a clear and complete list of all resources used (both internal and external), detailing how each was applied.
Train Set: Participants may use the official training subset provided.
Dev Set: May be used for model performance evaluation and debugging.
Test Set: The competition leaderboard will be based on model performance on the test set. The organizers will provide a public test set that participants can use to validate their models. However, the final ranking will be determined by a comprehensive evaluation of performance on both the public test set and a private (hidden) test set, on which results must be successfully reproduced by the organizers.
To ensure fairness and reproducibility, all teams must submit a complete and self-contained submission package that is independently runnable by the specified deadline.
The submission package must include all of the following:
The organizers reserve the final right to interpret all rules of this competition. The organizers may adjust or supplement the rules as necessary during the competition, and any changes will be communicated to all teams in a timely manner.
To ensure the integrity of the competition, the organizers reserve the right to audit all submissions. If requested, teams are required to provide further technical details to facilitate this review.
If a team engages in any of the following actions, the organizers reserve the right to unilaterally disqualify the team, revoke any awards, and reclaim all prizes and monetary bonuses:
The challenge is organized by a distinguished team of researchers:
For any inquiries, please contact:
You are welcome to join our WeChat group.