Human-like-Spoken-Dialogue-Systems-Challenge

The challenge aims to promote systematic, real-world evaluation of next-generation dialogue systems and advance the field toward truly human-like interaction.

Challenge Call

Have you been following the recent buzz around the impressive performance of next-generation voice dialogue models like GPT-4o, Doubao, and the newly released GPT-Realtime? They are not only lightning-fast and expressive but also enable seamless multimodal interactions, making conversations feel remarkably human.

From the traditional “clunky AI” to today’s “AI assistant,” the evolution of voice dialogue systems has been nothing short of astonishing. But just how far are we from achieving truly “natural human-machine dialogue”? While current voice models excel in technical metrics, they still lack a certain “human touch.” They may recognize single emotions like “happiness” or “sadness,” but struggle to truly understand the complexity of our emotional changes or empathize with our situations. They may engage in fluent one-on-one exchanges, yet become flustered in real-world interaction scenarios such as interruptions, overlapping speech, or group chats. This is the “uncanny valley” that current voice dialogue systems struggle to cross.

To break through this bottleneck and advance technology toward truly “human-like” interaction, a coalition of institutions—including Northwestern Polytechnical University, Nanjing University, The Chinese University of Hong Kong, Huawei Technologies Co., Ltd., and AISHELL—has jointly launched the HumDial (Human-like Spoken Dialogue Systems) Challenge! We believe a truly intelligent dialogue system must not only “understand clearly, reason logically, and express coherently” but also possess the ability to interact seamlessly with humans in real, emotionally complex environments.

The inaugural HumDial2026 Challenge will be held at ICASSP 2026, a premier conference for speech research, and will focus on two core challenges:

  • Emotional Intelligence: Moving beyond simplistic emotion labeling, this track will test a model’s ability to accurately understand context-dependent emotions, provide empathetic responses, conduct in-depth reasoning, and dynamically track emotional shifts—empowering AI to truly understand and connect with users.
  • Full-Duplex Interaction: Breaking free from rigid turn-based exchanges, this track will evaluate a system’s ability to handle interruptions, overlapping speech, real-time feedback, and natural conversational rhythms, helping AI learn to communicate more naturally.

We will not only introduce brand-new evaluation dimensions but also release exclusive, finely annotated datasets of real-world scenarios for each track. If you’re passionate about “human-like” dialogue systems and eager to shape the future of next-generation voice interaction, we welcome you to follow and register for the challenge! Let’s work together to turn AI into a warm, emotionally aware communication partner.

Registration

Teams can register via this Google Form: https://docs.google.com/forms/d/e/1FAIpQLSdRrlfqrhh8QhOxtKMr03AxnnX14md_EwFuIuMt-Hf4fhhARA/viewform?usp=header

Reminder! Please use your institutional or corporate email address to register, and avoid using personal email accounts.

Timeline

  • August 20, 2025: Registration opens
  • September 29, 2025: Release of training set, validation set, and baseline system (delayed by a few days)
  • November 10, 2025: Release of test set
  • November 25, 2025: Submission deadline
  • December 7, 2025: Deadline for submitting 2-page papers to ICASSP 2026 (invited teams only)
  • January 11, 2026: Notification of acceptance for 2-page ICASSP 2026 papers
  • January 18, 2026: Submission of final version of papers
  • May 4–8, 2026: ICASSP 2026 Conference, Barcelona, Spain

Guidelines for participants

  1. Model Requirements: Participants may submit systems based on either end-to-end architectures or cascaded pipelines. There is no restriction on the model structure, but all models must be trained using publicly available resources.
  2. Data Usage: Use of the official test set or any of its labels for model training or tuning is strictly prohibited. Participants are not allowed to use any private or unauthorized datasets.
  3. Submission Format: Participants must submit a docker container with the complete system and source code along with detailed instructions for reproduction. All submissions must be executable and allow for transparent verification by the organizers.
  4. Prizes and Awards: The top 3 teams in each track will receive prizes based on final rankings: 5,000 USD for 1st place, 3,000 USD for 2nd place, and 2,000 USD for 3rd place. Winning teams will be invited to present their work at the ICASSP 2026 special session.
  5. Final Interpretation: The organizing committee reserves the right of final interpretation of the rules and all matters related to the challenge.

Organizers

The challenge is organized by a distinguished team of researchers:

  • Lei Xie, Professor, Northwestern Polytechnical University
  • Shuai Wang, Associate Professor, Nanjing University
  • Haizhou Li, Professor, The Chinese University of Hong Kong
  • Eng Siong Chng, Professor, Nanyang Technological University
  • Hung-yi Lee, Professor, National Taiwan University
  • Chao Zhang, Assistant Professor, Tsinghua University
  • Guangzhi Sun, Junior Research Fellow, University of Cambridge
  • Xixin Wu, Assistant Professor, The Chinese University of Hong Kong
  • Longshuai Xiao, Huawei Technologies
  • Zihan Zhang, Huawei Technologies
  • Xinsheng Wang, Soul AI Lab
  • Hui Bu, AISHELL
  • Xin Xu, AISHELL
  • Zhixian Zhao, Northwestern Polytechnical University
  • Hongfei Xue, Northwestern Polytechnical University
  • Xuelong Geng, Northwestern Polytechnical University
  • GuoJian Li, Northwestern Polytechnical University
  • Shuiyuan Wang, Northwestern Polytechnical University

Contact

For any inquiries, please contact:

You are welcome to join our WeChat group.

Are you ready?

Get started with task 1