Driven by rapid advances in Large Language Models (LLMs) and Multimodal LLMs (MLLMs),
AI-powered smart glasses are emerging as a next-generation platform for human-computer interaction.
Equipped with microphone arrays and cameras, smart glasses naturally capture the wearer’s
egocentric (first-person) perspective, enabling hands-free multimodal communication throughout daily
life.
However, deploying robust speech-centric interaction systems on smart glasses poses
distinct challenges compared with stationary devices such as smart speakers or handheld
devices such as smartphones. Smart glasses operate in highly dynamic acoustic environments
characterized by environmental noise, user-generated motion noise, and interfering speech
from surrounding people.
To address these challenges, the SmartGlasses Challenge introduces a new benchmark
for evaluating Automatic Speech Recognition (ASR) and Spoken Language Understanding (SLU)
in real-world egocentric interaction scenarios, including human–machine dialogue,
dyadic conversation, and multi-party meetings.