Driven by rapid advances in Large Language Models (LLMs) and Multimodal LLMs (MLLMs),
AI-powered smart glasses are emerging as a next-generation platform for human-computer interaction.
Equipped with microphone arrays and cameras, smart glasses naturally capture the wearer’s
egocentric (first-person) perspective, enabling hands-free multimodal communication throughout daily
life.
However, deploying robust speech-centric interaction systems on smart glasses poses
distinct challenges compared with stationary devices such as smart speakers or handheld
devices such as smartphones. Smart glasses operate in highly dynamic acoustic environments
characterized by environmental noise, user-generated motion noise, and interfering speech
from surrounding people.
To address these challenges, the SmartGlasses Challenge introduces a new benchmark
for evaluating Automatic Speech Recognition (ASR) and Spoken Language Understanding (SLU)
in real-world egocentric interaction scenarios, including human–machine dialogue,
dyadic conversation, and multi-party meetings.