SmartGlasses 2026

Challenge Call

Driven by the rapid advancement of Large Language Models (LLMs) and Multimodal LLMs, AI-powered smart glasses are emerging as a next-generation platform for human-computer interaction. Equipped with microphone arrays and cameras, smart glasses naturally capture the wearer’s egocentric (first-person) perspective, enabling hands-free multimodal communication throughout daily life.

However, deploying robust speech-centric interaction systems on smart glasses introduces distinct challenges compared with traditional stationary devices such as smart speakers or handheld devices such as smartphones. Smart glasses operate in highly dynamic acoustic environments and must cope with environmental noise, user-generated motion noise, and speech from surrounding people.

To address these challenges, the SmartGlasses Challenge introduces a new benchmark for evaluating Time-Stamped Speaker-Attributed ASR (TSA-ASR) and Spoken Language Understanding (SLU) in real-world egocentric interaction scenarios, including dyadic conversations and multi-party meetings.

Latest News

Here are the latest updates and news from the SmartGlasses Challenge organizers:

2026-06-19: The leaderboard is now officially open! The official evaluation leaderboard will remain open for a total of 8 days. Please note that each track and task features its own independent leaderboard and prize pool. Participants can now proceed to the Leaderboard page to access the submission links and review the detailed evaluation rules.
2026-06-19: Update on Task 2 (SLU) Test Set Data: We have successfully updated the Spoken Language Understanding (SLU) test set by completing the missing Question IDs for all QA pairs. All participating teams are required to pull the latest version of the dataset immediately to ensure that our scoring system can correctly parse and evaluate your submission format.
2026-06-13: The test data has been officially released!
2026-06-10: Reference Results of Task 1 have been updated!
2026-05-27: Challenge rules have been updated. The Official Evaluation Toolkit is now publicly available!
2026-05-26: The training and validation data (Part 2) have been released! All training and validation data are now available to all registered teams.
2026-05-15: The training data and validation data (Part 1) have now been released! All participating teams, please check the email address used for challenge registration to obtain the data download link. The rest of the training data and validation data will be released in the next few days.
2026-05-07: The release of the training set and validation set is rescheduled to May 14th.
2026-04-15: Registration Opens

Track Overview

Track 1: Dyadic Dialogue Understanding

Scenario: Face-to-face two-person conversations in everyday settings, involving overlapping speech, background interference, topic shifts, and complex semantic structures.

TSA-ASR Task: Evaluate speaker-attributed transcription with time alignment in overlapping speech scenarios. The metric is time-constrained minimum permutation WER (tcpWER).

SLU Task: Evaluate the system’s ability to capture factual details, track logical flow, and understand relationships between speakers within dyadic dialogues.

Track 2: Multi-Party Meeting Understanding

Scenario: Multi-speaker meetings with varying numbers of participants, frequent turn-taking, long conversational contexts, and domain-specific vocabulary.

TSA-ASR Task: Evaluate multi-speaker speech recognition with speaker diarization and temporal alignment in highly overlapped environments. The metric is tcpWER.

SLU Task: Evaluate the system’s ability to understand complex meeting discussions, extract key information, and summarize speaker-wise viewpoints from long-form conversations.
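
Both tracks rank TSA-ASR systems by tcpWER. To give some intuition for the "minimum permutation" part of that metric, the following is a minimal sketch of plain cpWER (concatenated minimum-permutation WER): each speaker's words are concatenated, and the hypothesis speakers are matched to the reference speakers in whichever order gives the fewest word errors. This sketch deliberately omits the time constraint, which in tcpWER additionally prevents matching words whose timestamps lie farther apart than the collar. The function names, the dictionary input layout, and the whitespace tokenization are illustrative assumptions; official scores should come from the official toolkit only.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cp_wer(ref_by_spk, hyp_by_spk):
    """cpWER for one session.

    Both arguments map a speaker label to that speaker's concatenated
    word list (hypothetical input layout, chosen for this sketch).
    """
    refs = list(ref_by_spk.values())
    hyps = list(hyp_by_spk.values())
    # Pad the shorter side with empty streams so every speaker gets a partner.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]
    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")
    # Try every speaker assignment and keep the one with the fewest errors.
    best = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / total_ref_words


ref = {"A": "hello how are you".split(), "B": "fine thanks".split()}
hyp = {"spk1": "fine thanks".split(), "spk0": "hello how you".split()}
print(f"cpWER = {cp_wer(ref, hyp):.3f}")
```

The brute-force search over permutations is only practical for a handful of speakers, which is enough for illustration; it is not how a production scorer would necessarily solve the assignment.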

Data Description

The organizing committee provides a separate download link for each track. After downloading and extracting the files, the dataset root directory has the following structure, shown here for Track 1 (SmartGlasses-Track1); the structure for Track 2 is identical:

SmartGlasses-Track1/
├── Train/                 # Training set
│   └── Part1/
│       ├── audio/         # Audio files (.wav)
│       └── textgrid/      # Text and timestamp annotation files (.TextGrid)
├── Dev/                   # Development (Validation) set
│   └── Part1/
│       ├── audio/
│       ├── textgrid/
│       └── QA/            # QA annotation files (.json) - Provided only in the Dev set
└── data.jsonl             # Global data index and metadata file

Detailed folder descriptions

audio/: Contains the four-channel dialogue audio files (.wav format) for the respective track.
textgrid/: Contains the .TextGrid annotation files corresponding to the audio. These files include speaker-level timestamp boundaries and their corresponding text transcriptions.
QA/: (Provided only in the Dev set.) This folder contains the .json files used for the objective multiple-choice evaluation. Each JSON file corresponds to an audio segment and includes the question, options, and ground-truth answer required for the evaluation.
Special note on QA data: The multiple-choice QA data in the Dev set is automatically generated by large language models (LLMs) and has not undergone strict human verification; therefore, it may contain minor flaws or noise. We are fully open-sourcing this data and its answers to serve as reference examples, helping participating teams build their pipelines and debug their models. For the final hidden test set, we will use a similar approach to construct complex speech understanding and reasoning multiple-choice questions. However, all test questions and ground-truth answers will undergo strict human review and refinement to ensure absolute fairness and scientific rigor in the final evaluation.
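
As a starting point for a data loader, the folder layout described above can be walked as in the sketch below. It assumes that an audio file, its TextGrid, and its QA file share the same file stem (for example `X.wav`, `X.TextGrid`, `X.json`); the description above does not state this naming rule, so please verify it against the downloaded data. The function name `pair_sessions` is hypothetical.

```python
from pathlib import Path


def pair_sessions(part_dir):
    """Map each file stem to its audio, TextGrid and (optional) QA path.

    Assumption: files belonging to one session share the same stem
    across audio/, textgrid/ and QA/.
    """
    part_dir = Path(part_dir)

    def index(subdir, pattern):
        folder = part_dir / subdir
        if not folder.is_dir():  # e.g. QA/ does not exist in Train
            return {}
        return {p.stem: p for p in folder.glob(pattern)}

    audio = index("audio", "*.wav")
    grids = index("textgrid", "*.TextGrid")
    qa = index("QA", "*.json")

    sessions = {}
    # Keep only sessions that have both audio and annotation.
    for stem in sorted(audio.keys() & grids.keys()):
        sessions[stem] = {
            "audio": audio[stem],
            "textgrid": grids[stem],
            "qa": qa.get(stem),  # None when no QA file is present
        }
    return sessions


if __name__ == "__main__":
    for stem, files in pair_sessions("SmartGlasses-Track1/Dev/Part1").items():
        print(stem, files["audio"].name, files["textgrid"].name, files["qa"])
```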

Global index file: data.jsonl

A data.jsonl file is provided in the root directory of each track. This file serves as the global index for the entire dataset, where each line represents the metadata for a single data sample.
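
Because data.jsonl stores one JSON object per line, it can be read with the standard library alone. The sketch below makes no assumption about the field names inside each record, since they are not listed here; it simply loads every line and reports which keys occur. The function name `read_index` is hypothetical.

```python
import json
from collections import Counter


def read_index(path):
    """Load a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}: line {line_no} is not valid JSON") from err
    return records


def key_counts(records):
    """Count how often each top-level key appears across all records."""
    return Counter(key for rec in records for key in rec)


if __name__ == "__main__":
    index = read_index("SmartGlasses-Track1/data.jsonl")
    print(len(index), "samples")
    print(key_counts(index))
```

Inspecting the key counts first is a cheap way to discover the metadata schema before writing a loader around it.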

Note: The additionally collected data will be released later as Part 2. Part 2 is merely a chronological update in the release schedule; its data format and folder structure will be exactly identical to the current Part 1.

Dataset statistics

Below is the statistical overview of all released data (Part 1, Part 2, and Test):

Track 1: Dyadic Dialogue Understanding

Split	Sessions	Total Duration (hrs)	Avg. Duration (sec)
Train (Part 1)	332	29.12	315.71
Train (Part 2)	55	4.81	314.53
Dev (Part 1)	66	5.59	304.82
Dev (Part 2)	15	1.27	305.01
Test	50	4.16	299.80
Total	518	44.95	312.39

Track 2: Multi-party Meeting Understanding

Split	Sessions	Total Duration (hrs)	Avg. Duration (sec)
Train (Part 1)	105	34.13	1170.09
Train (Part 2)	25	7.70	1108.58
Dev (Part 1)	21	7.06	1210.47
Dev (Part 2)	15	4.45	1068.29
Test	30	8.69	1042.21
Total	196	62.03	1139.33
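
The Total rows follow from the per-split rows: sessions and hours are summed, and the average duration in seconds is the total duration in hours times 3600, divided by the number of sessions. The short check below recomputes both Total rows from the numbers in the tables above; the variable and function names are illustrative only.

```python
# (sessions, total duration in hours) per split, copied from the tables above
TRACK1_ROWS = [(332, 29.12), (55, 4.81), (66, 5.59), (15, 1.27), (50, 4.16)]
TRACK2_ROWS = [(105, 34.13), (25, 7.70), (21, 7.06), (15, 4.45), (30, 8.69)]


def totals(rows):
    """Return (total sessions, total hours, average seconds per session)."""
    sessions = sum(s for s, _ in rows)
    hours = round(sum(h for _, h in rows), 2)
    avg_sec = round(hours * 3600 / sessions, 2)
    return sessions, hours, avg_sec


print("Track 1:", totals(TRACK1_ROWS))
print("Track 2:", totals(TRACK2_ROWS))
```

Per-split averages recomputed this way can differ from the table in the last digits, because the hours column is itself rounded to two decimals.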

Smart Glasses Microphone Array Layout

Channel-to-Microphone Mapping

The 4 channels of the audio files are mapped to the physical micro-electro-mechanical systems (MEMS) microphone array integrated onto the smart glasses frames as follows:

Channel 1 (mic1): Right temple, rear position
Channel 2 (mic2): Right temple, front position
Channel 3 (mic3): Left temple, front position
Channel 4 (mic4): Left temple, rear position

Physical Array Geometry

The spatial coordinates and geometric constraints of the acoustic centers of the four microphones are specified below:

Horizontal Projection Displacements:

Intra-temple separation (Right): The axial distance between mic1 and mic2 is 47 mm.
Intra-temple separation (Left): The axial distance between mic3 and mic4 is 50 mm.
Inter-temple span (Front): The cross-lateral distance between mic2 and mic3 is 145 mm.
Inter-temple span (Rear): The cross-lateral distance between mic1 and mic4 is 146 mm.

Vertical & Lateral Offsets:

Compared to the right-rear microphone (mic1), the right-front microphone (mic2) features a positive vertical elevation of 10 mm and an outward lateral offset of 1 mm along the frame thickness direction.
The acoustic centers of mic1, mic3, and mic4 reside on the same horizontal reference plane (zero vertical offset).
The left and right temples are orthogonal to the plane of the lenses.
The baseline connecting mic1 and mic4 is strictly parallel to the plane of the lenses.
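
For front-end work such as beamforming or delay estimation, the geometry above can be turned into approximate coordinates. The sketch below is one possible reading of the description, not an official coordinate table. Assumptions: mic1 is the origin; x points forward along the temples, y points from the right temple toward the left temple, z points up; the "outward" 1 mm offset of mic2 is toward the wearer's right (negative y); the 145 mm front span fixes mic3's lateral position relative to mic2; and sound travels at 343 m/s. From these coordinates it derives the largest possible inter-microphone delay for each pair.

```python
import math
from itertools import combinations

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius

# Approximate acoustic centers in millimetres: (x forward, y to the left, z up).
MICS_MM = {
    "mic1": (0.0, 0.0, 0.0),      # right temple, rear (origin)
    "mic2": (47.0, -1.0, 10.0),   # right temple, front: 47 mm ahead, 1 mm outward, 10 mm up
    "mic3": (50.0, 144.0, 0.0),   # left temple, front: 145 mm lateral from mic2, 50 mm ahead of mic4
    "mic4": (0.0, 146.0, 0.0),    # left temple, rear: 146 mm lateral from mic1
}


def distance_mm(a, b):
    """Straight-line distance between two microphones in millimetres."""
    return math.dist(MICS_MM[a], MICS_MM[b])


def max_delay_ms(a, b):
    """Largest possible arrival-time difference (endfire source) in milliseconds."""
    return distance_mm(a, b) / 1000.0 / SPEED_OF_SOUND_M_S * 1000.0


for a, b in combinations(MICS_MM, 2):
    print(f"{a}-{b}: {distance_mm(a, b):6.1f} mm, max delay {max_delay_ms(a, b):.3f} ms")
```

The maximum delay bounds the search range for cross-correlation based delay estimation; multiply it by the audio sample rate to express it in samples.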

Evaluation & Baselines

To help participating teams evaluate their models locally, the organizing committee has released the official evaluation guidelines and scoring scripts. In subsequent phases, we will keep updating the repository with results for more baseline systems (including baseline models trained on a combination of open-source datasets and the official SmartGlasses scenario data).

Official Evaluation & Baseline Repository (GitHub): SmartGlasses Challenge - Scoring & Baseline. This toolkit standardizes the evaluation metrics, time tolerance collars, and input formats, ensuring that local evaluations on the Dev Set perfectly align with the final Test Set official scoring standards.

Task 1: TSA-ASR (Time-Stamped Speaker-Attributed ASR)

Primary Ranking Metric: tcpWER. A strict time tolerance collar of 5 seconds is applied during the official evaluation.
Diagnostic Metrics: cpWER and DER are additionally reported to help teams pinpoint specific sources of error within their systems.
Rules & Format: The official ASR evaluation pipeline is built upon the open-source framework MeetEval. Model outputs must strictly adhere to the STM (Segmental Time Mark) format. For detailed scoring rules and Chinese tokenization guidelines, please refer to our GitHub page.
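
For orientation, an STM file holds one segment per line in the order: file (session) name, channel, speaker label, start time, end time, transcript. The sketch below formats segments that way. The session and speaker identifiers, the fixed channel value of 1, the two-decimal time precision, and the dictionary layout of a segment are assumptions made for illustration; the naming and tokenization actually required for submission are defined on the GitHub page.

```python
def to_stm_line(session_id, speaker, start, end, text, channel=1):
    """Format one segment as an STM line (times in seconds)."""
    if end <= start:
        raise ValueError(f"segment must have positive duration: {start} -> {end}")
    words = " ".join(text.split())  # collapse newlines and repeated spaces
    return f"{session_id} {channel} {speaker} {start:.2f} {end:.2f} {words}"


def to_stm(segments):
    """Render segments (dicts with session_id, speaker, start, end, text) as STM text."""
    ordered = sorted(segments, key=lambda s: (s["session_id"], s["start"]))
    return "\n".join(to_stm_line(**s) for s in ordered) + "\n"


segments = [
    {"session_id": "S001", "speaker": "spk2", "start": 1.5, "end": 3.0, "text": "fine  thanks"},
    {"session_id": "S001", "speaker": "spk1", "start": 0.0, "end": 1.25, "text": "how are you"},
]
print(to_stm(segments), end="")
```

Sorting by start time is a readability choice in this sketch rather than a stated requirement.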

Task 2: SLU (Spoken Language Understanding / Audio QA)

Metric: Accuracy, defined as the percentage of correctly answered multiple-choice questions.
Current Status: For now, teams can evaluate their models locally by calculating the exact match accuracy against the Dev Set references.
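
A local accuracy check along these lines can be written in a few lines. The sketch assumes that predictions and references have already been loaded into dictionaries that map a question ID to the chosen option (for example "A"), and that an unanswered question counts as wrong; both the dictionary layout and the handling of missing answers are assumptions, not the official scoring rules.

```python
def accuracy(predictions, references):
    """Percentage of questions whose predicted option exactly matches the reference.

    predictions, references: dict mapping question ID -> option string.
    Questions missing from predictions are counted as incorrect (assumption).
    """
    if not references:
        raise ValueError("no reference answers given")
    correct = sum(
        1
        for qid, answer in references.items()
        if predictions.get(qid, "").strip() == answer.strip()
    )
    return 100.0 * correct / len(references)


refs = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
preds = {"q1": "A", "q2": "B", "q3": "B"}  # q4 left unanswered
print(f"accuracy = {accuracy(preds, refs):.1f}%")
```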

Registration Guidelines

Step 1: Registration

Please complete your registration by filling out a registration form.

Preferred: Google Registration Form
Alternative (Mainland China): If you cannot access Google Forms, or if repeated submissions are unsuccessful, please use the Tencent Registration Form.

After you submit the registration form, the organizing committee will send a confirmation email within 1 business day. Please check your inbox in time; if you do not receive it, please check your spam folder first, or contact us via the email addresses below.

Step 2: Dataset Access

After successful registration and agreeing to the challenge rules, participating teams will be granted access to the SmartGlasses dataset, and the download link will be sent via email.

Contact Information

If you have any questions, please contact the organizers:

WeChat Group QR Code

If the QR code is expired, please email us to request the latest QR code.

The organizers of the challenge reserve the right to interpret, modify, and amend the participation terms and challenge rules.

Challenge Timeline

The tentative timeline for running the challenge is as follows:

2026-04-15: Registration Opens
2026-05-07: Release of Training Set, Validation Set
2026-06-01: Registration Closes
2026-06-12 [Updated!]: Release of Test Set
2026-06-26 [Updated!]: Results Submission Deadline
2026-07-03: System Description Submission Deadline
2026-07-08: SLT Official Paper Submission Deadline
2026-09-01: Paper Notification

LEADERBOARD

The official leaderboards are now open for a total of 8 days. For this challenge, each task (TSA-ASR and SLU) under Track 1 and Track 2 is completely independent. The final rankings for each task will be calculated independently, and challenge prizes will be distributed separately. Please double-check and submit your results via the dedicated portals for the corresponding tasks. For submission procedures, packaging rules, and format examples, please refer to the detailed guidelines on each respective Leaderboard page.

📍 Submission Portals

Track 1: Dyadic Dialogue Understanding

Track 2: Multi-Party Meeting Understanding

⚠️ Important Evaluation Instructions for Task 2 (SLU)

To ensure fairness and scientific rigor, Task 2 adopts a two-phase evaluation mechanism and requires the latest data format:

Data Update Requirement: You must use the latest test set data, which now includes the complete Question IDs, for inference and result formatting. [Download Link]
Phase 1 (First 6 Days): Evaluation is based on the current question bank containing complete Question IDs. Leaderboard scores during this phase serve only as a periodic reference and will not be counted toward the final ranking.
Phase 2 (Last 2 Days): The organizing committee will release the final batch of brand-new QA questions, which will be merged with the Phase 1 data to form the complete question bank. At that time, the supplemental QA data and the specific submission links for Phase 2 will be available directly on the Phase 1 Leaderboard pages linked above. This website will also be updated accordingly. The test scores from this phase (i.e., on the complete question bank) will serve as the sole criterion for the final ranking of the SLU task.

Requirements for Final Submission

To ensure the fairness, academic rigor, and integrity of this challenge, all teams intending to be eligible for the final ranking and awards are required to submit their system description paper and corresponding reproducibility materials in accordance with the schedule and requirements specified below.

1. System Description Paper Submission (Required for All Teams)

Submission Deadline

July 3, 2026 (AOE Time)

Submission Method

Please submit your paper in PDF format to our official email address: gdh@mail.nwpu.edu.cn.

Content Requirements

Each team must submit a comprehensive system description paper, elaborating on the datasets used, model architecture, data processing strategies, training details, and core technical optimizations of your solution.

Format & Review Policy

The paper follows a strict 3+1 page format (up to 3 pages for the main content and 1 page for references) and adopts a single-blind review process. Please prepare your manuscript using the official IEEE SLT 2026 author templates.
Official SLT 2026 templates: https://attend.ieee.org/slt-2026/authors-instructions/

Multi-Task Submission Instructions

If your team participates in multiple tasks, you may either integrate all solutions into a single consolidated system description paper or submit independent papers for each task. Please note that if separate papers are submitted, each manuscript must present distinct and differentiated content (e.g., focusing on task-specific technical strategies and innovations). Highly redundant or substantially duplicate content across multiple submissions is strictly prohibited.

Recommendation for SLT Official Conference Papers

The organizing committee will conduct a comprehensive assessment of all submitted system description papers. Beyond the top-ranked teams, solutions that demonstrate notable innovations in model architecture, multimodal processing, or data augmentation will be highly recommended. These teams will be invited to submit their work as a Challenge paper to IEEE SLT 2026.
SLT 2026 official paper submission deadline: July 8, 2026 (AOE Time)

2. Docker Reproducibility Submission (Required for Top 5 Teams per Task)

Submission Deadline

July 1, 2026 (AOE Time)

Submission Method

Please send a publicly accessible Docker image link (hosting on DockerHub is highly recommended) to our official email address, gdh@mail.nwpu.edu.cn, to support full inference reproducibility.

Docker Image Specifications

Environment Consistency: The image must contain all necessary runtime environments, software libraries, model weights, and dependencies. It must run independently and seamlessly in the official evaluation environment without requiring any additional network downloads or manual configurations.
One-Click Inference: A standard automated execution script must be pre-configured in the image. The script must automatically execute the full pipeline from raw audio input to model inference and generate final output files strictly conforming to the officially required format.
Usage Documentation: A clear README.md file must be included, detailing how to pull the image, mount the local data directory, and run the one-click inference command, ensuring the organizing committee can reproduce your results smoothly.

3. Official Evaluation & Compliance Statement

Upon receiving the Docker images from the Top 5 teams, the organizing committee will conduct an independent replication and evaluation on the internal hidden test set.

If any non-compliant behaviors are detected during official replication (including but not limited to violating model parameter constraints, unauthorized use of closed-source data, manual intervention during inference, etc.), or if there is an unreasonably large discrepancy between the official reproduced score on the hidden set and the team's leaderboard result (indicating potential academic misconduct), the organizing committee will launch an internal review process. Following deliberation, the organizing committee reserves the absolute right to nullify the results of non-compliant teams, adjust the final rankings, and revoke their challenge and award eligibility.

All final rankings and awards are subject to the official results published on the challenge website.

If you have any questions regarding the submission requirements, please contact us via our official email address (gdh@mail.nwpu.edu.cn) at any time.

FAQ

1. Data

Q: Why does the current download only include Part 1? Will Part 2 have a different format?

A: We adopted a staged release strategy to allow participating teams to acquire data and set up baseline pipelines as early as possible. Part 1 contains the complete directory structure and sufficient data to run through the entire process. Part 2 will be released gradually within a week. Part 2 simply represents an expansion in data volume; its file format, four-channel audio properties, and directory structure will be exactly identical to Part 1. When released, you will only need to append the new data to the corresponding folders.

Q: I noticed that the multiple-choice QA in the Dev set occasionally contains logical flaws or noise. Will the Test set be the same?

A: The QA pairs in the Dev set were automatically generated by Large Language Models (LLMs) without strict human verification, and are provided merely as "reference examples" for teams to debug their pipelines. For the hidden Test set used for final leaderboard ranking, all questions and ground-truth answers will undergo rigorous double human verification and refinement.

Q: The audio is four-channel. Do I have to use all the channels?

A: We provide complete four-channel audio to preserve the most authentic acoustic spatial information. Participating teams are free to decide whether to utilize the multi-channel information for beamforming / front-end signal processing, or simply extract a single channel for model training. This depends entirely on your algorithm design.
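
For teams choosing the single-channel route, one channel can be pulled out of a multi-channel WAV file with the Python standard library, as sketched below. The sketch assumes 16-bit PCM audio, which is not stated above and should be checked against the actual files; it raises an error otherwise. The function name `extract_channel` is hypothetical, and the channel index is zero-based, so index 0 corresponds to Channel 1 (mic1).

```python
import array
import sys
import wave


def extract_channel(src_path, dst_path, channel_index):
    """Write one channel of an interleaved 16-bit PCM WAV file to a mono WAV file."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if not 0 <= channel_index < n_channels:
            raise ValueError(f"channel_index must be in [0, {n_channels})")
        frame_rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    # WAV data is little-endian; swap on big-endian hosts before slicing.
    if sys.byteorder == "big":
        samples.byteswap()
    # Samples are interleaved: ch0, ch1, ..., ch0, ch1, ...
    mono = samples[channel_index::n_channels]
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())


if __name__ == "__main__":
    # Example: keep Channel 2 (mic2, right temple, front). Paths are placeholders.
    extract_channel("session.wav", "session_mic2.wav", channel_index=1)
```

Reading the whole file into memory is acceptable for sessions of a few minutes to tens of minutes; a streaming variant would read frames in blocks instead.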

Q: If I only participate in a single track, can I use the data from the other track for training?

A: Yes, this is fully permitted. You are welcome to use the official datasets across tracks (e.g., using Track 2 data to assist in training a model for Track 1) to augment your training data and improve the model's generalizability.

2. Models & Rules

Q: Does the challenge require using the same model/system to simultaneously complete both TSA-ASR and SLU tasks? Is it allowed to design separate modules?

A: We allow separate, independent systems for the two tasks. You may decouple them — for example, training a dedicated recognition system specifically for the TSA-ASR task, and training another independent model specifically for the SLU task. Nevertheless, we strongly encourage teams to explore unified architectures capable of handling both tasks simultaneously.

Q: Do I have to use the exact same model to participate in both Track 1 and Track 2?

A: This is not mandatory. You can design one specific model for Track 1 and a different model for Track 2. However, we encourage teams with sufficient resources to explore foundational Omni-modal/Audio-Language Models capable of handling both tracks simultaneously.

Q: Can we use external data and pre-trained foundation models?

A: Yes, this is allowed. You may use open-source pre-trained foundation models (e.g., Whisper, LLaMA, Qwen) and external, open-source datasets (e.g., LibriSpeech). However, the use of any private data is strictly prohibited. All external resources and data augmentation methods used must be explicitly disclosed in the final System Description Paper.

Data License

By downloading and using the SmartGlasses Challenge dataset, participating teams agree to and commit to strictly abiding by the following terms:

1.1 Usage Restrictions

The authorization of this dataset is strictly limited to non-commercial academic research.
The dataset may only be used for participating in the "IEEE SLT 2026 SmartGlasses Challenge" and subsequent related academic research after the challenge concludes. It is strictly prohibited to use this dataset or any of its derivative versions for any commercial purposes, product development, or profitable services.

1.2 Distribution & Confidentiality

This dataset is accessible only to officially registered teams. Participating teams must not publish, leak, transfer, or distribute the dataset (including audio files, text annotations, and any subsets) to unregistered third-party individuals or organizations.

1.3 Mandatory Citation

Any research outcomes generated using this dataset (including but not limited to academic papers, technical reports, public presentations, or open-source projects) must comply with the following citation guidelines:

During the challenge and before the official paper publication: The official website of the SmartGlasses Challenge (https://aslp-lab.github.io/SmartGlasses) must be explicitly cited in the acknowledgments or references.
After the official paper publication: Once the organizing committee officially publishes the Overview Paper or baseline papers for the SmartGlasses Challenge, all subsequent works utilizing this dataset must cite the official paper.

1.4 Rights & Disclaimer

All intellectual property rights of the dataset belong to the SmartGlasses Challenge Organizing Committee and its affiliated institutions.
The dataset is provided "as is". The organizing committee makes no warranties regarding the dataset's suitability for any specific scenarios and shall not be held liable for any direct or indirect damages arising from the use of this data.

Organizers

Lei Xie

Northwestern Polytechnical University, China

Longshuai Xiao

Huawei, China

Xie Chen

Shanghai Jiao Tong University, China

Jun Du

USTC, China

Shuai Wang

Nanjing University

Liumeng Xue

Nanjing University

Eng‑Siong Chng

Nanyang Technological University, Singapore

Zhonghua Fu

Northwestern Polytechnical University, China

Jun Zhou

Rokid, China

Xin Xu

AISHELL, China

Hui Bu

AISHELL, China

Zhixian Zhao

Northwestern Polytechnical University, China

Dehui Gao

Northwestern Polytechnical University, China

Yike Zhu

Northwestern Polytechnical University, China

Yuhang Dai

Northwestern Polytechnical University, China

Zhennan Lin

Northwestern Polytechnical University, China

Yujie Liao

Northwestern Polytechnical University, China

SLT2026

SmartGlasses Challenge: Egocentric Speech Interaction on AI Glasses

Benchmarking Egocentric Speech Interaction for Next-Generation AI Glasses in Real-World Environments
