85 · SD · Feb 24, 2025 · Code
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang, Sukru Samet Dindar, Vishal Choudhari et al.
This work addresses the limitation of auditory AI in aligning with human perception for applications like hearing aids or communication systems, representing a novel paradigm rather than an incremental improvement.
83 · CL · Feb 17, 2025 · Code
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Ailin Huang, Boyong Wu, Bruce Wang et al.
This addresses high costs, weak dynamic control, and limited intelligence in speech interaction models for developers and researchers, representing a significant advancement rather than an incremental improvement.
81 · CL · Nov 16, 2024
Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm
Sari Masri, Huthaifa I. Ashqar, Mohammed Elhenawy
This proposes a new paradigm for traffic management systems that could improve efficiency at intersections for drivers and autonomous vehicles.
79 · AI · Feb 15, 2025 · Code
USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions
Hamed Rahimi, Adil Bahaj, Mouad Abrini et al.
This work addresses the problem of personalized human-robot interactions for diverse users, providing a significant advancement in social robotics.
78 · CL · Nov 5, 2025 · Code
Step-Audio-EditX Technical Report
Chao Yan, Boyong Wu, Peng Yang et al.
This addresses the need for advanced audio editing tools for content creators and researchers, offering a novel rather than incremental approach.
78 · CV · Aug 30, 2025
Visually Grounded Narratives: Reducing Cognitive Burden in Researcher-Participant Interaction
Runtong Wu, Jiayao Song, Fei Teng et al.
This addresses the dual burden of data analysis and member checking for researchers and participants in narrative inquiry, representing a first attempt in the field.
78 · LG · May 20, 2025
Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning
Jiangrong Shen, Yulin Xie, Qi Xu et al.
This work addresses critical challenges in energy-efficient multimodal sensory processing for neuromorphic systems, establishing a new paradigm rather than being incremental.
78 · HC · Feb 12, 2025 · Code
Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving
Steven-Shine Chen, Jimin Lee, Paul Pu Liang
This work addresses the need for more effective and engaging educational technologies, particularly for students struggling with complex math concepts.
77 · CL · Dec 16, 2024 · Code
LLMs Can Simulate Standardized Patients via Agent Coevolution
Zhuoyun Du, Lujie Zheng, Renjun Hu et al.
This addresses the problem of scalable and effective medical training for healthcare professionals, representing a novel application of agent coevolution rather than an incremental improvement.
77 · AI · May 20, 2025 · Code
ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
Bufang Yang, Lilin Xu, Liekang Zeng et al.
This work addresses the need for more effective proactive AI assistants in daily scenarios, representing a novel approach rather than an incremental improvement.
77 · LG · Jul 28, 2025 · Code
Advancing Compositional LLM Reasoning with Structured Task Relations in Interactive Multimodal Communications
Xinye Cao, Hongcan Guo, Guoshun Nan et al.
This addresses efficiency and flexibility challenges for resource-constrained mobile environments in interactive multimodal applications like route planning.
77 · CL · Mar 6 · Code
Learning Next Action Predictors from Human-Computer Interaction
Omar Shaikh, Valentin Teutschbein, Kanishk Gandhi et al.
This work addresses the problem of anticipating user needs in proactive AI systems by predicting users' next computer interaction, which is significant for developers of AI assistants.
76 · HC · Jul 8, 2025 · Code
SSSUMO: Real-Time Semi-Supervised Submovement Decomposition
Evgenii Rudakov, Jonathan Shock, Otto Lappi et al.
This addresses challenges in human-computer interaction, rehabilitation medicine, and motor control studies by providing a fast and accurate method for analyzing human movements.
76 · CL · Feb 17, 2025 · Code
A-MEM: Agentic Memory for LLM Agents
Wujiang Xu, Zujie Liang, Kai Mei et al.
This addresses the need for more adaptive and context-aware memory management in LLM agents, representing a novel method rather than an incremental improvement.
76 · HC · Apr 21, 2025 · Code
NeuGaze: Reshaping the future BCI
Yiqian Yang
This provides a low-cost, accessible alternative to BCIs for motor-impaired users, enabling intuitive human-computer interaction in applications like assistive technology and entertainment.
75 · AI · Mar 11, 2025 · Code
AI-native Memory 2.0: Second Me
Jiale Wei, Xiang Ying, Tao Gao et al.
This addresses the inefficiency of repeated data input for users interacting with various digital platforms, representing a novel approach rather than an incremental improvement.
75 · CV · Oct 24, 2025 · Code
Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging
Ying Xue, Jiaxi Jiang, Rayan Armani et al.
This addresses the challenge of multi-person motion capture in unconstrained environments for applications like virtual reality or sports analysis, representing a novel integration rather than an incremental improvement.
75 · AI · Oct 10, 2025 · Code
GTAlign: Game-Theoretic Alignment of LLM Assistants for Social Welfare
Siqi Zhu, David Zhang, Pedro Cisneros-Velarde et al.
This addresses the issue of misaligned LLM behavior for users in practical applications, offering a novel approach to enhance cooperative outcomes.
75 · AI · Jan 21, 2025
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Yujia Qin, Yining Ye, Junjie Fang et al.
This addresses the challenge of automating GUI tasks for users and developers, offering a novel approach that reduces reliance on heavily wrapped commercial models and expert-crafted workflows.
75 · LG · May 21, 2025 · Code
MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding
Yuxiang Wei, Yanteng Zhang, Xi Xiao et al.
This work addresses the need for interpretable and generalizable brain-computer interfaces for neuroscience and medical applications, representing a novel method rather than an incremental improvement.