Junyuan Liang

h-index17
2papers

2 Papers

CVApr 1
Gaze to Insight: A Scalable AI Approach for Detecting Gaze Behaviours in Face-to-Face Collaborative Learning

Junyuan Liang, Qi Zhou, Sahan Bulathwela et al.

Previous studies have illustrated the potential of analysing gaze behaviours in collaborative learning to provide educationally meaningful information for students to reflect on their learning. Over the past decades, machine learning approaches have been developed to automatically detect gaze behaviours from video data. Yet, since these approaches often require large amounts of labelled data for training, human annotation remains necessary. Additionally, researchers have questioned the cross-configuration robustness of machine learning models developed, as training datasets often fail to encompass the full range of situations encountered in educational contexts. To address these challenges, this study proposes a scalable artificial intelligence approach that leverages pretrained and foundation models to automatically detect gaze behaviours in face-to-face collaborative learning contexts without requiring human-annotated data. The approach utilises pretrained YOLO11 for person tracking, YOLOE-26 with text-prompt capability for education-related object detection, and the Gaze-LLE model for gaze target prediction. The results indicate that the proposed approach achieves an F1-score of 0.829 in detecting students' gaze behaviours from video data, with strong performance for laptop-directed gaze and peer-directed gaze, yet weaker performance for other gaze targets. Furthermore, when compared to other supervised machine learning approaches, the proposed method demonstrates superior and more stable performance in complex contexts, highlighting its better cross-configuration robustness. The implications of this approach for supporting students' collaborative learning in real-world environments are also discussed.

CLJul 10, 2025
Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing

Junyi Wen, Junyuan Liang, Zicong Hong et al.

Efficient state restoration in multi-turn conversations with large language models (LLMs) remains a critical challenge, primarily due to the overhead of recomputing or loading full key-value (KV) caches for all historical tokens. To address this, existing approaches compress KV caches across adjacent layers with highly similar attention patterns. However, these methods often apply a fixed compression scheme across all conversations, selecting the same layer pairs for compression without considering conversation-specific attention dynamics. This static strategy overlooks variability in attention pattern similarity across different conversations, which can lead to noticeable accuracy degradation. We present Krul, a multi-turn LLM inference system that enables accurate and efficient KV cache restoration. Krul dynamically selects compression strategies based on attention similarity across layer pairs and uses a recomputation-loading pipeline to restore the KV cache. It introduces three key innovations: 1) a preemptive compression strategy selector to preserve critical context for future conversation turns and selects a customized strategy for the conversation; 2) a token-wise heterogeneous attention similarity estimator to mitigate the attention similarity computation and storage overhead during model generation; 3) a bubble-free restoration scheduler to reduce potential bubbles brought by the imbalance of recomputing and loading stream due to compressed KV caches. Empirical evaluations on real-world tasks demonstrate that Krul achieves a 1.5x-2.68x reduction in time-to-first-token (TTFT) and a 1.33x-2.35x reduction in KV cache storage compared to state-of-the-art methods without compromising generation quality.