LGFeb 17Code
GLM-5: from Vibe Coding to Agentic EngineeringGLM-5 Team, Aohan Zeng, Xin Lv et al. · tsinghua
We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. To advance model alignment and autonomy, we implement a new asynchronous reinforcement learning infrastructure that drastically improves post-training efficiency by decoupling generation from training. Furthermore, we propose novel asynchronous agent RL algorithms that further improve RL quality, enabling the model to learn from complex, long-horizon interactions more effectively. Through these innovations, GLM-5 achieves state-of-the-art performance on major open benchmarks. Most critically, GLM-5 demonstrates unprecedented capability in real-world coding tasks, surpassing previous baselines in handling end-to-end software engineering challenges. Code, models, and more information are available at https://github.com/zai-org/GLM-5.
LGSep 27, 2024Code
CLLMate: A Multimodal Benchmark for Weather and Climate Events ForecastingHaobo Li, Zhaowei Wang, Jiachen Wang et al.
Forecasting weather and climate events is crucial for making appropriate measures to mitigate environmental hazards and minimize losses. However, existing environmental forecasting research focuses narrowly on predicting numerical meteorological variables (e.g., temperature), neglecting the translation of these variables into actionable textual narratives of events and their consequences. To bridge this gap, we proposed Weather and Climate Event Forecasting (WCEF), a new task that leverages numerical meteorological raster data and textual event data to predict weather and climate events. This task is challenging to accomplish due to difficulties in aligning multimodal data and the lack of supervised datasets. To address these challenges, we present CLLMate, the first multimodal dataset for WCEF, using 26,156 environmental news articles aligned with ERA5 reanalysis data. We systematically benchmark 23 existing MLLMs on CLLMate, including closed-source, open-source, and our fine-tuned models. Our experiments reveal the advantages and limitations of existing MLLMs and the value of CLLMate for the training and benchmarking of the WCEF task.
HCJul 17, 2024
StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT InteractionsZixin Chen, Jiachen Wang, Meng Xia et al.
The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this endeavor is challenging due to the absence of datasets focused on student-ChatGPT conversations and the complexities in identifying and analyzing the evolutional interaction patterns within conversations. To address these challenges, we collected conversational data from 48 students interacting with ChatGPT in a master's level data visualization course over one semester. We then developed a coding scheme, grounded in the literature on cognitive levels and thematic analysis, to categorize students' interaction patterns with ChatGPT. Furthermore, we present a visual analytics system, StuGPTViz, that tracks and compares temporal patterns in student prompts and the quality of ChatGPT's responses at multiple scales, revealing significant pedagogical insights for instructors. We validated the system's effectiveness through expert interviews with six data visualization instructors and three case studies. The results confirmed StuGPTViz's capacity to enhance educators' insights into the pedagogical value of ChatGPT. We also discussed the potential research opportunities of applying visual analytics in education and developing AI-driven personalized learning solutions.
LGJun 23, 2025Code
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics LearningLixin Wu, Na Cai, Qiao Cheng et al.
We introduce Confucius3-Math, an open-source large language model with 14B parameters that (1) runs efficiently on a single consumer-grade GPU; (2) achieves SOTA performances on a range of mathematical reasoning tasks, outperforming many models with significantly larger sizes. In particular, as part of our mission to enhancing education and knowledge dissemination with AI, Confucius3-Math is specifically committed to mathematics learning for Chinese K-12 students and educators. Built via post-training with large-scale reinforcement learning (RL), Confucius3-Math aligns with national curriculum and excels at solving main-stream Chinese K-12 mathematical problems with low cost. In this report we share our development recipe, the challenges we encounter and the techniques we develop to overcome them. In particular, we introduce three technical innovations: Targeted Entropy Regularization, Recent Sample Recovery and Policy-Specific Hardness Weighting. These innovations encompass a new entropy regularization, a novel data scheduling policy, and an improved group-relative advantage estimator. Collectively, they significantly stabilize the RL training, improve data efficiency, and boost performance. Our work demonstrates the feasibility of building strong reasoning models in a particular domain at low cost. We open-source our model and code at https://github.com/netease-youdao/Confucius3-Math.
IVJun 23, 2019Code
Fully Automatic Liver Attenuation Estimation Combing CNN Segmentation and Morphological OperationsYuankai Huo, James G. Terry, Jiachen Wang et al.
Manually tracing regions of interest (ROIs) within the liver is the de facto standard method for measuring liver attenuation on computed tomography (CT) in diagnosing nonalcoholic fatty liver disease (NAFLD). However, manual tracing is resource intensive. To address these limitations and to expand the availability of a quantitative CT measure of hepatic steatosis, we propose the automatic liver attenuation ROI-based measurement (ALARM) method for automated liver attenuation estimation. The ALARM method consists of two major stages: (1) deep convolutional neural network (DCNN)-based liver segmentation and (2) automated ROI extraction. First, liver segmentation was achieved using our previously developed SS-Net. Then, a single central ROI (center-ROI) and three circles ROI (periphery-ROI) were computed based on liver segmentation and morphological operations. The ALARM method is available as an open source Docker container (https://github.com/MASILab/ALARM).246 subjects with 738 abdomen CT scans from the African American-Diabetes Heart Study (AA-DHS) were used for external validation (testing), independent from the training and validation cohort (100 clinically acquired CT abdominal scans).
95.7OHApr 15
RFID-based Real-Time Geriatric Gait Speed Monitoring System: Design, Implementation and Clinical EvaluationNatong Lin, Jiachen Wang, Lisa C. Barry et al.
Gait speed is a widely used indicator of functional health and mobility decline, yet in clinical practice it is commonly measured manually using a stopwatch, which limits scalability and measurement frequency. Privacy-preserving and maintenance-free sensing approaches can enable more routine and less burdensome assessments in real-world care settings. This paper presents the design, implementation, and real-world deployment of a fully passive, battery-free gait-speed monitoring system based on ultra-high-frequency (UHF) RFID. Compared with camera- and wearable-based approaches, the proposed system preserves patient privacy by avoiding video capture and biometric data, while eliminating battery maintenance. The system employs a dual-antenna configuration and an edge-based peak-detection algorithm to estimate gait speed in real time from received signal strength indicator (RSSI) streams. By leveraging antenna-beam symmetry and asymmetric signal processing, the method improves robustness to noise, plateau regions, and multiple local maxima. We evaluate the system during routine outpatient care across three clinical sites using 966 trials, achieving an 87.7% measurement success rate. Compared with concurrent stopwatch timing, the system attains a mean absolute error of 0.064 $m/s$, demonstrating reliable operation with accuracy suitable for clinical gait-speed assessment.
69.7SYApr 7
Physics-Informed Neural Optimal Control for Precision Immobilization Technique in Emergency ScenariosYangye Jiang, Jiachen Wang, Daofei Li
Precision Immobilization Technique (PIT) is a potentially effective intervention maneuver for emergency out-of-control vehicle, but its automation is challenged by highly nonlinear collision dynamics, strict safety constraints, and real-time computation requirements. This work presents a PIT-oriented neural optimal-control framework built around PicoPINN (Planning-Informed Compact Physics-Informed Neural Network), a compact physics-informed surrogate obtained through knowledge distillation, hierarchical parameter clustering, and relation-matrix-based parameter reconstruction. A hierarchical neural-OCP (Optimal Control Problem) architecture is then developed, in which an upper virtual decision layer generates PIT decision packages under scenario constraints and a lower coupled-MPC (Model Predictive Control) layer executes interaction-aware control. To evaluate the framework, we construct a PIT Scenario Dataset and conduct surrogate-model comparison, planning-structure ablation, and multi-fidelity assessment from simulation to scaled by-wire vehicle tests. In simulation, adding the upper planning layer improves PIT success rate from 63.8% to 76.7%, and PicoPINN reduces the original PINN parameter count from 8965 to 812 and achieves the smallest average heading error among the learned surrogates (0.112 rad). Scaled vehicle experiments are further used as evidence of control feasibility, with 3 of 4 low-speed controllable-contact PIT trials achieving successful yaw reversal.
AIJul 21, 2025
Micromobility Flow Prediction: A Bike Sharing Station-level Study via Multi-level Spatial-Temporal Attention Neural NetworkXi Yang, Jiachen Wang, Song Han et al.
Efficient use of urban micromobility resources such as bike sharing is challenging due to the unbalanced station-level demand and supply, which causes the maintenance of the bike sharing systems painstaking. Prior efforts have been made on accurate prediction of bike traffics, i.e., demand/pick-up and return/drop-off, to achieve system efficiency. However, bike station-level traffic prediction is difficult because of the spatial-temporal complexity of bike sharing systems. Moreover, such level of prediction over entire bike sharing systems is also challenging due to the large number of bike stations. To fill this gap, we propose BikeMAN, a multi-level spatio-temporal attention neural network to predict station-level bike traffic for entire bike sharing systems. The proposed network consists of an encoder and a decoder with an attention mechanism representing the spatial correlation between features of bike stations in the system and another attention mechanism describing the temporal characteristic of bike station traffic. Through experimental study on over 10 millions trips of bike sharing systems (> 700 stations) of New York City, our network showed high accuracy in predicting the bike station traffic of all stations in the city.
LGJan 18, 2022
Accelerating Representation Learning with View-Consistent Dynamics in Data-Efficient Reinforcement LearningTao Huang, Jiachen Wang, Xiao Chen
Learning informative representations from image-based observations is of fundamental concern in deep Reinforcement Learning (RL). However, data-inefficiency remains a significant barrier to this objective. To overcome this obstacle, we propose to accelerate state representation learning by enforcing view-consistency on the dynamics. Firstly, we introduce a formalism of Multi-view Markov Decision Process (MMDP) that incorporates multiple views of the state. Following the structure of MMDP, our method, View-Consistent Dynamics (VCD), learns state representations by training a view-consistent dynamics model in the latent space, where views are generated by applying data augmentation to states. Empirical evaluation on DeepMind Control Suite and Atari-100k demonstrates VCD to be the SoTA data-efficient algorithm on visual control tasks.
CVNov 21, 2021
Denoised Internal Models: a Brain-Inspired Autoencoder against Adversarial AttacksKaiyuan Liu, Xingyu Li, Yurui Lai et al.
Despite its great success, deep learning severely suffers from robustness; that is, deep neural networks are very vulnerable to adversarial attacks, even the simplest ones. Inspired by recent advances in brain science, we propose the Denoised Internal Models (DIM), a novel generative autoencoder-based model to tackle this challenge. Simulating the pipeline in the human brain for visual signal processing, DIM adopts a two-stage approach. In the first stage, DIM uses a denoiser to reduce the noise and the dimensions of inputs, reflecting the information pre-processing in the thalamus. Inspired from the sparse coding of memory-related traces in the primary visual cortex, the second stage produces a set of internal models, one for each category. We evaluate DIM over 42 adversarial attacks, showing that DIM effectively defenses against all the attacks and outperforms the SOTA on the overall robustness.
HCJan 13, 2021
EventAnchor: Reducing Human Interactions in Event Annotation of Racket Sports VideosDazhen Deng, Jiang Wu, Jiachen Wang et al.
The popularity of racket sports (e.g., tennis and table tennis) leads to high demands for data analysis, such as notational analysis, on player performance. While sports videos offer many benefits for such analysis, retrieving accurate information from sports videos could be challenging. In this paper, we propose EventAnchor, a data analysis framework to facilitate interactive annotation of racket sports video with the support of computer vision algorithms. Our approach uses machine learning models in computer vision to help users acquire essential events from videos (e.g., serve, the ball bouncing on the court) and offers users a set of interactive tools for data annotation. An evaluation study on a table tennis annotation system built on this framework shows significant improvement of user performances in simple annotation tasks on objects of interest and complex annotation tasks requiring domain knowledge.
HCSep 5, 2020
PassVizor: Toward Better Understanding of the Dynamics of Soccer PassesXiao Xie, Jiachen Wang, Hongye Liang et al.
In soccer, passing is the most frequent interaction between players and plays a significant role in creating scoring chances. Experts are interested in analyzing players' passing behavior to learn passing tactics, i.e., how players build up an attack with passing. Various approaches have been proposed to facilitate the analysis of passing tactics. However, the dynamic changes of a team's employed tactics over a match have not been comprehensively investigated. To address the problem, we closely collaborate with domain experts and characterize requirements to analyze the dynamic changes of a team's passing tactics. To characterize the passing tactic employed for each attack, we propose a topic-based approach that provides a high-level abstraction of complex passing behaviors. Based on the model, we propose a glyph-based design to reveal the multi-variate information of passing tactics within different phases of attacks, including player identity, spatial context, and formation. We further design and develop PassVizor, a visual analytics system, to support the comprehensive analysis of passing dynamics. With the system, users can detect the changing patterns of passing tactics and examine the detailed passing process for evaluating passing tactics. We invite experts to conduct analysis with PassVizor and demonstrate the usability of the system through an expert interview.
LGJun 5, 2020
Parallel ensemble methods for causal direction inferenceYulai Zhang, Jiachen Wang, Gang Cen et al.
Inferring the causal direction between two variables from their observation data is one of the most fundamental and challenging topics in data science. A causal direction inference algorithm maps the observation data into a binary value which represents either x causes y or y causes x. The nature of these algorithms makes the results unstable with the change of data points. Therefore the accuracy of the causal direction inference can be improved significantly by using parallel ensemble frameworks. In this paper, new causal direction inference algorithms based on several ways of parallel ensemble are proposed. Theoretical analyses on accuracy rates are given. Experiments are done on both of the artificial data sets and the real world data sets. The accuracy performances of the methods and their computational efficiencies in parallel computing environment are demonstrated.
CVFeb 21, 2019
Lung Cancer Detection using Co-learning from Chest CT Images and Clinical DemographicsJiachen Wang, Riqiang Gao, Yuankai Huo et al.
Early detection of lung cancer is essential in reducing mortality. Recent studies have demonstrated the clinical utility of low-dose computed tomography (CT) to detect lung cancer among individuals selected based on very limited clinical information. However, this strategy yields high false positive rates, which can lead to unnecessary and potentially harmful procedures. To address such challenges, we established a pipeline that co-learns from detailed clinical demographics and 3D CT images. Toward this end, we leveraged data from the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions (MCL), which focuses on early detection of lung cancer. A 3D attention-based deep convolutional neural net (DCNN) is proposed to identify lung cancer from the chest CT scan without prior anatomical location of the suspicious nodule. To improve upon the non-invasive discrimination between benign and malignant, we applied a random forest classifier to a dataset integrating clinical information to imaging data. The results show that the AUC obtained from clinical demographics alone was 0.635 while the attention network alone reached an accuracy of 0.687. In contrast when applying our proposed pipeline integrating clinical and imaging variables, we reached an AUC of 0.787 on the testing dataset. The proposed network both efficiently captures anatomical information for classification and also generates attention maps that explain the features that drive performance.
CVJan 7, 2019
Reproducibility Evaluation of SLANT Whole Brain Segmentation Across Clinical Magnetic Resonance Imaging ProtocolsYunxi Xiong, Yuankai Huo, Jiachen Wang et al.
Whole brain segmentation on structural magnetic resonance imaging (MRI) is essential for understanding neuroanatomical-functional relationships. Traditionally, multi-atlas segmentation has been regarded as the standard method for whole brain segmentation. In past few years, deep convolutional neural network (DCNN) segmentation methods have demonstrated their advantages in both accuracy and computational efficiency. Recently, we proposed the spatially localized atlas network tiles (SLANT) method, which is able to segment a 3D MRI brain scan into 132 anatomical regions. Commonly, DCNN segmentation methods yield inferior performance under external validations, especially when the testing patterns were not presented in the training cohorts. Recently, we obtained a clinically acquired, multi-sequence MRI brain cohort with 1480 clinically acquired, de-identified brain MRI scans on 395 patients using seven different MRI protocols. Moreover, each subject has at least two scans from different MRI protocols. Herein, we assess the SLANT method's intra- and inter-protocol reproducibility. SLANT achieved less than 0.05 coefficient of variation (CV) for intra-protocol experiments and less than 0.15 CV for inter-protocol experiments. The results show that the SLANT method achieved high intra- and inter- protocol reproducibility.
CVNov 10, 2018
Coronary Calcium Detection using 3D Attention Identical Dual Deep Network Based on Weakly Supervised LearningYuankai Huo, James G. Terry, Jiachen Wang et al.
Coronary artery calcium (CAC) is biomarker of advanced subclinical coronary artery disease and predicts myocardial infarction and death prior to age 60 years. The slice-wise manual delineation has been regarded as the gold standard of coronary calcium detection. However, manual efforts are time and resource consuming and even impracticable to be applied on large-scale cohorts. In this paper, we propose the attention identical dual network (AID-Net) to perform CAC detection using scan-rescan longitudinal non-contrast CT scans with weakly supervised attention by only using per scan level labels. To leverage the performance, 3D attention mechanisms were integrated into the AID-Net to provide complementary information for classification tasks. Moreover, the 3D Gradient-weighted Class Activation Mapping (Grad-CAM) was also proposed at the testing stage to interpret the behaviors of the deep neural network. 5075 non-contrast chest CT scans were used as training, validation and testing datasets. Baseline performance was assessed on the same cohort. From the results, the proposed AID-Net achieved the superior performance on classification accuracy (0.9272) and AUC (0.9627).