Hongbao Zhang

LG
h-index9
7papers
55citations
Novelty53%
AI Score41

7 Papers

DBMay 16
MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing

Han Chen, Zining Zhang, Wenqi Pei et al.

Memory is a fundamental component for enabling long-context LLM agents, supporting persistent state across interactions through a continuous serve-and-update lifecycle. Despite substantial prior work, existing systems suffer from significant maintenance overhead due to two key limitations: coarse-grained state management and inherently sequential update pipelines. In particular, updates are often tightly coupled with LLM inference and require full-state rewrites, leading to poor scalability and growing latency as memory accumulates. To address these challenges, we present MemForest, a memory framework that reformulates agent memory as a write-efficient temporal data management problem. MemForest breaks the sequential bottleneck via parallel chunk extraction, decoupling memory construction into concurrent, independent operations. To further eliminate coarse-grained maintenance, we introduce MemTree, a hierarchical temporal index that organizes memory as time-ordered trees rather than flat global summaries. This design replaces full-state rewrites with localized per-node updates, reducing maintenance cost to the affected tree paths while naturally preserving temporally evolving states. We evaluate MemForest on two long-context memory benchmarks, LongMemEval-S and LoCoMo. On LongMemEval-S, MemForest achieves the best overall performance among stateful baselines, reaching 79.8% pass@1 accuracy while sustaining a memory construction throughput approximately 6x higher than state-of-the-art approaches including EverMemOS.

CLFeb 16, 2025
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models

Zihao Zhu, Hongbao Zhang, Ruotong Wang et al.

Large Reasoning Models (LRMs) are designed to solve complex tasks by generating explicit reasoning traces before producing final answers. However, we reveal a critical vulnerability in LRMs -- termed Unthinking Vulnerability -- wherein the thinking process can be bypassed by manipulating special delimiter tokens. It is empirically demonstrated to be widespread across mainstream LRMs, posing both a significant risk and potential utility, depending on how it is exploited. In this paper, we systematically investigate this vulnerability from both malicious and beneficial perspectives. On the malicious side, we introduce Breaking of Thought (BoT), a novel attack that enables adversaries to bypass the thinking process of LRMs, thereby compromising their reliability and availability. We present two variants of BoT: a training-based version that injects backdoor during the fine-tuning stage, and a training-free version based on adversarial attack during the inference stage. As a potential defense, we propose thinking recovery alignment to partially mitigate the vulnerability. On the beneficial side, we introduce Monitoring of Thought (MoT), a plug-and-play framework that allows model owners to enhance efficiency and safety. It is implemented by leveraging the same vulnerability to dynamically terminate redundant or risky reasoning through external monitoring. Extensive experiments show that BoT poses a significant threat to reasoning reliability, while MoT provides a practical solution for preventing overthinking and jailbreaking. Our findings expose an inherent flaw in current LRM architectures and underscore the need for more robust reasoning systems in the future.

CVDec 7, 2024
HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing

Zihao Zhu, Hongbao Zhang, Guanzong Wu et al.

Visual-textual inconsistency (VTI) evaluation plays a crucial role in cleansing vision-language data. Its main challenges stem from the high variety of image captioning datasets, where differences in content can create a range of inconsistencies (\eg, inconsistencies in scene, entities, entity attributes, entity numbers, entity interactions). Moreover, variations in caption length can introduce inconsistencies at different levels of granularity as well. To tackle these challenges, we design an adaptive evaluation framework, called Hierarchical and Multi-Grained Inconsistency Evaluation (HMGIE), which can provide multi-grained evaluations covering both accuracy and completeness for various image-caption pairs. Specifically, the HMGIE framework is implemented by three consecutive modules. Firstly, the semantic graph generation module converts the image caption to a semantic graph for building a structural representation of all involved semantic items. Then, the hierarchical inconsistency evaluation module provides a progressive evaluation procedure with a dynamic question-answer generation and evaluation strategy guided by the semantic graph, producing a hierarchical inconsistency evaluation graph (HIEG). Finally, the quantitative evaluation module calculates the accuracy and completeness scores based on the HIEG, followed by a natural language explanation about the detection results. Moreover, to verify the efficacy and flexibility of the proposed framework on handling different image captioning datasets, we construct MVTID, an image-caption dataset with diverse types and granularities of inconsistencies. Extensive experiments on MVTID and other benchmark datasets demonstrate the superior performance of the proposed HMGIE to current state-of-the-art methods.

LGOct 30, 2020
Validate and Enable Machine Learning in Industrial AI

Hongbo Zou, Guangjing Chen, Pengtao Xie et al.

Industrial Artificial Intelligence (Industrial AI) is an emerging concept which refers to the application of artificial intelligence to industry. Industrial AI promises more efficient future industrial control systems. However, manufacturers and solution partners need to understand how to implement and integrate an AI model into the existing industrial control system. A well-trained machine learning (ML) model provides many benefits and opportunities for industrial control optimization; however, an inferior Industrial AI design and integration limits the capability of ML models. To better understand how to develop and integrate trained ML models into the traditional industrial control system, test the deployed AI control system, and ultimately outperform traditional systems, manufacturers and their AI solution partners need to address a number of challenges. Six top challenges, which were real problems we ran into when deploying Industrial AI, are explored in the paper. The Petuum Optimum system is used as an example to showcase the challenges in making and testing AI models, and more importantly, how to address such challenges in an Industrial AI system.

LGAug 5, 2018
Missing Value Imputation Based on Deep Generative Models

Hongbao Zhang, Pengtao Xie, Eric Xing

Missing values widely exist in many real-world datasets, which hinders the performing of advanced data analytics. Properly filling these missing values is crucial but challenging, especially when the missing rate is high. Many approaches have been proposed for missing value imputation (MVI), but they are mostly heuristics-based, lacking a principled foundation and do not perform satisfactorily in practice. In this paper, we propose a probabilistic framework based on deep generative models for MVI. Under this framework, imputing the missing entries amounts to seeking a fixed-point solution between two conditional distributions defined on the missing entries and latent variables respectively. These distributions are parameterized by deep neural networks (DNNs) which possess high approximation power and can capture the nonlinear relationships between missing entries and the observed values. The learning of weight parameters of DNNs is performed by maximizing an approximation of the log-likelihood of observed values. We conducted extensive evaluation on 13 datasets and compared with 11 baselines methods, where our methods largely outperforms the baselines.

LGNov 25, 2017
Learning Less-Overlapping Representations

Pengtao Xie, Hongbao Zhang, Eric P. Xing

In representation learning (RL), how to make the learned representations easy to interpret and less overfitted to training data are two important but challenging issues. To address these problems, we study a new type of regulariza- tion approach that encourages the supports of weight vectors in RL models to have small overlap, by simultaneously promoting near-orthogonality among vectors and sparsity of each vector. We apply the proposed regularizer to two models: neural networks (NNs) and sparse coding (SC), and develop an efficient ADMM-based algorithm for regu- larized SC. Experiments on various datasets demonstrate that weight vectors learned under our regularizer are more interpretable and have better generalization performance.

CLNov 4, 2017
Predicting Discharge Medications at Admission Time Based on Deep Learning

Yuan Yang, Pengtao Xie, Xin Gao et al.

Predicting discharge medications right after a patient being admitted is an important clinical decision, which provides physicians with guidance on what type of medication regimen to plan for and what possible changes on initial medication may occur during an inpatient stay. It also facilitates medication reconciliation process with easy detection of medication discrepancy at discharge time to improve patient safety. However, since the information available upon admission is limited and patients' condition may evolve during an inpatient stay, these predictions could be a difficult decision for physicians to make. In this work, we investigate how to leverage deep learning technologies to assist physicians in predicting discharge medications based on information documented in the admission note. We build a convolutional neural network which takes an admission note as input and predicts the medications placed on the patient at discharge time. Our method is able to distill semantic patterns from unstructured and noisy texts, and is capable of capturing the pharmacological correlations among medications. We evaluate our method on 25K patient visits and compare with 4 strong baselines. Our methods demonstrate a 20% increase in macro-averaged F1 score than the best baseline.