Boyang Zhang

LG
h-index19
30papers
177citations
Novelty55%
AI Score57

30 Papers

ROJun 4
MPCoT: Reward-Guided Multi-Path Latent Reasoning for Test-Time Scalable Vision-Language-Action

Boyang Zhang, Lianlei Shan

Vision-Language-Action (VLA) policies remain brittle in long-horizon and high-uncertainty control, where one-pass action decoding provides limited inference-time deliberation. Explicit chain-of-thought can increase reasoning depth, but introduces token latency and an indirect text-to-action interface. We propose MPCoT, a reward-guided multi-path latent reasoning framework that initializes $M$ hypotheses, refines them for K weight-tied steps, and softly aggregates them before action decoding. A training-only path-preference objective evaluates candidate action branches with expert-action consistency, world-model/VLM-based progress, and success feedback to align the latent path scorer with downstream execution quality. MPCoT preserves the original 8-step action interface, generates zero reasoning tokens, and exposes configurable inference controls (K,M). Under matched protocols on LIBERO and CALVIN, MPCoT improves long-horizon performance, with ablations confirming depth-width effects, confidence-weighted aggregation, and reward-guided path supervision.

CVOct 7, 2022Code
Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos

Boyang Zhang, SuPing Wu, Hu Cao et al.

In this paper, we present a spatio-temporal tendency reasoning (STR) network for recovering human body pose and shape from videos. Previous approaches have focused on how to extend 3D human datasets and temporal-based learning to promote accuracy and temporal smoothing. Different from them, our STR aims to learn accurate and natural motion sequences in an unconstrained environment through temporal and spatial tendency and to fully excavate the spatio-temporal features of existing video data. To this end, our STR learns the representation of features in the temporal and spatial dimensions respectively, to concentrate on a more robust representation of spatio-temporal features. More specifically, for efficient temporal modeling, we first propose a temporal tendency reasoning (TTR) module. TTR constructs a time-dimensional hierarchical residual connection representation within a video sequence to effectively reason temporal sequences' tendencies and retain effective dissemination of human information. Meanwhile, for enhancing the spatial representation, we design a spatial tendency enhancing (STE) module to further learns to excite spatially time-frequency domain sensitive features in human motion information representations. Finally, we introduce integration strategies to integrate and refine the spatio-temporal feature representations. Extensive experimental findings on large-scale publically available datasets reveal that our STR remains competitive with the state-of-the-art on three datasets. Our code are available at https://github.com/Changboyang/STR.git.

CRJul 30, 2024
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification

Boyang Zhang, Yicong Tan, Yun Shen et al.

Recently, autonomous agents built on large language models (LLMs) have experienced significant development and are being deployed in real-world applications. These agents can extend the base LLM's capabilities in multiple ways. For example, a well-built agent using GPT-3.5-Turbo as its core can outperform the more advanced GPT-4 model by leveraging external components. More importantly, the usage of tools enables these systems to perform actions in the real world, moving from merely generating text to actively interacting with their environment. Given the agents' practical applications and their ability to execute consequential actions, it is crucial to assess potential vulnerabilities. Such autonomous systems can cause more severe damage than a standalone language model if compromised. While some existing research has explored harmful actions by LLM agents, our study approaches the vulnerability from a different perspective. We introduce a new type of attack that causes malfunctions by misleading the agent into executing repetitive or irrelevant actions. We conduct comprehensive evaluations using various attack methods, surfaces, and properties to pinpoint areas of susceptibility. Our experiments reveal that these attacks can induce failure rates exceeding 80\% in multiple scenarios. Through attacks on implemented and deployable agents in multi-agent scenarios, we accentuate the realistic risks associated with these vulnerabilities. To mitigate such attacks, we propose self-examination detection methods. However, our findings indicate these attacks are difficult to detect effectively using LLMs alone, highlighting the substantial risks associated with this vulnerability.

CRFeb 23, 2023
A Plot is Worth a Thousand Words: Model Information Stealing Attacks via Scientific Plots

Boyang Zhang, Xinlei He, Yun Shen et al.

Building advanced machine learning (ML) models requires expert knowledge and many trials to discover the best architecture and hyperparameter settings. Previous work demonstrates that model information can be leveraged to assist other attacks, such as membership inference, generating adversarial examples. Therefore, such information, e.g., hyperparameters, should be kept confidential. It is well known that an adversary can leverage a target ML model's output to steal the model's information. In this paper, we discover a new side channel for model information stealing attacks, i.e., models' scientific plots which are extensively used to demonstrate model performance and are easily accessible. Our attack is simple and straightforward. We leverage the shadow model training techniques to generate training data for the attack model which is essentially an image classifier. Extensive evaluation on three benchmark datasets shows that our proposed attack can effectively infer the architecture/hyperparameters of image classifiers based on convolutional neural network (CNN) given the scientific plot generated from it. We also reveal that the attack's success is mainly caused by the shape of the scientific plots, and further demonstrate that the attacks are robust in various scenarios. Given the simplicity and effectiveness of the attack method, our study indicates scientific plots indeed constitute a valid side channel for model information stealing attacks. To mitigate the attacks, we propose several defense mechanisms that can reduce the original attacks' accuracy while maintaining the plot utility. However, such defenses can still be bypassed by adaptive attacks.

CROct 19, 2023
SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models

Boyang Zhang, Zheng Li, Ziqing Yang et al.

While advanced machine learning (ML) models are deployed in numerous real-world applications, previous works demonstrate these models have security and privacy vulnerabilities. Various empirical research has been done in this field. However, most of the experiments are performed on target ML models trained by the security researchers themselves. Due to the high computational resource requirement for training advanced models with complex architectures, researchers generally choose to train a few target models using relatively simple architectures on typical experiment datasets. We argue that to understand ML models' vulnerabilities comprehensively, experiments should be performed on a large set of models trained with various purposes (not just the purpose of evaluating ML attacks and defenses). To this end, we propose using publicly available models with weights from the Internet (public models) for evaluating attacks and defenses on ML models. We establish a database, namely SecurityNet, containing 910 annotated image classification models. We then analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on these public models. Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models. We share SecurityNet with the research community. and advocate researchers to perform experiments on public models to better demonstrate their proposed methods' effectiveness in the future.

ROMay 23
RoboHitch: Learning Visual Affordance from Disordered Keypoints for Hitch Knots Tying

Jiahui Zuo, Boyang Zhang, Fumin Zhang

Robotic manipulation of deformable linear objects (DLOs) presents significant challenges due to complex dynamics and frequent self-occlusions. Existing robotic knot tying methods typically rely on precise topological state tracking with ordered keypoints and explicit edge connectivity. This reliance makes them prone to failures due to tracking drift and topology mismatch caused by repeated bending and crossings during knot formation.To address these limitations, we introduce RoboHitch, a novel framework that learns to perform hitch knot tying from human demonstrations using only disordered 3D keypoints and RGB images. This eliminates the need for explicit topological order, allowing for more flexible manipulation. Our method employs a dynamic Graph Autoencoder to extract geometric features from untracked keypoints, complemented by a Convolutional Autoencoder that captures essential visual context. A bidirectional cross-attention mechanism then fuses these modalities to jointly predict pick and place affordances, facilitating implicit reasoning about the rope's state and enabling knot tying under occlusion.Real-world experiments demonstrate the effectiveness and generalizability of our approach, successfully completing hitch knots in scenarios with self-occlusions.

CYNov 3, 2023
Comprehensive Assessment of Toxicity in ChatGPT

Boyang Zhang, Xinyue Shen, Wai Man Si et al.

Moderating offensive, hateful, and toxic language has always been an important but challenging topic in the domain of safe use in NLP. The emerging large language models (LLMs), such as ChatGPT, can potentially further accentuate this threat. Previous works have discovered that ChatGPT can generate toxic responses using carefully crafted inputs. However, limited research has been done to systematically examine when ChatGPT generates toxic responses. In this paper, we comprehensively evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets that closely align with real-world scenarios. Our results show that ChatGPT's toxicity varies based on different properties and settings of the prompts, including tasks, domains, length, and languages. Notably, prompts in creative writing tasks can be 2x more likely than others to elicit toxic responses. Prompting in German and Portuguese can also double the response toxicity. Additionally, we discover that certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses. We hope our discoveries can guide model developers to better regulate these AI systems and the users to avoid undesirable outputs.

ROMar 11
TacLoc: Global Tactile Localization on Objects from a Registration Perspective

Zirui Zhang, Boyang Zhang, Fumin Zhang et al.

Pose estimation is essential for robotic manipulation, particularly when visual perception is occluded during gripper-object interactions. Existing tactile-based methods generally rely on tactile simulation or pre-trained models, which limits their generalizability and efficiency. In this study, we propose TacLoc, a novel tactile localization framework that formulates the problem as a one-shot point cloud registration task. TacLoc introduces a graph-theoretic partial-to-full registration method, leveraging dense point clouds and surface normals from tactile sensing for efficient and accurate pose estimation. Without requiring rendered data or pre-trained models, TacLoc achieves improved performance through normal-guided graph pruning and a hypothesis-and-verification pipeline. TacLoc is evaluated extensively on the YCB dataset. We further demonstrate TacLoc on real-world objects across two different visual-tactile sensors.

CVApr 9Code
ParseBench: A Document Parsing Benchmark for AI Agents

Boyang Zhang, Sebastián G. Acosta, Preston Carlson et al.

AI agents are changing the requirements for document parsing. What matters is \emph{semantic correctness}: parsed output must preserve the structure and meaning needed for autonomous decisions, including correct table structure, precise chart data, semantically meaningful formatting, and visual grounding. Existing benchmarks do not fully capture this setting for enterprise automation, relying on narrow document distributions and text-similarity metrics that miss agent-critical failures. We introduce \textbf{ParseBench}, a benchmark of ${\sim}2{,}000$ human-verified pages from enterprise documents spanning insurance, finance, and government, organized around five capability dimensions: tables, charts, content faithfulness, semantic formatting, and visual grounding. Across 14 methods spanning vision-language models, specialized document parsers, and LlamaParse, the benchmark reveals a fragmented capability landscape: no method is consistently strong across all five dimensions. LlamaParse Agentic achieves the highest overall score at \agenticoverall\%, and the benchmark highlights the remaining capability gaps across current systems. Dataset and evaluation code are available on \href{https://huggingface.co/datasets/llamaindex/ParseBench}{HuggingFace} and \href{https://github.com/run-llama/ParseBench}{GitHub}.

CVDec 9, 2024Code
Compression for Better: A General and Stable Lossless Compression Framework

Boyang Zhang, Daning Cheng, Yunquan Zhang et al.

This work focus on how to stabilize and lossless model compression, aiming to reduce model complexity and enhance efficiency without sacrificing performance due to compression errors. A key challenge is effectively leveraging compression errors and defining the boundaries for lossless compression to minimize model loss. i.e., compression for better. Currently, there is no systematic approach to determining this error boundary or understanding its specific impact on model performance. We propose a general \textbf{L}oss\textbf{L}ess \textbf{C}ompression theoretical framework (\textbf{LLC}), which further delineates the compression neighborhood and higher-order analysis boundaries through the total differential, thereby specifying the error range within which a model can be compressed without loss. To verify the effectiveness of LLC, we apply various compression techniques, including quantization and decomposition. Specifically, for quantization, we reformulate the classic quantization search problem as a grouped knapsack problem within the lossless neighborhood, achieving lossless quantization while improving computational efficiency. For decomposition, LLC addresses the approximation problem under low-rank constraints, automatically determining the rank for each layer and producing lossless low-rank models. We conduct extensive experiments on multiple neural network architectures on different datasets. The results show that without fancy tricks, LLC can effectively achieve lossless model compression. Our code will be made publicly.

LGDec 9, 2024Code
Lossless Model Compression via Joint Low-Rank Factorization Optimization

Boyang Zhang, Daning Cheng, Yunquan Zhang et al.

Low-rank factorization is a popular model compression technique that minimizes the error $δ$ between approximated and original weight matrices. Despite achieving performances close to the original models when $δ$ is optimized, a performance discrepancy remains due to the separate optimization processes for low-rank factorization and model performance, resulting in unavoidable losses. We address this issue by introducing a novel joint optimization strategy for lossless low-rank weight factorization, which, for the first time, enhances the model's performance beyond the original. Our approach begins with a theoretical analysis of the relationship between low-rank factorization and model optimization objectives, establishing a precise perturbation range for matrix factorization errors on model performance. This challenge is then reformulated as a numerical rank deficiency problem with inequality constraints and develop a joint objective that simultaneously addresses factorization error and model performance. Based on the above analysis, we propose two optimization algorithms: \textbf{a lossless optimization algorithm} that maximizes model accuracy while ensuring compression, and \textbf{a compact optimization algorithm} that minimizes model size while preserving performance. These algorithms do not require fine-tuning and can directly compress numerous deep models to achieve lossless results. Our methods demonstrate robust efficacy across various vision and language tasks. For example, the compressed model reduced by 70\% on ResNext50 outperforms the original. Our code will be made public.

OCMay 13
Proximal-Based Generative Modeling for Bayesian Inverse Problems

Boyang Zhang, Zhiguo Wang, Ya-Feng Liu

Score-based diffusion models demonstrate superior performance in generative tasks but encounter fundamental bottlenecks in inverse problems due to the analytical intractability of the time-dependent likelihood score. To bridge this gap, we propose a novel proximal-based generative modeling (PGM) framework that rigorously circumvents explicit likelihood evaluation. Our framework is built upon a theoretical equivalence between Gaussian convolution in diffusion processes and Moreau-Yosida regularization in nonsmooth optimization. This enables a new sampling mechanism driven by the proposed Moreau score, which admits a closed-form expression via proximal operators. Moreover, we introduce Moreau score matching to learn the proximal operators that rely solely on samples drawn from the prior distribution. Theoretically, PGM eliminates the early-stopping bias inherent in the score-based diffusion model and achieves non-asymptotic convergence. Experiments demonstrate that PGM significantly surpasses state-of-the-art methods in reconstruction quality and sampling time.

OPTICSMay 23, 2025Code
SP2RINT: Spatially-Decoupled Physics-Inspired Progressive Inverse Optimization for Scalable, PDE-Constrained Meta-Optical Neural Network Training

Pingchuan Ma, Ziang Yin, Qi Jing et al.

DONNs leverage light propagation for efficient analog AI and signal processing. Advances in nanophotonic fabrication and metasurface-based wavefront engineering have opened new pathways to realize high-capacity DONNs across various spectral regimes. Training such DONN systems to determine the metasurface structures remains challenging. Heuristic methods are fast but oversimplify metasurfaces modulation, often resulting in physically unrealizable designs and significant performance degradation. Simulation-in-the-loop optimizes implementable metasurfaces via adjoint methods, but is computationally prohibitive and unscalable. To address these limitations, we propose SP2RINT, a spatially decoupled, progressive training framework that formulates DONN training as a PDE-constrained learning problem. Metasurface responses are first relaxed into freely trainable transfer matrices with a banded structure. We then progressively enforce physical constraints by alternating between transfer matrix training and adjoint-based inverse design, avoiding per-iteration PDE solves while ensuring final physical realizability. To further reduce runtime, we introduce a physics-inspired, spatially decoupled inverse design strategy based on the natural locality of field interactions. This approach partitions the metasurface into independently solvable patches, enabling scalable and parallel inverse design with system-level calibration. Evaluated across diverse DONN training tasks, SP2RINT achieves digital-comparable accuracy while being 1825 times faster than simulation-in-the-loop approaches. By bridging the gap between abstract DONN models and implementable photonic hardware, SP2RINT enables scalable, high-performance training of physically realizable meta-optical neural systems. Our code is available at https://github.com/ScopeX-ASU/SP2RINT

LGMay 8
A Qualitative Test-Risk Mechanism for Scaling Behavior in Normalized Residual Networks

Daning Cheng, Zeyu Liu, Jun Sun et al.

The scaling behavior, in which test performance often improves as model size and data increase, is a central empirical phenomenon in modern deep learning, yet its theoretical basis remains incomplete. In this paper, we study depth expansion in normalized residual networks: starting from a trained model in an old hypothesis class, we insert a new residual block at an intermediate layer and ask when such an expansion can yield a provable improvement in test risk. We develop a unified framework that decomposes this question into representational gain, optimization gain, and generalization transfer. First, under a first-order descent condition near zero initialization, we prove that the expanded hypothesis class contains an auxiliary jumpboard model with strictly smaller population risk than the original model. Second, under norm control tailored to post-normalized residual architectures, we establish a norm-based Rademacher complexity bound for the expanded model class. These ingredients lead to two complementary test-risk guarantees: one route passes through population risk and is tighter when a positive population margin is available, while the other works directly at the train/test level, avoids Hoeffding transfer, and is more robust in degenerate regimes. Together, these results provide a theorem-driven mechanism under which residual depth expansion can improve test performance in normalized residual networks. More broadly, they suggest that scaling is inherently joint: depth creates new improving directions, width enhances the finite-sample observability of weak signals, and data determines whether the statistical cost of expansion can be controlled.

CLFeb 26
Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent

Boyang Zhang, Yang Zhang

The rapid advancement of large language models (LLMs) has enabled powerful authorship inference capabilities, raising growing concerns about unintended deanonymization risks in textual data such as news articles. In this work, we introduce an LLM agent designed to evaluate and mitigate such risks through a structured, interpretable pipeline. Central to our framework is the proposed $\textit{SALA}$ (Stylometry-Assisted LLM Analysis) method, which integrates quantitative stylometric features with LLM reasoning for robust and transparent authorship attribution. Experiments on large-scale news datasets demonstrate that $\textit{SALA}$, particularly when augmented with a database module, achieves high inference accuracy in various scenarios. Finally, we propose a guided recomposition strategy that leverages the agent's reasoning trace to generate rewriting prompts, effectively reducing authorship identifiability while preserving textual meaning. Our findings highlight both the deanonymization potential of LLM agents and the importance of interpretable, proactive defenses for safeguarding author privacy.

LGNov 10, 2025
Rethinking Parameter Sharing as Graph Coloring for Structured Compression

Boyang Zhang, Daning Cheng, Yunquan Zhang

Modern deep models have massive parameter sizes, leading to high inference-time memory usage that limits practical deployment. Parameter sharing, a form of structured compression, effectively reduces redundancy, but existing approaches remain heuristic-restricted to adjacent layers and lacking a systematic analysis for cross-layer sharing. However, extending sharing across multiple layers leads to an exponentially expanding configuration space, making exhaustive search computationally infeasible and forming a critical bottleneck for parameter sharing. We recast parameter sharing from a group-theoretic perspective as introducing structural symmetries in the model's parameter space. A sharing configuration can be described by a coloring function $α:L\rightarrow C$ (L: layer indices and C: sharing classes), which determines inter-layer sharing groups while preserving structural symmetry. To determine the coloring function, we propose a second-order geometric criterion based on Taylor expansion and the Hessian spectrum. By projecting perturbations onto the Hessian's low-curvature eigensubspace, the criterion provides an analytic rule for selecting sharing groups that minimize performance impact, yielding a principled and scalable configuration procedure. Across diverse architectures and tasks, Geo-Sharing consistently outperforms state-of-the-art heuristic sharing strategies, achieving higher compression ratios with smaller accuracy degradation.

LGDec 28, 2025
FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment

Boyang Zhang, Xiaobing Chen, Songyang Zhang et al.

Mixture-of-Experts (MoE) models enable scalable neural networks through conditional computation. However, their deployment with federated learning (FL) faces two critical challenges: 1) resource-constrained edge devices cannot store full expert sets, and 2) non-IID data distributions cause severe expert load imbalance that degrades model performance. To this end, we propose \textbf{FLEX-MoE}, a novel federated MoE framework that jointly optimizes expert assignment and load balancing under limited client capacity. Specifically, our approach introduces client-expert fitness scores that quantify the expert suitability for local datasets through training feedback, and employs an optimization-based algorithm to maximize client-expert specialization while enforcing balanced expert utilization system-wide. Unlike existing greedy methods that focus solely on personalization while ignoring load imbalance, our FLEX-MoE is capable of addressing the expert utilization skew, which is particularly severe in FL settings with heterogeneous data. Our comprehensive experiments on three different datasets demonstrate the superior performance of the proposed FLEX-MoE, together with its ability to maintain balanced expert utilization across diverse resource-constrained scenarios.

LGJan 29
HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

Jinhao Zhang Yunquan Zhang, Zicheng yan, Boyang Zhang et al.

Post Training Quantization (PTQ), a mainstream model compression technique, often leads to the paradoxical 'low error, high loss' phenomenon because it focuses solely on minimizing quantization error. The root cause lies in the Hessian matrix of the LLM loss landscape: a few high curvature directions are extremely sensitive to perturbations. To address this, we propose the Hessian Robust Quantization (HeRo Q) algorithm, which applies a lightweight, learnable rotation-compression matrix to the weight space prior to quantization. This joint framework reshapes the loss landscape by reducing the largest Hessian eigenvalue and reducing its max eigenvalue, thereby significantly enhancing robustness to quantization noise. HeRo-Q requires no architectural modifications, incurs negligible computational overhead, and integrates seamlessly into existing PTQ pipelines. Experiments on Llama and Qwen models show that HeRo Q consistently outperforms state of the art methods including GPTQ, AWQ, and SpinQuant not only achieving superior performance under standard W4A8 settings, but also excelling in the highly challenging W3A16 ultra low bit regime, where it boosts GSM8K accuracy on Llama3 8B to 70.15\% and effectively avoids the logical collapse commonly seen in aggressive quantization.

ASFeb 26, 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

Ziyue Jiang, Yi Ren, Ruiqi Li et al.

While recent zero-shot text-to-speech (TTS) models have significantly improved speech quality and expressiveness, mainstream systems still suffer from issues related to speech-text alignment modeling: 1) models without explicit speech-text alignment modeling exhibit less robustness, especially for hard sentences in practical applications; 2) predefined alignment-based models suffer from naturalness constraints of forced alignments. This paper introduces \textit{MegaTTS 3}, a TTS system featuring an innovative sparse alignment algorithm that guides the latent diffusion transformer (DiT). Specifically, we provide sparse alignment boundaries to MegaTTS 3 to reduce the difficulty of alignment without limiting the search space, thereby achieving high naturalness. Moreover, we employ a multi-condition classifier-free guidance strategy for accent intensity adjustment and adopt the piecewise rectified flow technique to accelerate the generation process. Experiments demonstrate that MegaTTS 3 achieves state-of-the-art zero-shot TTS speech quality and supports highly flexible control over accent intensity. Notably, our system can generate high-quality one-minute speech with only 8 sampling steps. Audio samples are available at https://sditdemo.github.io/sditdemo/.

LGDec 9, 2024
FP=xINT:A Low-Bit Series Expansion Algorithm for Post-Training Quantization

Boyang Zhang, Daning Cheng, Yunquan Zhang et al.

Post-Training Quantization (PTQ) converts pre-trained Full-Precision (FP) models into quantized versions without training. While existing methods reduce size and computational costs, they also significantly degrade performance and quantization efficiency at extremely low settings due to quantization noise. We introduce a deep model series expansion framework to address this issue, enabling rapid and accurate approximation of unquantized models without calibration sets or fine-tuning. This is the first use of series expansion for neural network quantization. Specifically, our method expands the FP model into multiple low-bit basis models. To ensure accurate quantization, we develop low-bit basis model expansions at different granularities (tensor, layer, model), and theoretically confirm their convergence to the dense model, thus restoring FP model accuracy. Additionally, we design AbelianAdd/Mul operations between isomorphic models in the low-bit expansion, forming an Abelian group to ensure operation parallelism and commutativity. The experiments show that our algorithm achieves state-of-the-art performance in low-bit settings; for example, 4-bit quantization of ResNet-50 surpasses the original accuracy, reaching 77.03%. The code will be made public.

LGFeb 19, 2025
A General Error-Theoretical Analysis Framework for Constructing Compression Strategies

Boyang Zhang, Daning Cheng, Yunquan Zhang et al.

The exponential growth in parameter size and computational complexity of deep models poses significant challenges for efficient deployment. The core problem of existing compression methods is that different layers of the model have significant differences in their tolerance to compression levels. For instance, the first layer of a model can typically sustain a higher compression level compared to the last layer without compromising performance. Thus, the key challenge lies in how to allocate compression levels across layers in a way that minimizes performance loss while maximizing parameter reduction. To address this challenge, we propose a Compression Error Theory (CET) framework, designed to determine the optimal compression level for each layer. Taking quantization as an example, CET leverages differential expansion and algebraic geometry to reconstruct the quadratic form of quantization error as ellipsoids and hyperbolic paraboloids, and utilizes their geometric structures to define an error subspace. To identify the error subspace with minimal performance loss, by performing orthogonal decomposition of the geometric space, CET transforms the optimization process of the error subspace into a complementary problem. The final theoretical analysis shows that constructing the quantization subspace along the major axis results in minimal performance degradation. Through experimental verification of the theory, CET can greatly retain performance while compressing. Specifically, on the ResNet-34 model, CET achieves nearly 11$\times$ parameter compression while even surpassing performance comparable to the original model.

LGAug 9, 2025
MoQE: Improve Quantization Model performance via Mixture of Quantization Experts

Jinhao Zhang, Yunquan Zhang, Boyang Zhang et al.

Quantization method plays a crucial role in improving model efficiency and reducing deployment costs, enabling the widespread application of deep learning models on resource-constrained devices. However, the quantization process inevitably introduces accuracy degradation. In this paper, we propose Mixture of Quantization Experts( abbr. MoQE), a quantization inference framework based on the Mixture-of-Experts (MoE) architecture, aiming to jointly improve the performance of quantization models. MoQE combines multiple quantization variants of one full-precision model as specialized "quantization experts" and dynamically routes input data to the most suitable expert based on its characteristics. MoQE alleviates the performance degradation commonly seen in single quantization models through specialization quantization expert models. We design lightweight, structure-aware router models tailored for both CV and NLP tasks. Experimental evaluations on ResNet, LLaMA, and Qwen model families across benchmark datasets including ImageNet, WikiText, C4, and OpenWebText demonstrate that MoQE achieves performance comparable to SOTA quantization model, without incurring significant increases in inference latency.

LGJul 8, 2025
Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach

Xiaobing Chen, Boyang Zhang, Xiangwei Zhou et al.

The integration of Federated Learning (FL) and Mixture-of-Experts (MoE) presents a compelling pathway for training more powerful, large-scale artificial intelligence models (LAMs) on decentralized data while preserving privacy. However, efficient federated training of these complex MoE-structured LAMs is hindered by significant system-level challenges, particularly in managing the interplay between heterogeneous client resources and the sophisticated coordination required for numerous specialized experts. This article highlights a critical, yet underexplored concept: the absence of robust quantitative strategies for dynamic client-expert alignment that holistically considers varying client capacities and the imperative for system-wise load balancing. Specifically, we propose a conceptual system design for intelligent client-expert alignment that incorporates dynamic fitness scoring, global expert load monitoring, and client capacity profiling. By tackling these systemic issues, we can unlock more scalable, efficient, and robust training mechanisms {with fewer communication rounds for convergence}, paving the way for the widespread deployment of large-scale federated MoE-structured LAMs in edge computing with ultra-high communication efficiency.

CLApr 13, 2025
Can the capability of Large Language Models be described by human ability? A Meta Study

Mingrui Zan, Yunquan Zhang, Boyang Zhang et al.

Users of Large Language Models (LLMs) often perceive these models as intelligent entities with human-like capabilities. However, the extent to which LLMs' capabilities truly approximate human abilities remains a topic of debate. In this paper, to characterize the capabilities of LLMs in relation to human capabilities, we collected performance data from over 80 models across 37 evaluation benchmarks. The evaluation benchmarks are categorized into 6 primary abilities and 11 sub-abilities in human aspect. Then, we then clustered the performance rankings into several categories and compared these clustering results with classifications based on human ability aspects. Our findings lead to the following conclusions: 1. We have confirmed that certain capabilities of LLMs with fewer than 10 billion parameters can indeed be described using human ability metrics; 2. While some abilities are considered interrelated in humans, they appear nearly uncorrelated in LLMs; 3. The capabilities possessed by LLMs vary significantly with the parameter scale of the model.

CRNov 27, 2025
GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents

Xinyu Zhang, Yixin Wu, Boyang Zhang et al.

Images shared on social media often expose geographic cues. While early geolocation methods required expert effort and lacked generalization, the rise of Large Vision Language Models (LVLMs) now enables accurate geolocation even for ordinary users. However, existing approaches are not optimized for this task. To explore the full potential and associated privacy risks, we present Geo-Detective, an agent that mimics human reasoning and tool use for image geolocation inference. It follows a procedure with four steps that adaptively selects strategies based on image difficulty and is equipped with specialized tools such as visual reverse search, which emulates how humans gather external geographic clues. Experimental results show that GEO-Detective outperforms baseline large vision language models (LVLMs) overall, particularly on images lacking visible geographic features. In country level geolocation tasks, it achieves an improvement of over 11.1% compared to baseline LLMs, and even at finer grained levels, it still provides around a 5.2% performance gain. Meanwhile, when equipped with external clues, GEO-Detective becomes more likely to produce accurate predictions, reducing the "unknown" prediction rate by more than 50.6%. We further explore multiple defense strategies and find that Geo-Detective exhibits stronger robustness, highlighting the need for more effective privacy safeguards.

OCOct 14, 2025
A Gradient Guided Diffusion Framework for Chance Constrained Programming

Boyang Zhang, Zhiguo Wang, Ya-Feng Liu

Chance constrained programming (CCP) is a powerful framework for addressing optimization problems under uncertainty. In this paper, we introduce a novel Gradient-Guided Diffusion-based Optimization framework, termed GGDOpt, which tackles CCP through three key innovations. First, GGDOpt accommodates a broad class of CCP problems without requiring the knowledge of the exact distribution of uncertainty-relying solely on a set of samples. Second, to address the nonconvexity of the chance constraints, it reformulates the CCP as a sampling problem over the product of two distributions: an unknown data distribution supported on a nonconvex set and a Boltzmann distribution defined by the objective function, which fully leverages both first- and second-order gradient information. Third, GGDOpt has theoretical convergence guarantees and provides practical error bounds under mild assumptions. By progressively injecting noise during the forward diffusion process to convexify the nonconvex feasible region, GGDOpt enables guided reverse sampling to generate asymptotically optimal solutions. Experimental results on synthetic datasets and a waveform design task in wireless communications demonstrate that GGDOpt outperforms existing methods in both solution quality and stability with nearly 80% overhead reduction.

CRSep 29, 2025
Defeating Cerberus: Concept-Guided Privacy-Leakage Mitigation in Multimodal Language Models

Boyang Zhang, Istemi Ekin Akkus, Ruichuan Chen et al.

Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in processing and reasoning over diverse modalities, but their advanced abilities also raise significant privacy concerns, particularly regarding Personally Identifiable Information (PII) leakage. While relevant research has been conducted on single-modal language models to some extent, the vulnerabilities in the multimodal setting have yet to be fully investigated. In this work, we investigate these emerging risks with a focus on vision language models (VLMs), a representative subclass of MLLMs that covers the two modalities most relevant for PII leakage, vision and text. We introduce a concept-guided mitigation approach that identifies and modifies the model's internal states associated with PII-related content. Our method guides VLMs to refuse PII-sensitive tasks effectively and efficiently, without requiring re-training or fine-tuning. We also address the current lack of multimodal PII datasets by constructing various ones that simulate real-world scenarios. Experimental results demonstrate that the method can achieve an average refusal rate of 93.3% for various PII-related tasks with minimal impact on unrelated model performances. We further examine the mitigation's performance under various conditions to show the adaptability of our proposed method.

LGSep 22, 2025
Learning to Rank with Top-$K$ Fairness

Boyang Zhang, Quanqi Hu, Mingxuan Sun et al.

Fairness in ranking models is crucial, as disparities in exposure can disproportionately affect protected groups. Most fairness-aware ranking systems focus on ensuring comparable average exposure for groups across the entire ranked list, which may not fully address real-world concerns. For example, when a ranking model is used for allocating resources among candidates or disaster hotspots, decision-makers often prioritize only the top-$K$ ranked items, while the ranking beyond top-$K$ becomes less relevant. In this paper, we propose a list-wise learning-to-rank framework that addresses the issues of inequalities in top-$K$ rankings at training time. Specifically, we propose a top-$K$ exposure disparity measure that extends the classic exposure disparity metric in a ranked list. We then learn a ranker to balance relevance and fairness in top-$K$ rankings. Since direct top-$K$ selection is computationally expensive for a large number of items, we transform the non-differentiable selection process into a differentiable objective function and develop efficient stochastic optimization algorithms to achieve both high accuracy and sufficient fairness. Extensive experiments demonstrate that our method outperforms existing methods.

ROAug 13, 2025
CaRoBio: 3D Cable Routing with a Bio-inspired Gripper Fingernail

Jiahui Zuo, Boyang Zhang, Fumin Zhang

The manipulation of deformable linear flexures has a wide range of applications in industry, such as cable routing in automotive manufacturing and textile production. Cable routing, as a complex multi-stage robot manipulation scenario, is a challenging task for robot automation. Common parallel two-finger grippers have the risk of over-squeezing and over-tension when grasping and guiding cables. In this paper, a novel eagle-inspired fingernail is designed and mounted on the gripper fingers, which helps with cable grasping on planar surfaces and in-hand cable guiding operations. Then we present a single-grasp end-to-end 3D cable routing framework utilizing the proposed fingernails, instead of the common pick-and-place strategy. Continuous control is achieved to efficiently manipulate cables through vision-based state estimation of task configurations and offline trajectory planning based on motion primitives. We evaluate the effectiveness of the proposed framework with a variety of cables and channel slots, significantly outperforming the pick-and-place manipulation process under equivalent perceptual conditions. Our reconfigurable task setting and the proposed framework provide a reference for future cable routing manipulations in 3D space.

LGMay 23, 2025
Exploiting Block Coordinate Descent for Cost-Effective LLM Model Training

Zeyu Liu, Yan Li, Yunquan Zhang et al.

Training large language models typically demands extensive GPU memory and substantial financial investment, which poses a barrier for many small- to medium-sized teams. In this paper, we propose a full-parameter pre-training and fine-tuning framework based on block coordinate descent (BCD), enhanced with engineering optimizations, to enable efficient training of large-scale models on cost-effective RTX 4090, A100 and A800 GPU clusters. Under identical hardware configurations, we reduce the training cost of a 7B model to 33% on A100/A800 and only 2.6% on RTX 4090, compared to standard full-parameter training. It also enables large models previously restricted to A100 clusters to be trained on RTX 4090 without degrading performance. BCD achieves comparable or better accuracy than full-parameter and fine-tuning methods at most cases, with lower GPU consumption and improved hardware utilization.