Haoze Wang

DS
h-index: 1
Papers: 3
Citations: 6
Novelty: 52%
AI Score: 40

3 Papers

12.1 DS Apr 30
Streaming Max-Cut in General Metrics

Shaofeng H.-C. Jiang, Pan Peng, Haoze Wang

Max-Cut is a fundamental combinatorial optimization problem that has been studied in various computational settings. We initiate the study of its streaming complexity in general metric spaces with access to distance oracles. We give a $(1 + ε)$-approximate algorithm for estimating the Max-Cut value in sliding-window streams using only polylogarithmic space. This is the first sliding-window algorithm for Max-Cut even in Euclidean spaces, and it matches a known insertion-only space bound in the special case of Euclidean spaces [Chen, Jiang, Krauthgamer, STOC'23]. In sharp contrast, we give a $\mathrm{poly}(n)$-space lower bound in the dynamic streaming setting. This yields a separation from the Euclidean case, where the polylogarithmic-space $(1 + ε)$-approximation extends to dynamic streams. On the technical side, our sliding-window algorithm builds on the smooth histogram framework of [Braverman and Ostrovsky, SICOMP'10]. To make this framework applicable, we establish the first smoothness bound for metric Max-Cut. Moreover, we develop a streaming algorithm for metric Max-Cut in insertion-only streams, whose key ingredient is a new metric reservoir sampling technique.

LG Sep 25, 2024
AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization

Yifan Tan, Haoze Wang, Chao Yan et al.

Model quantization has become a crucial technique to address the issues of large memory consumption and long inference times associated with LLMs. Mixed-precision quantization, which distinguishes between important and unimportant parameters, stands out among numerous quantization schemes as it achieves a balance between precision and compression rate. However, existing approaches can only identify important parameters through qualitative analysis and manual experiments without quantitatively analyzing how their importance is determined. We propose a new criterion, so-called 'precision alignment', to build a quantitative framework to holistically evaluate the importance of parameters in mixed-precision quantization. Our observations on floating-point addition under various real-world scenarios suggest that two addends should have identical precision, otherwise the information in the higher-precision number will be wasted. Such an observation offers an essential principle to determine the precision of each parameter in matrix multiplication operations. As the first step towards applying the above discovery to large model inference, we develop a dynamic KV-Cache quantization technique to effectively reduce memory access latency. Different from existing quantization approaches that focus on memory saving, this work directly aims to accelerate LLM inference through quantizing floating-point numbers. The proposed technique attains a 25% saving of memory access and delivers up to 1.3x speedup in the computation of attention in the decoding phase of LLMs, with almost no loss of precision.
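
The floating-point observation in the abstract above can be illustrated with a small sketch. It is not the paper's method: it uses ordinary 64-bit Python floats rather than any KV-Cache format, and the function name `wasted_bits_demo` is a hypothetical helper introduced only for this illustration. It shows that the bits of the smaller addend lying below the spacing (ulp) of the larger addend do not affect the sum.

```python
import math


def wasted_bits_demo(big: float, small: float) -> tuple[bool, float]:
    """Illustrative helper (hypothetical, not from the paper).

    Rounds `small` to the float spacing (ulp) of `big` and reports
    (a) whether the sum is unchanged by that rounding, and
    (b) how much of `small` was discarded.
    """
    ulp = math.ulp(big)                         # gap between adjacent floats near `big`
    small_aligned = round(small / ulp) * ulp    # drop bits finer than that gap
    same_sum = (big + small) == (big + small_aligned)
    return same_sum, small - small_aligned


# 2**20 has an ulp of 2**-32, so the 2**-40 component of `small`
# cannot be represented in the sum.
same, lost = wasted_bits_demo(2.0**20, 1.0 + 2.0**-40)
print(same, lost)
```

In this example the extra low-order bits stored in `small` never reach the result, which is the sense in which storing them at higher precision is wasted.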

SE Aug 22, 2025
LLM-Assisted Semantic Alignment and Integration in Collaborative Model-Based Systems Engineering Using SysML v2

Zirui Li, Stephan Husung, Haoze Wang

Cross-organizational collaboration in Model-Based Systems Engineering (MBSE) faces many challenges in achieving semantic alignment across independently developed system models. SysML v2 introduces enhanced structural modularity and formal semantics, offering a stronger foundation for interoperable modeling. Meanwhile, GPT-based Large Language Models (LLMs) provide new capabilities for assisting model understanding and integration. This paper proposes a structured, prompt-driven approach for LLM-assisted semantic alignment of SysML v2 models. The core contribution lies in the iterative development of an alignment approach and interaction prompts, incorporating model extraction, semantic matching, and verification. The approach leverages SysML v2 constructs such as alias, import, and metadata extensions to support traceable, soft alignment integration. It is demonstrated with a GPT-based LLM through an example of a measurement system. Benefits and limitations are discussed.