5 Papers

81.2CLMay 10Code
LLM Agents Already Know When to Call Tools -- Even Without Reasoning

Chung-En Sun, Linbo Liu, Ge Yan et al.

Tool-augmented LLM agents tend to call tools indiscriminately, even when the model can answer directly. Each unnecessary call wastes API fees and latency, yet no existing benchmark systematically studies when a tool call is actually needed. We propose When2Tool, a benchmark of 18 environments (15 single-hop, 3 multi-hop) spanning three categories of tool necessity -- computational scale, knowledge boundaries, and execution reliability -- each with controlled difficulty levels that create a clear decision boundary between tool-necessary and tool-unnecessary tasks. We evaluate two families of training-free baselines: Prompt-only (varying the prompt to discourage unnecessary calls) and Reason-then-Act (requiring the model to reason about tool necessity before acting). Both provide limited control: Prompt-only suppresses necessary calls alongside unnecessary ones, and Reason-then-Act still incurs a disproportionate accuracy cost on hard tasks. To understand why these baselines fail, we probe the models' hidden states and find that tool necessity is linearly decodable from the pre-generation representation with AUROC 0.89--0.96 across six models, substantially exceeding the model's own verbalized reasoning. This reveals that models already know when tools are needed, but fail to act on this knowledge during generation. Building on this finding, we propose Probe&Prefill, which uses a lightweight linear probe to read the hidden-state signal and prefills the model's response with a steering sentence. Across all models tested, Probe&Prefill reduces tool calls by 48% with only 1.7% accuracy loss, while the best baseline at comparable accuracy only reduces 6% of tool calls, or achieves a similar tool call reduction but incurs a 5$\times$ higher accuracy loss. Our code is available at https://github.com/Trustworthy-ML-Lab/when2tool

LGFeb 3
Distance Marching for Generative Modeling

Zimo Wang, Ishit Mehta, Haolin Lu et al.

Time-unconditional generative models learn time-independent denoising vector fields. But without time conditioning, the same noisy input may correspond to multiple noise levels and different denoising directions, which interferes with the supervision signal. Inspired by distance field modeling, we propose Distance Marching, a new time-unconditional approach with two principled inference methods. Crucially, we design losses that focus on closer targets. This yields denoising directions better directed toward the data manifold. Across architectures, Distance Marching consistently improves FID by 13.5% on CIFAR-10 and ImageNet over recent time-unconditional baselines. For class-conditional ImageNet generation, despite removing time input, Distance Marching surpasses flow matching using our losses and inference methods. It achieves lower FID than flow matching's final performance using 60% of the sampling steps and 13.6% lower FID on average across backbone sizes. Moreover, our distance prediction is also helpful for early stopping during sampling and for OOD detection. We hope distance field modeling can serve as a principled lens for generative modeling.

CLFeb 10
Steer2Edit: From Activation Steering to Component-Level Editing

Chung-En Sun, Ge Yan, Zimo Wang et al.

Steering methods influence Large Language Model behavior by identifying semantic directions in hidden representations, but are typically realized through inference-time activation interventions that apply a fixed, global modification to the model's internal states. While effective, such interventions often induce unfavorable attribute-utility trade-offs under strong control, as they ignore the fact that many behaviors are governed by a small and heterogeneous subset of model components. We propose Steer2Edit, a theoretically grounded, training-free framework that transforms steering vectors from inference-time control signals into diagnostic signals for component-level rank-1 weight editing. Instead of uniformly injecting a steering direction during generation, Steer2Edit selectively redistributes behavioral influence across individual attention heads and MLP neurons, yielding interpretable edits that preserve the standard forward pass and remain compatible with optimized parallel inference. Across safety alignment, hallucination mitigation, and reasoning efficiency, Steer2Edit consistently achieves more favorable attribute-utility trade-offs: at matched downstream performance, it improves safety by up to 17.2%, increases truthfulness by 9.8%, and reduces reasoning length by 12.2% on average. Overall, Steer2Edit provides a principled bridge between representation steering and weight editing by translating steering signals into interpretable, training-free parameter updates.

CVNov 21, 2024
HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition

Zimo Wang, Cheng Wang, Taiki Yoshino et al.

We propose a method, HotSpot, for optimizing neural signed distance functions. Existing losses, such as the eikonal loss, act as necessary but insufficient constraints and cannot guarantee that the recovered implicit function represents a true distance function, even if the output minimizes these losses almost everywhere. Furthermore, the eikonal loss suffers from stability issues in optimization. Finally, in conventional methods, regularization losses that penalize surface area distort the reconstructed signed distance function. We address these challenges by designing a loss function using the solution of a screened Poisson equation. Our loss, when minimized, provides an asymptotically sufficient condition to ensure the output converges to a true distance function. Our loss also leads to stable optimization and naturally penalizes large surface areas. We present theoretical analysis and experiments on both challenging 2D and 3D datasets and show that our method provides better surface reconstruction and a more accurate distance approximation.

APFeb 9, 2022
House Price Valuation Model Based on Geographically Neural Network Weighted Regression: The Case Study of Shenzhen, China

Zimo Wang, Yicheng Wang, Sensen Wu

Confronted with the spatial heterogeneity of real estate market, some traditional research utilized Geographically Weighted Regression (GWR) to estimate the house price. However, its kernel function is non-linear, elusive, and complex to opt bandwidth, the predictive power could also be improved. Consequently, a novel technique, Geographical Neural Network Weighted Regression (GNNWR), has been applied to improve the accuracy of real estate appraisal with the help of neural networks. Based on Shenzhen house price dataset, this work conspicuously captures the weight distribution of different variants at Shenzhen real estate market, which GWR is difficult to materialize. Moreover, we focus on the performance of GNNWR, verify its robustness and superiority, refine the experiment process with 10-fold cross-validation, extend its application area from natural to socioeconomic geospatial data. It's a practical and trenchant way to assess house price, and we demonstrate the effectiveness of GNNWR on a complex socioeconomic dataset.