Koshi Eguchi

CL
3 papers
14 citations
Novelty 50%
AI Score 45

3 Papers

CL Feb 24
Steering at the Source: Style Modulation Heads for Robust Persona Control

Yoshihiro Izawa, Gouki Minegishi, Koshi Eguchi et al.

Activation steering offers a computationally efficient mechanism for controlling Large Language Models (LLMs) without fine-tuning. While it effectively controls target traits (e.g., persona), coherency degradation remains a major obstacle to safety and practical deployment. We hypothesize that this degradation stems from intervening on the residual stream, which indiscriminately affects aggregated features and inadvertently amplifies off-target noise. In this work, we identify a sparse subset of attention heads (only three heads) that independently govern persona and style formation, which we term Style Modulation Heads. Specifically, these heads can be localized via geometric analysis of internal representations, combining layer-wise cosine similarity and head-wise contribution scores. We demonstrate that intervention targeting only these specific heads achieves robust behavioral control while significantly mitigating the coherency degradation observed in residual stream steering. More broadly, our findings show that precise, component-level localization enables safer and more precise model control.
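The abstract names two ingredients for localizing heads: layer-wise cosine similarity and head-wise contribution scores. The sketch below is one possible reading of those two quantities on toy vectors, not the authors' implementation. The function names (`cosine`, `head_contributions`, `top_heads`), the projection-based score, and the toy data are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def head_contributions(head_outputs, direction):
    """Assumed score: projection of each head's output onto the unit persona direction."""
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]
    return [sum(h * u for h, u in zip(out, unit)) for out in head_outputs]

def top_heads(head_outputs, direction, k=3):
    """Indices of the k heads with the largest absolute contribution."""
    scores = head_contributions(head_outputs, direction)
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return ranked[:k]

# Toy data (hypothetical): a persona direction and four head outputs in a 3-d space.
direction = [1.0, 0.0, 0.0]
head_outputs = [
    [0.1, 0.9, 0.0],
    [2.0, 0.1, 0.0],
    [0.0, 0.0, 1.0],
    [-1.5, 0.2, 0.3],
]
selected = top_heads(head_outputs, direction, k=2)
print(selected)
```

In this reading, `cosine` would be applied per layer to find where the persona direction stabilizes, and `top_heads` would then rank heads within the chosen layers; how the paper actually combines the two is not stated in the abstract.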

43.2 NI May 19
How Helpful is LLM Assistance in Network Operations? A Case Study at a Large Demonstration Network

Ryo Nakamura, Koshi Eguchi

This paper reports on a real-world case study in which over 100 network engineers assessed how a Large Language Model (LLM) can assist in building and operating a network. The versatility of LLMs has accelerated their adoption across a wide range of domains, and assisting network operations is one such promising application. LLMs are probabilistic models, unlike deterministic protocols and configurations; therefore, clarifying their capabilities -- how and to what extent LLMs can help in network operations -- is a crucial step toward adopting LLMs. To offer practical insights into this issue, we conducted an extensive experiment on a large demonstration network built for a public exhibition, consisting of 21 racks with heterogeneous network devices. In the experiment, a total of 105 network engineers used an LLM-based chatbot while building and operating the network. The chatbot was equipped with three external functions: retrieval-augmented generation for domain-specific knowledge, CLI control of network devices running on the network, and access to a ticket system. The participants gave evaluations for the chatbot's responses on a best-effort basis. Analysis of the chat histories shows that 68.1% of the evaluations were positive, indicating a quantitative baseline of the LLM's helpfulness in network operations. Our results also demonstrate that understanding the capabilities of the chatbot is important for eliciting better responses. Moreover, we provide detailed use case analyses while sharing actual user--chatbot interactions.

CL Dec 13, 2025
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

Yoav Gelberg, Koshi Eguchi, Takuya Akiba et al.

So far, expensive finetuning beyond the pretraining sequence length has been a requirement for effectively extending the context of language models (LMs). In this work, we break this key bottleneck by Dropping the Positional Embeddings of LMs after training (DroPE). Our simple method is motivated by three key theoretical and empirical observations. First, positional embeddings (PEs) serve a crucial role during pretraining, providing an important inductive bias that significantly facilitates convergence. Second, over-reliance on this explicit positional information is also precisely what prevents test-time generalization to sequences of unseen length, even when using popular PE-scaling methods. Third, positional embeddings are not an inherent requirement of effective language modeling and can be safely removed after pretraining, following a short recalibration phase. Empirically, DroPE yields seamless zero-shot context extension without any long-context finetuning, quickly adapting pretrained LMs without compromising their capabilities in the original training context. Our findings hold across different models and dataset sizes, far outperforming previous specialized architectures and established rotary positional embedding scaling methods.
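To make the idea of dropping positional embeddings concrete, here is a minimal sketch of single-head causal attention with a switch that turns rotary position embeddings (RoPE) on or off. It is an illustration only, not the DroPE code: the choice of RoPE as the positional scheme, the `use_rope` flag, and the toy inputs are assumptions, and the short recalibration phase mentioned in the abstract is not modeled.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of vec by position-dependent angles (standard RoPE)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # pair index i/2 gives exponent -2*(i/2)/d
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def causal_attention(queries, keys, values, use_rope=True):
    """Single-head causal softmax attention; use_rope=False drops positional information."""
    d = len(queries[0])
    if use_rope:
        queries = [rope(q, t) for t, q in enumerate(queries)]
        keys = [rope(k, t) for t, k in enumerate(keys)]
    outputs = []
    for t in range(len(queries)):
        # Causal mask: position t attends only to positions 0..t.
        logits = [sum(a * b for a, b in zip(queries[t], keys[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        outputs.append([sum(weights[s] / z * values[s][j] for s in range(t + 1))
                        for j in range(len(values[0]))])
    return outputs

# Toy inputs (hypothetical): identical queries and keys, distinct values.
q = [[1.0, 0.0]] * 3
k = [[1.0, 0.0]] * 3
v = [[1.0], [2.0], [3.0]]
with_pe = causal_attention(q, k, v, use_rope=True)
without_pe = causal_attention(q, k, v, use_rope=False)
```

With identical queries and keys, the version without RoPE attends uniformly over the visible prefix, while the RoPE version weights tokens by relative distance. Without explicit embeddings, the causal mask is the only source of order information left in this toy layer.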