Kehu Yang

h-index: 71

3 papers

21,319 citations

3 Papers

NI · Jun 26

Hermes: A General-Purpose Proxy-Enabled Networking Architecture

Behrooz Farkiani, Fan Liu, Ke Yang et al.

We introduce Hermes, a general-purpose networking architecture that aims to improve service delivery over the Internet. Hermes delegates networking responsibilities from applications and services to proxies and is designed as a portable, adaptable solution to four fundamental challenges of efficient service delivery over the Internet: end-to-end traffic management, backward compatibility, data-plane security and privacy models, and adaptable communication layers. The design centers on an overlay of reconfigurable proxies and HTTP tunneling and proxying techniques, utilizing assisting components to extend proxy functionality when needed. Through prototyping and emulation, we demonstrate that Hermes improves key performance metrics across multiple use cases: it provides backward compatibility through protocol translation and tunneling, improves reliability by delegating retry logic to proxies, enables unified policy-based Layer 3 routing across network segments, and serves as an efficient substrate for future architectures like NDN, facilitating their operation over the Internet. Beyond evaluating Hermes across various use cases, we measured the overhead of Hermes' HTTP tunneling and proxying mechanisms and found it to be modest, typically under 2 ms per proxy pair traversal in an isolated collocated setup. Although the HTTP proxying and tunneling techniques used by Hermes increase single-connection processing overhead, we also show that, with up to 1,000 concurrent requests, proxies can amortize connection setup time and reduce end-to-end latency by utilizing connection pooling and multiplexing.
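The idea of delegating retry logic to a proxy can be illustrated with a minimal sketch. This is not Hermes code: the helper name `forward_with_retries`, its parameters, and the exponential backoff policy are all assumptions chosen for illustration. It only shows how a proxy could absorb transient upstream failures so that the client sees a single successful response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def forward_with_retries(
    send: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical proxy-side helper: call `send` until it succeeds.

    `send` stands in for forwarding one request upstream. Only
    ConnectionError is treated as transient (an assumption); any other
    exception propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure to the client
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A real proxy would also need to decide which requests are safe to replay (for example, only idempotent ones); that policy is left out of this sketch.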

5.8 · AI · Aug 7, 2025 · Code

InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

Shuo Cai, Su Lu, Qi Zhou et al.

Large language models (LLMs) have exhibited impressive reasoning abilities on a wide range of complex tasks. However, enhancing these capabilities through post-training remains resource intensive, particularly in terms of data and computational cost. Although recent efforts have sought to improve sample efficiency through selective data curation, existing methods often rely on heuristic or task-specific strategies that hinder scalability. In this work, we introduce InfiAlign, a scalable and sample-efficient post-training framework that integrates supervised fine-tuning (SFT) with Direct Preference Optimization (DPO) to align LLMs for enhanced reasoning. At the core of InfiAlign is a robust data selection pipeline that automatically curates high-quality alignment data from open-source reasoning datasets using multidimensional quality metrics. This pipeline enables significant performance gains while drastically reducing data requirements and remains extensible to new data sources. When applied to the Qwen2.5-Math-7B-Base model, our SFT model achieves performance on par with DeepSeek-R1-Distill-Qwen-7B, while using only approximately 12% of the training data, and demonstrates strong generalization across diverse reasoning tasks. Additional improvements are obtained through the application of DPO, with particularly notable gains in mathematical reasoning tasks. The model achieves an average improvement of 3.89% on AIME 24/25 benchmarks. Our results highlight the effectiveness of combining principled data selection with full-stage post-training, offering a practical solution for aligning large reasoning models in a scalable and data-efficient manner. The model checkpoints are available at https://huggingface.co/InfiX-ai/InfiAlign-Qwen-7B-SFT.
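For readers unfamiliar with DPO, the standard per-pair objective can be sketched as below. This is the generic DPO loss, not InfiAlign's training code: the function name `dpo_loss`, the scalar log-probability inputs, and the default `beta=0.1` are assumptions made for illustration.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the abstract does not state one
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    the policy being trained or under the frozen reference model.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the chosen response relative to the reference.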

3.6 · IR · May 19, 2025 · Code

JIR-Arena: The First Benchmark Dataset for Just-in-time Information Recommendation

Ke Yang, Kevin Ros, Shankar Kumar Senthil Kumar et al.

Just-in-time Information Recommendation (JIR) is a service designed to deliver the most relevant information precisely when users need it, addressing their knowledge gaps with minimal effort and boosting decision-making and efficiency in daily life. Advances in device-efficient deployment of foundation models and the growing use of intelligent wearable devices have made always-on JIR assistants feasible. However, there has been no systematic effort to formally define JIR tasks or establish evaluation frameworks. To bridge this gap, we present the first mathematical definition of JIR tasks and associated evaluation metrics. Additionally, we introduce JIR-Arena, a multimodal benchmark dataset featuring diverse, information-request-intensive scenarios to evaluate JIR systems across critical dimensions: i) accurately inferring user information needs, ii) delivering timely and relevant recommendations, and iii) avoiding irrelevant content that may distract users. Developing a JIR benchmark dataset poses challenges due to subjectivity in estimating user information needs and uncontrollable system variables affecting reproducibility. To address these, JIR-Arena: i) combines input from multiple humans and large AI models to approximate information need distributions; ii) assesses JIR quality through information retrieval outcomes using static knowledge base snapshots; and iii) employs a multi-turn, multi-entity validation framework to improve objectivity and generality. Furthermore, we implement a baseline JIR system capable of processing real-time information streams aligned with user inputs. Our evaluation of this baseline system on JIR-Arena indicates that while foundation model-based JIR systems simulate user needs with reasonable precision, they face challenges in recall and effective content retrieval. To support future research in this new area, we fully release our code and data.
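The precision and recall contrast reported for the baseline can be made concrete with a small set-based sketch. This is the textbook definition only, not the formal JIR metrics defined in the paper; the function name `precision_recall` and the example need identifiers are hypothetical.

```python
from typing import Hashable, Set, Tuple


def precision_recall(
    predicted: Set[Hashable], reference: Set[Hashable]
) -> Tuple[float, float]:
    """Set-based precision and recall.

    `predicted` holds the information needs a system inferred and
    `reference` holds the annotated needs (both hypothetical inputs).
    An empty denominator yields 0.0 by convention in this sketch.
    """
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# A system that predicts few needs can be fairly precise yet miss most
# of the annotated needs: one hit out of two predictions, four references.
p, r = precision_recall({"need_a", "need_x"}, {"need_a", "need_b", "need_c", "need_d"})
```

In this example `p` is 0.5 and `r` is 0.25, mirroring the pattern of acceptable precision with weaker recall.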