LGJun 26, 2023Code
Restart Sampling for Improving Generative ProcessesYilun Xu, Mingyang Deng, Xiang Cheng et al. · mit
Generative processes that involve solving differential equations, such as diffusion models, frequently necessitate balancing speed and quality. ODE-based samplers are fast but plateau in performance while SDE-based samplers deliver higher sample quality at the cost of increased sampling time. We attribute this difference to sampling errors: ODE-samplers involve smaller discretization errors while stochasticity in SDE contracts accumulated errors. Based on these findings, we propose a novel sampling algorithm called Restart in order to better balance discretization errors and contraction. The sampling method alternates between adding substantial noise in additional forward steps and strictly following a backward ODE. Empirically, Restart sampler surpasses previous SDE and ODE samplers in both speed and accuracy. Restart not only outperforms the previous best SDE results, but also accelerates the sampling speed by 10-fold / 2-fold on CIFAR-10 / ImageNet $64 \times 64$. In addition, it attains significantly better sample quality than ODE samplers within comparable sampling times. Moreover, Restart better balances text-image alignment/visual quality versus diversity than previous samplers in the large-scale text-to-image Stable Diffusion model pre-trained on LAION $512 \times 512$. Code is available at https://github.com/Newbeeer/diffusion_restart_sampling
LGJun 1, 2023
Transformers learn to implement preconditioned gradient descent for in-context learningKwangjun Ahn, Xiang Cheng, Hadi Daneshmand et al.
Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of gradient descent. Going beyond the question of expressivity, we ask: Can transformers learn to implement such algorithms by training over random problem instances? To our knowledge, we make the first theoretical progress on this question via an analysis of the loss landscape for linear transformers trained over random instances of linear regression. For a single attention layer, we prove the global minimum of the training objective implements a single iteration of preconditioned gradient descent. Notably, the preconditioning matrix not only adapts to the input distribution but also to the variance induced by data inadequacy. For a transformer with $L$ attention layers, we prove certain critical points of the training objective implement $L$ iterations of preconditioned gradient descent. Our results call for future theoretical studies on learning algorithms by training transformers.
LGOct 2, 2023
Linear attention is (maybe) all you need (to understand transformer optimization)Kwangjun Ahn, Xiang Cheng, Minhak Song et al.
Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training Transformers by carefully studying a simple yet canonical linearized shallow Transformer model. Specifically, we train linear Transformers to solve regression tasks, inspired by J.~von Oswald et al.~(ICML 2023), and K.~Ahn et al.~(NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of Transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized Transformer model could actually be a valuable, realistic abstraction for understanding Transformer optimization.
CROct 31, 2022
Unsafe's Betrayal: Abusing Unsafe Rust in Binary Reverse Engineering via Machine LearningSangdon Park, Xiang Cheng, Taesoo Kim · gatech
Memory-safety bugs introduce critical software-security issues. Rust provides memory-safe mechanisms to avoid memory-safety bugs in programming, while still allowing unsafe escape hatches via unsafe code. However, the unsafe code that enhances the usability of Rust provides clear spots for finding memory-safety bugs in Rust source code. In this paper, we claim that these unsafe spots can still be identifiable in Rust binary code via machine learning and be leveraged for finding memory-safety bugs. To support our claim, we propose the tool textttrustspot, that enables reverse engineering to learn an unsafe classifier that proposes a list of functions in Rust binaries for downstream analysis. We empirically show that the function proposals by textttrustspot can recall $92.92\%$ of memory-safety bugs, while it covers only $16.79\%$ of the entire binary code. As an application, we demonstrate that the function proposals are used in targeted fuzzing on Rust packages, which contribute to reducing the fuzzing time compared to non-targeted fuzzing.
MLMay 26
Transformers Can Learn Posterior Predictive Distributions In-ContextGyeonghun Kang, Changwoo J. Lee, Xiang Cheng
Prior-data fitted networks (PFNs) have recently emerged as a powerful approach for Bayesian prediction tasks, approximating the posterior predictive distribution (PPD) through in-context learning. Despite their strong empirical performance and ability to go beyond point predictions, theoretical understandings of the algorithmic capability of transformers to learn distributions in context are still lacking. Focusing on Gaussian process regression problems, we show by construction that transformers can implement a gradient descent algorithm targeting the posterior predictive mean and variance, followed by nonlinear mappings that yield binned probabilities of PPD. We study the error bounds of the approximated PPD in terms of attention depth and bin resolution. Based on these results, we further demonstrate the key role of normalization and the choice of attention depth in enabling the extrapolation abilities of transformers beyond the pretraining sample size range. We conduct simulations that corroborate our findings, providing insight into the expressivity of PFNs targeting PPDs and how architectural choices may influence generalization capabilities.
SPAug 24, 2024
Synesthesia of Machines (SoM)-Enhanced ISAC Precoding for Vehicular Networks with Double DynamicsZonghui Yang, Shijian Gao, Xiang Cheng et al.
Integrated sensing and communication (ISAC) technology is vital for vehicular networks, yet the time-varying communication channels and rapid movement of targets present significant challenges for real-time precoding design. Traditional optimization-based methods are computationally complex and depend on perfect prior information, which is often unavailable in double-dynamic scenarios. In this paper, we propose a synesthesia of machine (SoM)-enhanced precoding paradigm that leverages modalities such as positioning and channel information to adapt to these dynamics. Utilizing a deep reinforcement learning (DRL) framework, our approach pushes ISAC performance boundaries. We also introduce a parameter-shared actor-critic architecture to accelerate training in complex state and action spaces. Extensive experiments validate the superiority of our method over existing approaches.
CLMay 24
GroupTravelBench: Benchmarking LLM Agents on Multi-Person Travel PlanningXiang Cheng, Yulan Hu, Lulu Zheng et al.
Travel planning is a realistic task for evaluating the planning and tool-use abilities of LLM agents. However, existing benchmarks typically assume only a single user, thereby avoiding one of the most challenging aspects of real-world scenarios: an agent's ability to identify and resolve conflicts among multiple users. To address this gap, we introduce \textbf{GroupTravelBench}, the first benchmark for \textbf{multi-user, multi-turn} travel planning. Based on real user profiles, POI data, and ticket price data, we synthesize 650 tasks and divide them into three difficulty levels. Beyond standard abilities in single-user itinerary planning, such as multi-step reasoning and tool use, our benchmark further evaluates three key capabilities required for travel agents: \emph{(i) elicitation} -- proactively engaging in multi-turn dialogue to gather preferences from each user; \emph{(ii) coordination} -- resolving conflicts among users through compromise or subgrouping strategies; and \emph{(iii) planning} -- searching for travel plans that maximize overall group utility while maintaining fairness and feasibility. To simulate real-world conversational itinerary planning while enabling reliable tool use and offline evaluation, we build an interactive sandbox environment with cached real-world tool data. We evaluate a wide range of LLMs and find that even frontier models still show substantial weaknesses in preference coverage and group fairness. \textit{GroupTravelBench} provides a practical and reproducible benchmark for advancing research on LLM agents for real-world travel planning.
LGJun 18, 2023
Fast Conditional Mixing of MCMC Algorithms for Non-log-concave DistributionsXiang Cheng, Bohan Wang, Jingzhao Zhang et al.
MCMC algorithms offer empirically efficient tools for sampling from a target distribution $π(x) \propto \exp(-V(x))$. However, on the theory side, MCMC algorithms suffer from slow mixing rate when $π(x)$ is non-log-concave. Our work examines this gap and shows that when Poincaré-style inequality holds on a subset $\mathcal{X}$ of the state space, the conditional distribution of MCMC iterates over $\mathcal{X}$ mixes fast to the true conditional distribution. This fast mixing guarantee can hold in cases when global mixing is provably slow. We formalize the statement and quantify the conditional mixing rate. We further show that conditional mixing can have interesting implications for sampling from mixtures of Gaussians, parameter estimation for Gaussian mixture models and Gibbs-sampling with well-connected local minima.
CLApr 8Code
ChemVLR: Prioritizing Reasoning in Perception for Chemical Vision-Language UnderstandingXuanle Zhao, Xinyuan Cai, Xiang Cheng et al.
While Vision-Language Models (VLMs) have demonstrated significant potential in chemical visual understanding, current models are predominantly optimized for direct visual question-answering tasks. This paradigm often results in "black-box" systems that fail to utilize the inherent capability of Large Language Models (LLMs) to infer underlying reaction mechanisms. In this work, we introduce ChemVLR, a chemical VLM designed to prioritize reasoning within the perception process. Unlike conventional chemical VLMs, ChemVLR analyzes visual inputs in a fine-grained manner by explicitly identifying granular chemical descriptors, such as functional groups, prior to generating answers. This approach ensures the production of explicit and interpretable reasoning paths for complex visual chemical problems. To facilitate this methodology, we implement a cross-modality reverse-engineering strategy, combined with a rigorous filtering pipeline, to curate a large-scale reasoning-and-captioning dataset comprising 760k high-quality samples across molecular and reaction tasks. Furthermore, we adopt a three-stage training framework that systemically builds model perception and reasoning capacity. Experiments demonstrate that ChemVLR achieves state-of-the-art (SOTA) performance, surpassing both leading proprietary models and domain-specific open-source baselines. We also provide comprehensive ablation studies to validate our training strategy and data generation designs. Code and model weights will be available at https://github.com/xxlllz/ChemVLR.
LGFeb 23Code
Variational Trajectory Optimization of Anisotropic Diffusion SchedulesPengxi Liu, Zeyu Michael Li, Xiang Cheng
We introduce a variational framework for diffusion models with anisotropic noise schedules parameterized by a matrix-valued path $M_t(θ)$ that allocates noise across subspaces. Central to our framework is a trajectory-level objective that jointly trains the score network and learns $M_t(θ)$, which encompasses general parameterization classes of matrix-valued noise schedules. We further derive an estimator for the derivative with respect to $θ$ of the score that enables efficient optimization of the $M_t(θ)$ schedule. For inference, we develop an efficiently-implementable reverse-ODE solver that is an anisotropic generalization of the second-order Heun discretization algorithm. Across CIFAR-10, AFHQv2, FFHQ, and ImageNet-64, our method consistently improves upon the baseline EDM model in all NFE regimes. Code is available at https://github.com/lizeyu090312/anisotropic-diffusion-paper.
LGFeb 11Code
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM TrainingGuobin Shen, Chenxiao Zhao, Xiang Cheng et al.
Training stability remains a central challenge in reinforcement learning (RL) for large language models (LLMs). Policy staleness, asynchronous training, and mismatches between training and inference engines all cause the behavior policy to diverge from the current policy, risking training collapse. Importance sampling provides a principled correction for this distribution shift but suffers from high variance; existing remedies such as token-level clipping and sequence-level normalization lack a unified theoretical foundation. We propose Variational sEquence-level Soft Policy Optimization (VESPO). By incorporating variance reduction into a variational formulation over proposal distributions, VESPO derives a closed-form reshaping kernel that operates directly on sequence-level importance weights without length normalization. Experiments on mathematical reasoning benchmarks show that VESPO maintains stable training under staleness ratios up to 64x and fully asynchronous execution, and delivers consistent gains across both dense and Mixture-of-Experts models. Code is available at https://github.com/FloyedShen/VESPO
AIDec 31, 2025
AMAP Agentic Planning Technical ReportAMAP AI Agent Team, Yulan Hu, Xiangwen Zhang et al.
We present STAgent, an agentic large language model tailored for spatio-temporal understanding, designed to solve complex tasks such as constrained point-of-interest discovery and itinerary planning. STAgent is a specialized model capable of interacting with ten distinct tools within spatio-temporal scenarios, enabling it to explore, verify, and refine intermediate steps during complex reasoning. Notably, STAgent effectively preserves its general capabilities. We empower STAgent with these capabilities through three key contributions: (1) a stable tool environment that supports over ten domain-specific tools, enabling asynchronous rollout and training; (2) a hierarchical data curation framework that identifies high-quality data like a needle in a haystack, curating high-quality queries by retaining less than 1\% of the raw data, emphasizing both diversity and difficulty; and (3) a cascaded training recipe that starts with a seed SFT stage acting as a guardian to measure query difficulty, followed by a second SFT stage fine-tuned on queries with high certainty, and an ultimate RL stage that leverages data of low certainty. Initialized with Qwen3-30B-A3B to establish a strong SFT foundation and leverage insights into sample difficulty, STAgent yields promising performance on TravelBench while maintaining its general capabilities across a wide range of general benchmarks, thereby demonstrating the effectiveness of our proposed agentic model.
SPApr 28
SPAT: A Semantic Port-Aware Adaptive-Rate Transmission Protocol for Semantic CommunicationYunhao Wang, Shuai Ma, Bin Shen et al.
With the evolution of 6G, semantic communication has emerged as a promising paradigm by prioritizing the delivery of task-relevant meaning over strict bit-level correctness. However, existing transport mechanisms still rely on explicit port headers and bit-level validation, making them vulnerable to header corruption and the resulting packet loss. To address this issue, this paper proposes a Semantic Port-Aware Adaptive-Rate Transmission Protocol (SPAT) for semantic communication. The proposed framework jointly embeds source and destination port information into semantic representations, thereby reducing dependence on explicit port headers while enabling robust port-aware transmission. Furthermore, a differentiated semantic processing mechanism is developed for uplink and downlink scenarios, where port identification is introduced for uplink service recognition and destination-aware conditional gating is designed for downlink selective decoding. In addition, an adaptive-rate controller is incorporated to dynamically adjust the number of transmitted semantic channels according to channel conditions and feature importance, thereby improving both robustness and transmission efficiency. Experimental results on the AFHQ and ImageNet-10 datasets, together with real-world experimental measurements, demonstrate that SPAT consistently outperforms TCP, UDP, and SITP in reconstruction quality across different SNRs while maintaining low-latency transmission.
CLDec 30, 2024Code
DoTA: Weight-Decomposed Tensor Adaptation for Large Language ModelsXiaolin Hu, Xiang Cheng, Peiyu Liu et al.
Low-rank adaptation (LoRA) reduces the computational and memory demands of fine-tuning large language models (LLMs) by approximating updates with low-rank matrices. However, low-rank approximation in two-dimensional space fails to capture high-dimensional structures within the target matrix. Recently, tensor decomposition methods have been explored for fine-tuning LLMs, leveraging their ability to extract structured information. Yet, these approaches primarily rely on random initialization, and the impact of initialization on tensor adaptation remains underexplored. In this paper, we reveal that random initialization significantly diverges from the validation loss achieved by full fine-tuning. To address this, we propose Weight-Decomposed Tensor Adaptation (DoTA), which leverages the Matrix Product Operator (MPO) decomposition of pre-trained weights for effective initialization in fine-tuning LLMs. Additionally, we introduce QDoTA, a quantized version of DoTA designed for 4-bit quantization. Experiments on commonsense and arithmetic reasoning tasks show that DoTA outperforms random initialization methods with fewer parameters. QDoTA further reduces memory consumption and achieves comparable performance to DoTA on commonsense reasoning tasks. We will release our code to support future research.
LGMay 12
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual InformationGuobin Shen, Xiang Cheng, Chenxiao Zhao et al.
On-policy self-distillation, where a student is pulled toward a copy of itself conditioned on privileged context (e.g., a verified solution or feedback), offers a promising direction for advancing reasoning capability without a stronger external teacher. Yet in math reasoning the gains are inconsistent, even when the same approach succeeds elsewhere. A pointwise mutual information analysis traces the failure to the privileged context itself: it inflates the teacher's confidence on tokens already implied by the solution (structural connectives, verifiable claims) and deflates it on deliberation tokens ("Wait", "Let", "Maybe") that drive multi-step search. We propose Anti-Self-Distillation (AntiSD), which ascends a divergence between student and teacher rather than descending it: this reverses the per-token sign and yields a naturally bounded advantage in one step. An entropy-triggered gate disables the term once the teacher entropy collapses, completing a drop-in replacement for default self-distillation. Across five models from 4B to 30B parameters on math reasoning benchmarks, AntiSD reaches the GRPO baseline's accuracy in 2 to 10x fewer training steps and improves final accuracy by up to 11.5 points. AntiSD opens a path to scalable self-improvement, where a language model bootstraps its own reasoning through its training signal.
LGMay 12
From Generic Correlation to Input-Specific Credit in On-Policy Self DistillationGuobin Shen, Lei Huang, Xiang Cheng et al.
On-policy self-distillation has emerged as a promising paradigm for post-training language models, in which the model conditions on environment feedback to serve as its own teacher, providing dense token-level rewards without external teacher models or step-level annotations. Despite its empirical success, what this reward actually measures and what kind of credit it assigns remain unclear. Under a posterior-compatibility interpretation of feedback conditioning, standard in the implicit-reward literature, we show that the self-distillation token reward is a Bayesian filtering increment whose trajectory sum is exactly the pointwise mutual information between the response and the feedback given the input. This pMI can be raised by input-specific reasoning or by input-generic shortcuts, so we further decompose the teacher log-probability along the input axis. Based on this analysis, we propose CREDIT (Contrastive REward from DIsTillation), which isolates the input-specific component with a batch-contrastive baseline. At the sequence level, CREDIT is a teacher-side surrogate for a contrastive pMI objective that also penalizes responses remaining likely under unrelated inputs. Across coding, scientific reasoning, and tool-use benchmarks on two model families, CREDIT delivers the strongest aggregate performance at negligible additional compute.
CVNov 9, 2025
TinyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction TasksXuanle Zhao, Shuxin Zeng, Yinyuan Cai et al.
While Vision Language Models (VLMs) have demonstrated remarkable capabilities in general visual understanding, their application in the chemical domain has been limited, with previous works predominantly focusing on text and thus overlooking critical visual information, such as molecular structures. Current approaches that directly adopt standard VLMs for chemical tasks suffer from two primary issues: (i) computational inefficiency of processing entire chemical images with non-informative backgrounds. (ii) a narrow scope on molecular-level tasks that restricts progress in chemical reasoning. In this work, we propose \textbf{TinyChemVL}, an efficient and powerful chemical VLM that leverages visual token reduction and reaction-level tasks to improve model efficiency and reasoning capacity. Also, we propose \textbf{ChemRxn-V}, a reaction-level benchmark for assessing vision-based reaction recognition and prediction tasks. Directly predicting reaction products from molecular images poses a non-trivial challenge, as it requires models to integrate both recognition and reasoning capacities. Our results demonstrate that with only 4B parameters, TinyChemVL achieves superior performance on both molecular and reaction tasks while demonstrating faster inference and training speeds compared to existing models. Notably, TinyChemVL outperforms ChemVLM while utilizing only 1/16th of the visual tokens. This work builds efficient yet powerful VLMs for chemical domains by co-designing model architecture and task complexity.
LGOct 22, 2024Code
Graph Transformers Dream of Electric FlowXiang Cheng, Lawrence Carin, Suvrit Sra
We show theoretically and empirically that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical problems such as electric flow and eigenvector decomposition. The Transformer has access to information on the input graph only via the graph's incidence matrix. We present explicit weight configurations for implementing each algorithm, and we bound the constructed Transformers' errors by the errors of the underlying algorithms. Our theoretical findings are corroborated by experiments on synthetic data. Additionally, on a real-world molecular regression task, we observe that the linear Transformer is capable of learning a more effective positional encoding than the default one based on Laplacian eigenvectors. Our work is an initial step towards elucidating the inner-workings of the Transformer for graph data. Code is available at https://github.com/chengxiang/LinearGraphTransformer
CLMar 4Code
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement LearningLei Huang, Xiang Cheng, Chenxiao Zhao et al.
Large language models (LLMs) typically receive diverse natural language (NL) feedback through interaction with the environment. However, current reinforcement learning (RL) algorithms rely solely on scalar rewards, leaving the rich information in NL feedback underutilized and leading to inefficient exploration. In this work, we propose GOLF, an RL framework that explicitly exploits group-level language feedback to guide targeted exploration through actionable refinements. GOLF aggregates two complementary feedback sources: (i) external critiques that pinpoint errors or propose targeted fixes, and (ii) intra-group attempts that supply alternative partial ideas and diverse failure patterns. These group-level feedbacks are aggregated to produce high-quality refinements, which are adaptively injected into training as off-policy scaffolds to provide targeted guidance in sparse-reward regions. Meanwhile, GOLF jointly optimizes generation and refinement within a unified RL loop, creating a virtuous cycle that continuously improves both capabilities. Experiments on both verifiable and non-verifiable benchmarks show that GOLF achieves superior performance and exploration efficiency, achieving 2.2$\times$ improvements in sample efficiency compared to RL methods trained solely on scalar rewards. Code is available at https://github.com/LuckyyySTA/GOLF.
LGMay 10
One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement LearningBowen He, Juncheng Dong, Lin Lin et al.
A central challenge in reinforcement learning (RL) is to learn models that generalize beyond the tasks on which they are trained, a goal traditionally pursued through multi-task and meta RL. Recently, transformer architectures have emerged as a promising approach, enabling adaptation to new tasks via in-context learning without explicit parameter updates. From a functional perspective, a transformer can be viewed as a functional operator that maps a context to a task-specific function. It is thus fundamental to understand and design this operator to support stronger generalization in RL. In this work, we address this resulting question of generalization from a kernel-based perspective by establishing a connection between non-linear transformers and kernel-based temporal difference learning. By interpreting the transformer as performing regression in a Reproducing Kernel Hilbert Space (RKHS), we show that value functions from different domains can be represented using a shared set of weights, provided they lie within the same RKHS. Experiments on multiple MetaWorld domains support this interpretation, demonstrating convergence of the temporal-difference objective.
AIOct 16, 2025Code
Towards Agentic Self-Learning LLMs in Search EnvironmentWangtao Sun, Xiang Cheng, Jialin Fan et al.
We study whether self-learning can scale LLM-based agents without relying on human-curated datasets or predefined rule-based rewards. Through controlled experiments in a search-agent setting, we identify two key determinants of scalable agent training: the source of reward signals and the scale of agent task data. We find that rewards from a Generative Reward Model (GRM) outperform rigid rule-based signals for open-domain learning, and that co-evolving the GRM with the policy further boosts performance. Increasing the volume of agent task data-even when synthetically generated-substantially enhances agentic capabilities. Building on these insights, we propose \textbf{Agentic Self-Learning} (ASL), a fully closed-loop, multi-role reinforcement learning framework that unifies task generation, policy execution, and evaluation within a shared tool environment and LLM backbone. ASL coordinates a Prompt Generator, a Policy Model, and a Generative Reward Model to form a virtuous cycle of harder task setting, sharper verification, and stronger solving. Empirically, ASL delivers steady, round-over-round gains, surpasses strong RLVR baselines (e.g., Search-R1) that plateau or degrade, and continues improving under zero-labeled-data conditions, indicating superior sample efficiency and robustness. We further show that GRM verification capacity is the main bottleneck: if frozen, it induces reward hacking and stalls progress; continual GRM training on the evolving data distribution mitigates this, and a small late-stage injection of real verification data raises the performance ceiling. This work establishes reward source and data scale as critical levers for open-domain agent learning and demonstrates the efficacy of multi-role co-evolution for scalable, self-improving agents. The data and code of this paper are released at https://github.com/forangel2014/Towards-Agentic-Self-Learning
CVJun 30, 2025Code
Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy RisksXian Zhang, Xiang Cheng
Objectives: The rapid advancement of Multimodal Large Language Models (MLLMs) has significantly enhanced their reasoning capabilities, enabling a wide range of intelligent applications. However, these advancements also raise critical concerns regarding privacy and ethics. MLLMs are now capable of inferring the geographic location of images -- such as those shared on social media or captured from street views -- based solely on visual content, thereby posing serious risks of privacy invasion, including doxxing, surveillance, and other security threats. Methods: This study provides a comprehensive analysis of existing geolocation techniques based on MLLMs. It systematically reviews relevant litera-ture and evaluates the performance of state-of-the-art visual reasoning models on geolocation tasks, particularly in identifying the origins of street view imagery. Results: Empirical evaluation reveals that the most advanced visual large models can successfully localize the origin of street-level imagery with up to $49\%$ accuracy within a 1-kilometer radius. This performance underscores the models' powerful capacity to extract and utilize fine-grained geographic cues from visual data. Conclusions: Building on these findings, the study identifies key visual elements that contribute to suc-cessful geolocation, such as text, architectural styles, and environmental features. Furthermore, it discusses the potential privacy implications associated with MLLM-enabled geolocation and discuss several technical and policy-based coun-termeasures to mitigate associated risks. Our code and dataset are available at https://github.com/zxyl1003/MLLM-Geolocation-Evaluation.
AIDec 27, 2025
TravelBench: A Broader Real-World Benchmark for Multi-Turn and Tool-Using Travel PlanningXiang Cheng, Yulan Hu, Xiangwen Zhang et al.
Travel planning is a natural real-world task to test large language models (LLMs) planning and tool-use abilities. Although prior work has studied LLM performance on travel planning, existing settings still differ from real-world needs, mainly due to limited domain coverage, insufficient modeling of users' implicit preferences in multi-turn conversations, and a lack of clear evaluation of agents' capability boundaries. To mitigate these gaps, we propose \textbf{TravelBench}, a benchmark for fully real-world travel planning. We collect user queries, user profile and tools from real scenarios, and construct three subtasks-Single-Turn, Multi-Turn, and Unsolvable-to evaluate agent's three core capabilities in real settings: (1) solving problems autonomously, (2) interacting with users over multiple turns to refine requirements, and (3) recognizing the limits of own abilities. To enable stable tool invocation and reproducible evaluation, we cache real tool-call results and build a sandbox environment that integrates ten travel-related tools. Agents can combine these tools to solve most practical travel planning problems, and our systematic verification demonstrates the stability of the proposed benchmark. We further evaluate multiple LLMs on TravelBench and conduct an in-depth analysis of their behaviors and performance. TravelBench provides a practical and reproducible evaluation benchmark to advance research on LLM agents for travel planning.\footnote{Our code and data will be available after internal review.
LGDec 11, 2023
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In ContextXiang Cheng, Yuxin Chen, Suvrit Sra
Many neural network architectures are known to be Turing Complete, and can thus, in principle implement arbitrary algorithms. However, Transformers are unique in that they can implement gradient-based learning algorithms under simple parameter configurations. This paper provides theoretical and empirical evidence that (non-linear) Transformers naturally learn to implement gradient descent in function space, which in turn enable them to learn non-linear functions in context. Our results apply to a broad class of combinations of non-linear architectures and non-linear in-context learning tasks. Additionally, we show that the optimal choice of non-linear activation depends in a natural way on the class of functions that need to be learned.
STFeb 15, 2024
Efficient Sampling on Riemannian Manifolds via Langevin MCMCXiang Cheng, Jingzhao Zhang, Suvrit Sra
We study the task of efficiently sampling from a Gibbs distribution $d π^* = e^{-h} d {vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice. The key to our analysis of Langevin MCMC is a bound on the discretization error of the geometric Euler-Murayama scheme, assuming $\nabla h$ is Lipschitz and $M$ has bounded sectional curvature. Our error bound matches the error of Euclidean Euler-Murayama in terms of its stepsize dependence. Combined with a contraction guarantee for the geometric Langevin Diffusion under Kendall-Cranston coupling, we prove that the Langevin MCMC iterates lie within $ε$-Wasserstein distance of $π^*$ after $\tilde{O}(ε^{-2})$ steps, which matches the iteration complexity for Euclidean Langevin MCMC. Our results apply in general settings where $h$ can be nonconvex and $M$ can have negative Ricci curvature. Under additional assumptions that the Riemannian curvature tensor has bounded derivatives, and that $π^*$ satisfies a $CD(\cdot,\infty)$ condition, we analyze the stochastic gradient version of Langevin MCMC, and bound its iteration complexity by $\tilde{O}(ε^{-2})$ as well.
SPMay 23, 2024
Doubly-Dynamic ISAC Precoding for Vehicular Networks: A Constrained Deep Reinforcement Learning (CDRL) ApproachZonghui Yang, Shijian Gao, Xiang Cheng
Integrated sensing and communication (ISAC) technology is essential for supporting vehicular networks. However, the communication channel in this scenario exhibits time variations, and the potential targets may move rapidly, resulting in double dynamics. This nature poses a challenge for real-time precoder design. While optimization-based solutions are widely researched, they are complex and heavily rely on perfect channel-related information, which is impractical in double dynamics. To address this challenge, we propose using constrained deep reinforcement learning to facilitate dynamic updates to the ISAC precoder. Additionally, the primal dual-deep deterministic policy gradient and Wolpertinger architecture are tailored to efficiently train the algorithm under complex constraints and varying numbers of users. The proposed scheme not only adapts to the dynamics based on observations but also leverages environmental information to enhance performance and reduce complexity. Its superiority over existing candidates has been validated through experiments.
CLJun 17, 2025
Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shotXiang Cheng, Chengyan Pan, Minjun Zhao et al.
In-Context Learning (ICL) is an essential emergent ability of Large Language Models (LLMs), and recent studies introduce Chain-of-Thought (CoT) to exemplars of ICL to enhance the reasoning capability, especially in mathematics tasks. However, given the continuous advancement of model capabilities, it remains unclear whether CoT exemplars still benefit recent, stronger models in such tasks. Through systematic experiments, we find that for recent strong models such as the Qwen2.5 series, adding traditional CoT exemplars does not improve reasoning performance compared to Zero-Shot CoT. Instead, their primary function is to align the output format with human expectations. We further investigate the effectiveness of enhanced CoT exemplars, constructed using answers from advanced models such as \texttt{Qwen2.5-Max} and \texttt{DeepSeek-R1}. Experimental results indicate that these enhanced exemplars still fail to improve the model's reasoning performance. Further analysis reveals that models tend to ignore the exemplars and focus primarily on the instructions, leading to no observable gain in reasoning ability. Overall, our findings highlight the limitations of the current ICL+CoT framework in mathematical reasoning, calling for a re-examination of the ICL paradigm and the definition of exemplars.
LGMar 28, 2025
Probabilistic Uncertain Reward ModelWangtao Sun, Xiang Cheng, Xing Yu et al.
Reinforcement learning from human feedback (RLHF) is a critical technique for training large language models. However, conventional reward models based on the Bradley-Terry model (BTRM) often suffer from overconfidence when faced with inconsistent labels or out-of-distribution samples, leading to reward hacking, where the policy model blindly optimizes for proxy rewards while degrading true performance. This paper proposes the Probabilistic Uncertain Reward Model (PURM), which generalizes the Bradley-Terry model to learn the reward distributions that emerged from the preference data. We theoretically derive the loss function of PURM and introduce a novel method that uses the overlap between distributions to quantify uncertainty. Empirical results show that PURM outperforms existing methods with more accurate reward and sound uncertainty estimations, and sustains effective learning for more optimization steps and obtain higher maximum win rate in RLHF. The data and code of this paper are released at https://anonymous.4open.science/r/Probabilistic-Uncertain-Reward-Model/
AINov 27, 2024
Dependency-Aware CAV Task Scheduling via Diffusion-Based Reinforcement LearningXiang Cheng, Zhi Mao, Ying Wang et al.
In this paper, we propose a novel dependency-aware task scheduling strategy for dynamic unmanned aerial vehicle-assisted connected autonomous vehicles (CAVs). Specifically, different computation tasks of CAVs consisting of multiple dependency subtasks are judiciously assigned to nearby CAVs or the base station for promptly completing tasks. Therefore, we formulate a joint scheduling priority and subtask assignment optimization problem with the objective of minimizing the average task completion time. The problem aims at improving the long-term system performance, which is reformulated as a Markov decision process. To solve the problem, we further propose a diffusion-based reinforcement learning algorithm, named Synthetic DDQN based Subtasks Scheduling, which can make adaptive task scheduling decision in real time. A diffusion model-based synthetic experience replay is integrated into the reinforcement learning framework, which can generate sufficient synthetic data in experience replay buffer, thereby significantly accelerating convergence and improving sample efficiency. Simulation results demonstrate the effectiveness of the proposed algorithm on reducing task completion time, comparing to benchmark schemes.
LGDec 17, 2025
In-Context Semi-Supervised LearningJiashuo Fan, Paul Rosu, Aaron T. Wang et al.
There has been significant recent interest in understanding the capacity of Transformers for in-context learning (ICL), yet most theory focuses on supervised settings with explicitly labeled pairs. In practice, Transformers often perform well even when labels are sparse or absent, suggesting crucial structure within unlabeled contextual demonstrations. We introduce and study in-context semi-supervised learning (IC-SSL), where a small set of labeled examples is accompanied by many unlabeled points, and show that Transformers can leverage the unlabeled context to learn a robust, context-dependent representation. This representation enables accurate predictions and markedly improves performance in low-label regimes, offering foundational insights into how Transformers exploit unlabeled context for representation learning within the ICL framework.
LGOct 28, 2025
LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion PoliciesXiman Sun, Xiang Cheng
Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Diffusion, a risk-aware sampling rule that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Concretely, we accumulate a log-likelihood ratio and gate the conditional mean with a logistic controller whose threshold tau is calibrated once under H0 to meet a user-specified Type-I level alpha. This turns guidance from a fixed push into an evidence-driven adjustment with a user-interpretable risk budget. Importantly, we deliberately leave training vanilla (two heads with standard epsilon-prediction) under the structure of DDPM. LRT guidance composes naturally with Q-gradients: critic-gradient updates can be taken at the unconditional mean, at the LRT-gated mean, or a blend, exposing a continuum from exploitation to conservatism. We standardize states and actions consistently at train and test time and report a state-conditional out-of-distribution (OOD) metric alongside return. On D4RL MuJoCo tasks, LRT-Diffusion improves the return-OOD trade-off over strong Q-guided baselines in our implementation while honoring the desired alpha. Theoretically, we establish level-alpha calibration, concise stability bounds, and a return comparison showing when LRT surpasses Q-guidance-especially when off-support errors dominate. Overall, LRT-Diffusion is a drop-in, inference-time method that adds principled, calibrated risk control to diffusion policies for offline RL.
SPSep 2, 2025
Synesthesia of Machines (SoM)-Based Task-Driven MIMO System for Image TransmissionSijiang Li, Rongqing Zhang, Xiang Cheng et al.
To support cooperative perception (CP) of networked mobile agents in dynamic scenarios, the efficient and robust transmission of sensory data is a critical challenge. Deep learning-based joint source-channel coding (JSCC) has demonstrated promising results for image transmission under adverse channel conditions, outperforming traditional rule-based codecs. While recent works have explored to combine JSCC with the widely adopted multiple-input multiple-output (MIMO) technology, these approaches are still limited to the discrete-time analog transmission (DTAT) model and simple tasks. Given the limited performance of existing MIMO JSCC schemes in supporting complex CP tasks for networked mobile agents with digital MIMO communication systems, this paper presents a Synesthesia of Machines (SoM)-based task-driven MIMO system for image transmission, referred to as SoM-MIMO. By leveraging the structural properties of the feature pyramid for perceptual tasks and the channel properties of the closed-loop MIMO communication system, SoM-MIMO enables efficient and robust digital MIMO transmission of images. Experimental results have shown that compared with two JSCC baseline schemes, our approach achieves average mAP improvements of 6.30 and 10.48 across all SNR levels, while maintaining identical communication overhead.
SPJun 15, 2025
Synesthesia of Machines (SoM)-Enhanced Sub-THz ISAC Transmission for Air-Ground NetworkZonghui Yang, Shijian Gao, Xiang Cheng et al.
Integrated sensing and communication (ISAC) within sub-THz frequencies is crucial for future air-ground networks, but unique propagation characteristics and hardware limitations present challenges in optimizing ISAC performance while increasing operational latency. This paper introduces a multi-modal sensing fusion framework inspired by synesthesia of machine (SoM) to enhance sub-THz ISAC transmission. By exploiting inherent degrees of freedom in sub-THz hardware and channels, the framework optimizes the radio-frequency environment. Squint-aware beam management is developed to improve air-ground network adaptability, enabling three-dimensional dynamic ISAC links. Leveraging multi-modal information, the framework enhances ISAC performance and reduces latency. Visual data rapidly localizes users and targets, while a customized multi-modal learning algorithm optimizes the hybrid precoder. A new metric provides comprehensive performance evaluation, and extensive experiments demonstrate that the proposed scheme significantly improves ISAC efficiency.
QMApr 27, 2025
Heterogeneous network drug-target interaction prediction model based on graph wavelet transform and multi-level contrastive learningWenfeng Dai, Yanhong Wang, Shuai Yan et al.
Drug-target interaction (DTI) prediction is a core task in drug development and precision medicine in the biomedical field. However, traditional machine learning methods generally have the black box problem, which makes it difficult to reveal the deep correlation between the model decision mechanism and the interaction pattern between biological molecules. This study proposes a heterogeneous network drug target interaction prediction framework, integrating graph neural network and multi scale signal processing technology to construct a model with both efficient prediction and multi level interpretability. Its technical breakthroughs are mainly reflected in the following three dimensions:Local global feature collaborative perception module. Based on heterogeneous graph convolutional neural network (HGCN), a multi order neighbor aggregation strategy is designed.Multi scale graph signal decomposition and biological interpretation module. A deep hierarchical node feature transform (GWT) architecture is proposed.Contrastive learning combining multi dimensional perspectives and hierarchical representations. By comparing the learning models, the node representations from the two perspectives of HGCN and GWT are aligned and fused, so that the model can integrate multi dimensional information and improve the prediction robustness. Experimental results show that our framework shows excellent prediction performance on all datasets. This study provides a complete solution for drug target discovery from black box prediction to mechanism decoding, and its methodology has important reference value for modeling complex biomolecular interaction systems.
LGApr 27, 2025
HyboWaveNet: Hyperbolic Graph Neural Networks with Multi-Scale Wavelet Transform for Protein-Protein Interaction PredictionQingzhi Yu, Shuai Yan, Wenfeng Dai et al.
Protein-protein interactions (PPIs) are fundamental for deciphering cellular functions,disease pathways,and drug discovery.Although existing neural networks and machine learning methods have achieved high accuracy in PPI prediction,their black-box nature leads to a lack of causal interpretation of the prediction results and difficulty in capturing hierarchical geometries and multi-scale dynamic interaction patterns among proteins.To address these challenges, we propose HyboWaveNet,a novel deep learning framework that collaborates with hyperbolic graphical neural networks (HGNNs) and multiscale graphical wavelet transform for robust PPI prediction. Mapping protein features to Lorentz space simulates hierarchical topological relationships among biomolecules via a hyperbolic distance metric,enabling node feature representations that better fit biological a priori.HyboWaveNet inherently simulates hierarchical and scale-free biological relationships, while the integration of wavelet transforms enables adaptive extraction of local and global interaction features across different resolutions. Our framework generates node feature representations via a graph neural network under the Lorenz model and generates pairs of positive samples under multiple different views for comparative learning, followed by further feature extraction via multi-scale graph wavelet transforms to predict potential PPIs. Experiments on public datasets show that HyboWaveNet improves over both existing state-of-the-art methods. We also demonstrate through ablation experimental studies that the multi-scale graph wavelet transform module improves the predictive performance and generalization ability of HyboWaveNet. This work links geometric deep learning and signal processing to advance PPI prediction, providing a principled approach for analyzing complex biological systems
CLDec 19, 2024
To Err Is Human; To Annotate, SILICON? Reducing Measurement Error in LLM AnnotationXiang Cheng, Raveesh Mayya, João Sedoc
Unstructured text data annotation is foundational to management research and Large Language Models (LLMs) promise a cost-effective and scalable alternative to human annotation. The validity of insights drawn from LLM annotated data critically depends on minimizing the discrepancy between LLM assigned labels and the unobserved ground truth, as well as ensuring long-term reproducibility of results. We address the gap in the literature on LLM annotation by decomposing measurement error in LLM-based text annotation into four distinct sources: (1) guideline-induced error from inconsistent annotation criteria, (2) baseline-induced error from unreliable human reference standards, (3) prompt-induced error from suboptimal meta-instruction formatting, and (4) model-induced error from architectural differences across LLMs. We develop the SILICON methodology to systematically reduce measurement error from LLM annotation in all four sources above. Empirical validation across seven management research cases shows iteratively refined guidelines substantially increases the LLM-human agreement compared to one-shot guidelines; expert-generated baselines exhibit higher inter-annotator agreement as well as are less prone to producing misleading LLM-human agreement estimates compared to crowdsourced baselines; placing content in the system prompt reduces prompt-induced error; and model performance varies substantially across tasks. To further reduce error, we introduce a cost-effective multi-LLM labeling method, where only low-confidence items receive additional labels from alternative models. Finally, in addressing closed source model retirement cycles, we introduce an intuitive regression-based methodology to establish robust reproducibility protocols. Our evidence indicates that reducing each error source is necessary, and that SILICON supports reproducible, rigorous annotation in management research.
OCMay 22, 2024
Riemannian Bilevel OptimizationSanchayan Dutta, Xiang Cheng, Suvrit Sra
We develop new algorithms for Riemannian bilevel optimization. We focus in particular on batch and stochastic gradient-based methods, with the explicit goal of avoiding second-order information such as Riemannian hyper-gradients. We propose and analyze $\mathrm{RF^2SA}$, a method that leverages first-order gradient information to navigate the complex geometry of Riemannian manifolds efficiently. Notably, $\mathrm{RF^2SA}$ is a single-loop algorithm, and thus easier to implement and use. Under various setups, including stochastic optimization, we provide explicit convergence rates for reaching $ε$-stationary points. We also address the challenge of optimizing over Riemannian manifolds with constraints by adjusting the multiplier in the Lagrangian, ensuring convergence to the desired solution without requiring access to second-order derivatives.
AIJun 15, 2021
Population-coding and Dynamic-neurons improved Spiking Actor Network for Reinforcement LearningDuzhen Zhang, Tielin Zhang, Shuncheng Jia et al.
With the Deep Neural Networks (DNNs) as a powerful function approximator, Deep Reinforcement Learning (DRL) has been excellently demonstrated on robotic control tasks. Compared to DNNs with vanilla artificial neurons, the biologically plausible Spiking Neural Network (SNN) contains a diverse population of spiking neurons, making it naturally powerful on state representation with spatial and temporal information. Based on a hybrid learning framework, where a spike actor-network infers actions from states and a deep critic network evaluates the actor, we propose a Population-coding and Dynamic-neurons improved Spiking Actor Network (PDSAN) for efficient state representation from two different scales: input coding and neuronal coding. For input coding, we apply population coding with dynamically receptive fields to directly encode each input state component. For neuronal coding, we propose different types of dynamic-neurons (containing 1st-order and 2nd-order neuronal dynamics) to describe much more complex neuronal dynamics. Finally, the PDSAN is trained in conjunction with deep critic networks using the Twin Delayed Deep Deterministic policy gradient algorithm (TD3-PDSAN). Extensive experimental results show that our TD3-PDSAN model achieves better performance than state-of-the-art models on four OpenAI gym benchmark tasks. It is an important attempt to improve RL with SNN towards the effective computation satisfying biological plausibility.
STDec 23, 2020
Optimal dimension dependence of the Metropolis-Adjusted Langevin AlgorithmSinho Chewi, Chen Lu, Kwangjun Ahn et al.
Conventional wisdom in the sampling literature, backed by a popular diffusion scaling limit, suggests that the mixing time of the Metropolis-Adjusted Langevin Algorithm (MALA) scales as $O(d^{1/3})$, where $d$ is the dimension. However, the diffusion scaling limit requires stringent assumptions on the target distribution and is asymptotic in nature. In contrast, the best known non-asymptotic mixing time bound for MALA on the class of log-smooth and strongly log-concave distributions is $O(d)$. In this work, we establish that the mixing time of MALA on this class of target distributions is $\widetildeΘ(d^{1/2})$ under a warm start. Our upper bound proof introduces a new technique based on a projection characterization of the Metropolis adjustment which reduces the study of MALA to the well-studied discretization analysis of the Langevin SDE and bypasses direct computation of the acceptance probability.
CRDec 17, 2020
KHOVID: Interoperable Privacy Preserving Digital Contact TracingXiang Cheng, Hanchao Yang, Archanaa S Krishnan et al.
During a pandemic, contact tracing is an essential tool to drive down the infection rate within a population. To accelerate the laborious manual contact tracing process, digital contact tracing (DCT) tools can track contact events transparently and privately by using the sensing and signaling capabilities of the ubiquitous cell phone. However, an effective DCT must not only preserve user privacy but also augment the existing manual contact tracing process. Indeed, not every member of a population may own a cell phone or have a DCT app installed and enabled. We present KHOVID to fulfill the combined goal of manual contact-tracing interoperability and DCT user privacy. At KHOVID's core is a privacy-friendly mechanism to encode user trajectories using geolocation data. Manual contact tracing data can be integrated through the same geolocation format. The accuracy of the geolocation data from DCT is improved using Bluetooth proximity detection, and we propose a novel method to encode Bluetooth ephemeral IDs. This contribution describes the detailed design of KHOVID; presents a prototype implementation including an app and server software; and presents a validation based on simulation and field experiments. We also compare the strengths of KHOVID with other, earlier proposals of DCT.
CLDec 11, 2020
An End-to-End Solution for Named Entity Recognition in eCommerce SearchXiang Cheng, Mitchell Bowden, Bhushan Ramesh Bhange et al.
Named entity recognition (NER) is a critical step in modern search query understanding. In the domain of eCommerce, identifying the key entities, such as brand and product type, can help a search engine retrieve relevant products and therefore offer an engaging shopping experience. Recent research shows promising results on shared benchmark NER tasks using deep learning methods, but there are still unique challenges in the industry regarding domain knowledge, training data, and model production. This paper demonstrates an end-to-end solution to address these challenges. The core of our solution is a novel model training framework "TripleLearn" which iteratively learns from three separate training datasets, instead of one training set as is traditionally done. Using this approach, the best model lifts the F1 score from 69.5 to 93.3 on the holdout test data. In our offline experiments, TripleLearn improved the model performance compared to traditional training approaches which use a single set of training data. Moreover, in the online A/B test, we see significant improvements in user engagement and revenue conversion. The model has been live on homedepot.com for more than 9 months, boosting search conversions and revenue. Beyond our application, this TripleLearn framework, as well as the end-to-end process, is model-independent and problem-independent, so it can be generalized to more industrial applications, especially to the eCommerce industry which has similar data foundations and problems.
AIOct 10, 2020
Distilling a Deep Neural Network into a Takagi-Sugeno-Kang Fuzzy Inference SystemXiangming Gu, Xiang Cheng
Deep neural networks (DNNs) demonstrate great success in classification tasks. However, they act as black boxes and we don't know how they make decisions in a particular classification task. To this end, we propose to distill the knowledge from a DNN into a fuzzy inference system (FIS), which is Takagi-Sugeno-Kang (TSK)-type in this paper. The model has the capability to express the knowledge acquired by a DNN based on fuzzy rules, thus explaining a particular decision much easier. Knowledge distillation (KD) is applied to create a TSK-type FIS that generalizes better than one directly from the training data, which is guaranteed through experiments in this paper. To further improve the performances, we modify the baseline method of KD and obtain good results.
NEOct 9, 2020
Tuning Convolutional Spiking Neural Network with Biologically-plausible Reward PropagationTielin Zhang, Shuncheng Jia, Xiang Cheng et al.
Spiking Neural Networks (SNNs) contain more biologically realistic structures and biologically-inspired learning principles than those in standard Artificial Neural Networks (ANNs). SNNs are considered the third generation of ANNs, powerful on the robust computation with a low computational cost. The neurons in SNNs are non-differential, containing decayed historical states and generating event-based spikes after their states reaching the firing threshold. These dynamic characteristics of SNNs make it difficult to be directly trained with the standard backpropagation (BP), which is also considered not biologically plausible. In this paper, a Biologically-plausible Reward Propagation (BRP) algorithm is proposed and applied to the SNN architecture with both spiking-convolution (with both 1D and 2D convolutional kernels) and full-connection layers. Unlike the standard BP that propagates error signals from post to presynaptic neurons layer by layer, the BRP propagates target labels instead of errors directly from the output layer to all pre-hidden layers. This effort is more consistent with the top-down reward-guiding learning in cortical columns of the neocortex. Synaptic modifications with only local gradient differences are induced with pseudo-BP that might also be replaced with the Spike-Timing Dependent Plasticity (STDP). The performance of the proposed BRP-SNN is further verified on the spatial (including MNIST and Cifar-10) and temporal (including TIDigits and DvsGesture) tasks, where the SNN using BRP has reached a similar accuracy compared to other state-of-the-art BP-based SNNs and saved 50% more computational cost than ANNs. We think the introduction of biologically plausible learning rules to the training procedure of biologically realistic SNNs will give us more hints and inspirations toward a better understanding of the biological system's intelligent nature.
NEOct 7, 2020
Finite Meta-Dynamic Neurons in Spiking Neural Networks for Spatio-temporal LearningXiang Cheng, Tielin Zhang, Shuncheng Jia et al.
Spiking Neural Networks (SNNs) have incorporated more biologically-plausible structures and learning principles, hence are playing critical roles in bridging the gap between artificial and natural neural networks. The spikes are the sparse signals describing the above-threshold event-based firing and under-threshold dynamic computation of membrane potentials, which give us an alternative uniformed and efficient way on both information representation and computation. Inspired from the biological network, where a finite number of meta neurons integrated together for various of cognitive functions, we proposed and constructed Meta-Dynamic Neurons (MDN) to improve SNNs for a better network generalization during spatio-temporal learning. The MDNs are designed with basic neuronal dynamics containing 1st-order and 2nd-order dynamics of membrane potentials, including the spatial and temporal meta types supported by some hyper-parameters. The MDNs generated from a spatial (MNIST) and a temporal (TIDigits) datasets first, and then extended to various other different spatio-temporal tasks (including Fashion-MNIST, NETtalk, Cifar-10, TIMIT and N-MNIST). The comparable accuracy was reached compared to other SOTA SNN algorithms, and a better generalization was also achieved by SNNs using MDNs than that without using MDNs.
CRSep 14, 2020
Answering Multi-Dimensional Range Queries under Local Differential PrivacyJianyu Yang, Tianhao Wang, Ninghui Li et al.
In this paper, we tackle the problem of answering multi-dimensional range queries under local differential privacy. There are three key technical challenges: capturing the correlations among attributes, avoiding the curse of dimensionality, and dealing with the large domains of attributes. None of the existing approaches satisfactorily deals with all three challenges. Overcoming these three challenges, we first propose an approach called Two-Dimensional Grids (TDG). Its main idea is to carefully use binning to partition the two-dimensional (2-D) domains of all attribute pairs into 2-D grids that can answer all 2-D range queries and then estimate the answer of a higher dimensional range query from the answers of the associated 2-D range queries. However, in order to reduce errors due to noises, coarse granularities are needed for each attribute in 2-D grids, losing fine-grained distribution information for individual attributes. To correct this deficiency, we further propose Hybrid-Dimensional Grids (HDG), which also introduces 1-D grids to capture finer-grained information on distribution of each individual attribute and combines information from 1-D and 2-D grids to answer range queries. To make HDG consistently effective, we provide a guideline for properly choosing granularities of grids based on an analysis of how different sources of errors are impacted by these choices. Extensive experiments conducted on real and synthetic datasets show that HDG can give a significant improvement over the existing approaches.
CLDec 4, 2019
Cooperative Reasoning on Knowledge Graph and Corpus: A Multi-agentReinforcement Learning ApproachYunan Zhang, Xiang Cheng, Heting Gao et al.
Knowledge-graph-based reasoning has drawn a lot of attention due to its interpretability. However, previous methods suffer from the incompleteness of the knowledge graph, namely the interested link or entity that can be missing in the knowledge graph(explicit missing). Also, most previous models assume the distance between the target and source entity is short, which is not true on a real-world KG like Freebase(implicit missing). The sensitivity to the incompleteness of KG and the incapability to capture the long-distance link between entities have limited the performance of these models on large KG. In this paper, we propose a model that leverages the text corpus to cure such limitations, either the explicit or implicit missing links. We model the question answering on KG as a cooperative task between two agents, a knowledge graph reasoning agent and an information extraction agent. Each agent learns its skill to complete its own task, hopping on KG or select knowledge from the corpus, via maximizing the reward for correctly answering the question. The reasoning agent decides how to find an equivalent path for the given entity and relation. The extraction agent provide shortcut for long-distance target entity or provide missing relations for explicit missing links with messages from the reasoning agent. Through such cooperative reward design, our model can augment the incomplete KG strategically while not introduce much unnecessary noise that could enlarge the search space and lower the performance.
CLNov 11, 2019
Learning to Order Sub-questions for Complex Question AnsweringYunan Zhang, Xiang Cheng, Yufeng Zhang et al.
Answering complex questions involving multiple entities and relations is a challenging task. Logically, the answer to a complex question should be derived by decomposing the complex question into multiple simple sub-questions and then answering those sub-questions. Existing work has followed this strategy but has not attempted to optimize the order of how those sub-questions are answered. As a result, the sub-questions are answered in an arbitrary order, leading to larger search space and a higher risk of missing an answer. In this paper, we propose a novel reinforcement learning(RL) approach to answering complex questions that can learn a policy to dynamically decide which sub-question should be answered at each stage of reasoning. We lever-age the expected value-variance criterion to enable the learned policy to balance between the risk and utility of answering a sub-question. Experiment results show that the RL approach can substantially improve the optimality of ordering the sub-questions, leading to improved accuracy of question answering. The proposed method for learning to order sub-questions is general and can thus be potentially combined with many existing ideas for answering complex questions to enhance their performance.
LGJul 7, 2019
Stochastic Gradient and Langevin ProcessesXiang Cheng, Dong Yin, Peter L. Bartlett et al.
We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation. We study the setup where the additive noise can be non-Gaussian and state-dependent and the potential function can be non-convex. We show that the key properties of these processes depend on the potential function and the second moment of the additive noise. We apply our theoretical findings to studying the convergence of Stochastic Gradient Descent (SGD) for non-convex problems and corroborate them with experiments using SGD to train deep neural networks on the CIFAR-10 dataset.
MLFeb 4, 2019
Is There an Analog of Nesterov Acceleration for MCMC?Yi-An Ma, Niladri Chatterji, Xiang Cheng et al.
We formulate gradient-based Markov chain Monte Carlo (MCMC) sampling as optimization on the space of probability measures, with Kullback-Leibler (KL) divergence as the objective functional. We show that an underdamped form of the Langevin algorithm performs accelerated gradient descent in this metric. To characterize the convergence of the algorithm, we construct a Lyapunov functional and exploit hypocoercivity of the underdamped Langevin algorithm. As an application, we show that accelerated rates can be obtained for a class of nonconvex functions with the Langevin algorithm.
STFeb 3, 2019
Quantitative Weak Convergence for Discrete Stochastic ProcessesXiang Cheng, Peter L. Bartlett, Michael I. Jordan
In this paper, we quantitative convergence in $W_2$ for a family of Langevin-like stochastic processes that includes stochastic gradient descent and related gradient-based algorithms. Under certain regularity assumptions, we show that the iterates of these stochastic processes converge to an invariant distribution at a rate of $\tilde{O}\lrp{1/\sqrt{k}}$ where $k$ is the number of steps; this rate is provably tight up to log factors. Our result reduces to a quantitative form of the classical Central Limit Theorem in the special case when the potential is quadratic.