AIMay 27
The Shape of Overthinking: Backtracking Bursts in Long Reasoning TracesNavid Rezazadeh, Arash Gholami Davoodi · cmu
Reasoning models often generate long traces in which useful self-correction and unproductive revision are hard to distinguish. We study this distinction through backtracking dynamics: local reconsideration, retraction, or re-derivation inside long-form reasoning traces. On 6{,}000 Qwen3-8B AIME traces, we annotate segment-level backtrack severity and analyze event timing, normalized depth, and local burst structure. We find that early isolated repair is often compatible with correct reasoning, whereas incorrect traces more often show moderate-to-severe backtracks that persist and cluster late. Cross-corpus checks show the same qualitative asymmetry across additional model/domain pairs. Filtering analyses instantiate the signal as a prefix-causal selective early-exit policy: at shallow and intermediate depths, burst-aware filtering outperforms fixed length-based filtering while using only prefix-available features. Moderate length cutoffs remain strong completed-trace baselines, but burst-aware control provides a deployable mechanism for separating recoverable repair from likely instability.
LGMay 8
Vertex-Softmax: Tight Transformer Verification via Exact Softmax OptimizationNavid Rezazadeh, Arash Gholami Davoodi
Certified verification of transformer attention requires bounding the softmax function over interval constraints on the pre-softmax scores. Existing verifiers relax softmax ndependently of the downstream objective, leaving avoidable slack. We prove that the exact optimum of this score-box problem is attained at a vertex of the constraint box, and establish a threshold structure theorem showing that, after sorting the objective coefficients, the optimum lies among only linearly many candidates, yielding the Vertex-Softmax primitive with log-linear complexity in the sequence length. We further prove a formal optimality result showing that Vertex-Softmax is the tightest sound bound obtainable from score intervals alone, characterizing precisely what additional structure (score correlations, score-value coupling) is needed for further improvement. Integrated into a CROWN Convex Relaxation based Optimization for Worst-case Neurons)-style verifier with a formal soundness guarantee, Vertex-Softmax significantly improves certified rates and substantially tightens lower bounds across MNIST, Fashion-MNIST, and CIFAR-10 attention models, while consistently matching or outperforming alpha-CROWN and branch-and-bound baselines at a fraction of their cost.
CLFeb 10
Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language ModelsArash Gholami Davoodi, Navid Rezazadeh, Seyed Pouyan Mousavi Davoudi et al.
Large language models (LLMs) must balance diversity and creativity against logical coherence in open-ended generation. Existing truncation-based samplers are effective but largely heuristic, relying mainly on probability mass and entropy while ignoring semantic geometry of the token space. We present Top-W, a geometry-aware truncation rule that uses Wasserstein distance-defined over token-embedding geometry-to keep the cropped distribution close to the original, while explicitly balancing retained probability mass against the entropy of the kept set. Our theory yields a simple closed-form structure for the fixed-potential subset update: depending on the mass-entropy trade-off, the optimal crop either collapses to a single token or takes the form of a one-dimensional prefix that can be found efficiently with a linear scan. We implement Top-W using efficient geometry-based potentials (nearest-set or k-NN) and pair it with an alternating decoding routine that keeps the standard truncation-and-sampling interface unchanged. Extensive experiments on four benchmarks (GSM8K, GPQA, AlpacaEval, and MT-Bench) across three instruction-tuned models show that Top-W consistently outperforms prior state-of-the-art decoding approaches achieving up to 33.7% improvement. Moreover, we find that Top-W not only improves accuracy-focused performance, but also boosts creativity under judge-based open-ended evaluation.
LGDec 11, 2021
Learning Contraction Policies from Offline DataNavid Rezazadeh, Maxwell Kolarich, Solmaz S. Kia et al.
This paper proposes a data-driven method for learning convergent control policies from offline data using Contraction theory. Contraction theory enables constructing a policy that makes the closed-loop system trajectories inherently convergent towards a unique trajectory. At the technical level, identifying the contraction metric, which is the distance metric with respect to which a robot's trajectories exhibit contraction is often non-trivial. We propose to jointly learn the control policy and its corresponding contraction metric while enforcing contraction. To achieve this, we learn an implicit dynamics model of the robotic system from an offline data set consisting of the robot's state and input trajectories. Using this learned dynamics model, we propose a data augmentation algorithm for learning contraction policies. We randomly generate samples in the state-space and propagate them forward in time through the learned dynamics model to generate auxiliary sample trajectories. We then learn both the control policy and the contraction metric such that the distance between the trajectories from the offline data set and our generated auxiliary sample trajectories decreases over time. We evaluate the performance of our proposed framework on simulated robotic goal-reaching tasks and demonstrate that enforcing contraction results in faster convergence and greater robustness of the learned policy.
MAAug 12, 2019
A sub-modular receding horizon solution for mobile multi-agent persistent monitoringNavid Rezazadeh, Solmaz S. Kia
We study the problem of persistent monitoring of a finite number of inter-connected geographical nodes by a group of heterogeneous mobile agents. We assign to each geographical node a concave and increasing reward function that resets to zero after an agent's visit. Then, we design the optimal dispatch policy of which nodes to visit at what time and by what agent by finding a policy set that maximizes a utility that is defined as the total reward collected at visit times. We show that this optimization problem is NP-hard and its computational complexity increases exponentially with the number of the agents and the length of the mission horizon. By showing that the utility function is a monotone increasing and submodular set function of agents' policy, we proceed to propose a suboptimal dispatch policy design with a known optimality gap. To reduce the time complexity of constructing the feasible search set and also to induce robustness to changes in the operational factors, we perform our suboptimal policy design in a receding horizon fashion. Then, to compensate for the shortsightedness of the receding horizon approach for reward distribution beyond the feasible policies of the agents over the receding horizon, we add a new term to our utility, which provides a measure of nodal importance beyond the receding horizon's sight. This term gives the policy design an intuition to steer the agents towards the nodes with higher rewards on the patrolling graph. Finally, we discuss how our proposed algorithm can be implemented in a decentralized manner. A simulation study demonstrates our results.