CLJul 9, 2024Code
Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case StudyAniruddha Roy, Pretam Ray, Ayush Maheshwari et al.
Neural Machine Translation (NMT) remains a formidable challenge, especially when dealing with low-resource languages. Pre-trained sequence-to-sequence (seq2seq) multi-lingual models, such as mBART-50, have demonstrated impressive performance in various low-resource NMT tasks. However, their pre-training has been confined to 50 languages, leaving out support for numerous low-resource languages, particularly those spoken in the Indian subcontinent. Expanding mBART-50's language support requires complex pre-training, risking performance decline due to catastrophic forgetting. Considering these expanding challenges, this paper explores a framework that leverages the benefits of a pre-trained language model along with knowledge distillation in a seq2seq architecture to facilitate translation for low-resource languages, including those not covered by mBART-50. The proposed framework employs a multilingual encoder-based seq2seq model as the foundational architecture and subsequently uses complementary knowledge distillation techniques to mitigate the impact of imbalanced training. Our framework is evaluated on three low-resource Indic languages in four Indic-to-Indic directions, yielding significant BLEU-4 and chrF improvements over baselines. Further, we conduct human evaluation to confirm effectiveness of our approach. Our code is publicly available at https://github.com/raypretam/Two-step-low-res-NMT.
AIDec 18, 2025
AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric ConstraintsAniruddha Roy, Jyoti Patel, Aman Chadha et al.
Merging large language models (LLMs) is a practical way to compose capabilities from multiple fine-tuned checkpoints without retraining. Yet standard schemes (linear weight soups, task vectors, and Fisher-weighted averaging) can preserve loss while quietly destroying alignment. We argue that merging is not a numerical trick but a geometry-constrained operation around an already-aligned anchor: fusion must be steered to respect safety geometry, not validated post hoc. We introduce AlignMerge, a geometry-aware merging framework that makes alignment an explicit invariant. In a local Fisher chart around an instruction-tuned base, we estimate an alignment subspace with projector P_A and optimize: L_AlignMerge = L_geo + lambda_align * L_align + lambda_bud * L_bud, where L_geo keeps the merge close to its experts in Fisher-Rao geometry, L_align penalizes motion along alignment-sensitive directions, and L_bud enforces a soft alignment budget. As the alignment functional we use the decoding-invariant Alignment Quality Index (AQI), a latent-space criterion that captures how cleanly aligned and misaligned behaviors separate in representation space. Across five model families (LLaMA-3 8B, Mistral 7B, Qwen 2, Phi-3.5, Gemma 2), merging safety anchors with task experts, AlignMerge improves alignment metrics (AQI, toxicity, LLM-judge alignment) while matching or exceeding the best expert on instruction-following, reasoning, and helpfulness. It also exhibits smaller alignment-subspace drift and fewer budget violations than Fisher soups, TIES, SafeMerge, and MergeAlign. These results make alignment-preserving merging a first-class design goal and suggest a path to geometry-aware composition of future foundation models.
62.7OCMay 13
Guaranteed cost structured control in infinite-horizon linear-quadratic cooperative differential gamesAniruddha Roy, Pavankumar Tallapragada
In this paper, we consider infinite-horizon linear-quadratic cooperative differential games with output feedback information structure. We first demonstrate that, under output feedback information structure, computing Pareto optimal controls can be difficult even for simple low-dimensional differential games. To address this issue, this paper introduces the concept of feedback guaranteed cost structured control (GCSC). The feedback GCSC concept is inspired from suboptimal control. At a feedback GCSC, the total weighted team cost remains below a prescribed threshold while satisfying the structural constraints. We derive fundamental properties of the feedback GCSC and the admissible weight set, including their monotonicity properties. In particular, we show that if Pareto optimal controls exist, they belong to the class of feedback GCSCs. We also quantify the suboptimalty of Pareto optimal controls (if they exist) and the proposed GCSC with respect to output feedback optimal control. Furthermore, we provide the conditions for verification and the synthesis of a feedback GCSC. Finally, we illustrate the effectiveness of the proposed approach through numerical examples, including a case study on tracking synchronization in a microgrid.
CLMay 10, 2025Code
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated FeedbackAniruddha Roy, Pretam Ray, Abhilash Nandy et al.
Instruction-based Large Language Models (LLMs) have proven effective in numerous few-shot or zero-shot Natural Language Processing (NLP) tasks. However, creating human-annotated instruction data is time-consuming, expensive, and often limited in quantity and task diversity. Previous research endeavors have attempted to address this challenge by proposing frameworks capable of generating instructions in a semi-automated and task-agnostic manner directly from the model itself. Many of these efforts have relied on large API-only parameter-based models such as GPT-3.5 (175B), which are expensive, and subject to limits on a number of queries. This paper explores the performance of three open-source small LLMs such as LLaMA 2-7B, LLama 2-13B, and Mistral 7B, using a semi-automated framework, thereby reducing human intervention, effort, and cost required to generate an instruction dataset for fine-tuning LLMs. Furthermore, we demonstrate that incorporating a Reinforcement Learning (RL) based training algorithm into this LLMs-based framework leads to further enhancements. Our evaluation of the dataset reveals that these RL-based frameworks achieve a substantial improvements in 63-66% of the tasks compared to previous approaches.