Nima Chitsazan

LG
h-index7
3papers
11citations
Novelty50%
AI Score47

3 Papers

LGJun 5, 2025Code
Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning

Andrei Mircea, Supriyo Chakraborty, Nima Chitsazan et al.

This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training; an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl

IRJan 22
SPARC-RAG: Adaptive Sequential-Parallel Scaling with Context Management for Retrieval-Augmented Generation

Yuxin Yang, Gangda Deng, Ömer Faruk Akgül et al.

Retrieval-Augmented Generation (RAG) grounds large language model outputs in external evidence, but remains challenged on multi-hop question answering that requires long reasoning. Recent works scale RAG at inference time along two complementary dimensions: sequential depth for iterative refinement and parallel width for coverage expansion. However, naive scaling causes context contamination and scaling inefficiency, leading to diminishing or negative returns despite increased computation. To address these limitations, we propose SPARC-RAG, a multi-agent framework that coordinates sequential and parallel inference-time scaling under a unified context management mechanism. SPARC-RAG employs specialized agents that maintain a shared global context and provide explicit control over the scaling process. It generates targeted, complementary sub-queries for each branch to enable diverse parallel exploration, and explicitly regulates exiting decisions based on answer correctness and evidence grounding. To optimize scaling behavior, we further introduce a lightweight fine-tuning method with process-level verifiable preferences, which improves the efficiency of sequential scaling and effectiveness of parallel scaling. Across single- and multi-hop QA benchmarks, SPARC-RAG consistently outperforms previous RAG baselines, yielding an average +6.2 F1 improvement under lower inference cost.

LGJun 22, 2021
Dynamic Customer Embeddings for Financial Service Applications

Nima Chitsazan, Samuel Sharpe, Dwipam Katariya et al.

As financial services (FS) companies have experienced drastic technology driven changes, the availability of new data streams provides the opportunity for more comprehensive customer understanding. We propose Dynamic Customer Embeddings (DCE), a framework that leverages customers' digital activity and a wide range of financial context to learn dense representations of customers in the FS industry. Our method examines customer actions and pageviews within a mobile or web digital session, the sequencing of the sessions themselves, and snapshots of common financial features across our organization at the time of login. We test our customer embeddings using real world data in three prediction problems: 1) the intent of a customer in their next digital session, 2) the probability of a customer calling the call centers after a session, and 3) the probability of a digital session to be fraudulent. DCE showed performance lift in all three downstream problems.