Chi-Ning Chou

h-index9

9papers

140citations

Novelty68%

AI Score53

Ranked #12,864 of 194,257 authors (top 7%)#3,295 in LG (top 8%)

9 Papers

11.7CCApr 2

Linear Space Streaming Lower Bounds for Approximating CSPs

Chi-Ning Chou, Alexander Golovnev, Madhu Sudan et al.

We consider the approximability of constraint satisfaction problems in the streaming setting. For every constraint satisfaction problem (CSP) on $n$ variables taking values in $\{0,\ldots,q-1\}$, we prove that improving over the trivial approximability by a factor of $q$ requires $Î©(n)$ space even on instances with $O(n)$ constraints. We also identify a broad subclass of problems for which any improvement over the trivial approximability requires $Î©(n)$ space. The key technical core is an optimal, $q^{-(k-1)}$-inapproximability for the Max $k$-LIN-$\bmod\; q$ problem, which is the Max CSP problem where every constraint is given by a system of $k-1$ linear equations $\bmod\; q$ over $k$ variables. Our work builds on and extends the breakthrough work of Kapralov and Krachun (Proc. STOC 2019) who showed a linear lower bound on any non-trivial approximation of the MaxCut problem in graphs. MaxCut corresponds roughly to the case of Max $k$-LIN-$\bmod\; q$ with ${k=q=2}$. For general CSPs in the streaming setting, prior results only yielded $Î©(\sqrt{n})$ space bounds. In particular no linear space lower bound was known for an approximation factor less than $1/2$ for any CSP. Extending the work of Kapralov and Krachun to Max $k$-LIN-$\bmod\; q$ to $k>2$ and $q>2$ (while getting optimal hardness results) is the main technical contribution of this work. Each one of these extensions provides non-trivial technical challenges that we overcome in this work.

9.5LGMay 26

Two Speeds of Learning: A Representation-Readout Decomposition of Grokking and Double Descent

Chi-Ning Chou, Oscar Uzdelewicz, Neng-Chun Chiu et al.

Training loss and accuracy are the standard signals used to monitor generalization during deep neural network training. Two well-documented phenomena complicate this picture: in grokking, train loss falls rapidly while test performance improves abruptly only after a long delay; in epoch-wise double descent, train loss decreases monotonically while test loss or error rises and falls. Existing accounts are often task-specific, and a task-agnostic analysis framework for diagnosing and explaining these phenomena across realistic tasks and architectures is missing. We address this challenge by analyzing two competing processes that underlie learning dynamics: representation learning in the encoder and readout calibration in the final classifier. Using tools from representational geometry, neural tangent kernels, and linear probing, we show that both processes are active throughout training, with the fluctuations of their relative speed giving rise to seemingly anomalous generalization dynamics. Applying the representation-readout decomposition to grokking across a wide range of tasks and architectures, we find that the readout is train-biased before grokking onset, and representation learning is gradual but not absent, contrary to the lazy-to-rich account. The framework further provides diagnostic signatures distinguishing spurious from genuine generalization: in a previously reported MNIST grokking example and an epoch-wise double descent example, apparent delayed or non-monotone generalization is shown to arise from representation degradation and readout misalignment induced by non-standard training recipes. Together, these results establish the representation-readout decomposition as a top-down framework for understanding learning dynamics and revealing underlying algorithms for interpretability research.

1.4LGMar 2

Diagnosing Generalization Failures from Representational Geometry Markers

Chi-Ning Chou, Artem Kirsanov, Yao-Yuan Yang et al.

Generalization, the ability to perform well beyond the training context, is a hallmark of biological and artificial intelligence, yet anticipating unseen failures remains a central challenge. Conventional approaches often take a ``bottom-up'' mechanistic route by reverse-engineering interpretable features or circuits to build explanatory models. While insightful, these methods often struggle to provide the high-level, predictive signals for anticipating failure in real-world deployment. Here, we propose using a ``top-down'' approach to studying generalization failures inspired by medical biomarkers: identifying system-level measurements that serve as robust indicators of a model's future performance. Rather than mapping out detailed internal mechanisms, we systematically design and test network markers to probe structure, function links, identify prognostic indicators, and validate predictions in real-world settings. In image classification, we find that task-relevant geometric properties of in-distribution (ID) object manifolds consistently forecast poor out-of-distribution (OOD) generalization. In particular, reductions in two geometric measures, effective manifold dimensionality and utility, predict weaker OOD performance across diverse architectures, optimizers, and datasets. We apply this finding to transfer learning with ImageNet-pretrained models. We consistently find that the same geometric patterns predict OOD transfer performance more reliably than ID accuracy. This work demonstrates that representational geometry can expose hidden vulnerabilities, offering more robust guidance for model selection and AI interpretability.

23.4CLFeb 11, 2025

The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models

Artem Kirsanov, Chi-Ning Chou, Kyunghyun Cho et al.

Decoder-only language models have the ability to dynamically switch between various computational tasks based on input prompts. Despite many successful applications of prompting, there is very limited understanding of the internal mechanism behind such flexibility. In this work, we investigate how different prompting methods affect the geometry of representations in these models. Employing a framework grounded in statistical physics, we reveal that various prompting techniques, while achieving similar performance, operate through distinct representational mechanisms for task adaptation. Our analysis highlights the critical role of input distribution samples and label semantics in few-shot in-context learning. We also demonstrate evidence of synergistic and interfering interactions between different tasks on the representational level. Our work contributes to the theoretical understanding of large language models and lays the groundwork for developing more effective, representation-aware prompting strategies.

3.3NCDec 21, 2023

Probing Biological and Artificial Neural Networks with Task-dependent Neural Manifolds

Michael Kuoch, Chi-Ning Chou, Nikhil Parthasarathy et al.

Recently, growth in our understanding of the computations performed in both biological and artificial neural networks has largely been driven by either low-level mechanistic studies or global normative approaches. However, concrete methodologies for bridging the gap between these levels of abstraction remain elusive. In this work, we investigate the internal mechanisms of neural networks through the lens of neural population geometry, aiming to provide understanding at an intermediate level of abstraction, as a way to bridge that gap. Utilizing manifold capacity theory (MCT) from statistical physics and manifold alignment analysis (MAA) from high-dimensional statistics, we probe the underlying organization of task-dependent manifolds in deep neural networks and macaque neural recordings. Specifically, we quantitatively characterize how different learning objectives lead to differences in the organizational strategies of these models and demonstrate how these geometric analyses are connected to the decodability of task-relevant information. These analyses present a strong direction for bridging mechanistic and normative theories in neural networks through neural population geometry, potentially opening up many future research avenues in both machine learning and neuroscience.

5.9OCJun 11, 2020

A General Framework for Analyzing Stochastic Dynamics in Learning Algorithms

Chi-Ning Chou, Juspreet Singh Sandhu, Mien Brabeeba Wang et al.

One of the challenges in analyzing learning algorithms is the circular entanglement between the objective value and the stochastic noise. This is also known as the "chicken and egg" phenomenon and traditionally, there is no principled way to tackle this issue. People solve the problem by utilizing the special structure of the dynamic, and hence the analysis would be difficult to generalize. In this work, we present a streamlined three-step recipe to tackle the "chicken and egg" problem and give a general framework for analyzing stochastic dynamics in learning algorithms. Our framework composes standard techniques from probability theory, such as stopping time and martingale concentration. We demonstrate the power and flexibility of our framework by giving a unifying analysis for three very different learning problems with the last iterate and the strong uniform high probability convergence guarantee. The problems are stochastic gradient descent for strongly convex functions, streaming principal component analysis, and linear bandit with stochastic gradient descent updates. We either improve or match the state-of-the-art bounds on all three dynamics.

5.9NCNov 4, 2019

ODE-Inspired Analysis for the Biological Version of Oja's Rule in Solving Streaming PCA

Chi-Ning Chou, Mien Brabeeba Wang

Oja's rule [Oja, Journal of mathematical biology 1982] is a well-known biologically-plausible algorithm using a Hebbian-type synaptic update rule to solve streaming principal component analysis (PCA). Computational neuroscientists have known that this biological version of Oja's rule converges to the top eigenvector of the covariance matrix of the input in the limit. However, prior to this work, it was open to prove any convergence rate guarantee. In this work, we give the first convergence rate analysis for the biological version of Oja's rule in solving streaming PCA. Moreover, our convergence rate matches the information theoretical lower bound up to logarithmic factors and outperforms the state-of-the-art upper bound for streaming PCA. Furthermore, we develop a novel framework inspired by ordinary differential equations (ODE) to analyze general stochastic dynamics. The framework abandons the traditional step-by-step analysis and instead analyzes a stochastic dynamic in one-shot by giving a closed-form solution to the entire dynamic. The one-shot framework allows us to apply stopping time and martingale techniques to have a flexible and precise control on the dynamic. We believe that this general framework is powerful and should lead to effective yet simple analysis for a large class of problems with stochastic dynamics.

7.2CRJul 9, 2018

Personalized Difficulty Adjustment for Countering the Double-Spending Attack in Proof-of-Work Consensus Protocols

Chi-Ning Chou, Yu-Jing Lin, Ren Chen et al.

Bitcoin is the first secure decentralized electronic currency system. However, it is known to be inefficient due to its proof-of-work (PoW) consensus algorithm and has the potential hazard of double spending. In this paper, we aim to reduce the probability of double spending by decreasing the probability of consecutive winning. We first formalize a PoW-based decentralized secure network model in order to present a quantitative analysis. Next, to resolve the risk of double spending, we propose the personalized difficulty adjustment (PDA) mechanism which modifies the difficulty of each participant such that those who win more blocks in the past few rounds have a smaller probability to win in the next round. To analyze the performance of the PDA mechanism, we observe that the system can be modeled by a high-order Markov chain. Finally, we show that PDA effectively decreases the probability of consecutive winning and results in a more trustworthy PoW-based system.

13.8DSMay 7, 2018

(Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs

Boaz Barak, Chi-Ning Chou, Zhixian Lei et al.

We give a quasipolynomial time algorithm for the graph matching problem (also known as noisy or robust graph isomorphism) on correlated random graphs. Specifically, for every $γ>0$, we give a $n^{O(\log n)}$ time algorithm that given a pair of $γ$-correlated $G(n,p)$ graphs $G_0,G_1$ with average degree between $n^{\varepsilon}$ and $n^{1/153}$ for $\varepsilon = o(1)$, recovers the "ground truth" permutation $π\in S_n$ that matches the vertices of $G_0$ to the vertices of $G_n$ in the way that minimizes the number of mismatched edges. We also give a recovery algorithm for a denser regime, and a polynomial-time algorithm for distinguishing between correlated and uncorrelated graphs. Prior work showed that recovery is information-theoretically possible in this model as long the average degree was at least $\log n$, but sub-exponential time algorithms were only known in the dense case (i.e., for $p > n^{-o(1)}$). Moreover, "Percolation Graph Matching", which is the most common heuristic for this problem, has been shown to require knowledge of $n^{Ω(1)}$ "seeds" (i.e., input/output pairs of the permutation $π$) to succeed in this regime. In contrast our algorithms require no seed and succeed for $p$ which is as low as $n^{o(1)-1}$.