Chi-Ning Chou

LG
h-index82
13papers
200citations
Novelty65%
AI Score55

13 Papers

100.0CCApr 2
Linear Space Streaming Lower Bounds for Approximating CSPs

Chi-Ning Chou, Alexander Golovnev, Madhu Sudan et al.

We consider the approximability of constraint satisfaction problems in the streaming setting. For every constraint satisfaction problem (CSP) on $n$ variables taking values in $\{0,\ldots,q-1\}$, we prove that improving over the trivial approximability by a factor of $q$ requires $Ω(n)$ space even on instances with $O(n)$ constraints. We also identify a broad subclass of problems for which any improvement over the trivial approximability requires $Ω(n)$ space. The key technical core is an optimal, $q^{-(k-1)}$-inapproximability for the Max $k$-LIN-$\bmod\; q$ problem, which is the Max CSP problem where every constraint is given by a system of $k-1$ linear equations $\bmod\; q$ over $k$ variables. Our work builds on and extends the breakthrough work of Kapralov and Krachun (Proc. STOC 2019) who showed a linear lower bound on any non-trivial approximation of the MaxCut problem in graphs. MaxCut corresponds roughly to the case of Max $k$-LIN-$\bmod\; q$ with ${k=q=2}$. For general CSPs in the streaming setting, prior results only yielded $Ω(\sqrt{n})$ space bounds. In particular no linear space lower bound was known for an approximation factor less than $1/2$ for any CSP. Extending the work of Kapralov and Krachun to Max $k$-LIN-$\bmod\; q$ to $k>2$ and $q>2$ (while getting optimal hardness results) is the main technical contribution of this work. Each one of these extensions provides non-trivial technical challenges that we overcome in this work.

80.9LGMay 26
Two Speeds of Learning: A Representation-Readout Decomposition of Grokking and Double Descent

Chi-Ning Chou, Oscar Uzdelewicz, Neng-Chun Chiu et al.

Training loss and accuracy are the standard signals used to monitor generalization during deep neural network training. Two well-documented phenomena complicate this picture: in grokking, train loss falls rapidly while test performance improves abruptly only after a long delay; in epoch-wise double descent, train loss decreases monotonically while test loss or error rises and falls. Existing accounts are often task-specific, and a task-agnostic analysis framework for diagnosing and explaining these phenomena across realistic tasks and architectures is missing. We address this challenge by analyzing two competing processes that underlie learning dynamics: representation learning in the encoder and readout calibration in the final classifier. Using tools from representational geometry, neural tangent kernels, and linear probing, we show that both processes are active throughout training, with the fluctuations of their relative speed giving rise to seemingly anomalous generalization dynamics. Applying the representation-readout decomposition to grokking across a wide range of tasks and architectures, we find that the readout is train-biased before grokking onset, and representation learning is gradual but not absent, contrary to the lazy-to-rich account. The framework further provides diagnostic signatures distinguishing spurious from genuine generalization: in a previously reported MNIST grokking example and an epoch-wise double descent example, apparent delayed or non-monotone generalization is shown to arise from representation degradation and readout misalignment induced by non-standard training recipes. Together, these results establish the representation-readout decomposition as a top-down framework for understanding learning dynamics and revealing underlying algorithms for interpretability research.

LGFeb 10, 2022Code
Understanding Rare Spurious Correlations in Neural Networks

Yao-Yuan Yang, Chi-Ning Chou, Kamalika Chaudhuri

Neural networks are known to use spurious correlations such as background information for classification. While prior work has looked at spurious correlations that are widespread in the training data, in this work, we investigate how sensitive neural networks are to rare spurious correlations, which may be harder to detect and correct, and may lead to privacy leaks. We introduce spurious patterns correlated with a fixed class to a few training examples and find that it takes only a handful of such examples for the network to learn the correlation. Furthermore, these rare spurious correlations also impact accuracy and privacy. We empirically and theoretically analyze different factors involved in rare spurious correlations and propose mitigation methods accordingly. Specifically, we observe that $\ell_2$ regularization and adding Gaussian noise to inputs can reduce the undesirable effects. Code available at https://github.com/yangarbiter/rare-spurious-correlation.

LGMar 2
Diagnosing Generalization Failures from Representational Geometry Markers

Chi-Ning Chou, Artem Kirsanov, Yao-Yuan Yang et al.

Generalization, the ability to perform well beyond the training context, is a hallmark of biological and artificial intelligence, yet anticipating unseen failures remains a central challenge. Conventional approaches often take a ``bottom-up'' mechanistic route by reverse-engineering interpretable features or circuits to build explanatory models. While insightful, these methods often struggle to provide the high-level, predictive signals for anticipating failure in real-world deployment. Here, we propose using a ``top-down'' approach to studying generalization failures inspired by medical biomarkers: identifying system-level measurements that serve as robust indicators of a model's future performance. Rather than mapping out detailed internal mechanisms, we systematically design and test network markers to probe structure, function links, identify prognostic indicators, and validate predictions in real-world settings. In image classification, we find that task-relevant geometric properties of in-distribution (ID) object manifolds consistently forecast poor out-of-distribution (OOD) generalization. In particular, reductions in two geometric measures, effective manifold dimensionality and utility, predict weaker OOD performance across diverse architectures, optimizers, and datasets. We apply this finding to transfer learning with ImageNet-pretrained models. We consistently find that the same geometric patterns predict OOD transfer performance more reliably than ID accuracy. This work demonstrates that representational geometry can expose hidden vulnerabilities, offering more robust guidance for model selection and AI interpretability.

CLFeb 11, 2025
The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models

Artem Kirsanov, Chi-Ning Chou, Kyunghyun Cho et al.

Decoder-only language models have the ability to dynamically switch between various computational tasks based on input prompts. Despite many successful applications of prompting, there is very limited understanding of the internal mechanism behind such flexibility. In this work, we investigate how different prompting methods affect the geometry of representations in these models. Employing a framework grounded in statistical physics, we reveal that various prompting techniques, while achieving similar performance, operate through distinct representational mechanisms for task adaptation. Our analysis highlights the critical role of input distribution samples and label semantics in few-shot in-context learning. We also demonstrate evidence of synergistic and interfering interactions between different tasks on the representational level. Our work contributes to the theoretical understanding of large language models and lays the groundwork for developing more effective, representation-aware prompting strategies.

NCDec 21, 2023
Probing Biological and Artificial Neural Networks with Task-dependent Neural Manifolds

Michael Kuoch, Chi-Ning Chou, Nikhil Parthasarathy et al.

Recently, growth in our understanding of the computations performed in both biological and artificial neural networks has largely been driven by either low-level mechanistic studies or global normative approaches. However, concrete methodologies for bridging the gap between these levels of abstraction remain elusive. In this work, we investigate the internal mechanisms of neural networks through the lens of neural population geometry, aiming to provide understanding at an intermediate level of abstraction, as a way to bridge that gap. Utilizing manifold capacity theory (MCT) from statistical physics and manifold alignment analysis (MAA) from high-dimensional statistics, we probe the underlying organization of task-dependent manifolds in deep neural networks and macaque neural recordings. Specifically, we quantitatively characterize how different learning objectives lead to differences in the organizational strategies of these models and demonstrate how these geometric analyses are connected to the decodability of task-relevant information. These analyses present a strong direction for bridging mechanistic and normative theories in neural networks through neural population geometry, potentially opening up many future research avenues in both machine learning and neuroscience.

LGMar 23, 2025
Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry

Chi-Ning Chou, Hang Le, Yichen Wang et al.

Integrating task-relevant information into neural representations is a fundamental ability of both biological and artificial intelligence systems. Recent theories have categorized learning into two regimes: the rich regime, where neural networks actively learn task-relevant features, and the lazy regime, where networks behave like random feature models. Yet this simple lazy-rich dichotomy overlooks a diverse underlying taxonomy of feature learning, shaped by differences in learning algorithms, network architectures, and data properties. To address this gap, we introduce an analysis framework to study feature learning via the geometry of neural representations. Rather than inspecting individual learned features, we characterize how task-relevant representational manifolds evolve throughout the learning process. We show, in both theoretical and empirical settings, that as networks learn features, task-relevant manifolds untangle, with changes in manifold geometry revealing distinct learning stages and strategies beyond the lazy-rich dichotomy. This framework provides novel insights into feature learning across neuroscience and machine learning, shedding light on structural inductive biases in neural circuits and the mechanisms underlying out-of-distribution generalization.

QUANT-PHAug 6, 2021
Quantum Meets the Minimum Circuit Size Problem

Nai-Hui Chia, Chi-Ning Chou, Jiayu Zhang et al.

In this work, we initiate the study of the Minimum Circuit Size Problem (MCSP) in the quantum setting. MCSP is a problem to compute the circuit complexity of Boolean functions. It is a fascinating problem in complexity theory -- its hardness is mysterious, and a better understanding of its hardness can have surprising implications to many fields in computer science. We first define and investigate the basic complexity-theoretic properties of minimum quantum circuit size problems for three natural objects: Boolean functions, unitaries, and quantum states. We show that these problems are not trivially in NP but in QCMA (or have QCMA protocols). Next, we explore the relations between the three quantum MCSPs and their variants. We discover that some reductions that are not known for classical MCSP exist for quantum MCSPs for unitaries and states, e.g., search-to-decision reduction and self-reduction. Finally, we systematically generalize results known for classical MCSP to the quantum setting (including quantum cryptography, quantum learning theory, quantum circuit lower bounds, and quantum fine-grained complexity) and also find new connections to tomography and quantum gravity. Due to the fundamental differences between classical and quantum circuits, most of our results require extra care and reveal properties and phenomena unique to the quantum setting. Our findings could be of interest for future studies, and we post several open problems for further exploration along this direction.

OCJun 11, 2020
A General Framework for Analyzing Stochastic Dynamics in Learning Algorithms

Chi-Ning Chou, Juspreet Singh Sandhu, Mien Brabeeba Wang et al.

One of the challenges in analyzing learning algorithms is the circular entanglement between the objective value and the stochastic noise. This is also known as the "chicken and egg" phenomenon and traditionally, there is no principled way to tackle this issue. People solve the problem by utilizing the special structure of the dynamic, and hence the analysis would be difficult to generalize. In this work, we present a streamlined three-step recipe to tackle the "chicken and egg" problem and give a general framework for analyzing stochastic dynamics in learning algorithms. Our framework composes standard techniques from probability theory, such as stopping time and martingale concentration. We demonstrate the power and flexibility of our framework by giving a unifying analysis for three very different learning problems with the last iterate and the strong uniform high probability convergence guarantee. The problems are stochastic gradient descent for strongly convex functions, streaming principal component analysis, and linear bandit with stochastic gradient descent updates. We either improve or match the state-of-the-art bounds on all three dynamics.

NCNov 4, 2019
ODE-Inspired Analysis for the Biological Version of Oja's Rule in Solving Streaming PCA

Chi-Ning Chou, Mien Brabeeba Wang

Oja's rule [Oja, Journal of mathematical biology 1982] is a well-known biologically-plausible algorithm using a Hebbian-type synaptic update rule to solve streaming principal component analysis (PCA). Computational neuroscientists have known that this biological version of Oja's rule converges to the top eigenvector of the covariance matrix of the input in the limit. However, prior to this work, it was open to prove any convergence rate guarantee. In this work, we give the first convergence rate analysis for the biological version of Oja's rule in solving streaming PCA. Moreover, our convergence rate matches the information theoretical lower bound up to logarithmic factors and outperforms the state-of-the-art upper bound for streaming PCA. Furthermore, we develop a novel framework inspired by ordinary differential equations (ODE) to analyze general stochastic dynamics. The framework abandons the traditional step-by-step analysis and instead analyzes a stochastic dynamic in one-shot by giving a closed-form solution to the entire dynamic. The one-shot framework allows us to apply stopping time and martingale techniques to have a flexible and precise control on the dynamic. We believe that this general framework is powerful and should lead to effective yet simple analysis for a large class of problems with stochastic dynamics.

CRJul 9, 2018
Personalized Difficulty Adjustment for Countering the Double-Spending Attack in Proof-of-Work Consensus Protocols

Chi-Ning Chou, Yu-Jing Lin, Ren Chen et al.

Bitcoin is the first secure decentralized electronic currency system. However, it is known to be inefficient due to its proof-of-work (PoW) consensus algorithm and has the potential hazard of double spending. In this paper, we aim to reduce the probability of double spending by decreasing the probability of consecutive winning. We first formalize a PoW-based decentralized secure network model in order to present a quantitative analysis. Next, to resolve the risk of double spending, we propose the personalized difficulty adjustment (PDA) mechanism which modifies the difficulty of each participant such that those who win more blocks in the past few rounds have a smaller probability to win in the next round. To analyze the performance of the PDA mechanism, we observe that the system can be modeled by a high-order Markov chain. Finally, we show that PDA effectively decreases the probability of consecutive winning and results in a more trustworthy PoW-based system.

DSMay 7, 2018
(Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs

Boaz Barak, Chi-Ning Chou, Zhixian Lei et al.

We give a quasipolynomial time algorithm for the graph matching problem (also known as noisy or robust graph isomorphism) on correlated random graphs. Specifically, for every $γ>0$, we give a $n^{O(\log n)}$ time algorithm that given a pair of $γ$-correlated $G(n,p)$ graphs $G_0,G_1$ with average degree between $n^{\varepsilon}$ and $n^{1/153}$ for $\varepsilon = o(1)$, recovers the "ground truth" permutation $π\in S_n$ that matches the vertices of $G_0$ to the vertices of $G_n$ in the way that minimizes the number of mismatched edges. We also give a recovery algorithm for a denser regime, and a polynomial-time algorithm for distinguishing between correlated and uncorrelated graphs. Prior work showed that recovery is information-theoretically possible in this model as long the average degree was at least $\log n$, but sub-exponential time algorithms were only known in the dense case (i.e., for $p > n^{-o(1)}$). Moreover, "Percolation Graph Matching", which is the most common heuristic for this problem, has been shown to require knowledge of $n^{Ω(1)}$ "seeds" (i.e., input/output pairs of the permutation $π$) to succeed in this regime. In contrast our algorithms require no seed and succeed for $p$ which is as low as $n^{o(1)-1}$.

NEMar 28, 2018
On the Algorithmic Power of Spiking Neural Networks

Chi-Ning Chou, Kai-Min Chung, Chi-Jen Lu

Spiking Neural Networks (SNN) are mathematical models in neuroscience to describe the dynamics among a set of neurons that interact with each other by firing instantaneous signals, a.k.a., spikes. Interestingly, a recent advance in neuroscience [Barrett-Denève-Machens, NIPS 2013] showed that the neurons' firing rate, i.e., the average number of spikes fired per unit of time, can be characterized by the optimal solution of a quadratic program defined by the parameters of the dynamics. This indicated that SNN potentially has the computational power to solve non-trivial quadratic programs. However, the results were justified empirically without rigorous analysis. We put this into the context of natural algorithms and aim to investigate the algorithmic power of SNN. Especially, we emphasize on giving rigorous asymptotic analysis on the performance of SNN in solving optimization problems. To enforce a theoretical study, we first identify a simplified SNN model that is tractable for analysis. Next, we confirm the empirical observation in the work of Barrett et al. by giving an upper bound on the convergence rate of SNN in solving the quadratic program. Further, we observe that in the case where there are infinitely many optimal solutions, SNN tends to converge to the one with smaller l1 norm. We give an affirmative answer to our finding by showing that SNN can solve the l1 minimization problem under some regular conditions. Our main technical insight is a dual view of the SNN dynamics, under which SNN can be viewed as a new natural primal-dual algorithm for the l1 minimization problem. We believe that the dual view is of independent interest and may potentially find interesting interpretation in neuroscience.