Jonathan C. Kao

HC
h-index: 3
Papers: 10
Citations: 261
Novelty: 53%
AI Score: 53

10 Papers

HC | Sep 5, 2024
Shared Autonomy with IDA: Interventional Diffusion Assistance

Brandon J. McMahan, Zhenghao Peng, Bolei Zhou et al.

The rapid development of artificial intelligence (AI) has unearthed the potential to assist humans in controlling advanced technologies. Shared autonomy (SA) facilitates control by combining inputs from a human pilot and an AI copilot. In prior SA studies, the copilot is constantly active in determining the action played at each time step. This limits human autonomy and may have deleterious effects on performance. In general, the amount of helpful copilot assistance can vary greatly depending on the task dynamics. We therefore hypothesize that human autonomy and SA performance improve through dynamic and selective copilot intervention. To address this, we develop a goal-agnostic intervention assistance (IA) that dynamically shares control by having the copilot intervene only when the expected value of the copilot's action exceeds that of the human's action across all possible goals. We implement IA with a diffusion copilot (termed IDA) trained on expert demonstrations with goal masking. We prove a lower bound on the performance of IA that depends on pilot and copilot performance. Experiments with simulated human pilots show that IDA achieves higher performance than pilot-only and traditional SA control in variants of the Reacher environment and Lunar Lander. We then demonstrate that IDA achieves better control in Lunar Lander with human-in-the-loop experiments. Human participants report greater autonomy with IDA and prefer IDA over pilot-only and traditional SA control. We attribute the success of IDA to preserving human autonomy while simultaneously offering assistance to prevent the human pilot from entering universally bad states.
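The intervention rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `select_action` and `toy_q`, the one-dimensional state, and the distance-based value function are all assumptions made for the example. In the paper the copilot is a diffusion model and the value estimates come from a trained critic.

```python
from typing import Callable, Sequence


def select_action(
    state: float,
    pilot_action: float,
    copilot_action: float,
    goals: Sequence[float],
    q_value: Callable[[float, float, float], float],
) -> float:
    """Return the copilot's action only if it beats the pilot's for every goal.

    Hypothetical sketch of goal-agnostic intervention: the copilot does not
    know the true goal, so it intervenes only when its action has a higher
    estimated value than the pilot's action under all candidate goals.
    """
    copilot_better_for_all_goals = bool(goals) and all(
        q_value(state, copilot_action, g) > q_value(state, pilot_action, g)
        for g in goals
    )
    return copilot_action if copilot_better_for_all_goals else pilot_action


def toy_q(state: float, action: float, goal: float) -> float:
    """Toy value function (an assumption): negative distance to the goal after acting."""
    return -abs((state + action) - goal)


# Every candidate goal lies to the right, and the pilot moves left: the copilot intervenes.
print(select_action(0.0, -1.0, 1.0, [2.0, 3.0], toy_q))   # 1.0

# Goals lie on both sides, so the copilot cannot be better for all of them: the pilot keeps control.
print(select_action(0.0, 0.5, -1.0, [-2.0, 2.0], toy_q))  # 0.5
```

The second call shows why the rule preserves autonomy: as soon as one plausible goal favors the pilot's action, the copilot stays out of the way.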

89.5 | HC | Mar 14 | Code
LightBeam: An Accurate and Memory-Efficient CTC Decoder for Speech Neuroprostheses

Ebrahim Feghhi, Junlin Hu, Nima Hadidi et al.

A promising pathway for restoring communication in patients with dysarthria and anarthria is speech neuroprostheses, which directly decode speech from cortical neural activity. Two benchmarks, Brain-to-Text '24 and '25, released intracranial recordings from patients with dysarthria along with a baseline algorithm trained with Connectionist Temporal Classification (CTC). Despite significant innovation on these benchmarks, all leading published prior work relies on a WFST-based CTC decoder that requires ${\sim}$320 GB of RAM. These memory requirements limit accessibility for both patients and researchers. Here, we propose LightBeam, a non-WFST based CTC decoder that requires only ${\sim}$10 GB of RAM and achieves state-of-the-art performance on both benchmarks. LightBeam achieves this by integrating an LLM into the beam-search process via delayed fusion, obviating the prior need for using a large N-gram LM. LightBeam is implemented in Python and is open-source.

93.6 | HC | Mar 24
DiSCo: Diffusion Sequence Copilots for Shared Autonomy

Andy Wang, Xu Yan, Brandon McMahan et al.

Shared autonomy combines human user and AI copilot actions to control complex systems such as robotic arms. When a task is challenging, requires high dimensional control, or is subject to corruption, shared autonomy can significantly increase task performance by using a trained copilot to effectively correct user actions in a manner consistent with the user's goals. To significantly improve the performance of shared autonomy, we introduce Diffusion Sequence Copilots (DiSCo): a method of shared autonomy with diffusion policy that plans action sequences consistent with past user actions. DiSCo seeds and inpaints the diffusion process with user-provided actions with hyperparameters to balance conformity to expert actions, alignment with user intent, and perceived responsiveness. We demonstrate that DiSCo substantially improves task performance in simulated driving and robotic arm tasks. Project website: https://sites.google.com/view/disco-shared-autonomy/

MA | Jun 3, 2024 | Code
Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents

John L. Zhou, Weizhe Hong, Jonathan C. Kao

Cooperation between self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents. Instead, naive reinforcement learning algorithms typically converge to Pareto-dominated outcomes in even the simplest of social dilemmas. An emerging literature on opponent shaping has demonstrated the ability to reach prosocial outcomes by influencing the learning of other agents. However, such methods differentiate through the learning step of other agents or optimize for meta-game dynamics, which rely on privileged access to opponents' learning algorithms or exponential sample complexity, respectively. To provide a learning rule-agnostic and sample-efficient alternative, we introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of opponents' actions on their returns. This approach seeks to modify other agents' $Q$-values by increasing their return following beneficial actions (with respect to the Reciprocator) and decreasing it after detrimental actions, guiding them towards mutually beneficial actions without directly differentiating through a model of their policy. We show that Reciprocators can be used to promote cooperation in temporally extended social dilemmas during simultaneous learning. Our code is available at https://github.com/johnlyzhou/reciprocator/.

LG | May 20, 2025
Flattening Hierarchies with Policy Bootstrapping

John L. Zhou, Jonathan C. Kao

Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces. We further show that existing hierarchical and bootstrapping-based approaches correspond to specific design choices within our derivation. Across a comprehensive suite of state- and pixel-based locomotion and manipulation benchmarks, our method matches or surpasses state-of-the-art offline GCRL algorithms and scales to complex, long-horizon tasks where prior approaches fail. Project page: https://johnlyzhou.github.io/saw/

53.6 | HC | Mar 9
Re-evaluating Position and Velocity Decoding for Hand Pose Estimation with Surface Electromyography

Nima Hadidi, Johannes Lee, Ebrahim Feghhi et al.

Recent progress in real-time hand pose estimation from surface electromyography (sEMG) has been driven by the emg2pose benchmark, whose original baseline study concluded that velocity decoding outperforms position decoding in both reconstruction accuracy and trajectory smoothness. We revisit that conclusion under the original causal evaluation protocol. Using the same core architecture but a more stable training recipe, we show that position decoding models were previously underestimated because they are highly sensitive to a previously unswept decoder output scalar and can otherwise collapse into low movement solutions. Once this scalar is tuned, position decoding outperforms velocity decoding on the Tracking task across all three emg2pose generalization conditions, consistent with greater robustness to error accumulation. On the Regression task, the gap between position and velocity decoding is much smaller; instead, the largest gains come from multi-task training with Tracking, suggesting that the Regression objective alone does not sufficiently constrain the learned dynamics. Although position decoding models exhibit greater local jitter, a causal speed-adaptive filter preserves their accuracy advantage while yielding a more favorable smoothness-accuracy tradeoff than velocity decoding. Altogether, our results revise the original emg2pose modeling conclusions and establish a new state of the art among published streaming-compatible models on this benchmark.
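The error-accumulation argument mentioned in this abstract can be illustrated with a toy calculation. This is only a sketch under strong assumptions (a one-dimensional trajectory and a constant per-step decoding bias); the function `final_errors` is hypothetical and is unrelated to the emg2pose models themselves.

```python
from typing import List, Tuple


def final_errors(true_positions: List[float], bias: float) -> Tuple[float, float]:
    """Compare final-step error of a position decoder and an integrated velocity decoder.

    Assumption: both decoders make the same constant error `bias` on every
    output. The position decoder's error stays at `bias`; the velocity
    decoder's outputs are summed over time, so its error grows with the
    number of steps.
    """
    # Velocity decoding: integrate (true velocity + bias) from the known start.
    estimate = true_positions[0]
    for prev, cur in zip(true_positions, true_positions[1:]):
        estimate += (cur - prev) + bias
    velocity_error = abs(estimate - true_positions[-1])

    # Position decoding: each output is off by `bias`, independent of history.
    position_error = abs((true_positions[-1] + bias) - true_positions[-1])
    return position_error, velocity_error


trajectory = [0.1 * t for t in range(101)]  # 100 steps of steady motion
pos_err, vel_err = final_errors(trajectory, bias=0.01)
print(f"position: {pos_err:.2f}, velocity: {vel_err:.2f}")  # position: 0.01, velocity: 1.00
```

Real decoders have noisy rather than constant errors, but the same integration step is what lets velocity errors compound over a long tracking window.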

HC | Jun 14, 2025
SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography

Nima Hadidi, Jason Chan, Ebrahim Feghhi et al.

Surface electromyography (sEMG) at the wrists could enable natural, keyboard-free text entry, yet the state-of-the-art emg2qwerty baseline still misrecognizes $51.8\%$ of characters in the zero-shot setting on unseen users and $7.0\%$ after user-specific fine-tuning. We trace many of these errors to mismatched cross-user signal statistics, fragile reliance on high-order feature dependencies, and the absence of architectural inductive biases aligned with the bilateral nature of typing. To address these issues, we introduce three simple modifications: (i) Rolling Time Normalization, which adaptively aligns input distributions across users; (ii) Aggressive Channel Masking, which encourages reliance on low-order feature combinations more likely to generalize across users; and (iii) a Split-and-Share encoder that processes each hand independently with weight-shared streams to reflect the bilateral symmetry of the neuromuscular system. Combined with a five-fold reduction in spectral resolution ($33\!\rightarrow\!6$ frequency bands), these components yield a compact Split-and-Share model, SplashNet-mini, which uses only $\tfrac14$ the parameters and $0.6\times$ the FLOPs of the baseline while reducing character-error rate (CER) to $36.4\%$ zero-shot and $5.9\%$ after fine-tuning. An upscaled variant, SplashNet ($\tfrac12$ the parameters, $1.15\times$ the FLOPs of the baseline), further lowers error to $35.7\%$ and $5.5\%$, representing relative improvements of $31\%$ and $21\%$ in the zero-shot and fine-tuned settings, respectively. SplashNet therefore establishes a new state of the art without requiring additional data.
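The relative improvements quoted in this abstract follow from the character-error rates it reports. A short check, where the helper name `relative_improvement` is introduced only for illustration:

```python
def relative_improvement(baseline_cer: float, new_cer: float) -> float:
    """Fraction of the baseline error removed by the new model."""
    return (baseline_cer - new_cer) / baseline_cer


# CER values taken from the abstract: baseline vs. SplashNet.
zero_shot = relative_improvement(51.8, 35.7)
fine_tuned = relative_improvement(7.0, 5.5)
print(f"zero-shot: {zero_shot:.0%}, fine-tuned: {fine_tuned:.0%}")
# zero-shot: 31%, fine-tuned: 21%
```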

CL | Jun 3, 2024
What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores

Ebrahim Feghhi, Nima Hadidi, Bryan Song et al.

Given the remarkable capabilities of large language models (LLMs), there has been a growing interest in evaluating their similarity to the human brain. One approach towards quantifying this similarity is by measuring how well a model predicts neural signals, also called "brain score". Internal representations from LLMs achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing. This inference is only valid if the subset of neural activity predicted by LLMs reflects core elements of language processing. Here, we question this assumption by analyzing three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages. We first find that when using shuffled train-test splits, as done in previous studies with these datasets, a trivial feature that encodes temporal autocorrelation not only outperforms LLMs but also accounts for the majority of neural variance that LLMs explain. We therefore use contiguous splits moving forward. Second, we explain the surprisingly high brain scores of untrained LLMs by showing they do not account for additional neural variance beyond two simple features: sentence length and sentence position. This undermines evidence used to claim that the transformer architecture biases computations to be more brain-like. Third, we find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings; a small, additional amount is explained by sense-specific embeddings and contextual representations of sentence structure. We conclude that over-reliance on brain scores can lead to over-interpretations of similarity between LLMs and brains, and emphasize the importance of deconstructing what LLMs are mapping to in neural signals.
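The difference between shuffled and contiguous train-test splits discussed in this abstract can be sketched with sample indices alone. This is an illustrative toy, not the authors' analysis code: the function names and the "adjacent in time" measure are assumptions chosen to show why temporal autocorrelation can leak across a shuffled split.

```python
import random
from typing import List, Tuple


def shuffled_split(n: int, n_test: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Assign time points to train/test at random, ignoring temporal order."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def contiguous_split(n: int, n_test: int) -> Tuple[List[int], List[int]]:
    """Hold out the final block of time points as the test set."""
    return list(range(n - n_test)), list(range(n - n_test, n))


def frac_test_adjacent_to_train(train: List[int], test: List[int]) -> float:
    """Fraction of test points with an immediate temporal neighbor in the training set."""
    train_set = set(train)
    hits = sum((t - 1 in train_set) or (t + 1 in train_set) for t in test)
    return hits / len(test)


n, n_test = 100, 20
print(frac_test_adjacent_to_train(*contiguous_split(n, n_test)))  # 0.05
print(frac_test_adjacent_to_train(*shuffled_split(n, n_test)))    # close to 1 for most seeds
```

With a contiguous split only the boundary sample has a training neighbor, whereas a shuffled split places nearly every test sample next to a training sample, so a feature that merely encodes position in time can predict autocorrelated signals well.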

LG | Oct 6, 2020
Usable Information and Evolution of Optimal Representations During Training

Michael Kleinman, Alessandro Achille, Daksh Idnani et al.

We introduce a notion of usable information contained in the representation learned by a deep network, and use it to study how optimal representations for the task emerge during training. We show that the implicit regularization coming from training with Stochastic Gradient Descent with a high learning-rate and small batch size plays an important role in learning minimal sufficient representations for the task. In the process of arriving at a minimal sufficient representation, we find that the content of the representation changes dynamically during training. In particular, we find that semantically meaningful but ultimately irrelevant information is encoded in the early transient dynamics of training, before being later discarded. In addition, we evaluate how perturbing the initial part of training impacts the learning dynamics and the resulting representations. We show these effects on both perceptual decision-making tasks inspired by neuroscience literature, as well as on standard image classification tasks.

NC | Oct 19, 2016
Making brain-machine interfaces robust to future neural variability

David Sussillo, Sergey D. Stavisky, Jonathan C. Kao et al.

A major hurdle to clinical translation of brain-machine interfaces (BMIs) is that current decoders, which are trained from a small quantity of recent data, become ineffective when neural recording conditions subsequently change. We tested whether a decoder could be made more robust to future neural variability by training it to handle a variety of recording conditions sampled from months of previously collected data as well as synthetic training data perturbations. We developed a new multiplicative recurrent neural network BMI decoder that successfully learned a large variety of neural-to-kinematic mappings and became more robust with larger training datasets. When tested with a non-human primate preclinical BMI model, this decoder was robust under conditions that disabled a state-of-the-art Kalman filter based decoder. These results validate a new BMI strategy in which accumulated data history is effectively harnessed, and may facilitate reliable daily BMI use by reducing decoder retraining downtime.