HC · Mar 14
LightBeam: An Accurate and Memory-Efficient CTC Decoder for Speech Neuroprostheses
Ebrahim Feghhi, Junlin Hu, Nima Hadidi et al.
Speech neuroprostheses, which decode speech directly from cortical neural activity, are a promising pathway for restoring communication in patients with dysarthria and anarthria. Two benchmarks, Brain-to-Text '24 and '25, released intracranial recordings from patients with dysarthria along with a baseline algorithm trained with Connectionist Temporal Classification (CTC). Despite significant innovation on these benchmarks, all leading published prior work relies on a weighted finite-state transducer (WFST)-based CTC decoder that requires ${\sim}$320 GB of RAM. These memory requirements limit accessibility for both patients and researchers. Here, we propose LightBeam, a non-WFST-based CTC decoder that requires only ${\sim}$10 GB of RAM and achieves state-of-the-art performance on both benchmarks. LightBeam achieves this by integrating an LLM into the beam-search process via delayed fusion, removing the need for the large N-gram LM that prior decoders relied on. LightBeam is implemented in Python and is open-source.
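The idea of fusing a language model into CTC beam search can be illustrated with a generic prefix beam search. The sketch below is a simplified stand-in, not LightBeam's implementation: it assumes a character-level CTC output with index 0 as the blank and a space as the word separator, and it treats "delayed" as consulting a word-level language model only once a word is complete (the partial word at the end of a prefix is held back until the next space or the end of the utterance). `ctc_beam_search`, `lm_logprob`, `alpha`, and `beam` are hypothetical names, and the toy LM callable stands in for an actual LLM.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logaddexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def words(prefix, include_last):
    """Split a prefix into words. Assumption: 'delayed' fusion means the
    trailing partial word is held back unless include_last is True."""
    parts = prefix.split(" ")
    if not include_last:
        parts = parts[:-1]
    return [w for w in parts if w]


def ctc_beam_search(log_probs, alphabet, lm_logprob, beam=8, alpha=0.5):
    """CTC prefix beam search with word-level LM scores added only for
    completed words.

    log_probs:  T frames, each a list of V log posteriors; index 0 is blank.
    alphabet:   V symbols; alphabet[0] is a blank placeholder, " " ends a word.
    lm_logprob: hypothetical callable mapping a word to a log probability.
    """
    def lm_score(prefix, include_last):
        return alpha * sum(lm_logprob(w) for w in words(prefix, include_last))

    # prefix -> (log P(prefix, path ends in blank), log P(prefix, ends in non-blank))
    beams = {"": (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (pb, pnb) in beams.items():
            total = logaddexp(pb, pnb)
            # Blank: the prefix is unchanged and the path now ends in blank.
            b, nb = nxt[prefix]
            nxt[prefix] = (logaddexp(b, total + frame[0]), nb)
            for k in range(1, len(alphabet)):
                c, lp = alphabet[k], frame[k]
                ext = prefix + c
                b, nb = nxt[ext]
                if prefix.endswith(c):
                    # Repeated symbol: a new copy requires an intervening blank...
                    nxt[ext] = (b, logaddexp(nb, pb + lp))
                    # ...otherwise it collapses into the existing prefix.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, logaddexp(nb, pnb + lp))
                else:
                    nxt[ext] = (b, logaddexp(nb, total + lp))
        # Prune on acoustic score plus LM score of completed words only.
        ranked = sorted(
            nxt.items(),
            key=lambda kv: logaddexp(*kv[1]) + lm_score(kv[0], include_last=False),
            reverse=True,
        )
        beams = dict(ranked[:beam])
    # End of utterance: the last word is now complete, so it is scored too.
    best = max(
        beams.items(),
        key=lambda kv: logaddexp(*kv[1]) + lm_score(kv[0], include_last=True),
    )
    return best[0].strip()
```

Because the LM is never asked to score a half-finished word, a model with its own (for example, subword) tokenizer could sit behind `lm_logprob` without being aligned to the CTC character inventory; this is one plausible motivation for delaying the fusion, offered here as an assumption rather than the paper's stated rationale.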
HC · Mar 9
Re-evaluating Position and Velocity Decoding for Hand Pose Estimation with Surface Electromyography
Nima Hadidi, Johannes Lee, Ebrahim Feghhi et al.
Recent progress in real-time hand pose estimation from surface electromyography (sEMG) has been driven by the emg2pose benchmark, whose original baseline study concluded that velocity decoding outperforms position decoding in both reconstruction accuracy and trajectory smoothness. We revisit that conclusion under the original causal evaluation protocol. Using the same core architecture but a more stable training recipe, we show that position decoding models were underestimated: they are highly sensitive to a decoder output scalar that the original study did not sweep, and without tuning it they can collapse into low-movement solutions. Once this scalar is tuned, position decoding outperforms velocity decoding on the Tracking task across all three emg2pose generalization conditions, consistent with greater robustness to error accumulation. On the Regression task, the gap between position and velocity decoding is much smaller; instead, the largest gains come from multi-task training with Tracking, suggesting that the Regression objective alone does not sufficiently constrain the learned dynamics. Although position decoding models exhibit greater local jitter, a causal speed-adaptive filter preserves their accuracy advantage while yielding a more favorable smoothness-accuracy tradeoff than velocity decoding. Altogether, our results revise the original emg2pose modeling conclusions and establish a new state of the art among published streaming-compatible models on this benchmark.
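The abstract does not name the causal speed-adaptive filter it uses. One well-known filter of this kind is the One Euro filter, sketched below as an assumed example rather than the paper's method: its cutoff frequency rises with the estimated speed, so it smooths heavily when the hand is still (suppressing jitter) and lags less during fast movements. `OneEuroFilter` and its default parameter values are illustrative; one filter instance would be applied per predicted joint angle.

```python
import math


class OneEuroFilter:
    """Causal low-pass filter whose cutoff grows with estimated speed.

    Illustrative stand-in for a 'causal speed-adaptive filter'; the
    paper's actual filter and settings may differ.
    """

    def __init__(self, rate_hz, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.te = 1.0 / rate_hz        # sampling period in seconds
        self.min_cutoff = min_cutoff   # cutoff (Hz) when the signal is at rest
        self.beta = beta               # how strongly speed raises the cutoff
        self.d_cutoff = d_cutoff       # cutoff (Hz) for smoothing the speed estimate
        self.x_prev = None             # previous filtered value
        self.dx_prev = 0.0             # previous filtered derivative

    def _alpha(self, cutoff):
        # Smoothing factor of a first-order low-pass filter at this cutoff.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / self.te)

    def __call__(self, x):
        if self.x_prev is None:
            self.x_prev = x
            return x
        # Estimate speed from the new sample and the last filtered value.
        dx = (x - self.x_prev) / self.te
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Faster motion -> higher cutoff -> less smoothing and less lag.
        a = self._alpha(self.min_cutoff + self.beta * abs(dx_hat))
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```

`min_cutoff` controls how much jitter is removed at rest and `beta` controls how quickly the filter opens up during motion; sweeping the two traces out a smoothness-accuracy tradeoff of the kind the abstract compares.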
HC · Jun 14, 2025
SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography
Nima Hadidi, Jason Chan, Ebrahim Feghhi et al.
Surface electromyography (sEMG) at the wrists could enable natural, keyboard-free text entry, yet the state-of-the-art emg2qwerty baseline still misrecognizes $51.8\%$ of characters in the zero-shot setting on unseen users and $7.0\%$ after user-specific fine-tuning. We trace many of these errors to mismatched cross-user signal statistics, fragile reliance on high-order feature dependencies, and the absence of architectural inductive biases aligned with the bilateral nature of typing. To address these issues, we introduce three simple modifications: (i) Rolling Time Normalization, which adaptively aligns input distributions across users; (ii) Aggressive Channel Masking, which encourages reliance on low-order feature combinations that are more likely to generalize across users; and (iii) a Split-and-Share encoder that processes each hand independently with weight-shared streams to reflect the bilateral symmetry of the neuromuscular system. Combined with a roughly five-fold reduction in spectral resolution ($33\!\rightarrow\!6$ frequency bands), these components yield a compact Split-and-Share model, SplashNet-mini, which uses only $\tfrac14$ the parameters and $0.6\times$ the FLOPs of the baseline while reducing character-error rate (CER) to $36.4\%$ zero-shot and $5.9\%$ after fine-tuning. An upscaled variant, SplashNet ($\tfrac12$ the parameters, $1.15\times$ the FLOPs of the baseline), further lowers error to $35.7\%$ and $5.5\%$, representing relative improvements of $31\%$ and $21\%$ in the zero-shot and fine-tuned settings, respectively. SplashNet therefore establishes a new state of the art without requiring additional data.
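Rolling Time Normalization is described only as adaptively aligning input distributions across users. A minimal sketch, assuming it amounts to causal per-channel z-scoring with statistics from a sliding window of recent frames (the window length, `eps`, and the `RollingNormalizer` name are hypothetical), shows why such a step can remove user-specific offsets and gains: two signals that differ only by a per-channel scale and shift map to nearly identical normalized values once the window is full.

```python
import math
from collections import deque


class RollingNormalizer:
    """Causal per-channel z-scoring over the most recent `window` frames.

    Hypothetical stand-in for Rolling Time Normalization: each frame is
    normalized using only itself and earlier frames, so it can stream.
    """

    def __init__(self, n_channels, window, eps=1e-6):
        self.n_channels = n_channels
        self.history = deque(maxlen=window)  # oldest frames drop off automatically
        self.eps = eps                       # avoids division by zero on flat channels

    def __call__(self, frame):
        self.history.append(list(frame))
        k = len(self.history)
        out = []
        for c in range(self.n_channels):
            vals = [f[c] for f in self.history]
            mean = sum(vals) / k
            var = sum((v - mean) ** 2 for v in vals) / k
            out.append((frame[c] - mean) / math.sqrt(var + self.eps))
        return out
```

A window long enough to estimate stable statistics but short enough to track slow drift (for example, electrode shifts within a session) is the main design knob in this sketch; the paper's actual choice is not stated in the abstract.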
CL · Jun 3, 2024
What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores
Ebrahim Feghhi, Nima Hadidi, Bryan Song et al.
Given the remarkable capabilities of large language models (LLMs), there has been growing interest in evaluating their similarity to the human brain. One approach to quantifying this similarity is to measure how well a model predicts neural signals, a measure called the "brain score". Internal representations from LLMs achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing. This inference is only valid if the subset of neural activity predicted by LLMs reflects core elements of language processing. Here, we question this assumption by analyzing three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages. We first find that when using shuffled train-test splits, as done in previous studies with these datasets, a trivial feature that encodes temporal autocorrelation not only outperforms LLMs but also accounts for the majority of neural variance that LLMs explain. We therefore use contiguous splits in all subsequent analyses. Second, we explain the surprisingly high brain scores of untrained LLMs by showing that they do not account for additional neural variance beyond two simple features: sentence length and sentence position. This undermines evidence used to claim that the transformer architecture biases computations to be more brain-like. Third, we find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings; a small additional amount is explained by sense-specific embeddings and contextual representations of sentence structure. We conclude that over-reliance on brain scores can lead to over-interpretations of similarity between LLMs and brains, and emphasize the importance of deconstructing what LLMs are mapping to in neural signals.
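Why shuffled splits can reward a feature that knows nothing about language can be seen in a toy setting. The sketch below illustrates the general effect under stated assumptions and is not the paper's analysis: the "neural" signal is a random walk (strongly autocorrelated, with no stimulus content), and instead of a regression on a temporal-autocorrelation feature it predicts each held-out sample by copying the training sample nearest in time. With a shuffled split, almost every test sample has a training neighbor one step away; with a contiguous split, the held-out block is far from most training data. `random_walk`, `nearest_in_time_mse`, and `compare_splits` are hypothetical names.

```python
import random


def random_walk(n, seed=0):
    """Toy autocorrelated 'neural' signal that carries no stimulus information."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def nearest_in_time_mse(signal, train_idx, test_idx):
    """Predict each test sample by copying the training sample closest in time."""
    err = 0.0
    for t in test_idx:
        nearest = min(train_idx, key=lambda s: abs(s - t))
        err += (signal[t] - signal[nearest]) ** 2
    return err / len(test_idx)


def compare_splits(signal, train_frac=0.8, seed=1):
    """Return (shuffled-split MSE, contiguous-split MSE) for the time-only predictor."""
    n = len(signal)
    n_train = int(train_frac * n)
    idx = list(range(n))
    shuffled = idx[:]
    random.Random(seed).shuffle(shuffled)
    shuffled_mse = nearest_in_time_mse(signal, shuffled[:n_train], shuffled[n_train:])
    contiguous_mse = nearest_in_time_mse(signal, idx[:n_train], idx[n_train:])
    return shuffled_mse, contiguous_mse
```

Calling `compare_splits(random_walk(1000))` should give a clearly smaller error for the shuffled split, even though the predictor uses nothing but time; this is the kind of leakage that motivates switching to contiguous splits.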