LGFeb 3, 2023
SPARLING: Learning Latent Representations with Extremely Sparse ActivationsKavi Gupta, Osbert Bastani, Armando Solar-Lezama
Real-world processes often contain intermediate state that can be modeled as an extremely sparse activation tensor. In this work, we analyze the identifiability of such sparse and local latent intermediate variables, which we call motifs. We prove our Motif Identifiability Theorem, stating that under certain assumptions it is possible to precisely identify these motifs exclusively by reducing end-to-end error. Notably, we do not assume identifiability of parameters, but rather of a latent intermediate representation output by a local model, thus allowing these representations to be arbitrarily complex functions of the input. Additionally, we provide the Sparling algorithm, which uses a new kind of informational bottleneck that enforces levels of activation sparsity unachievable using other techniques. We confirm empirically that extreme sparsity is necessary to achieve good intermediate state modeling. On synthetic domains, we are able to precisely localize the intermediate states up to feature permutation with > 90% accuracy, even though we only train end-to-end.
LGJan 6, 2025
Randomly Sampled Language Reasoning Problems Elucidate Limitations of In-Context LearningKavi Gupta, Kate Sanders, Armando Solar-Lezama
While LLMs have revolutionized the field of machine learning due to their high performance on a strikingly wide range of problems, they are also known to hallucinate false answers and underperform on less canonical versions of the same tasks. There are several emerging theories of LLM performance, among them that LLMs lack world modeling ability, that they have an undesirable bias towards an autoregressive prior, and that they struggle on more novel problems. The existing literature on LLM input novelty has focused on tasks of relatively high complexity, studying perturbations of canonical but complex problems. In this paper, we attempt to minimize complexity in order to isolate novelty as a factor in LLM underperformance and investigate the power of in-context-learning. To this end, we consider an extremely simple domain: next token prediction on simple language tasks. The twist is that these language tasks are wholly unseen, as they are randomly drawn from a large, parsimoniously defined set of languages arising from simple grammar rules. This experimental setup allows us to evaluate ICL independently of models' parametric knowledge. We find that LLMs uniformly underperform n-gram models on this task, both when used as next token predictors and in chain-of-thought.
LGJul 16, 2020
Synthesize, Execute and Debug: Learning to Repair for Neural Program SynthesisKavi Gupta, Peter Ebert Christensen, Xinyun Chen et al.
The use of deep learning techniques has achieved significant progress for program synthesis from input-output examples. However, when the program semantics become more complex, it still remains a challenge to synthesize programs that are consistent with the specification. In this work, we propose SED, a neural program generation framework that incorporates synthesis, execution, and debugging stages. Instead of purely relying on the neural program synthesizer to generate the final program, SED first produces initial programs using the neural program synthesizer component, then utilizes a neural program debugger to iteratively repair the generated programs. The integration of the debugger component enables SED to modify the programs based on the execution results and specification, which resembles the coding process of human programmers. On Karel, a challenging input-output program synthesis benchmark, SED reduces the error rate of the neural program synthesizer itself by a considerable margin, and outperforms the standard beam search for decoding.
LGDec 27, 2019
Synthetic Datasets for Neural Program SynthesisRichard Shin, Neel Kant, Kavi Gupta et al.
The goal of program synthesis is to automatically generate programs in a particular language from corresponding specifications, e.g. input-output behavior. Many current approaches achieve impressive results after training on randomly generated I/O examples in limited domain-specific languages (DSLs), as with string transformations in RobustFill. However, we empirically discover that applying test input generation techniques for languages with control flow and rich input space causes deep networks to generalize poorly to certain data distributions; to correct this, we propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications. We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.