67.8AIApr 23
Read the Paper, Write the Code: Agentic Reproduction of Social-Science ResultsBenjamin Kohler, David Zollikofer, Johanna Einsiedler et al.
Recent work has used LLM agents to reproduce empirical social science results with access to both the data and code. We broaden this scope by asking: Can they reproduce results given only a paper's methods description and original data? We develop an agentic reproduction system that extracts structured methods descriptions from papers, runs reimplementations under strict information isolation -- agents never see the original code, results, or paper -- and enables deterministic, cell-level comparison of reproduced outputs to the original results. An error attribution step traces discrepancies through the system chain to identify root causes. Evaluating four agent scaffolds and four LLMs on 48 papers with human-verified reproducibility, we find that agents can largely recover published results, but performance varies substantially between models, scaffolds, and papers. Root cause analysis reveals that failures stem both from agent errors and from underspecification in the papers themselves.
LGDec 2, 2024
Beyond Pairwise Correlations: Higher-Order Redundancies in Self-Supervised Representation LearningDavid Zollikofer, Béni Egressy, Frederik Benzing et al.
Several self-supervised learning (SSL) approaches have shown that redundancy reduction in the feature embedding space is an effective tool for representation learning. However, these methods consider a narrow notion of redundancy, focusing on pairwise correlations between features. To address this limitation, we formalize the notion of embedding space redundancy and introduce redundancy measures that capture more complex, higher-order dependencies. We mathematically analyze the relationships between these metrics, and empirically measure these redundancies in the embedding spaces of common SSL methods. Based on our findings, we propose Self Supervised Learning with Predictability Minimization (SSLPM) as a method for reducing redundancy in the embedding space. SSLPM combines an encoder network with a predictor engaging in a competitive game of reducing and exploiting dependencies respectively. We demonstrate that SSLPM is competitive with state-of-the-art methods and find that the best performing SSL methods exhibit low embedding space redundancy, suggesting that even methods without explicit redundancy reduction mechanisms perform redundancy reduction implicitly.
CRJun 27, 2024
Synthetic Cancer -- Augmenting Worms with LLMsBenjamin Zimmerman, David Zollikofer
With increasingly sophisticated large language models (LLMs), the potential for abuse rises drastically. As a submission to the Swiss AI Safety Prize, we present a novel type of metamorphic malware leveraging LLMs for two key processes. First, LLMs are used for automatic code rewriting to evade signature-based detection by antimalware programs. The malware then spreads its copies via email by utilizing an LLM to socially engineer email replies to encourage recipients to execute the attached malware. Our submission includes a functional minimal prototype, highlighting the risks that LLMs pose for cybersecurity and underscoring the need for further research into intelligent malware.