Sooyoung Cha

HC
h-index: 5
4 papers
24 citations
Novelty: 73%
AI Score: 50

4 Papers

LG · Mar 30
Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

Kieran Didi, Zuobai Zhang, Guoqing Zhou et al.

Protein interaction modeling is central to protein design, which has been transformed by machine learning with applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Proteina-Complexa, a novel fully atomistic binder generation method unifying both paradigms. We extend recent flow-based latent protein generation architectures and leverage the domain-domain interactions of monomeric computationally predicted protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Proteina-Complexa sets a new state of the art in computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We also demonstrate interface hydrogen bond optimization, fold class-guided binder generation, and extensions to small molecule targets and enzyme design tasks, again surpassing prior methods. Code, models and new data will be publicly released.

HC · Apr 7
Symetra: Visual Analytics for the Parameter Tuning Process of Symbolic Execution Engines

Donghee Hong, Minjong Kim, Sooyoung Cha et al.

Symbolic execution engines such as KLEE automatically generate test cases to maximize branch coverage, but their numerous parameters make it difficult to understand the parameters' impact, leading the user to rely on suboptimal default configurations. While automated tuners have shown promising results, they provide limited insights into why certain configurations work well, motivating the need for Human-in-the-Loop approaches. In this work, we present a visual analytics system, Symetra, designed to support Human-in-the-Loop parameter tuning of symbolic execution engines. To handle a large number of parameters and their configurations, we provide two complementary overviews of their impact on branch coverage values and patterns. Building on these overviews, our system enables collective analysis, allowing the user to contrast groups of configurations and identify differences that may affect branch coverage. We also report on case studies and a Human-in-the-Loop tuning process, demonstrating that experts not only interpreted parameter impacts and identified complementary configurations, but also improved upon fully automated approaches in both branch coverage and tuning efficiency.

PL · Oct 12, 2025
ECO: Enhanced Code Optimization via Performance-Aware Prompting for Code-LLMs

Su-Hyeon Kim, Joonghyuk Hahn, Sooyoung Cha et al.

Code runtime optimization, the task of rewriting given code into a faster version, remains challenging, as it requires reasoning about performance trade-offs involving algorithmic and structural choices. Recent approaches employ code-LLMs with slow-fast code pairs provided as optimization guidance, but such pair-based methods obscure the causal factors of performance gains and often lead to superficial pattern imitation rather than genuine performance reasoning. We introduce ECO, a performance-aware prompting framework for code optimization. ECO first distills runtime optimization instructions (ROIs) from reference slow-fast code pairs; each ROI describes root causes of inefficiency and the rationales that drive performance improvements. For a given input code, ECO employs in parallel (i) a symbolic advisor to produce a bottleneck diagnosis tailored to the code, and (ii) an ROI retriever to return related ROIs. These two outputs are then composed into a performance-aware prompt, providing actionable guidance for code-LLMs. ECO's prompts are model-agnostic, require no fine-tuning, and can be easily prepended to any code-LLM prompt. Our empirical studies highlight that ECO prompting significantly improves code-LLMs' ability to generate efficient code, achieving speedups of up to 7.81x while minimizing correctness loss.

SE · Jul 23, 2019
Enhancing Dynamic Symbolic Execution by Automatically Learning Search Heuristics

Sooyoung Cha, Seongjoon Hong, Jingyoung Kim et al.

We present a technique to automatically generate search heuristics for dynamic symbolic execution. A key challenge in dynamic symbolic execution is how to effectively explore the program's execution paths to achieve high code coverage within a limited time budget. Dynamic symbolic execution employs a search heuristic to address this challenge, which favors exploring particular types of paths that are most likely to maximize the final coverage. However, manually designing a good search heuristic is nontrivial and typically ends up with suboptimal and unstable outcomes. The goal of this paper is to overcome this shortcoming of dynamic symbolic execution by automatically learning search heuristics. We define a class of search heuristics, namely a parametric search heuristic, and present an algorithm that efficiently finds an optimal heuristic for each subject program. Experimental results with industrial-strength symbolic execution tools (e.g., KLEE) show that our technique can successfully generate search heuristics that significantly outperform existing manually crafted heuristics in terms of branch coverage and bug-finding.