Ripon K. Saha

SE
4papers
372citations
Novelty57%
AI Score41

4 Papers

CVDec 30, 2025
F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model

Devendra K. Jangid, Ripon K. Saha, Dilshan Godaliyadda et al.

With the advent of Generative AI, Single Image Super-Resolution (SISR) quality has seen substantial improvement, as the strong priors learned by Text-2-Image Diffusion (T2IDiff) Foundation Models (FM) can bridge the gap between High-Resolution (HR) and Low-Resolution (LR) images. However, flagship smartphone cameras have been slow to adopt generative models because strong generation can lead to undesirable hallucinations. For substantially degraded LR images, as seen in academia, strong generation is required and hallucinations are more tolerable because of the wide gap between LR and HR images. In contrast, in consumer photography, the LR image has substantially higher fidelity, requiring only minimal hallucination-free generation. We hypothesize that generation in SISR is controlled by the stringency and richness of the FM's conditioning feature. First, text features are high level features, which often cannot describe subtle textures in an image. Additionally, Smartphone LR images are at least $12MP$, whereas SISR networks built on T2IDiff FM are designed to perform inference on much smaller images ($<1MP$). As a result, SISR inference has to be performed on small patches, which often cannot be accurately described by text feature. To address these shortcomings, we introduce an SISR network built on a FM with lower-level feature conditioning, specifically DINOv2 features, which we call a Feature-to-Image Diffusion (F2IDiff) Foundation Model (FM). Lower level features provide stricter conditioning while being rich descriptors of even small patches.

LGFeb 18, 2022
SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Written Solutions

Ripon K. Saha, Akira Ura, Sonal Mahajan et al.

Automatic machine learning, or AutoML, holds the promise of truly democratizing the use of machine learning (ML), by substantially automating the work of data scientists. However, the huge combinatorial search space of candidate pipelines means that current AutoML techniques, generate sub-optimal pipelines, or none at all, especially on large, complex datasets. In this work we propose an AutoML technique SapientML, that can learn from a corpus of existing datasets and their human-written pipelines, and efficiently generate a high-quality pipeline for a predictive task on a new dataset. To combat the search space explosion of AutoML, SapientML employs a novel divide-and-conquer strategy realized as a three-stage program synthesis approach, that reasons on successively smaller search spaces. The first stage uses a machine-learned model to predict a set of plausible ML components to constitute a pipeline. In the second stage, this is then refined into a small pool of viable concrete pipelines using syntactic constraints derived from the corpus and the machine-learned model. Dynamically evaluating these few pipelines, in the third stage, provides the best solution. We instantiate SapientML as part of a fully automated tool-chain that creates a cleaned, labeled learning corpus by mining Kaggle, learns from it, and uses the learned models to then synthesize pipelines for new predictive tasks. We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets, including 10 new, large, real-world datasets from Kaggle, and against 3 state-of-the-art AutoML tools and 2 baselines. Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks while the second best tool fails to even produce a pipeline on 9 of the instances.

SEDec 21, 2021
Elixir: Effective object-oriented program repair

Ripon K. Saha, Yingjun Lyu, Hiroaki Yoshida et al.

This work is motivated by the pervasive use of method invocations in object-oriented (OO) programs, and indeed their prevalence in patches of OO-program bugs. We propose a generate-and-validate repair technique, called ELIXIR designed to be able to generate such patches. ELIXIR aggressively uses method calls, on par with local variables, fields, or constants, to construct more expressive repair-expressions, that go into synthesizing patches. The ensuing enlargement of the repair space, on account of the wider use of method calls, is effectively tackled by using a machine-learnt model to rank concrete repairs. The machine-learnt model relies on four features derived from the program context, i.e., the code surrounding the potential repair location, and the bug report. We implement ELIXIR and evaluate it on two datasets, the popular Defects4J dataset and a new dataset Bugs.jar created by us, and against 2 baseline versions of our technique, and 5 other techniques representing the state of the art in program repair. Our evaluation shows that ELIXIR is able to increase the number of correctly repaired bugs in Defects4J by 85% (from 14 to 26) and by 57% in Bugs.jar (from 14 to 22), while also significantly out-performing other state-of-the-art repair techniques including ACS, HD-Repair, NOPOL, PAR, and jGenProg.

SEJun 21, 2019
Harnessing Evolution for Multi-Hunk Program Repair

Seemanta Saha, Ripon K. Saha, Mukul R. Prasad

Despite significant advances in automatic program repair (APR)techniques over the past decade, practical deployment remains an elusive goal. One of the important challenges in this regard is the general inability of current APR techniques to produce patches that require edits in multiple locations, i.e., multi-hunk patches. In this work, we present a novel APR technique that generalizes single-hunk repair techniques to include an important class of multi-hunk bugs, namely bugs that may require applying a substantially similar patch at a number of locations. We term such sets of repair locations as evolutionary siblings - similar looking code, instantiated in similar contexts, that are expected to undergo similar changes. At the heart of our proposed method is an analysis to accurately identify a set of evolutionary siblings, for a given bug. This analysis leverages three distinct sources of information, namely the test-suite spectrum, a novel code similarity analysis, and the revision history of the project. The discovered siblings are then simultaneously repaired in a similar fashion. We instantiate this technique in a tool called Hercules and demonstrate that it is able to correctly fix 49 bugs in the Defects4J dataset, the highest of any individual APR technique to date. This includes 15 multi-hunk bugs and overall 13 bugs which have not been fixed by any other technique so far.