Ngoc Tran

SE
6papers
309citations
Novelty50%
AI Score30

6 Papers

CVMar 27, 2023Code
Anti-DreamBooth: Protecting users from personalized text-to-image synthesis

Thanh Van Le, Hao Phung, Thuan Hoang Nguyen et al.

Text-to-image diffusion models are nothing but a revolution, allowing anyone, even without design skills, to create realistic images from simple text inputs. With powerful personalization tools like DreamBooth, they can generate images of a specific person just by learning from his/her few reference images. However, when misused, such a powerful and convenient tool can produce fake news or disturbing content targeting any individual victim, posing a severe negative social impact. In this paper, we explore a defense system called Anti-DreamBooth against such malicious use of DreamBooth. The system aims to add subtle noise perturbation to each user's image before publishing in order to disrupt the generation quality of any DreamBooth model trained on these perturbed images. We investigate a wide range of algorithms for perturbation optimization and extensively evaluate them on two facial datasets over various text-to-image model versions. Despite the complicated formulation of DreamBooth and Diffusion-based text-to-image models, our methods effectively defend users from the malicious use of those models. Their effectiveness withstands even adverse conditions, such as model or prompt/term mismatching between training and testing. Our code will be available at https://github.com/VinAIResearch/Anti-DreamBooth.git.

LGJun 4, 2022
Stochastic Multiple Target Sampling Gradient Descent

Hoang Phan, Ngoc Tran, Trung Le et al.

Sampling from an unnormalized target distribution is an essential problem with many applications in probabilistic inference. Stein Variational Gradient Descent (SVGD) has been shown to be a powerful method that iteratively updates a set of particles to approximate the distribution of interest. Furthermore, when analysing its asymptotic properties, SVGD reduces exactly to a single-objective optimization problem and can be viewed as a probabilistic version of this single-objective optimization problem. A natural question then arises: "Can we derive a probabilistic version of the multi-objective optimization?". To answer this question, we propose Stochastic Multiple Target Sampling Gradient Descent (MT-SGD), enabling us to sample from multiple unnormalized target distributions. Specifically, our MT-SGD conducts a flow of intermediate distributions gradually orienting to multiple target distributions, which allows the sampled particles to move to the joint high-likelihood region of the target distributions. Interestingly, the asymptotic analysis shows that our approach reduces exactly to the multiple-gradient descent algorithm for multi-objective optimization, as expected. Finally, we conduct comprehensive experiments to demonstrate the merit of our approach to multi-task learning.

SEJun 8, 2019Code
Recovering Variable Names for Minified Code with Usage Contexts

Hieu Tran, Ngoc Tran, Son Nguyen et al.

In modern Web technology, JavaScript (JS) code plays an important role. To avoid the exposure of original source code, the variable names in JS code deployed in the wild are often replaced by short, meaningless names, thus making the code extremely difficult to manually understand and analysis. This paper presents JSNeat, an information retrieval (IR)-based approach to recover the variable names in minified JS code. JSNeat follows a data-driven approach to recover names by searching for them in a large corpus of open-source JS code. We use three types of contexts to match a variable in given minified code against the corpus including the context of properties and roles of the variable, the context of that variable and relations with other variables under recovery, and the context of the task of the function to which the variable contributes. We performed several empirical experiments to evaluate JSNeat on the dataset of more than 322K JS files with 1M functions, and 3.5M variables with 176K unique variable names. We found that JSNeat achieves a high accuracy of 69.1%, which is the relative improvements of 66.1% and 43% over two state-of-the-art approaches JSNice and JSNaughty, respectively. The time to recover for a file or for a variable with JSNeat is twice as fast as with JSNice and 4x as fast as with JNaughty, respectively.

MLFeb 3, 2020
Stochastic geometry to generalize the Mondrian Process

Eliza O'Reilly, Ngoc Tran

The stable under iterated tessellation (STIT) process is a stochastic process that produces a recursive partition of space with cut directions drawn independently from a distribution over the sphere. The case of random axis-aligned cuts is known as the Mondrian process. Random forests and Laplace kernel approximations built from the Mondrian process have led to efficient online learning methods and Bayesian optimization. In this work, we utilize tools from stochastic geometry to resolve some fundamental questions concerning STIT processes in machine learning. First, we show that a STIT process with cut directions drawn from a discrete distribution can be efficiently simulated by lifting to a higher dimensional axis-aligned Mondrian process. Second, we characterize all possible kernels that stationary STIT processes and their mixtures can approximate. We also give a uniform convergence rate for the approximation error of the STIT kernels to the targeted kernels, generalizing the work of [3] for the Mondrian case. Third, we obtain consistency results for STIT forests in density estimation and regression. Finally, we give a formula for the density estimator arising from an infinite STIT random forest. This allows for precise comparisons between the Mondrian forest, the Mondrian kernel and the Laplace kernel in density estimation. Our paper calls for further developments at the novel intersection of stochastic geometry and machine learning.

SENov 18, 2019
Feature-Interaction Aware Configuration Prioritization for Configurable Code

Son Nguyen, Hoan Nguyen, Ngoc Tran et al.

Unexpected interactions among features induce most bugs in a configurable software system. Exhaustively analyzing all the exponential number of possible configurations is prohibitively costly. Thus, various sampling techniques have been proposed to systematically narrow down the exponential number of legal configurations to be analyzed. Since analyzing all selected configurations can require a huge amount of effort, fault-based configuration prioritization, that helps detect faults earlier, can yield practical benefits in quality assurance. In this paper, we propose CoPro, a novel formulation of feature-interaction bugs via common program entities enabled/disabled by the features. Leveraging from that, we develop an efficient feature-interaction aware configuration prioritization technique for a configurable system by ranking the configurations according to their total number of potential bugs. We conducted several experiments to evaluate CoPro on the ability to detect configuration-related bugs in a public benchmark. We found that CoPro outperforms the state-of-the-art configuration prioritization techniques when we add them on advanced sampling algorithms. In 78% of the cases, CoPro ranks the buggy configurations at the top 3 positions in the resulting list. Interestingly, CoPro is able to detect 17 not-yet-discovered feature-interaction bugs.

SEJun 12, 2019
Does BLEU Score Work for Code Migration?

Ngoc Tran, Hieu Tran, Son Nguyen et al.

Statistical machine translation (SMT) is a fast-growing sub-field of computational linguistics. Until now, the most popular automatic metric to measure the quality of SMT is BiLingual Evaluation Understudy (BLEU) score. Lately, SMT along with the BLEU metric has been applied to a Software Engineering task named code migration. (In)Validating the use of BLEU score could advance the research and development of SMT-based code migration tools. Unfortunately, there is no study to approve or disapprove the use of BLEU score for source code. In this paper, we conducted an empirical study on BLEU score to (in)validate its suitability for the code migration task due to its inability to reflect the semantics of source code. In our work, we use human judgment as the ground truth to measure the semantic correctness of the migrated code. Our empirical study demonstrates that BLEU does not reflect translation quality due to its weak correlation with the semantic correctness of translated code. We provided counter-examples to show that BLEU is ineffective in comparing the translation quality between SMT-based models. Due to BLEU's ineffectiveness for code migration task, we propose an alternative metric RUBY, which considers lexical, syntactical, and semantic representations of source code. We verified that RUBY achieves a higher correlation coefficient with the semantic correctness of migrated code, 0.775 in comparison with 0.583 of BLEU score. We also confirmed the effectiveness of RUBY in reflecting the changes in translation quality of SMT-based translation models. With its advantages, RUBY can be used to evaluate SMT-based code migration models.