Ngoc Mai Tran

h-index: 12

8 papers

167 citations

Novelty: 44%

AI Score: 28

Ranked #150,639 of 194,257 authors (top 78%); #1,745 in SE (top 57%)

8 Papers

22.4 · AI · Sep 23, 2022 · Code

Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network

Mario Krenn, Lorenzo Buffoni, Bruno Coutinho et al.

A tool that could suggest new personalized research directions and ideas by taking insights from the scientific literature could significantly accelerate the progress of science. A field that might benefit from such an approach is artificial intelligence (AI) research, where the number of scientific publications has been growing exponentially in recent years, making it challenging for human researchers to keep track of the progress. Here, we use AI techniques to predict the future research directions of AI itself. We develop a new graph-based benchmark built on real-world data, the Science4Cast benchmark, which aims to predict the future state of an evolving semantic network of AI. For that, we use more than 100,000 research papers and build up a knowledge network with more than 64,000 concept nodes. We then present ten diverse methods to tackle this task, ranging from pure statistical to pure learning methods. Surprisingly, the most powerful methods use a carefully curated set of network features, rather than an end-to-end AI approach. This indicates great potential that has yet to be unleashed for pure ML approaches that do not rely on human knowledge. Ultimately, better predictions of new future research directions will be a crucial component of more advanced research suggestion tools.

1.2 · GT · Oct 24, 2017

Product-Mix Auctions and Tropical Geometry

Ngoc Mai Tran, Josephine Yu

In recent and ongoing work, Baldwin and Klemperer explored a connection between tropical geometry and economics. They gave a sufficient condition for the existence of competitive equilibrium in product-mix auctions of indivisible goods. This result, which we call the Unimodularity Theorem, can also be traced back to the work of Danilov, Koshevoy, and Murota in discrete convex analysis. We give a new proof of the Unimodularity Theorem via the classical unimodularity theorem in integer programming. We give a unified treatment of these results via tropical geometry and formulate a new sufficient condition for competitive equilibrium when there are only two types of products. Generalizations of our theorem in higher dimensions are equivalent to various forms of the Oda conjecture in algebraic geometry.

1.2 · ME · Jan 23, 2012

HodgeRank is the limit of Perron Rank

Ngoc Mai Tran

We study the map which takes an elementwise positive matrix to the k-th root of the principal eigenvector of its k-th Hadamard power. We show that as k tends to 0 one recovers the row geometric mean vector and discuss the geometric significance of this convergence. In the context of pairwise comparison ranking, our result states that HodgeRank is the limit of Perron Rank, thereby providing a novel mathematical link between two important pairwise ranking methods.
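The convergence described in this abstract can be checked numerically. The following is a minimal sketch, not the author's code: the function names `perron_rank` and `geometric_mean_rank` and the example matrix are hypothetical, and both score vectors are rescaled to have geometric mean 1 (an assumed normalization) so that they can be compared directly. The k-th root is taken in log space to avoid underflow for small k.

```python
import numpy as np

def perron_rank(X, k):
    """k-th root of the principal eigenvector of the k-th Hadamard power of X."""
    Xk = np.power(X, k)                      # elementwise (Hadamard) power
    eigvals, eigvecs = np.linalg.eig(Xk)
    principal = np.argmax(eigvals.real)      # Perron eigenvalue is the largest
    v = np.abs(eigvecs[:, principal].real)   # fix the sign so entries are positive
    log_score = np.log(v) / k                # k-th root, computed in log space
    return np.exp(log_score - log_score.mean())

def geometric_mean_rank(X):
    """Row geometric mean vector, rescaled to have geometric mean 1."""
    log_score = np.log(X).mean(axis=1)
    return np.exp(log_score - log_score.mean())

# Hypothetical positive comparison matrix (deliberately not perfectly consistent).
X = np.array([[1.0, 2.0, 6.0],
              [0.5, 1.0, 4.0],
              [0.25, 0.2, 1.0]])

for k in (1.0, 0.1, 1e-4):
    gap = np.max(np.abs(perron_rank(X, k) - geometric_mean_rank(X)))
    print(f"k={k:g}: max gap = {gap:.2e}")
```

Under these assumptions, the printed gap between the two score vectors should shrink as k approaches 0.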

11.1 · SE · Jun 8, 2019 · Code

Recovering Variable Names for Minified Code with Usage Contexts

Hieu Tran, Ngoc Tran, Son Nguyen et al.

In modern Web technology, JavaScript (JS) code plays an important role. To avoid exposing the original source code, the variable names in JS code deployed in the wild are often replaced by short, meaningless names, making the code extremely difficult to understand and analyze manually. This paper presents JSNeat, an information retrieval (IR)-based approach to recover the variable names in minified JS code. JSNeat follows a data-driven approach to recover names by searching for them in a large corpus of open-source JS code. We use three types of contexts to match a variable in given minified code against the corpus: the context of the properties and roles of the variable, the context of that variable and its relations with other variables under recovery, and the context of the task of the function to which the variable contributes. We performed several empirical experiments to evaluate JSNeat on a dataset of more than 322K JS files with 1M functions and 3.5M variables with 176K unique variable names. We found that JSNeat achieves an accuracy of 69.1%, a relative improvement of 66.1% and 43% over two state-of-the-art approaches, JSNice and JSNaughty, respectively. Recovering names for a file or for a variable with JSNeat is twice as fast as with JSNice and 4x as fast as with JSNaughty.

1.2 · SI · Nov 29, 2021 · Code

Improving random walk rankings with feature selection and imputation

Ngoc Mai Tran, Yangxinyu Xie

The Science4cast Competition consists of predicting new links in a semantic network, with each node representing a concept and each edge representing a link proposed by a paper relating two concepts. This network contains information from 1994-2017, discretized by day (the publication date of the underlying papers). Team Hash Brown's final submission, ee5a, achieved a score of 0.92738 on the test set. Our team's score ranks second, 0.01 below the winner's score. This paper details our model, its intuition, and the performance of its variations on the test set.

1.4 · ML · Mar 19, 2020

Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

Hanlin Zhu, Xue Li, Liuyang Sun et al.

Across many areas, from neural tracking to database entity resolution, manual assessment of clusters by human experts presents a bottleneck in the rapid development of scalable and specialized clustering methods. To solve this problem we develop C-FAR, a novel method for Fast, Automated and Reproducible assessment of multiple hierarchical clustering algorithms simultaneously. Our algorithm takes any number of hierarchical clustering trees as input, then strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees. While it is applicable to large datasets in any domain that utilizes pairwise comparisons for assessment, our flagship application is the cluster aggregation step in spike-sorting, the task of assigning waveforms (spikes) in recordings to neurons. On simulated data of 96 neurons under adverse conditions, including drifting and 25% blackout, our algorithm produces near-perfect tracking relative to the ground truth. Our runtime scales linearly in the number of input trees, making it a competitive computational tool. These results indicate that C-FAR is highly suitable as a model selection and assessment tool in clustering tasks.

6.9 · SE · Nov 18, 2019

Feature-Interaction Aware Configuration Prioritization for Configurable Code

Son Nguyen, Hoan Nguyen, Ngoc Tran et al.

Unexpected interactions among features induce most bugs in a configurable software system. Exhaustively analyzing the exponential number of possible configurations is prohibitively costly. Thus, various sampling techniques have been proposed to systematically narrow down the exponential number of legal configurations to be analyzed. Since analyzing all selected configurations can require a huge amount of effort, fault-based configuration prioritization, which helps detect faults earlier, can yield practical benefits in quality assurance. In this paper, we propose CoPro, a novel formulation of feature-interaction bugs via common program entities enabled/disabled by the features. Building on that formulation, we develop an efficient feature-interaction aware configuration prioritization technique for a configurable system by ranking the configurations according to their total number of potential bugs. We conducted several experiments to evaluate CoPro's ability to detect configuration-related bugs in a public benchmark. We found that CoPro outperforms the state-of-the-art configuration prioritization techniques when they are applied on top of advanced sampling algorithms. In 78% of the cases, CoPro ranks the buggy configurations in the top 3 positions of the resulting list. Interestingly, CoPro is able to detect 17 not-yet-discovered feature-interaction bugs.

1.4 · LG · Oct 24, 2017

Classification on Large Networks: A Quantitative Bound via Motifs and Graphons

Andreas Haupt, Mohammad Khatami, Thomas Schultz et al.

When each data point is a large graph, graph statistics such as densities of certain subgraphs (motifs) can be used as feature vectors for machine learning. While intuitive, motif counts are expensive to compute and difficult to work with theoretically. Via graphon theory, we give an explicit quantitative bound for the ability of motif homomorphisms to distinguish large networks under both generative and sampling noise. Furthermore, we give similar bounds for the graph spectrum and connect it to homomorphism densities of cycles. This results in an easily computable classifier on graph data with a theoretical performance guarantee. Our method yields competitive results on classification tasks for the autoimmune disease Lupus Erythematosus.
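The link between the graph spectrum and cycle homomorphism densities mentioned here rests on a standard identity: for a graph on n nodes with adjacency matrix A, the homomorphism density of the k-cycle is tr(A^k) / n^k, which equals the sum of the k-th powers of the eigenvalues of A divided by n^k. The sketch below only illustrates that identity on a small example. It is not the paper's classifier, and the function names are hypothetical.

```python
import numpy as np

def cycle_density_trace(A, k):
    """Homomorphism density of the k-cycle, computed as tr(A^k) / n^k."""
    n = A.shape[0]
    return np.trace(np.linalg.matrix_power(A, k)) / n**k

def cycle_density_spectrum(A, k):
    """Same density from the spectrum: sum of eigenvalue k-th powers over n^k."""
    n = A.shape[0]
    eigvals = np.linalg.eigvalsh(A)  # A is symmetric for an undirected graph
    return np.sum(eigvals**k) / n**k

# Triangle graph K_3: adjacency matrix with ones off the diagonal.
A = np.ones((3, 3)) - np.eye(3)

for k in (3, 4, 5):
    print(k, cycle_density_trace(A, k), cycle_density_spectrum(A, k))
```

Because the spectral form needs only one eigendecomposition, densities for many cycle lengths can be read off cheaply, which is one reason spectral features are attractive compared with general motif counts.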