Vinh Tran

CV
h-index 117
6 papers
3,104 citations
Novelty 47%
AI Score 49

6 Papers

QM · May 8
MeTime: An R package for reproducible longitudinal metabolomics data analysis

Bharadwaj Marella, Patrick Weinisch, Lara Vehovec et al.

MeTime is an open-source R package for the reproducible analysis of longitudinal metabolomics data. It is built around a central S4 container, metime_analyser, which stores multiple datasets, their associated metadata and analysis outputs, enabling unified handling of complex longitudinal studies. Analyses are constructed by piping modular functions: data transformations (mod_) come first, followed by calculations (calc_) and optional meta-analysis (meta_), so that entire workflows remain transparent and easy to modify. MeTime wraps numerous existing methods in a consistent interface, including sample and metabolite distributions, correlation and distance matrices, dimensionality reduction (PCA, UMAP, t-SNE), random forest imputation and feature selection via Boruta, eigenmetabolites and WGCNA-based clustering, conservation index analysis, regression models (linear, mixed-effects and generalized additive), and partial correlation networks. Because all intermediate results and provenance are retained within the container, MeTime facilitates iterative exploration and ensures reproducible reporting through automatically generated HTML and PDF outputs. Comprehensive user guides, case studies and reference documentation accompany the package, making MeTime a versatile platform for longitudinal omics workflows.
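The container-plus-pipeline design described above can be illustrated with a small Python sketch. Everything in it is hypothetical: `Analyser`, `mod_log10` and `calc_mean` are invented names that only mimic the mod_/calc_ naming convention, and this is not MeTime's R/S4 API. The sketch shows why keeping datasets, results and a provenance log in one object makes a workflow auditable. Each step returns a new container, so earlier states, including the raw data, stay intact.

```python
import math
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Analyser:
    """Hypothetical stand-in for a metime_analyser-like container (not MeTime's API)."""
    datasets: Dict[str, List[float]]
    results: Dict[str, float] = field(default_factory=dict)
    provenance: Tuple[str, ...] = ()

    def pipe(self, step: Callable[["Analyser"], "Analyser"]) -> "Analyser":
        # Each step returns a new container, so earlier states stay intact.
        return step(self)


def mod_log10(name: str) -> Callable[[Analyser], Analyser]:
    """Hypothetical mod_ step: log10-transform one dataset."""
    def step(a: Analyser) -> Analyser:
        datasets = {**a.datasets, name: [math.log10(v) for v in a.datasets[name]]}
        return replace(a, datasets=datasets,
                       provenance=a.provenance + (f"mod_log10({name})",))
    return step


def calc_mean(name: str) -> Callable[[Analyser], Analyser]:
    """Hypothetical calc_ step: store the mean of one dataset as a result."""
    def step(a: Analyser) -> Analyser:
        values = a.datasets[name]
        results = {**a.results, f"mean_{name}": sum(values) / len(values)}
        return replace(a, results=results,
                       provenance=a.provenance + (f"calc_mean({name})",))
    return step


raw = Analyser(datasets={"plasma": [1.0, 10.0, 100.0]})
out = raw.pipe(mod_log10("plasma")).pipe(calc_mean("plasma"))
print(out.provenance)  # ('mod_log10(plasma)', 'calc_mean(plasma)')
print(out.results)
```

Because `raw` is never mutated, the untransformed data and the full step history both remain available for reporting, which is the property the abstract attributes to the container.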

CL · Jul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and can now process up to 3 hours of video content. Its unique combination of long-context, multimodal and reasoning capabilities can unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements, while Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability versus cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

CV · Apr 10, 2019
Attentive Action and Context Factorization

Yang Wang, Vinh Tran, Gedas Bertasius et al.

We propose a method for human action recognition that can localize the spatiotemporal regions that "define" the actions. This is a challenging task due to the subtlety of human actions in video and the co-occurrence of contextual elements. To address this challenge, we utilize conjugate samples of human actions: video clips that are contextually similar to human action samples but do not contain the action. We introduce a novel attentional mechanism that can spatially and temporally separate human actions from the co-occurring contextual factors. The separation of the action and context factors is weakly supervised, eliminating the need for laborious, detailed annotation of these two factors in training samples. Our method can be used to build human action classifiers with higher accuracy and better interpretability. Experiments on several human action recognition datasets demonstrate the quantitative and qualitative benefits of our approach.

CV · Apr 9, 2019
Knowledge Distillation for Human Action Anticipation

Vinh Tran, Yang Wang, Minh Hoai

We consider the task of training a neural network to anticipate human actions in video. This task is challenging given the complexity of video data, the stochastic nature of the future, and the limited amount of annotated training data. In this paper, we propose a novel knowledge distillation framework that uses an action recognition network to supervise the training of an action anticipation network, guiding the latter to attend to the information needed to correctly anticipate future actions. This framework is made possible by a novel loss function that accounts for positional shifts of semantic concepts in a dynamic video. The knowledge distillation framework is a form of self-supervised learning, and it takes advantage of unlabeled data. Experimental results on the JHMDB and EPIC-KITCHENS datasets show the effectiveness of our approach.
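To make the teacher/student setup concrete, here is a minimal Python sketch of generic knowledge distillation between a recognition network (teacher) and an anticipation network (student). It uses the standard Hinton-style temperature-softened KL loss on class logits, which is a plainly swapped-in baseline. It is not the paper's novel loss for positional shifts of semantic concepts, whose details are not given here. The function names, temperature `T`, batch size and random logits are all illustrative assumptions.

```python
import numpy as np


def log_softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable log-softmax of temperature-scaled logits along the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, T: float = 2.0) -> float:
    """Mean KL(teacher || student) over a batch, scaled by T^2 (generic soft-target KD)."""
    log_p_t = log_softmax(teacher_logits, T)
    log_p_s = log_softmax(student_logits, T)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    return float(T ** 2 * kl.mean())


# Assumed setup: the teacher sees the full clip, the student only its early part.
# No action labels are used, so unlabeled videos can supply these targets.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))   # placeholder logits: 4 clips, 10 action classes
student = rng.normal(size=(4, 10))
print(distillation_loss(student, teacher))
```

The key point the sketch shares with the abstract is that the supervision signal comes from a trained recognition model rather than from annotations. The paper's own loss additionally tolerates spatial shifts between what the two networks attend to, which this logit-level baseline does not model.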

CV · Aug 17, 2017
Eigen Evolution Pooling for Human Action Recognition

Yang Wang, Vinh Tran, Minh Hoai

We introduce Eigen Evolution Pooling, an efficient method to aggregate a sequence of feature vectors. Eigen evolution pooling is designed to produce compact feature representations for a sequence of feature vectors while preserving as much information about the sequence as possible, especially the temporal evolution of the features. Eigen evolution pooling is a general pooling method that can be applied to any sequence of feature vectors, from low-level RGB values to high-level Convolutional Neural Network (CNN) feature vectors. We show that eigen evolution pooling is more effective than average, max, and rank pooling for encoding the dynamics of human actions in video. We demonstrate the power of eigen evolution pooling on UCF101 and Hollywood2, two human action recognition benchmarks, achieving state-of-the-art performance.
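A minimal Python sketch of the pooling idea follows, under a stated assumption: the sequence is stored as a d×n matrix X (one column per time step), the k "evolution functions" are taken to be the top eigenvectors of XᵀX (equivalently, the leading right singular vectors of X, without mean-centering), and the pooled representation concatenates the projections X·uᵢ. Details such as centering, length normalization or resampling of variable-length videos may differ in the paper; `eigen_evolution_pool` and its parameter `k` are illustrative names.

```python
import numpy as np


def eigen_evolution_pool(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Pool a (d, n) feature sequence (one column per time step) into a k*d vector.

    Assumption: the k evolution functions are the top eigenvectors of X^T X,
    obtained here as the leading right singular vectors of X (no centering).
    """
    d, n = X.shape
    if not 1 <= k <= min(d, n):
        raise ValueError("k must be between 1 and min(d, n)")
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    U = vt[:k].T                  # (n, k): temporal weighting functions
    pooled = X @ U                # (d, k): each feature's trajectory projected on each function
    return pooled.T.reshape(-1)   # concatenate the k pooled d-dim vectors


rng = np.random.default_rng(0)
X = rng.normal(size=(512, 30))  # synthetic stand-in for 30 frames of 512-d CNN features
print(eigen_evolution_pool(X, k=2).shape)  # (1024,)
```

For a sequence that does not change over time, the first evolution function is uniform, so the pooled vector reduces to a scaled average; further functions can capture how the features change. Singular vectors are defined only up to sign, so a practical implementation would need a sign convention to make descriptors comparable across videos.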

CV · Feb 14, 2017
Evolution-Preserving Dense Trajectory Descriptors

Yang Wang, Vinh Tran, Minh Hoai

Recently, Trajectory-pooled Deep-learning Descriptors were shown to achieve state-of-the-art human action recognition results on a number of datasets. This paper improves their performance by applying rank pooling to each trajectory, encoding the temporal evolution of deep learning features computed along the trajectory. This leads to Evolution-Preserving Trajectory (EPT) descriptors, a novel type of video descriptor that significantly outperforms Trajectory-pooled Deep-learning Descriptors. EPT descriptors are defined based on dense trajectories, and they provide complementary benefits to video descriptors that are not based on trajectories. In particular, we show that the combination of EPT descriptors and VideoDarwin leads to state-of-the-art performance on the Hollywood2 and UCF101 datasets.
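Rank pooling, as applied here to the features along each trajectory, can be sketched in Python as follows. This follows the common VideoDarwin-style recipe: smooth the sequence with a time-varying mean, then learn a linear function whose scores increase with time and use its weights as the descriptor. As an assumption, the ranking objective (RankSVM/SVR in the original formulation) is replaced by closed-form ridge regression onto the frame index. The name `rank_pool`, the regularizer `lam` and the synthetic track features are illustrative.

```python
import numpy as np


def rank_pool(X: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Approximate rank pooling of an (n, d) sequence (one row per frame).

    Returns w such that the scores M @ w roughly increase with time, where M holds
    time-varying means of X. Ridge regression onto the frame index is used as a
    stand-in for the ranking objective.
    """
    n, d = X.shape
    # Time-varying mean: row t is the average of frames 1..t, which smooths noise.
    M = np.cumsum(X, axis=0) / np.arange(1, n + 1)[:, None]
    t = np.arange(1, n + 1, dtype=float)
    # Closed-form ridge solution w = (M^T M + lam I)^{-1} M^T t.
    return np.linalg.solve(M.T @ M + lam * np.eye(d), M.T @ t)


# Hypothetical trajectory: 15 frames of 64-d deep features sampled along one track.
rng = np.random.default_rng(0)
track_feats = rng.normal(size=(15, 64)).cumsum(axis=0)
descriptor = rank_pool(track_feats)
print(descriptor.shape)  # (64,)
```

Because the descriptor is a weight vector rather than an average, it encodes the direction in which the features evolve along the trajectory, which is the temporal information that plain trajectory pooling discards.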