Yan Bin Ng

CV
4papers
40citations
Novelty50%
AI Score35

4 Papers

CVOct 31, 2025
Hyperbolic Optimal Transport

Yan Bin Ng, Xianfeng Gu

The optimal transport (OT) problem aims to find the most efficient mapping between two probability distributions under a given cost function, and has diverse applications in many fields such as machine learning, computer vision and computer graphics. However, existing methods for computing optimal transport maps are primarily developed for Euclidean spaces and the sphere. In this paper, we explore the problem of computing the optimal transport map in hyperbolic space, which naturally arises in contexts involving hierarchical data, networks, and multi-genus Riemann surfaces. We propose a novel and efficient algorithm for computing the optimal transport map in hyperbolic space using a geometric variational technique by extending methods for Euclidean and spherical geometry to the hyperbolic setting. We also perform experiments on synthetic data and multi-genus surface models to validate the efficacy of the proposed method.

CVJul 19, 2021
Action Forecasting with Feature-wise Self-Attention

Yan Bin Ng, Basura Fernando

We present a new architecture for human action forecasting from videos. A temporal recurrent encoder captures temporal information of input videos while a self-attention model is used to attend on relevant feature dimensions of the input space. To handle temporal variations in observed video data, a feature masking techniques is employed. We classify observed actions accurately using an auxiliary classifier which helps to understand what has happened so far. Then the decoder generates actions for the future based on the output of the recurrent encoder and the self-attention model. Experimentally, we validate each component of our architecture where we see that the impact of self-attention to identify relevant feature dimensions, temporal masking, and observed auxiliary classifier. We evaluate our method on two standard action forecasting benchmarks and obtain state-of-the-art results.

CVDec 10, 2019
Forecasting future action sequences with attention: a new approach to weakly supervised action forecasting

Yan Bin Ng, Basura Fernando

Future human action forecasting from partial observations of activities is an important problem in many practical applications such as assistive robotics, video surveillance and security. We present a method to forecast actions for the unseen future of the video using a neural machine translation technique that uses encoder-decoder architecture. The input to this model is the observed RGB video, and the objective is to forecast the correct future symbolic action sequence. Unlike prior methods that make action predictions for some unseen percentage of video one for each frame, we predict the complete action sequence that is required to accomplish the activity. We coin this task action sequence forecasting. To cater for two types of uncertainty in the future predictions, we propose a novel loss function. We show a combination of optimal transport and future uncertainty losses help to improve results. We extend our action sequence forecasting model to perform weakly supervised action forecasting on two challenging datasets, the Breakfast and the 50Salads. Specifically, we propose a model to predict actions of future unseen frames without using frame level annotations during training. Using Fisher vector features, our supervised model outperforms the state-of-the-art action forecasting model by 0.83% and 7.09% on the Breakfast and the 50Salads datasets respectively. Our weakly supervised model is only 0.6% behind the most recent state-of-the-art supervised model and obtains comparable results to other published fully supervised methods, and sometimes even outperforms them on the Breakfast dataset. Most interestingly, our weakly supervised model outperforms prior models by 1.04% leveraging on proposed weakly supervised architecture, and effective use of attention mechanism and loss functions.

CVOct 7, 2019
Human Action Sequence Classification

Yan Bin Ng, Basura Fernando

This paper classifies human action sequences from videos using a machine translation model. In contrast to classical human action classification which outputs a set of actions, our method output a sequence of action in the chronological order of the actions performed by the human. Therefore our method is evaluated using sequential performance measures such as Bilingual Evaluation Understudy (BLEU) scores. Action sequence classification has many applications such as learning from demonstration, action segmentation, detection, localization and video captioning. Furthermore, we use our model that is trained to output action sequences to solve downstream tasks; such as video captioning and action localization. We obtain state of the art results for video captioning in challenging Charades dataset obtaining BLEU-4 score of 34.8 and METEOR score of 33.6 outperforming previous state-of-the-art of 18.8 and 19.5 respectively. Similarly, on ActivityNet captioning, we obtain excellent results in-terms of ROUGE (20.24) and CIDER (37.58) scores. For action localization, without using any explicit start/end action annotations, our method obtains localization performance of 22.2 mAP outperforming prior fully supervised methods.