FLU-DYNJun 21, 2023Code
Neural Multigrid Memory For Computational Fluid DynamicsDuc Minh Nguyen, Minh Chau Vu, Tuan Anh Nguyen et al.
Turbulent flow simulation plays a crucial role in various applications, including aircraft and ship design, industrial process optimization, and weather prediction. In this paper, we propose an advanced data-driven method for simulating turbulent flow, representing a significant improvement over existing approaches. Our methodology combines the strengths of Video Prediction Transformer (VPTR) (Ye & Bilodeau, 2022) and Multigrid Architecture (MgConv, MgResnet) (Ke et al., 2017). VPTR excels in capturing complex spatiotemporal dependencies and handling large input data, making it a promising choice for turbulent flow prediction. Meanwhile, Multigrid Architecture utilizes multiple grids with different resolutions to capture the multiscale nature of turbulent flows, resulting in more accurate and efficient simulations. Through our experiments, we demonstrate the effectiveness of our proposed approach, named MGxTransformer, in accurately predicting velocity, temperature, and turbulence intensity for incompressible turbulent flows across various geometries and flow conditions. Our results exhibit superior accuracy compared to other baselines, while maintaining computational efficiency. Our implementation in PyTorch is available publicly at https://github.com/Combi2k2/MG-Turbulent-Flow
66.1CVMay 22
DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous DrivingHao Vo, Khoa Vo, Phu Loc Nguyen et al.
Spatiotemporal intelligence in autonomous driving (AD) requires an agent to integrate multi-view observations into a coherent scene representation, maintain object continuity across viewpoints and time, and reason about spatial relations, interactions, and future dynamics. However, existing AD vision-language benchmarks largely focus on single-view, static, ego-centric, or single-source question answering, leaving it unclear whether current Vision-Language Models (VLMs) can truly construct and reason over dynamic driving scenes. We introduce DriveSpatial, a benchmark of 15.6K human-verified QA pairs across 20 tasks from five large-scale AD datasets. DriveSpatial evaluates four abilities: Cognitive Scene Construction, Multi-view Relational Understanding, Temporal Reasoning, and Generalization. Unlike prior benchmarks, DriveSpatial is generated from a dynamic multi-relational scene graph that encodes object states, spatial relations, interactions, camera visibility, and temporal correspondences, enabling QA pairs that enforce genuine cross-view and spatiotemporal reasoning. Evaluating 15 representative VLMs reveals a substantial human-model gap: the strongest model trails humans by 28.4 points, with Cognitive Scene Construction emerging as the key bottleneck. Further diagnostics show that language-only prompting is insufficient, while explicit BEV grounding consistently improves performance. These results suggest that current VLMs lack the scene-construction ability needed for reliable spatiotemporal driving intelligence. DriveSpatial and its construction pipeline will be released to support future research.
AIOct 13, 2020
Temporal Collaborative Filtering with Graph Convolutional Neural NetworksEsther Rodrigo Bonet, Duc Minh Nguyen, Nikos Deligiannis
Temporal collaborative filtering (TCF) methods aim at modelling non-static aspects behind recommender systems, such as the dynamics in users' preferences and social trends around items. State-of-the-art TCF methods employ recurrent neural networks (RNNs) to model such aspects. These methods deploy matrix-factorization-based (MF-based) approaches to learn the user and item representations. Recently, graph-neural-network-based (GNN-based) approaches have shown improved performance in providing accurate recommendations over traditional MF-based approaches in non-temporal CF settings. Motivated by this, we propose a novel TCF method that leverages GNNs to learn user and item representations, and RNNs to model their temporal dynamics. A challenge with this method lies in the increased data sparsity, which negatively impacts obtaining meaningful quality representations with GNNs. To overcome this challenge, we train a GNN model at each time step using a set of observed interactions accumulated time-wise. Comprehensive experiments on real-world data show the improved performance obtained by our method over several state-of-the-art temporal and non-temporal CF models.
LGAug 28, 2020
Graph Convolutional Neural Networks with Node Transition Probability-based Message Passing and DropNode RegularizationTien Huu Do, Duc Minh Nguyen, Giannis Bekoulis et al.
Graph convolutional neural networks (GCNNs) have received much attention recently, owing to their capability in handling graph-structured data. Among the existing GCNNs, many methods can be viewed as instances of a neural message passing motif; features of nodes are passed around their neighbors, aggregated and transformed to produce better nodes' representations. Nevertheless, these methods seldom use node transition probabilities, a measure that has been found useful in exploring graphs. Furthermore, when the transition probabilities are used, their transition direction is often improperly considered in the feature aggregation step, resulting in an inefficient weighting scheme. In addition, although a great number of GCNN models with increasing level of complexity have been introduced, the GCNNs often suffer from over-fitting when being trained on small graphs. Another issue of the GCNNs is over-smoothing, which tends to make nodes' representations indistinguishable. This work presents a new method to improve the message passing process based on node transition probabilities by properly considering the transition direction, leading to a better weighting scheme in nodes' features aggregation compared to the existing counterpart. Moreover, we propose a novel regularization method termed DropNode to address the over-fitting and over-smoothing issues simultaneously. DropNode randomly discards part of a graph, thus it creates multiple deformed versions of the graph, leading to data augmentation regularization effect. Additionally, DropNode lessens the connectivity of the graph, mitigating the effect of over-smoothing in deep GCNNs. Extensive experiments on eight benchmark datasets for node and graph classification tasks demonstrate the effectiveness of the proposed methods in comparison with the state of the art.
SIApr 18, 2019
Rumour Detection via News Propagation Dynamics and User Representation LearningTien Huu Do, Xiao Luo, Duc Minh Nguyen et al.
Rumours have existed for a long time and have been known for serious consequences. The rapid growth of social media platforms has multiplied the negative impact of rumours; it thus becomes important to early detect them. Many methods have been introduced to detect rumours using the content or the social context of news. However, most existing methods ignore or do not explore effectively the propagation pattern of news in social media, including the sequence of interactions of social media users with news across time. In this work, we propose a novel method for rumour detection based on deep learning. Our method leverages the propagation process of the news by learning the users' representation and the temporal interrelation of users' responses. Experiments conducted on Twitter and Weibo datasets demonstrate the state-of-the-art performance of the proposed method.
LGJan 29, 2019
Geometric Matrix Completion with Deep Conditional Random FieldsDuc Minh Nguyen, Robert Calderbank, Nikos Deligiannis
The problem of completing high-dimensional matrices from a limited set of observations arises in many big data applications, especially, recommender systems. Existing matrix completion models generally follow either a memory- or a model-based approach, whereas, geometric matrix completion models combine the best from both approaches. Existing deep-learning-based geometric models yield good performance, but, in order to operate, they require a fixed structure graph capturing the relationships among the users and items. This graph is typically constructed by evaluating a pre-defined similarity metric on the available observations or by using side information, e.g., user profiles. In contrast, Markov-random-fields-based models do not require a fixed structure graph but rely on handcrafted features to make predictions. When no side information is available and the number of available observations becomes very low, existing solutions are pushed to their limits. In this paper, we propose a geometric matrix completion approach that addresses these challenges. We consider matrix completion as a structured prediction problem in a conditional random field (CRF), which is characterized by a maximum a posterior (MAP) inference, and we propose a deep model that predicts the missing entries by solving the MAP inference problem. The proposed model simultaneously learns the similarities among matrix entries, computes the CRF potentials, and solves the inference problem. Its training is performed in an end-to-end manner, with a method to supervise the learning of entry similarities. Comprehensive experiments demonstrate the superior performance of the proposed model compared to various state-of-the-art models on popular benchmark datasets and underline its superior capacity to deal with highly incomplete matrices.
LGDec 4, 2018
Matrix Factorization via Deep LearningDuc Minh Nguyen, Evaggelia Tsiligianni, Nikos Deligiannis
Matrix completion is one of the key problems in signal processing and machine learning. In recent years, deep-learning-based models have achieved state-of-the-art results in matrix completion. Nevertheless, they suffer from two drawbacks: (i) they can not be extended easily to rows or columns unseen during training; and (ii) their results are often degraded in case discrete predictions are required. This paper addresses these two drawbacks by presenting a deep matrix factorization model and a generic method to allow joint training of the factorization model and the discretization operator. Experiments on a real movie rating dataset show the efficacy of the proposed models.
LGNov 5, 2018
Matrix Completion With Variational Graph Autoencoders: Application in Hyperlocal Air Quality InferenceTien Huu Do, Duc Minh Nguyen, Evaggelia Tsiligianni et al.
Inferring air quality from a limited number of observations is an essential task for monitoring and controlling air pollution. Existing inference methods typically use low spatial resolution data collected by fixed monitoring stations and infer the concentration of air pollutants using additional types of data, e.g., meteorological and traffic information. In this work, we focus on street-level air quality inference by utilizing data collected by mobile stations. We formulate air quality inference in this setting as a graph-based matrix completion problem and propose a novel variational model based on graph convolutional autoencoders. Our model captures effectively the spatio-temporal correlation of the measurements and does not depend on the availability of additional information apart from the street-network topology. Experiments on a real air quality dataset, collected with mobile stations, shows that the proposed model outperforms state-of-the-art approaches.
LGJul 4, 2018
Regularizing Autoencoder-Based Matrix Completion Models via Manifold LearningDuc Minh Nguyen, Evaggelia Tsiligianni, Robert Calderbank et al.
Autoencoders are popular among neural-network-based matrix completion models due to their ability to retrieve potential latent factors from the partially observed matrices. Nevertheless, when training data is scarce their performance is significantly degraded due to overfitting. In this paper, we mit- igate overfitting with a data-dependent regularization technique that relies on the principles of multi-task learning. Specifically, we propose an autoencoder-based matrix completion model that performs prediction of the unknown matrix values as a main task, and manifold learning as an auxiliary task. The latter acts as an inductive bias, leading to solutions that generalize better. The proposed model outperforms the existing autoencoder-based models designed for matrix completion, achieving high reconstruction accuracy in well-known datasets.
MLMay 13, 2018
Extendable Neural Matrix CompletionDuc Minh Nguyen, Evaggelia Tsiligianni, Nikos Deligiannis
Matrix completion is one of the key problems in signal processing and machine learning, with applications ranging from image pro- cessing and data gathering to classification and recommender sys- tems. Recently, deep neural networks have been proposed as la- tent factor models for matrix completion and have achieved state- of-the-art performance. Nevertheless, a major problem with existing neural-network-based models is their limited capabilities to extend to samples unavailable at the training stage. In this paper, we propose a deep two-branch neural network model for matrix completion. The proposed model not only inherits the predictive power of neural net- works, but is also capable of extending to partially observed samples outside the training set, without the need of retraining or fine-tuning. Experimental studies on popular movie rating datasets prove the ef- fectiveness of our model compared to the state of the art, in terms of both accuracy and extendability.
SIMay 11, 2018
Twitter User Geolocation using Deep Multiview LearningTien Huu Do, Duc Minh Nguyen, Evaggelia Tsiligianni et al.
Predicting the geographical location of users on social networks like Twitter is an active research topic with plenty of methods proposed so far. Most of the existing work follows either a content-based or a network-based approach. The former is based on user-generated content while the latter exploits the structure of the network of users. In this paper, we propose a more generic approach, which incorporates not only both content-based and network-based features, but also other available information into a unified model. Our approach, named Multi-Entry Neural Network (MENET), leverages the latest advances in deep learning and multiview learning. A realization of MENET with textual, network and metadata features results in an effective method for Twitter user geolocation, achieving the state of the art on two well-known datasets.
LGDec 21, 2017
Multiview Deep Learning for Predicting Twitter Users' LocationTien Huu Do, Duc Minh Nguyen, Evaggelia Tsiligianni et al.
The problem of predicting the location of users on large social networks like Twitter has emerged from real-life applications such as social unrest detection and online marketing. Twitter user geolocation is a difficult and active research topic with a vast literature. Most of the proposed methods follow either a content-based or a network-based approach. The former exploits user-generated content while the latter utilizes the connection or interaction between Twitter users. In this paper, we introduce a novel method combining the strength of both approaches. Concretely, we propose a multi-entry neural network architecture named MENET leveraging the advances in deep learning and multiview learning. The generalizability of MENET enables the integration of multiple data representations. In the context of Twitter user geolocation, we realize MENET with textual, network, and metadata features. Considering the natural distribution of Twitter users across the concerned geographical area, we subdivide the surface of the earth into multi-scale cells and train MENET with the labels of the cells. We show that our method outperforms the state of the art by a large margin on three benchmark datasets.
CVAug 28, 2017
Deep Learning Sparse Ternary Projections for Compressed Sensing of ImagesDuc Minh Nguyen, Evaggelia Tsiligianni, Nikos Deligiannis
Compressed sensing (CS) is a sampling theory that allows reconstruction of sparse (or compressible) signals from an incomplete number of measurements, using of a sensing mechanism implemented by an appropriate projection matrix. The CS theory is based on random Gaussian projection matrices, which satisfy recovery guarantees with high probability; however, sparse ternary {0, -1, +1} projections are more suitable for hardware implementation. In this paper, we present a deep learning approach to obtain very sparse ternary projections for compressed sensing. Our deep learning architecture jointly learns a pair of a projection matrix and a reconstruction operator in an end-to-end fashion. The experimental results on real images demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods, with significant advantage in terms of complexity.