Joseph F. JaJa

3 Papers

CV · Aug 3, 2022

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation

Jun Wang, Mingfei Gao, Yuqian Hu et al.

Text-VQA aims at answering questions that require understanding the textual cues in an image. Despite the great progress of existing Text-VQA methods, their performance suffers from insufficient human-labeled question-answer (QA) pairs. However, we observe that, in general, the scene text is not fully exploited in the existing datasets: only a small portion of the text in each image participates in the annotated QA activities. This results in a huge waste of useful information. To address this deficiency, we develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image. Specifically, we propose TAG, a text-aware visual question-answer generation architecture that learns to produce meaningful and accurate QA samples using a multimodal transformer. The architecture exploits underexplored scene text information and enhances scene understanding of Text-VQA models by combining the generated QA pairs with the initial training data. Extensive experimental results on two well-known Text-VQA benchmarks (TextVQA and ST-VQA) demonstrate that our proposed TAG effectively enlarges the training data, which helps improve Text-VQA performance without extra labeling effort. Moreover, our model outperforms state-of-the-art approaches that are pre-trained with extra large-scale data. Code is available at https://github.com/HenryJunW/TAG.
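
The observation that only a small portion of the scene text takes part in the annotated QA pairs can be illustrated with a minimal sketch. It is not taken from the TAG code base: the function names `unused_scene_text` and `text_coverage`, the whitespace tokenization, and the toy OCR tokens are all assumptions made for illustration.

```python
def unused_scene_text(ocr_tokens, qa_pairs):
    """Return OCR tokens that no annotated answer mentions (hypothetical helper)."""
    used = set()
    for _question, answer in qa_pairs:
        # Assumption: answers are matched to OCR tokens by lowercase whitespace tokens.
        used.update(answer.lower().split())
    return [tok for tok in ocr_tokens if tok.lower() not in used]


def text_coverage(ocr_tokens, qa_pairs):
    """Fraction of OCR tokens that appear in at least one annotated answer."""
    if not ocr_tokens:
        return 0.0
    unused = unused_scene_text(ocr_tokens, qa_pairs)
    return 1.0 - len(unused) / len(ocr_tokens)


# Toy image: four OCR tokens, one annotated QA pair.
ocr_tokens = ["STOP", "Main", "St", "24h"]
qa_pairs = [("what does the sign say?", "stop")]
leftover = unused_scene_text(ocr_tokens, qa_pairs)
```

Tokens left over in `leftover` are the kind of unexploited scene text that a QA generator could target when producing additional training pairs.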

SI · Oct 10, 2019

Graph Coarsening with Preserved Spectral Properties

Yu Jin, Andreas Loukas, Joseph F. JaJa

Large-scale graphs are widely used to represent object relationships in many real-world applications. Processing and analyzing such graphs, and extracting information from them, presents significant computational challenges. Graph coarsening techniques are commonly used to reduce the computational load while attempting to maintain the basic structural properties of the original graph. As there is no consensus on the specific graph properties preserved by coarse graphs, how to measure the differences between original and coarse graphs remains a key challenge. In this work, we introduce a new perspective on graph coarsening based on concepts from spectral graph theory. We propose and justify new distance functions that characterize the differences between original and coarse graphs. We show that the proposed spectral distance naturally captures the structural differences in the graph coarsening process. In addition, we provide efficient graph coarsening algorithms to generate graphs that provably preserve the spectral properties of the original graphs. Experiments show that our proposed algorithms consistently achieve better results compared to previous graph coarsening methods on graph classification and block recovery tasks.
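
As a rough illustration of comparing an original graph with a coarse one through their spectra, the sketch below merges nodes into supernodes and compares the smallest normalized-Laplacian eigenvalues. The abstract does not define the paper's distance functions, so `eigenvalue_gap` is a generic stand-in chosen for this example, and the partition-based `coarsen` helper is likewise an assumption rather than the authors' algorithm.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    safe_deg = np.where(deg > 0, deg, 1.0)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe_deg), 0.0)
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def coarsen(adj, assignment):
    """Merge nodes that share a label in `assignment` into one supernode."""
    n_super = int(assignment.max()) + 1
    membership = np.zeros((len(adj), n_super))
    membership[np.arange(len(adj)), assignment] = 1.0
    coarse = membership.T @ adj @ membership
    np.fill_diagonal(coarse, 0.0)  # drop edges internal to a supernode
    return coarse


def eigenvalue_gap(adj, coarse_adj):
    """Sum of absolute differences between the k smallest eigenvalues,
    where k is the number of supernodes (illustrative measure only)."""
    k = len(coarse_adj)
    ev_orig = np.linalg.eigvalsh(normalized_laplacian(adj))[:k]
    ev_coarse = np.linalg.eigvalsh(normalized_laplacian(coarse_adj))
    return float(np.abs(ev_orig - ev_coarse).sum())


# Two triangles joined by a single edge, coarsened to one supernode per triangle.
adj = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[a, b] = adj[b, a] = 1.0
assignment = np.array([0, 0, 0, 1, 1, 1])
coarse = coarsen(adj, assignment)
gap = eigenvalue_gap(adj, coarse)
```

A smaller `gap` means the coarse graph's spectrum stays closer to the low end of the original spectrum, which is the intuition behind judging a coarsening by spectral properties.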

LG · May 20, 2018

Learning Graph-Level Representations with Recurrent Neural Networks

Yu Jin, Joseph F. JaJa

Recently, a variety of methods have been developed to encode graphs into low-dimensional vectors that can be easily exploited by machine learning algorithms. The majority of these methods start by embedding the graph nodes into a low-dimensional vector space, followed by using some scheme to aggregate the node embeddings. In this work, we develop a new approach to learn graph-level representations, which includes a combination of unsupervised and supervised learning components. We start by learning a set of node representations in an unsupervised fashion. Graph nodes are mapped into node sequences sampled by random walks, which are approximated using the Gumbel-Softmax distribution. Recurrent neural network (RNN) units are modified to accommodate both the node representations and their neighborhood information. Experiments on standard graph classification benchmarks demonstrate that our proposed approach achieves superior or comparable performance relative to the state-of-the-art algorithms in terms of convergence speed and classification accuracy. We further illustrate the effectiveness of the different components used by our approach.
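
The idea of sampling random-walk steps through a Gumbel-Softmax relaxation can be sketched as follows. The abstract does not say how the authors parameterize the walk, so this example assumes uniform transitions over neighbors, a graph with no isolated nodes, and hypothetical function names (`gumbel_softmax`, `sample_walk`).

```python
import numpy as np


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax of (logits + Gumbel noise) / temperature."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    scores = (logits + gumbel) / temperature
    scores = scores - scores.max()  # stabilize the exponent
    weights = np.exp(scores)
    return weights / weights.sum()


def sample_walk(adj, start, length, temperature, rng):
    """Random walk whose next node is the argmax of a Gumbel-Softmax sample.

    Assumption: every node has at least one neighbor.
    """
    walk = [start]
    for _ in range(length - 1):
        row = adj[walk[-1]]
        with np.errstate(divide="ignore"):
            logits = np.log(row / row.sum())  # -inf for non-neighbors
        soft = gumbel_softmax(logits, temperature, rng)
        walk.append(int(soft.argmax()))
    return walk


# Path graph 0 - 1 - 2 - 3.
adj = np.array(
    [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float
)
rng = np.random.default_rng(0)
walk = sample_walk(adj, start=0, length=6, temperature=0.5, rng=rng)
```

Lower temperatures push the soft sample toward a one-hot vector, while the relaxed vector itself stays differentiable with respect to the logits, which is what makes the distribution attractive inside a trained model.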