CLSep 23, 2023
Diversifying Question Generation over Knowledge Base via External Natural QuestionsShasha Guo, Jing Zhang, Xirui Ke et al.
Previous methods on knowledge base question generation (KBQG) primarily focus on enhancing the quality of a single generated question. Recognizing the remarkable paraphrasing ability of humans, we contend that diverse texts should convey the same semantics through varied expressions. The above insights make diversifying question generation an intriguing task, where the first challenge is evaluation metrics for diversity. Current metrics inadequately assess the above diversity since they calculate the ratio of unique n-grams in the generated question itself, which leans more towards measuring duplication rather than true diversity. Accordingly, we devise a new diversity evaluation metric, which measures the diversity among top-k generated questions for each instance while ensuring their relevance to the ground truth. Clearly, the second challenge is how to enhance diversifying question generation. To address this challenge, we introduce a dual model framework interwoven by two selection strategies to generate diverse questions leveraging external natural questions. The main idea of our dual framework is to extract more diverse expressions and integrate them into the generation model to enhance diversifying question generation. Extensive experiments on widely used benchmarks for KBQG demonstrate that our proposed approach generates highly diverse questions and improves the performance of question answering tasks.
CVOct 13, 2025Code
GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line CreationShasha Guo, Liang Pang, Xi Wang et al.
Auxiliary lines are essential for solving complex geometric problems but remain challenging for large vision-language models (LVLMs). Rather than editing diagrams to draw auxiliary lines, which current image editing models struggle to render with geometric precision, we generate textual descriptions of auxiliary-line constructions to better align with the representational strengths of LVLMs. To bridge the gap between textual descriptions and spatial structure, we propose a reinforcement learning framework that enhances diagram-text alignment. At the core of our approach is a cross-modal reward that evaluates how well the generated auxiliary-line description for an original diagram matches a ground-truth auxiliary-line diagram. Built on this reward, we present GeoVLMath, an open-source LVLM tailored to auxiliary-line reasoning in solid geometry. This fine-grained signal drives a GRPO-based RL stage, yielding precise diagram-text alignment. To support training, we develop a scalable data creation pipeline and construct AuxSolidMath, a dataset of 3,018 real-exam geometry problems with paired diagrams and aligned textual fields. At the 3B and 7B scales, GeoVLMath achieves competitive and often superior performance compared with strong open-source and proprietary LVLMs on auxiliary-line reasoning benchmarks.
CLMar 9, 2025Code
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question AnsweringYanling Wang, Yihan Zhao, Xiaodong Chen et al.
Large vision-language models (LVLMs) have demonstrated remarkable achievements, yet the generation of non-factual responses remains prevalent in fact-seeking question answering (QA). Current multimodal fact-seeking benchmarks primarily focus on comparing model outputs to ground truth answers, providing limited insights into the performance of modality-specific modules. To bridge this gap, we introduce VisualSimpleQA, a multimodal fact-seeking benchmark with two key features. First, it enables streamlined and decoupled evaluation of LVLMs in visual and linguistic modalities. Second, it incorporates well-defined difficulty criteria to guide human annotation and facilitates the extraction of a challenging subset, VisualSimpleQA-hard. Experiments on 15 LVLMs show that even state-of-the-art models such as GPT-4o achieve merely 60%+ correctness in multimodal fact-seeking QA on VisualSimpleQA and 30%+ on VisualSimpleQA-hard. Furthermore, the decoupled evaluation across these models highlights substantial opportunities for improvement in both visual and linguistic modules. The dataset is available at https://huggingface.co/datasets/WYLing/VisualSimpleQA.
CLApr 2, 2024
SGSH: Stimulate Large Language Models with Skeleton Heuristics for Knowledge Base Question GenerationShasha Guo, Lizi Liao, Jing Zhang et al.
Knowledge base question generation (KBQG) aims to generate natural language questions from a set of triplet facts extracted from KB. Existing methods have significantly boosted the performance of KBQG via pre-trained language models (PLMs) thanks to the richly endowed semantic knowledge. With the advance of pre-training techniques, large language models (LLMs) (e.g., GPT-3.5) undoubtedly possess much more semantic knowledge. Therefore, how to effectively organize and exploit the abundant knowledge for KBQG becomes the focus of our study. In this work, we propose SGSH--a simple and effective framework to Stimulate GPT-3.5 with Skeleton Heuristics to enhance KBQG. The framework incorporates "skeleton heuristics", which provides more fine-grained guidance associated with each input to stimulate LLMs to generate optimal questions, encompassing essential elements like the question phrase and the auxiliary verb.More specifically, we devise an automatic data construction strategy leveraging ChatGPT to construct a skeleton training dataset, based on which we employ a soft prompting approach to train a BART model dedicated to generating the skeleton associated with each input. Subsequently, skeleton heuristics are encoded into the prompt to incentivize GPT-3.5 to generate desired questions. Extensive experiments demonstrate that SGSH derives the new state-of-the-art performance on the KBQG tasks.
CLFeb 28, 2024
A Survey on Neural Question Generation: Methods, Applications, and ProspectsShasha Guo, Lizi Liao, Cuiping Li et al.
In this survey, we present a detailed examination of the advancements in Neural Question Generation (NQG), a field leveraging neural network techniques to generate relevant questions from diverse inputs like knowledge bases, texts, and images. The survey begins with an overview of NQG's background, encompassing the task's problem formulation, prevalent benchmark datasets, established evaluation metrics, and notable applications. It then methodically classifies NQG approaches into three predominant categories: structured NQG, which utilizes organized data sources, unstructured NQG, focusing on more loosely structured inputs like texts or visual content, and hybrid NQG, drawing on diverse input modalities. This classification is followed by an in-depth analysis of the distinct neural network models tailored for each category, discussing their inherent strengths and potential limitations. The survey culminates with a forward-looking perspective on the trajectory of NQG, identifying emergent research trends and prospective developmental paths. Accompanying this survey is a curated collection of related research papers, datasets and codes, systematically organized on Github, providing an extensive reference for those delving into NQG.
CVJun 13, 2024
A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event CamerasChenyang Shi, Shasha Guo, Boyi Wei et al.
Event cameras are renowned for their high efficiency due to outputting a sparse, asynchronous stream of events. However, they are plagued by noisy events, especially in low light conditions. Denoising is an essential task for event cameras, but evaluating denoising performance is challenging. Label-dependent denoising metrics involve artificially adding noise to clean sequences, complicating evaluations. Moreover, the majority of these metrics are monotonic, which can inflate scores by removing substantial noise and valid events. To overcome these limitations, we propose the first label-free and non-monotonic evaluation metric, the area of the continuous contrast curve (AOCC), which utilizes the area enclosed by event frame contrast curves across different time intervals. This metric is inspired by how events capture the edge contours of scenes or objects with high temporal resolution. An effective denoising method removes noise without eliminating these edge-contour events, thus preserving the contrast of event frames. Consequently, contrast across various time ranges serves as a metric to assess denoising effectiveness. As the time interval lengthens, the curve will initially rise and then fall. The proposed metric is validated through both theoretical and experimental evidence.
CVJun 2, 2020
SeqXFilter: A Memory-efficient Denoising Filter for Dynamic Vision SensorsShasha Guo, Lei Wang, Xiaofan Chen et al.
Neuromorphic event-based dynamic vision sensors (DVS) have much faster sampling rates and a higher dynamic range than frame-based imaging sensors. However, they are sensitive to background activity (BA) events that are unwanted. There are some filters for tackling this problem based on spatio-temporal correlation. However, they are either memory-intensive or computing-intensive. We propose \emph{SeqXFilter}, a spatio-temporal correlation filter with only a past event window that has an O(1) space complexity and has simple computations. We explore the spatial correlation of an event with its past few events by analyzing the distribution of the events when applying different functions on the spatial distances. We find the best function to check the spatio-temporal correlation for an event for \emph{SeqXFilter}, best separating real events and noise events. We not only give the visual denoising effect of the filter but also use two metrics for quantitatively analyzing the filter's performance. Four neuromorphic event-based datasets, recorded from four DVS with different output sizes, are used for validation of our method. The experimental results show that \emph{SeqXFilter} achieves similar performance as baseline NNb filters, but with extremely small memory cost and simple computation logic.
CVApr 6, 2020
Exploration of Input Patterns for Enhancing the Performance of Liquid State MachinesShasha Guo, Lianhua Qu, Lei Wang et al.
Spiking Neural Networks (SNN) have gained increasing attention for its low power consumption. But training SNN is challenging. Liquid State Machine (LSM), as a major type of Reservoir computing, has been widely recognized for its low training cost among SNNs. The exploration of LSM topology for enhancing performance often requires hyper-parameter search, which is both resource-expensive and time-consuming. We explore the influence of input scale reduction on LSM instead. There are two main reasons for studying input reduction of LSM. One is that the input dimension of large images requires efficient processing. Another one is that input exploration is generally more economic than architecture search. To mitigate the difficulty in effectively dealing with huge input spaces of LSM, and to find that whether input reduction can enhance LSM performance, we explore several input patterns, namely fullscale, scanline, chessboard, and patch. Several datasets have been used to evaluate the performance of the proposed input patterns, including two spatio image datasets and one spatio-temporal image database. The experimental results show that the reduced input under chessboard pattern improves the accuracy by up to 5%, and reduces execution time by up to 50% with up to 75\% less input storage than the fullscale input pattern for LSM.