Bin Wang

h-index: 18

3 papers

532 citations

Novelty: 40%

AI Score: 29

Ranked #143,211 of 194,257 authors (top 74%); #25,481 in CL (top 83%)

3 Papers

45.4 | CV | Nov 29, 2023 | Code

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang et al.

Hallucination, a pervasive challenge for multi-modal large language models (MLLMs), has significantly impeded their real-world usage where precise judgment is required. Existing methods mitigate this issue either by training on specifically designed data or by performing inference with external knowledge from other sources, both of which incur additional costs. In this paper, we present OPERA, a novel MLLM decoding method grounded in an Over-trust Penalty and a Retrospection-Allocation strategy, serving as a nearly free lunch to alleviate the hallucination issue without additional data, knowledge, or training. Our approach begins with the observation that most hallucinations are closely tied to the knowledge aggregation patterns manifested in the self-attention matrix, i.e., MLLMs tend to generate new tokens by focusing on a few summary tokens rather than on all the previous tokens. This partial over-trust leads the model to neglect image tokens and describe the image content with hallucinations. Based on this observation, OPERA introduces a penalty term on the model logits during beam-search decoding to mitigate the over-trust issue, along with a rollback strategy that retrospects the presence of summary tokens in the previously generated tokens and re-allocates the token selection if necessary. In extensive experiments, OPERA shows significant hallucination-mitigating performance on different MLLMs and metrics, demonstrating its effectiveness and generality. Our code is available at: https://github.com/shikiw/OPERA.

2.9 | CL | Jun 18, 2023

UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning

Kang Zhao, Wei Liu, Jian Luan et al.

Open-domain long-term memory conversation can establish long-term intimacy with humans, and the key is the ability to understand and memorize long-term dialogue history information. Existing works integrate multiple models for modelling through a pipeline, which ignores the coupling between different stages. In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC), which increases the connection between different stages by learning relevance representation. Specifically, we decompose the main task into three subtasks based on probability graphs: 1) conversation summarization, 2) memory retrieval, 3) memory-augmented generation. Each subtask involves learning a representation for calculating the relevance between the query and memory, which is modelled by inserting a special token at the beginning of the decoder input. The relevance representation learning strengthens the connection across subtasks through parameter sharing and joint training. Extensive experimental results show that the proposed method consistently improves over strong baselines and yields better dialogue consistency and engagingness.

3.3 | SP | Dec 7, 2018

Synthetic Dynamic PMU Data Generation: A Generative Adversarial Network Approach

Xiangtian Zheng, Bin Wang, Le Xie

This paper concerns the production of synthetic phasor measurement unit (PMU) data for research and education purposes. Because real PMU data are confidential and real power system infrastructure information is not publicly accessible, the lack of credible realistic data has become a growing concern. Instead of constructing synthetic power grids and then producing synthetic PMU measurement data by time simulations, we propose a model-free approach to directly generate synthetic PMU data. We train a generative adversarial network (GAN) with real PMU data, which can then be used to generate synthetic PMU data capturing the system's dynamic behaviors. To validate the use of sequential generation by the GAN to mimic PMU data, we theoretically analyze the GAN's capacity for learning system dynamics. Further, by evaluating the synthetic PMU data with a proposed quantitative method, we verify the GAN's potential to synthesize realistic samples, while recognizing that the GAN model in this paper still has room for improvement. Moreover, this is the first time that such a generative model has been applied to synthesize PMU data.