Wang Xia

5papers

65citations

Novelty33%

AI Score25

Ranked #170,399 of 205,806 authors (top 83%)#51,581 in CV (top 88%)

5 Papers

CVAug 23, 2024Code

Image Segmentation in Foundation Model Era: A Survey

Tianfei Zhou, Wang Xia, Fei Zhang et al.

Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary segmentation methodologies have embarked on a new epoch by either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) for image segmentation or developing dedicated segmentation foundation models (e.g., SAM). These approaches not only deliver superior segmentation performance, but also herald newfound segmentation capabilities previously unseen in deep learning context. However, current research in image segmentation lacks a detailed analysis of distinct characteristics, challenges, and solutions associated with these advancements. This survey seeks to fill this gap by providing a thorough review of cutting-edge research centered around FM-driven image segmentation. We investigate two basic lines of research -- generic image segmentation (i.e., semantic segmentation, instance segmentation, panoptic segmentation), and promptable image segmentation (i.e., interactive segmentation, referring segmentation, few-shot segmentation) -- by delineating their respective task settings, background concepts, and key challenges. Furthermore, we provide insights into the emergence of segmentation knowledge from FMs like CLIP, Stable Diffusion, and DINO. An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current research efforts. Subsequently, we engage in a discussion of open issues and potential avenues for future research. We envisage that this fresh, comprehensive, and systematic survey catalyzes the evolution of advanced image segmentation systems. A public website is created to continuously track developments in this fast advancing field: \url{https://github.com/stanley-313/ImageSegFM-Survey}.

CVJun 11, 2022

VAC2: Visual Analysis of Combined Causality in Event Sequences

Sujia Zhu, Yue Shen, Zihao Zhu et al.

Identifying causality behind complex systems plays a significant role in different domains, such as decision making, policy implementations, and management recommendations. However, existing causality studies on temporal event sequences data mainly focus on individual causal discovery, which is incapable of exploiting combined causality. To fill the absence of combined causes discovery on temporal event sequence data,eliminating and recruiting principles are defined to balance the effectiveness and controllability on cause combinations. We also leverage the Granger causality algorithm based on the reactive point processes to describe impelling or inhibiting behavior patterns among entities. In addition, we design an informative and aesthetic visual metaphor of "electrocircuit" to encode aggregated causality for ensuring our causality visualization is non-overlapping and non-intersecting. Diverse sorting strategies and aggregation layout are also embedded into our parallel-based, directed and weighted hypergraph for illustrating combined causality. Our developed combined causality visual analysis system can help users effectively explore combined causes as well as an individual cause. This interactive system supports multi-level causality exploration with diverse ordering strategies and a focus and context technique to help users obtain different levels of information abstraction. The usefulness and effectiveness of the system are further evaluated by conducting a pilot user study and two case studies on event sequence data.

AIAug 10, 2023

C5: Towards Better Conversation Comprehension and Contextual Continuity for ChatGPT

Pan Liang, Danwei Ye, Zihao Zhu et al.

Large language models (LLMs), such as ChatGPT, have demonstrated outstanding performance in various fields, particularly in natural language understanding and generation tasks. In complex application scenarios, users tend to engage in multi-turn conversations with ChatGPT to keep contextual information and obtain comprehensive responses. However, human forgetting and model contextual forgetting remain prominent issues in multi-turn conversation scenarios, which challenge the users' conversation comprehension and contextual continuity for ChatGPT. To address these challenges, we propose an interactive conversation visualization system called C5, which includes Global View, Topic View, and Context-associated Q\&A View. The Global View uses the GitLog diagram metaphor to represent the conversation structure, presenting the trend of conversation evolution and supporting the exploration of locally salient features. The Topic View is designed to display all the question and answer nodes and their relationships within a topic using the structure of a knowledge graph, thereby display the relevance and evolution of conversations. The Context-associated Q\&A View consists of three linked views, which allow users to explore individual conversations deeply while providing specific contextual information when posing questions. The usefulness and effectiveness of C5 were evaluated through a case study and a user study.

HCMay 26, 2022

DGSVis: Visual Analysis of Hierarchical Snapshots in Dynamic Graph

Baofeng Chang, Sujia Zhu, Qi Jiang et al.

Dynamic graph visualization attracts researchers' concentration as it represents time-varying relationships between entities in multiple domains (e.g., social media analysis, academic cooperation analysis, team sports analysis). Integrating visual analytic methods is consequential in presenting, comparing, and reviewing dynamic graphs. Even though dynamic graph visualization is developed for many years, how to effectively visualize large-scale and time-intensive dynamic graph data with subtle changes is still challenging for researchers. To provide an effective analysis method for this type of dynamic graph data, we propose a snapshot generation algorithm involving Human-In-Loop to help users divide the dynamic graphs into multi-granularity and hierarchical snapshots for further analysis. In addition, we design a visual analysis prototype system (DGSVis) to assist users in accessing the dynamic graph insights effectively. DGSVis integrates a graphical operation interface to help users generate snapshots visually and interactively. It is equipped with the overview and details for visualizing hierarchical snapshots of the dynamic graph data. To illustrate the usability and efficiency of our proposed methods for this type of dynamic graph data, we introduce two case studies based on basketball player networks in a competition. In addition, we conduct an evaluation and receive exciting feedback from experienced visualization experts.

CVJun 18, 2024

LFMamba: Light Field Image Super-Resolution with State Space Model

Wang xia, Yao Lu, Shunzhou Wang et al.

Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative in various vision tasks compared to traditional CNN- and Transformer-based approaches, benefiting from its effective long-range sequence modeling capability and linear-time complexity. Therefore, integrating S6 into LFSR becomes compelling, especially considering the vast data volume of 4D light fields. However, the primary challenge lies in \emph{designing an appropriate scanning method for 4D light fields that effectively models light field features}. To tackle this, we employ SSMs on the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To achieve this, we carefully devise a basic SSM block characterized by an efficient SS2D mechanism that facilitates more effective and efficient feature learning on these 2D slices. Based on the above two designs, we further introduce an SSM-based network for LFSR termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies are conducted to validate the efficacy and generalization ability of our proposed method. We expect that our LFMamba shed light on effective representation learning of LFs with state space models.