CV · Jan 17, 2023 · Code
Face Inverse Rendering via Hierarchical Decoupling
Meng Wang, Xiaojie Guo, Wenjing Dai et al.
Previous face inverse rendering methods often require synthetic data with ground truth and/or professional equipment such as a light stage. However, a model trained on synthetic data or relying on pre-defined lighting priors typically fails to generalize well to real-world situations, owing to the gap between synthetic data/lighting priors and real data. Furthermore, for common users, the required professional equipment and expertise make the task expensive and complex. In this paper, we propose a deep learning framework that disentangles face images in the wild into their albedo, normal, and lighting components. Specifically, a decomposition network is built with a hierarchical subdivision strategy and takes as input image pairs captured from arbitrary viewpoints. In this way, our approach greatly eases the burden of data preparation and significantly broadens the applicability of face inverse rendering. Extensive experiments demonstrate the efficacy of our design and its superior face relighting performance compared with other state-of-the-art alternatives. Our code is available at https://github.com/AutoHDR/HD-Net.git.
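Decomposing a face into albedo, normal, and lighting only makes sense relative to an image-formation model that puts them back together. As a minimal sketch, assuming a Lambertian surface lit by a single directional light (the abstract does not state the paper's actual lighting model, and the function names are hypothetical), one pixel can be re-rendered from the three components, and relighting amounts to swapping the light while keeping albedo and normal fixed:

```python
import math


def normalize(v):
    """Return the 3D vector v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def render_pixel(albedo, normal, light_dir, light_rgb=(1.0, 1.0, 1.0)):
    """Lambertian shading per RGB channel: albedo * light color * max(0, n . l).

    albedo, light_rgb: RGB triples in [0, 1]; normal, light_dir: 3D vectors.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    shading = max(0.0, sum(a * b for a, b in zip(n, l)))  # back-facing light gives 0
    return tuple(a * c * shading for a, c in zip(albedo, light_rgb))


# Relighting: same albedo and normal, two different light directions.
frontal = render_pixel((0.8, 0.6, 0.5), (0, 0, 1), (0, 0, 1))
grazing = render_pixel((0.8, 0.6, 0.5), (0, 0, 1), (1, 0, 1))
```

Under the frontal light the pixel shows its full albedo; at 45 degrees every channel is scaled by cos 45°.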
SE · Apr 15 · Code
Figma2Code: Automating Multimodal Design to Code in the Wild
Yi Gui, Jiawan Zhang, Yina Wang et al.
Front-end development constitutes a substantial portion of software engineering, yet converting design mockups into production-ready User Interface (UI) code remains tedious and costly. While recent work has explored automating this process with Multimodal Large Language Models (MLLMs), existing approaches typically rely solely on design images. As a result, they must infer complex UI details from images alone, often producing degraded results. In real-world development workflows, however, design mockups are usually delivered as files from Figma, a widely used front-end design tool, and these files embed rich multimodal information (e.g., metadata and assets) essential for generating high-quality UI. To bridge this gap, we introduce Figma2Code, a new task that extends design-to-code to a multimodal setting and aims to automate design-to-code in the wild. Specifically, we collect paired design images and their corresponding metadata files from the Figma community. We then apply a series of processing operations, including rule-based filtering, human- and MLLM-based annotation and screening, and metadata refinement. This process yields 3,055 samples, from which designers curate a balanced dataset of 213 high-quality cases. Using this dataset, we benchmark ten state-of-the-art open-source and proprietary MLLMs. Our results show that while proprietary models achieve superior visual fidelity, they remain limited in layout responsiveness and code maintainability. Further cross-modality experiments and ablation studies corroborate these limitations and suggest that they stem partly from the models' tendency to directly map primitive visual attributes from Figma metadata.
CV · Aug 6, 2022
Deep Uncalibrated Photometric Stereo via Inter-Intra Image Feature Fusion
Fangzhou Gao, Meng Wang, Lianghao Zhang et al.
Uncalibrated photometric stereo aims to estimate detailed surface normals from images captured under varying and unknown lighting. Recently, deep learning has brought powerful data priors to this underdetermined problem. This paper presents a new method for deep uncalibrated photometric stereo that efficiently uses an inter-image representation to guide normal estimation. Previous methods handle multiple inputs with either optimization-based neural inverse rendering or a single size-independent pooling layer, and both make inefficient use of the information shared among input images. Given multiple images under different lighting, we consider intra-image and inter-image variations to be highly correlated. Motivated by this correlation, we design an inter-intra image feature fusion module that introduces the inter-image representation into per-image feature extraction. This extra representation guides per-image feature extraction and eliminates ambiguity in normal estimation. We demonstrate the effect of our design on a wide range of samples, especially dark materials. Our method produces significantly better results than state-of-the-art methods on both synthetic and real data.
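One plausible reading of "introducing the inter-image representation into per-image feature extraction" is: pool features across all input images into one shared vector, then append it to each image's own features before the next layer. The sketch below assumes element-wise max pooling as the aggregator and plain lists as feature vectors; it illustrates the idea and is not the paper's module.

```python
from typing import List

Feature = List[float]


def inter_image_pool(features: List[Feature]) -> Feature:
    """Element-wise max over images: independent of image order and count."""
    return [max(values) for values in zip(*features)]


def fuse_inter_intra(features: List[Feature]) -> List[Feature]:
    """Append the shared inter-image vector to every per-image (intra) feature."""
    shared = inter_image_pool(features)
    return [f + shared for f in features]


per_image = [[0.1, 0.9], [0.7, 0.2], [0.4, 0.5]]  # three images, 2-D features each
fused = fuse_inter_intra(per_image)  # each entry now has 4 values
```

Because the max ignores ordering, shuffling the input images changes which row is which but not the shared half of each fused vector, a property any aggregator over an unordered set of lit images needs.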
CV · Jan 5 · Code
Face Normal Estimation from Rags to Riches
Meng Wang, Wenjing Dai, Jiawan Zhang et al.
Although recent approaches to face normal estimation have achieved promising results, their effectiveness depends heavily on large-scale paired training data. This paper focuses on relaxing this requirement by developing a coarse-to-fine normal estimator. Concretely, our method first trains a simple model on a small dataset to produce coarse face normals that serve as guidance (called exemplars) for the subsequent refinement. A self-attention mechanism is employed to capture long-range dependencies, thus remedying the severe local artifacts left in the estimated coarse facial normals. Then, a refinement network is tailored to map input face images, together with their corresponding exemplars, to fine-grained, high-quality facial normals. This functional split significantly reduces the need for massive paired data and computational resources. Extensive experiments and ablation studies demonstrate the efficacy of our design and its superiority over state-of-the-art methods in terms of both training cost and estimation quality. Our code and models are open-sourced at https://github.com/AutoHDR/FNR2R.git.
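Self-attention captures long-range dependencies because every output position is a weighted mix of all input positions, not just a local neighborhood. A minimal, dependency-free sketch of standard scaled dot-product self-attention follows; the abstract does not specify the exact variant used, and the projection matrices `wq`, `wk`, `wv` are illustrative inputs rather than learned weights.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the row max before exponentiating)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V over N positions (the rows of x).

    Each output row is a convex combination of all rows of V, so distant
    positions can influence one another regardless of spatial distance.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)
```

In a normal estimator, the rows of `x` would be flattened spatial positions of a feature map; that mapping is an assumption about how such a layer is typically used.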
CV · Dec 18, 2025
Adaptive Frequency Domain Alignment Network for Medical image segmentation
Zhanwei Li, Liang Li, Jiawan Zhang
High-quality annotated data plays a crucial role in achieving accurate segmentation. However, such data are often scarce for medical image segmentation because manual annotation is time-consuming and labor-intensive. To address this challenge, we propose the Adaptive Frequency Domain Alignment Network (AFDAN), a novel domain adaptation framework designed to align features in the frequency domain and alleviate data scarcity. AFDAN integrates three core components to enable robust cross-domain knowledge transfer: an Adversarial Domain Learning Module that transfers features from the source to the target domain; a Source-Target Frequency Fusion Module that blends frequency representations across domains; and a Spatial-Frequency Integration Module that combines frequency and spatial features to further enhance segmentation accuracy across domains. Extensive experiments demonstrate the effectiveness of AFDAN: it achieves an Intersection over Union (IoU) of 90.9% for vitiligo segmentation on the newly constructed VITILIGO2025 dataset and a competitive IoU of 82.6% on the retinal vessel segmentation benchmark DRIVE, surpassing existing state-of-the-art approaches.
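The reported scores are Intersection over Union values: the overlap between predicted and ground-truth foreground divided by their combined area. A minimal sketch for one pair of binary masks follows; how the paper aggregates IoU over a dataset (per-image mean or pooled counts) is not stated in the abstract, and returning 1.0 for two empty masks is an assumed convention.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """IoU = |pred AND target| / |pred OR target| for binary masks of equal shape."""
    inter = union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter / union if union else 1.0  # assumed: two empty masks agree perfectly


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = binary_iou(pred, target)  # 2 overlapping pixels / 4 foreground pixels in total
```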
CL · Sep 24, 2025
Instruction Boundary: Quantifying Biases in LLM Reasoning under Various Coverage
Zipeng Ling, Yuehao Tang, Chen Huang et al.
Automatically generated datasets are increasingly used in LLM reasoning tasks; however, large-scale corpora often contain inherent flaws. For example, a single-choice question may have no correct option or several, while true-or-false questions may involve vague or unverifiable statements. We refer to these exceptional answer forms as sparse labels. To compare LLMs' ability to recognize various question forms and produce correct answers, we investigate how different instruction formats can either facilitate or mislead LLM reasoning. We introduce the concept of Instruction Boundary, which systematically analyzes how different levels of prompt coverage (sufficient, redundant, or insufficient) can lead to reasoning biases and performance changes in LLMs. To examine this phenomenon, we design eight experimental settings across five dataset forms. We further propose BiasDetector, a unified framework that quantifies LLMs' ability to identify sparse labels under different Instruction Boundary conditions. Evaluations on five mainstream LLMs show that, despite seemingly high accuracy, substantial reasoning biases persist in many downstream tasks as a direct consequence of prompt coverage. We analyze the impact of these biases and outline possible mitigation strategies. Our findings highlight not only the importance of addressing sparse labels but also the need for developers to recognize and mitigate the risks introduced by Instruction Boundary.
HC · Aug 15, 2021
Towards Visual Explainable Active Learning for Zero-Shot Classification
Shichao Jia, Zeyu Li, Nuo Chen et al.
Zero-shot classification is a promising paradigm for problems in which the training classes and test classes are disjoint. Achieving it usually requires experts to externalize their domain knowledge by manually specifying a class-attribute matrix that defines which classes have which attributes. Designing a suitable class-attribute matrix is key to the subsequent procedure, but the design process is tedious and relies on unguided trial and error. This paper proposes a visual explainable active learning approach, implemented as a system called semantic navigator, to solve these problems. The approach promotes human-AI teaming with four actions (ask, explain, recommend, respond) in each interaction loop. The machine asks contrastive questions to guide humans in thinking about attributes. A novel visualization called semantic map explains the current status of the machine, so analysts can better understand why the machine misclassifies objects. Moreover, the machine recommends class labels for each attribute to ease the labeling burden. Finally, humans can steer the model by modifying the labels interactively, and the machine adjusts its recommendations accordingly. Compared with a method without guidance, the visual explainable active learning approach improves the efficiency with which humans interactively build zero-shot classification models. We justify our results with user studies on standard zero-shot classification benchmarks.
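To make the role of the class-attribute matrix concrete: in a classic attribute-based zero-shot setup (a generic scheme assumed here, not necessarily the one semantic navigator uses), a model predicts per-attribute probabilities for an image, and an unseen class is chosen by how well its row of the matrix explains those predictions. The classes, attributes, and probabilities below are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical class-attribute matrix: 1 = class has the attribute, 0 = it does not.
# Attribute order: [stripes, hooves, lives_in_water]
CLASS_ATTRIBUTES: Dict[str, List[int]] = {
    "zebra": [1, 1, 0],
    "horse": [0, 1, 0],
    "dolphin": [0, 0, 1],
}


def zero_shot_predict(attr_probs: List[float], matrix: Dict[str, List[int]]) -> str:
    """Pick the class whose attribute signature is most likely under attr_probs.

    Score = sum of log p for attributes the class has and log(1 - p) for those
    it lacks, i.e. attributes are assumed independent.
    """
    eps = 1e-9  # guards against log(0)

    def score(signature: List[int]) -> float:
        return sum(
            math.log(max(p, eps)) if has else math.log(max(1.0 - p, eps))
            for p, has in zip(attr_probs, signature)
        )

    return max(matrix, key=lambda name: score(matrix[name]))


print(zero_shot_predict([0.9, 0.8, 0.1], CLASS_ATTRIBUTES))  # -> zebra
```

Flipping a single cell (say, marking horses as striped) would make "zebra" and "horse" indistinguishable here, which illustrates why the quality of the matrix design drives accuracy.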
HC · Apr 24, 2021
Regshock: Interactive Visual Analytics of Systemic Risk in Financial Networks
Zhibin Niu, Junqi Wu, Dawei Cheng et al.
Financial regulatory agencies are struggling to manage the systemic risks attributed to negative economic shocks. Preventive interventions are essential for eliminating these risks and building a more resilient financial system. Although tremendous efforts have been made to measure multi-risk severity levels, understand contagion behaviors, and solve other risk management problems, there is still no theoretical framework revealing which regulatory intervention measures can mitigate systemic risk, and how. Here we demonstrate regshock, a practical visual analytics approach that supports the exploration and evaluation of financial regulation measures. We propose risk-island, a novel risk-centered visualization algorithm that helps uncover risk patterns while preserving the topology of financial networks. We further propose regshock, a visual exploration and assessment approach based on a simulation-intervention-evaluation analysis loop, which provides a heuristic, surgical intervention capability for systemic risk mitigation. We evaluate our approach through extensive case studies and expert reviews. To our knowledge, this is the first practical, systematic method for the financial network intervention and risk mitigation problem; our validated approach can potentially improve the risk management and control capabilities of financial experts.
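The simulation-intervention-evaluation loop can be illustrated with a generic threshold-contagion model, assumed here for illustration only since the abstract does not specify regshock's simulation model: an institution fails once the share of its failed counterparties reaches a threshold, an intervention protects chosen institutions from failing, and evaluation compares the number of failures with and without it. The network, names, and threshold are hypothetical.

```python
from collections import deque
from typing import AbstractSet, Dict, List, Set


def simulate_cascade(
    neighbors: Dict[str, List[str]],
    initial_shock: AbstractSet[str],
    threshold: float,
    protected: AbstractSet[str] = frozenset(),
) -> Set[str]:
    """Return the set of failed nodes once contagion stops.

    A node fails when the fraction of its neighbors that have failed is >= threshold.
    Protected nodes (the intervention) never fail. Adjacency is assumed undirected.
    """
    failed = set(initial_shock) - set(protected)
    queue = deque(failed)
    while queue:
        node = queue.popleft()
        # Only neighbors of a newly failed node can have crossed the threshold.
        for nb in neighbors.get(node, []):
            if nb in failed or nb in protected:
                continue
            links = neighbors.get(nb, [])
            if not links:
                continue
            share = sum(1 for x in links if x in failed) / len(links)
            if share >= threshold:
                failed.add(nb)
                queue.append(nb)
    return failed


# Hypothetical interbank network (each link listed on both sides).
network = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E"],
    "E": ["D"],
}
baseline = simulate_cascade(network, {"A"}, threshold=0.5)                  # simulation
rescued = simulate_cascade(network, {"A"}, threshold=0.5, protected={"B"})  # intervention
print(len(baseline), len(rescued))  # evaluation -> 5 2
```

Protecting the single hub `B` stops the shock at `C`, the kind of targeted ("surgical") effect an analyst would look for when comparing interventions.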
CY · Jul 7, 2020
regvis.net -- A Visual Bibliography of Regulatory Visualization
Zhibin Niu, Runlin Li, Junqi Wu et al.
Information visualization and visual analytics technologies have attracted significant attention from the financial regulation community. In this research, we present regvis.net, a visual survey of regulatory visualization that allows researchers from both the computing and financial communities to review literature of interest. We have collected and manually tagged more than 80 publications related to regulatory visualization. To the best of our knowledge, this is the first publication set tailored to regulatory visualization. We provide a webpage (http://regvis.net) for interactive search and filtering. Each publication is represented by a thumbnail of its representative system interface or key visualization chart, and users can explore the collection with multi-condition screening and fixed text searches.
HC · Jun 16, 2020
E³: Visual Exploration of Spatiotemporal Energy Demand
Junqi Wu, Zhibin Niu, Jing Wu et al.
Understanding demand-side energy behaviour is critical for making efficient responses in energy demand management. We worked closely with energy experts and identified the key elements of the energy demand problem, including temporal and spatial demand and shifts in spatiotemporal demand. To our knowledge, no previous research has investigated shifts in spatiotemporal demand. To fill this research gap, we propose a unified visual analytics approach to support exploratory demand analysis: we developed E3, a highly interactive tool that supports users in making and verifying hypotheses through human-client-server interactions. A novel potential-flow-based approach was formalized to model shifts in energy demand and integrated into a server-side engine. Experts then evaluated and affirmed the usefulness of this approach through case studies of real-world electricity data. In the future, we will improve the modelling algorithm, enhance the visualisation, and extend the process to support more forms of energy data.
CV · Jul 10, 2019
Dunhuang Grottoes Painting Dataset and Benchmark
Tianxiu Yu, Shijie Zhang, Cong Lin et al.
This document introduces the background and usage of the Dunhuang Grottoes Painting Dataset and benchmark. It first describes the background of the Dunhuang Grottoes, which are widely recognised as a priceless heritage site. Since digital methods are the modern trend in heritage protection and restoration, we follow this trend and release the first public dataset for Dunhuang Grotto painting restoration. The rest of the document details how the painting data were generated. To support data-driven approaches, the dataset provides a large number of training and testing examples, sufficient for deep learning. The detailed usage of the dataset and the benchmark is also described.
CV · May 24, 2019
OVSNet: Towards One-Pass Real-Time Video Object Segmentation
Peng Sun, Peiwen Lin, Guangliang Cheng et al.
Video object segmentation aims to accurately segment target object regions across consecutive frames. The task is technically challenging because of complicated factors (e.g., shape deformation, occlusion, and objects moving out of view). Recent approaches have largely addressed these factors using back-and-forth re-identification and bi-directional mask propagation. However, these methods are extremely slow and support only offline inference, so in principle they cannot be applied in real time. Motivated by this observation, we propose an efficient detection-based paradigm for video object segmentation. We propose a unified One-Pass Video Segmentation framework (OVS-Net) that models spatial-temporal representations in a single pipeline, seamlessly integrating object detection, object segmentation, and object re-identification. The proposed framework lends itself to one-pass inference that performs video object segmentation both effectively and efficiently. Moreover, we propose a mask-guided attention module for modeling multi-scale object boundaries and multi-level feature fusion. Experiments on the challenging DAVIS 2017 benchmark demonstrate the effectiveness of the proposed framework, which achieves performance comparable to the state of the art at about 11.5 FPS, more than 5 times faster than other state-of-the-art methods; to our knowledge, this makes it a pioneering step toward real-time video object segmentation.
CV · May 4, 2019
Kindling the Darkness: A Practical Low-light Image Enhancer
Yonghua Zhang, Jiawan Zhang, Xiaojie Guo
Images captured under low-light conditions often suffer from (partially) poor visibility. Besides unsatisfactory lighting, multiple types of degradation, such as noise and color distortion caused by the limited quality of cameras, hide in the dark. In other words, solely turning up the brightness of dark regions will inevitably amplify hidden artifacts. This work builds a simple yet effective network for Kindling the Darkness (denoted as KinD), which, inspired by Retinex theory, decomposes images into two components. One component (illumination) is responsible for light adjustment, while the other (reflectance) is responsible for degradation removal. In this way, the original space is decoupled into two smaller subspaces that are expected to be easier to regularize and learn. It is worth noting that our network is trained with paired images shot under different exposure conditions, without any ground-truth reflectance or illumination information. Extensive experiments demonstrate the efficacy of our design and its superiority over state-of-the-art alternatives. KinD is robust against severe visual defects and lets users adjust light levels arbitrarily. In addition, our model takes less than 50 ms to process an image at VGA resolution on a 2080Ti GPU. All of these merits make KinD attractive for practical use.
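The Retinex model behind KinD treats an image as the element-wise product of reflectance and illumination, I = R ∘ L. The sketch below shows why the split helps: brightening touches only L, so R (where noise and color distortion would be removed) is left unchanged. KinD learns both the decomposition and the adjustment; the given illumination estimate and the fixed gamma curve here are hand-crafted stand-ins, and all names are hypothetical.

```python
from typing import List


def decompose(image: List[float], illumination: List[float]) -> List[float]:
    """Recover reflectance R = I / L from an illumination estimate (Retinex: I = R * L)."""
    eps = 1e-6  # avoid division by zero in completely dark pixels
    return [i / max(l, eps) for i, l in zip(image, illumination)]


def adjust_illumination(illumination: List[float], gamma: float) -> List[float]:
    """Brighten with a gamma curve; gamma < 1 raises every value in (0, 1)."""
    return [l ** gamma for l in illumination]


def recompose(reflectance: List[float], illumination: List[float]) -> List[float]:
    """Rebuild the image as I' = R * L', clipped to 1.0."""
    return [min(1.0, r * l) for r, l in zip(reflectance, illumination)]


dark = [0.02, 0.05, 0.08]      # grayscale low-light pixels
light = [0.1, 0.1, 0.2]        # hypothetical illumination estimate
refl = decompose(dark, light)  # approximately [0.2, 0.5, 0.4]
bright = recompose(refl, adjust_illumination(light, 0.3))
```

Dividing the brightened image by the new illumination returns the same reflectance, which is the property that lets light adjustment and degradation removal be handled separately.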
CV · Mar 20, 2019
Single Image Deraining: A Comprehensive Benchmark Analysis
Siyuan Li, Iago Breno Araujo, Wenqi Ren et al.
We present a comprehensive study and evaluation of existing single image deraining algorithms, using a new large-scale benchmark consisting of both synthetic and real-world rainy images. This dataset features diverse data sources and image contents, and is divided into three subsets (rain streak, rain drop, and rain and mist), each serving different training or evaluation purposes. We further provide a rich variety of criteria for evaluating deraining algorithms, ranging from full-reference and no-reference metrics to subjective evaluation and a novel task-driven evaluation. Experiments on the dataset shed light on the comparisons and limitations of state-of-the-art deraining algorithms and suggest promising future directions.
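Full-reference metrics compare a derained output with its clean ground truth; PSNR is the most common of them. A minimal sketch on flattened grayscale pixels in [0, 1] follows; the benchmark's exact settings (color space, bit depth, border cropping) are not given in the abstract.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


clean = [0.2, 0.4, 0.6, 0.8]
derained = [0.25, 0.35, 0.65, 0.75]  # every pixel off by 0.05, so MSE = 0.0025
print(round(psnr(clean, derained), 2))  # -> 26.02
```

No-reference metrics, by contrast, score `derained` without access to `clean`, which is what makes them usable on real rainy photos that lack ground truth.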
CV · Feb 28, 2019
PFLD: A Practical Facial Landmark Detector
Xiaojie Guo, Siyuan Li, Jinke Yu et al.
Being accurate, efficient, and compact is essential for a facial landmark detector in practical use. To address these three concerns simultaneously, this paper investigates a neat model with promising detection accuracy in the wild (e.g., under unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device. More concretely, we customize an end-to-end single-stage network with acceleration techniques. During training, rotation information is estimated for each sample to geometrically regularize landmark localization; it is not involved at test time. A novel loss is designed that, besides accounting for the geometric regularization, mitigates data imbalance by adjusting the weights of samples in different states, such as large pose, extreme lighting, and occlusion, in the training set. Extensive experiments demonstrate the efficacy of our design and its superior performance over state-of-the-art alternatives on widely adopted challenging benchmarks, i.e., 300W (including iBUG, LFPW, AFW, HELEN, and XM2VTS) and AFLW. Our model can be as small as 2.1Mb and run at over 140 fps per face on a mobile phone (Qualcomm ARM 845 processor) with high precision, making it attractive for large-scale or real-time applications. We have made our practical system based on the PFLD 0.25X model publicly available at http://sites.google.com/view/xjguo/fld to encourage comparisons and improvements from the community.
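The loss described above scales each sample's landmark error by a weight tied to its rotation and to how under-represented its condition is. A sketch of that general form follows, assuming the weight is (sum of category weights) × (sum over yaw, pitch, and roll of 1 − cos θ), where θ is the rotation estimation error, applied to the squared landmark error. The category names, weight values, and the weight of 1 for ordinary samples are hypothetical, and this is not claimed to be the exact published loss.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical per-category weights: rarer training conditions get larger weights.
CATEGORY_WEIGHTS: Dict[str, float] = {"large_pose": 2.0, "extreme_lighting": 1.5, "occlusion": 1.5}


def sample_loss(
    pred: Sequence[Point],
    target: Sequence[Point],
    angle_errors: Sequence[float],  # yaw/pitch/roll estimation errors in radians
    categories: List[str],
) -> float:
    """Weighted squared landmark error for one face (sketch of a PFLD-style loss)."""
    # Assumption: a sample with no special category gets weight 1.
    category_weight = sum(CATEGORY_WEIGHTS[c] for c in categories) or 1.0
    angle_weight = sum(1.0 - math.cos(t) for t in angle_errors)
    squared_error = sum(
        (px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, target)
    )
    return category_weight * angle_weight * squared_error


def batch_loss(samples: Sequence[tuple]) -> float:
    """Mean loss over (pred, target, angle_errors, categories) tuples."""
    return sum(sample_loss(*s) for s in samples) / len(samples)


pred = [(0.0, 0.0), (1.0, 1.0)]
target = [(0.0, 0.0), (1.0, 2.0)]
# Same landmark error, but the large-pose sample contributes twice as much.
hard = sample_loss(pred, target, [math.pi / 2, 0.0, 0.0], ["large_pose"])
easy = sample_loss(pred, target, [math.pi / 2, 0.0, 0.0], [])
```

Because the rotation term is only computed from training labels, dropping it at test time costs nothing at inference, consistent with the abstract's note that rotation is not involved in testing.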
CV · Apr 8, 2018
Fast Single Image Rain Removal via a Deep Decomposition-Composition Network
Siyuan Li, Wenqi Ren, Jiawan Zhang et al.
Rain effects in images are detrimental to many multimedia and computer vision tasks. For removing rain from a single image, deep learning techniques have attracted considerable attention. This paper designs a novel end-to-end multi-task learning architecture that reduces the mapping range from input to output and boosts performance. Concretely, a decomposition net is built to split rain images into clean background and rain layers. Unlike previous architectures, our model includes, besides a component representing the desired clean image, an extra component for the rain layer. During training, we further employ a composition structure that reproduces the input from the separated clean image and rain information, improving the quality of the decomposition. Experiments on both synthetic and real images demonstrate the high-quality recovery achieved by our design and its superiority over other state-of-the-art methods. Furthermore, our design is also applicable to other layer decomposition tasks such as dust removal. More importantly, our method requires only about 50 ms to process a test image at VGA resolution on a GTX 1080 GPU, significantly faster than its competitors, making it attractive for practical use.
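The composition branch acts as a reconstruction constraint: whatever the decomposition net outputs as background B and rain layer R should combine back into the input O. A minimal sketch, assuming the simple additive model O = B + R and mean-squared-error terms (the paper's composition structure is learned and may be more general); the pixel values are hypothetical.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def decomposition_composition_loss(
    rainy: Sequence[float],
    background_pred: Sequence[float],
    rain_pred: Sequence[float],
    background_gt: Sequence[float],
    rain_gt: Sequence[float],
    composition_weight: float = 1.0,
) -> float:
    """Supervise both predicted layers, plus a term asking B_pred + R_pred to reproduce O."""
    recomposed: List[float] = [b + r for b, r in zip(background_pred, rain_pred)]
    return (
        mse(background_pred, background_gt)
        + mse(rain_pred, rain_gt)
        + composition_weight * mse(recomposed, rainy)
    )


rainy = [0.6, 0.9]
background_gt, rain_gt = [0.5, 0.5], [0.1, 0.4]
perfect = decomposition_composition_loss(rainy, background_gt, rain_gt, background_gt, rain_gt)
# Trivial split: everything assigned to the background, no rain layer.
trivial = decomposition_composition_loss(rainy, rainy, [0.0, 0.0], background_gt, rain_gt)
```

The trivial split satisfies the composition term exactly but is still penalized by the per-layer terms, which is why reconstruction complements, rather than replaces, direct supervision of each layer.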
SI · Apr 6, 2017
Visual analytics for networked-guarantee loans risk management
Zhibin Niu, Dawei Cheng, Liqing Zhang et al.
Groups of enterprises guarantee each other and form complex guarantee networks when they try to obtain loans from banks. Such secured loans can enhance solvency and promote rapid growth during economic upturns. However, systemic risk may arise within these risk-bound communities. In particular, during economic downturns, a crisis may spread through the guarantee network like a row of falling dominoes. Monitoring financial status and preventing or reducing systemic risk when a crisis happens are major concerns for regulatory commissions and banks. We propose a visual analytics approach for loan guarantee network risk management and, together with financial experts, consolidate five analysis tasks: i) visual analytics for enterprise default risk, where a hybrid representation is devised to predict default risk and an interface is developed to visualize key indicators; ii) visual analytics for high-default groups, where an interactive approach based on community detection is presented; iii) visual analytics for high-default patterns, where an interactive approach based on motif detection is described and Shneiderman's Mantra strategy is adopted to reduce computational complexity; iv) visual analytics for the evolving guarantee network, where animation helps users understand guarantee dynamics; and v) a visual analytics approach and interface for default diffusion paths. Temporal diffusion path analysis can help the government and banks monitor how defaults spread. It also provides insight for taking precautionary measures to prevent and dissolve systemic financial risk. We implement the system and present case studies on a real-world guarantee network. Two financial experts were consulted and endorsed the developed tool. To the best of our knowledge, this is the first visual analytics tool to explore guarantee network risks in a systematic manner.
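A default diffusion path can be pictured as a breadth-first traversal of the guarantee network. The sketch assumes, purely for illustration, that when a borrower defaults, every enterprise guaranteeing its loans is exposed at the next step; real contagion depends on exposure sizes and solvency, which this ignores. The enterprise names and the `guarantors` mapping are hypothetical.

```python
from typing import Dict, List


def diffusion_path(guarantors: Dict[str, List[str]], first_default: str) -> List[List[str]]:
    """Return exposed enterprises grouped by step (BFS layers) from the first default.

    guarantors[x] lists the enterprises that guarantee x's loans. Mutual-guarantee
    cycles are common, so already-exposed firms are skipped.
    """
    layers = [[first_default]]
    seen = {first_default}
    while True:
        next_layer = []
        for firm in layers[-1]:
            for g in guarantors.get(firm, []):
                if g not in seen:
                    seen.add(g)
                    next_layer.append(g)
        if not next_layer:
            return layers
        layers.append(next_layer)


guarantors = {
    "A": ["B", "C"],  # B and C guarantee A's loans
    "B": ["D"],
    "C": ["D", "A"],  # C and A guarantee each other
    "D": [],
}
path = diffusion_path(guarantors, "A")  # step 0: A, step 1: B and C, step 2: D
```

Each layer corresponds to one time step, which is the temporal ordering a diffusion-path view would animate.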
CV · Dec 20, 2016
Automatic Generation of Grounded Visual Questions
Shijie Zhang, Lizhen Qu, Shaodi You et al.
In this paper, we propose the first model able to generate visually grounded questions of diverse types for a single image. Visual question generation is an emerging topic that aims to ask questions in natural language based on visual input. To the best of our knowledge, there are no automatic methods that generate meaningful questions of various types for the same visual input. To address this problem, we propose a model that automatically generates visually grounded questions of varying types. Our model takes as input both images and the captions generated by a dense captioning model, samples the most probable question types, and then generates the questions. Experimental results on two real-world datasets show that our model outperforms the strongest baseline in both correctness and diversity by a wide margin.