SEApr 26, 2022
GypSum: Learning Hybrid Representations for Code SummarizationYu Wang, Yu Dong, Xuesong Lu et al.
Code summarization with deep learning has been widely studied in recent years. Current deep learning models for code summarization generally follow the principle in neural machine translation and adopt the encoder-decoder framework, where the encoder learns the semantic representations from source code and the decoder transforms the learnt representations into human-readable text that describes the functionality of code snippets. Despite they achieve the new state-of-the-art performance, we notice that current models often either generate less fluent summaries, or fail to capture the core functionality, since they usually focus on a single type of code representations. As such we propose GypSum, a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model. We introduce particular edges related to the control flow of a code snippet into the abstract syntax tree for graph construction, and design two encoders to learn from the graph and the token sequence of source code, respectively. We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations and propose a dual-copy mechanism to facilitate summary generation. Experimental results demonstrate the superior performance of GypSum over existing code summarization models.
50.2HCApr 19
What Security and Privacy Transparency Users Need from Consumer-Facing Generative AIJiaxun Cao, Yu Dong, Chunxi Zhan et al.
Users increasingly rely on consumer-facing generative AI (GenAI) for tasks ranging from everyday needs to sensitive use cases. Yet, it remains unclear whether and how existing security and privacy (S&P) communications in GenAI tools shape users' adoption decisions and subsequent experiences. Understanding how users seek, interpret, and evaluate S&P information is critical for designing usable transparency that users can trust and act on. We conducted semi-structured interviews and design sessions with 21 U.S. GenAI users. We find that available S&P information rarely drove initial adoption in practice, as participants often perceived it as incomplete, ineffective, or lacking credibility. Instead, they relied on rough proxies, such as popularity, to infer S&P practices. After adoption, uncertainty about S&P practices constrained participants' willingness to use GenAI tools, particularly in high-stakes contexts, and, in some cases, contributed to discontinued use. Participants therefore called for transparency that supports decision-making and use, including trustworthy information (e.g., independent evaluations) and usable interfaces (e.g., on-demand disclosure). We synthesize participants' desired design practices into five dimensions to facilitate systematic future investigation into best practices. We conclude with recommendations for researchers, designers, and policymakers to improve S&P transparency in consumer-facing GenAI.
62.3GRMar 17
Toward Reliable Scientific Visualization Pipeline Construction with Structure-Aware Retrieval-Augmented LLMsGuanghui Zhao, Zhe Wang, Yu Dong et al.
Scientific visualization pipelines encode domain-specific procedural knowledge with strict execution dependencies, making their construction sensitive to missing stages, incorrect operator usage, or improper ordering. Thus, generating executable scientific visualization pipelines from natural-language descriptions remains challenging for large language models, particularly in web-based environments where visualization authoring relies on explicit code-level pipeline assembly. In this work, we investigate the reliability of LLM-based scientific visualization pipeline generation, focusing on vtk.js as a representative web-based visualization library. We propose a structure-aware retrieval-augmented generation workflow that provides pipeline-aligned vtk.js code examples as contextual guidance, supporting correct module selection, parameter configuration, and execution order. We evaluate the proposed workflow across multiple multi-stage scientific visualization tasks and LLMs, measuring reliability in terms of pipeline executability and human correction effort. To this end, we introduce correction cost as metric for the amount of manual intervention required to obtain a valid pipeline. Our results show that structured, domain-specific context substantially improves pipeline executability and reduces correction cost. We additionally provide an interactive analysis interface to support human-in-the-loop inspection and systematic evaluation of generated visualization pipelines.
GNJun 18, 2025Code
AuraGenome: An LLM-Powered Framework for On-the-Fly Reusable and Scalable Circular Genome VisualizationsChi Zhang, Yu Dong, Yang Wang et al.
Circular genome visualizations are essential for exploring structural variants and gene regulation. However, existing tools often require complex scripting and manual configuration, making the process time-consuming, error-prone, and difficult to learn. To address these challenges, we introduce AuraGenome, an LLM-powered framework for rapid, reusable, and scalable generation of multi-layered circular genome visualizations. AuraGenome combines a semantic-driven multi-agent workflow with an interactive visual analytics system. The workflow employs seven specialized LLM-driven agents, each assigned distinct roles such as intent recognition, layout planning, and code generation, to transform raw genomic data into tailored visualizations. The system supports multiple coordinated views tailored for genomic data, offering ring, radial, and chord-based layouts to represent multi-layered circular genome visualizations. In addition to enabling interactions and configuration reuse, the system supports real-time refinement and high-quality report export. We validate its effectiveness through two case studies and a comprehensive user study. AuraGenome is available at: https://github.com/Darius18/AuraGenome.
ROMay 2, 2021Code
A LiDAR Assisted Control Module with High Precision in Parking Scenarios for Autonomous Driving VehicleXin Xu, Yu Dong, Fan Zhu
Autonomous driving has been quite promising in recent years. The public has seen Robotaxi delivered by Waymo, Baidu, Cruise, and so on. While autonomous driving vehicles certainly have a bright future, we have to admit that it is still a long way to go for products such as Robotaxi. On the other hand, in less complex scenarios autonomous driving may have the potentiality to reliably outperform humans. For example, humans are good at interactive tasks (while autonomous driving systems usually do not), but we are often incompetent for tasks with strict precision demands. In this paper, we introduce a real-world, industrial scenario of which human drivers are not capable. The task required the ego vehicle to keep a stationary lateral distance (i.e. 3? <= 5 centimeters) with respect to a reference. To address this challenge, we redesigned the control module from Baidu Apollo open-source autonomous driving system. A precise (3? <= 2 centimeters) Error Feedback System was first built to partly replace the localization module. Then we investigated the control module thoroughly and added a real-time calibration algorithm to gain extra precision. We also built a simulation to fine-tune the control parameters. After all those works, the results are encouraging, showing that an end-to-end lateral precision with 3? <= 5 centimeters has been achieved. Further, we show that the results not only outperformed original Apollo modules but also beat specially trained and highly experienced human test drivers.
CVNov 17, 2025
A Trajectory-free Crash Detection Framework with Generative Approach and Segment Map DiffusionWeiying Shen, Hao Yu, Yu Dong et al.
Real-time crash detection is essential for developing proactive safety management strategy and enhancing overall traffic efficiency. To address the limitations associated with trajectory acquisition and vehicle tracking, road segment maps recording the individual-level traffic dynamic data were directly served in crash detection. A novel two-stage trajectory-free crash detection framework, was present to generate the rational future road segment map and identify crashes. The first-stage diffusion-based segment map generation model, Mapfusion, conducts a noisy-to-normal process that progressively adds noise to the road segment map until the map is corrupted to pure Gaussian noise. The denoising process is guided by sequential embedding components capturing the temporal dynamics of segment map sequences. Furthermore, the generation model is designed to incorporate background context through ControlNet to enhance generation control. Crash detection is achieved by comparing the monitored segment map with the generations from diffusion model in second stage. Trained on non-crash vehicle motion data, Mapfusion successfully generates realistic road segment evolution maps based on learned motion patterns and remains robust across different sampling intervals. Experiments on real-world crashes indicate the effectiveness of the proposed two-stage method in accurately detecting crashes.
CVSep 5, 2025
TemporalFlowViz: Parameter-Aware Visual Analytics for Interpreting Scramjet Combustion EvolutionYifei Jia, Shiyu Cheng, Yu Dong et al.
Understanding the complex combustion dynamics within scramjet engines is critical for advancing high-speed propulsion technologies. However, the large scale and high dimensionality of simulation-generated temporal flow field data present significant challenges for visual interpretation, feature differentiation, and cross-case comparison. In this paper, we present TemporalFlowViz, a parameter-aware visual analytics workflow and system designed to support expert-driven clustering, visualization, and interpretation of temporal flow fields from scramjet combustion simulations. Our approach leverages hundreds of simulated combustion cases with varying initial conditions, each producing time-sequenced flow field images. We use pretrained Vision Transformers to extract high-dimensional embeddings from these frames, apply dimensionality reduction and density-based clustering to uncover latent combustion modes, and construct temporal trajectories in the embedding space to track the evolution of each simulation over time. To bridge the gap between latent representations and expert reasoning, domain specialists annotate representative cluster centroids with descriptive labels. These annotations are used as contextual prompts for a vision-language model, which generates natural-language summaries for individual frames and full simulation cases. The system also supports parameter-based filtering, similarity-based case retrieval, and coordinated multi-view exploration to facilitate in-depth analysis. We demonstrate the effectiveness of TemporalFlowViz through two expert-informed case studies and expert feedback, showing TemporalFlowViz enhances hypothesis generation, supports interpretable pattern discovery, and enhances knowledge discovery in large-scale scramjet combustion analysis.
CVMay 23, 2023
WaveDM: Wavelet-Based Diffusion Models for Image RestorationYi Huang, Jiancheng Huang, Jianzhuang Liu et al.
Latest diffusion-based methods for many image restoration tasks outperform traditional models, but they encounter the long-time inference problem. To tackle it, this paper proposes a Wavelet-Based Diffusion Model (WaveDM). WaveDM learns the distribution of clean images in the wavelet domain conditioned on the wavelet spectrum of degraded images after wavelet transform, which is more time-saving in each step of sampling than modeling in the spatial domain. To ensure restoration performance, a unique training strategy is proposed where the low-frequency and high-frequency spectrums are learned using distinct modules. In addition, an Efficient Conditional Sampling (ECS) strategy is developed from experiments, which reduces the number of total sampling steps to around 5. Evaluations on twelve benchmark datasets including image raindrop removal, rain steaks removal, dehazing, defocus deblurring, demoiréing, and denoising demonstrate that WaveDM achieves state-of-the-art performance with the efficiency that is comparable to traditional one-pass methods and over 100$\times$ faster than existing image restoration methods using vanilla diffusion models.
CLMay 18, 2023
A unified front-end framework for English text-to-speech synthesisZelin Ying, Chen Li, Yu Dong et al.
The front-end is a critical component of English text-to-speech (TTS) systems, responsible for extracting linguistic features that are essential for a text-to-speech model to synthesize speech, such as prosodies and phonemes. The English TTS front-end typically consists of a text normalization (TN) module, a prosody word prosody phrase (PWPP) module, and a grapheme-to-phoneme (G2P) module. However, current research on the English TTS front-end focuses solely on individual modules, neglecting the interdependence between them and resulting in sub-optimal performance for each module. Therefore, this paper proposes a unified front-end framework that captures the dependencies among the English TTS front-end modules. Extensive experiments have demonstrated that the proposed method achieves state-of-the-art (SOTA) performance in all modules.
MMDec 20, 2021
A Multi-user Oriented Live Free-viewpoint Video Streaming System Based On View InterpolationJingchuan Hu, Shuai Guo, Kai Zhou et al.
As an important application form of immersive multimedia services, free-viewpoint video(FVV) enables users with great immersive experience by strong interaction. However, the computational complexity of virtual view synthesis algorithms poses a significant challenge to the real-time performance of an FVV system. Furthermore, the individuality of user interaction makes it difficult to serve multiple users simultaneously for a system with conventional architecture. In this paper, we novelly introduce a CNN-based view interpolation algorithm to synthesis dense virtual views in real time. Based on this, we also build an end-to-end live free-viewpoint system with a multi-user oriented streaming strategy. Our system can utilize a single edge server to serve multiple users at the same time without having to bring a large view synthesis load on the client side. We analyze the whole system and show that our approaches give the user a pleasant immersive experience, in terms of both visual quality and latency.
CVSep 15, 2021
New Perspective on Progressive GANs Distillation for One-class Novelty DetectionZhiwei Zhang, Yu Dong, Hanyu Peng et al.
One-class novelty detection is conducted to identify anomalous instances, with different distributions from the expected normal instances. In this paper, the Generative Adversarial Network based on the Encoder-Decoder-Encoder scheme (EDE-GAN) achieves state-of-the-art performance. The two factors bellow serve the above purpose: 1) The EDE-GAN calculates the distance between two latent vectors as the anomaly score, which is unlike the previous methods by utilizing the reconstruction error between images. 2) The model obtains best results when the batch size is set to 1. To illustrate their superiority, we design a new GAN architecture, and compare performances according to different batch sizes. Moreover, with experimentation leads to discovery, our result implies there is also evidence of just how beneficial constraint on the latent space are when engaging in model training. In an attempt to learn compact and fast models, we present a new technology, Progressive Knowledge Distillation with GANs (P-KDGAN), which connects two standard GANs through the designed distillation loss. Two-step progressive learning continuously augments the performance of student GANs with improved results over single-step approach. Our experimental results on CIFAR-10, MNIST, and FMNIST datasets illustrate that P-KDGAN improves the performance of the student GAN by 2.44%, 1.77%, and 1.73% when compressing the computationat ratios of 24.45:1, 311.11:1, and 700:1, respectively.
AIMar 11, 2021
Metapaths guided Neighbors aggregated Network for?Heterogeneous Graph ReasoningBang Lin, Xiuchong Wang, Yu Dong et al.
Most real-world datasets are inherently heterogeneous graphs, which involve a diversity of node and relation types. Heterogeneous graph embedding is to learn the structure and semantic information from the graph, and then embed it into the low-dimensional node representation. Existing methods usually capture the composite relation of a heterogeneous graph by defining metapath, which represent a semantic of the graph. However, these methods either ignore node attributes, or discard the local and global information of the graph, or only consider one metapath. To address these limitations, we propose a Metapaths-guided Neighbors-aggregated Heterogeneous Graph Neural Network(MHN) to improve performance. Specially, MHN employs node base embedding to encapsulate node attributes, BFS and DFS neighbors aggregation within a metapath to capture local and global information, and metapaths aggregation to combine different semantics of the heterogeneous graph. We conduct extensive experiments for the proposed MHN on three real-world heterogeneous graph datasets, including node classification, link prediction and online A/B test on Alibaba mobile application. Results demonstrate that MHN performs better than other state-of-the-art baselines.
CVMay 7, 2020
NTIRE 2020 Challenge on NonHomogeneous DehazingCodruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu et al.
This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy image). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze free and nonhomogeneous hazy images recorded outdoor. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground truth images. The nonhomogeneous haze has been produced using a professional haze generator that imitates the real conditions of haze scenes. 168 participants registered in the challenge and 27 teams competed in the final testing phase. The proposed solutions gauge the state-of-the-art in image dehazing.
CVJan 20, 2020
FD-GAN: Generative Adversarial Networks with Fusion-discriminator for Single Image DehazingYu Dong, Yihao Liu, He Zhang et al.
Recently, convolutional neural networks (CNNs) have achieved great improvements in single image dehazing and attained much attention in research. Most existing learning-based dehazing methods are not fully end-to-end, which still follow the traditional dehazing procedure: first estimate the medium transmission and the atmospheric light, then recover the haze-free image based on the atmospheric scattering model. However, in practice, due to lack of priors and constraints, it is hard to precisely estimate these intermediate parameters. Inaccurate estimation further degrades the performance of dehazing, resulting in artifacts, color distortion and insufficient haze removal. To address this, we propose a fully end-to-end Generative Adversarial Networks with Fusion-discriminator (FD-GAN) for image dehazing. With the proposed Fusion-discriminator which takes frequency information as additional priors, our model can generator more natural and realistic dehazed images with less color distortion and fewer artifacts. Moreover, we synthesize a large-scale training dataset including various indoor and outdoor hazy images to boost the performance and we reveal that for learning-based dehazing methods, the performance is strictly influenced by the training data. Experiments have shown that our method reaches state-of-the-art performance on both public synthetic datasets and real-world images with more visually pleasing dehazed results.