96.7 · CV · Apr 24 · Code
Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation
Ran Zhao, Sheng Jin, Size Wu et al.
Recent text-to-image (T2I) models have demonstrated impressive capabilities in photorealistic synthesis and instruction following. However, their reliability in knowledge-intensive settings remains largely unexplored. Unlike natural image generation, knowledge visualization requires not only semantic alignment but also strict adherence to domain knowledge, structural constraints, and symbolic conventions, exposing a critical gap between visual plausibility and scientific correctness. To systematically study this problem, we introduce KVBench, a curriculum-grounded benchmark for evaluating knowledge-intensive T2I generation. KVBench covers six senior high-school subjects: Biology, Chemistry, Geography, History, Mathematics, and Physics. The benchmark consists of 1,800 expert-curated prompts derived from over 30 authoritative textbooks. Using this benchmark, we evaluate 14 state-of-the-art open- and closed-source models, revealing substantial deficiencies in logical reasoning, symbolic precision, and multilingual robustness, with open-source models consistently underperforming proprietary systems. To address these limitations, we further propose KE-Check, a two-stage framework that improves scientific fidelity via (1) Knowledge Elaboration for structured prompt enrichment, and (2) Checklist-Guided Refinement for explicit constraint enforcement through violation identification and constraint-guided editing. KE-Check effectively mitigates scientific hallucinations, narrowing the performance gap between open-source and leading closed-source models. Data and code are publicly available at https://github.com/zhaoran66/KVBench.
AI · Jul 27, 2024 · Code
Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint
Song Zhang, Yuqing Duan, Daoliang Li et al.
In underwater image enhancement (UIE), convolutional neural networks (CNNs) have inherent limitations in modeling long-range dependencies and are less effective in recovering global features. While Transformers excel at modeling long-range dependencies, their quadratic computational complexity with increasing image resolution presents significant efficiency challenges. Additionally, most supervised learning methods lack effective physical model constraints, which can lead to insufficient realism and overfitting in generated images. To address these issues, we propose a physical model constraint-based underwater image enhancement framework, Mamba-UIE. Specifically, we decompose the input image into four components: underwater scene radiance, direct transmission map, backscatter transmission map, and global background light. These components are reassembled according to the revised underwater image formation model, and a reconstruction consistency constraint is applied between the reconstructed image and the original image, thereby imposing an effective physical constraint on the underwater image enhancement process. To tackle the quadratic computational complexity of Transformers when handling long sequences, we introduce the Mamba-UIE network based on linear complexity state space models. By incorporating the Mamba in Convolution block, long-range dependencies are modeled at both the channel and spatial levels, while the CNN backbone is retained to recover local features and details. Extensive experiments on three public datasets demonstrate that our proposed Mamba-UIE outperforms existing state-of-the-art methods, achieving a PSNR of 27.13 and an SSIM of 0.93 on the UIEB dataset. Our method is available at https://github.com/zhangsong1213/Mamba-UIE.
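The four-component decomposition described in this abstract can be written out as a worked equation. The form below is an assumption: it follows the revised underwater image formation model of Akkaynak and Treibitz, and the symbols ($J$ for scene radiance, $T^{D}$ and $T^{B}$ for the direct and backscatter transmission maps, $B^{\infty}$ for global background light) as well as the choice of an L1 norm for the consistency term are illustrative, not taken from the paper.

```latex
% Assumed notation: I = observed image, \hat{I} = image reassembled from the
% four predicted components, c = color channel, x = pixel location.
\begin{align}
  \hat{I}_c(x) &= J_c(x)\, T^{D}_c(x) + B^{\infty}_c \bigl(1 - T^{B}_c(x)\bigr),
    \qquad c \in \{R, G, B\}, \\
  \mathcal{L}_{\mathrm{rec}} &= \bigl\lVert \hat{I} - I \bigr\rVert_{1}
    \quad \text{(the norm is an assumption).}
\end{align}
```

Under this reading, the first term is the attenuated scene signal and the second is backscatter, and the reconstruction term penalizes any decomposition that cannot reproduce the input image.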
CV · Sep 28, 2024 · Code
PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution
Song Zhang, Daoliang Li, Ran Zhao
The majority of deep learning methods utilize vanilla convolution for enhancing underwater images. While vanilla convolution excels in capturing local features and learning the spatial hierarchical structure of images, it tends to smooth input images, which can somewhat limit feature expression and modeling. A prominent characteristic of underwater degraded images is blur, and the goal of enhancement is to make the textures and details (high-frequency features) in the images more visible. Therefore, we believe that leveraging high-frequency features can improve enhancement performance. To address this, we introduce Pixel Difference Convolution (PDC), which focuses on gradient information with significant changes in the image, thereby improving the modeling of enhanced images. We propose an underwater image enhancement network, PDCFNet, based on PDC and cross-level feature fusion. Specifically, we design a detail enhancement module based on PDC that employs parallel PDCs to capture high-frequency features, leading to better detail and texture enhancement. The designed cross-level feature fusion module performs operations such as concatenation and multiplication on features from different levels, ensuring sufficient interaction and enhancement between diverse features. Our proposed PDCFNet achieves a PSNR of 27.37 and an SSIM of 92.02 on the UIEB dataset, attaining the best performance to date. Our code is available at https://github.com/zhangsong1213/PDCFNet.
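To illustrate why a pixel-difference operator responds to gradients rather than to flat regions, here is a minimal sketch of one common variant, central pixel difference convolution, in plain Python. The function name `central_pdc`, the single-channel layout, and the all-ones kernel are assumptions for illustration. The abstract does not say which PDC variants PDCFNet runs in parallel, so this should not be read as the network's actual layer.

```python
def central_pdc(image, kernel):
    """Central pixel-difference convolution, single channel, valid padding.

    Each weight multiplies (neighbor - center) instead of the raw neighbor,
    so constant regions give zero response regardless of the weights.
    """
    k = len(kernel)
    r = k // 2
    h, w = len(image), len(image[0])
    out = []
    for y in range(r, h - r):
        row = []
        for x in range(r, w - r):
            center = image[y][x]
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    diff = image[y + dy - r][x + dx - r] - center
                    acc += kernel[dy][dx] * diff
            row.append(acc)
        out.append(row)
    return out


# Hypothetical inputs: a flat patch and a vertical step edge.
kernel = [[1.0, 1.0, 1.0] for _ in range(3)]
flat = [[5.0] * 4 for _ in range(4)]
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]

flat_out = central_pdc(flat, kernel)  # all zeros: no gradient, no response
edge_out = central_pdc(edge, kernel)  # nonzero on both sides of the edge
```

A vanilla convolution with the same kernel would return 45.0 everywhere on the flat patch, whereas the difference form returns zero there and responds only at the step.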
55.8 · COMP-PH · Apr 23
A Thin Sheet Volume Integral Equation Solver for Simulation of Bianisotropic Metasurfaces
Sebastian Celis Sierra, Meruyert Khamitova, Ran Zhao et al.
A thin-sheet (TS) volume integral equation (VIE) formulation incorporating generalized sheet transition conditions (GSTCs) is presented for the simulation of three-dimensional (3D) bianisotropic metasurfaces. The metasurface is represented as an equivalent TS, with its constitutive tensors derived from the GSTC susceptibility tensors. Invoking the TS approximation, the governing VIEs are reduced to surface integral equations (SIEs), in which tangential and normal flux density components are treated as distinct sets of unknowns and discretized using Rao-Wilton-Glisson and pulse basis functions, respectively. In contrast to conventional GSTC approaches based on conventional SIEs, which represent only tangential fields, the proposed framework rigorously enforces the bianisotropic GSTCs, including normal field interactions, while retaining the flux-based VIE character of the formulation. Numerical examples demonstrate the accuracy and robustness of the proposed TS-VIE-GSTC solver for polarization rotation, perfect reflection, multi-directional attenuation, and oblique phase-shift transformation.
AI · Jan 23, 2024
UR4NNV: Neural Network Verification, Under-approximation Reachability Works!
Zhen Liang, Taoran Wu, Ran Zhao et al.
Recently, formal verification of deep neural networks (DNNs) has garnered considerable attention, and over-approximation based methods have become popular due to their effectiveness and efficiency. However, these strategies face challenges in addressing the "unknown dilemma" concerning whether the exact output region or the introduced approximation error violates the property in question. To address this, this paper introduces the UR4NNV verification framework, which utilizes under-approximation reachability analysis for DNN verification for the first time. UR4NNV focuses on DNNs with Rectified Linear Unit (ReLU) activations and employs a binary tree branch-based under-approximation algorithm. In each epoch, UR4NNV under-approximates a sub-polytope of the reachable set and verifies this polytope against the given property. Through a trial-and-error approach, UR4NNV effectively falsifies DNN properties while providing confidence levels when it reaches the verification epoch bound without falsifying the property. Experimental comparisons with existing verification methods demonstrate the effectiveness and efficiency of UR4NNV, significantly reducing the impact of the "unknown dilemma".
CV · Jan 25, 2022
Real-time automatic polyp detection in colonoscopy using feature enhancement module and spatiotemporal similarity correlation unit
Jianwei Xu, Ran Zhao, Yizhou Yu et al.
Automatic detection of polyps is challenging because different polyps vary greatly, while the differences between polyps and their analogues are small. The state-of-the-art methods are based on convolutional neural networks (CNNs). However, they may fail due to a lack of training data, resulting in high rates of missed detection and false positives (FPs). To solve these problems, our method combines a two-dimensional (2-D) CNN-based real-time object detector network with spatiotemporal information. First, we use a 2-D detector network to detect polyps in static images and frames. Building on the detector network, we propose two feature enhancement modules: the FP Relearning Module (FPRM), which makes the detector network learn more about the features of FPs for higher precision, and the Image Style Transfer Module (ISTM), which enhances the features of polyps to improve sensitivity. In video detection, we integrate spatiotemporal information, using Structural Similarity (SSIM) to measure the similarity between video frames. Finally, we propose the Inter-frame Similarity Correlation Unit (ISCU) to combine the results obtained by the detector network with frame similarity to make the final decision. We verify our method on both private databases and publicly available databases. Experimental results show that these modules and units provide a performance improvement compared with the baseline method. Comparison with the state-of-the-art methods shows that the proposed method outperforms existing methods that meet real-time constraints. Our method improves sensitivity, precision, and specificity, and has great potential to be applied in clinical colonoscopy.
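The frame-similarity measurement mentioned in this abstract can be sketched with the standard SSIM formula. The version below is a simplification under stated assumptions: it computes a single global SSIM over the whole frame rather than the usual sliding-window average, the name `global_ssim` is hypothetical, and the constants follow the common defaults (K1 = 0.01, K2 = 0.03). How the ISCU consumes this score is not described in the abstract and is not modeled here.

```python
def global_ssim(frame_a, frame_b, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grayscale frames."""
    xs = [p for row in frame_a for p in row]
    ys = [p for row in frame_b for p in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    # Stabilizing constants with the commonly used defaults.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 2x2 frames: identical content vs. reversed intensities.
frame = [[10.0, 20.0], [30.0, 40.0]]
reversed_frame = [[40.0, 30.0], [20.0, 10.0]]

same_score = global_ssim(frame, frame)
diff_score = global_ssim(frame, reversed_frame)
```

Identical frames score 1, and the score drops as structure diverges, which is the property that makes it usable for deciding whether consecutive frames show the same scene.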
RO · Oct 31, 2021
Shape Programmable Magnetic Pixel Soft Robot
Ran Zhao, Hanchen Yao, Houde Dai
Magnetically responsive soft robots achieve programmable shape regulation under a magnetic field and produce various actions. The shape control of a magnetic soft robot is based on the magnetic anisotropy caused by the orderly distribution of magnetic particles in the elastic matrix. In previous technologies, magnetic programming is coupled with the manufacturing process, and the orientation of the magnetic particles cannot be modified, which restricts the design and use of magnetic soft robots. This paper presents a magnetic pixel robot with a shape-programmable function. By encapsulating NdFeB/gallium composites in a silicone shell, a thermo-magnetically responsive functional film with a lattice structure is fabricated. Based on a thermal-assisted magnetization technique, we realize a discrete distribution of magnetization regions on the film. We then propose a magnetic coding technique to realize the mathematical design of the response actions of soft robots. Using these methods, we prepare several magnetic soft robots based on origami structures. The experiments show that the behavior mode of the robot can be flexibly and repeatedly regulated by the magnetic encoding technique. This work provides a basis for the programmed shape regulation and motion design of soft robots.
ML · Aug 5, 2019
Some Developments in Clustering Analysis on Stochastic Processes
Qidi Peng, Nan Rao, Ran Zhao
We review some developments in clustering stochastic processes and conclude that asymptotically consistent clustering algorithms can be obtained when the processes are ergodic and the dissimilarity measure satisfies the triangle inequality. Examples are provided when the processes are distribution ergodic, covariance ergodic and locally asymptotically self-similar, respectively.
ML · Apr 13, 2018
Cluster Analysis on Locally Asymptotically Self-similar Processes with Known Number of Clusters
Qidi Peng, Nan Rao, Ran Zhao
We conduct cluster analysis on a class of locally asymptotically self-similar stochastic processes, which includes multifractional Brownian motion as a representative. When the true number of clusters is assumed to be known, a new covariance-based dissimilarity measure is introduced, from which we obtain approximately asymptotically consistent clustering algorithms. In simulation studies, clustering data sampled from multifractional Brownian motions with distinct functional Hurst parameters illustrates the approximate asymptotic consistency of the proposed algorithms. Clustering global financial markets' equity index returns and sovereign CDS spreads provides a successful real-world application.
ML · Jan 27, 2018
Covariance-based Dissimilarity Measures Applied to Clustering Wide-sense Stationary Ergodic Processes
Qidi Peng, Nan Rao, Ran Zhao
We introduce a new unsupervised learning problem: clustering wide-sense stationary ergodic stochastic processes. A covariance-based dissimilarity measure together with asymptotically consistent algorithms is designed for clustering offline and online datasets, respectively. We also suggest a formal criterion on the efficiency of dissimilarity measures, and discuss some approaches to improving the efficiency of our clustering algorithms when they are applied to cluster particular types of processes, such as self-similar processes with wide-sense stationary ergodic increments. Clustering synthetic data and real-world data are provided as examples of applications.
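As a rough illustration of what a covariance-based dissimilarity between two sample paths can look like, the sketch below compares empirical means and autocovariances up to a fixed lag. It is explicitly not the measure defined in the paper: the function names, the `max_lag` cutoff, and the `1 / (lag + 1) ** 2` weights are all hypothetical choices made for this example. It does keep one property the related work relies on: as a weighted sum of absolute differences, it satisfies the triangle inequality.

```python
def empirical_autocovariances(series, max_lag):
    """Biased empirical autocovariance estimates for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    return [
        sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def covariance_dissimilarity(x, y, max_lag=3):
    """Hypothetical dissimilarity: mean gap plus weighted autocovariance gaps."""
    mean_gap = abs(sum(x) / len(x) - sum(y) / len(y))
    cov_x = empirical_autocovariances(x, max_lag)
    cov_y = empirical_autocovariances(y, max_lag)
    # Illustrative weights that discount higher (noisier) lags.
    cov_gap = sum(
        abs(a - b) / (lag + 1) ** 2
        for lag, (a, b) in enumerate(zip(cov_x, cov_y))
    )
    return mean_gap + cov_gap


# Hypothetical sample paths with equal mean and variance but different
# correlation structure.
fast = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
slow = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]

d_same = covariance_dissimilarity(fast, fast)
d_diff = covariance_dissimilarity(fast, slow)
d_diff_reversed = covariance_dissimilarity(slow, fast)
```

The two paths share the same mean and variance, so a measure based only on those would not separate them; the lagged covariances do.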
CL · Mar 31, 2017
Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders
Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder at word-level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that captures the discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We have further developed a novel variant that is integrated with linguistic prior knowledge for better performance. Finally, the training procedure is improved by introducing a bag-of-word loss. Our proposed models have been validated to generate significantly more diverse responses than baseline approaches and exhibit competence in discourse-level decision-making.
CL · Oct 10, 2016
Leveraging Recurrent Neural Networks for Multimodal Recognition of Social Norm Violation in Dialog
Tiancheng Zhao, Ran Zhao, Zhao Meng et al.
Social norms are shared rules that govern and facilitate social interaction. Violating such social norms via teasing and insults may serve to upend power imbalances or, on the contrary, to reinforce solidarity and rapport in conversation, where rapport is highly situated and context-dependent. In this work, we investigate the task of automatically identifying the phenomenon of social norm violation in discourse. Towards this goal, we leverage the power of recurrent neural networks and multimodal information present in the interaction, and propose a predictive model to recognize social norm violation. Using long-term temporal and contextual information, our model achieves an F1 score of 0.705. Implications of our work for developing a socially aware agent are discussed.
NA · Jun 23, 2015
Randomized Block Kaczmarz Method with Projection for Solving Least Squares
Deanna Needell, Ran Zhao, Anastasios Zouzias
The Kaczmarz method is an iterative method for solving overcomplete linear systems of equations Ax=b. The randomized version of the Kaczmarz method put forth by Strohmer and Vershynin iteratively projects onto a randomly chosen solution space given by a single row of the matrix A and converges exponentially in expectation to the solution of a consistent system. In this paper we analyze two block versions of the method each with a randomized projection, that converge in expectation to the least squares solution of inconsistent systems. Our approach utilizes a paving of the matrix A to guarantee exponential convergence, and suggests that paving yields a significant improvement in performance in certain regimes. The proposed method is an extension of the block Kaczmarz method analyzed by Needell and Tropp and the Randomized Extended Kaczmarz method of Zouzias and Freris. The contribution is thus two-fold; unlike the standard Kaczmarz method, our methods converge to the least-squares solution of inconsistent systems, and by using appropriate blocks of the matrix this convergence can be significantly accelerated. Numerical experiments suggest that the proposed algorithm can indeed lead to advantages in practice.
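For orientation, the single-row randomized Kaczmarz iteration of Strohmer and Vershynin that this abstract builds on can be sketched in a few lines of plain Python. This is the baseline method for consistent systems, not the block variant with projection proposed in the paper. The function name, iteration count, seed, and the small test system are assumptions for illustration.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Single-row randomized Kaczmarz for a consistent system Ax = b.

    Each step samples row i with probability proportional to ||a_i||^2 and
    projects the current iterate onto the hyperplane <a_i, x> = b_i.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    row_norms_sq = [sum(v * v for v in row) for row in A]
    x = [0.0] * n_cols
    for _ in range(iters):
        i = rng.choices(range(n_rows), weights=row_norms_sq, k=1)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / row_norms_sq[i]
        x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Hypothetical overdetermined but consistent system with solution (1, 2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, 2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]

x_hat = randomized_kaczmarz(A, b)
```

On an inconsistent system this plain iteration does not settle on the least-squares solution, which is the gap the block methods with projection in the abstract are designed to close.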
NA · Apr 22, 2014
A Comparison of Clustering and Missing Data Methods for Health Sciences
Ran Zhao, Deanna Needell, Christopher Johansen et al.
In this paper, we compare and analyze clustering methods with missing data in health behavior research. In particular, we propose and analyze the use of compressive sensing's matrix completion along with spectral clustering to cluster health related data. The empirical tests and real data results show that these methods can outperform standard methods like LPA and FIML, in terms of lower misclassification rates in clustering and better matrix completion performance in missing data problems. According to our examination, a possible explanation of these improvements is that spectral clustering takes advantage of high data dimension and compressive sensing methods utilize the near-to-low-rank property of health data.