Liang Zhou

LG
h-index74
18papers
4,514citations
Novelty48%
AI Score58

18 Papers

CLOct 25, 2024
GPT-4o System Card

Aaron Hurst, Adam Lerer, Adam P. Goucher et al. · openai

GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50\% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.

CVFeb 10, 2023Code
Adjacent-Level Feature Cross-Fusion With 3-D CNN for Remote Sensing Image Change Detection

Yuanxin Ye, Mengmeng Wang, Liang Zhou et al.

Deep learning-based change detection (CD) using remote sensing images has received increasing attention in recent years. However, how to effectively extract and fuse the deep features of bi-temporal images for improving the accuracy of CD is still a challenge. To address that, a novel adjacent-level feature fusion network with 3D convolution (named AFCF3D-Net) is proposed in this article. First, through the inner fusion property of 3D convolution, we design a new feature fusion way that can simultaneously extract and fuse the feature information from bi-temporal images. Then, to alleviate the semantic gap between low-level features and high-level features, we propose an adjacent-level feature cross-fusion (AFCF) module to aggregate complementary feature information between the adjacent levels. Furthermore, the full-scale skip connection strategy is introduced to improve the capability of pixel-wise prediction and the compactness of changed objects in the results. Finally, the proposed AFCF3D-Net has been validated on the three challenging remote sensing CD datasets: the Wuhan building dataset (WHU-CD), the LEVIR building dataset (LEVIR-CD), and the Sun Yat-Sen University dataset (SYSU-CD). The results of quantitative analysis and qualitative comparison demonstrate that the proposed AFCF3D-Net achieves better performance compared to other state-of-the-art methods. The code for this work is available at https://github.com/wm-Githuber/AFCF3D-Net.

CVFeb 2, 2023
Advances and Challenges in Multimodal Remote Sensing Image Registration

Bai Zhu, Liang Zhou, Simiao Pu et al.

Over the past few decades, with the rapid development of global aerospace and aerial remote sensing technology, the types of sensors have evolved from the traditional monomodal sensors (e.g., optical sensors) to the new generation of multimodal sensors [e.g., multispectral, hyperspectral, light detection and ranging (LiDAR) and synthetic aperture radar (SAR) sensors]. These advanced devices can dynamically provide various and abundant multimodal remote sensing images with different spatial, temporal, and spectral resolutions according to different application requirements. Since then, it is of great scientific significance to carry out the research of multimodal remote sensing image registration, which is a crucial step for integrating the complementary information among multimodal data and making comprehensive observations and analysis of the Earths surface. In this work, we will present our own contributions to the field of multimodal image registration, summarize the advantages and limitations of existing multimodal image registration methods, and then discuss the remaining challenges and make a forward-looking prospect for the future development of the field.

89.8CLMay 9Code
Meow-Omni 1: A Multimodal Large Language Model for Feline Ethology

Jucheng Hu, Zhangquan Chen, Yulin Chen et al.

Deciphering animal intent is a fundamental challenge in computational ethology, largely because of semantic aliasing, the phenomenon where identical external signals (e.g., a cat's purr) correspond to radically different internal states depending on physiological context. Existing Multimodal Large Language Models (MLLMs) are blind to high-frequency biological time-series data, restricting them to superficial behavioural pattern matching rather than genuine latent-state reasoning. To bridge this gap, we introduce Meow-Omni 1, the first open-source, quad-modal MLLM purpose-built for computational ethology. It natively fuses video, audio, and physiological time-series streams with textual reasoning. Through targeted architectural adaptation, we integrate specialized scientific encoders into a unified backbone and formalize intent inference via physiologically grounded cross-modal alignment. Evaluated on MeowBench, a novel, expert-verified quad-modal benchmark, Meow-Omni 1 achieves state-of-the-art intent-recognition accuracy (71.16%), substantially outperforming leading vision-language and omni-modal baselines. We release the complete open-source pipeline including model weights, training framework, and the Meow-10K dataset, to establish a scalable paradigm for inter-species intent understanding and to advance foundation models toward real-world veterinary diagnostics and wildlife conservation.

LGJul 31, 2024
Visual Analysis of Multi-outcome Causal Graphs

Mengjie Fan, Jinlu Yu, Daniel Weiskopf et al.

We introduce a visual analysis method for multiple causal graphs with different outcome variables, namely, multi-outcome causal graphs. Multi-outcome causal graphs are important in healthcare for understanding multimorbidity and comorbidity. To support the visual analysis, we collaborated with medical experts to devise two comparative visualization techniques at different stages of the analysis process. First, a progressive visualization method is proposed for comparing multiple state-of-the-art causal discovery algorithms. The method can handle mixed-type datasets comprising both continuous and categorical variables and assist in the creation of a fine-tuned causal graph of a single outcome. Second, a comparative graph layout technique and specialized visual encodings are devised for the quick comparison of multiple causal graphs. In our visual analysis approach, analysts start by building individual causal graphs for each outcome variable, and then, multi-outcome causal graphs are generated and visualized with our comparative technique for analyzing differences and commonalities of these causal graphs. Evaluation includes quantitative measurements on benchmark datasets, a case study with a medical expert, and expert user studies with real-world health research data.

34.6HCApr 10
Enhance Comprehension of Over-the-Counter Drug Instructions for the General Public and Medical Professionals through Visualization Design

Mengjie Fan, Katrin Angerbauer, Yinchu Cheng et al.

Drug instructions are crucial for guiding the rational use of medication. We conduct a visualization design study to enhance the comprehension of over-the-counter (OTC) drug instructions, targeting both the general public and medical professionals. We devise two tailored drug instruction designs for different audience groups through an iterative design process. A controlled user study reveals that our design outperforms traditional text-based instructions in terms of response time and usability, and the availability of two versions is also found to be beneficial. This study also motivates a taxonomy based on a systematic classification of OTC drug instructions sampled from an official drug database, which received positive expert feedback. Finally, this study summarizes a workflow for a visualization design strategy based on our design exploration and user study feedback, which can be generalized to other OTC drug instructions.

29.9LGMar 29
Visualization of Machine Learning Models through Their Spatial and Temporal Listeners

Siyu Wu, Lei Shi, Lei Xia et al.

Model visualization (ModelVis) has emerged as a major research direction, yet existing taxonomies are largely organized by data or tasks, making it difficult to treat models as first-class analysis objects. We present a model-centric two-stage framework that employs abstract listeners to capture spatial and temporal model behaviors, and then connects the translated model behavior data to the classical InfoVis pipeline. To apply the framework at scale, we build a retrieval-augmented human--large language model (LLM) extraction workflow and curate a corpus of 128 VIS/VAST ModelVis papers with 331 coded figures. Our analysis shows a dominant result-centric priority on visualizing model outcomes, quantitative/nominal data type, statistical charts, and performance evaluation. Citation-weighted trends further indicate that less frequent model-mechanism-oriented studies have disproportionately high impact while are less investigated recently. Overall, the framework is a general approach for comparing existing ModelVis systems and guiding possible future designs.

37.7HCMar 24
MRATTS: An MR-Based Acupoint Therapy Training System with Real-Time Acupoint Detection and Evaluation Standards

Jiacheng Liu, Bohan Chen, Qian Wang et al.

Acupoint therapy is a core therapeutic method of Traditional Chinese Medicine (TCM), and it requires a high level of expertise and skills to detect acupoints and perform acupuncture and moxibustion. Existing mixed reality (MR)-based training methods often fall short in accurate real-time detection and visualization of acupoints on the hand, limb, or torso of a real person and do not support various techniques of acupuncture and moxibustion. Moreover, evaluation standards and visual guidance with fine details for each step during MR-based training are typically missing. To this end, we propose the MR-based TCM Acupoint Therapy Teaching System (MRATTS)--an MR-based acupoint therapy teaching and training framework. MRATTS is based on a real-time hand, limb, and torso acupoint detection method to accurately track and visualize acupoints on real patients through MR. On top of that, in collaboration with an experienced acupoint therapist, we design a practice method with interactive visual guidance for various acupoint therapy techniques that simulate acupressure, acupuncture (insertion, lifting-thrusting, and twisting), and moxibustion (mild, sparrow-pecking, and whirling). A set of TCM theory-based evaluation standards is formulated within MRATTS to enable the scoring and visualization of the accuracy and proficiency of acupoint therapy. The effectiveness and usefulness of MRATTS are evaluated through a controlled user study and expert feedback. Results of the study indicate that the MRATTS group shows clear improvements in understanding 3D locations of acupoints and proficiency in acupoint therapy compared to control groups.

CVNov 10, 2025
Certified L2-Norm Robustness of 3D Point Cloud Recognition in the Frequency Domain

Liang Zhou, Qiming Wang, Tianze Chen

3D point cloud classification is a fundamental task in safety-critical applications such as autonomous driving, robotics, and augmented reality. However, recent studies reveal that point cloud classifiers are vulnerable to structured adversarial perturbations and geometric corruptions, posing risks to their deployment in safety-critical scenarios. Existing certified defenses limit point-wise perturbations but overlook subtle geometric distortions that preserve individual points yet alter the overall structure, potentially leading to misclassification. In this work, we propose FreqCert, a novel certification framework that departs from conventional spatial domain defenses by shifting robustness analysis to the frequency domain, enabling structured certification against global L2-bounded perturbations. FreqCert first transforms the input point cloud via the graph Fourier transform (GFT), then applies structured frequency-aware subsampling to generate multiple sub-point clouds. Each sub-cloud is independently classified by a standard model, and the final prediction is obtained through majority voting, where sub-clouds are constructed based on spectral similarity rather than spatial proximity, making the partitioning more stable under L2 perturbations and better aligned with the object's intrinsic structure. We derive a closed-form lower bound on the certified L2 robustness radius and prove its tightness under minimal and interpretable assumptions, establishing a theoretical foundation for frequency domain certification. Extensive experiments on the ModelNet40 and ScanObjectNN datasets demonstrate that FreqCert consistently achieves higher certified accuracy and empirical accuracy under strong perturbations. Our results suggest that spectral representations provide an effective pathway toward certifiable robustness in 3D point cloud recognition.

LGSep 16, 2025
Ensemble Visualization With Variational Autoencoder

Cenyang Wu, Qinhan Yu, Liang Zhou

We present a new method to visualize data ensembles by constructing structured probabilistic representations in latent spaces, i.e., lower-dimensional representations of spatial data features. Our approach transforms the spatial features of an ensemble into a latent space through feature space conversion and unsupervised learning using a variational autoencoder (VAE). The resulting latent spaces follow multivariate standard Gaussian distributions, enabling analytical computation of confidence intervals and density estimation of the probabilistic distribution that generates the data ensemble. Preliminary results on a weather forecasting ensemble demonstrate the effectiveness and versatility of our method.

LGMay 6, 2025
SinBasis Networks: Matrix-Equivalent Feature Extraction for Wave-Like Optical Spectrograms

Yuzhou Zhu, Zheng Zhang, Ruyi Zhang et al.

Wave-like images-from attosecond streaking spectrograms to optical spectra, audio mel-spectrograms and periodic video frames-encode critical harmonic structures that elude conventional feature extractors. We propose a unified, matrix-equivalent framework that reinterprets convolution and attention as linear transforms on flattened inputs, revealing filter weights as basis vectors spanning latent feature subspaces. To infuse spectral priors we apply elementwise $\sin(\cdot)$ mappings to each weight matrix. Embedding these transforms into CNN, ViT and Capsule architectures yields Sin-Basis Networks with heightened sensitivity to periodic motifs and built-in invariance to spatial shifts. Experiments on a diverse collection of wave-like image datasets-including 80,000 synthetic attosecond streaking spectrograms, thousands of Raman, photoluminescence and FTIR spectra, mel-spectrograms from AudioSet and cycle-pattern frames from Kinetics-demonstrate substantial gains in reconstruction accuracy, translational robustness and zero-shot cross-domain transfer. Theoretical analysis via matrix isomorphism and Mercer-kernel truncation quantifies how sinusoidal reparametrization enriches expressivity while preserving stability in data-scarce regimes. Sin-Basis Networks thus offer a lightweight, physics-informed approach to deep learning across all wave-form imaging modalities.

CRApr 9, 2021
SPoTKD: A Protocol for Symmetric Key Distribution over Public Channels Using Self-Powered Timekeeping Devices

Mustafizur Rahman, Liang Zhou, Shantanu Chakrabartty

In this paper, we propose a novel class of symmetric key distribution protocols that leverages basic security primitives offered by low-cost, hardware chipsets containing millions of synchronized self-powered timers. The keys are derived from the temporal dynamics of a physical, micro-scale time-keeping device which makes the keys immune to any potential side-channel attacks, malicious tampering, or snooping. Using the behavioral model of the self-powered timers, we first show that the derived key-strings can pass the randomness test as defined by the National Institute of Standards and Technology (NIST) suite. The key-strings are then used in two SPoTKD (Self-Powered Timer Key Distribution) protocols that exploit the timer's dynamics as one-way functions: (a) protocol 1 facilitates secure communications between a user and a remote Server, and (b) protocol 2 facilitates secure communications between two users. In this paper, we investigate the security of these protocols under standard model and against different adversarial attacks. Using Monte-Carlo simulations, we also investigate the robustness of these protocols in the presence of real-world operating conditions and propose error-correcting SPoTKD protocols to mitigate these noise-related artifacts.

LGSep 7, 2020
Implicit Multidimensional Projection of Local Subspaces

Rongzheng Bian, Yumeng Xue, Liang Zhou et al.

We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze the shape and directional information of the local subspace to gain more insights into the global structure of the data through the perception of local structures. Local subspaces are fitted by multidimensional ellipses that are spanned by basis vectors. An accurate and efficient vector transformation method is proposed based on analytical differentiation of multidimensional projections formulated as implicit functions. The results are visualized as glyphs and analyzed using a full set of specifically-designed interactions supported in our efficient web-based visualization tool. The usefulness of our method is demonstrated using various multi- and high-dimensional benchmark datasets. Our implicit differentiation vector transformation is evaluated through numerical comparisons; the overall method is evaluated through exploration examples and use cases.

CVApr 21, 2020
Fast and Robust Registration of Aerial Images and LiDAR data Based on Structrual Features and 3D Phase Correlation

Bai Zhu, Yuanxin Ye, Chao Yang et al.

Co-Registration of aerial imagery and Light Detection and Ranging (LiDAR) data is quilt challenging because the different imaging mechanism causes significant geometric and radiometric distortions between such data. To tackle the problem, this paper proposes an automatic registration method based on structural features and three-dimension (3D) phase correlation. In the proposed method, the LiDAR point cloud data is first transformed into the intensity map, which is used as the reference image. Then, we employ the Fast operator to extract uniformly distributed interest points in the aerial image by a partition strategy and perform a local geometric correction by using the collinearity equation to eliminate scale and rotation difference between images. Subsequently, a robust structural feature descriptor is build based on dense gradient features, and the 3D phase correlation is used to detect control points (CPs) between aerial images and LiDAR data in the frequency domain, where the image matching is accelerated by the 3D Fast Fourier Transform (FFT). Finally, the obtained CPs are employed to correct the exterior orientation elements, which is used to achieve co-registration of aerial images and LiDAR data. Experiments with two datasets of aerial images and LiDAR data show that the proposed method is much faster and more robust than state of the art methods

MLMar 10, 2020
Generalized Energy Based Models

Michael Arbel, Liang Zhou, Arthur Gretton

We introduce the Generalized Energy Based Model (GEBM) for generative modelling. These models combine two trained components: a base distribution (generally an implicit model), which can learn the support of data with low intrinsic dimension in a high dimensional space; and an energy function, to refine the probability mass on the learned support. Both the energy function and base jointly constitute the final model, unlike GANs, which retain only the base distribution (the "generator"). GEBMs are trained by alternating between learning the energy and the base. We show that both training stages are well-defined: the energy is learned by maximising a generalized likelihood, and the resulting energy-based loss provides informative gradients for learning the base. Samples from the posterior on the latent space of the trained model can be obtained via MCMC, thus finding regions in this space that produce better quality samples. Empirically, the GEBM samples on image-generation tasks are of much better quality than those from the learned generator alone, indicating that all else being equal, the GEBM will outperform a GAN of the same complexity. When using normalizing flows as base measures, GEBMs succeed on density modelling tasks, returning comparable performance to direct maximum likelihood of the same networks.

CRNov 24, 2017
A Fowler-Nordheim Integrator can Track the Density of Prime Numbers

Liang Zhou, SriHarsha Kondapalli, Shantanu Chakrabartty

"Does there exist a naturally occurring counting device that might elucidate the hidden structure of prime numbers ?" is a question that has fascinated computer scientists and mathematical physicists for decades. While most recent research in this area have explored the role of the Riemann zeta-function in different formulations of statistical mechanics, condensed matter physics and quantum chaotic systems, the resulting devices (quantum or classical) have only existed in theory or the fabrication of the device has been found to be not scalable to large prime numbers. Here we report for the first time that any hypothetical prime number generator, to our knowledge, has to be a special case of a dynamical system that is governed by the physics of Fowler-Nordheim (FN) quantum-tunneling. In this paper we report how such a dynamical system can be implemented using a counting process that naturally arises from sequential FN tunneling and integration of electrons on a floating-gate (FG) device. The self-compensating physics of the FG device makes the operation reliable and repeatable even when tunneling-currents approach levels below 1 attoamperes. We report measured results from different variants of fabricated prototypes, each of which shows an excellent match with the asymptotic prime number statistics. We also report similarities between the spectral signatures produced by the FN device and the spectral statistics of a hypothetical prime number sequence generator. We believe that the proposed floating-gate device could have future implications in understanding the process that generates prime numbers with applications in security and authentication.

AIJul 31, 2017
Cost and Actual Causation

Liang Zhou

I propose the purpose our concept of actual causation serves is minimizing various cost in intervention practice. Actual causation has three features: nonredundant sufficiency, continuity and abnormality; these features correspond to the minimization of exploitative cost, exploratory cost and risk cost in intervention practice. Incorporating these three features, a definition of actual causation is given. I test the definition in 66 causal cases from actual causation literature and show that this definition's application fit intuition better than some other causal modelling based definitions.

CRJun 5, 2017
A Privacy-preserving Community-based P2P OSNs Using Broadcast Encryption Supporting Recommendation Mechanism

Ruihui Zhao, Mingjie Ding, Keiichi Koyanagi et al.

Online Social Networks (OSNs) have become one of the most important activities on the Internet, such as Facebook and Google+. However, security and privacy have become major concerns in existing C/S based OSNs. In this paper, we propose a novel scheme called a Privacy-preserving Community-based P2P OSNs Using Broadcast Encryption Supporting Recommendation Mechanism (PCBE) that supports cross-platform availability in stringent privacy requirements. For the first time, we introduce recommendation mechanism into a privacy-preserving P2P based OSNs, in which we firstly employ the Open Directory Project to generate user interest model. We firstly introduce broadcast encryption into P2P community-based social networks together with reputation mechanism to decrease the system overhead. We formulate the security requirements and design goals for privacy- preserving P2P based OSNs supporting recommendation mechanism. The RESTful web-services help to ensure cross-platform availability and transmission security. As a result, thorough security analysis and performance evaluation on experiments demonstrate that the PCBE scheme indeed accords with our proposed design goals.