CVApr 29
TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in TransportationHan Gong, Zhen Zhou, Yunyang Shi et al.
Large language models (LLMs) and multimodal large models (MLLMs) are increasingly used for transportation tasks such as regulation question answering, traffic management support, engineering review, and autonomous-driving scene reasoning. Yet transportation workflows are rule-intensive, computation-intensive, safety-critical, and inherently multimodal. Existing general benchmarks provide limited evidence of whether a model can apply regulations correctly, perform verifiable engineering calculations, or interpret traffic scenes reliably, while the small number of public transportation benchmarks remain narrow in scope and rarely support fine-grained diagnosis across text, images, and point-cloud data. To address this gap, we present TRIP-Evaluate, an open multimodal benchmark for large models in transportation. The benchmark organizes 837 items using a role-task-knowledge taxonomy that covers vehicle, traffic-management, traveler, and planning-and-design functions. Each item is annotated with capability, modality, and difficulty labels, enabling diagnosis from overall accuracy down to specific failure modes. The current release includes 596 text items, 198 image items, and 43 point-cloud items. TRIP-Evaluate also standardizes item construction, quality control, prompting, decoding, and scoring to improve cross-model comparability. Results on a diverse panel of models show that text-based performance is improving, but substantial weaknesses remain in multi-step engineering calculation, rule-constrained reasoning, multimodal scene understanding, and point-cloud understanding. Overall, TRIP-Evaluate provides a reproducible, diagnosable, and engineering-aligned evaluation baseline for model selection, regression testing, and safer deployment in transportation applications.
LGFeb 16
Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization MisalignmentHong Li, Zhen Zhou, Honggang Zhang et al.
Data-parallel (DP) training with synchronous all-reduce is a dominant paradigm for full-parameter fine-tuning of large language models (LLMs). While parameter synchronization guarantees numerical equivalence of model weights after each iteration, it does not necessarily imply alignment of worker-level optimization dynamics before gradient aggregation. This paper identifies and studies this latent mismatch, termed \emph{silent inconsistency}, where cross-worker divergence in losses and gradients can remain invisible under conventional aggregated monitoring signals. We propose a lightweight, model-agnostic diagnostic framework that quantifies worker-level consistency using training signals readily available in standard pipelines. Specifically, we introduce three complementary metrics: loss dispersion, gradient-norm dispersion, and gradient-direction consistency measured by inter-worker cosine similarity. The proposed metrics incur negligible overhead and require no modification to model architecture, synchronization mechanisms, or optimization algorithms. We validate the framework by fully fine-tuning the 1B-parameter \texttt{openPangu-Embedded-1B-V1.1} model on the \texttt{tatsu-lab/alpaca} dataset using an 8-NPU DP setup, under controlled perturbations of cross-rank stochasticity. Experimental results show that progressively desynchronized data shuffling and random seeds lead to substantial increases in loss/gradient dispersion and reduced directional alignment, despite smooth globally averaged loss curves. These findings demonstrate that the proposed indicators provide actionable visibility into hidden instability modes in large-scale DP fine-tuning, enabling more reliable diagnosis and configuration assessment.
CVJul 19, 2025
Adaptive 3D Gaussian Splatting Video Streaming: Visual Saliency-Aware Tiling and Meta-Learning-Based Bitrate AdaptationHan Gong, Qiyue Li, Jie Li et al.
3D Gaussian splatting video (3DGS) streaming has recently emerged as a research hotspot in both academia and industry, owing to its impressive ability to deliver immersive 3D video experiences. However, research in this area is still in its early stages, and several fundamental challenges, such as tiling, quality assessment, and bitrate adaptation, require further investigation. In this paper, we tackle these challenges by proposing a comprehensive set of solutions. Specifically, we propose an adaptive 3DGS tiling technique guided by saliency analysis, which integrates both spatial and temporal features. Each tile is encoded into versions possessing dedicated deformation fields and multiple quality levels for adaptive selection. We also introduce a novel quality assessment framework for 3DGS video that jointly evaluates spatial-domain degradation in 3DGS representations during streaming and the quality of the resulting 2D rendered images. Additionally, we develop a meta-learning-based adaptive bitrate algorithm specifically tailored for 3DGS video streaming, achieving optimal performance across varying network conditions. Extensive experiments demonstrate that our proposed approaches significantly outperform state-of-the-art methods.
CVJul 19, 2025
Adaptive 3D Gaussian Splatting Video StreamingHan Gong, Qiyue Li, Zhi Liu et al.
The advent of 3D Gaussian splatting (3DGS) has significantly enhanced the quality of volumetric video representation. Meanwhile, in contrast to conventional volumetric video, 3DGS video poses significant challenges for streaming due to its substantially larger data volume and the heightened complexity involved in compression and transmission. To address these issues, we introduce an innovative framework for 3DGS volumetric video streaming. Specifically, we design a 3DGS video construction method based on the Gaussian deformation field. By employing hybrid saliency tiling and differentiated quality modeling of 3DGS video, we achieve efficient data compression and adaptation to bandwidth fluctuations while ensuring high transmission quality. Then we build a complete 3DGS video streaming system and validate the transmission performance. Through experimental evaluation, our method demonstrated superiority over existing approaches in various aspects, including video quality, compression effectiveness, and transmission rate.
CVJun 6, 2020
Simple Primary Colour Editing for Consumer Product ImagesHan Gong, Luwen Yu, Stephen Westland
We present a simple primary colour editing method for consumer product images. We show that by using colour correction and colour blending, we can automate the pain-staking colour editing task and save time for consumer colour preference researchers. To improve the colour harmony between the primary colour and its complementary colours, our algorithm also tunes the other colours in the image. Preliminary experiment has shown some promising results compared with a state-of-the-art method and human editing.
CVJan 14, 2020
Convolutional Mean: A Simple Convolutional Neural Network for Illuminant EstimationHan Gong
We present Convolutional Mean (CM) - a simple and fast convolutional neural network for illuminant estimation. Our proposed method only requires a small neural network model (1.1K parameters) and a 48 x 32 thumbnail input image. Our unoptimized Python implementation takes 1 ms/image, which is arguably 3-3750x faster than the current leading solutions with similar accuracy. Using two public datasets, we show that our proposed light-weight method offers accuracy comparable to the current leading methods' (which consist of thousands/millions of parameters) across several measures.
CVJul 27, 2017
Concise Radiometric Calibration Using The Power of RankingHan Gong, Graham D. Finlayson, Maryam M. Darrodi
Compared with raw images, the more common JPEG images are less useful for machine vision algorithms and professional photographers because JPEG-sRGB does not preserve a linear relation between pixel values and the light measured from the scene. A camera is said to be radiometrically calibrated if there is a computational model which can predict how the raw linear sensor image is mapped to the corresponding rendered image (e.g. JPEGs) and vice versa. This paper begins with the observation that the rank order of pixel values are mostly preserved post colour correction. We show that this observation is the key to solving for the whole camera pipeline (colour correction, tone and gamut mapping). Our rank-based calibration method is simpler than the prior art and so is parametrised by fewer variables which, concomitantly, can be solved for using less calibration data. Another advantage is that we can derive the camera pipeline from a single pair of raw-JPEG images. Experiments demonstrate that our method delivers state-of-the-art results (especially for the most interesting case of JPEG to raw).
CVAug 4, 2016
Recoding Color Transfer as a Color HomographyHan Gong, Graham D. Finlayson, Robert B. Fisher
Color transfer is an image editing process that adjusts the colors of a picture to match a target picture's color theme. A natural color transfer not only matches the color styles but also prevents after-transfer artifacts due to image compression, noise, and gradient smoothness change. The recently discovered color homography theorem proves that colors across a change in photometric viewing condition are related by a homography. In this paper, we propose a color-homography-based color transfer decomposition which encodes color transfer as a combination of chromaticity shift and shading adjustment. A powerful form of shading adjustment is shown to be a global shading curve by which the same shading homography can be applied elsewhere. Our experiments show that the proposed color transfer decomposition provides a very close approximation to many popular color transfer methods. The advantage of our approach is that the learned color transfer can be applied to many other images (e.g. other frames in a video), instead of a frame-to-frame basis. We demonstrate two applications for color transfer enhancement and video color grading re-application. This simple model of color transfer is also important for future color transfer algorithm design.
CVAug 2, 2016
Interactive Removal and Ground Truth for Difficult Shadow ScenesHan Gong, Darren P. Cosker
A user-centric method for fast, interactive, robust and high-quality shadow removal is presented. Our algorithm can perform detection and removal in a range of difficult cases: such as highly textured and colored shadows. To perform detection an on-the-fly learning approach is adopted guided by two rough user inputs for the pixels of the shadow and the lit area. After detection, shadow removal is performed by registering the penumbra to a normalized frame which allows us efficient estimation of non-uniform shadow illumination changes, resulting in accurate and robust removal. Another major contribution of this work is the first validated and multi-scene category ground truth for shadow removal algorithms. This data set containing 186 images eliminates inconsistencies between shadow and shadow-free images and provides a range of different shadow types such as soft, textured, colored and broken shadow. Using this data, the most thorough comparison of state-of-the-art shadow removal methods to date is performed, showing our proposed new algorithm to outperform the state-of-the-art across several measures and shadow category. To complement our dataset, an online shadow removal benchmark website is also presented to encourage future open comparisons in this challenging field of research.
CVJul 20, 2016
Interactive Illumination InvarianceHan Gong, Graham Finlayson
Illumination effects cause problems for many computer vision algorithms. We present a user-friendly interactive system for robust illumination-invariant image generation. Compared with the previous automated illumination-invariant image derivation approaches, our system enables users to specify a particular kind of illumination variation for removal. The derivation of illumination-invariant image is guided by the user input. The input is a stroke that defines an area covering a set of pixels whose intensities are influenced predominately by the illumination variation. This additional flexibility enhances the robustness for processing non-linearly rendered images and the images of the scenes where their illumination variations are difficult to estimate automatically. Finally, we present some evaluation results of our method.
CVJul 20, 2016
Color Homography Color CorrectionGraham D. Finlayson, Han Gong, Robert B. Fisher
Homographies -- a mathematical formalism for relating image points across different camera viewpoints -- are at the foundations of geometric methods in computer vision and are used in geometric camera calibration, image registration, and stereo vision and other tasks. In this paper, we show the surprising result that colors across a change in viewing condition (changing light color, shading and camera) are also related by a homography. We propose a new color correction method based on color homography. Experiments demonstrate that solving the color homography problem leads to more accurate calibration.
CVMay 13, 2016
Color HomographyGraham D. Finlayson, Han Gong, Robert B. Fisher
We show the surprising result that colors across a change in viewing condition (changing light color, shading and camera) are related by a homography. Our homography color correction application delivers improved color fidelity compared with the linear least-square.