Honglin Liu

CV
h-index13
5papers
103citations
Novelty52%
AI Score49

5 Papers

35.3CVMay 11
Halo Separation-guided Underwater Multi-scale Image Restoration

Jiaxin Yang, Honglin Liu, Yongli Wang et al.

Underwater images captured by Autonomous Underwater Vehicles (AUVs) are inevitably affected by artificial light sources, which often produce halos in the foreground of the camera and seriously interfere with the quality of the image. The existing underwater image enhancement methods fail to fully consider this key problem, and the robustness of processing images under artificial light scenes is poor. In practical applications, since underwater image enhancement itself is a very challenging task, the influence of artificial light sources will lead to serious degradation of image performance and affect subsequent vision tasks. In order to effectively deal with this problem, this paper designs a single halo image correction method based on an iterative structure. The network is mainly divided into two sub-networks, one is the halo layer separation sub-network which aims to separate the halo by gradient minimization, and the other is the multi-scale recovery sub-network which aims to recover the image information masked by halo. The UIEB and EUVP synthetic datasets are used for training to ensure that the network can fully learn the characteristics and laws of underwater halo images. Then a large number of halo images taken in an underwater environment with real artificial light are collected for testing. In addition, the brightness distribution characteristics of underwater halo images are analyzed and the radial gradient is introduced to constraint eliminate halo to improve the effect of underwater image restoration.

CVOct 6, 2025Code
Conditional Representation Learning for Customized Tasks

Honglin Liu, Chao Sun, Peng Hu et al.

Conventional representation learning methods learn a universal representation that primarily captures dominant semantics, which may not always align with customized downstream tasks. For instance, in animal habitat analysis, researchers prioritize scene-related features, whereas universal embeddings emphasize categorical semantics, leading to suboptimal results. As a solution, existing approaches resort to supervised fine-tuning, which however incurs high computational and annotation costs. In this paper, we propose Conditional Representation Learning (CRL), aiming to extract representations tailored to arbitrary user-specified criteria. Specifically, we reveal that the semantics of a space are determined by its basis, thereby enabling a set of descriptive words to approximate the basis for a customized feature space. Building upon this insight, given a user-specified criterion, CRL first employs a large language model (LLM) to generate descriptive texts to construct the semantic basis, then projects the image representation into this conditional feature space leveraging a vision-language model (VLM). The conditional representation better captures semantics for the specific criterion, which could be utilized for multiple customized tasks. Extensive experiments on classification and retrieval tasks demonstrate the superiority and generality of the proposed CRL. The code is available at https://github.com/XLearning-SCU/2025-NeurIPS-CRL.

CVJun 11, 2021Code
MlTr: Multi-label Classification with Transformer

Xing Cheng, Hezheng Lin, Xiangyu Wu et al.

The task of multi-label image classification is to recognize all the object labels presented in an image. Though advancing for years, small objects, similar objects and objects with high conditional probability are still the main bottlenecks of previous convolutional neural network(CNN) based models, limited by convolutional kernels' representational capacity. Recent vision transformer networks utilize the self-attention mechanism to extract the feature of pixel granularity, which expresses richer local semantic information, while is insufficient for mining global spatial dependence. In this paper, we point out the three crucial problems that CNN-based methods encounter and explore the possibility of conducting specific transformer modules to settle them. We put forward a Multi-label Transformer architecture(MlTr) constructed with windows partitioning, in-window pixel attention, cross-window attention, particularly improving the performance of multi-label image classification tasks. The proposed MlTr shows state-of-the-art results on various prevalent multi-label datasets such as MS-COCO, Pascal-VOC, and NUS-WIDE with 88.5%, 95.8%, and 65.5% respectively. The code will be available soon at https://github.com/starmemda/MlTr/

LGOct 15, 2024
Cross-Dataset Generalization in Deep Learning

Xuyu Zhang, Haofan Huang, Dawei Zhang et al.

Deep learning has been extensively used in various fields, such as phase imaging, 3D imaging reconstruction, phase unwrapping, and laser speckle reduction, particularly for complex problems that lack analytic models. Its data-driven nature allows for implicit construction of mathematical relationships within the network through training with abundant data. However, a critical challenge in practical applications is the generalization issue, where a network trained on one dataset struggles to recognize an unknown target from a different dataset. In this study, we investigate imaging through scattering media and discover that the mathematical relationship learned by the network is an approximation dependent on the training dataset, rather than the true mapping relationship of the model. We demonstrate that enhancing the diversity of the training dataset can improve this approximation, thereby achieving generalization across different datasets, as the mapping relationship of a linear physical model is independent of inputs. This study elucidates the nature of generalization across different datasets and provides insights into the design of training datasets to ultimately address the generalization issue in various deep learning-based applications.

CRJan 26, 2022
Speckle-based optical cryptosystem and its application for human face recognition via deep learning

Qi Zhao, Huanhao Li, Zhipeng Yu et al.

Face recognition has recently become ubiquitous in many scenes for authentication or security purposes. Meanwhile, there are increasing concerns about the privacy of face images, which are sensitive biometric data that should be carefully protected. Software-based cryptosystems are widely adopted nowadays to encrypt face images, but the security level is limited by insufficient digital secret key length or computing power. Hardware-based optical cryptosystems can generate enormously longer secret keys and enable encryption at light speed, but most reported optical methods, such as double random phase encryption, are less compatible with other systems due to system complexity. In this study, a plain yet high-efficient speckle-based optical cryptosystem is proposed and implemented. A scattering ground glass is exploited to generate physical secret keys of gigabit length and encrypt face images via seemingly random optical speckles at light speed. Face images can then be decrypted from the random speckles by a well-trained decryption neural network, such that face recognition can be realized with up to 98% accuracy. The proposed cryptosystem has wide applicability, and it may open a new avenue for high-security complex information encryption and decryption by utilizing optical speckles.