Yihong Wang

CV
h-index28
12papers
68citations
Novelty44%
AI Score38

12 Papers

CVOct 8, 2023
ITRE: Low-light Image Enhancement Based on Illumination Transmission Ratio Estimation

Yu Wang, Yihong Wang, Tong Liu et al.

Noise, artifacts, and over-exposure are significant challenges in the field of low-light image enhancement. Existing methods often struggle to address these issues simultaneously. In this paper, we propose a novel Retinex-based method, called ITRE, which suppresses noise and artifacts from the origin of the model, prevents over-exposure throughout the enhancement process. Specifically, we assume that there must exist a pixel which is least disturbed by low light within pixels of same color. First, clustering the pixels on the RGB color space to find the Illumination Transmission Ratio (ITR) matrix of the whole image, which determines that noise is not over-amplified easily. Next, we consider ITR of the image as the initial illumination transmission map to construct a base model for refined transmission map, which prevents artifacts. Additionally, we design an over-exposure module that captures the fundamental characteristics of pixel over-exposure and seamlessly integrate it into the base model. Finally, there is a possibility of weak enhancement when inter-class distance of pixels with same color is too small. To counteract this, we design a Robust-Guard module that safeguards the robustness of the image enhancement process. Extensive experiments demonstrate the effectiveness of our approach in suppressing noise, preventing artifacts, and controlling over-exposure level simultaneously. Our method performs superiority in qualitative and quantitative performance evaluations by comparing with state-of-the-art methods.

CVNov 30, 2022
Gradient Domain Weighted Guided Image Filtering

Bo Wang, Yihong Wang, Xiubao Sui et al.

Guided image filter is a well-known local filter in image processing. However, the presence of halo artifacts is a common issue associated with this type of filter. This paper proposes an algorithm that utilizes gradient information to accurately identify the edges of an image. Furthermore, the algorithm uses weighted information to distinguish flat areas from edge areas, resulting in sharper edges and reduced blur in flat areas. This approach mitigates the excessive blurring near edges that often leads to halo artifacts. Experimental results demonstrate that the proposed algorithm significantly suppresses halo artifacts at the edges, making it highly effective for both image denoising and detail enhancement.

CVSep 27, 2025Code
FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection

Ben Liang, Yuan Liu, Bingwen Qiu et al.

Aerial-view object detection is a critical technology for real-world applications such as natural resource monitoring, traffic management, and UAV-based search and rescue. Detecting tiny objects in high-resolution aerial imagery presents a long-standing challenge due to their limited visual cues and the difficulty of modeling global context in complex scenes. Existing methods are often hampered by delayed contextual fusion and inadequate non-linear modeling, failing to effectively use global information to refine shallow features and thus encountering a performance bottleneck. To address these challenges, we propose FMC-DETR, a novel framework with frequency-decoupled fusion for aerial-view object detection. First, we introduce the Wavelet Kolmogorov-Arnold Transformer (WeKat) backbone, which applies cascaded wavelet transforms to enhance global low-frequency context perception in shallow features while preserving fine-grained details, and employs Kolmogorov-Arnold networks to achieve adaptive non-linear modeling of multi-scale dependencies. Next, a lightweight Cross-stage Partial Fusion (CPF) module reduces redundancy and improves multi-scale feature interaction. Finally, we introduce the Multi-Domain Feature Coordination (MDFC) module, which unifies spatial, frequency, and structural priors to to balance detail preservation and global enhancement. Extensive experiments on benchmark aerial-view datasets demonstrate that FMC-DETR achieves state-of-the-art performance with fewer parameters. On the challenging VisDrone dataset, our model achieves improvements of 6.5% AP and 8.2% AP50 over the baseline, highlighting its effectiveness in tiny object detection. The code can be accessed at https://github.com/bloomingvision/FMC-DETR.

IRDec 21, 2024Code
Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval

Luo Ji, Feixiang Guo, Teng Chen et al.

Despite the recent advancement in Retrieval-Augmented Generation (RAG) systems, most retrieval methodologies are often developed for factual retrieval, which assumes query and positive documents are semantically similar. In this paper, we instead propose and study a more challenging type of retrieval task, called hidden rationale retrieval, in which query and document are not similar but can be inferred by reasoning chains, logic relationships, or empirical experiences. To address such problems, an instruction-tuned Large language model (LLM) with a cross-encoder architecture could be a reasonable choice. To further strengthen pioneering LLM-based retrievers, we design a special instruction that transforms the retrieval task into a generative task by prompting LLM to answer a binary-choice question. The model can be fine-tuned with direct preference optimization (DPO). The framework is also optimized for computational efficiency with no performance degradation. We name this retrieval framework by RaHoRe and verify its zero-shot and fine-tuned performance superiority on Emotional Support Conversation (ESC), compared with previous retrieval works. Our study suggests the potential to employ LLM as a foundation for a wider scope of retrieval tasks. Our codes, models, and datasets are available on https://github.com/flyfree5/LaHoRe.

CLSep 10, 2024
LaMsS: When Large Language Models Meet Self-Skepticism

Yetao Wu, Yihong Wang, Teng Chen et al.

Hallucination is a major challenge for large language models (LLMs), preventing their further application in some fields. The skeptical thinking of humankind could be useful for LLMs to self-cognition, self-reflection and alleviate their hallucinations. Inspired by this consideration, we propose a novel approach called LaMsS, which combines the semantic understanding capability of LLMs with self-skepticism. By introducing a series of skepticism tokens and augmenting them into the vocabulary, we conduct both pertaining and finetuning, which allow the LLM to decode each normal token followed by a skeptical token, representing different skepticism levels. By calculating the response skepticism given a query, one can define a new self-aware LLM which is only willing to answer with relative lower skepticism level than the threshold. By examining the accuracy, AUC and AP of willingly answering questions, we demonstrate that LaMsS achieves better performance than baselines on both multi-choice questions and open-domain question-answering benchmarks, and can generalize to multi-task and out-of-domain settings. Our study sheds some lights on the self-skepticism modeling on further artificial intelligence. Project code and model checkpoints can be found in https://anonymous.4open.science/r/SM-1E76.

CLFeb 5, 2024
LB-KBQA: Large-language-model and BERT based Knowledge-Based Question and Answering System

Yan Zhao, Zhongyun Li, Yushan Pan et al.

Generative Artificial Intelligence (AI), because of its emergent abilities, has empowered various fields, one typical of which is large language models (LLMs). One of the typical application fields of Generative AI is large language models (LLMs), and the natural language understanding capability of LLM is dramatically improved when compared with conventional AI-based methods. The natural language understanding capability has always been a barrier to the intent recognition performance of the Knowledge-Based-Question-and-Answer (KBQA) system, which arises from linguistic diversity and the newly appeared intent. Conventional AI-based methods for intent recognition can be divided into semantic parsing-based and model-based approaches. However, both of the methods suffer from limited resources in intent recognition. To address this issue, we propose a novel KBQA system based on a Large Language Model(LLM) and BERT (LB-KBQA). With the help of generative AI, our proposed method could detect newly appeared intent and acquire new knowledge. In experiments on financial domain question answering, our model has demonstrated superior effectiveness.

QMMay 20, 2025
Predicting Neoadjuvant Chemotherapy Response in Triple-Negative Breast Cancer Using Pre-Treatment Histopathologic Images

Hikmat Khan, Ziyu Su, Huina Zhang et al.

Triple-negative breast cancer (TNBC) remains a major clinical challenge due to its aggressive behavior and lack of targeted therapies. Accurate early prediction of response to neoadjuvant chemotherapy (NACT) is essential for guiding personalized treatment strategies and improving patient outcomes. In this study, we present an attention-based multiple instance learning (MIL) framework designed to predict pathologic complete response (pCR) directly from pre-treatment hematoxylin and eosin (H&E)-stained biopsy slides. The model was trained on a retrospective in-house cohort of 174 TNBC patients and externally validated on an independent cohort (n = 30). It achieved a mean area under the curve (AUC) of 0.85 during five-fold cross-validation and 0.78 on external testing, demonstrating robust predictive performance and generalizability. To enhance model interpretability, attention maps were spatially co-registered with multiplex immuno-histochemistry (mIHC) data stained for PD-L1, CD8+ T cells, and CD163+ macrophages. The attention regions exhibited moderate spatial overlap with immune-enriched areas, with mean Intersection over Union (IoU) scores of 0.47 for PD-L1, 0.45 for CD8+ T cells, and 0.46 for CD163+ macrophages. The presence of these biomarkers in high-attention regions supports their biological relevance to NACT response in TNBC. This not only improves model interpretability but may also inform future efforts to identify clinically actionable histological biomarkers directly from H&E-stained biopsy slides, further supporting the utility of this approach for accurate NACT response prediction and advancing precision oncology in TNBC.

CLJul 17, 2025
Making Language Model a Hierarchical Classifier

Yihong Wang, Zhonglin Jiang, Ningyuan Xi et al.

Decoder-only language models, such as GPT and LLaMA, generally decode on the last layer. Motivated by human's hierarchical thinking capability, we propose that a hierarchical decoder architecture could be built with different layers decoding texts simultaneously. Due to limited time and computationally resources, we choose to adapt a pretrained language model into this form of hierarchical decoder. Language heads of the last layer are copied to different selected intermediate layers, and fine-tuned with different task inputs. By thorough experiments, we validate that these selective intermediate layers could be adapted to speak meaningful and reasonable contents, and this paradigm of hierarchical decoder can obtain state-of-the-art performances on multiple tasks such as hierarchical text classification, classification-guided generation, and hierarchical text generation. HdLM outperforms all baselines on WoS, DBpedia, ESconv, EmpatheticDialogues, and several cognitive tests. We also provide thorough theoretical analysis to validate the convergence and computational savings of our methodology. This study suggests the possibility of a generalized hierarchical reasoner, pretraining from scratch.

HCFeb 26, 2021
Casual and Hardcore Player Traits and Gratifications of Pokémon GO, Harry Potter: Wizards Unite, Ingress

John Dunham, Konstantinos Papangelis, Nicolas LaLone et al.

Location-based games (LBG) impose virtual spaces on top of physical locations. Studies have explored LBG from various perspectives. However, a comprehensive study of who these players are, their traits, their gratifications, and the links between them is conspicuously absent from the literature. In this paper, we aim to address this lacuna through a series of surveys with 2390 active LBG players utilizing Tondello's Player Traits Model and Scale of Game playing Preferences, and Hamari's scale of LBG gratifications. Our findings (1) illustrate an association between player satisfaction and social aspects of the studied games, (2) explicate how the core-loops of the studied games impact the expressed gratifications and the affine traits of players, and (3) indicate a strong distinction between hardcore and casual players based on both traits and gratifications. Overall our findings shed light into the players of LBG, their traits, and gratifications they derive from playing LBGs.

CVSep 17, 2018
Computer-Aided Diagnosis of Label-Free 3-D Optical Coherence Microscopy Images of Human Cervical Tissue

Yutao Ma, Tao Xu, Xiaolei Huang et al.

Objective: Ultrahigh-resolution optical coherence microscopy (OCM) has recently demonstrated its potential for accurate diagnosis of human cervical diseases. One major challenge for clinical adoption, however, is the steep learning curve clinicians need to overcome to interpret OCM images. Developing an intelligent technique for computer-aided diagnosis (CADx) to accurately interpret OCM images will facilitate clinical adoption of the technology and improve patient care. Methods: 497 high-resolution 3-D OCM volumes (600 cross-sectional images each) were collected from 159 ex vivo specimens of 92 female patients. OCM image features were extracted using a convolutional neural network (CNN) model, concatenated with patient information (e.g., age, HPV results), and classified using a support vector machine classifier. Ten-fold cross-validations were utilized to test the performance of the CADx method in a five-class classification task and a binary classification task. Results: An 88.3 plus or minus 4.9% classification accuracy was achieved for five fine-grained classes of cervical tissue, namely normal, ectropion, low-grade and high-grade squamous intraepithelial lesions (LSIL and HSIL), and cancer. In the binary classification task (low-risk [normal, ectropion and LSIL] vs. high-risk [HSIL and cancer]), the CADx method achieved an area-under-the-curve (AUC) value of 0.959 with an 86.7 plus or minus 11.4% sensitivity and 93.5 plus or minus 3.8% specificity. Conclusion: The proposed deep-learning based CADx method outperformed three human experts. It was also able to identify morphological characteristics in OCM images that were consistent with histopathological interpretations. Significance: Label-free OCM imaging, combined with deep-learning based CADx methods, hold a great promise to be used in clinical settings for the effective screening and diagnosis of cervical diseases.

NAOct 28, 2016
Uniform convergent scheme for strongly anisotropic diffusion equations with closed field lines

Yihong Wang, Wenjun Ying, Min Tang

In magnetized plasma, the magnetic field confines particles around field lines. The ratio between the intensity of the parallel and perpendicular viscosity or heat conduction may reach the order of $10^{12}$. When the magnetic fields have closed field lines and form a "magnetic island", the convergence order of most known schemes depends on the anisotropy strength. In this paper, by integration of the original differential equation along each closed field line, we introduce a simple but very efficient asymptotic preserving reformulation, which yields uniform convergence with respect to the anisotropy strength. Only slight modification to the original code is required and neither change of coordinates nor mesh adaptation is needed. Numerical examples demonstrating the performance of the new scheme are presented.

NAJul 29, 2016
An Asymptotic Preserving method for strongly anisotropic diffusion equations based on field line integration

Min Tang, Yihong Wang

In magnetized plasma, the magnetic field confines the particles around the field lines. The anisotropy intensity in the viscosity and heat conduction may reach the order of $10^{12}$. When the boundary conditions are periodic or Neumann, the strong diffusion leads to an ill-posed limiting problem. To remove the ill-conditionedness in the highly anisotropic diffusion equations, we introduce a simple but very efficient asymptotic preserving reformulation in this paper. The key idea is that, instead of discretizing the Neumann boundary conditions locally, we replace one of the Neumann boundary condition by the integration of the original problem along the field line, the singular $1/ε$ terms can be replaced by $O(1)$ terms after the integration, so that yields a well-posed problem. Small modifications to the original code are required and no change of coordinates nor mesh adaptation are needed. Uniform convergence with respect to the anisotropy strength $1/ε$ can be observed numerically and the condition number does not scale with the anisotropy.