CV, AI, HC | Dec 9, 2025

Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning

Jing Jie Tan, Anissa Mokraoui, Ban-Hoe Kwan, Danny Wee-Kiat Ng, Yan-Chai Hum

arXiv:2512.08873v1 | 2 citations | h-index: 15 | 2024 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)

Originality: Incremental advance

AI Analysis

This work addresses lightweight captioning of low-resolution images. The advance is incremental: it builds on existing methods to reduce resource demands.

The paper tackles the challenge of generating captions for low-resolution images by proposing SOLI, a Siamese network-based method that optimizes latent embeddings to improve efficiency and accuracy, achieving competitive performance with reduced computational overhead.

Image captioning is essential in many fields, including assisting visually impaired individuals, improving content management systems, and enhancing human-computer interaction. However, a recent challenge in this domain is dealing with low-resolution images (LRIs). While performance can be improved by using larger models such as transformers for encoding, these models are typically heavyweight, demanding significant computational resources and memory, which makes retraining difficult. To address this, the proposed SOLI (Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning) approach presents a solution specifically designed for lightweight, low-resolution image captioning. It employs a Siamese network architecture to optimize latent embeddings, enhancing the efficiency and accuracy of the image-to-text translation process. By focusing on a dual-pathway neural network structure, SOLI minimizes computational overhead without sacrificing performance, making it well suited to training in resource-constrained scenarios.
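The abstract does not spell out how the two pathways are wired, so the following is only a minimal sketch of the general Siamese idea, not the SOLI implementation. It assumes (hypothetically) that one branch sees the original image, the other sees a degraded low-resolution copy, both branches share one encoder, and the objective is a cosine distance between the two latent embeddings. The names `downsample`, `upsample`, `encode`, and `siamese_loss` are illustrative and do not come from the paper.

```python
import math


def downsample(img, factor=2):
    """Average-pool a square 2D list by `factor` (stand-in for a low-resolution input)."""
    n = len(img) // factor
    return [
        [
            sum(img[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(n)
        ]
        for i in range(n)
    ]


def upsample(img, factor=2):
    """Nearest-neighbour upsampling so both branches receive the same input shape."""
    size = len(img) * factor
    return [[img[i // factor][j // factor] for j in range(size)] for i in range(size)]


def encode(img, weights):
    """Shared encoder (assumption: a single linear layer over the flattened image)."""
    flat = [p for row in img for p in row]
    return [sum(w * x for w, x in zip(w_row, flat)) for w_row in weights]


def cosine_distance(a, b):
    """1 - cosine similarity; near 0 when the two embeddings point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def siamese_loss(img, weights, factor=2):
    """Embed the original and degraded image with the SAME weights and compare them."""
    z_high = encode(img, weights)
    z_low = encode(upsample(downsample(img, factor), factor), weights)
    return cosine_distance(z_high, z_low)
```

In a real training loop this distance would be minimized with respect to the encoder weights, typically alongside a captioning loss; whether SOLI combines its objectives this way is not stated in the abstract.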
