CV · Sep 8, 2025
Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models
Jaemin Son, Sujin Choi, Inyong Yun
Recent progress in vision-language models (VLMs) has led to impressive results in document understanding tasks, but their high computational demands remain a challenge. To mitigate the compute burdens, we propose a lightweight token pruning framework that filters out non-informative background regions from document images prior to VLM processing. A binary patch-level classifier removes non-text areas, and a max-pooling refinement step recovers fragmented text regions to enhance spatial coherence. Experiments on real-world document datasets demonstrate that our approach substantially lowers computational costs, while maintaining comparable accuracy.
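The refinement step described above can be illustrated with a minimal sketch. It assumes the binary classifier has already produced a patch-level keep mask, that max-pooling uses an odd kernel with stride 1 and zero padding, and that "index-preserving" means kept patches retain their original row-major positions. The function names `refine_mask` and `kept_patch_indices` are hypothetical and are not taken from the paper.

```python
import numpy as np

def refine_mask(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary max-pooling (stride 1, zero padding) over a patch-level keep mask.

    Assumes k is odd. A patch is kept if any patch in its k x k neighborhood
    was classified as text, which reconnects fragmented text regions.
    """
    pad = k // 2
    padded = np.pad(mask.astype(bool), pad, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros((h, w), dtype=bool)
    # OR together all k*k shifted views; for booleans this equals max-pooling.
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def kept_patch_indices(mask: np.ndarray) -> np.ndarray:
    """Row-major indices of kept patches in the original grid (assumed convention)."""
    return np.flatnonzero(mask)

# Toy 5x5 patch grid with a single patch classified as text.
raw = np.zeros((5, 5), dtype=bool)
raw[2, 2] = True
refined = refine_mask(raw, k=3)
indices = kept_patch_indices(refined)
```

Only the patches at `indices` would be passed on to the VLM in this sketch; keeping the original indices lets positional information stay aligned with the unpruned grid.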
IV · Nov 9, 2020
EPSR: Edge Profile Super resolution
Jiun Lee, Jaekwang Kim, Inyong Yun
In this paper, we propose the Edge Profile Super Resolution (EPSR) method to preserve structural information and restore texture. We build EPSR by stacking modified Fractal Residual Network (mFRN) structures hierarchically and repeatedly. An mFRN is made up of many Residual Edge Profile Blocks (REPBs), each consisting of three modules: a Residual Efficient Channel Attention Block (RECAB) module, an Edge Profile (EP) module, and a Context Network (CN) module. RECAB produces more informative features with high-frequency components. From these features, the EP module produces structure-informed features by generating the edge profile itself. Finally, the CN module captures details by exploiting high-frequency information such as texture and structure with proper sharpness. By repeating this procedure in the mFRN structure, EPSR extracts high-fidelity features, which prevents texture loss and preserves structure with appropriate sharpness. Experimental results show that EPSR achieves competitive performance against state-of-the-art methods in the PSNR and SSIM evaluation metrics as well as in visual results.
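For readers unfamiliar with the PSNR metric mentioned above, the following is a minimal sketch of its standard definition, 10·log10(MAX² / MSE). It assumes images are float arrays scaled to [0, `max_value`]; the function name `psnr` and the parameter `max_value` are illustrative and do not come from the paper's evaluation code.

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
hr = np.full((8, 8), 0.5)
sr = hr + 0.1
score = psnr(hr, sr)
```

Published super-resolution results often compute PSNR on the luminance channel after cropping image borders; those conventions are not applied here.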
CV · Aug 27, 2020
Edge and Identity Preserving Network for Face Super-Resolution
Jonghyun Kim, Gen Li, Inyong Yun et al.
Face super-resolution (SR) has become an indispensable function in security solutions such as video surveillance and identification systems, but distortion of facial components remains a major challenge. Most state-of-the-art methods utilize facial priors with deep neural networks. These methods require extra labels, longer training time, and more computation memory. In this paper, we propose a novel Edge and Identity Preserving Network for face SR, named EIPNet, which minimizes the distortion by utilizing a lightweight edge block and identity information. We present an edge block that extracts perceptual edge information and concatenates it to the original feature maps at multiple scales. This structure progressively provides edge information during reconstruction to aggregate local and global structural information. Moreover, we define an identity loss function to preserve the identity of SR images. The identity loss function compares feature distributions between SR images and their ground truth to recover identities in SR images. In addition, we provide a luminance-chrominance error (LCE) to separately infer brightness and color information in SR images. The LCE method not only reduces the dependency on color information by separating brightness and color components but also enables our network to reflect differences between SR images and their ground truth in two color spaces, RGB and YUV. The proposed method enables the SR network to restore facial components in detail and generate high-quality 8x scaled SR images with a lightweight network structure. Furthermore, our network is able to reconstruct a 128x128 SR image at 215 fps on a GTX 1080Ti GPU. Extensive experiments demonstrate that our network qualitatively and quantitatively outperforms state-of-the-art methods on two challenging datasets: CelebA and VGGFace2.
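The abstract does not give the exact form of the luminance-chrominance error, so the following is only a rough sketch of the idea of comparing an SR image with its ground truth in both RGB and YUV, with brightness (Y) and color (U, V) handled as separate terms. The L1 distance, the BT.601 conversion matrix, the equal default weights, and the name `lce_sketch` are all assumptions made for illustration.

```python
import numpy as np

# Analog BT.601 RGB -> YUV matrix (assumed here; the paper may use another variant).
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])

def rgb_to_yuv(img: np.ndarray) -> np.ndarray:
    """Convert an (..., 3) RGB array to YUV."""
    return img @ RGB_TO_YUV.T

def lce_sketch(sr: np.ndarray, hr: np.ndarray,
               w_rgb: float = 1.0, w_y: float = 1.0, w_uv: float = 1.0) -> float:
    """Weighted sum of mean absolute errors in RGB, luminance, and chrominance."""
    sr_yuv, hr_yuv = rgb_to_yuv(sr), rgb_to_yuv(hr)
    rgb_err = np.abs(sr - hr).mean()
    y_err = np.abs(sr_yuv[..., 0] - hr_yuv[..., 0]).mean()     # brightness term
    uv_err = np.abs(sr_yuv[..., 1:] - hr_yuv[..., 1:]).mean()  # color term
    return float(w_rgb * rgb_err + w_y * y_err + w_uv * uv_err)

# A pure brightness shift on a gray image leaves chrominance almost untouched.
hr_img = np.full((4, 4, 3), 0.5)
sr_img = hr_img + 0.1
shift_error = lce_sketch(sr_img, hr_img)
```

In this toy case the RGB and luminance terms each contribute about 0.1 while the chrominance term stays near zero, which shows how the split isolates brightness errors from color errors.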
CV · Oct 1, 2018
Part-Level Convolutional Neural Networks for Pedestrian Detection Using Saliency and Boundary Box Alignment
Inyong Yun, Cheolkon Jung, Xinran Wang et al.
Pedestrians in videos have a wide range of appearances, including varied body poses, occlusions, and complex backgrounds, and pedestrian detection suffers from the proposal shift problem, which causes the loss of body parts such as the head and legs. To address this, we propose part-level convolutional neural networks (CNN) for pedestrian detection using saliency and boundary box alignment. The proposed network consists of two sub-networks: detection and alignment. We use saliency in the detection sub-network to remove false positives such as lamp posts and trees. We adopt bounding box alignment on detection proposals in the alignment sub-network to address the proposal shift problem. First, we combine FCN and CAM to extract deep features for pedestrian detection. Then, we apply the part-level CNN to recall the lost body parts. Experimental results on various datasets demonstrate that the proposed method remarkably improves accuracy in pedestrian detection and outperforms existing state-of-the-art methods in terms of log-average miss rate over false positives per image (FPPI).
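The log-average miss rate used for evaluation can be sketched as follows. This follows the convention common in pedestrian detection benchmarks: the miss rate is sampled at nine FPPI reference points evenly spaced in log-space between 10^-2 and 10^0, and the samples are combined with a geometric mean. The abstract does not confirm which variant the authors used. The function name and the fallback miss rate of 1.0 when a curve has no point at or below a reference FPPI are assumptions.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, num_points: int = 9) -> float:
    """Geometric mean of miss rates sampled at log-spaced FPPI reference points.

    Expects `fppi` sorted in ascending order, with `miss_rate` aligned to it.
    """
    fppi = np.asarray(fppi, dtype=np.float64)
    miss_rate = np.asarray(miss_rate, dtype=np.float64)
    refs = np.logspace(-2.0, 0.0, num_points)
    sampled = []
    for ref in refs:
        idx = np.flatnonzero(fppi <= ref)
        # Use the curve point with the largest FPPI not exceeding the reference;
        # if the curve starts above the reference, assume every pedestrian is missed.
        sampled.append(miss_rate[idx[-1]] if idx.size else 1.0)
    sampled = np.maximum(np.array(sampled), 1e-10)  # avoid log(0)
    return float(np.exp(np.mean(np.log(sampled))))

# A detector with a constant 50% miss rate scores 0.5 regardless of FPPI.
flat_curve = log_average_miss_rate([0.001, 0.1, 1.0], [0.5, 0.5, 0.5])
```

Lower values are better, since the metric summarizes how many pedestrians are missed across a range of false-positive budgets.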