Shuheng Wang

CV
h-index9
5papers
10citations
Novelty48%
AI Score44

5 Papers

CVMay 15
SeamCam: Quantifying Seamless Camouflage via Multi-Cue Visual Detectability

Amin Karimi Monsefi, Abolfazl Meyarian, Mridul Khurana et al.

Animals are described as effectively camouflaged when they blend seamlessly with their surrounding, yet no standardized quantitative measure of this seamlessness exists. We address this gap by framing camouflage evaluation as a visual localization problem: a well-camouflaged animal is one that remains difficult to detect even when its category is known. We introduce SeamCam (Seamless Camouflage), a metric that quantifies how detectable an animal is from the available visual evidence. Given an image and a target species, SeamCam generates category-conditioned detection proposals, extracts segmentation masks, and identifies the subset whose collective union yields the highest IoU with the ground-truth mask. The SeamCam score is one minus this maximum recoverable localization signal, where a higher score indicates stronger camouflage (i.e., lower detectability). In a human two-alternative forced-choice study with 94 participants and 2,390 comparisons, SeamCam achieves 78.82% agreement with human camouflage difficulty judgments, outperforming state-of-the-art by about 25%. We then demonstrate SeamCam's utility as a preference signal for Direct Preference Optimization (DPO) to fine-tune a diffusion-based inpainting model for camouflage generation. This offers an affordable training approach with an objective explicitly suited for camouflage generation, unlike typical diffusion models. To support rigorous benchmarking, we further introduce CamFG-1.5k, a curated dataset of 1,521 high-resolution images in which animals are fully visible prior to camouflage generation, enabling unbiased evaluation by controlling for occlusion artifacts present in existing datasets. https://7amin.github.io/SeamCam/

CVOct 31, 2025
BeetleFlow: An Integrative Deep Learning Pipeline for Beetle Image Processing

Fangxun Liu, S M Rayeed, Samuel Stevens et al.

In entomology and ecology research, biologists often need to collect a large number of insects, among which beetles are the most common species. A common practice for biologists to organize beetles is to place them on trays and take a picture of each tray. Given the images of thousands of such trays, it is important to have an automated pipeline to process the large-scale data for further research. Therefore, we develop a 3-stage pipeline to detect all the beetles on each tray, sort and crop the image of each beetle, and do morphological segmentation on the cropped beetles. For detection, we design an iterative process utilizing a transformer-based open-vocabulary object detector and a vision-language model. For segmentation, we manually labeled 670 beetle images and fine-tuned two variants of a transformer-based segmentation model to achieve fine-grained segmentation of beetles with relatively high accuracy. The pipeline integrates multiple deep learning methods and is specialized for beetle image processing, which can greatly improve the efficiency to process large-scale beetle data and accelerate biological research.

ROMar 6
HarvestFlex: Strawberry Harvesting via Vision-Language-Action Policy Adaptation in the Wild

Ziyang Zhao, Shuheng Wang, Zhonghua Miao et al.

This work presents the first study on transferring vision-language-action (VLA) policies to real greenhouse tabletop strawberry harvesting, a long-horizon, unstructured task challenged by occlusion and specular reflections. We built an end-to-end closed-loop system on the HarvestFlex platform using three-view RGB sensing (two fixed scene views plus a wrist-mounted view) and intentionally avoided depth clouds and explicit geometric calibration. We collected 3.71 h of VR teleoperated demonstrations (227 episodes) and fine-tuned pi_0, pi_0.5, and WALL-OSS with full fine-tuning and LoRA. Under a unified 50 trials real-greenhouse protocol and metrics spanning completion, pi_0.5 with full fine-tuning achieved success rate of 74.0% with 32.6 s/pick and damage rate of 4.1%. Asynchronous inference-control decoupling further improved performance over synchronous deployment. Results showed non-trivial closed-loop picking with fewer than four hours of real data, while remaining limited by close-range observability loss and contact-dynamics mismatch. A demonstration video is available at: https://youtu.be/bN8ZowZKPMI.

CVMar 7, 2025
EvolvingGS: High-Fidelity Streamable Volumetric Video via Evolving 3D Gaussian Representation

Chao Zhang, Yifeng Zhou, Shuheng Wang et al.

We have recently seen great progress in 3D scene reconstruction through explicit point-based 3D Gaussian Splatting (3DGS), notable for its high quality and fast rendering speed. However, reconstructing dynamic scenes such as complex human performances with long durations remains challenging. Prior efforts fall short of modeling a long-term sequence with drastic motions, frequent topology changes or interactions with props, and resort to segmenting the whole sequence into groups of frames that are processed independently, which undermines temporal stability and thereby leads to an unpleasant viewing experience and inefficient storage footprint. In view of this, we introduce EvolvingGS, a two-stage strategy that first deforms the Gaussian model to coarsely align with the target frame, and then refines it with minimal point addition/subtraction, particularly in fast-changing areas. Owing to the flexibility of the incrementally evolving representation, our method outperforms existing approaches in terms of both per-frame and temporal quality metrics while maintaining fast rendering through its purely explicit representation. Moreover, by exploiting temporal coherence between successive frames, we propose a simple yet effective compression algorithm that achieves over 50x compression rate. Extensive experiments on both public benchmarks and challenging custom datasets demonstrate that our method significantly advances the state-of-the-art in dynamic scene reconstruction, particularly for extended sequences with complex human performances.

MLJan 2, 2018
A novel improved fuzzy support vector machine based stock price trend forecast model

Shuheng Wang, Guohao Li, Yifan Bao

Application of fuzzy support vector machine in stock price forecast. Support vector machine is a new type of machine learning method proposed in 1990s. It can deal with classification and regression problems very successfully. Due to the excellent learning performance of support vector machine, the technology has become a hot research topic in the field of machine learning, and it has been successfully applied in many fields. However, as a new technology, there are many limitations to support vector machines. There is a large amount of fuzzy information in the objective world. If the training of support vector machine contains noise and fuzzy information, the performance of the support vector machine will become very weak and powerless. As the complexity of many factors influence the stock price prediction, the prediction results of traditional support vector machine cannot meet people with precision, this study improved the traditional support vector machine fuzzy prediction algorithm is proposed to improve the new model precision. NASDAQ Stock Market, Standard & Poor's (S&P) Stock market are considered. Novel advanced- fuzzy support vector machine (NA-FSVM) is the proposed methodology.