Yaochun Shen

CV · Jun 1, 2025

CountingFruit: Language-Guided 3D Fruit Counting with Semantic Gaussian Splatting

Fengze Li, Yangle Liu, Jieming Ma et al.

Accurate 3D fruit counting in orchards is challenging due to heavy occlusion, semantic ambiguity between fruits and surrounding structures, and the high computational cost of volumetric reconstruction. Existing pipelines often rely on multi-view 2D segmentation and dense volumetric sampling, which lead to accumulated fusion errors and slow inference. We introduce FruitLangGS, a language-guided 3D fruit counting framework that reconstructs orchard-scale scenes using an adaptive-density Gaussian Splatting pipeline with radius-aware pruning and tile-based rasterization, enabling scalable 3D representation. During inference, compressed CLIP-aligned semantic vectors embedded in each Gaussian are filtered via a dual-threshold cosine similarity mechanism, retrieving Gaussians relevant to target prompts while suppressing common distractors (e.g., foliage), without requiring retraining or image-space masks. The selected Gaussians are then sampled into dense point clouds and clustered geometrically to estimate fruit instances, remaining robust under severe occlusion and viewpoint variation. Experiments on nine different orchard-scale datasets demonstrate that FruitLangGS consistently outperforms existing pipelines in instance counting recall, avoiding multi-view segmentation fusion errors and achieving up to 99.7% recall on the Pfuji-Size_Orch2018 orchard dataset. Ablation studies further confirm that language-conditioned semantic embedding and dual-threshold prompt filtering are essential for suppressing distractors and improving counting accuracy under heavy occlusion. Beyond fruit counting, the same framework enables prompt-driven 3D semantic retrieval without retraining, highlighting the potential of language-guided 3D perception for scalable agricultural scene understanding.
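The dual-threshold filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact rule, so this is an assumption: a Gaussian is kept when its feature is similar enough to the target prompt (`tau_pos`) and not too similar to any distractor prompt (`tau_neg`). The function name, parameter names, threshold values, and toy 2D features are all hypothetical and are not taken from the paper.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def dual_threshold_filter(features, target, distractors, tau_pos=0.8, tau_neg=0.8):
    """Return a boolean mask over per-Gaussian features (hypothetical rule).

    features:    (N, D) semantic vector per Gaussian
    target:      (D,)   embedding of the target prompt
    distractors: (K, D) embeddings of distractor prompts (e.g. foliage)
    """
    f = normalize(np.asarray(features, dtype=float))
    t = normalize(np.asarray(target, dtype=float))
    d = normalize(np.asarray(distractors, dtype=float))
    sim_target = f @ t                       # (N,) similarity to the target
    sim_distractor = (f @ d.T).max(axis=1)   # (N,) worst-case distractor similarity
    # Assumed rule: close to the target AND far from every distractor.
    return (sim_target >= tau_pos) & (sim_distractor < tau_neg)

# Toy example with 2D features instead of real CLIP-aligned vectors.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
target = np.array([1.0, 0.0])
distractors = np.array([[0.0, 1.0]])
mask = dual_threshold_filter(features, target, distractors, tau_pos=0.7, tau_neg=0.5)
print(mask)
```

In this toy case only the first feature passes: the second fails the target threshold, and the third is similar to both prompts, so the distractor threshold rejects it.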

CV · Jun 16, 2015

Subsampled terahertz data reconstruction based on spatio-temporal dictionary learning

Vahid Abolghasemi, Hao Shen, Yaochun Shen et al.

In this paper, the problem of terahertz pulsed imaging and reconstruction is addressed. It is assumed that an incomplete (subsampled) three-dimensional THz data set has been acquired and the aim is to recover all missing samples. A sparsity-inducing approach is proposed for this purpose. First, a simple interpolation is applied to the incomplete noisy data. Next, we propose a spatio-temporal dictionary learning method to obtain an appropriate sparse representation of the data based on a joint sparse recovery algorithm. Finally, using the sparse coefficients and the learned dictionary, the 3D data is effectively denoised by minimizing a simple cost function. We consider two types of terahertz data to evaluate the performance of the proposed approach: THz data acquired for a model sample with clear layered structures (e.g., a T-shape plastic sheet buried in a polythene pellet), and pharmaceutical tablet data (with low spatial resolution). The achieved signal-to-noise ratio for reconstruction of the T-shape data from only 5% of observations was 19 dB. Moreover, the accuracies of the thickness and depth measurements obtained for the pharmaceutical tablet data after reconstruction from 10% of observations were 98.8% and 99.9%, respectively. These results, along with the chemical mapping analysis presented at the end of this paper, confirm the accuracy of the proposed method.
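To make the reported figures concrete, here is a minimal sketch of how a subsampling mask and a reconstruction signal-to-noise ratio in decibels might be computed. The abstract does not state the sampling pattern or the exact SNR definition, so uniform random sampling and the common formula 10·log10(signal energy / error energy) are assumptions. Both function names and the toy data are hypothetical.

```python
import numpy as np

def random_observation_mask(shape, fraction, seed=0):
    """Boolean mask keeping a given fraction of samples (assumed uniform random)."""
    rng = np.random.default_rng(seed)
    n_total = int(np.prod(shape))
    n_keep = int(round(fraction * n_total))
    mask = np.zeros(n_total, dtype=bool)
    mask[rng.choice(n_total, size=n_keep, replace=False)] = True
    return mask.reshape(shape)

def reconstruction_snr_db(reference, estimate):
    """SNR in dB, assuming the usual energy-ratio definition."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error_energy = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10(np.sum(reference ** 2) / error_energy)

# Keep 5% of a small 3D volume (two spatial axes, one temporal axis).
mask = random_observation_mask((10, 10, 20), fraction=0.05)
print(mask.sum(), "of", mask.size, "samples observed")

# Toy check: a uniform 10% amplitude error corresponds to about 20 dB.
reference = np.ones(100)
estimate = np.full(100, 0.9)
snr = reconstruction_snr_db(reference, estimate)
print(round(snr, 2), "dB")
```

Under this definition, the 19 dB reported for the T-shape data would correspond to a reconstruction error energy of a little over 1% of the signal energy.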
