Chaoqi Chen

h-index23

3papers

19citations

Novelty33%

AI Score23

Ranked #175,236 of 194,257 authors (top 90%)#55,572 in CV (top 94%)

3 Papers

2.0CVJul 29, 2024

A Model Generalization Study in Localizing Indoor Cows with COw LOcalization (COLO) dataset

Mautushi Das, Gonzalo Ferreira, C. P. James Chen

Precision livestock farming (PLF) increasingly relies on advanced object localization techniques to monitor livestock health and optimize resource management. This study investigates the generalization capabilities of YOLOv8 and YOLOv9 models for cow detection in indoor free-stall barn settings, focusing on varying training data characteristics such as view angles and lighting, and model complexities. Leveraging the newly released public dataset, COws LOcalization (COLO) dataset, we explore three key hypotheses: (1) Model generalization is equally influenced by changes in lighting conditions and camera angles; (2) Higher model complexity guarantees better generalization performance; (3) Fine-tuning with custom initial weights trained on relevant tasks always brings advantages to detection tasks. Our findings reveal considerable challenges in detecting cows in images taken from side views and underscore the importance of including diverse camera angles in building a detection model. Furthermore, our results emphasize that higher model complexity does not necessarily lead to better performance. The optimal model configuration heavily depends on the specific task and dataset. Lastly, while fine-tuning with custom initial weights trained on relevant tasks offers advantages to detection tasks, simpler models do not benefit similarly from this approach. It is more efficient to train a simple model with pre-trained weights without relying on prior relevant information, which can require intensive labor efforts. Future work should focus on adaptive methods and advanced data augmentation to improve generalization and robustness. This study provides practical guidelines for PLF researchers on deploying computer vision models from existing studies, highlights generalization issues, and contributes the COLO dataset containing 1254 images and 11818 cow instances for further research.

15.3CVDec 4, 2024

DIVE: Taming DINO for Subject-Driven Video Editing

Yi Huang, Wei Xiong, He Zhang et al.

Building on the success of diffusion models in image generation and editing, video editing has recently gained substantial attention. However, maintaining temporal consistency and motion alignment still remains challenging. To address these issues, this paper proposes DINO-guided Video Editing (DIVE), a framework designed to facilitate subject-driven editing in source videos conditioned on either target text prompts or reference images with specific identities. The core of DIVE lies in leveraging the powerful semantic features extracted from a pretrained DINOv2 model as implicit correspondences to guide the editing process. Specifically, to ensure temporal motion consistency, DIVE employs DINO features to align with the motion trajectory of the source video. For precise subject editing, DIVE incorporates the DINO features of reference images into a pretrained text-to-image model to learn Low-Rank Adaptations (LoRAs), effectively registering the target subject's identity. Extensive experiments on diverse real-world videos demonstrate that our framework can achieve high-quality editing results with robust motion consistency, highlighting the potential of DINO to contribute to video editing. Project page: https://dino-video-editing.github.io

2.0CVOct 28, 2024

Ant Detective: An Automated Approach for Counting Ants in Densely Populated Images and Gaining Insight into Ant Foraging Behavior

Mautushi Das, Fang-Ling Chloe Liu, Charly Hartle et al.

Ant foraging behavior is essential to understanding ecological dynamics and developing effective pest management strategies, but quantifying this behavior is challenging due to the labor-intensive nature of manual counting, especially in densely populated images. This study presents an automated approach using computer vision to count ants and analyze their foraging behavior. Leveraging the YOLOv8 model, the system was calibrated and evaluated on datasets encompassing various imaging scenarios and densities. The study results demonstrate that the system achieves average precision and recall of up to 87.96% and 87,78%, respectively, with only 64 calibration images provided when the both calibration and evaluation images share similar imaging backgrounds. When the background is more complex than the calibration images, the system requires a larger calibration set to generalize effectively, with 1,024 images yielding the precision and recall of up to 83.60% and 78.88, respectively. In more challenging scenarios where more than one thousand ants are present in a single image, the system significantly improves detection accuracy by slicing images into smaller patches, reaching a precision and recall of 77.97% and 71.36%, respectively. The system's ability to generate heatmaps visualizes the spatial distribution of ant activity over time, providing valuable insights into their foraging patterns. This spatial-temporal analysis enables a more comprehensive understanding of ant behavior, which is crucial for ecological studies and improving pest control methods. By automating the counting process and offering detailed behavioral analysis, this study provides an efficient tool for researchers and pest control professionals to develop more effective strategies.