CVFeb 26
PhotoAgent: Agentic Photo Editing with Exploratory Visual Aesthetic PlanningMingde Yao, Zhiyuan You, King-Man Tam et al.
With the recent fast development of generative models, instruction-based image editing has shown great potential in generating high-quality images. However, the quality of editing highly depends on carefully designed instructions, placing the burden of task decomposition and sequencing entirely on the user. To achieve autonomous image editing, we present PhotoAgent, a system that advances image editing through explicit aesthetic planning. Specifically, PhotoAgent formulates autonomous image editing as a long-horizon decision-making problem. It reasons over user aesthetic intent, plans multi-step editing actions via tree search, and iteratively refines results through closed-loop execution with memory and visual feedback, without requiring step-by-step user prompts. To support reliable evaluation in real-world scenarios, we introduce UGC-Edit, an aesthetic evaluation benchmark consisting of 7,000 photos and a learned aesthetic reward model. We also construct a test set containing 1,017 photos to systematically assess autonomous photo editing performance. Extensive experiments demonstrate that PhotoAgent consistently improves both instruction adherence and visual quality compared with baseline methods. The project page is https://mdyao.github.io/PhotoAgent/.
CVMar 23, 2025Code
PolarFree: Polarization-based Reflection-free ImagingMingde Yao, Menglu Wang, King-Man Tam et al.
Reflection removal is challenging due to complex light interactions, where reflections obscure important details and hinder scene understanding. Polarization naturally provides a powerful cue to distinguish between reflected and transmitted light, enabling more accurate reflection removal. However, existing methods often rely on small-scale or synthetic datasets, which fail to capture the diversity and complexity of real-world scenarios. To this end, we construct a large-scale dataset, PolaRGB, for Polarization-based reflection removal of RGB images, which enables us to train models that generalize effectively across a wide range of real-world scenarios. The PolaRGB dataset contains 6,500 well-aligned mixed-transmission image pairs, 8x larger than existing polarization datasets, and is the first to include both RGB and polarization images captured across diverse indoor and outdoor environments with varying lighting conditions. Besides, to fully exploit the potential of polarization cues for reflection removal, we introduce PolarFree, which leverages diffusion process to generate reflection-free cues for accurate reflection removal. Extensive experiments show that PolarFree significantly enhances image clarity in challenging reflective scenarios, setting a new benchmark for polarized imaging and reflection removal. Code and dataset are available at https://github.com/mdyao/PolarFree.
CVJun 18, 2019Code
Crop Lodging Prediction from UAV-Acquired Images of Wheat and Canola using a DCNN Augmented with Handcrafted Texture FeaturesSara Mardanisamani, Farhad Maleki, Sara Hosseinzadeh Kassani et al.
Lodging, the permanent bending over of food crops, leads to poor plant growth and development. Consequently, lodging results in reduced crop quality, lowers crop yield, and makes harvesting difficult. Plant breeders routinely evaluate several thousand breeding lines, and therefore, automatic lodging detection and prediction is of great value aid in selection. In this paper, we propose a deep convolutional neural network (DCNN) architecture for lodging classification using five spectral channel orthomosaic images from canola and wheat breeding trials. Also, using transfer learning, we trained 10 lodging detection models using well-established deep convolutional neural network architectures. Our proposed model outperforms the state-of-the-art lodging detection methods in the literature that use only handcrafted features. In comparison to 10 DCNN lodging detection models, our proposed model achieves comparable results while having a substantially lower number of parameters. This makes the proposed model suitable for applications such as real-time classification using inexpensive hardware for high-throughput phenotyping pipelines. The GitHub repository at https://github.com/FarhadMaleki/LodgedNet contains code and models.
AIJan 15
FilDeep: Learning Large Deformations of Elastic-Plastic Solids with Multi-Fidelity DataJianheng Tang, Shilong Tao, Zhe Feng et al.
The scientific computation of large deformations in elastic-plastic solids is crucial in various manufacturing applications. Traditional numerical methods exhibit several inherent limitations, prompting Deep Learning (DL) as a promising alternative. The effectiveness of current DL techniques typically depends on the availability of high-quantity and high-accuracy datasets, which are yet difficult to obtain in large deformation problems. During the dataset construction process, a dilemma stands between data quantity and data accuracy, leading to suboptimal performance in the DL models. To address this challenge, we focus on a representative application of large deformations, the stretch bending problem, and propose FilDeep, a Fidelity-based Deep Learning framework for large Deformation of elastic-plastic solids. Our FilDeep aims to resolve the quantity-accuracy dilemma by simultaneously training with both low-fidelity and high-fidelity data, where the former provides greater quantity but lower accuracy, while the latter offers higher accuracy but in less quantity. In FilDeep, we provide meticulous designs for the practical large deformation problem. Particularly, we propose attention-enabled cross-fidelity modules to effectively capture long-range physical interactions across MF data. To the best of our knowledge, our FilDeep presents the first DL framework for large deformation problems using MF data. Extensive experiments demonstrate that our FilDeep consistently achieves state-of-the-art performance and can be efficiently deployed in manufacturing.