Ben Li

h-index 61

4 papers

67 citations

Novelty 43%

AI Score 41

Ranked #92,092 of 205,806 authors (top 45%)
#30,654 in CV (top 52%)

4 Papers

AI · Mar 17, 2025

The Amazon Nova Family of Models: Technical Report and Model Card

Amazon AGI, Aaron Langford, Aayush Shah et al. · amazon-science

We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.

75.7 · GR · May 18

CelloCut: Constructive Watertight Remeshing via Tetrahedral Cell Cuts

Xuan Yang, Yuhang Zeng, Dinglong Fang et al.

Watertight remeshing aims to recover a surface that induces a globally consistent interior-exterior partition of 3D space. However, for meshes with complex topology, single-layer structures, or large missing regions, inferring such a partition from local surface geometry is inherently ambiguous. As a result, existing methods often produce surface-accurate yet volumetrically inconsistent reconstructions, e.g., closely spaced double shells. The key insight of this work is that watertight remeshing should be treated as a volumetric partitioning problem rather than a surface-level repair task. To this end, we propose CelloCut, a constructive framework that formulates watertight conversion as a binary labeling problem over a Delaunay tetrahedral partition of space. We solve this via graph-cut energy minimization with one-sided constraints that preserve proxy-supported interior evidence and weighted interface penalties that discourage unsupported newly introduced boundaries. By computing a globally consistent volumetric partition, CelloCut guarantees a strictly watertight output by construction and strongly suppresses pseudo-watertight artifacts such as double shells, even under severe topological defects. Experimental results on two newly introduced challenging benchmarks, CelloScan and CelloFill, as well as the standard ModelNet10 dataset, demonstrate that CelloCut significantly outperforms state-of-the-art methods, particularly in handling complex topologies and single-layer structures, producing compact and volumetrically consistent solid reconstructions. The project page is available at https://rangeryx-66.github.io/CelloCut/.
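The abstract frames watertight conversion as a binary inside/outside labeling over tetrahedral cells, solved by graph cut. A generic version of that idea is sketched below. It makes several assumptions: the tetrahedralization and its cell adjacency are already given, and the per-cell costs and interface weights are hypothetical inputs. The one-sided constraint is modeled as a prohibitively large cost for labeling a "forced" cell as outside. The solver is a plain Edmonds-Karp s-t min cut. This is not CelloCut's actual energy or implementation.

```python
from collections import deque

def min_cut_labeling(n_cells, unary, adjacency, forced_inside=()):
    """Binary labeling by s-t min cut (hypothetical helper).

    unary[i]      = (cost_if_inside, cost_if_outside) for cell i
    adjacency     = list of (i, j, w): penalty w if cells i and j get different labels
    forced_inside = cells that must end up inside (one-sided hard constraint)
    Returns a list of labels: 1 = inside, 0 = outside.
    """
    # A finite "infinity" larger than any achievable finite energy.
    big = 1.0 + sum(sum(u) for u in unary) + sum(w for _, _, w in adjacency)
    s, t = n_cells, n_cells + 1
    cap = [dict() for _ in range(n_cells + 2)]

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # make sure the residual edge exists

    forced = set(forced_inside)
    for i, (c_in, c_out) in enumerate(unary):
        if i in forced:
            c_out = big            # labeling a forced cell "outside" is effectively forbidden
        add_edge(s, i, c_out)      # cut when cell i ends on the sink (outside) side
        add_edge(i, t, c_in)       # cut when cell i stays on the source (inside) side
    for i, j, w in adjacency:
        add_edge(i, j, w)          # interface penalty, paid in either direction
        add_edge(j, i, w)

    # Edmonds-Karp: repeatedly augment along shortest residual s->t paths.
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                  # no augmenting path left: flow is maximal
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Cells still reachable from the source form the "inside" side of the min cut.
    return [1 if i in parent else 0 for i in range(n_cells)]
```

As a usage example, take a chain of four cells where cell 0 is forced inside and cells 2 and 3 prefer outside. The call `min_cut_labeling(4, [(0.0, 0.0), (0.5, 1.0), (1.0, 0.2), (2.0, 0.0)], [(0, 1, 0.3), (1, 2, 0.3), (2, 3, 0.3)], forced_inside=[0])` returns `[1, 1, 0, 0]`. That labeling has total energy 1.0, the lowest of all eight labelings that keep cell 0 inside. Because the cut is computed globally, every cell receives exactly one label. This is the property the abstract relies on for a watertight result by construction.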

CV · Dec 4, 2025

LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging

Zhijian Shu, Cheng Lin, Tao Xie et al.

3D vision foundation models like Visual Geometry Grounded Transformer (VGGT) have advanced greatly in geometric perception. However, VGGT is time-consuming and memory-intensive on long sequences, limiting its application to large-scale scenes beyond hundreds of images. To address this, we propose LiteVGGT, achieving up to 10x speedup and substantial memory reduction, enabling efficient processing of 1000-image scenes. We derive two key insights for 3D reconstruction: (1) tokens from local image regions have inherent geometric correlations, leading to high similarity and computational redundancy; (2) token similarity across adjacent network layers remains stable, allowing for reusable merge decisions. Guided by these, we design a simple yet efficient strategy, dubbed geometry-aware cached token merging. We analyze each token's geometric importance, optimizing anchor token selection to better preserve key information for reconstruction. We also cache and reuse merge indices across layers, substantially reducing latency with minimal accuracy impact. This strategy retains VGGT's core performance, enabling efficient fine-tuning and FP8 quantization for further gains. Extensive experiments validate LiteVGGT's effectiveness, scalability, and robustness. Project page: https://garlicba.github.io/LiteVGGT/
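The two ideas in this abstract can be illustrated with a toy sketch: selecting anchor tokens by importance and caching merge indices across layers. The sketch rests on assumptions throughout. Per-token importance scores are supplied as an input, because the abstract does not say how geometric importance is computed. Non-anchor tokens join their most cosine-similar anchor and are averaged. Merge indices are recomputed every `refresh_every` layers and reused in between. Layers are hypothetical callables that map a list of tokens to a list of tokens. All helper names are illustrative, and none of this is LiteVGGT's code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def compute_merge_indices(tokens, importance, num_anchors):
    """Pick the `num_anchors` highest-importance tokens as anchors (assumed scores),
    then assign every token a group index: each anchor keeps its own group and
    every other token joins the most cosine-similar anchor."""
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    anchors = sorted(ranked[:num_anchors])
    group_of_anchor = {a: g for g, a in enumerate(anchors)}
    assign = []
    for i, tok in enumerate(tokens):
        if i in group_of_anchor:
            assign.append(group_of_anchor[i])
        else:
            assign.append(max(range(len(anchors)),
                              key=lambda g: cosine(tok, tokens[anchors[g]])))
    return assign

def merge(tokens, assign, num_groups):
    """Average the tokens in each group, giving one token per group."""
    dim = len(tokens[0])
    sums = [[0.0] * dim for _ in range(num_groups)]
    counts = [0] * num_groups
    for tok, g in zip(tokens, assign):
        counts[g] += 1
        for d in range(dim):
            sums[g][d] += tok[d]
    return [[s / counts[g] for s in sums[g]] for g in range(num_groups)]

def unmerge(merged, assign):
    """Copy each group's output back to every token that belongs to the group."""
    return [list(merged[g]) for g in assign]

def run_layers(tokens, layers, importance, num_anchors, refresh_every):
    """Run each layer on the merged (shorter) token list. Merge indices are
    recomputed only every `refresh_every` layers and reused from the cache otherwise."""
    cached = None
    for idx, layer in enumerate(layers):
        if cached is None or idx % refresh_every == 0:
            cached = compute_merge_indices(tokens, importance, num_anchors)
        merged = merge(tokens, cached, num_anchors)
        tokens = unmerge(layer(merged), cached)
    return tokens
```

Consider four 2D tokens `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` with importance `[1.0, 0.1, 0.9, 0.2]` and two anchors. Tokens 0 and 2 become anchors, and the merge indices are `[0, 0, 1, 1]`. Each layer therefore processes two tokens instead of four. The second layer skips the similarity search entirely because it reuses the cached indices. This reuse is where insight (2) in the abstract would save time.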

CV · May 29, 2025

VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration

Ben Li, Minqi Li, Jie Ren et al.

Image-based virtual try-on aims to fit a target garment to a specific person image and has attracted extensive research attention because of its huge application potential in the e-commerce and fashion industries. To generate high-quality try-on results, accurately warping the clothing item to fit the human body plays a significant role, as slight misalignment may lead to unrealistic artifacts in the fitting image. Most existing methods warp the clothing by feature matching and thin-plate spline (TPS). However, these methods often fail to preserve clothing details due to self-occlusion, severe misalignment between poses, etc. To address these challenges, this paper proposes a detail retention virtual try-on method via accurate non-rigid registration (VITON-DRR) for diverse human poses. Specifically, we reconstruct a human semantic segmentation using a dual-pyramid-structured feature extractor. Then, a novel Deformation Module is designed for extracting the cloth key points and warping them through an accurate non-rigid registration algorithm. Finally, the Image Synthesis Module is designed to synthesize the deformed garment image and generate the human pose information adaptively. Compared with traditional methods, the proposed VITON-DRR can make the deformation of fitting images more accurate and retain more garment details. The experimental results demonstrate that the proposed method performs better than state-of-the-art methods.
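For reference, the abstract says most existing methods warp clothing with a thin-plate spline. A minimal 2D TPS fit between matched control points is sketched below. It is the baseline that VITON-DRR argues against, not the paper's non-rigid registration. The sketch assumes the source and target key points are already matched, as feature matching would provide. It uses the standard kernel U(r) = r² log r². The helper names `fit_tps` and `_solve` are hypothetical.

```python
import math

def _u(r2):
    """TPS radial basis U(r) = r^2 log(r^2), written in terms of r2 = r^2, with U(0) = 0."""
    return r2 * math.log(r2) if r2 > 0 else 0.0

def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_tps(src, dst):
    """Fit a 2D thin-plate spline mapping each src point exactly onto its dst point.
    Needs at least three non-collinear src points. Returns warp(x, y) -> (x', y')."""
    n = len(src)
    size = n + 3
    L = [[0.0] * size for _ in range(size)]   # block matrix [[K, P], [P^T, 0]]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            L[i][j] = _u((xi - xj) ** 2 + (yi - yj) ** 2)
        L[i][n], L[i][n + 1], L[i][n + 2] = 1.0, xi, yi
        L[n][i], L[n + 1][i], L[n + 2][i] = 1.0, xi, yi
    # One linear solve per output coordinate: bending weights w, then affine terms.
    params = [_solve(L, [p[d] for p in dst] + [0.0, 0.0, 0.0]) for d in range(2)]

    def warp(x, y):
        out = []
        for sol in params:
            w, (a0, ax, ay) = sol[:n], sol[n:]
            val = a0 + ax * x + ay * y
            val += sum(w[i] * _u((x - sx) ** 2 + (y - sy) ** 2)
                       for i, (sx, sy) in enumerate(src))
            out.append(val)
        return tuple(out)

    return warp
```

A TPS interpolates its control points exactly and bends smoothly between them. Garment details that are occluded or badly mismatched still get dragged along by this global smooth warp. That is the failure mode the abstract attributes to TPS-based methods.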