CV · Aug 7, 2024
Fast Sprite Decomposition from Animated Graphics
Tomoyuki Suzuki, Kotaro Kikuchi, Kota Yamaguchi
This paper presents an approach to decomposing animated graphics into sprites, a set of basic elements or layers. Our approach builds on the optimization of sprite parameters to fit the raster video. For efficiency, we assume static textures for sprites to reduce the search space while preventing artifacts using a texture prior model. To further speed up the optimization, we introduce the initialization of the sprite parameters utilizing a pre-trained video object segmentation model and user input of single frame annotations. For our study, we construct the Crello Animation dataset from an online design service and define quantitative metrics to measure the quality of the extracted sprites. Experiments show that our method significantly outperforms baselines for similar decomposition tasks in terms of the quality/efficiency tradeoff.
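The idea of fitting sprite parameters to a raster video under a static-texture assumption can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional grayscale "video", the single sprite, the integer per-frame offset, and the names `render_frame` and `reconstruction_error` are all assumptions made for illustration. The texture and alpha are shared by every frame, and only the offset varies over time, which is what shrinks the search space.

```python
from typing import List, Sequence


def render_frame(width: int, background: float, texture: Sequence[float],
                 alpha: Sequence[float], offset: int) -> List[float]:
    """Render one 1D grayscale frame: a static sprite placed at `offset`."""
    frame = [background] * width
    for i, (color, a) in enumerate(zip(texture, alpha)):
        x = offset + i
        if 0 <= x < width:
            # Standard alpha blend of the sprite pixel over the background.
            frame[x] = a * color + (1.0 - a) * frame[x]
    return frame


def reconstruction_error(video: Sequence[Sequence[float]], background: float,
                         texture: Sequence[float], alpha: Sequence[float],
                         offsets: Sequence[int]) -> float:
    """Mean squared error between the video and the rendered sprite frames."""
    total, count = 0.0, 0
    for frame, offset in zip(video, offsets):
        predicted = render_frame(len(frame), background, texture, alpha, offset)
        total += sum((p - t) ** 2 for p, t in zip(predicted, frame))
        count += len(frame)
    return total / count


# A sprite of two opaque white pixels sliding right by one pixel per frame.
video = [render_frame(5, 0.0, [1.0, 1.0], [1.0, 1.0], t) for t in range(3)]
good = reconstruction_error(video, 0.0, [1.0, 1.0], [1.0, 1.0], [0, 1, 2])
bad = reconstruction_error(video, 0.0, [1.0, 1.0], [1.0, 1.0], [0, 0, 0])
```

An optimizer would search for the texture, alpha, and per-frame offsets that minimize this error; a good initialization (for example, from a segmentation mask) starts that search near `good` rather than `bad`.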
GR · Apr 3, 2025
MG-Gen: Single Image to Motion Graphics Generation
Takahiro Shirakawa, Tomoyuki Suzuki, Takuto Narumoto et al.
We introduce MG-Gen, a framework that generates motion graphics directly from a single raster image. MG-Gen decomposes a single raster image into layered structures represented as HTML, generates animation scripts for each layer, and then renders them into a video. Experiments confirm that MG-Gen generates dynamic motion graphics while preserving text readability and fidelity to the input conditions, whereas state-of-the-art image-to-video generation methods struggle with both. The code is available at https://github.com/CyberAgentAILab/MG-GEN.
GR · Sep 29, 2025
LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.
Designers craft and edit graphic designs in a layer representation, but layer-based editing becomes impossible once the design is composited into a raster image. In this work, we propose LayerD, a method to decompose raster graphic designs into layers to enable a re-editable creative workflow. LayerD addresses the decomposition task by iteratively extracting unoccluded foreground layers. We propose a simple yet effective refinement approach that takes advantage of the assumption that layers often exhibit uniform appearance in graphic designs. As decomposition is ill-posed and the ground-truth layer structure may not be reliable, we develop a quality metric that addresses this difficulty. In experiments, we show that LayerD successfully achieves high-quality decomposition and outperforms baselines. We also demonstrate the use of LayerD with state-of-the-art image generators and layer-based editing.
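To make the relationship between layers and the composited raster concrete, the sketch below flattens a back-to-front stack of RGBA layers with the standard "over" operator. It is a minimal illustration rather than LayerD itself: the pixel representation (non-premultiplied RGBA tuples in a single row) and the names `over` and `flatten` are assumptions. Decomposition is the inverse problem: recovering layers whose composite reproduces the input image.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # (r, g, b, a), each in [0, 1]


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Composite one non-premultiplied RGBA pixel over another."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels are fully transparent.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: List[List[Pixel]]) -> List[Pixel]:
    """Composite layers, given back-to-front, into a single raster row."""
    result = list(layers[0])
    for layer in layers[1:]:
        result = [over(t, b) for t, b in zip(layer, result)]
    return result


# Opaque white background with a red foreground covering only the first pixel.
background = [(1.0, 1.0, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0)]
foreground = [(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)]
raster = flatten([background, foreground])
```

Once flattened, `raster` no longer records which pixels came from which layer, which is why the inverse problem is ill-posed: many different layer stacks produce the same composite.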
CV · Aug 20, 2021
Video Ads Content Structuring by Combining Scene Confidence Prediction and Tagging
Tomoyuki Suzuki, Antonio Tejero-de-Pablos
Video ads segmentation and tagging is challenging for two main reasons: (1) the video scene structure is complex and (2) it includes multiple modalities (e.g., visual, audio, text). While previous work focuses mostly on activity videos (e.g., "cooking", "sports"), it is not clear how such methods can be leveraged to tackle the task of video ads content structuring. In this paper, we propose a two-stage method that first provides the boundaries of the scenes, and then combines a confidence score for each segmented scene and the tag classes predicted for that scene. We provide extensive experimental results on the network architectures and modalities used for the proposed method. Our combined method improves on the previous baselines on the challenging "Tencent Advertisement Video" dataset.
CV · Apr 8, 2018
Anticipating Traffic Accidents with Adaptive Loss and Large-scale Incident DB
Tomoyuki Suzuki, Hirokatsu Kataoka, Yoshimitsu Aoki et al.
In this paper, we propose a novel approach for traffic accident anticipation through (i) Adaptive Loss for Early Anticipation (AdaLEA) and (ii) a large-scale self-annotated incident database for anticipation. The proposed AdaLEA allows a model to gradually learn to anticipate earlier as training progresses. The loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each epoch. Additionally, we construct a Near-miss Incident DataBase for anticipation. This database contains an enormous number of traffic near-miss incident videos and annotations for detailed evaluation of two tasks, risk anticipation and risk-factor anticipation. In our experimental results, we found our proposal achieved the highest scores for risk anticipation (+6.6% better on mean average precision (mAP) and 2.36 sec earlier than previous work on the average time-to-collision (ATTC)) and risk-factor anticipation (+4.3% better on mAP and 0.70 sec earlier than previous work on ATTC).