Fumihiko Takahashi

CV
5 papers
5 citations
Novelty: 36%
AI Score: 37

5 Papers

CV · May 28 · Code
Multi-Stage VLM Pipeline for Zero-Shot Traffic Accident Understanding

Fumiya Tatematsu, Fumihiko Takahashi

We present the 1st-place solution to the ACCIDENT challenge at the CVPR 2026 AUTOPILOT Workshop, which asks for zero-shot prediction of accident timing, impact centroid, and collision type from CCTV footage. On a frozen Qwen3-VL-32B-Instruct checkpoint we build a three-stage pipeline (full-video joint prediction, time refinement, and single-frame grounding of the impact centroid), run the same pipeline a second time on a 235B Mixture-of-Experts sibling, blend the two outputs 9:1, and finally snap each predicted point onto the nearest vehicle detection. The final system reaches Public LB 0.55469 / Private LB 0.57080, roughly +0.21 over the strongest host baseline (Molmo-7B, 0.358) and wins the challenge. We ablate each component, report the negative results that shaped the final design, and release the code at https://github.com/fuumin621/cvpr2026-accident-1st-place-solution.
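The last two steps of the pipeline described above (blending two predictions 9:1, then snapping the point onto the nearest vehicle detection) can be sketched roughly as follows. This is a minimal illustration, not the released code: it assumes the blend is a weighted average of the two predicted points with weight 0.9 on the first model, that detections are `(x1, y1, x2, y2)` boxes, and that "nearest" means nearest box center. The function names `blend_points` and `snap_to_nearest_box` are hypothetical.

```python
from math import hypot


def blend_points(p_a, p_b, w_a=0.9):
    """Weighted average of two (x, y) points; w_a is the weight on p_a.

    Assumption: the 9:1 blend is a plain weighted average of coordinates.
    """
    w_b = 1.0 - w_a
    return (w_a * p_a[0] + w_b * p_b[0], w_a * p_a[1] + w_b * p_b[1])


def snap_to_nearest_box(point, boxes):
    """Return the center of the box closest to `point`.

    Assumption: boxes are (x1, y1, x2, y2) tuples and distance is measured
    to the box center. With no detections the point is returned unchanged.
    """
    if not boxes:
        return point
    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    return min(centers, key=lambda c: hypot(c[0] - point[0], c[1] - point[1]))


# Toy usage: two model predictions and two hypothetical vehicle boxes.
blended = blend_points((0.0, 0.0), (10.0, 10.0))
snapped = snap_to_nearest_box(blended, [(0, 0, 4, 4), (10, 10, 20, 20)])
```

Returning the point unchanged when there are no detections keeps the snapping step from discarding a prediction on frames where the detector finds nothing.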

CV · Dec 4, 2020
Prediction of Lane Number Using Results From Lane Detection

Panumate Chetprayoon, Fumihiko Takahashi, Yusuke Uchida

The number of the lane in which a vehicle is traveling is a key factor in intelligent vehicle applications. Many lane detection algorithms have been proposed, and if lanes could be detected perfectly, the lane number could be calculated directly from the lane detection results. In practice, however, lane detection algorithms sometimes underperform. Therefore, we propose a new approach that combines the drive recorder image with the lane detection results to predict the lane number. Experiments on our own dataset confirmed that our approach delivered outstanding results without significantly increasing computational cost.

CV · Mar 30, 2020
Streaming Networks: Increase Noise Robustness and Filter Diversity via Hard-wired and Input-induced Sparsity

Sergey Tarasenko, Fumihiko Takahashi

CNNs have achieved state-of-the-art performance in many applications. Recent studies show that CNN recognition accuracy drops drastically when images are corrupted by noise. We focus on the problem of robust recognition of noise-corrupted images. We introduce a novel network architecture called Streaming Networks. Each stream takes a certain intensity slice of the original image as input, and stream parameters are trained independently. We use network capacity, hard-wired sparsity, and input-induced sparsity as the dimensions for experiments. The results indicate that only the presence of both hard-wired and input-induced sparsity enables robust recognition of noisy images. Streaming Nets is the only architecture that has both types of sparsity, and it exhibits higher robustness to noise. Finally, to illustrate the increase in filter diversity, we show that the distribution of filter weights in the first conv layer gradually approaches a uniform distribution as the degree of hard-wired and domain-induced sparsity and the capacity increase.

CV · Mar 27, 2020
Applications of the Streaming Networks

Sergey Tarasenko, Fumihiko Takahashi

Streaming Networks (STnets) were recently introduced as a mechanism for robust classification of noise-corrupted images. STnets are a family of convolutional neural networks consisting of multiple neural networks (streams) that receive different inputs; the stream outputs are concatenated and fed into a single joint classifier. The original paper showed that STnets can successfully classify images from the Cifar10, EuroSat and UCmerced datasets when the images are corrupted with various levels of random zero noise. In this paper, we demonstrate that STnets are capable of high-accuracy classification of images corrupted with Gaussian noise, fog, snow, etc. (Cifar10 corrupted dataset) and of low-light images (a subset of the Carvana dataset). We also introduce a new type of STnets called Hybrid STnets. Thus, we show that STnets are a universal tool for image classification when the original training dataset is corrupted with noise or other transformations that cause information loss from the original images.
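The stream-then-concatenate structure described in this abstract can be illustrated with a toy forward pass. This is only a sketch under stated assumptions: real streams are convolutional networks, whereas the streams and classifier here (`mean_stream`, `max_stream`, `argmax_classifier`) are hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
def joint_forward(stream_inputs, streams, classifier):
    """Run each stream on its own input, concatenate features, classify."""
    if len(stream_inputs) != len(streams):
        raise ValueError("need exactly one input per stream")
    features = []
    for x, stream in zip(stream_inputs, streams):
        # Each stream sees only its own input and returns a feature list.
        features.extend(stream(x))
    # The joint classifier sees the concatenated features of all streams.
    return classifier(features)


def mean_stream(x):
    """Hypothetical stream: one feature, the mean of the input values."""
    return [sum(x) / len(x)]


def max_stream(x):
    """Hypothetical stream: one feature, the maximum of the input values."""
    return [max(x)]


def argmax_classifier(features):
    """Hypothetical joint classifier: index of the largest feature."""
    return max(range(len(features)), key=lambda i: features[i])


# Toy usage: two streams, each with a different input.
label = joint_forward([[1, 2, 3], [4, 5, 6]],
                      [mean_stream, max_stream],
                      argmax_classifier)
```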

CV · Oct 23, 2019
Streaming Networks: Enable A Robust Classification of Noise-Corrupted Images

Sergey Tarasenko, Fumihiko Takahashi

Convolutional neural nets (conv nets) have achieved state-of-the-art performance in many applications of image and video processing. Recent studies show that the recognition accuracy of conv nets is fragile under various image distortions such as noise, scaling, and rotation. In this study we focus on the robust recognition of images distorted by random noise. Common solutions to this problem are either to add many noisy images to the training dataset, which can be very costly, or to use sophisticated loss functions and denoising techniques. We introduce a novel conv net architecture with multiple streams. Each stream takes a certain intensity slice of the original image as input, and stream parameters are trained independently. We call this novel network a "Streaming Net". Our results indicate that the Streaming Net outperforms a 1-stream conv net (the network employed as a single stream) and a 1-stream wide conv net (which employs the same number of filters as the Streaming Net) in recognition accuracy on noise-corrupted images, while producing the same or higher recognition accuracy on noise-free images in almost all of the tests. Thus, we introduce a simple new method to increase the robustness of noisy image recognition without using data generation or sophisticated training techniques.
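The abstract does not define "intensity slice" precisely. One plausible reading, shown here purely as an assumption, is that each stream receives a copy of the image in which only pixels whose intensity falls inside that stream's range are kept and all others are set to zero. The helper `intensity_slices` and the range boundaries are hypothetical.

```python
def intensity_slices(image, bounds):
    """Split a grayscale image into per-stream intensity slices.

    image  : list of rows, each a list of integer intensities (e.g. 0-255)
    bounds : increasing cut points; [0, 128, 256] yields two slices that
             cover the half-open ranges [0, 128) and [128, 256)

    Assumption: a pixel outside a slice's range is replaced by 0.
    """
    slices = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        slices.append([[p if lo <= p < hi else 0 for p in row]
                       for row in image])
    return slices


# Toy usage: a 2x2 image split into a dark slice and a bright slice.
image = [[10, 200],
         [130, 90]]
dark, bright = intensity_slices(image, [0, 128, 256])
```

Under this reading each slice is mostly zeros, and because the ranges do not overlap, summing the slices pixel by pixel recovers the original image.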