CV, Feb 19, 2024

Designing High-Performing Networks for Multi-Scale Computer Vision

arXiv:2402.12536v1, Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for improved network designs in computer vision, but it appears incremental as it focuses on enhancing existing architectures rather than introducing a new paradigm.

The thesis tackles the problem of designing better network architectures for multi-scale computer vision tasks, aiming to outperform existing baselines under fair comparisons, with publicly available code.

Since the emergence of deep learning, the computer vision field has flourished, with models improving at a rapid pace on increasingly complex tasks. We distinguish three main ways to improve a computer vision model: (1) improving the data aspect, for example by training on a larger, more diverse dataset, (2) improving the training aspect, for example by designing a better optimizer, and (3) improving the network architecture (or network for short). In this thesis, we chose to improve the last of these, i.e. the network designs of computer vision models. More specifically, we investigate new network designs for multi-scale computer vision tasks, which are tasks that require making predictions about concepts at different scales. The goal of these new network designs is to outperform existing baseline designs from the literature. Specific care is taken to make sure the comparisons are fair, by guaranteeing that the different network designs are trained and evaluated with the same settings. Code is publicly available at https://github.com/CedricPicron/DetSeg.

Code Implementations: 1 repo
