Hannes Fassold

CV
h-index9
15papers
39citations
Novelty28%
AI Score23

15 Papers

CVJul 7, 2022
FastHebb: Scaling Hebbian Training of Deep Neural Networks to ImageNet Level

Gabriele Lagani, Claudio Gennaro, Hannes Fassold et al.

Learning algorithms for Deep Neural Networks are typically based on supervised end-to-end Stochastic Gradient Descent (SGD) training with error backpropagation (backprop). Backprop algorithms require a large number of labelled training samples to achieve high performance. However, in many realistic applications, even if there is plenty of image samples, very few of them are labelled, and semi-supervised sample-efficient training strategies have to be used. Hebbian learning represents a possible approach towards sample efficient training; however, in current solutions, it does not scale well to large datasets. In this paper, we present FastHebb, an efficient and scalable solution for Hebbian learning which achieves higher efficiency by 1) merging together update computation and aggregation over a batch of inputs, and 2) leveraging efficient matrix multiplication algorithms on GPU. We validate our approach on different computer vision benchmarks, in a semi-supervised learning scenario. FastHebb outperforms previous solutions by up to 50 times in terms of training speed, and notably, for the first time, we are able to bring Hebbian algorithms to ImageNet scale.

CVApr 4, 2023
A real-time algorithm for human action recognition in RGB and thermal video

Hannes Fassold, Karlheinz Gutjahr, Anna Weber et al.

Monitoring the movement and actions of humans in video in real-time is an important task. We present a deep learning based algorithm for human action recognition for both RGB and thermal cameras. It is able to detect and track humans and recognize four basic actions (standing, walking, running, lying) in real-time on a notebook with a NVIDIA GPU. For this, it combines state of the art components for object detection (Scaled YoloV4), optical flow (RAFT) and pose estimation (EvoSkeleton). Qualitative experiments on a set of tunnel videos show that the proposed algorithm works robustly for both RGB and thermal video.

LGMar 3, 2022
AdaFamily: A family of Adam-like adaptive gradient methods

Hannes Fassold

We propose AdaFamily, a novel method for training deep neural networks. It is a family of adaptive gradient methods and can be interpreted as sort of a blend of the optimization algorithms Adam, AdaBelief and AdaMomentum. We perform experiments on standard datasets for image classification, demonstrating that our proposed method outperforms these algorithms.

CVApr 19, 2022
A qualitative investigation of optical flow algorithms for video denoising

Hannes Fassold

A good optical flow estimation is crucial in many video analysis and restoration algorithms employed in application fields like media industry, industrial inspection and automotive. In this work, we investigate how well optical flow algorithms perform qualitatively when integrated into a state of the art video denoising algorithm. Both classic optical flow algorithms (e.g. TV-L1) as well as recent deep learning based algorithm (like RAFT or BMBC) will be taken into account. For the qualitative investigation, we will employ realistic content with challenging characteristic (noisy content, large motion etc.) instead of the standard images used in most publications.

LGAug 28, 2023
Do the Frankenstein, or how to achieve better out-of-distribution performance with manifold mixing model soup

Hannes Fassold

The standard recipe applied in transfer learning is to finetune a pretrained model on the task-specific dataset with different hyperparameter settings and pick the model with the highest accuracy on the validation dataset. Unfortunately, this leads to models which do not perform well under distribution shifts, e.g. when the model is given graphical sketches of the object as input instead of photos. In order to address this, we propose the manifold mixing model soup, an algorithm which mixes together the latent space manifolds of multiple finetuned models in an optimal way in order to generate a fused model. We show that the fused model gives significantly better out-of-distribution performance (+3.5 % compared to best individual model) when finetuning a CLIP model for image classification. In addition, it provides also better accuracy on the original dataset where the finetuning has been done.

CVApr 24, 2024
Porting Large Language Models to Mobile Devices for Question Answering

Hannes Fassold

Deploying Large Language Models (LLMs) on mobile devices makes all the capabilities of natural language processing available on the device. An important use case of LLMs is question answering, which can provide accurate and contextually relevant answers to a wide array of user queries. We describe how we managed to port state of the art LLMs to mobile devices, enabling them to operate natively on the device. We employ the llama.cpp framework, a flexible and self-contained C++ framework for LLM inference. We selected a 6-bit quantized version of the Orca-Mini-3B model with 3 billion parameters and present the correct prompt format for this model. Experimental results show that LLM inference runs in interactive speed on a Galaxy S21 smartphone and that the model delivers high-quality answers to user queries related to questions from different subjects like politics, geography or history.

CVFeb 13, 2025
Faster than real-time detection of shot boundaries, sampling structure and dynamic keyframes in video

Hannes Fassold

The detection of shot boundaries (hardcuts and short dissolves), sampling structure (progressive / interlaced / pulldown) and dynamic keyframes in a video are fundamental video analysis tasks which have to be done before any further high-level analysis tasks. We present a novel algorithm which does all these analysis tasks in an unified way, by utilizing a combination of inter-frame and intra-frame measures derived from the motion field and normalized cross correlation. The algorithm runs four times faster than real-time due to sparse and selective calculation of these measures. An initial evaluation furthermore shows that the proposed algorithm is extremely robust even for challenging content showing large camera or object motion, flashlights, flicker or low contrast / noise.

CVOct 25, 2021
Some like it tough: Improving model generalization via progressively increasing the training difficulty

Hannes Fassold

In this work, we propose to progressively increase the training difficulty during learning a neural network model via a novel strategy which we call mini-batch trimming. This strategy makes sure that the optimizer puts its focus in the later training stages on the more difficult samples, which we identify as the ones with the highest loss in the current mini-batch. The strategy is very easy to integrate into an existing training pipeline and does not necessitate a change of the network model. Experiments on several image classification problems show that mini-batch trimming is able to increase the generalization ability (measured via final test error) of the trained model.

CVOct 25, 2021
Detecting speaking persons in video

Hannes Fassold

We present a novel method for detecting speaking persons in video, by extracting facial landmarks with a neural network and analysing these landmarks statistically over time

CVAug 1, 2021
Hyper360 -- a Next Generation Toolset for Immersive Media

Hannes Fassold, Antonis Karakottas, Dorothea Tsatsou et al.

Spherical 360° video is a novel media format, rapidly becoming adopted in media production and consumption of immersive media. Due to its novelty, there is a lack of tools for producing highly engaging interactive 360° video for consumption on a multitude of platforms. In this work, we describe the work done so far in the Hyper360 project on tools for mixed 360° video and 3D content. Furthermore, the first pilots which have been produced with the Hyper360 tools and results of the audience assessment of the produced pilots are presented.

CVSep 2, 2020
Automatic cinematography for 360 video

Hannes Fassold

We describe our method for automatic generation of a visually interesting camera path (automatic cinematography)from a 360 video. Based on the information from the scene objects, multiple shot hypotheses for different shot types are constructed and the best one is rendered.

MMApr 19, 2020
The Hyper360 toolset for enriched 360$^\circ$ video

Hannes Fassold, Antonis Karakottas, Dorothea Tsatsou et al.

360$^\circ$ video is a novel media format, rapidly becoming adopted in media production and consumption as part of todays ongoing virtual reality revolution. Due to its novelty, there is a lack of tools for producing highly engaging 360$^\circ$ video for consumption on a multitude of platforms. In this work, we describe the work done so far in the Hyper360 project on tools for 360$^\circ$ video. Furthermore, the first pilots which have been produced with the Hyper360 tools are presented.

CVOct 14, 2019
OmniTrack: Real-time detection and tracking of objects, text and logos in video

Hannes Fassold, Ridouane Ghermi

The automatic detection and tracking of general objects (like persons, animals or cars), text and logos in a video is crucial for many video understanding tasks, and usually real-time processing as required. We propose OmniTrack, an efficient and robust algorithm which is able to automatically detect and track objects, text as well as brand logos in real-time. It combines a powerful deep learning based object detector (YoloV3) with high-quality optical flow methods. Based on the reference YoloV3 C++ implementation, we did some important performance optimizations which will be described. The major steps in the training procedure for the combined detector for text and logo will be presented. We will describe then the OmniTrack algorithm, consisting of the phases preprocessing, feature calculation, prediction, matching and update. Several performance optimizations have been implemented there as well, like doing the object detection and optical flow calculation asynchronously. Experiments show that the proposed algorithm runs in real-time for standard definition ($720x576$) video on a PC with a Quadro RTX 5000 GPU.

CVJul 22, 2019
Adapting Computer Vision Algorithms for Omnidirectional Video

Hannes Fassold

Omnidirectional (360°) video has got quite popular because it provides a highly immersive viewing experience. For computer vision algorithms, it poses several challenges, like the special (equirectangular) projection commonly employed and the huge image size. In this work, we give a high-level overview of these challenges and outline strategies how to adapt computer vision algorithm for the specifics of omnidirectional video.