Paul Lukowicz

LG
h-index62
79papers
2,018citations
Novelty42%
AI Score55

79 Papers

AIJun 23, 2023
Human-AI Coevolution

Dino Pedreschi, Luca Pappalardo, Emanuele Ferragina et al.

Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users' choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often ``unintended'' social outcomes. This paper introduces Coevolution AI as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., technical, epistemological, legal and socio-political.

CVSep 14, 2022
TASKED: Transformer-based Adversarial learning for human activity recognition using wearable sensors via Self-KnowledgE Distillation

Sungho Suh, Vitor Fortes Rey, Paul Lukowicz

Wearable sensor-based human activity recognition (HAR) has emerged as a principal research area and is utilized in a variety of applications. Recently, deep learning-based methods have achieved significant improvement in the HAR field with the development of human-computer interaction applications. However, they are limited to operating in a local neighborhood in the process of a standard convolution neural network, and correlations between different sensors on body positions are ignored. In addition, they still face significant challenging problems with performance degradation due to large gaps in the distribution of training and test data, and behavioral differences between subjects. In this work, we propose a novel Transformer-based Adversarial learning framework for human activity recognition using wearable sensors via Self-KnowledgE Distillation (TASKED), that accounts for individual sensor orientations and spatial and temporal features. The proposed method is capable of learning cross-domain embedding feature representations from multiple subjects datasets using adversarial learning and the maximum mean discrepancy (MMD) regularization to align the data distribution over multiple domains. In the proposed method, we adopt the teacher-free self-knowledge distillation to improve the stability of the training procedure and the performance of human activity recognition. Experimental results show that TASKED not only outperforms state-of-the-art methods on the four real-world public HAR datasets (alone or combined) but also improves the subject generalization effectively.

LGFeb 8, 2023
InMyFace: Inertial and Mechanomyography-Based Sensor Fusion for Wearable Facial Activity Recognition

Hymalai Bello, Luis Alfredo Sanchez Marin, Sungho Suh et al.

Recognizing facial activity is a well-understood (but non-trivial) computer vision problem. However, reliable solutions require a camera with a good view of the face, which is often unavailable in wearable settings. Furthermore, in wearable applications, where systems accompany users throughout their daily activities, a permanently running camera can be problematic for privacy (and legal) reasons. This work presents an alternative solution based on the fusion of wearable inertial sensors, planar pressure sensors, and acoustic mechanomyography (muscle sounds). The sensors were placed unobtrusively in a sports cap to monitor facial muscle activities related to facial expressions. We present our integrated wearable sensor system, describe data fusion and analysis methods, and evaluate the system in an experiment with thirteen subjects from different cultural backgrounds (eight countries) and both sexes (six women and seven men). In a one-model-per-user scheme and using a late fusion approach, the system yielded an average F1 score of 85.00% for the case where all sensing modalities are combined. With a cross-user validation and a one-model-for-all-user scheme, an F1 score of 79.00% was obtained for thirteen participants (six females and seven males). Moreover, in a hybrid fusion (cross-user) approach and six classes, an average F1 score of 82.00% was obtained for eight users. The results are competitive with state-of-the-art non-camera-based solutions for a cross-user study. In addition, our unique set of participants demonstrates the inclusiveness and generalizability of the approach.

LGAug 7, 2023
Worker Activity Recognition in Manufacturing Line Using Near-body Electric Field

Sungho Suh, Vitor Fortes Rey, Sizhen Bian et al.

Manufacturing industries strive to improve production efficiency and product quality by deploying advanced sensing and control systems. Wearable sensors are emerging as a promising solution for achieving this goal, as they can provide continuous and unobtrusive monitoring of workers' activities in the manufacturing line. This paper presents a novel wearable sensing prototype that combines IMU and body capacitance sensing modules to recognize worker activities in the manufacturing line. To handle these multimodal sensor data, we propose and compare early, and late sensor data fusion approaches for multi-channel time-series convolutional neural networks and deep convolutional LSTM. We evaluate the proposed hardware and neural network model by collecting and annotating sensor data using the proposed sensing prototype and Apple Watches in the testbed of the manufacturing line. Experimental results demonstrate that our proposed methods achieve superior performance compared to the baseline methods, indicating the potential of the proposed approach for real-world applications in manufacturing industries. Furthermore, the proposed sensing prototype with a body capacitive sensor and feature fusion method improves by 6.35%, yielding a 9.38% higher macro F1 score than the proposed sensing prototype without a body capacitive sensor and Apple Watch data, respectively.

LGOct 4, 2022
Learning from the Best: Contrastive Representations Learning Across Sensor Locations for Wearable Activity Recognition

Vitor Fortes Rey, Sungho Suh, Paul Lukowicz

We address the well-known wearable activity recognition problem of having to work with sensors that are non-optimal in terms of information they provide but have to be used due to wearability/usability concerns (e.g. the need to work with wrist-worn IMUs because they are embedded in most smart watches). To mitigate this problem we propose a method that facilitates the use of information from sensors that are only present during the training process and are unavailable during the later use of the system. The method transfers information from the source sensors to the latent representation of the target sensor data through contrastive loss that is combined with the classification loss during joint training. We evaluate the method on the well-known PAMAP2 and Opportunity benchmarks for different combinations of source and target sensors showing average (over all activities) F1 score improvements of between 5% and 13% with the improvement on individual activities, particularly well suited to benefit from the additional information going up to between 20% and 40%.

LGOct 29, 2023
Remaining useful life prediction of Lithium-ion batteries using spatio-temporal multimodal attention networks

Sungho Suh, Dhruv Aditya Mittal, Hymalai Bello et al.

Lithium-ion batteries are widely used in various applications, including electric vehicles and renewable energy storage. The prediction of the remaining useful life (RUL) of batteries is crucial for ensuring reliable and efficient operation, as well as reducing maintenance costs. However, determining the life cycle of batteries in real-world scenarios is challenging, and existing methods have limitations in predicting the number of cycles iteratively. In addition, existing works often oversimplify the datasets, neglecting important features of the batteries such as temperature, internal resistance, and material type. To address these limitations, this paper proposes a two-stage RUL prediction scheme for Lithium-ion batteries using a spatio-temporal multimodal attention network (ST-MAN). The proposed ST-MAN is to capture the complex spatio-temporal dependencies in the battery data, including the features that are often neglected in existing works. Despite operating without prior knowledge of end-of-life (EOL) events, our method consistently achieves lower error rates, boasting mean absolute error (MAE) and mean square error (MSE) of 0.0275 and 0.0014, respectively, compared to existing convolutional neural networks (CNN) and long short-term memory (LSTM)-based methods. The proposed method has the potential to improve the reliability and efficiency of battery operations and is applicable in various industries.

CVFeb 1, 2023
PresSim: An End-to-end Framework for Dynamic Ground Pressure Profile Generation from Monocular Videos Using Physics-based 3D Simulation

Lala Shakti Swarup Ray, Bo Zhou, Sungho Suh et al.

Ground pressure exerted by the human body is a valuable source of information for human activity recognition (HAR) in unobtrusive pervasive sensing. While data collection from pressure sensors to develop HAR solutions requires significant resources and effort, we present a novel end-to-end framework, PresSim, to synthesize sensor data from videos of human activities to reduce such effort significantly. PresSim adopts a 3-stage process: first, extract the 3D activity information from videos with computer vision architectures; then simulate the floor mesh deformation profiles based on the 3D activity information and gravity-included physics simulation; lastly, generate the simulated pressure sensor data with deep learning models. We explored two approaches for the 3D activity information: inverse kinematics with mesh re-targeting, and volumetric pose and shape estimation. We validated PresSim with an experimental setup with a monocular camera to provide input and a pressure-sensing fitness mat (80x28 spatial resolution) to provide the sensor ground truth, where nine participants performed a set of predefined yoga sequences.

LGJun 7, 2023
CaptAinGlove: Capacitive and Inertial Fusion-Based Glove for Real-Time on Edge Hand Gesture Recognition for Drone Control

Hymalai Bello, Sungho Suh, Daniel Geißler et al.

We present CaptAinGlove, a textile-based, low-power (1.15Watts), privacy-conscious, real-time on-the-edge (RTE) glove-based solution with a tiny memory footprint (2MB), designed to recognize hand gestures used for drone control. We employ lightweight convolutional neural networks as the backbone models and a hierarchical multimodal fusion to reduce power consumption and improve accuracy. The system yields an F1-score of 80% for the offline evaluation of nine classes; eight hand gesture commands and null activity. For the RTE, we obtained an F1-score of 67% (one user).

LGOct 3, 2022
Smart-Badge: A wearable badge with multi-modal sensors for kitchen activity recognition

Mengxi Liu, Sungho Suh, Bo Zhou et al.

Human health is closely associated with their daily behavior and environment. However, keeping a healthy lifestyle is still challenging for most people as it is difficult to recognize their living behaviors and identify their surrounding situations to take appropriate action. Human activity recognition is a promising approach to building a behavior model of users, by which users can get feedback about their habits and be encouraged to develop a healthier lifestyle. In this paper, we present a smart light wearable badge with six kinds of sensors, including an infrared array sensor MLX90640 offering privacy-preserving, low-cost, and non-invasive features, to recognize daily activities in a realistic unmodified kitchen environment. A multi-channel convolutional neural network (MC-CNN) based on data and feature fusion methods is applied to classify 14 human activities associated with potentially unhealthy habits. Meanwhile, we evaluate the impact of the infrared array sensor on the recognition accuracy of these activities. We demonstrate the performance of the proposed work to detect the 14 activities performed by ten volunteers with an average accuracy of 92.44 % and an F1 score of 88.27 %.

LGJul 3, 2023
Don't freeze: Finetune encoders for better Self-Supervised HAR

Vitor Fortes Rey, Dominique Nshimyimana, Paul Lukowicz

Recently self-supervised learning has been proposed in the field of human activity recognition as a solution to the labelled data availability problem. The idea being that by using pretext tasks such as reconstruction or contrastive predictive coding, useful representations can be learned that then can be used for classification. Those approaches follow the pretrain, freeze and fine-tune procedure. In this paper we will show how a simple change - not freezing the representation - leads to substantial performance gains across pretext tasks. The improvement was found in all four investigated datasets and across all four pretext tasks and is inversely proportional to amount of labelled data. Moreover the effect is present whether the pretext task is carried on the Capture24 dataset or directly in unlabelled data of the target dataset.

CVJul 21, 2023
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods

Lala Shakti Swarup Ray, Bo Zhou, Sungho Suh et al.

To help smart wearable researchers choose the optimal ground truth methods for motion capturing (MoCap) for all types of loose garments, we present a benchmark, DrapeMoCapBench (DMCB), specifically designed to evaluate the performance of optical marker-based and marker-less MoCap. High-cost marker-based MoCap systems are well-known as precise golden standards. However, a less well-known caveat is that they require skin-tight fitting markers on bony areas to ensure the specified precision, making them questionable for loose garments. On the other hand, marker-less MoCap methods powered by computer vision models have matured over the years, which have meager costs as smartphone cameras would suffice. To this end, DMCB uses large real-world recorded MoCap datasets to perform parallel 3D physics simulations with a wide range of diversities: six levels of drape from skin-tight to extremely draped garments, three levels of motions and six body type - gender combinations to benchmark state-of-the-art optical marker-based and marker-less MoCap methods to identify the best-performing method in different scenarios. In assessing the performance of marker-based and low-cost marker-less MoCap for casual loose garments both approaches exhibit significant performance loss (>10cm), but for everyday activities involving basic and fast motions, marker-less MoCap slightly outperforms marker-based MoCap, making it a favorable and cost-effective choice for wearable studies.

CVJun 24, 2023
ClothFit: Cloth-Human-Attribute Guided Virtual Try-On Network Using 3D Simulated Dataset

Yunmin Cho, Lala Shakti Swarup Ray, Kundan Sai Prabhu Thota et al.

Online clothing shopping has become increasingly popular, but the high rate of returns due to size and fit issues has remained a major challenge. To address this problem, virtual try-on systems have been developed to provide customers with a more realistic and personalized way to try on clothing. In this paper, we propose a novel virtual try-on method called ClothFit, which can predict the draping shape of a garment on a target body based on the actual size of the garment and human attributes. Unlike existing try-on models, ClothFit considers the actual body proportions of the person and available cloth sizes for clothing virtualization, making it more appropriate for current online apparel outlets. The proposed method utilizes a U-Net-based network architecture that incorporates cloth and human attributes to guide the realistic virtual try-on synthesis. Specifically, we extract features from a cloth image using an auto-encoder and combine them with features from the user's height, weight, and cloth size. The features are concatenated with the features from the U-Net encoder, and the U-Net decoder synthesizes the final virtual try-on image. Our experimental results demonstrate that ClothFit can significantly improve the existing state-of-the-art methods in terms of photo-realistic virtual try-on results.

CVMay 28, 2022
Estimation of 3D Body Shape and Clothing Measurements from Frontal- and Side-view Images

Kundan Sai Prabhu Thota, Sungho Suh, Bo Zhou et al.

The estimation of 3D human body shape and clothing measurements is crucial for virtual try-on and size recommendation problems in the fashion industry but has always been a challenging problem due to several conditions, such as lack of publicly available realistic datasets, ambiguity in multiple camera resolutions, and the undefinable human shape space. Existing works proposed various solutions to these problems but could not succeed in the industry adaptation because of complexity and restrictions. To solve the complexity and challenges, in this paper, we propose a simple yet effective architecture to estimate both shape and measures from frontal- and side-view images. We utilize silhouette segmentation from the two multi-view images and implement an auto-encoder network to learn low-dimensional features from segmented silhouettes. Then, we adopt a kernel-based regularized regression module to estimate the body shape and measurements. The experimental results show that the proposed method provides competitive results on the synthetic dataset, NOMO-3d-400-scans Dataset, and RGB Images of humans captured in different cameras.

CLAug 2, 2024
Misinforming LLMs: vulnerabilities, challenges and opportunities

Bo Zhou, Daniel Geißler, Paul Lukowicz

Large Language Models (LLMs) have made significant advances in natural language processing, but their underlying mechanisms are often misunderstood. Despite exhibiting coherent answers and apparent reasoning behaviors, LLMs rely on statistical patterns in word embeddings rather than true cognitive processes. This leads to vulnerabilities such as "hallucination" and misinformation. The paper argues that current LLM architectures are inherently untrustworthy due to their reliance on correlations of sequential patterns of word embedding vectors. However, ongoing research into combining generative transformer-based models with fact bases and logic programming languages may lead to the development of trustworthy LLMs capable of generating statements based on given truth and explaining their self-reasoning process.

HCJul 3, 2023
Origami Single-end Capacitive Sensing for Continuous Shape Estimation of Morphing Structures

Lala Shakti Swarup Ray, Daniel Geißler, Bo Zhou et al.

In this work, we propose a novel single-end morphing capacitive sensing method for shape tracking, FxC, by combining Folding origami structures and Capacitive sensing to detect the morphing structural motions using state-of-the-art sensing circuits and deep learning. It was observed through embedding areas of origami structures with conductive materials as single-end capacitive sensing patches, that the sensor signals change coherently with the motion of the structure. Different from other origami capacitors where the origami structures are used in adjusting the thickness of the dielectric layer of double-plate capacitors, FxC uses only a single conductive plate per channel, and the origami structure directly changes the geometry of the conductive plate. We examined the operation principle of morphing single-end capacitors through 3D geometry simulation combined with physics theoretical deduction, which deduced similar behaviour as observed in experimentation. Then a software pipeline was developed to use the sensor signals to reconstruct the dynamic structural geometry through data-driven deep neural network regression of geometric primitives extracted from vision tracking. We created multiple folding patterns to validate our approach, based on folding patterns including Accordion, Chevron, Sunray and V-Fold patterns with different layouts of capacitive sensors using paper-based and textile-based materials. Experimentation results show that the geometry primitives predicted from the capacitive signals have a strong correlation with the visual ground truth with R-squared value of up to 95% and tracking error of 6.5 mm for patches. The simulation and machine learning constitute two-way information exchange between the sensing signals and structural geometry.

CVSep 14, 2023
A Novel Local-Global Feature Fusion Framework for Body-weight Exercise Recognition with Pressure Mapping Sensors

Davinder Pal Singh, Lala Shakti Swarup Ray, Bo Zhou et al.

We present a novel local-global feature fusion framework for body-weight exercise recognition with floor-based dynamic pressure maps. One step further from the existing studies using deep neural networks mainly focusing on global feature extraction, the proposed framework aims to combine local and global features using image processing techniques and the YOLO object detection to localize pressure profiles from different body parts and consider physical constraints. The proposed local feature extraction method generates two sets of high-level local features consisting of cropped pressure mapping and numerical features such as angular orientation, location on the mat, and pressure area. In addition, we adopt a knowledge distillation for regularization to preserve the knowledge of the global feature extraction and improve the performance of the exercise recognition. Our experimental results demonstrate a notable 11 percent improvement in F1 score for exercise recognition while preserving label-specific features.

LGAug 7, 2023
Two-stage Early Prediction Framework of Remaining Useful Life for Lithium-ion Batteries

Dhruv Mittal, Hymalai Bello, Bo Zhou et al.

Early prediction of remaining useful life (RUL) is crucial for effective battery management across various industries, ranging from household appliances to large-scale applications. Accurate RUL prediction improves the reliability and maintainability of battery technology. However, existing methods have limitations, including assumptions of data from the same sensors or distribution, foreknowledge of the end of life (EOL), and neglect to determine the first prediction cycle (FPC) to identify the start of the unhealthy stage. This paper proposes a novel method for RUL prediction of Lithium-ion batteries. The proposed framework comprises two stages: determining the FPC using a neural network-based model to divide the degradation data into distinct health states and predicting the degradation pattern after the FPC to estimate the remaining useful life as a percentage. Experimental results demonstrate that the proposed method outperforms conventional approaches in terms of RUL prediction. Furthermore, the proposed method shows promise for real-world scenarios, providing improved accuracy and applicability for battery management.

CVJul 3, 2023
A Synthetic Benchmarking Pipeline to Compare Camera Calibration Algorithms

Lala Shakti Swarup Ray, Bo Zhou, Lars Krupp et al.

Accurate camera calibration is crucial for various computer vision applications. However, measuring calibration accuracy in the real world is challenging due to the lack of datasets with ground truth to evaluate them. In this paper, we present SynthCal, a synthetic camera calibration benchmarking pipeline that generates images of calibration patterns to measure and enable accurate quantification of calibration algorithm performance in camera parameter estimation. We present a SynthCal generated calibration dataset with four common patterns, two camera types, and two environments with varying view, distortion, lighting, and noise levels for both monocular and multi-camera systems. The dataset evaluates both single and multi-view calibration algorithms by measuring re-projection and root-mean-square errors for identical patterns and camera settings. Additionally, we analyze the significance of different patterns using different calibration configurations. The experimental results demonstrate the effectiveness of SynthCal in evaluating various calibration algorithms and patterns.

CVJun 19, 2023
MeciFace: Mechanomyography and Inertial Fusion-based Glasses for Edge Real-Time Recognition of Facial and Eating Activities

Hymalai Bello, Sungho Suh, Bo Zhou et al.

The increasing prevalence of stress-related eating behaviors and their impact on overall health highlights the importance of effective and ubiquitous monitoring systems. In this paper, we present MeciFace, an innovative wearable technology designed to monitor facial expressions and eating activities in real-time on-the-edge (RTE). MeciFace aims to provide a low-power, privacy-conscious, and highly accurate tool for promoting healthy eating behaviors and stress management. We employ lightweight convolutional neural networks as backbone models for facial expression and eating monitoring scenarios. The MeciFace system ensures efficient data processing with a tiny memory footprint, ranging from 11KB to 19 KB. During RTE evaluation, the system achieves an F1-score of < 86% for facial expression recognition and 94% for eating/drinking monitoring, for the RTE of unseen users (user-independent case).

SPOct 8, 2022
Smart Cup: An impedance sensing based fluid intake monitoring system for beverages classification and freshness detection

Mengxi Liu, Sizhen Bian, Bo Zhou et al.

This paper presents a novel beverage intake monitoring system that can accurately recognize beverage kinds and freshness. By mounting carbon electrodes on the commercial cup, the system measures the electrochemical impedance spectrum of the fluid in the cup. We studied the frequency sensitivity of the electrochemical impedance spectrum regarding distinct beverages and the importance of features like amplitude, phase, and real and imaginary components for beverage classification. The results show that features from a low-frequency domain (100 Hz to 1000 Hz) provide more meaningful information for beverage classification than the higher frequency domain. Twenty beverages, including carbonated drinks and juices, were classified with nearly perfect accuracy using a supervised machine learning approach. The same performance was also observed in the freshness recognition, where four different kinds of milk and fruit juice were studied.

CVAug 1, 2023
PressureTransferNet: Human Attribute Guided Dynamic Ground Pressure Profile Transfer using 3D simulated Pressure Maps

Lala Shakti Swarup Ray, Vitor Fortes Rey, Bo Zhou et al.

We propose PressureTransferNet, a novel method for Human Activity Recognition (HAR) using ground pressure information. Our approach generates body-specific dynamic ground pressure profiles for specific activities by leveraging existing pressure data from different individuals. PressureTransferNet is an encoder-decoder model taking a source pressure map and a target human attribute vector as inputs, producing a new pressure map reflecting the target attribute. To train the model, we use a sensor simulation to create a diverse dataset with various human attributes and pressure profiles. Evaluation on a real-world dataset shows its effectiveness in accurately transferring human attributes to ground pressure profiles across different scenarios. We visually confirm the fidelity of the synthesized pressure shapes using a physics-based deep learning model and achieve a binary R-square value of 0.79 on areas with ground contact. Validation through classification with F1 score (0.911$\pm$0.015) on physical pressure mat data demonstrates the correctness of the synthesized pressure maps, making our method valuable for data augmentation, denoising, sensor simulation, and anomaly detection. Applications span sports science, rehabilitation, and bio-mechanics, contributing to the development of HAR systems.

ED-PHAug 21, 2023
Unreflected Acceptance -- Investigating the Negative Consequences of ChatGPT-Assisted Problem Solving in Physics Education

Lars Krupp, Steffen Steinert, Maximilian Kiefer-Emmanouilidis et al.

Large language models (LLMs) have recently gained popularity. However, the impact of their general availability through ChatGPT on sensitive areas of everyday life, such as education, remains unclear. Nevertheless, the societal impact on established educational methods is already being experienced by both students and educators. Our work focuses on higher physics education and examines problem solving strategies. In a study, students with a background in physics were assigned to solve physics exercises, with one group having access to an internet search engine (N=12) and the other group being allowed to use ChatGPT (N=27). We evaluated their performance, strategies, and interaction with the provided tools. Our results showed that nearly half of the solutions provided with the support of ChatGPT were mistakenly assumed to be correct by the students, indicating that they overly trusted ChatGPT even in their field of expertise. Likewise, in 42% of cases, students used copy & paste to query ChatGPT -- an approach only used in 4% of search engine queries -- highlighting the stark differences in interaction behavior between the groups and indicating limited reflection when using ChatGPT. In our work, we demonstrated a need to (1) guide students on how to interact with LLMs and (2) create awareness of potential shortcomings for users.

AIJul 15, 2024
Leveraging Hybrid Intelligence Towards Sustainable and Energy-Efficient Machine Learning

Daniel Geissler, Paul Lukowicz

Hybrid intelligence aims to enhance decision-making, problem-solving, and overall system performance by combining the strengths of both, human cognitive abilities and artificial intelligence. With the rise of Large Language Models (LLM), progressively participating as smart agents to accelerate machine learning development, Hybrid Intelligence is becoming an increasingly important topic for effective interaction between humans and machines. This paper presents an approach to leverage Hybrid Intelligence towards sustainable and energy-aware machine learning. When developing machine learning models, final model performance commonly rules the optimization process while the efficiency of the process itself is often neglected. Moreover, in recent times, energy efficiency has become equally crucial due to the significant environmental impact of complex and large-scale computational processes. The contribution of this work covers the interactive inclusion of secondary knowledge sources through Human-in-the-loop (HITL) and LLM agents to stress out and further resolve inefficiencies in the machine learning development process.

CVFeb 24, 2023
A Knowledge Distillation framework for Multi-Organ Segmentation of Medaka Fish in Tomographic Image

Jwalin Bhatt, Yaroslav Zharov, Sungho Suh et al.

Morphological atlases are an important tool in organismal studies, and modern high-throughput Computed Tomography (CT) facilities can produce hundreds of full-body high-resolution volumetric images of organisms. However, creating an atlas from these volumes requires accurate organ segmentation. In the last decade, machine learning approaches have achieved incredible results in image segmentation tasks, but they require large amounts of annotated data for training. In this paper, we propose a self-training framework for multi-organ segmentation in tomographic images of Medaka fish. We utilize the pseudo-labeled data from a pretrained Teacher model and adopt a Quality Classifier to refine the pseudo-labeled data. Then, we introduce a pixel-wise knowledge distillation method to prevent overfitting to the pseudo-labeled data and improve the segmentation performance. The experimental results demonstrate that our method improves mean Intersection over Union (IoU) by 5.9% on the full dataset and enables keeping the quality while using three times less markup.

LGAug 26, 2024
TSAK: Two-Stage Semantic-Aware Knowledge Distillation for Efficient Wearable Modality and Model Optimization in Manufacturing Lines

Hymalai Bello, Daniel Geißler, Sungho Suh et al.

Smaller machine learning models, with less complex architectures and sensor inputs, can benefit wearable sensor-based human activity recognition (HAR) systems in many ways, from complexity and cost to battery life. In the specific case of smart factories, optimizing human-robot collaboration hinges on the implementation of cutting-edge, human-centric AI systems. To this end, workers' activity recognition enables accurate quantification of performance metrics, improving efficiency holistically. We present a two-stage semantic-aware knowledge distillation (KD) approach, TSAK, for efficient, privacy-aware, and wearable HAR in manufacturing lines, which reduces the input sensor modalities as well as the machine learning model size, while reaching similar recognition performance as a larger multi-modal and multi-positional teacher model. The first stage incorporates a teacher classifier model encoding attention, causal, and combined representations. The second stage encompasses a semantic classifier merging the three representations from the first stage. To evaluate TSAK, we recorded a multi-modal dataset at a smart factory testbed with wearable and privacy-aware sensors (IMU and capacitive) located on both workers' hands. In addition, we evaluated our approach on OpenPack, the only available open dataset mimicking the wearable sensor placements on both hands in the manufacturing HAR scenario. We compared several KD strategies with different representations to regulate the training process of a smaller student model. Compared to the larger teacher model, the student model takes fewer sensor channels from a single hand, has 79% fewer parameters, runs 8.88 times faster, and requires 96.6% less computing power (FLOPS).

AIAug 18, 2024
ALS-HAR: Harnessing Wearable Ambient Light Sensors to Enhance IMU-based Human Activity Recogntion

Lala Shakti Swarup Ray, Daniel Geißler, Mengxi Liu et al.

Despite the widespread integration of ambient light sensors (ALS) in smart devices commonly used for screen brightness adaptation, their application in human activity recognition (HAR), primarily through body-worn ALS, is largely unexplored. In this work, we developed ALS-HAR, a robust wearable light-based motion activity classifier. Although ALS-HAR achieves comparable accuracy to other modalities, its natural sensitivity to external disturbances, such as changes in ambient light, weather conditions, or indoor lighting, makes it challenging for daily use. To address such drawbacks, we introduce strategies to enhance environment-invariant IMU-based activity classifications through augmented multi-modal and contrastive classifications by transferring the knowledge extracted from the ALS. Our experiments on a real-world activity dataset for three different scenarios demonstrate that while ALS-HAR's accuracy strongly relies on external lighting conditions, cross-modal information can still improve other HAR systems, such as IMU-based classifiers.Even in scenarios where ALS performs insufficiently, the additional knowledge enables improved accuracy and macro F1 score by up to 4.2 % and 6.4 %, respectively, for IMU-based classifiers and even surpasses multi-modal sensor fusion models in two of our three experiment scenarios. Our research highlights the untapped potential of ALS integration in advancing sensor-based HAR technology, paving the way for practical and efficient wearable ALS-based activity recognition systems with potential applications in healthcare, sports monitoring, and smart indoor environments.

LGSep 13, 2024
Towards certifiable AI in aviation: landscape, challenges, and opportunities

Hymalai Bello, Daniel Geißler, Lala Ray et al.

Artificial Intelligence (AI) methods are powerful tools for various domains, including critical fields such as avionics, where certification is required to achieve and maintain an acceptable level of safety. General solutions for safety-critical systems must address three main questions: Is it suitable? What drives the system's decisions? Is it robust to errors/attacks? This is more complex in AI than in traditional methods. In this context, this paper presents a comprehensive mind map of formal AI certification in avionics. It highlights the challenges of certifying AI development with an example to emphasize the need for qualification beyond performance metrics.

AIMay 18
KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition

Mengxi Liu, Sizhen Bian, Vitor Fortes et al.

Kolmogorov-Arnold Networks (KANs) have demonstrated an exceptional ability to learn complex functions on clean, low-dimensional data but struggle to maintain performance on noisy and imperfect real-world datasets. In contrast, conventional multi-layer perceptrons (MLPs) are far more tolerant to noise and computationally efficient. Replacing all MLP components with KANs in HAR models often degrades accuracy and computation efficiency, highlighting an open challenge: how to combine KANs' precision with MLPs' noise robustness and efficiency. To address this, we systematically explore various placements of KAN modules within deep HAR networks and propose a hybrid architecture that strategically synergizes the strengths of both paradigms, which uses a KAN-based input embedding layer, retains MLP layers for intermediate feature mixing, and introduces a specialized LarctanKAN module for final activity classification. Across eight public HAR datasets, the hybrid KAN-MLP model achieves an average macro F1 score relative improvement of 5.33\% compared pure-MLP model, significantly outperforming standalone KAN and MLP baselines. Furthermore, integrating this hybrid strategy into other state-of-the-art HAR architectures consistently boosts their performance. Our findings demonstrate that a carefully orchestrated combination of KAN, MLP, or other conventional neural components yields more robust and accurate HAR models for real-world wearable sensing environments.

PFMar 16
This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs

Lars Krupp, Daniel Geißler, Francisco M. Calatrava-Nicolas et al.

The energy consumption of Large Language Models (LLMs) is raising growing concerns due to their adverse effects on environmental stability and resource use. Yet, these energy costs remain largely opaque to users, especially when models are accessed through an API -- a black box in which all information depends on what providers choose to disclose. In this work, we investigate inference time measurements as a proxy to approximate the associated energy costs of API-based LLMs. We ground our approach by comparing our estimations with actual energy measurements from locally hosted equivalents. Our results show that time measurements allow us to infer GPU models for API-based LLMs, grounding our energy cost estimations. Our work aims to create means for understanding the associated energy costs of API-based LLMs, especially for end users.

AINov 6, 2025
Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis

Lars Krupp, Daniel Geißler, Vishal Banwari et al.

Web agents, like OpenAI's Operator and Google's Project Mariner, are powerful agentic systems pushing the boundaries of Large Language Models (LLM). They can autonomously interact with the internet at the user's behest, such as navigating websites, filling search masks, and comparing price lists. Though web agent research is thriving, induced sustainability issues remain largely unexplored. To highlight the urgency of this issue, we provide an initial exploration of the energy and $CO_2$ cost associated with web agents from both a theoretical -via estimation- and an empirical perspective -by benchmarking. Our results show how different philosophies in web agent creation can severely impact the associated expended energy, and that more energy consumed does not necessarily equate to better results. We highlight a lack of transparency regarding disclosing model parameters and processes used for some web agents as a limiting factor when estimating energy consumption. Our work contributes towards a change in thinking of how we evaluate web agents, advocating for dedicated metrics measuring energy consumption in benchmarks.

LGMar 27
SPECTRA: An Efficient Spectral-Informed Neural Network for Sensor-Based Activity Recognition

Deepika Gurung, Lala Shakti Swarup Ray, Mengxi Liu et al.

Real time sensor based applications in pervasive computing require edge deployable models to ensure low latency privacy and efficient interaction. A prime example is sensor based human activity recognition where models must balance accuracy with stringent resource constraints. Yet many deep learning approaches treat temporal sensor signals as black box sequences overlooking spectral temporal structure while demanding excessive computation. We present SPECTRA a deployment first co designed spectral temporal architecture that integrates short time Fourier transform STFT feature extraction depthwise separable convolutions and channel wise self attention to capture spectral temporal dependencies under real edge runtime and memory constraints. A compact bidirectional GRU with attention pooling summarizes within window dynamics at low cost reducing downstream model burden while preserving accuracy. Across five public HAR datasets SPECTRA matches or approaches larger CNN LSTM and Transformer baselines while substantially reducing parameters latency and energy. Deployments on a Google Pixel 9 smartphone and an STM32L4 microcontroller further demonstrate end to end deployable realtime private and efficient HAR.

CVMar 2
OpenMarcie: Dataset for Multimodal Action Recognition in Industrial Environments

Hymalai Bello, Lala Ray, Joanna Sorysz et al.

Smart factories use advanced technologies to optimize production and increase efficiency. To this end, the recognition of worker activity allows for accurate quantification of performance metrics, improving efficiency holistically while contributing to worker safety. OpenMarcie is, to the best of our knowledge, the biggest multimodal dataset designed for human action monitoring in manufacturing environments. It includes data from wearables sensing modalities and cameras distributed in the surroundings. The dataset is structured around two experimental settings, involving a total of 36 participants. In the first setting, twelve participants perform a bicycle assembly and disassembly task under semi-realistic conditions without a fixed protocol, promoting divergent and goal-oriented problem-solving. The second experiment involves twenty-five volunteers (24 valid data) engaged in a 3D printer assembly task, with the 3D printer manufacturer's instructions provided to guide the volunteers in acquiring procedural knowledge. This setting also includes sequential collaborative assembly, where participants assess and correct each other's progress, reflecting real-world manufacturing dynamics. OpenMarcie includes over 37 hours of egocentric and exocentric, multimodal, and multipositional data, featuring eight distinct data types and more than 200 independent information channels. The dataset is benchmarked across three human activity recognition tasks: activity classification, open vocabulary captioning, and cross-modal alignment.

LGMar 5Code
Embedded Inter-Subject Variability in Adversarial Learning for Inertial Sensor-Based Human Activity Recognition

Francisco M. Calatrava-Nicolás, Shoko Miyauchi, Vitor Fortes Rey et al.

This paper addresses the problem of Human Activity Recognition (HAR) using data from wearable inertial sensors. An important challenge in HAR is the model's generalization capabilities to new unseen individuals due to inter-subject variability, i.e., the same activity is performed differently by different individuals. To address this problem, we propose a novel deep adversarial framework that integrates the concept of inter-subject variability in the adversarial task, thereby encouraging subject-invariant feature representations and enhancing the classification performance in the HAR problem. Our approach outperforms previous methods in three well-established HAR datasets using a leave-one-subject-out (LOSO) cross-validation. Further results indicate that our proposed adversarial task effectively reduces inter-subject variability among different users in the feature space, and it outperforms adversarial tasks from previous works when integrated into our framework. Code: https://github.com/FranciscoCalatrava/EmbeddedSubjectVariability.git

LGAug 21, 2025Code
Bridging Generalization and Personalization in Human Activity Recognition via On-Device Few-Shot Learning

Pixi Kang, Julian Moosmann, Mengxi Liu et al.

Human Activity Recognition (HAR) with different sensing modalities requires both strong generalization across diverse users and efficient personalization for individuals. However, conventional HAR models often fail to generalize when faced with user-specific variations, leading to degraded performance. To address this challenge, we propose a novel on-device few-shot learning framework that bridges generalization and personalization in HAR. Our method first trains a generalizable representation across users and then rapidly adapts to new users with only a few labeled samples, updating lightweight classifier layers directly on resource-constrained devices. This approach achieves robust on-device learning with minimal computation and memory cost, making it practical for real-world deployment. We implement our framework on the energy-efficient RISC-V GAP9 microcontroller and evaluate it on three benchmark datasets (RecGym, QVAR-Gesture, Ultrasound-Gesture). Across these scenarios, post-deployment adaptation improves accuracy by 3.73\%, 17.38\%, and 3.70\%, respectively. These results demonstrate that few-shot on-device learning enables scalable, user-aware, and energy-efficient wearable human activity recognition by seamlessly uniting generalization and personalization. The related framework is open sourced for further research\footnote{https://github.com/kangpx/onlineTiny2023}.

CVJul 10, 2025Code
TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices

Sizhen Bian, Mengxi Liu, Vitor Fortes Rey et al.

Human Activity Recognition (HAR) on resource-constrained wearable devices demands inference models that harmonize accuracy with computational efficiency. This paper introduces TinierHAR, an ultra-lightweight deep learning architecture that synergizes residual depthwise separable convolutions, gated recurrent units (GRUs), and temporal aggregation to achieve SOTA efficiency without compromising performance. Evaluated across 14 public HAR datasets, TinierHAR reduces Parameters by 2.7x (vs. TinyHAR) and 43.3x (vs. DeepConvLSTM), and MACs by 6.4x and 58.6x, respectively, while maintaining the averaged F1-scores. Beyond quantitative gains, this work provides the first systematic ablation study dissecting the contributions of spatial-temporal components across proposed TinierHAR, prior SOTA TinyHAR, and the classical DeepConvLSTM, offering actionable insights for designing efficient HAR systems. We finally discussed the findings and suggested principled design guidelines for future efficient HAR. To catalyze edge-HAR research, we open-source all materials in this work for future benchmarking\footnote{https://github.com/zhaxidele/TinierHAR}

LGSep 26, 2021Code
Generalized multiscale feature extraction for remaining useful life prediction of bearings with generative adversarial networks

Sungho Suh, Paul Lukowicz, Yong Oh Lee

Bearing is a key component in industrial machinery and its failure may lead to unwanted downtime and economic loss. Hence, it is necessary to predict the remaining useful life (RUL) of bearings. Conventional data-driven approaches of RUL prediction require expert domain knowledge for manual feature extraction and may suffer from data distribution discrepancy between training and test data. In this study, we propose a novel generalized multiscale feature extraction method with generative adversarial networks. The adversarial training learns the distribution of training data from different bearings and is introduced for health stage division and RUL prediction. To capture the sequence feature from a one-dimensional vibration signal, we adapt a U-Net architecture that reconstructs features to process them with multiscale layers in the generator of the adversarial network. To validate the proposed method, comprehensive experiments on two rotating machinery datasets have been conducted to predict the RUL. The experimental results show that the proposed feature extraction method can effectively predict the RUL and outperforms the conventional RUL prediction approaches based on deep neural networks. The implementation code is available at https://github.com/opensuh/GMFE.

HCNov 2, 2020Code
Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset

Anton Smerdov, Bo Zhou, Paul Lukowicz et al.

Proper training and analytics in eSports require accurately collected and annotated data. Most eSports research focuses exclusively on in-game data analysis, and there is a lack of prior work involving eSports athletes' psychophysiological data. In this paper, we present a dataset collected from professional and amateur teams in 22 matches in League of Legends video game with more than 40 hours of recordings. Recorded data include the players' physiological activity, e.g. movements, pulse, saccades, obtained from various sensors, self-reported aftermatch survey, and in-game data. An important feature of the dataset is simultaneous data collection from five players, which facilitates the analysis of sensor data on a team level. Upon the collection of dataset we carried out its validation. In particular, we demonstrate that stress and concentration levels for professional players are less correlated, meaning more independent playstyle. Also, we show that the absence of team communication does not affect the professional players as much as amateur ones. To investigate other possible use cases of the dataset, we have trained classical machine learning algorithms for skill prediction and player re-identification using 3-minute sessions of sensor data. Best models achieved 0.856 and 0.521 (0.10 for a chance level) accuracy scores on a validation set for skill prediction and player re-id problems, respectively. The dataset is available at https://github.com/smerdov/eSports Sensors Dataset.

CVOct 20, 2020Code
Two-stage generative adversarial networks for document image binarization with color noise and background removal

Sungho Suh, Jihun Kim, Paul Lukowicz et al.

Document image enhancement and binarization methods are often used to improve the accuracy and efficiency of document image analysis tasks such as text recognition. Traditional non-machine-learning methods are constructed on low-level features in an unsupervised manner but have difficulty with binarization on documents with severely degraded backgrounds. Convolutional neural network-based methods focus only on grayscale images and on local textual features. In this paper, we propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks. In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image for document image enhancement. In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size. For the adversarial neural networks, we formulate loss functions between a discriminator and generators having an encoder-decoder structure. Experimental results show that the proposed method achieves better performance than many classical and state-of-the-art algorithms over the Document Image Binarization Contest (DIBCO) datasets, the LRDE Document Binarization Dataset (LRDE DBD), and our shipping label image dataset. We plan to release the shipping label dataset as well as our implementation code at github.com/opensuh/DocumentBinarization/.

SPJan 11, 2024
Body-Area Capacitive or Electric Field Sensing for Human Activity Recognition and Human-Computer Interaction: A Comprehensive Survey

Sizhen Bian, Mengxi Liu, Bo Zhou et al.

Due to the fact that roughly sixty percent of the human body is essentially composed of water, the human body is inherently a conductive object, being able to, firstly, form an inherent electric field from the body to the surroundings and secondly, deform the distribution of an existing electric field near the body. Body-area capacitive sensing, also called body-area electric field sensing, is becoming a promising alternative for wearable devices to accomplish certain tasks in human activity recognition and human-computer interaction. Over the last decade, researchers have explored plentiful novel sensing systems backed by the body-area electric field. On the other hand, despite the pervasive exploration of the body-area electric field, a comprehensive survey does not exist for an enlightening guideline. Moreover, the various hardware implementations, applied algorithms, and targeted applications result in a challenging task to achieve a systematic overview of the subject. This paper aims to fill in the gap by comprehensively summarizing the existing works on body-area capacitive sensing so that researchers can have a better view of the current exploration status. To this end, we first sorted the explorations into three domains according to the involved body forms: body-part electric field, whole-body electric field, and body-to-body electric field, and enumerated the state-of-art works in the domains with a detailed survey of the backed sensing tricks and targeted applications. We then summarized the three types of sensing frontends in circuit design, which is the most critical part in body-area capacitive sensing, and analyzed the data processing pipeline categorized into three kinds of approaches. Finally, we described the challenges and outlooks of body-area electric sensing.

SPJan 31, 2024
iMove: Exploring Bio-impedance Sensing for Fitness Activity Recognition

Mengxi Liu, Vitor Fortes Rey, Yu Zhang et al.

Automatic and precise fitness activity recognition can be beneficial in aspects from promoting a healthy lifestyle to personalized preventative healthcare. While IMUs are currently the prominent fitness tracking modality, through iMove, we show bio-impedence can help improve IMU-based fitness tracking through sensor fusion and contrastive learning.To evaluate our methods, we conducted an experiment including six upper body fitness activities performed by ten subjects over five days to collect synchronized data from bio-impedance across two wrists and IMU on the left wrist.The contrastive learning framework uses the two modalities to train a better IMU-only classification model, where bio-impedance is only required at the training phase, by which the average Macro F1 score with the input of a single IMU was improved by 3.22 \% reaching 84.71 \% compared to the 81.49 \% of the IMU baseline model. We have also shown how bio-impedance can improve human activity recognition (HAR) directly through sensor fusion, reaching an average Macro F1 score of 89.57 \% (two modalities required for both training and inference) even if Bio-impedance alone has an average macro F1 score of 75.36 \%, which is outperformed by IMU alone. In addition, similar results were obtained in an extended study on lower body fitness activity classification, demonstrating the generalisability of our approach.Our findings underscore the potential of sensor fusion and contrastive learning as valuable tools for advancing fitness activity recognition, with bio-impedance playing a pivotal role in augmenting the capabilities of IMU-based systems.

LGJan 3, 2024
The Power of Training: How Different Neural Network Setups Influence the Energy Demand

Daniel Geißler, Bo Zhou, Mengxi Liu et al.

This work offers a heuristic evaluation of the effects of variations in machine learning training regimes and learning paradigms on the energy consumption of computing, especially HPC hardware with a life-cycle aware perspective. While increasing data availability and innovation in high-performance hardware fuels the training of sophisticated models, it also fosters the fading perception of energy consumption and carbon emission. Therefore, the goal of this work is to raise awareness about the energy impact of general training parameters and processes, from learning rate over batch size to knowledge transfer. Multiple setups with different hyperparameter configurations are evaluated on three different hardware systems. Among many results, we have found out that even with the same model and hardware to reach the same accuracy, improperly set training hyperparameters consume up to 5 times the energy of the optimal setup. We also extensively examined the energy-saving benefits of learning paradigms including recycling knowledge through pretraining and sharing knowledge through multitask training.

SPNov 11, 2024
Past, Present, and Future of Sensor-Based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task

Harish Haresamudram, Chi Ian Tang, Sungho Suh et al.

In the many years since the inception of wearable sensor-based Human Activity Recognition (HAR), a wide variety of methods have been introduced and evaluated for their ability to recognize activities. Substantial gains have been made since the days of hand-crafting heuristics as features, yet, progress has seemingly stalled on many popular benchmarks, with performance falling short of what may be considered 'sufficient'-- despite the increase in computational power and scale of sensor data, as well as rising complexity in techniques being employed. The HAR community approaches a new paradigm shift, this time incorporating world knowledge from foundational models. In this paper, we take stock of sensor-based HAR -- surveying it from its beginnings to the current state of the field, and charting its future. This is accompanied by a hands-on tutorial, through which we guide practitioners in developing HAR systems for real-world application scenarios. We provide a compendium for novices and experts alike, of methods that aim at finally solving the activity recognition problem.

SPApr 25, 2024
Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition

Parham Zolfaghari, Vitor Fortes Rey, Lala Ray et al.

The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from users, the volume of labeled data remains insufficient compared to domains where deep learning has achieved remarkable success. Addressing this gap, in this paper, we propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model that generates sensor data directly from 3D skeleton pose sequences. our method simultaneously trains the pose-to-sensor network and a human activity classifier, optimizing both data reconstruction and activity recognition. Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset. Experimental results demonstrate the superiority of our framework with significant performance improvements over baseline methods.

LGFeb 22, 2024
Text me the data: Generating Ground Pressure Sequence from Textual Descriptions for HAR

Lala Shakti Swarup Ray, Bo Zhou, Sungho Suh et al.

In human activity recognition (HAR), the availability of substantial ground truth is necessary for training efficient models. However, acquiring ground pressure data through physical sensors itself can be cost-prohibitive, time-consuming. To address this critical need, we introduce Text-to-Pressure (T2P), a framework designed to generate extensive ground pressure sequences from textual descriptions of human activities using deep learning techniques. We show that the combination of vector quantization of sensor data along with simple text conditioned auto regressive strategy allows us to obtain high-quality generated pressure sequences from textual descriptions with the help of discrete latent correlation between text and pressure maps. We achieved comparable performance on the consistency between text and generated motion with an R squared value of 0.722, Masked R squared value of 0.892, and FID score of 1.83. Additionally, we trained a HAR model with the the synthesized data and evaluated it on pressure dynamics collected by a real pressure sensor which is on par with a model trained on only real data. Combining both real and synthesized training data increases the overall macro F1 score by 5.9 percent.

CLJan 8, 2025
Quantum-inspired Embeddings Projection and Similarity Metrics for Representation Learning

Ivan Kankeu, Stefan Gerd Fritsch, Gunnar Schönhoff et al.

Over the last decade, representation learning, which embeds complex information extracted from large amounts of data into dense vector spaces, has emerged as a key technique in machine learning. Among other applications, it has been a key building block for large language models and advanced computer vision systems based on contrastive learning. A core component of representation learning systems is the projection head, which maps the original embeddings into different, often compressed spaces, while preserving the similarity relationship between vectors. In this paper, we propose a quantum-inspired projection head that includes a corresponding quantum-inspired similarity metric. Specifically, we map classical embeddings onto quantum states in Hilbert space and introduce a quantum circuit-based projection head to reduce embedding dimensionality. To evaluate the effectiveness of this approach, we extended the BERT language model by integrating our projection head for embedding compression. We compared the performance of embeddings, which were compressed using our quantum-inspired projection head, with those compressed using a classical projection head on information retrieval tasks using the TREC 2019 and TREC 2020 Deep Learning benchmarks. The results demonstrate that our quantum-inspired method achieves competitive performance relative to the classical method while utilizing 32 times fewer parameters. Furthermore, when trained from scratch, it notably excels, particularly on smaller datasets. This work not only highlights the effectiveness of the quantum-inspired approach but also emphasizes the utility of efficient, ad hoc low-entanglement circuit simulations within neural networks as a powerful quantum-inspired technique.

LGApr 24, 2024
BeSound: Bluetooth-Based Position Estimation Enhancing with Cross-Modality Distillation

Hymalai Bello, Sungho Suh, Bo Zhou et al.

Smart factories leverage advanced technologies to optimize manufacturing processes and enhance efficiency. Implementing worker tracking systems, primarily through camera-based methods, ensures accurate monitoring. However, concerns about worker privacy and technology protection make it necessary to explore alternative approaches. We propose a non-visual, scalable solution using Bluetooth Low Energy (BLE) and ultrasound coordinates. BLE position estimation offers a very low-power and cost-effective solution, as the technology is available on smartphones and is scalable due to the large number of smartphone users, facilitating worker localization and safety protocol transmission. Ultrasound signals provide faster response times and higher accuracy but require custom hardware, increasing costs. To combine the benefits of both modalities, we employ knowledge distillation (KD) from ultrasound signals to BLE RSSI data. Once the student model is trained, the model only takes as inputs the BLE-RSSI data for inference, retaining the advantages of ubiquity and low cost of BLE RSSI. We tested our approach using data from an experiment with twelve participants in a smart factory test bed environment. We obtained an increase of 11.79% in the F1-score compared to the baseline (target model without KD and trained with BLE-RSSI data only).

LGDec 12, 2024
Beyond Confusion: A Fine-grained Dialectical Examination of Human Activity Recognition Benchmark Datasets

Daniel Geissler, Dominique Nshimyimana, Vitor Fortes Rey et al.

The research of machine learning (ML) algorithms for human activity recognition (HAR) has made significant progress with publicly available datasets. However, most research prioritizes statistical metrics over examining negative sample details. While recent models like transformers have been applied to HAR datasets with limited success from the benchmark metrics, their counterparts have effectively solved problems on similar levels with near 100% accuracy. This raises questions about the limitations of current approaches. This paper aims to address these open questions by conducting a fine-grained inspection of six popular HAR benchmark datasets. We identified for some parts of the data, none of the six chosen state-of-the-art ML methods can correctly classify, denoted as the intersect of false classifications (IFC). Analysis of the IFC reveals several underlying problems, including ambiguous annotations, irregularities during recording execution, and misaligned transition periods. We contribute to the field by quantifying and characterizing annotated data ambiguities, providing a trinary categorization mask for dataset patching, and stressing potential improvements for future data collections.

LGDec 11, 2024
Spend More to Save More (SM2): An Energy-Aware Implementation of Successive Halving for Sustainable Hyperparameter Optimization

Daniel Geissler, Bo Zhou, Sungho Suh et al.

A fundamental step in the development of machine learning models commonly involves the tuning of hyperparameters, often leading to multiple model training runs to work out the best-performing configuration. As machine learning tasks and models grow in complexity, there is an escalating need for solutions that not only improve performance but also address sustainability concerns. Existing strategies predominantly focus on maximizing the performance of the model without considering energy efficiency. To bridge this gap, in this paper, we introduce Spend More to Save More (SM2), an energy-aware hyperparameter optimization implementation based on the widely adopted successive halving algorithm. Unlike conventional approaches including energy-intensive testing of individual hyperparameter configurations, SM2 employs exploratory pretraining to identify inefficient configurations with minimal energy expenditure. Incorporating hardware characteristics and real-time energy consumption tracking, SM2 identifies an optimal configuration that not only maximizes the performance of the model but also enables energy-efficient training. Experimental validations across various datasets, models, and hardware setups confirm the efficacy of SM2 to prevent the waste of energy during the training of hyperparameter configurations.

LGDec 11, 2024
Enhancing Interpretability Through Loss-Defined Classification Objective in Structured Latent Spaces

Daniel Geissler, Bo Zhou, Mengxi Liu et al.

Supervised machine learning often operates on the data-driven paradigm, wherein internal model parameters are autonomously optimized to converge predicted outputs with the ground truth, devoid of explicitly programming rules or a priori assumptions. Although data-driven methods have yielded notable successes across various benchmark datasets, they inherently treat models as opaque entities, thereby limiting their interpretability and yielding a lack of explanatory insights into their decision-making processes. In this work, we introduce Latent Boost, a novel approach that integrates advanced distance metric learning into supervised classification tasks, enhancing both interpretability and training efficiency. Thus during training, the model is not only optimized for classification metrics of the discrete data points but also adheres to the rule that the collective representation zones of each class should be sharply clustered. By leveraging the rich structural insights of intermediate model layer latent representations, Latent Boost improves classification interpretability, as demonstrated by higher Silhouette scores, while accelerating training convergence. These performance and latent structural benefits are achieved with minimum additional cost, making it broadly applicable across various datasets without requiring data-specific adjustments. Furthermore, Latent Boost introduces a new paradigm for aligning classification performance with improved model transparency to address the challenges of black-box models.

QUANT-PHJun 27, 2025
QuKAN: A Quantum Circuit Born Machine approach to Quantum Kolmogorov Arnold Networks

Yannick Werner, Akash Malemath, Mengxi Liu et al.

Kolmogorov Arnold Networks (KANs), built upon the Kolmogorov Arnold representation theorem (KAR), have demonstrated promising capabilities in expressing complex functions with fewer neurons. This is achieved by implementing learnable parameters on the edges instead of on the nodes, unlike traditional networks such as Multi-Layer Perceptrons (MLPs). However, KANs potential in quantum machine learning has not yet been well explored. In this work, we present an implementation of these KAN architectures in both hybrid and fully quantum forms using a Quantum Circuit Born Machine (QCBM). We adapt the KAN transfer using pre-trained residual functions, thereby exploiting the representational power of parametrized quantum circuits. In the hybrid model we combine classical KAN components with quantum subroutines, while the fully quantum version the entire architecture of the residual function is translated to a quantum model. We demonstrate the feasibility, interpretability and performance of the proposed Quantum KAN (QuKAN) architecture.