CV | Jun 28, 2023 | Code
Points for Energy Renovation (PointER): A LiDAR-Derived Point Cloud Dataset of One Million English Buildings Linked to Energy Characteristics
Sebastian Krapf, Kevin Mayer, Martin Fischer
Rapid renovation of Europe's inefficient buildings is required to mitigate climate change. However, analyzing and evaluating buildings at scale is challenging because every building is unique. In current practice, the energy performance of buildings is assessed during on-site visits, which are slow, costly, and local. This paper presents a building point cloud dataset that promotes a data-driven, large-scale understanding of the 3D representation of buildings and their energy characteristics. We generate building point clouds by intersecting building footprints with geo-referenced LiDAR data and link them with attributes from the UK's energy performance database via the Unique Property Reference Number (UPRN). To achieve a representative sample, we select one million buildings from a range of rural and urban regions across England, of which half a million are linked to energy characteristics. Building point clouds in new regions can be generated with the open-source code published alongside the paper. The dataset enables novel research in building energy modeling and can be easily expanded to other research fields by adding building features via the UPRN or geo-location.
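The footprint-intersection and UPRN-linking steps described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' published code: the function names, the tuple-based point format, and the dictionary-based energy records are all assumptions made for illustration, and a real pipeline would likely rely on a geospatial library.

```python
from typing import Dict, List, Optional, Tuple

Point3D = Tuple[float, float, float]   # (x, y, z) in a projected coordinate system
Polygon = List[Tuple[float, float]]    # footprint vertices, in order


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Ray-casting test: cast a ray in the +x direction and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_points(points: List[Point3D], footprint: Polygon) -> List[Point3D]:
    """Keep the LiDAR points whose (x, y) position falls inside the footprint."""
    return [p for p in points if point_in_polygon(p[0], p[1], footprint)]


def link_by_uprn(
    clouds: Dict[str, List[Point3D]],
    energy_records: Dict[str, dict],
) -> Dict[str, dict]:
    """Attach energy attributes to each building cloud; None if no record exists."""
    linked: Dict[str, dict] = {}
    for uprn, pts in clouds.items():
        record: Optional[dict] = energy_records.get(uprn)
        linked[uprn] = {"points": pts, "energy": record}
    return linked


# Hypothetical toy data: one square footprint and three LiDAR returns.
footprint = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
lidar = [(5.0, 5.0, 3.2), (15.0, 5.0, 1.0), (1.0, 9.0, 4.0)]
building_cloud = clip_points(lidar, footprint)

# Hypothetical UPRN keys; only one of the two buildings has an energy record.
linked = link_by_uprn(
    {"100": building_cloud, "200": []},
    {"100": {"rating": "D"}},
)
```

Buildings without a matching record keep `None` as their energy entry, which mirrors the abstract's point that only part of the sample is linked to energy characteristics.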
CV | Jun 5, 2022
Estimating building energy efficiency from street view imagery, aerial imagery, and land surface temperature data
Kevin Mayer, Lukas Haas, Tianyuan Huang et al.
Current methods to determine the energy efficiency of buildings require on-site visits by certified energy auditors, which makes the process slow, costly, and geographically incomplete. To accelerate the identification of promising retrofit targets on a large scale, we propose to estimate building energy efficiency from widely available and remotely sensed data sources only, namely street view, aerial view, footprint, and satellite-borne land surface temperature (LST) data. After collecting data for almost 40,000 buildings in the United Kingdom, we combine these data sources by training multiple end-to-end deep learning models with the objective of classifying buildings as energy efficient (EU rating A-D) or inefficient (EU rating E-G). After evaluating the trained models quantitatively as well as qualitatively, we extend our analysis by studying the predictive power of each data source in an ablation study. We find that the end-to-end deep learning model trained on all four data sources achieves a macro-averaged F1 score of 64.64% and outperforms the k-NN and SVM-based baseline models by 14.13 and 12.02 percentage points, respectively. Thus, this work shows the potential and complementary nature of remotely sensed data in predicting energy efficiency and opens up new opportunities for future work to integrate additional data sources.
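For readers unfamiliar with the evaluation metric, the following sketch shows how the binary labels (A-D versus E-G) and a macro-averaged F1 score can be computed. The helper names and the toy predictions are hypothetical and are not taken from the paper.

```python
from typing import List


def to_binary(rating: str) -> int:
    """Map an EU rating to 1 (efficient, A-D) or 0 (inefficient, E-G)."""
    rating = rating.upper()
    if len(rating) != 1 or rating not in "ABCDEFG":
        raise ValueError(f"unknown rating: {rating!r}")
    return 1 if rating in "ABCD" else 0


def f1_for_class(y_true: List[int], y_pred: List[int], cls: int) -> float:
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy example: seven buildings, one per rating band.
y_true = [to_binary(r) for r in "ABCDEFG"]   # [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1]               # made-up model output
score = macro_f1(y_true, y_pred)
```

Because the macro average weights both classes equally, it does not reward a model for simply predicting the more frequent class.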
CV | Aug 28, 2025 | Code
SYNBUILD-3D: A large, multi-modal, and semantically rich synthetic dataset of 3D building models at Level of Detail 4
Kevin Mayer, Alex Vesel, Xinyi Zhao et al.
3D building models are critical for applications in architecture, energy simulation, and navigation. Yet, generating accurate and semantically rich 3D buildings automatically remains a major challenge due to the lack of large-scale annotated datasets in the public domain. Inspired by the success of synthetic data in computer vision, we introduce SYNBUILD-3D, a large, diverse, and multi-modal dataset of over 6.2 million synthetic 3D residential buildings at Level of Detail (LoD) 4. In the dataset, each building is represented through three distinct modalities: a semantically enriched 3D wireframe graph at LoD 4 (Modality I), the corresponding floor plan images (Modality II), and a LiDAR-like roof point cloud (Modality III). The semantic annotations for each building wireframe are derived from the corresponding floor plan images and include information on rooms, doors, and windows. Through its tri-modal nature, future work can use SYNBUILD-3D to develop novel generative AI algorithms that automate the creation of 3D building models at LoD 4, subject to predefined floor plan layouts and roof geometries, while enforcing semantic-geometric consistency. Dataset and code samples are publicly available at https://github.com/kdmayer/SYNBUILD-3D.
CR | Feb 18, 2025 | Code
Malware Detection based on API calls
Christofer Fellicious, Manuel Bischof, Kevin Mayer et al.
Malware attacks pose a significant threat in today's interconnected digital landscape, causing billions of dollars in damages. Detecting and identifying malware families as early as possible provides an edge in protecting against such malware. We explore a lightweight, order-invariant approach to detecting and mitigating malware threats: analyzing API calls without regard to their sequence. We publish a public dataset of over three hundred thousand samples and their function call parameters for this task, annotated with labels indicating benign or malicious activity. The complete dataset is above 550GB uncompressed in size. We leverage machine learning algorithms, such as random forests, and conduct behavioral analysis by examining patterns and anomalies in API call sequences. By investigating how the function calls occur regardless of their order, we can identify discriminating features that can help us identify malware early on. The models we've developed are not only effective but also efficient. They are lightweight and can run on any machine with minimal performance overhead, while still achieving an impressive F1-Score of over 85%. We also empirically show that we only need a subset of the function call sequence, specifically calls to the ntdll.dll library, to identify malware. Our research demonstrates the efficacy of this approach through empirical evaluations, underscoring its accuracy and scalability. The code is open source and available on GitHub along with the dataset on Zenodo.
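One way to picture an order-invariant representation is a bag of API calls: each trace is reduced to per-function counts, so two traces containing the same calls in a different order yield the same feature vector. The sketch below is an assumption about how such features might look, not the authors' pipeline; the trace format (a list of `(library, function)` pairs) and the helper names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Trace = List[Tuple[str, str]]   # assumed format: (library name, function name)


def call_counts(trace: Trace, dll: Optional[str] = None) -> Counter:
    """Count calls per function, optionally restricted to a single library."""
    return Counter(
        func
        for lib, func in trace
        if dll is None or lib.lower() == dll.lower()
    )


def to_vector(counts: Counter, vocabulary: List[str]) -> List[int]:
    """Project the counts onto a fixed vocabulary so every sample has equal length."""
    return [counts.get(name, 0) for name in vocabulary]


# Hypothetical traces: the same calls in opposite order.
trace_a: Trace = [
    ("ntdll.dll", "NtCreateFile"),
    ("kernel32.dll", "CreateFileW"),
    ("ntdll.dll", "NtWriteFile"),
    ("ntdll.dll", "NtClose"),
]
trace_b: Trace = list(reversed(trace_a))

vocabulary = ["NtClose", "NtCreateFile", "NtWriteFile", "NtReadFile"]
vec_a = to_vector(call_counts(trace_a, dll="ntdll.dll"), vocabulary)
vec_b = to_vector(call_counts(trace_b, dll="ntdll.dll"), vocabulary)
```

Fixed-length count vectors of this kind are a natural input for tree-based classifiers such as random forests, and the `dll` filter reflects the abstract's observation that calls to ntdll.dll alone may suffice.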
CR | Apr 7
SoK: Understanding Anti-Forensics Concepts and Research Practices Across Forensic Subdomains
Janine Schneider, Florian Ramming, Maximilian Eichhorn et al.
Anti-forensics includes a growing set of techniques designed to obstruct forensic analysis. While cybercriminals increasingly rely on these methods, they also help researchers identify and remedy weaknesses in forensic tools, advancing the overall robustness of digital forensics. Despite repeated efforts to define it, anti-forensics remains vague and inconsistent in its use. It also poses ethical challenges regarding the appropriateness of research practices and the legitimacy of the field itself. This article presents a systematic analysis of 123 publications on anti-forensics, combining qualitative and quantitative methods. We quantify the main techniques and attack vectors, examine their occurrence in different digital forensic subdomains, and identify typical research methods, motivations, and applications. This work also discusses what these findings mean for future research and proposes directions for building a more coherent and ethically grounded understanding of anti-forensics.
CV | Dec 7, 2020
An Enriched Automated PV Registry: Combining Image Recognition and 3D Building Data
Benjamin Rausch, Kevin Mayer, Marie-Louise Arlt et al.
While photovoltaic (PV) systems are installed at an unprecedented rate, reliable information on an installation level remains scarce. As a result, automatically created PV registries are a timely contribution to optimize grid planning and operations. This paper demonstrates how aerial imagery and three-dimensional building data can be combined to create an address-level PV registry, specifying area, tilt, and orientation angles. We demonstrate the benefits of this approach for PV capacity estimation. In addition, this work presents, for the first time, a comparison between automated and officially created PV registries. Our results indicate that our enriched automated registry is useful for validating, updating, and complementing official registries.