55.0 CR Jun 2
Don't Trust Us: A privacy-by-design android malware detection pipeline
Emmanuele Massidda, Diego Soi, Giorgio Giacinto
Android malware detection increasingly relies on collecting and processing sensitive user data, including device identifiers, network artifacts, and runtime traces, while privacy is too often treated as a secondary concern. Existing privacy-aware approaches typically enforce privacy after data collection, for example, through anonymization, encryption, or federated learning, yet still require access to user information and therefore demand a high level of user trust in systems that already operate with privileged access to device activity. We argue that this requirement should be removed rather than managed. Android malware detection should be privacy-aware by design, so that effective analysis does not depend on sensitive data being accessed in the first place. To this end, we first formalize a set of design requirements for privacy-by-design detection and then implement each requirement in a comprehensive pipeline. First, static analysis is performed to extract relevant data from each APK, following the Drebin representation, which is then submitted to an SVM after vectorization. The model is equipped with a dual-reject threshold rule that either commits to a confident decision or defers uncertain samples to a dynamic analysis stage within a sandboxed environment, so that genuine user information never enters the analysis loop. Results confirm that, on a temporally split dataset spanning from 2024 to 2025, the pipeline achieves an F1 score of 0.87 with the first static analysis stage, deferring only 6.7% of test samples to secondary dynamic analysis. Additionally, dynamic sandboxing helps recognize applications' maliciousness with high confidence without extracting any sensitive data. These results demonstrate that strong detection performance is achievable without sacrificing user privacy.
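The dual-reject threshold rule mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `dual_reject`, the thresholds `low` and `high`, and the example scores are hypothetical, and the abstract does not say how the thresholds are chosen.

```python
def dual_reject(score, low=-0.5, high=0.5):
    """Map one SVM decision score to a verdict.

    Assumption: positive scores lean malicious, negative scores lean benign.
    Scores strictly between `low` and `high` are treated as uncertain and
    deferred to the sandboxed dynamic-analysis stage.
    """
    if score >= high:
        return "malware"
    if score <= low:
        return "benign"
    return "defer"


# Hypothetical decision scores for five APKs.
scores = [-1.3, -0.2, 0.1, 0.9, 2.4]
decisions = [dual_reject(s) for s in scores]

# Fraction of samples handed over to the dynamic stage.
deferred_fraction = decisions.count("defer") / len(decisions)
print(decisions, deferred_fraction)
```

Widening the gap between `low` and `high` defers more samples to the sandbox and leaves fewer decisions to the static stage alone; narrowing it does the opposite.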
LG Sep 2, 2024 Code
Adversarial Pruning: A Survey and Benchmark of Pruning Methods for Adversarial Robustness
Giorgio Piras, Maura Pintor, Ambra Demontis et al.
Recent work has proposed neural network pruning techniques to reduce the size of a network while preserving robustness against adversarial examples, i.e., well-crafted inputs inducing a misclassification. These methods, which we refer to as adversarial pruning methods, involve complex and articulated designs, making it difficult to analyze the differences and establish a fair and accurate comparison. In this work, we overcome these issues by surveying current adversarial pruning methods and proposing a novel taxonomy to categorize them based on two main dimensions: the pipeline, defining when to prune; and the specifics, defining how to prune. We then highlight the limitations of current empirical analyses and propose a novel, fair evaluation benchmark to address them. We finally conduct an empirical re-evaluation of current adversarial pruning methods and discuss the results, highlighting the shared traits of top-performing adversarial pruning methods, as well as common issues. We welcome contributions in our publicly-available benchmark at https://github.com/pralab/AdversarialPruningBenchmark
LG Jul 11, 2024 Code
HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks
Raffaele Mura, Giuseppe Floris, Luca Scionis et al.
Gradient-based attacks are a primary tool to evaluate robustness of machine-learning models. However, many attacks tend to provide overly-optimistic evaluations as they use fixed loss functions, optimizers, step-size schedulers, and default hyperparameters. In this work, we tackle these limitations by proposing a parametric variation of the well-known fast minimum-norm attack algorithm, whose loss, optimizer, step-size scheduler, and hyperparameters can be dynamically adjusted. We re-evaluate 12 robust models, showing that our attack finds smaller adversarial perturbations without requiring any additional tuning. This also enables reporting adversarial robustness as a function of the perturbation budget, providing a more complete evaluation than that offered by fixed-budget attacks, while remaining efficient. We release our open-source code at https://github.com/pralab/HO-FMN.
64.9 LG Mar 30
Label-efficient Training Updates for Malware Detection over Time
Luca Minnei, Cristian Manca, Giorgio Piras et al.
Machine Learning (ML)-based detectors are becoming essential to counter the proliferation of malware. However, common ML algorithms are not designed to cope with the dynamic nature of real-world settings, where both legitimate and malicious software evolve. This distribution drift causes models trained under static assumptions to degrade over time unless they are continuously updated. Regularly retraining these models, however, is expensive, since labeling newly acquired data requires costly manual analysis by security experts. To reduce labeling costs and address distribution drift in malware detection, prior work explored active learning (AL) and semi-supervised learning (SSL) techniques. Yet, existing studies (i) are tightly coupled to specific detector architectures and restricted to a specific malware domain, resulting in non-uniform comparisons; and (ii) lack a consistent methodology for analyzing the distribution drift, despite the critical sensitivity of the malware domain to temporal changes. In this work, we bridge this gap by proposing a model-agnostic framework that evaluates an extensive set of AL and SSL techniques, isolated and combined, for Android and Windows malware detection. We show that these techniques, when combined, can reduce manual annotation costs by up to 90% across both domains while achieving detection performance comparable to full-labeling retraining. We also introduce a methodology for feature-level drift analysis that measures feature stability over time, showing its correlation with the detector performance. Overall, our study provides a detailed understanding of how AL and SSL behave under distribution drift and how they can be successfully combined, offering practical insights for the design of effective detectors over time.
66.0 CR Mar 10
An Analysis of Modern Web Security Vulnerabilities Inside WebAssembly Applications
Lorenzo Corrias, Lorenzo Pisu, Davide Maiorca et al.
The growth in the adoption of the WebAssembly (WASM) standard has given rise to a rapidly increasing landscape of binary applications that are natively ported to the environment of websites. The flexibility of WASM has made it the preferred way to run fast and resource-heavy applications, replacing a field that JavaScript previously monopolized. Despite its success, researchers have raised concerns over the security implementations of WASM, demonstrating that binary vulnerabilities, such as Buffer Overflows and Use After Free, remain a present danger for WASM binaries. Our work aims to demonstrate that such vulnerabilities, when occurring in a WebAssembly module, can affect the behavior of a web application in unexpected ways, enabling an attacker to exploit vulnerabilities that are typical of the web security landscape. We present several scenarios that illustrate how each binary vulnerability might lead to a web security vulnerability, such as SQL Injections, XS-Leaks, and SSTI. Our results show that binary vulnerabilities can invalidate common security mechanisms that web developers implement in their applications, demonstrating how the security of WASM modules remains a problem that needs to be addressed. We also provide a list of best practices and defensive strategies that developers can implement to mitigate the risks associated with running unsafe WASM modules in their web applications.
CR Apr 23, 2019 Code
PowerDrive: Accurate De-Obfuscation and Analysis of PowerShell Malware
Denis Ugarte, Davide Maiorca, Fabrizio Cara et al.
PowerShell is nowadays a widely used technology to administer and manage Windows-based operating systems. However, it is also extensively used by malware vectors to execute payloads or drop additional malicious content. As with other scripting languages used by malware, PowerShell attacks are challenging to analyze due to the extensive use of multiple obfuscation layers, which make the real malicious code hard to unveil. To the best of our knowledge, a comprehensive solution for properly de-obfuscating such attacks is currently missing. In this paper, we present PowerDrive, an open-source, static and dynamic multi-stage de-obfuscator for PowerShell attacks. PowerDrive instruments the PowerShell code to progressively de-obfuscate it by showing the analyst the employed obfuscation steps. We used PowerDrive to successfully analyze thousands of PowerShell attacks extracted from various malware vectors and executables. The attained results show interesting patterns used by attackers to devise their malicious scripts. Moreover, we provide a taxonomy of behavioral models adopted by the analyzed codes and a comprehensive list of the malicious domains contacted during the analysis.
LG Nov 25, 2018 Code
Poisoning Behavioral Malware Clustering
Battista Biggio, Konrad Rieck, Davide Ariu et al.
Clustering algorithms have become a popular tool in computer security to analyze the behavior of malware variants, identify novel malware families, and generate signatures for antivirus systems. However, the suitability of clustering algorithms for security-sensitive settings has been recently questioned by showing that they can be significantly compromised if an attacker can exercise some control over the input data. In this paper, we revisit this problem by focusing on behavioral malware clustering approaches, and investigate whether and to what extent an attacker may be able to subvert these approaches through a careful injection of samples with poisoning behavior. To this end, we present a case study on Malheur, an open-source tool for behavioral malware clustering. Our experiments not only demonstrate that this tool is vulnerable to poisoning attacks, but also that it can be significantly compromised even if the attacker can only inject a very small percentage of attacks into the input data. As a remedy, we discuss possible countermeasures and highlight the need for more secure clustering algorithms.
LG Jan 30, 2014 Code
Security Evaluation of Support Vector Machines in Adversarial Environments
Battista Biggio, Igino Corona, Blaine Nelson et al.
Support Vector Machines (SVMs) are among the most popular classification techniques adopted in security applications like malware detection, intrusion detection, and spam filtering. However, if SVMs are to be incorporated in real-world security systems, they must be able to cope with attack patterns that can either mislead the learning algorithm (poisoning), evade detection (evasion), or gain information about their internal parameters (privacy breaches). The main contributions of this chapter are twofold. First, we introduce a formal general framework for the empirical evaluation of the security of machine-learning systems. Second, according to our framework, we demonstrate the feasibility of evasion, poisoning and privacy attacks against SVMs in real-world security problems. For each attack technique, we evaluate its impact and discuss whether (and how) it can be countered through an adversary-aware design of SVMs. Our experiments are easily reproducible thanks to open-source code that we have made available, together with all the employed datasets, on a public repository.
CV Dec 2, 2024
Exploring the Robustness of AI-Driven Tools in Digital Forensics: A Preliminary Study
Silvia Lucia Sanna, Leonardo Regano, Davide Maiorca et al.
Nowadays, many tools are used to facilitate forensic tasks such as data extraction and data analysis. In particular, some tools leverage Artificial Intelligence (AI) to automatically label examined data into specific categories (i.e., drugs, weapons, nudity). However, this raises a serious concern about the robustness of the employed AI algorithms against adversarial attacks. Indeed, some people may need to hide specific data from AI-based digital forensics tools, thus manipulating the content so that the AI system neither recognizes the offensive/prohibited content nor marks it as suspicious to the analyst. This could be seen as an anti-forensics attack scenario. For this reason, we analyzed two of the most important forensics tools employing AI for data classification: Magnet AI, used by Magnet Axiom, and Excire Photo AI, used by X-Ways Forensics. We ran preliminary tests using about 200 images and another 100 sent in 3 chats about pornography and teenage nudity, drugs and weapons, to understand how the tools label them. Moreover, we loaded some deepfake images (images generated by AI forging real ones) of some actors to understand if they would be classified in the same category as the original images. From our preliminary study, we saw that the AI algorithm is not robust enough, as we expected since these topics are still open research problems. For example, some sexual images were not categorized as nudity, and some deepfakes were categorized as the same real person, even though the human eye can clearly see the nudity or spot the differences in the deepfakes. Building on these results and other state-of-the-art works, we provide some suggestions for improving how digital forensics analysis tools leverage AI and their robustness against adversarial attacks or scenarios different from those they were trained on.
64.0 CR Apr 1
Obfuscating Code Vulnerabilities against Static Analysis in JavaScript Code
Francesco Pagano, Lorenzo Pisu, Leonardo Regano et al.
Code obfuscation is widely adopted in modern software development to protect intellectual property and hinder reverse engineering, but it also provides attackers with a powerful means to conceal malicious logic inside otherwise legitimate JavaScript code. In a software supply chain where a single compromised package can affect thousands of applications, this raises a critical question: how robust are the Static Application Security Testing (SAST) tools that CI/CD pipelines rely on as automated security gatekeepers? This paper answers that question by empirically quantifying the impact of JavaScript obfuscation on state-of-practice SAST. We define a realistic supply-chain threat model in which an adversary injects vulnerable code and iteratively obfuscates it until the pipeline reports a clean scan. To measure the resulting degradation, we introduce the Vulnerability Detection Loss (VDL) metric and conduct a two-phase study. First, we analyze 16 vulnerable-by-design Node.js web applications from the OWASP directory; second, we extend the analysis to 260 in-the-wild JavaScript/Node.js projects from GitHub. Across both datasets, we apply eight semantics-preserving obfuscation techniques and their combinations and evaluate two representative SAST tools, Njsscan and Bearer. Even a single obfuscation technique typically suppresses most baseline findings, including high-severity issues, while stacking techniques yield near-total evasion, with VDL often approaching 100%. Our results show that current JavaScript SAST is fundamentally not robust against commonplace obfuscations and that "clean" reports on obfuscated code may offer only a false sense of security. Finally, we discuss practical mitigation guidelines and directions for obfuscation-aware analysis.
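The abstract names the Vulnerability Detection Loss (VDL) metric but does not define it. As a rough sketch only, one plausible reading is the percentage of baseline SAST findings that disappear after obfuscation, which is consistent with values "approaching 100%" under near-total evasion. The formula, the function name, and the finding counts below are assumptions, not the paper's definition.

```python
def vulnerability_detection_loss(baseline_findings, obfuscated_findings):
    """Hypothetical VDL: percentage of baseline findings lost after obfuscation.

    Assumption: VDL = 100 * (baseline - obfuscated) / baseline, clamped at 0
    when the obfuscated scan reports more findings than the baseline.
    """
    if baseline_findings <= 0:
        raise ValueError("baseline scan must report at least one finding")
    lost = max(baseline_findings - obfuscated_findings, 0)
    return 100.0 * lost / baseline_findings


# Made-up counts: 20 findings on the original code, 1 after obfuscation.
vdl = vulnerability_detection_loss(20, 1)
print(f"VDL = {vdl:.1f}%")
```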
CR Jun 9, 2025
Are Trees Really Green? A Detection Approach of IoT Malware Attacks
Silvia Lucia Sanna, Diego Soi, Davide Maiorca et al.
Nowadays, the Internet of Things (IoT) is widely employed, and its usage is growing exponentially because it facilitates remote monitoring, predictive maintenance, and data-driven decision making, especially in the healthcare and industrial sectors. However, IoT devices remain vulnerable due to their resource constraints and the difficulty of applying security patches. Consequently, various cybersecurity attacks are reported daily, such as Denial of Service, particularly in IoT-driven solutions. Most attack detection methodologies are based on Machine Learning (ML) techniques, which can detect attack patterns. However, the focus is more on identification than on the impact of ML algorithms on computational resources. This paper proposes a green methodology to identify IoT malware networking attacks based on privacy-preserving statistical flow features. In particular, the hyperparameters of three tree-based models (Decision Trees, Random Forest and Extra-Trees) are optimized based on energy consumption and test-time performance in terms of the Matthews Correlation Coefficient. Our results show that models maintain high performance and detection accuracy while consistently reducing power usage in terms of watt-hours (Wh). This suggests that on-premise ML-based Intrusion Detection Systems are suitable for IoT and other resource-constrained devices.
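For reference, the Matthews Correlation Coefficient used as the performance measure in this abstract can be computed from a binary confusion matrix as sketched below. The function name and the example counts are illustrative assumptions; the energy-aware hyperparameter search itself is not reproduced here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; by convention 0.0 is returned when the
    denominator is zero (for example, when one class is never predicted).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts for three hypothetical detectors.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
inverted = mcc(tp=0, tn=0, fp=50, fn=50)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
print(perfect, inverted, chance)
```

Unlike plain accuracy, this score stays near zero for a detector that always predicts the majority class, which matters when attack traffic is rare.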
LG May 4, 2020
Do Gradient-based Explanations Tell Anything About Adversarial Robustness to Android Malware?
Marco Melis, Michele Scalas, Ambra Demontis et al.
While machine-learning algorithms have demonstrated a strong ability in detecting Android malware, they can be evaded by sparse evasion attacks crafted by injecting a small set of fake components, e.g., permissions and system calls, without compromising intrusive functionality. Previous work has shown that, to improve robustness against such attacks, learning algorithms should avoid overemphasizing a few discriminant features, providing instead decisions that rely upon a large subset of components. In this work, we investigate whether gradient-based attribution methods, used to explain classifiers' decisions by identifying the most relevant features, can be used to help identify and select more robust algorithms. To this end, we propose to exploit two different metrics that represent the evenness of explanations, and a new compact security measure called Adversarial Robustness Metric. Our experiments conducted on two different datasets and five classification algorithms for Android malware detection show that a strong connection exists between the uniformity of explanations and adversarial robustness. In particular, we found that popular techniques like Gradient*Input and Integrated Gradients are strongly correlated to security when applied to both linear and nonlinear detectors, while more elementary explanation techniques like the simple Gradient do not provide reliable information about the robustness of such classifiers.
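To make the Gradient*Input technique named in this abstract concrete, here is a minimal sketch for a linear detector f(x) = w·x + b, where the gradient with respect to x is simply w, so each attribution is w_i * x_i. The feature names, weights, and input vector are invented for illustration, and the paper's evenness metrics are not reproduced.

```python
def gradient_times_input(weights, x):
    """Gradient*Input attributions for a linear model f(x) = w.x + b.

    The gradient of f with respect to x equals the weight vector, so the
    attribution of feature i is weights[i] * x[i].
    """
    return [w * xi for w, xi in zip(weights, x)]


def top_features(names, attributions, k=2):
    """Return the k feature names with the largest absolute attribution."""
    ranked = sorted(zip(names, attributions), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical binary features of one app and hypothetical learned weights.
names = ["perm_internet", "perm_camera", "api_getDeviceId", "api_sendTextMessage"]
weights = [1.5, -0.25, 0.75, 2.0]
x = [1, 1, 0, 1]

attributions = gradient_times_input(weights, x)
most_relevant = top_features(names, attributions)
print(attributions, most_relevant)
```

Features absent from the app (here `api_getDeviceId`) receive zero attribution regardless of their weight, which is the main difference from using the plain gradient as an explanation.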
CR Oct 2, 2019
Automotive Cybersecurity: Foundations for Next-Generation Vehicles
Michele Scalas, Giorgio Giacinto
The automotive industry is experiencing a serious transformation due to a digitalisation process and the transition to the new paradigm of Mobility-as-a-Service. The next-generation vehicles are going to be very complex cyber-physical systems, whose design must be reinvented to fulfil the increasing demand of smart services, both for safety and entertainment purposes, causing the manufacturers' model to converge towards that of IT companies. Connected cars and autonomous driving are the preeminent factors that drive along this route, and they cause the necessity of a new design to address the emerging cybersecurity issues: the "old" automotive architecture relied on a single closed network, with no external communications; modern vehicles are going to be always connected indeed, which means the attack surface will be much more extended. The result is the need for a paradigm shift towards a secure-by-design approach.
CR Nov 2, 2018
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Davide Maiorca, Battista Biggio, Giorgio Giacinto
Malware still constitutes a major threat in the cybersecurity landscape, also due to the widespread use of infection vectors such as documents. These infection vectors hide embedded malicious code from victim users, facilitating the use of social engineering techniques to infect their machines. Research showed that machine-learning algorithms provide effective detection mechanisms against such threats, but the existence of an arms race in adversarial settings has recently challenged such systems. In this work, we focus on malware embedded in PDF files as a representative case of such an arms race. We start by providing a comprehensive taxonomy of the different approaches used to generate PDF malware, and of the corresponding learning-based detection systems. We then categorize threats specifically targeted against learning-based PDF malware detectors, using a well-established framework in the field of adversarial machine learning. This framework allows us to categorize known vulnerabilities of learning-based PDF malware detectors and to identify novel attacks that may threaten such systems, along with the potential defense mechanisms that can mitigate the impact of such threats. We conclude the paper by discussing how such findings highlight promising research directions towards tackling the more general challenge of designing robust malware detectors in adversarial settings.
CR May 24, 2018
On the Effectiveness of System API-Related Information for Android Ransomware Detection
Michele Scalas, Davide Maiorca, Francesco Mercaldo et al.
Ransomware constitutes a significant threat to the Android operating system. It can either lock or encrypt the target devices, and victims are forced to pay ransoms to restore their data. Hence, the prompt detection of such attacks has a priority in comparison to other malicious threats. Previous works on Android malware detection mainly focused on Machine Learning-oriented approaches that were tailored to identifying malware families, without a clear focus on ransomware. More specifically, such approaches resorted to complex information types such as permissions, user-implemented API calls, and native calls. However, this led to significant drawbacks concerning complexity, resilience against obfuscation, and explainability. To overcome these issues, in this paper, we propose and discuss learning-based detection strategies that rely on System API information. These techniques leverage the fact that ransomware attacks heavily resort to System API to perform their actions, and allow distinguishing between generic malware, ransomware and goodware. We tested three different ways of employing System API information, i.e., through packages, classes, and methods, and we compared their performances to other, more complex state-of-the-art approaches. The attained results showed that systems based on System API could detect ransomware and generic malware with very good accuracy, comparable to systems that employed more complex information. Moreover, the proposed systems could accurately detect novel samples in the wild and showed resilience against static obfuscation attempts. Finally, to guarantee early on-device detection, we developed and released on the Android platform a complete ransomware and malware detector (R-PackDroid) that employed one of the methodologies proposed in this paper.
CR Mar 12, 2018
Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables
Bojan Kolosnjaji, Ambra Demontis, Battista Biggio et al.
Machine-learning methods have already been exploited as useful tools for detecting malicious executable files. They leverage data retrieved from malware samples, such as header fields, instruction sequences, or even raw bytes, to learn models that discriminate between benign and malicious software. However, it has also been shown that machine learning and deep neural networks can be fooled by evasion attacks (also referred to as adversarial examples), i.e., small changes to the input data that cause misclassification at test time. In this work, we investigate the vulnerability of malware detection methods that use deep networks to learn from raw bytes. We propose a gradient-based attack that is capable of evading a recently proposed deep network suited to this purpose by changing only a few specific bytes at the end of each malware sample, while preserving its intrusive functionality. Promising results show that our adversarial malware binaries evade the targeted network with high probability, even though less than 1% of their bytes are modified.
LG Mar 9, 2018
Explaining Black-box Android Malware Detection
Marco Melis, Davide Maiorca, Battista Biggio et al.
Machine-learning models have been recently used for detecting malicious Android applications, reporting impressive performances on benchmark datasets, even when trained only on features statically extracted from the application, such as system calls and permissions. However, recent findings have highlighted the fragility of such in-vitro evaluations with benchmark datasets, showing that very few changes to the content of Android malware may suffice to evade detection. How can we thus trust that a malware detector performing well on benchmark data will continue to do so when deployed in an operating environment? To mitigate this issue, the most popular Android malware detectors use linear, explainable machine-learning models to easily identify the most influential features contributing to each decision. In this work, we generalize this approach to any black-box machine-learning model, by leveraging a gradient-based approach to identify the most influential local features. This enables using nonlinear models to potentially increase accuracy without sacrificing interpretability of decisions. Our approach also highlights the global characteristics learned by the model to discriminate between benign and malware applications. Finally, as shown by our empirical analysis on a popular Android malware detection task, it also helps identify potential vulnerabilities of linear and nonlinear models against adversarial manipulations.
CR Feb 4, 2018
IntelliAV: Building an Effective On-Device Android Malware Detector
Mansour Ahmadi, Angelo Sotgiu, Giorgio Giacinto
The importance of employing machine learning for malware detection has become evident to the security community. Several anti-malware vendors have claimed and advertised the application of machine learning in their products in which the inference phase is performed on servers and high-performance machines, but the feasibility of such approaches on mobile devices with limited computational resources has not yet been assessed by the research community, and vendors remain skeptical. In this paper, we first aim to show the practicality of devising a learning-based anti-malware tool on Android mobile devices. Furthermore, we aim to demonstrate the significance of such a tool in stopping new and evasive malware that cannot easily be caught by signature-based or offline learning-based security tools. To this end, we first propose the extraction of a set of lightweight yet powerful features from Android applications. Then, we embed these features in a vector space to build an effective as well as efficient model. Hence, the model can perform the inference on the device for detecting potentially harmful applications. We show that without resorting to any signatures and relying only on a training phase involving a reasonable set of samples, the proposed system, named IntelliAV, provides more satisfying performances than the popular major anti-malware products. Moreover, we evaluate the robustness of IntelliAV against common obfuscation techniques that affect most anti-malware solutions.
CR Oct 27, 2017
Adversarial Detection of Flash Malware: Limitations and Open Issues
Davide Maiorca, Ambra Demontis, Battista Biggio et al.
During the past four years, Flash malware has become one of the most insidious threats to detect, with almost 600 critical vulnerabilities targeting Adobe Flash disclosed in the wild. Research has shown that machine learning can be successfully used to detect Flash malware by leveraging static analysis to extract information from the structure of the file or its bytecode. However, the robustness of Flash malware detectors against well-crafted evasion attempts - also known as adversarial examples - has never been investigated. In this paper, we propose a security evaluation of a novel, representative Flash detector that embeds a combination of the prominent, static features employed by state-of-the-art tools. In particular, we discuss how to craft adversarial Flash malware examples, showing that it suffices to manipulate the corresponding source malware samples slightly to evade detection. We then empirically demonstrate that popular defense techniques proposed to mitigate evasion attempts, including re-training on adversarial examples, may not always be sufficient to ensure robustness. We argue that this occurs when the feature vectors extracted from adversarial examples become indistinguishable from those of benign data, meaning that the given feature representation is intrinsically vulnerable. In this respect, we are the first to formally define and quantitatively characterize this vulnerability, highlighting when an attack can be countered by solely improving the security of the learning algorithm, or when it requires also considering additional features. We conclude the paper by suggesting alternative research directions to improve the security of learning-based Flash malware detectors.
CR Aug 21, 2017
Evasion Attacks against Machine Learning at Test Time
Battista Biggio, Igino Corona, Davide Maiorca et al.
In security-sensitive applications, the success of machine learning depends on a thorough vetting of its resistance to adversarial data. In one pertinent, well-motivated attack scenario, an adversary may attempt to evade a deployed system at test time by carefully manipulating attack samples. In this work, we present a simple but effective gradient-based approach that can be exploited to systematically assess the security of several widely used classification algorithms against evasion attacks. Following a recently proposed framework for security evaluation, we simulate attack scenarios that exhibit different risk levels for the classifier by increasing the attacker's knowledge of the system and her ability to manipulate attack samples. This gives the classifier designer a better picture of the classifier performance under evasion attacks, and allows him to perform a more informed model selection (or parameter setting). We evaluate our approach on the relevant security task of malware detection in PDF files, and show that such systems can be easily evaded. We also sketch some countermeasures suggested by our analysis.
CR Apr 28, 2017
Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection
Ambra Demontis, Marco Melis, Battista Biggio et al.
To cope with the increasing variability and sophistication of modern attacks, machine learning has been widely adopted as a statistically-sound tool for malware detection. However, its security against well-crafted attacks has not only been recently questioned, but it has been shown that machine learning exhibits inherent vulnerabilities that can be exploited to evade detection at test time. In other words, machine learning itself can be the weakest link in a security system. In this paper, we rely upon a previously-proposed attack framework to categorize potential attack scenarios against learning-based malware detection tools, by modeling attackers with different skills and capabilities. We then define and implement a set of corresponding evasion attacks to thoroughly assess the security of Drebin, an Android malware detector. The main contribution of this work is the proposal of a simple and scalable secure-learning paradigm that mitigates the impact of evasion attacks, while only slightly worsening the detection rate in the absence of attack. We finally argue that our secure-learning approach can also be readily applied to other malware detection tasks.
CR Nov 13, 2015
Novel Feature Extraction, Selection and Fusion for Effective Malware Family Classification
Mansour Ahmadi, Dmitry Ulyanov, Stanislav Semenov et al.
Modern malware is designed with mutation characteristics, namely polymorphism and metamorphism, which cause an enormous growth in the number of variants of malware samples. Categorization of malware samples on the basis of their behaviors is essential for the computer security community, because it receives a huge number of malware samples every day, and the signature extraction process is usually based on malicious parts characterizing malware families. Microsoft released a malware classification challenge in 2015 with a huge dataset of nearly 0.5 terabytes of data, containing more than 20K malware samples. The analysis of this dataset inspired the development of a novel paradigm that is effective in categorizing malware variants into their actual family groups. This paradigm is presented and discussed in the present paper, where emphasis has been given to the phases related to the extraction and selection of a set of novel features for the effective representation of malware samples. Features can be grouped according to different characteristics of malware behavior, and their fusion is performed according to a per-class weighting paradigm. The proposed method achieved a very high accuracy (≈ 0.998) on the Microsoft Malware Challenge dataset.