AI · Jul 31, 2024
The Llama 3 Herd of Models
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri et al. · allen-ai, berkeley
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
RO · Apr 13, 2023
A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services
Dewant Katare, Diego Perino, Jari Nurmi et al.
Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice for processing the sensed data is to use a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially given the trend towards vehicular electrification (e.g., battery-powered vehicles). Although these areas have seen significant advancements in sensor technologies, wireless communications, computing, and AI/ML algorithms, the challenge remains of how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares connected vehicular applications, vehicular communications, and approximation and Edge AI techniques. The focus is on energy efficiency, covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.
LG · Jun 10, 2022
Hierarchical Federated Learning with Privacy
Varun Chandrasekaran, Suman Banerjee, Diego Perino et al.
Federated learning (FL), where data remains at the federated clients, and where only gradient updates are shared with a central aggregator, was assumed to be private. Recent work demonstrates that adversaries with gradient-level access can mount successful inference and reconstruction attacks. In such settings, differentially private (DP) learning is known to provide resilience. However, approaches used in the status quo (i.e., central and local DP) introduce disparate utility vs. privacy trade-offs. In this work, we take the first step towards mitigating such trade-offs through hierarchical FL (HFL). We demonstrate that by the introduction of a new intermediary level where calibrated DP noise can be added, better privacy vs. utility trade-offs can be obtained; we term this hierarchical DP (HDP). Our experiments with 3 different datasets (commonly used as benchmarks for FL) suggest that HDP produces models as accurate as those obtained using central DP, where noise is added at a central aggregator. Such an approach also provides comparable benefit against inference adversaries as in the local DP case, where noise is added at the federated clients.
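The idea of adding noise at an intermediary level, rather than at each client or only at the central aggregator, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the clipping step, and the noise scale `sigma * max_norm / n` are assumptions chosen for illustration, and the sketch makes no claim about a formal privacy guarantee.

```python
import random


def clip(update, max_norm):
    """Scale an update so that its L2 norm is at most max_norm."""
    norm = sum(x * x for x in update) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def mean_vectors(vectors):
    """Coordinate-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def intermediary_aggregate(client_updates, max_norm, sigma, rng):
    """Average clipped client updates, then add Gaussian noise once per group."""
    clipped = [clip(u, max_norm) for u in client_updates]
    avg = mean_vectors(clipped)
    # Assumed noise scale for illustration only, not the paper's calibration.
    std = sigma * max_norm / len(client_updates)
    return [x + rng.gauss(0.0, std) for x in avg]


def central_aggregate(group_updates):
    """The central aggregator only sees group averages that are already noised."""
    return mean_vectors(group_updates)


rng = random.Random(0)
groups = [
    [[0.2, -0.1], [0.4, 0.3]],               # clients under intermediary 1
    [[-0.3, 0.5], [0.1, 0.1], [3.0, 4.0]],   # clients under intermediary 2
]
noised = [intermediary_aggregate(g, max_norm=1.0, sigma=0.5, rng=rng) for g in groups]
global_update = central_aggregate(noised)
print(global_update)
```

In this sketch, each intermediary adds noise once to an average over its group, so the per-update noise is smaller than if every client perturbed its own update, while the central aggregator never receives raw group averages.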
LG · Feb 26, 2023
P4L: Privacy Preserving Peer-to-Peer Learning for Infrastructureless Setups
Ioannis Arapakis, Panagiotis Papadopoulos, Kleomenis Katevas et al.
Distributed (or Federated) learning enables users to train machine learning models on their own devices while sharing only the gradients of their models, usually in a differentially private way (at some loss of utility). Although such a strategy provides better privacy guarantees than the traditional centralized approach, it requires users to blindly trust a centralized infrastructure that may also become a bottleneck as the number of users increases. In this paper, we design and implement P4L: a privacy preserving peer-to-peer learning system for users to participate in an asynchronous, collaborative learning scheme without requiring any sort of infrastructure or relying on differential privacy. Our design uses strong cryptographic primitives to preserve both the confidentiality and utility of the shared gradients, and a set of peer-to-peer mechanisms for fault tolerance, user churn, proximity, and cross-device communications. Extensive simulations under different network settings and ML scenarios for three real-life datasets show that P4L provides competitive performance to baselines, while it is resilient to different poisoning attacks. We implement P4L and experimental results show that the performance overhead and power consumption are minimal (less than 3mAh of discharge).
LG · Jan 31, 2023
Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning
Gabriele Castellano, Juan-José Nieto, Jordi Luque et al.
Many real-time applications (e.g., Augmented/Virtual Reality, cognitive assistance) rely on Deep Neural Networks (DNNs) to process inference tasks. Edge computing is considered a key infrastructure to deploy such applications, as moving computation close to the data sources enables us to meet stringent latency and throughput requirements. However, the constrained nature of edge networks poses several additional challenges to the management of inference workloads: edge clusters cannot provide unlimited processing power to DNN models, and often a trade-off between network and processing time should be considered when it comes to end-to-end delay requirements. In this paper, we focus on the problem of scheduling inference queries on DNN models in edge networks at short timescales (i.e., a few milliseconds). By means of simulations, we analyze several policies in the realistic network settings and workloads of a large ISP, highlighting the need for a dynamic scheduling policy that can adapt to network conditions and workloads. We therefore design ASET, a Reinforcement Learning based scheduling algorithm able to adapt its decisions according to the system conditions. Our results show that ASET effectively provides the best performance compared to static policies when scheduling over a distributed pool of edge resources.
NI · Aug 13, 2025
Anomaly Detection for IoT Global Connectivity
Jesus Omaña Iglesias, Carlos Segura Perales, Stefan Geißler et al.
Internet of Things (IoT) application providers rely on Mobile Network Operators (MNOs) and roaming infrastructures to deliver their services globally. In this complex ecosystem, where the end-to-end communication path traverses multiple entities, it has become increasingly challenging to guarantee communication availability and reliability. Further, most platform operators use a reactive approach to communication issues, responding to user complaints only after incidents have become severe, compromising service quality. This paper presents our experience in the design and deployment of ANCHOR, an unsupervised anomaly detection solution for the IoT connectivity service of a large global roaming platform. ANCHOR assists engineers by filtering vast amounts of data to identify potentially problematic clients (i.e., those with connectivity issues affecting several of their IoT devices), enabling proactive issue resolution before the service is critically impacted. We first describe the IoT service, infrastructure, and network visibility of the IoT connectivity provider we operate. Second, we describe the main challenges and operational requirements for designing an unsupervised anomaly detection solution on this platform. Following these guidelines, we propose different statistical rules, and machine- and deep-learning models for IoT verticals anomaly detection based on passive signaling traffic. We describe the steps we followed working with the operational teams on the design and evaluation of our solution on the operational platform, and report an evaluation on operational IoT customers.
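As a rough illustration of what a statistical rule of this kind might look like, the Python sketch below flags a client when several of its devices show an unusually high failure ratio relative to their own history. The abstract does not specify ANCHOR's actual rules, so everything here is hypothetical: the failure-ratio feature, the robust z-score based on the median and median absolute deviation (MAD), the `min_scale` floor, and the threshold values.

```python
from statistics import median


def robust_z(history, value, min_scale=0.01):
    """Robust z-score of value against history, using the median and the MAD."""
    med = median(history)
    mad = median(abs(x - med) for x in history)
    # Floor the scale so that a perfectly flat history does not flag tiny changes.
    scale = max(mad, min_scale)
    return 0.6745 * (value - med) / scale


def anomalous_devices(client_devices, threshold=3.5):
    """Return ids of devices whose latest failure ratio is unusually high."""
    flagged = []
    for device_id, (history, latest) in client_devices.items():
        if robust_z(history, latest) > threshold:
            flagged.append(device_id)
    return flagged


def flag_clients(clients, min_devices=2, threshold=3.5):
    """Flag clients that have at least min_devices anomalous devices."""
    return sorted(
        client_id
        for client_id, devices in clients.items()
        if len(anomalous_devices(devices, threshold)) >= min_devices
    )


# Hypothetical data: device id -> (past daily failure ratios, latest ratio).
clients = {
    "client-a": {
        "dev-1": ([0.01, 0.02, 0.01, 0.03, 0.02], 0.40),
        "dev-2": ([0.02, 0.02, 0.03, 0.01, 0.02], 0.35),
        "dev-3": ([0.01, 0.01, 0.02, 0.02, 0.01], 0.02),
    },
    "client-b": {
        "dev-4": ([0.05, 0.04, 0.06, 0.05, 0.05], 0.06),
        "dev-5": ([0.02, 0.03, 0.02, 0.02, 0.03], 0.50),
    },
}
print(flag_clients(clients))
```

Requiring several anomalous devices per client mirrors the abstract's description of problematic clients as those with issues affecting several of their devices, so a single misbehaving device (as for `client-b` here) does not raise a client-level alert.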
CR · Jul 8, 2021
Serverless Computing: A Security Perspective
Eduard Marin, Diego Perino, Roberto Di Pietro
Serverless Computing is a virtualisation-related paradigm that promises to simplify application management and to solve the remaining challenges in the field: scaling down and ease of use. The implied cost reduction, coupled with simplified management of the underlying applications, is expected to further push the adoption of virtualisation-based solutions, including cloud-computing or telco-cloud solutions. However, in this quest for efficiency, security is not ranked among the top priorities, partly because of the (misleading) belief that current solutions developed for virtualised environments could be applied (as is) to this new paradigm. Unfortunately, this is not the case, due to the idiosyncratic features of serverless computing. In this paper, we review the current serverless architectures, abstract and categorise their founding principles, and provide an in-depth analysis of them from the point of view of security, referring to principles and practices of the cybersecurity domain. In particular, we show the security shortcomings of the analysed serverless architectural paradigms, point to possible countermeasures, and highlight a few research directions.
MM · May 8, 2021
360NorVic: 360-Degree Video Classification from Mobile Encrypted Video Traffic
Chamara Kattadige, Aravindh Raman, Kanchana Thilakarathna et al.
Streaming 360° video demands high bandwidth and low latency, and poses significant challenges to Internet Service Providers (ISPs) and Mobile Network Operators (MNOs). The identification of 360° video traffic can therefore benefit fixed and mobile carriers by helping them optimize their networks and provide better Quality of Experience (QoE) to the user. However, end-to-end encryption of network traffic has made it difficult to distinguish 360° videos from regular videos. As a solution, this paper presents 360NorVic, a near-realtime and offline Machine Learning (ML) classification engine to distinguish 360° videos from regular videos when streamed from mobile devices. We collect packet and flow level data for over 800 video traces from YouTube and Facebook, accounting for 200 unique videos under varying streaming conditions. Our results show that for near-realtime and offline classification at packet level, average accuracy exceeds 95%, and that for flow level, 360NorVic achieves more than 92% average accuracy. Finally, we pilot our solution in the commercial network of a large MNO, showing the feasibility and effectiveness of 360NorVic in production settings.
CR · Apr 29, 2021
PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments
Fan Mo, Hamed Haddadi, Kleomenis Katevas et al.
We propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems to limit privacy leakages in federated learning. Leveraging the widespread presence of Trusted Execution Environments (TEEs) in high-end and mobile devices, we utilize TEEs on clients for local training, and on servers for secure aggregation, so that model/gradient updates are hidden from adversaries. Challenged by the limited memory size of current TEEs, we leverage greedy layer-wise training to train each model's layer inside the trusted area until its convergence. The performance evaluation of our implementation shows that PPFL can significantly improve privacy while incurring small system overheads at the client-side. In particular, PPFL can successfully defend the trained model against data reconstruction, property inference, and membership inference attacks. Furthermore, it can achieve comparable model utility with fewer communication rounds (0.54×) and a similar amount of network traffic (1.002×) compared to the standard federated learning of a complete model. This is achieved while only introducing up to ~15% CPU time, ~18% memory usage, and ~21% energy consumption overhead in PPFL's client-side.
LG · Nov 18, 2020
FLaaS: Federated Learning as a Service
Nicolas Kourtellis, Kleomenis Katevas, Diego Perino
Federated Learning (FL) is emerging as a promising technology to build machine learning models in a decentralized, privacy-preserving fashion. Indeed, FL enables local training on user devices, avoiding the transfer of user data to centralized servers, and can be enhanced with differential privacy mechanisms. Although FL has been recently deployed in real systems, the possibility of collaborative modeling across different 3rd-party applications has not yet been explored. In this paper, we tackle this problem and present Federated Learning as a Service (FLaaS), a system enabling different scenarios of 3rd-party application collaborative model building and addressing the consequent challenges of permission and privacy management, usability, and hierarchical model training. FLaaS can be deployed in different operational environments. As a proof of concept, we implement it on a mobile phone setting and discuss practical implications of results on simulated and real devices with respect to on-device training CPU cost, memory footprint and power consumed per FL model round. Therefore, we demonstrate FLaaS's feasibility in building unique or joint FL models across applications for image object detection in a few hours, across 100 devices.
CR · Apr 28, 2020
A Retrospective Analysis of User Exposure to (Illicit) Cryptocurrency Mining on the Web
Ralph Holz, Diego Perino, Matteo Varvello et al.
In late 2017, a sudden proliferation of malicious JavaScript was reported on the Web: browser-based mining exploited the CPU time of website visitors to mine the cryptocurrency Monero. Several studies measured the deployment of such code and developed defenses. However, previous work did not establish how many users were really exposed to the identified mining sites and whether there was a real risk given common user browsing behavior. In this paper, we present a retroactive analysis to close this research gap. We pool large-scale, longitudinal data from several vantage points, gathered during the prime time of illicit cryptomining, to measure the impact on web users. We leverage data from passive traffic monitoring of university networks and a large European ISP, with suspected mining sites identified in previous active scans. We corroborate our results with data from a browser extension with a large user base that tracks site visits. We also monitor open HTTP proxies and the Tor network for malicious injection of code. We find that the risk for most Web users was always very low, much lower than what deployment scans suggested. Any exposure period was also very brief. However, we also identify a previously unknown and exploited attack vector on mobile devices.
CR · Aug 15, 2016
Are wearable devices ready for HTTPS? Measuring the cost of secure communication protocols on wearable devices
Harini Kolamunna, Jagmohan Chauhan, Yining Hu et al.
The majority of available wearable devices require communication with Internet servers for data analysis and storage, and rely on a paired smartphone to enable secure communication. However, wearable devices are mostly equipped with WiFi network interfaces, enabling direct communication with the Internet. Secure communication protocols should then run on the wearables themselves, yet it is not clear if they can be efficiently supported. In this paper, we show that wearable devices are ready for direct and secure Internet communication by means of experiments with both controlled and Internet servers. We observe that the overall energy consumption and communication delay can be reduced with direct Internet connection via WiFi from wearables compared to using smartphones as relays via Bluetooth. We also show that the additional HTTPS cost caused by TLS handshake and encryption is closely related to the number of parallel connections, and has the same relative impact on wearables and smartphones.