Alecio Binotto

DC
h-index: 13
Papers: 4
Citations: 54
Novelty: 41%
AI Score: 40

4 Papers

DC · Nov 30, 2025
Joint Partitioning and Placement of Foundation Models for Real-Time Edge AI

Aladin Djuhera, Fernando Koch, Alecio Binotto

Inference over large-scale foundation models within heterogeneous edge environments necessitates a fundamentally reconfigurable orchestration substrate. Static partitioning of model layers presumes temporal stability across compute and network resources, which is misaligned with the volatility of real-world deployments. We introduce a framework in which both the spatial placement and internal segmentation of foundation models are elevated to runtime-resolved constructs. The orchestration problem is formalized as a constrained optimization over layer-wise assignments, subject to evolving latency, utilization, and privacy gradients. By integrating model-aware capacity profiling with dynamic graph re-partitioning and reallocation, the framework recomposes inference in response to infrastructural fluctuations. We present the architectural and algorithmic components, along with a representative use case in 6G multi-access edge computing.
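
For orientation, one way such a constrained optimization over layer-wise assignments could be written down is sketched below. This is an illustrative formulation and not the one given in the paper: the binary variables x_{ik} (layer i runs on node k), the compute costs c_{ik}(t), the transfer costs d_{i,jk}(t), the memory demands m_i, the node capacities M_k(t), and the privacy-admissible node sets P_i are all assumed symbols introduced here.

```latex
% Illustrative sketch only; every symbol is an assumption, not the paper's notation.
\begin{aligned}
\min_{x}\quad
  & \sum_{i=1}^{L}\sum_{k=1}^{K} x_{ik}\, c_{ik}(t)
    \;+\; \sum_{i=1}^{L-1}\sum_{j=1}^{K}\sum_{\substack{k=1 \\ k \neq j}}^{K}
      x_{ij}\, x_{i+1,k}\, d_{i,jk}(t) \\
\text{s.t.}\quad
  & \sum_{k=1}^{K} x_{ik} = 1, \qquad i = 1,\dots,L
    && \text{(each layer is placed on exactly one node)} \\
  & \sum_{i=1}^{L} x_{ik}\, m_i \le M_k(t), \qquad k = 1,\dots,K
    && \text{(utilization limit per node)} \\
  & x_{ik} = 0 \quad \text{for all } k \notin P_i
    && \text{(privacy: layer } i \text{ may only run on admissible nodes)} \\
  & x_{ik} \in \{0, 1\}
\end{aligned}
```

The first term sums per-layer compute latency, and the second adds the cost of shipping the activations of layer i whenever consecutive layers sit on different nodes. The dependence on t reflects that costs and capacities change at runtime, so the problem would be re-solved as conditions evolve.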

SP · May 15
Against the Monolithic Wireless World Model: Why NextG Needs Composable and Agentic Intelligence

Aladin Djuhera, Farhan Ahmed, Vlad C. Andrei et al.

AI-native 6G visions increasingly invoke wireless foundation models, large multimodal models, and wireless world models as the natural endpoint of AI-native networking, drawing an analogy to recent developments in large language models (LLMs). We argue that this analogy is structurally incomplete. The success of LLMs is based on a broad, reusable, and largely self-contained tokenized data substrate, whereas the wireless domain lacks an equivalent data foundation. Unlike text, code, or images, wireless data such as CSI tensors, IQ samples, or scheduler logs are not self-contained: their meaning is configuration-dependent, simulator-conditioned, task-disaggregated, and weakly grounded in operational feedback, all structural bottlenecks that undermine current pre- and post-training recipes. We therefore argue that monolithic models, including mixture-of-experts (MoE) and wireless world models, are not the most realistic near-term path toward deployable AI-native networks. Instead, emerging evidence points toward composable and agentic network architectures, where general reasoning models orchestrate specialized signal processing models, classical algorithms, digital twins, standards-aware retrieval, and safety checks through explicit programmable interfaces.

DC · Mar 19, 2025
Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge

Fernando Koch, Aladin Djuhera, Alecio Binotto

Large Foundation Models (LFMs), including multi-modal and generative models, promise to unlock new capabilities for next-generation Edge AI applications. However, performing inference with LFMs in resource-constrained and heterogeneous edge environments, such as Multi-access Edge Computing (MEC), presents significant challenges for workload orchestration due to time-varying network, compute, and storage conditions. In particular, current split inference strategies, which partition LFM layers across nodes, are not designed to adapt to fluctuating workloads, dynamic bandwidth conditions, or evolving privacy constraints in high-utilization MEC environments. In this work, we propose a novel adaptive split inference orchestration framework that elevates both the placement and partitioning of LFM layers to runtime-tunable variables. Specifically, our framework enables real-time, quality-of-service (QoS)-aware management of inference workloads by extending conventional orchestrators with three key services: (1) Capacity-aware workload distribution, which continuously profiles node resources and selects an optimal subset of MEC nodes; (2) Dynamic partition migration, which transparently relocates pre-cut LFM segments in response to changes in utilization or network conditions; (3) Real-time reconfiguration, which dynamically re-splits LFM layers to balance latency, throughput, and privacy. We formalize the joint placement-partitioning problem, outline a reference architecture and algorithmic workflow, and discuss applicability in representative smart city, V2X, and industrial edge scenarios.
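
To make the idea of runtime-tunable placement and partitioning concrete, here is a minimal Python sketch of a split planner. It is not the authors' algorithm: it assumes a simple chain of layers, a purely additive latency model (compute time plus activation transfer time), and a privacy rule that pins the first few layers to node 0. The function name `plan_split` and all of its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def plan_split(
    layer_flops: Sequence[float],            # assumed compute cost of each layer
    layer_out_bytes: Sequence[float],        # assumed activation size emitted by each layer
    node_flops_per_s: Sequence[float],       # assumed profiled compute rate of each node
    bandwidth: Sequence[Sequence[float]],    # assumed bytes/s between node j and node k
    private_layers: int = 1,                 # first layers pinned to node 0 (privacy assumption)
) -> Tuple[float, List[int]]:
    """Return (latency_seconds, assignment), where assignment[i] is the node of layer i.

    Dynamic program over (layer, node): cost[i][k] is the lowest latency of
    running layers 0..i with layer i on node k. The input is assumed to
    originate at node 0, so layer 0 always runs there.
    """
    n_layers, n_nodes = len(layer_flops), len(node_flops_per_s)
    cost = [[math.inf] * n_nodes for _ in range(n_layers)]
    prev = [[-1] * n_nodes for _ in range(n_layers)]
    cost[0][0] = layer_flops[0] / node_flops_per_s[0]

    for i in range(1, n_layers):
        allowed = [0] if i < private_layers else range(n_nodes)
        for k in allowed:
            compute = layer_flops[i] / node_flops_per_s[k]
            for j in range(n_nodes):
                if cost[i - 1][j] == math.inf:
                    continue
                # Moving between nodes costs one transfer of the previous activation.
                transfer = 0.0 if j == k else layer_out_bytes[i - 1] / bandwidth[j][k]
                total = cost[i - 1][j] + transfer + compute
                if total < cost[i][k]:
                    cost[i][k] = total
                    prev[i][k] = j

    # Pick the cheapest final node, then walk the back-pointers to recover the plan.
    last = min(range(n_nodes), key=lambda k: cost[-1][k])
    assignment = [last]
    for i in range(n_layers - 1, 0, -1):
        assignment.append(prev[i][assignment[-1]])
    assignment.reverse()
    return cost[-1][last], assignment


# Hypothetical two-node setup: node 0 is a slow device, node 1 a faster MEC node.
flops = [1e9, 2e9, 2e9, 1e9]
out_bytes = [4e6, 1e6, 1e6, 1e3]
speeds = [1e9, 1e10]
good_link = [[math.inf, 1e7], [1e7, math.inf]]
poor_link = [[math.inf, 1e5], [1e5, math.inf]]

latency_good, plan_good = plan_split(flops, out_bytes, speeds, good_link)
latency_poor, plan_poor = plan_split(flops, out_bytes, speeds, poor_link)
print(plan_good, round(latency_good, 3))
print(plan_poor, round(latency_poor, 3))
```

With the faster link, the planner offloads everything after the pinned first layer. When the bandwidth drops, the same call keeps all layers on the device. Re-running the planner whenever profiled rates change is one simple way to picture real-time reconfiguration; a production orchestrator would also need to account for migration cost, memory limits, and throughput, which this sketch omits.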

DC · Jan 29, 2018
Using Meta-heuristics and Machine Learning for Software Optimization of Parallel Computing Systems: A Systematic Literature Review

Suejb Memeti, Sabri Pllana, Alecio Binotto et al.

While modern parallel computing systems offer high performance, utilizing these powerful computing resources to the highest possible extent demands advanced knowledge of various hardware architectures and parallel programming models. Furthermore, optimized software execution on parallel computing systems demands consideration of many parameters at compile-time and run-time. Determining the optimal set of parameters in a given execution context is a complex task; to address it, researchers have proposed different approaches that use heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze, and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for software optimization at compile-time and run-time. Additionally, we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of software optimization for parallel computing systems. Furthermore, they may aid in understanding the limitations of existing approaches and in identifying areas for improvement.