LGDec 27, 2022
Anomaly detection in laser-guided vehicles' batteries: a case studyGianfranco Lombardo, Stefano Cagnoni, Stefano Cavalli et al.
Detecting anomalous data within time series is a very relevant task in pattern recognition and machine learning, with many possible applications that range from disease prevention in medicine, e.g., detecting early alterations of the health status before it can clearly be defined as "illness" up to monitoring industrial plants. Regarding this latter application, detecting anomalies in an industrial plant's status firstly prevents serious damages that would require a long interruption of the production process. Secondly, it permits optimal scheduling of maintenance interventions by limiting them to urgent situations. At the same time, they typically follow a fixed prudential schedule according to which components are substituted well before the end of their expected lifetime. This paper describes a case study regarding the monitoring of the status of Laser-guided Vehicles (LGVs) batteries, on which we worked as our contribution to project SUPER (Supercomputing Unified Platform, Emilia Romagna) aimed at establishing and demonstrating a regional High-Performance Computing platform that is going to represent the main Italian supercomputing environment for both computing power and data volume.
5.6LGMay 9
A Geometric Perspective on Next-Token Prediction in Large Language Models: Three Emerging PhasesGianfranco Lombardo, Giuseppe Trimigno, Stefano Cagnoni
We investigate the geometry of predictive information across the layers of large language models (LLMs). We repurpose representation lenses-learned affine maps trained to predict the next token from intermediate residual streams-as geometric diagnostic tools. Rather than asking what the model predicts at each layer, we ask where predictive information resides and how it evolves across depth. We define at each layer a predictive readout subspace as the dominant k-dimensional singular subspace of such a map on the d-dimensional residual stream (where k is a resolution parameter), and track its trajectory on the Grassmann manifold as a similarity profile across layers. The profile is well described by unimodal distributions exhibiting a rise, near-plateau, and descent; varying k from 1% to 50% of d traces a Pareto frontier between visibility and energy retention, yet the same structure emerges at all scales. Across eight models from two families (Qwen2.5 and OLMo2, 1B-32B), we identify three geometric phases. Updates are approximately orthogonal to the residual stream throughout; what distinguishes the phases is their effect on the effective rank, which expands, stabilizes, and concentrates. In the first, Seeding Multiplexing, feed-forward memories and attention layers seed a candidate set in superposition in family-specific proportions, with the final token rising as leading candidate from 20% to 35% of positions across this phase. In the second, Hoisting Overriding, updates override existing subspaces to concentrate the candidate distribution without expanding the rank. In the third, Focal Convergence, high-energy low-rank updates write the winner into a form aligned with the unembedding direction. Phases 1 and 3 grow slowly with model depth, while Phase 2 expands linearly. The additional capacity of deeper LLMs is largely absorbed by candidate disambiguation.