CV, May 18
Historical Knowledge Graphs for Global Maritime Estimated Time of Arrival
Neofytos Dimitriou
Accurate vessel estimated-time-of-arrival forecasts are critical for port operations and decarbonization, yet global-scale travel-time prediction remains difficult without costly contextual data. Herein, I present a methodology for constructing a historical maritime knowledge graph using only Automatic Identification System (AIS) data. First, segmented trajectories are extracted from noisy AIS data using a Gaussian-mixture-model-based preprocessing pipeline. The graph is then constructed by iteratively processing the trajectories and storing speed distributions stratified by vessel type, time of travel, and direction of travel; the resulting global graph comprises 5,433 geohash-3 nodes and 12,334 edges. The graph can be queried to retrieve travel-time predictions between any two locations via a hierarchical, priority-based system that uses historical statistics with principled fallback. On a temporally held-out test set, median RMSE is 22.75 min (segment-level) and 30.90 min (trajectory-level), with 69.1% of trajectories within 20% of actual arrival time. On a second external test set, median RMSE is 27.36 min (segment-level) and 37.46 min (trajectory-level), with 62.1% of trajectories within 20%. These results corroborate the promise of our method, enabling global travel-time prediction and providing a strong foundation for just-in-time arrival planning and emissions reduction.
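The hierarchical, priority-based query with fallback described above could look roughly like the following. This is a minimal sketch, not the paper's implementation: the key layout, the three priority levels, the `min_samples` threshold, the use of the median, and all names (`edge_speeds`, `lookup_speed`, `travel_time_minutes`) are assumptions made for illustration.

```python
from statistics import median

# Hypothetical statistics for a single graph edge: observed speeds in knots,
# keyed by (vessel_type, time_bucket, direction). None marks a level at which
# that attribute has been pooled away (an assumed layout, not the paper's).
edge_speeds = {
    ("cargo", "summer", "AB"): [12.0, 13.0, 14.0],
    ("cargo", None, "AB"): [11.0, 12.0, 13.0, 14.0, 15.0],
    (None, None, None): [8.0, 10.0, 12.0, 14.0],
}


def lookup_speed(stats, vessel_type, time_bucket, direction, min_samples=3):
    """Return (median speed, key used), trying the most specific key first."""
    priorities = [
        (vessel_type, time_bucket, direction),  # fully stratified
        (vessel_type, None, direction),         # ignore time of travel
        (None, None, None),                     # all traffic on the edge
    ]
    for key in priorities:
        samples = stats.get(key, [])
        if len(samples) >= min_samples:
            return median(samples), key
    raise KeyError("no usable statistics for this edge")


def travel_time_minutes(distance_nm, speed_knots):
    """Travel time in minutes for a distance in nautical miles."""
    return 60.0 * distance_nm / speed_knots


speed, used_key = lookup_speed(edge_speeds, "cargo", "winter", "AB")
print(used_key, travel_time_minutes(26.0, speed))
```

A query for a stratum with too few observations falls through to the next, coarser level, so a prediction is still returned wherever the edge has any history at all.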
LG, May 17, 2025
Unsupervised Port Berth Identification from Automatic Identification System Data
Andreas Hadjipieris, Neofytos Dimitriou, Ognjen Arandjelović
Port berthing sites are regions of high interest for monitoring and optimizing port operations. Data sourced from the Automatic Identification System (AIS) can be superimposed on berths enabling their real-time monitoring and revealing long-term utilization patterns. Ultimately, insights from multiple berths can uncover bottlenecks, and lead to the optimization of the underlying supply chain of the port and beyond. However, publicly available documentation of port berths, even when available, is frequently incomplete - e.g. there may be missing berths or inaccuracies such as incorrect boundary boxes - necessitating a more robust, data-driven approach to port berth localization. In this context, we propose an unsupervised spatial modeling method that leverages AIS data clustering and hyperparameter optimization to identify berthing sites. Trained on one month of freely available AIS data and evaluated across ports of varying sizes, our models significantly outperform competing methods, achieving a mean Bhattacharyya distance of 0.85 when comparing Gaussian Mixture Models (GMMs) trained on separate data splits, compared to 13.56 for the best existing method. Qualitative comparison with satellite images and existing berth labels further supports the superiority of our method, revealing more precise berth boundaries and improved spatial resolution across diverse port environments.
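For readers unfamiliar with the Bhattacharyya distance used as the evaluation metric above, the sketch below computes its closed form for two univariate Gaussians. It is an illustration only: the abstract compares full Gaussian Mixture Models, for which no such closed form is given here, and the function name and example values are assumptions.

```python
import math


def bhattacharyya_gaussian_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between N(mu1, var1) and N(mu2, var2)."""
    # Contribution from the separation of the means.
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    # Contribution from the mismatch between the variances.
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


# Identical distributions are at distance zero; the distance grows as the
# means move apart or the variances diverge.
print(bhattacharyya_gaussian_1d(0.0, 1.0, 0.0, 1.0))
print(bhattacharyya_gaussian_1d(0.0, 1.0, 2.0, 1.0))
```

Lower values indicate that two fitted densities overlap more, which is why a smaller distance between models trained on separate data splits is read as a more stable result.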
CV, Dec 12, 2021
Magnifying Networks for Images with Billions of Pixels
Neofytos Dimitriou, Ognjen Arandjelovic
The shift towards end-to-end deep learning has brought unprecedented advances in many areas of computer vision. However, deep neural networks are trained on images with resolutions that rarely exceed $1,000 \times 1,000$ pixels. The growing use of scanners that create images with extremely high resolutions (average can be $100,000 \times 100,000$ pixels) thereby presents novel challenges to the field. Most of the published methods preprocess high-resolution images into a set of smaller patches, imposing an a priori belief on the best properties of the extracted patches (magnification, field of view, location, etc.). Herein, we introduce Magnifying Networks (MagNets) as an alternative deep learning solution for gigapixel image analysis that neither relies on a preprocessing stage nor requires the processing of billions of pixels. MagNets can learn to dynamically retrieve any part of a gigapixel image, at any magnification level and field of view, in an end-to-end fashion with minimal ground truth (a single global, slide-level label). Our results on the publicly available Camelyon16 and Camelyon17 datasets corroborate the effectiveness and efficiency of MagNets and the proposed optimization framework for whole slide image classification. Importantly, MagNets process far fewer patches from each slide than any of the existing approaches ($10$ to $300$ times fewer).
CV, Jul 16, 2020
A New Look at Ghost Normalization
Neofytos Dimitriou, Ognjen Arandjelovic
Batch normalization (BatchNorm) is an effective yet poorly understood technique for neural network optimization. It is often assumed that the degradation in BatchNorm performance with smaller batch sizes stems from it having to estimate layer statistics using smaller sample sizes. However, recently, Ghost normalization (GhostNorm), a variant of BatchNorm that explicitly uses smaller sample sizes for normalization, has been shown to improve upon BatchNorm in some datasets. Our contributions are: (i) we uncover a source of regularization that is unique to GhostNorm, and not simply an extension from BatchNorm, (ii) three types of GhostNorm implementations are described, two of which employ BatchNorm as the underlying normalization technique, (iii) by visualising the loss landscape of GhostNorm, we observe that GhostNorm consistently decreases the smoothness when compared to BatchNorm, (iv) we introduce Sequential Normalization (SeqNorm), and report superior performance over state-of-the-art methodologies on both the CIFAR-10 and CIFAR-100 datasets.
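The core idea of GhostNorm, normalizing smaller groups within a batch rather than the whole batch, can be sketched as follows. This is a simplified illustration, not one of the three implementations the paper describes: it handles a single feature channel at training time only, omits learnable scale and shift parameters and running statistics, and the names `normalize` and `ghost_norm` are assumptions.

```python
from statistics import fmean


def normalize(values, eps=1e-5):
    """Standardize values using their own mean and (biased) variance."""
    mean = fmean(values)
    var = fmean([(v - mean) ** 2 for v in values])
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def ghost_norm(batch, ghost_size, eps=1e-5):
    """Normalize each consecutive group of ghost_size samples independently."""
    if ghost_size <= 0 or len(batch) % ghost_size != 0:
        raise ValueError("batch size must be a positive multiple of ghost_size")
    out = []
    for start in range(0, len(batch), ghost_size):
        out.extend(normalize(batch[start:start + ghost_size], eps))
    return out


# Activations of one channel for a batch of eight samples.
batch = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
print(ghost_norm(batch, ghost_size=4))  # statistics computed per group of four
print(ghost_norm(batch, ghost_size=8))  # reduces to ordinary batch statistics
```

Setting `ghost_size` equal to the batch size recovers plain batch statistics, which makes the smaller-sample-size effect easy to isolate in experiments.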
CV, Oct 18, 2019
Deep Learning for Whole Slide Image Analysis: An Overview
Neofytos Dimitriou, Ognjen Arandjelović, Peter D Caie
The widespread adoption of whole slide imaging has increased the demand for effective and efficient gigapixel image analysis. Deep learning is at the forefront of computer vision, showcasing significant improvements over previous methodologies on visual understanding. However, whole slide images have billions of pixels and suffer from high morphological heterogeneity as well as from different types of artefacts. Collectively, these impede the conventional use of deep learning. For the clinical translation of deep learning solutions to become a reality, these challenges need to be addressed. In this paper, we review work on the interdisciplinary attempt of training deep neural networks using whole slide images, and highlight the different ideas underlying these methodologies.
IV, Feb 10, 2019
Colorectal Cancer Outcome Prediction from H&E Whole Slide Images using Machine Learning and Automatically Inferred Phenotype Profiles
Xingzhi Yue, Neofytos Dimitriou, Ognjen Arandjelovic
Digital pathology (DP) is a new research area which falls under the broad umbrella of health informatics. Owing to its potential for major public health impact, in recent years DP has been attracting much research attention. Nevertheless, a wide breadth of significant conceptual and technical challenges remain, few of them greater than those encountered in the field of oncology. The automatic analysis of digital pathology slides of cancerous tissues is particularly problematic due to the inherent heterogeneity of the disease and the extremely large images, amongst numerous other factors. In this paper we introduce a novel machine learning based framework for the prediction of colorectal cancer outcome from whole digitized haematoxylin & eosin (H&E) stained histopathology slides. Using a real-world data set we demonstrate the effectiveness of the method and present a detailed analysis of its different elements, which corroborates its ability to extract and learn salient, discriminative, and clinically meaningful content.