66.5DCJun 2
Predicting Lakehouse Performance in Clouds: An Empirical Exploration of Query Runtime VarianceJames Nurdin, Wei Liu, Richard Mccreadie et al.
Data analytics increasingly runs on distributed lakehouse systems, where platform operators must optimise monetary, resource, and environmental costs. Query Performance Prediction (QPP) helps to balance these costs and supports workload management techniques, such as adaptive resource scaling and low-carbon scheduling. However, runtimes in lakehouses can vary substantially, and the impact of runtime variance on QPP accuracy and workload orchestration has not previously been systematically studied for lakehouse systems. This paper addresses this gap by investigating the runtime variance observed for distributed lakehouse analytical queries and its impact on QPP. First, we quantify the run-to-run variance using Kubernetes deployments across three public clouds and one private cloud, spanning multiple database scales and three analytical benchmarks. Our results demonstrate that repeated executions of the same query can vary in runtime by nearly twofold. Second, we conduct a factor analysis study assessing key sources of this runtime variance such as data locality, co-tenant load, and caching effects. Third, we examine how variance influences state-of-the-art QPP models, revealing that addressing key sources of variance can reduce prediction error up to 80%. Finally, we demonstrate the downstream implications for low-carbon scheduling as an example of a workload management technique that relies on performance prediction, showing that accounting for runtime variance can lead to a significant reduction in carbon costs.
ROOct 16, 2023Code
RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language ModelsZijun Long, George Killick, Richard McCreadie et al.
Robotic vision applications often necessitate a wide range of visual perception tasks, such as object detection, segmentation, and identification. While there have been substantial advances in these individual tasks, integrating specialized models into a unified vision pipeline presents significant engineering challenges and costs. Recently, Multimodal Large Language Models (MLLMs) have emerged as novel backbones for various downstream tasks. We argue that leveraging the pre-training capabilities of MLLMs enables the creation of a simplified framework, thus mitigating the need for task-specific encoders. Specifically, the large-scale pretrained knowledge in MLLMs allows for easier fine-tuning to downstream robotic vision tasks and yields superior performance. We introduce the RoboLLM framework, equipped with a BEiT-3 backbone, to address all visual perception tasks in the ARMBench challenge-a large-scale robotic manipulation dataset about real-world warehouse scenarios. RoboLLM not only outperforms existing baselines but also substantially reduces the engineering burden associated with model selection and tuning. The source code is publicly available at https://github.com/longkukuhi/armbench.
CVMar 31, 2023
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision TransformersZijun Long, Zaiqiao Meng, Gerardo Aragon Camarasa et al.
Vision Transformers (ViTs) have emerged as popular models in computer vision, demonstrating state-of-the-art performance across various tasks. This success typically follows a two-stage strategy involving pre-training on large-scale datasets using self-supervised signals, such as masked random patches, followed by fine-tuning on task-specific labeled datasets with cross-entropy loss. However, this reliance on cross-entropy loss has been identified as a limiting factor in ViTs, affecting their generalization and transferability to downstream tasks. Addressing this critical challenge, we introduce a novel Label-aware Contrastive Training framework, LaCViT, which significantly enhances the quality of embeddings in ViTs. LaCViT not only addresses the limitations of cross-entropy loss but also facilitates more effective transfer learning across diverse image classification tasks. Our comprehensive experiments on eight standard image classification datasets reveal that LaCViT statistically significantly enhances the performance of three evaluated ViTs by up-to 10.78% under Top-1 Accuracy.
CVNov 25, 2023
Elucidating and Overcoming the Challenges of Label Noise in Supervised Contrastive LearningZijun Long, George Killick, Lipeng Zhuang et al.
Image classification datasets exhibit a non-negligible fraction of mislabeled examples, often due to human error when one class superficially resembles another. This issue poses challenges in supervised contrastive learning (SCL), where the goal is to cluster together data points of the same class in the embedding space while distancing those of disparate classes. While such methods outperform those based on cross-entropy, they are not immune to labeling errors. However, while the detrimental effects of noisy labels in supervised learning are well-researched, their influence on SCL remains largely unexplored. Hence, we analyse the effect of label errors and examine how they disrupt the SCL algorithm's ability to distinguish between positive and negative sample pairs. Our analysis reveals that human labeling errors manifest as easy positive samples in around 99% of cases. We, therefore, propose D-SCL, a novel Debiased Supervised Contrastive Learning objective designed to mitigate the bias introduced by labeling errors. We demonstrate that D-SCL consistently outperforms state-of-the-art techniques for representation learning across diverse vision benchmarks, offering improved robustness to label errors.
CVAug 28, 2023
When hard negative sampling meets supervised contrastive learningZijun Long, George Killick, Richard McCreadie et al.
State-of-the-art image models predominantly follow a two-stage strategy: pre-training on large datasets and fine-tuning with cross-entropy loss. Many studies have shown that using cross-entropy can result in sub-optimal generalisation and stability. While the supervised contrastive loss addresses some limitations of cross-entropy loss by focusing on intra-class similarities and inter-class differences, it neglects the importance of hard negative mining. We propose that models will benefit from performance improvement by weighting negative samples based on their dissimilarity to positive counterparts. In this paper, we introduce a new supervised contrastive learning objective, SCHaNe, which incorporates hard negative sampling during the fine-tuning phase. Without requiring specialized architectures, additional data, or extra computational resources, experimental results indicate that SCHaNe outperforms the strong baseline BEiT-3 in Top-1 accuracy across various benchmarks, with significant gains of up to $3.32\%$ in few-shot learning settings and $3.41\%$ in full dataset fine-tuning. Importantly, our proposed objective sets a new state-of-the-art for base models on ImageNet-1k, achieving an 86.14\% accuracy. Furthermore, we demonstrate that the proposed objective yields better embeddings and explains the improved effectiveness observed in our experiments.
CLFeb 11, 2023
Divergence-Based Domain Transferability for Zero-Shot ClassificationAlexander Pugantsov, Richard McCreadie
Transferring learned patterns from pretrained neural language models has been shown to significantly improve effectiveness across a variety of language-based tasks, meanwhile further tuning on intermediate tasks has been demonstrated to provide additional performance benefits, provided the intermediate task is sufficiently related to the target task. However, how to identify related tasks is an open problem, and brute-force searching effective task combinations is prohibitively expensive. Hence, the question arises, are we able to improve the effectiveness and efficiency of tasks with no training examples through selective fine-tuning? In this paper, we explore statistical measures that approximate the divergence between domain representations as a means to estimate whether tuning using one task pair will exhibit performance benefits over tuning another. This estimation can then be used to reduce the number of task pairs that need to be tested by eliminating pairs that are unlikely to provide benefits. Through experimentation over 58 tasks and over 6,600 task pair combinations, we demonstrate that statistical measures can distinguish effective task pairs, and the resulting estimates can reduce end-to-end runtime by up to 40%.
IRFeb 23, 2024Code
CFIR: Fast and Effective Long-Text To Image Retrieval for Large CorporaZijun Long, Xuri Ge, Richard Mccreadie et al.
Text-to-image retrieval aims to find the relevant images based on a text query, which is important in various use-cases, such as digital libraries, e-commerce, and multimedia databases. Although Multimodal Large Language Models (MLLMs) demonstrate state-of-the-art performance, they exhibit limitations in handling large-scale, diverse, and ambiguous real-world needs of retrieval, due to the computation cost and the injective embeddings they produce. This paper presents a two-stage Coarse-to-Fine Index-shared Retrieval (CFIR) framework, designed for fast and effective large-scale long-text to image retrieval. The first stage, Entity-based Ranking (ER), adapts to long-text query ambiguity by employing a multiple-queries-to-multiple-targets paradigm, facilitating candidate filtering for the next stage. The second stage, Summary-based Re-ranking (SR), refines these rankings using summarized queries. We also propose a specialized Decoupling-BEiT-3 encoder, optimized for handling ambiguous user needs and both stages, which also enhances computational efficiency through vector-based similarity inference. Evaluation on the AToMiC dataset reveals that CFIR surpasses existing MLLMs by up to 11.06% in Recall@1000, while reducing training and retrieval times by 68.75% and 99.79%, respectively. We will release our code to facilitate future research at https://github.com/longkukuhi/CFIR.
STMar 27, 2024Code
Stock Recommendations for Individual Investors: A Temporal Graph Network Approach with Mean-Variance Efficient SamplingYoungbin Lee, Yejin Kim, Javier Sanz-Cruzado et al.
Recommender systems can be helpful for individuals to make well-informed decisions in complex financial markets. While many studies have focused on predicting stock prices, even advanced models fall short of accurately forecasting them. Additionally, previous studies indicate that individual investors often disregard established investment theories, favoring their personal preferences instead. This presents a challenge for stock recommendation systems, which must not only provide strong investment performance but also respect these individual preferences. To create effective stock recommender systems, three critical elements must be incorporated: 1) individual preferences, 2) portfolio diversification, and 3) the temporal dynamics of the first two. In response, we propose a new model, Portfolio Temporal Graph Network Recommender PfoTGNRec, which can handle time-varying collaborative signals and incorporates diversification-enhancing sampling. On real-world individual trading data, our approach demonstrates superior performance compared to state-of-the-art baselines, including cutting-edge dynamic embedding models and existing stock recommendation models. Indeed, we show that PfoTGNRec is an effective solution that can balance customer preferences with the need to suggest portfolios with high Return-on-Investment. The source code and data are available at https://github.com/youngandbin/PfoTGNRec.
26.5IRMar 23
ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image RetrievalZhuocheng Zhang, Xingwu Zhang, Kangheng Liang et al.
Recent advances in interactive text-to-image retrieval (I-TIR) use diffusion models to bridge the modality gap between the textual information need and the images to be searched, resulting in increased effectiveness. However, existing frameworks fuse multi-modal views of user feedback by simple embedding addition. In this work, we show that this static and undifferentiated fusion indiscriminately incorporates generative noise produced by the diffusion model, leading to performance degradation for up to 55.62% samples. We further propose ADaFuSE (Adaptive Diffusion-Text Fusion with Semantic-aware Experts), a lightweight fusion model designed to align and calibrate multi-modal views for diffusion-augmented I-TIR, which can be plugged into existing frameworks without modifying the backbone encoder. Specifically, we introduce a dual-branch fusion mechanism that employs an adaptive gating branch to dynamically balance modality reliability, alongside a semantic-aware mixture-of-experts branch to capture fine-grained cross-modal nuances. Via thorough evaluation over four standard I-TIR benchmarks, ADaFuSE achieves state-of-the-art performance, surpassing DAR by up to 3.49% in Hits@10 with only a 5.29% parameter increase, while exhibiting stronger robustness to noisy and longer interactive queries. These results show that generative augmentation coupled with principled fusion provides a simple, generalizable alternative to fine-tuning for interactive retrieval.
CVJan 5, 2024
CrisisViT: A Robust Vision Transformer for Crisis Image ClassificationZijun Long, Richard McCreadie, Muhammad Imran
In times of emergency, crisis response agencies need to quickly and accurately assess the situation on the ground in order to deploy relevant services and resources. However, authorities often have to make decisions based on limited information, as data on affected regions can be scarce until local response services can provide first-hand reports. Fortunately, the widespread availability of smartphones with high-quality cameras has made citizen journalism through social media a valuable source of information for crisis responders. However, analyzing the large volume of images posted by citizens requires more time and effort than is typically available. To address this issue, this paper proposes the use of state-of-the-art deep neural models for automatic image classification/tagging, specifically by adapting transformer-based architectures for crisis image classification (CrisisViT). We leverage the new Incidents1M crisis image dataset to develop a range of new transformer-based image classification models. Through experimentation over the standard Crisis image benchmark dataset, we demonstrate that the CrisisViT models significantly outperform previous approaches in emergency type, image relevance, humanitarian category, and damage severity classification. Additionally, we show that the new Incidents1M dataset can further augment the CrisisViT models resulting in an additional 1.25% absolute accuracy gain.
AIApr 8, 2025
Are Generative AI Agents Effective Personalized Financial Advisors?Takehiro Takayanagi, Kiyoshi Izumi, Javier Sanz-Cruzado et al.
Large language model-based agents are becoming increasingly popular as a low-cost mechanism to provide personalized, conversational advice, and have demonstrated impressive capabilities in relatively simple scenarios, such as movie recommendations. But how do these agents perform in complex high-stakes domains, where domain expertise is essential and mistakes carry substantial risk? This paper investigates the effectiveness of LLM-advisors in the finance domain, focusing on three distinct challenges: (1) eliciting user preferences when users themselves may be unsure of their needs, (2) providing personalized guidance for diverse investment preferences, and (3) leveraging advisor personality to build relationships and foster trust. Via a lab-based user study with 64 participants, we show that LLM-advisors often match human advisor performance when eliciting preferences, although they can struggle to resolve conflicting user needs. When providing personalized advice, the LLM was able to positively influence user behavior, but demonstrated clear failure modes. Our results show that accurate preference elicitation is key, otherwise, the LLM-advisor has little impact, or can even direct the investor toward unsuitable assets. More worryingly, users appear insensitive to the quality of advice being given, or worse these can have an inverse relationship. Indeed, users reported a preference for and increased satisfaction as well as emotional trust with LLMs adopting an extroverted persona, even though those agents provided worse advice.
CVFeb 22, 2024
CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning FusionZijun Long, George Killick, Lipeng Zhuang et al.
State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss~(CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing the quality of embeddings and producing better decision boundaries, they often overlook the importance of hard negative mining and rely on resource intensive and slow training using large sample batches. To counter these issues, we introduce a novel approach named CLCE, which integrates Label-Aware Contrastive Learning with CE. Our approach not only maintains the strengths of both loss functions but also leverages hard negative mining in a synergistic way to enhance performance. Experimental results demonstrate that CLCE significantly outperforms CE in Top-1 accuracy across twelve benchmarks, achieving gains of up to 3.52% in few-shot learning scenarios and 3.41% in transfer learning settings with the BEiT-3 model. Importantly, our proposed CLCE approach effectively mitigates the dependency of contrastive learning on large batch sizes such as 4096 samples per batch, a limitation that has previously constrained the application of contrastive learning in budget-limited hardware environments.
IRJan 26, 2025
Diffusion Augmented Retrieval: A Training-Free Approach to Interactive Text-to-Image RetrievalZijun Long, Kangheng Liang, Gerardo Aragon-Camarasa et al.
Interactive Text-to-image retrieval (I-TIR) is an important enabler for a wide range of state-of-the-art services in domains such as e-commerce and education. However, current methods rely on finetuned Multimodal Large Language Models (MLLMs), which are costly to train and update, and exhibit poor generalizability. This latter issue is of particular concern, as: 1) finetuning narrows the pretrained distribution of MLLMs, thereby reducing generalizability; and 2) I-TIR introduces increasing query diversity and complexity. As a result, I-TIR solutions are highly likely to encounter queries and images not well represented in any training dataset. To address this, we propose leveraging Diffusion Models (DMs) for text-to-image mapping, to avoid finetuning MLLMs while preserving robust performance on complex queries. Specifically, we introduce Diffusion Augmented Retrieval (DAR), a framework that generates multiple intermediate representations via LLM-based dialogue refinements and DMs, producing a richer depiction of the user's information needs. This augmented representation facilitates more accurate identification of semantically and visually related images. Extensive experiments on four benchmarks show that for simple queries, DAR achieves results on par with finetuned I-TIR models, yet without incurring their tuning overhead. Moreover, as queries become more complex through additional conversational turns, DAR surpasses finetuned I-TIR models by up to 7.61% in Hits@10 after ten turns, illustrating its improved generalization for more intricate queries.
CVMar 10, 2024
Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive LearningZijun Long, Lipeng Zhuang, George Killick et al.
Human-annotated vision datasets inevitably contain a fraction of human mislabelled examples. While the detrimental effects of such mislabelling on supervised learning are well-researched, their influence on Supervised Contrastive Learning (SCL) remains largely unexplored. In this paper, we show that human-labelling errors not only differ significantly from synthetic label errors, but also pose unique challenges in SCL, different to those in traditional supervised learning methods. Specifically, our results indicate they adversely impact the learning process in the ~99% of cases when they occur as false positive samples. Existing noise-mitigating methods primarily focus on synthetic label errors and tackle the unrealistic setting of very high synthetic noise rates (40-80%), but they often underperform on common image datasets due to overfitting. To address this issue, we introduce a novel SCL objective with robustness to human-labelling errors, SCL-RHE. SCL-RHE is designed to mitigate the effects of real-world mislabelled examples, typically characterized by much lower noise rates (<5%). We demonstrate that SCL-RHE consistently outperforms state-of-the-art representation learning and noise-mitigating methods across various vision benchmarks, by offering improved resilience against human-labelling errors.
CVSep 4, 2023
MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrievalZijun Long, George Killick, Richard McCreadie et al.
As Multimodal Large Language Models (MLLMs) grow in size, adapting them to specialized tasks becomes increasingly challenging due to high computational and memory demands. Indeed, traditional fine-tuning methods are costly, due to the need for extensive, task-specific training. While efficient adaptation methods exist that aim to reduce these costs, in practice they suffer from shallow inter-modal alignment, which severely hurts model effectiveness. To tackle these computational challenges and improve inter-modal alignment, we introduce the MultiWay-Adapter (MWA), a novel framework featuring an 'Alignment Enhancer'. This enhancer deepens inter-modal alignment, enabling high transferability with minimal tuning effort. Our experiments show that unlike prior efficient tuning approaches, MWA maintains model effectiveness, while reducing training time by up-to 57%. MWA is also lightweight, increasing model size by only 2-3% (in terms of parameters) for state-of-the-art foundation models like BEiT-3 Large. These results demonstrate that MWA provides an efficient and effective adaptation method for MLLMs, significantly broadening their applicability.
LGFeb 2, 2022
Identifying Suitable Tasks for Inductive Transfer Through the Analysis of Feature AttributionsAlexander Pugantsov, Richard McCreadie
Transfer learning approaches have shown to significantly improve performance on downstream tasks. However, it is common for prior works to only report where transfer learning was beneficial, ignoring the significant trial-and-error required to find effective settings for transfer. Indeed, not all task combinations lead to performance benefits, and brute-force searching rapidly becomes computationally infeasible. Hence the question arises, can we predict whether transfer between two tasks will be beneficial without actually performing the experiment? In this paper, we leverage explainability techniques to effectively predict whether task pairs will be complementary, through comparison of neural network activation between single-task models. In this way, we can avoid grid-searches over all task and hyperparameter combinations, dramatically reducing the time needed to find effective task pairs. Our results show that, through this approach, it is possible to reduce training time by up to 83.5% at a cost of only 0.034 reduction in positive-class F1 on the TREC-IS 2020-A dataset.
IRJul 26, 2020
Exploring Data Splitting Strategies for the Evaluation of Recommendation ModelsZaiqiao Meng, Richard McCreadie, Craig Macdonald et al.
Effective methodologies for evaluating recommender systems are critical, so that such systems can be compared in a sound manner. A commonly overlooked aspect of recommender system evaluation is the selection of the data splitting strategy. In this paper, we both show that there is no standard splitting strategy and that the selection of splitting strategy can have a strong impact on the ranking of recommender systems. In particular, we perform experiments comparing three common splitting strategies, examining their impact over seven state-of-the-art recommendation models for two datasets. Our results demonstrate that the splitting strategy employed is an important confounding variable that can markedly alter the ranking of state-of-the-art systems, making much of the currently published literature non-comparable, even when the same dataset and metrics are used.
IRSep 17, 2019
Variational Bayesian Context-aware Representation for Grocery RecommendationZaiqiao Meng, Richard McCreadie, Craig Macdonald et al.
Grocery recommendation is an important recommendation use-case, which aims to predict which items a user might choose to buy in the future, based on their shopping history. However, existing methods only represent each user and item by single deterministic points in a low-dimensional continuous space. In addition, most of these methods are trained by maximizing the co-occurrence likelihood with a simple Skip-gram-based formulation, which limits the expressive ability of their embeddings and the resulting recommendation performance. In this paper, we propose the Variational Bayesian Context-Aware Representation (VBCAR) model for grocery recommendation, which is a novel variational Bayesian model that learns the user and item latent vectors by leveraging basket context information from past user-item interactions. We train our VBCAR model based on the Bayesian Skip-gram framework coupled with the amortized variational inference so that it can learn more expressive latent representations that integrate both the non-linearity and Bayesian behaviour. Experiments conducted on a large real-world grocery recommendation dataset show that our proposed VBCAR model can significantly outperform existing state-of-the-art grocery recommendation methods.
IROct 13, 2016
Emergency Identification and Analysis with EAIMSRichard McCreadie, Craig Macdonald, Iadh Ounis
Social media platforms are now a key source of information for a large segment of the public. As such, these platforms have a great potential as a means to provide real-time information to emergency management agencies. Moreover, during an emergency, these agencies are very interested in social media as a means to find public-driven response efforts, as well as to track how their handling of that emergency is being perceived. However, there is currently a lack advanced tools designed for monitoring social media during emergencies. The Emergency Analysis Identification and Management System (EAIMS) is a prototype service that aims to fill this technology gap by providing richer analytic and exploration tools than current solutions. In particular, EAIMS provides real-time detection of emergency events, related information finding, information access and credibility analysis tools for use over social media during emergencies.