Pádraig Cunningham

LG
15papers
1,062citations
Novelty37%
AI Score28

15 Papers

LGAug 23, 2024
A Comparison of Deep Learning and Established Methods for Calf Behaviour Monitoring

Oshana Dissanayake, Lucile Riaboff, Sarah E. McPherson et al.

In recent years, there has been considerable progress in research on human activity recognition using data from wearable sensors. This technology also has potential in the context of animal welfare in livestock science. In this paper, we report on research on animal activity recognition in support of welfare monitoring. The data comes from collar-mounted accelerometer sensors worn by Holstein and Jersey calves, the objective being to detect changes in behaviour indicating sickness or stress. A key requirement in detecting changes in behaviour is to be able to classify activities into classes, such as drinking, running or walking. In Machine Learning terms, this is a time-series classification task, and in recent years, the Rocket family of methods have emerged as the state-of-the-art in this area. We have over 27 hours of labelled time-series data from 30 calves for our analysis. Using this data as a baseline, we present Rocket's performance on a 6-class classification task. Then, we compare this against the performance of 11 Deep Learning (DL) methods that have been proposed as promising methods for time-series classification. Given the success of DL in related areas, it is reasonable to expect that these methods will perform well here as well. Surprisingly, despite taking care to ensure that the DL methods are configured correctly, none of them match Rocket's performance. A possible explanation for the impressive success of Rocket is that it has the data encoding benefits of DL models in a much simpler classification framework.

SPJun 25, 2024
Development of a digital tool for monitoring the behaviour of pre-weaned calves using accelerometer neck-collars

Oshana Dissanayake, Sarah E. Mcpherson, Joseph Allyndrée et al.

Automatic monitoring of calf behaviour is a promising way of assessing animal welfare from their first week on farms. This study aims to (i) develop machine learning models from accelerometer data to classify the main behaviours of pre-weaned calves and (ii) set up a digital tool for monitoring the behaviour of pre-weaned calves from the models' prediction. Thirty pre-weaned calves were equipped with a 3-D accelerometer attached to a neck-collar for two months and filmed simultaneously. The behaviours were annotated, resulting in 27.4 hours of observation aligned with the accelerometer data. The time-series were then split into 3 seconds windows. Two machine learning models were tuned using data from 80% of the calves: (i) a Random Forest model to classify between active and inactive behaviours using a set of 11 hand-craft features [model 1] and (ii) a RidgeClassifierCV model to classify between lying, running, drinking milk and other behaviours using ROCKET features [model 2]. The performance of the models was tested using data from the remaining 20% of the calves. Model 1 achieved a balanced accuracy of 0.92. Model 2 achieved a balanced accuracy of 0.84. Behavioural metrics such as daily activity ratio and episodes of running, lying, drinking milk, and other behaviours expressed over time were deduced from the predictions. All the development was finally embedded into a Python dashboard so that the individual calf metrics could be displayed directly from the raw accelerometer files.

LGJul 19, 2021
Introducing a Family of Synthetic Datasets for Research on Bias in Machine Learning

William Blanzeisky, Pádraig Cunningham, Kenneth Kennedy

A significant impediment to progress in research on bias in machine learning (ML) is the availability of relevant datasets. This situation is unlikely to change much given the sensitivity of such data. For this reason, there is a role for synthetic data in this research. In this short paper, we present one such family of synthetic data sets. We provide an overview of the data, describe how the level of bias can be varied, and present a simple example of an experiment on the data.

CLJun 29, 2021
Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis

Linyi Yang, Jiazheng Li, Pádraig Cunningham et al.

While state-of-the-art NLP models have been achieving the excellent performance of a wide range of tasks in recent years, important questions are being raised about their robustness and their underlying sensitivity to systematic biases that may exist in their training and test data. Such issues come to be manifest in performance problems when faced with out-of-distribution data in the field. One recent solution has been to use counterfactually augmented datasets in order to reduce any reliance on spurious patterns that may exist in the original data. Producing high-quality augmented data can be costly and time-consuming as it usually needs to involve human feedback and crowdsourcing efforts. In this work, we propose an alternative by describing and evaluating an approach to automatically generating counterfactual data for data augmentation and explanation. A comprehensive evaluation on several different datasets and using a variety of state-of-the-art benchmarks demonstrate how our approach can achieve significant improvements in model performance when compared to models training on the original data and even when compared to models trained with the benefit of human-generated augmented data.

LGMay 31, 2021
Using Pareto Simulated Annealing to Address Algorithmic Bias in Machine Learning

William Blanzeisky, Pádraig Cunningham

Algorithmic Bias can be due to bias in the training data or issues with the algorithm itself. These algorithmic issues typically relate to problems with model capacity and regularisation. This underestimation bias may arise because the model has been optimised for good generalisation accuracy without any explicit consideration of bias or fairness. In a sense, we should not be surprised that a model might be biased when it hasn't been "asked" not to be. In this paper, we consider including bias (underestimation) as an additional criterion in model training. We present a multi-objective optimisation strategy using Pareto Simulated Annealing that optimise for both balanced accuracy and underestimation. We demonstrate the effectiveness of this strategy on one synthetic and two real-world datasets.

LGApr 28, 2021
Algorithmic Factors Influencing Bias in Machine Learning

William Blanzeisky, Pádraig Cunningham

It is fair to say that many of the prominent examples of bias in Machine Learning (ML) arise from bias that is there in the training data. In fact, some would argue that supervised ML algorithms cannot be biased, they reflect the data on which they are trained. In this paper we demonstrate how ML algorithms can misrepresent the training data through underestimation. We show how irreducible error, regularization and feature and class imbalance can contribute to this underestimation. The paper concludes with a demonstration of how the careful management of synthetic counterfactuals can ameliorate the impact of this underestimation bias.

LGOct 11, 2020
A Case-Study on the Impact of Dynamic Time Warping in Time Series Regression

Vivek Mahato, Pádraig Cunningham

It is well understood that Dynamic Time Warping (DTW) is effective in revealing similarities between time series that do not align perfectly. In this paper, we illustrate this on spectroscopy time-series data. We show that DTW is effective in improving accuracy on a regression task when only a single wavelength is considered. When combined with k-Nearest Neighbour, DTW has the added advantage that it can reveal similarities and differences between samples at the level of the time-series. However, in the problem, we consider here data is available across a spectrum of wavelengths. If aggregate statistics (means, variances) are used across many wavelengths the benefits of DTW are no longer apparent. We present this as another example of a situation where big data trumps sophisticated models in Machine Learning.

CYJan 12, 2016
Indicators of Good Student Performance in Moodle Activity Data

Ewa Młynarska, Derek Greene, Pádraig Cunningham

In this paper we conduct an analysis of Moodle activity data focused on identifying early predictors of good student performance. The analysis shows that three relevant hypotheses are largely supported by the data. These hypotheses are: early submission is a good sign, a high level of activity is predictive of good results and evening activity is even better than daytime activity. We highlight some pathological examples where high levels of activity correlates with bad results.

IRNov 3, 2014
TextLuas: Tracking and Visualizing Document and Term Clusters in Dynamic Text Data

Derek Greene, Daniel Archambault, Václav Belák et al.

For large volumes of text data collected over time, a key knowledge discovery task is identifying and tracking clusters. These clusters may correspond to emerging themes, popular topics, or breaking news stories in a corpus. Therefore, recently there has been increased interest in the problem of clustering dynamic data. However, there exists little support for the interactive exploration of the output of these analysis techniques, particularly in cases where researchers wish to simultaneously explore both the change in cluster structure over time and the change in the textual content associated with clusters. In this paper, we propose a model for tracking dynamic clusters characterized by the evolutionary events of each cluster. Motivated by this model, the TextLuas system provides an implementation for tracking these dynamic clusters and visualizing their evolution using a metro map metaphor. To provide overviews of cluster content, we adapt the tag cloud representation to the dynamic clustering scenario. We demonstrate the TextLuas system on two different text corpora, where they are shown to elucidate the evolution of key themes. We also describe how TextLuas was applied to a problem in bibliographic network research.

SIJul 29, 2014
A Latent Space Analysis of Editor Lifecycles in Wikipedia

Xiangju Qin, Derek Greene, Pádraig Cunningham

Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time there is concern that these collaborations are threatened by high levels of member turnover. In this paper we borrow ideas from topic analysis to editor activity on Wikipedia over time into a latent space that offers an insight into the evolving patterns of editor behavior. This latent space representation reveals a number of different categories of editor (e.g. content experts, social networkers) and we show that it does provide a signal that predicts an editor's departure from the community. We also show that long term editors gradually diversify their participation by shifting edit preference from one or two namespaces to multiple namespaces and experience relatively soft evolution in their editor profiles, while short term editors generally distribute their contribution randomly among the namespaces and experience considerably fluctuated evolution in their editor profiles.

LGApr 16, 2014
How Many Topics? Stability Analysis for Topic Models

Derek Greene, Derek O'Callaghan, Pádraig Cunningham

Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the "over-clustering" of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.

IRMar 12, 2014
Adaptive Representations for Tracking Breaking News on Twitter

Igor Brigadir, Derek Greene, Pádraig Cunningham

Twitter is often the most up-to-date source for finding and tracking breaking news stories. Therefore, there is considerable interest in developing filters for tweet streams in order to track and summarize stories. This is a non-trivial text analytics task as tweets are short, and standard retrieval methods often fail as stories evolve over time. In this paper we examine the effectiveness of adaptive mechanisms for tracking and summarizing breaking news stories. We evaluate the effectiveness of these mechanisms on a number of recent news events for which manually curated timelines are available. Assessments based on ROUGE metrics indicate that an adaptive approaches are best suited for tracking evolving stories on Twitter.

SIAug 28, 2013
Using tf-idf as an edge weighting scheme in user-object bipartite networks

Sorin Alupoaie, Pádraig Cunningham

Bipartite user-object networks are becoming increasingly popular in representing user interaction data in a web or e-commerce environment. They have certain characteristics and challenges that differentiates them from other bipartite networks. This paper analyzes the properties of five real world user-object networks. In all cases we found a heavy tail object degree distribution with popular objects connecting together a large part of the users causing significant edge inflation in the projected users network. We propose a novel edge weighting strategy based on tf-idf and show that the new scheme improves both the density and the quality of the community structure in the projections. The improvement is also noticed when comparing to partially random networks.

SIApr 15, 2013
Link Prediction with Social Vector Clocks

Conrad Lee, Bobo Nick, Ulrik Brandes et al.

State-of-the-art link prediction utilizes combinations of complex features derived from network panel data. We here show that computationally less expensive features can achieve the same performance in the common scenario in which the data is available as a sequence of interactions. Our features are based on social vector clocks, an adaptation of the vector-clock concept introduced in distributed computing to social interaction networks. In fact, our experiments suggest that by taking into account the order and spacing of interactions, social vector clocks exploit different aspects of link formation so that their combination with previous approaches yields the most accurate predictor to date.

SIJun 8, 2012
Aggregating Content and Network Information to Curate Twitter User Lists

Derek Greene, Gavin Sheridan, Barry Smyth et al.

Twitter introduced user lists in late 2009, allowing users to be grouped according to meaningful topics or themes. Lists have since been adopted by media outlets as a means of organising content around news stories. Thus the curation of these lists is important - they should contain the key information gatekeepers and present a balanced perspective on a story. Here we address this list curation process from a recommender systems perspective. We propose a variety of criteria for generating user list recommendations, based on content analysis, network analysis, and the "crowdsourcing" of existing user lists. We demonstrate that these types of criteria are often only successful for datasets with certain characteristics. To resolve this issue, we propose the aggregation of these different "views" of a news story on Twitter to produce more accurate user recommendations to support the curation process.