James Davenport

IR
3 papers
4 citations

3 Papers

LG Nov 16, 2020
The Influence of Domain-Based Preprocessing on Subject-Specific Clustering

Alexandra Gkolia, Nikhil Fernandes, Nicolas Pizzo et al.

The sudden move of most university teaching online during the global Covid-19 pandemic has increased the workload of academics. One contributing factor is the high volume of student queries that need answering. As these queries are not limited to the synchronous time frame of a lecture, there is a high chance that many of them are related or even equivalent. One way to deal with this problem is to cluster the questions by topic. In our previous work, we aimed to find an improved clustering method that would give high efficiency, using a recurring LDA model. Our data set contained questions posted online for a Computer Science course at the University of Bath. A significant number of these questions contained code excerpts, which caused a problem in clustering: certain terms were treated as common English words rather than recognised as code-specific terms. To address this, we implemented tagging of these technical terms using Python as part of preprocessing the data set. In this paper, we explore the tagging of data sets, focusing on identifying code excerpts, and provide empirical results to justify our reasoning.
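
The tagging step described in this abstract might look roughly like the sketch below. It is only an illustration under stated assumptions: the `CODE_` prefix, the function names, and the heuristic (a call target, a dotted name, or a snake_case identifier counts as code) are hypothetical and are not taken from the paper. The heuristic is crude and will mis-tag some plain text, such as abbreviations containing a dot.

```python
import re

TAG = "CODE_"  # hypothetical marker; the paper's actual tag format is not given here

# A token starts with a letter or underscore and may contain dots (e.g. my_list.append).
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")
# A name immediately followed by "(" is treated as a function or method call.
CALL = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\(")


def looks_like_code(token, called):
    """Crude heuristic: a call target, a dotted name, or a snake_case identifier."""
    return token in called or "." in token or "_" in token


def tag_code_terms(question):
    """Return the question's word tokens, with likely code terms prefixed by TAG."""
    called = set(CALL.findall(question))
    # Strip a trailing full stop so "list." at the end of a sentence becomes "list".
    tokens = [t.rstrip(".") for t in TOKEN.findall(question)]
    return [TAG + t if looks_like_code(t, called) else t for t in tokens if t]


print(tag_code_terms("Why does print(my_list.append(3)) return None?"))
```

Tagged tokens such as `CODE_print` no longer collide with the English word "print", so a downstream topic model can treat them as separate vocabulary items.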

SE Oct 12, 2020
Rooting Formal Methods within Higher Education Curricula for Computer Science and Software Engineering -- A White Paper

Antonio Cerone, Markus Roggenbach, James Davenport et al.

This white paper argues that formal methods need to be better rooted in higher education curricula for computer science and software engineering programmes of study. To this end, it advocates (i) improved teaching of formal methods; (ii) systematic highlighting of formal methods within existing, 'classical' computer science courses; and (iii) the inclusion of a compulsory formal methods course in computer science and software engineering curricula. These recommendations are based on the observations that (a) formal methods are an essential and cost-effective means to increase software quality; however (b) computer science and software engineering programmes typically fail to provide adequate training in formal methods; and thus (c) there is a lack of computer science graduates who are qualified to apply formal methods in industry. This white paper is the result of a collective effort by authors and participants of the 1st International Workshop on "Formal Methods, Fun for Everybody" which was held in Bergen, Norway, 2-3 December 2019. As such, it represents insights based on learning and teaching computer science and software engineering (with or without formal methods) at various universities across Europe.

IR Oct 4, 2020
Unification of HDP and LDA Models for Optimal Topic Clustering of Subject Specific Question Banks

Nikhil Fernandes, Alexandra Gkolia, Nicolas Pizzo et al.

There has been an increasingly popular trend in universities for curriculum transformation to make teaching more interactive and suitable for online courses. An increase in the popularity of online courses would result in an increase in the number of course-related queries for academics. In addition, if lectures were delivered in a video-on-demand format, there would be no fixed time at which the majority of students could ask questions. When questions are asked in a lecture there is a negligible chance of the same question being repeated, but asynchronously this is more likely. To reduce the time spent answering each individual question, clustering them is an ideal choice. Several unsupervised models are suited to text clustering, of which the Latent Dirichlet Allocation (LDA) model is the most commonly used. We use the Hierarchical Dirichlet Process (HDP) to determine an optimal topic number as input for our LDA model runs. Due to the probabilistic nature of these topic models, their outputs vary between runs. The general trend we found is that not all the topics were used for clustering on the first run of the LDA model, which results in less effective clustering. To tackle the probabilistic output, we recursively apply the LDA model to the effective topics in use until we obtain an efficiency ratio of 1. Through our experimental results we also establish reasoning for how Zeno's paradox is avoided.
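
One possible reading of the rerun loop in this abstract is sketched below. It rests on assumptions not confirmed by the abstract: the efficiency ratio is taken to be the number of topics that end up as the dominant topic of at least one question, divided by the number of topics requested; `fit_topics` is a hypothetical stand-in for an LDA run; and `initial_topics` stands for the topic number that HDP would supply. The `toy_fit` function is a deterministic placeholder, not a topic model, and exists only so the loop can be run.

```python
def efficiency_ratio(assignments, num_topics):
    """Assumed definition: topics actually used / topics requested."""
    return len(set(assignments)) / num_topics


def rerun_until_efficient(docs, initial_topics, fit_topics):
    """Rerun the model with the number of topics in use until every topic is used.

    fit_topics(docs, k) is assumed to return one dominant-topic id in range(k)
    per document. docs must be non-empty.
    """
    k = initial_topics
    while True:
        assignments = fit_topics(docs, k)
        if efficiency_ratio(assignments, k) == 1.0:
            return k, assignments
        # Fewer topics were used than requested, so k shrinks by at least one.
        # With k == 1 the ratio is always 1, so the loop cannot run forever.
        k = len(set(assignments))


def toy_fit(docs, k):
    """Placeholder for an LDA run: buckets each question by its word count."""
    return [len(d.split()) % k for d in docs]


docs = [
    "Explain recursion",
    "What is recursion",
    "How do loops work",
    "Why does my loop stop",
]
print(rerun_until_efficient(docs, 5, toy_fit))
```

Because the topic count is a positive integer that strictly decreases on every rerun, this version of the loop stops after finitely many steps. Whether that matches the paper's own argument about Zeno's paradox is not stated in the abstract.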