Phillip Lord

8 Papers (98 citations)

CE · Aug 10, 2012
An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

Michael J. Bell, Colin S. Gillespie, Daniel Swan et al.

Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time-consuming and requires expert knowledge and training. Given these issues and the exponential growth in data, many databases implement automated annotation pipelines in an attempt to avoid unannotated entries. Both manual and automated annotations vary in quality between databases and annotators, making it difficult for users to assess annotation reliability. The community lacks a generic measure for determining annotation quality and correctness, which we begin to address in this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach, since it allows us to compare annotation change both over time and between automated and manually curated annotations. Results: By fitting power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality in free-text English. Further, we show a clear distinction between manual and automated annotation and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality. Availability: Source code is available at the authors' website: http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation. Contact: phillip.lord@newcastle.ac.uk
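As an illustration of what "fitting a power law to word reuse" can mean, the sketch below counts how often each word occurs across a set of annotation strings and then estimates a power-law exponent for those counts. It is not the authors' pipeline (their code is at the URL above). The tokeniser regex, the choice `x_min=1` and the use of the approximate discrete maximum-likelihood estimator described by Clauset, Shalizi and Newman, alpha ≈ 1 + n / Σ ln(x_i / (x_min − 0.5)), are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def word_counts(annotations):
    """Count how often each (lower-cased, alphanumeric) word is reused across annotations."""
    counter = Counter()
    for text in annotations:
        counter.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counter


def fit_power_law_alpha(values, x_min=1):
    """Approximate discrete power-law MLE for the exponent, using only values >= x_min."""
    tail = [v for v in values if v >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    # Each term is positive because v >= x_min > x_min - 0.5.
    denom = sum(math.log(v / (x_min - 0.5)) for v in tail)
    return 1.0 + len(tail) / denom


# Hypothetical annotation snippets, not real UniProtKB records.
annotations = ["Binds ATP.", "Binds ATP and GTP.", "Belongs to the kinase family."]
counts = word_counts(annotations)
alpha = fit_power_law_alpha(counts.values())
print(counts.most_common(2), round(alpha, 3))
```

Comparing such an exponent across database releases, or between manually curated and automatically annotated subsets, is the kind of comparison the abstract describes; a real analysis would also need to choose `x_min` and test goodness of fit rather than fixing both by hand.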

AI · Nov 20, 2017
Facets, Tiers and Gems: Ontology Patterns for Hypernormalisation

Phillip Lord, Robert Stevens

There are many methodologies and techniques for easing the task of ontology building. Here we describe the intersection of two of these: ontology normalisation and fully programmatic ontology development. The first describes a standardised organisation for an ontology, with singly inherited self-standing entities and a number of small taxonomies of refining entities. The self-standing entities are described and defined in terms of the refining entities, which are used to manage their polyhierarchy. Fully programmatic development is a technique in which an ontology is developed using a domain-specific language embedded in a programming language, meaning that, as well as defining ontological entities, it is possible to add arbitrary patterns or new syntax within the same environment. We describe how new patterns can be used to enable a new style of ontology development that we call hypernormalisation.
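One simple reading of the pattern described above is that, once the refining entities are grouped into facets, a program can generate a defined class for every combination of facet values and leave the reasoner to infer the polyhierarchy. The Python sketch below emits such definitions as OWL Manchester-syntax text. It does not use Tawny-OWL's API, and the base class `AminoAcid`, the facet names and values, and the `has<Facet>` property naming convention are all hypothetical.

```python
from itertools import product


def hypernormalise(base, facets):
    """Emit a Manchester-syntax defined class for every combination of facet values.

    base:   name of the self-standing parent class (hypothetical example)
    facets: mapping from facet name to its refining values
    """
    names = list(facets)
    frames = []
    for combo in product(*(facets[name] for name in names)):
        cls = "".join(combo) + base
        # One existential restriction per facet, using a hypothetical has<Facet> property.
        restrictions = " and ".join(
            f"(has{facet} some {value})" for facet, value in zip(names, combo)
        )
        frames.append(f"Class: {cls}\n    EquivalentTo: {base} and {restrictions}")
    return frames


facets = {"Charge": ["Positive", "Neutral", "Negative"], "Size": ["Small", "Large"]}
for frame in hypernormalise("AminoAcid", facets):
    print(frame)
```

Because every generated class is a definition rather than an asserted subclass, adding a new facet value only changes the input mapping; the classes and their placement in the hierarchy follow from regeneration and reasoning.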

IR · Sep 27, 2017
A Literature Based Approach to Define the Scope of Biomedical Ontologies: A Case Study on a Rehabilitation Therapy Ontology

Mohammad K. Halawani, Rob Forsyth, Phillip Lord

In this article, we describe our early attempts at building an ontology describing rehabilitation therapies following brain injury. These therapies are wide-ranging, involving interventions of many different kinds, and as a result they are hard to describe. As well as restricting actual practice, this is a major impediment to evidence-based medicine, as it is hard to meaningfully compare two treatment plans. Ontology development requires significant effort from both ontologists and domain experts. Knowledge elicited from domain experts forms the scope of the ontology. The process of knowledge elicitation is expensive, consumes experts' time and may be biased by the selection of experts. Various methodologies and techniques exist for enabling this knowledge elicitation, including community groups and open development practices. A related problem is that of defining scope. By defining the scope, we can decide whether a concept (i.e. term) should be represented in the ontology. This is the opposite of knowledge elicitation, in the sense that it defines what should not be in the ontology. This can be addressed by pre-defining a set of competency questions. These approaches are, however, expensive and time-consuming. Here, we describe our work toward an alternative approach: bootstrapping the ontology from an initially small corpus of literature that defines its scope, expanding this to a set covering the domain, then using information extraction to define an initial terminology that provides the basis and the competencies for the ontology. We discuss four approaches to building a suitable corpus that is both sufficiently covering and precise.
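The two properties the corpus must balance, covering the domain and staying precise, can be made concrete as coverage (recall) and precision against a hand-checked reference set. The sketch below assumes such a reference set of relevant paper identifiers exists; the identifiers are hypothetical, and this is one possible way to compare candidate corpora, not necessarily the evaluation used in the paper.

```python
def corpus_quality(retrieved, relevant):
    """Return (precision, coverage) of a candidate corpus against a reference set.

    precision: fraction of retrieved papers that are relevant
    coverage:  fraction of relevant papers that were retrieved (recall)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    coverage = len(hits) / len(relevant) if relevant else 0.0
    return precision, coverage


# Hypothetical paper identifiers for one candidate corpus-building approach.
retrieved = ["paper-1", "paper-2", "paper-3", "paper-4"]
relevant = ["paper-2", "paper-3", "paper-5"]
print(corpus_quality(retrieved, relevant))
```

Running the same function over the corpora produced by each candidate approach gives a simple side-by-side comparison of the coverage/precision trade-off.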

AI · Sep 26, 2017
User and Developer Interaction with Editable and Readable Ontologies

Aisha Blfgeh, Phillip Lord

The process of building ontologies is a difficult task that involves collaboration between ontology developers and domain experts and requires ongoing interaction between them. This collaboration is made more difficult because they tend to use different tool sets, which can hamper this interaction. In this paper, we propose to decrease the distance between domain experts and ontology developers by creating more readable forms of ontologies, and further to enable editing in normal office environments. Building on a programmatic ontology development environment, such as Tawny-OWL, we are now able to generate these readable and editable forms from the raw ontological source and its embedded comments. We have implemented this translation to HTML for reading; this environment provides rich hyperlinking as well as active features such as hiding the source code in favour of comments. We are now working on translation to a Word document that also enables editing. Taken together, this should provide a significant new route for collaboration between the ontologist and the domain specialist.
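The source-to-HTML idea can be sketched in a few lines, assuming comments are written as Clojure-style `;;` lines interleaved with the ontology source. The Python function below renders comment lines as prose paragraphs and folds each run of code into a collapsible `<details>` element, so a reader sees the comments first and can expand the source on demand. The comment convention, the HTML structure and the example input are assumptions; the hyperlinking described above is omitted.

```python
import html


def source_to_html(source):
    """Render ';;' comment lines as <p> prose and wrap runs of code in collapsible <details>."""
    parts, code = [], []

    def flush_code():
        # Emit any pending code lines as a single collapsed block.
        if code:
            parts.append(
                "<details><summary>source</summary><pre>"
                + html.escape("\n".join(code))
                + "</pre></details>"
            )
            code.clear()

    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(";;"):
            flush_code()
            parts.append("<p>" + html.escape(stripped.lstrip(";").strip()) + "</p>")
        elif stripped:
            code.append(line)  # keep original indentation inside <pre>
    flush_code()
    return "\n".join(parts)


# Hypothetical Tawny-OWL-style input.
example = ";; A protein is a macromolecule.\n(defclass Protein\n  :super Macromolecule)"
print(source_to_html(example))
```

Escaping both prose and code with `html.escape` keeps characters such as `<` in comments or class expressions from being interpreted as markup.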

AI · May 15, 2015
How, What and Why to test an ontology

Jennifer D. Warrender, Phillip Lord

Ontology development relates to software development in that both involve the production of formal computational knowledge. It is possible, therefore, that some of the techniques used in software engineering could also be used for ontologies; for example, testing is a well-established process in software engineering and part of many different methodologies. The application of testing to ontologies, therefore, seems attractive. The Karyotype Ontology is developed using the novel Tawny-OWL library, which provides a fully programmatic environment for ontology development, including a complete test harness. In this paper, we describe how we have used this harness to build an extensive series of tests, and how we have used a commodity continuous integration system to link testing deeply into our development process; this environment is applicable to any OWL ontology, whether written using Tawny-OWL or not. Moreover, we present a novel analysis of our tests, introducing a new classification of the different kinds of test we use. For each class of test, we describe why we use it, drawing comparisons with software tests. We believe that this systematic comparison between ontology and software development will help us move to a more agile form of ontology development.
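To show the general shape of an automated ontology test, the sketch below uses a toy stand-in for a reasoner: a transitive closure over asserted superclasses. It checks two properties that a continuous integration job could run on every commit: the hierarchy contains no cycles, and a list of expected subsumptions holds. The class names and the dictionary representation are hypothetical; a real harness such as Tawny-OWL's would query an OWL reasoner instead.

```python
def ancestors(hierarchy, cls):
    """All transitive superclasses of cls; a toy stand-in for reasoner-based subsumption."""
    seen, stack = set(), list(hierarchy.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(hierarchy.get(parent, ()))
    return seen


def check_ontology(hierarchy, expected):
    """Return a list of failure messages: cycles and missing expected subsumptions."""
    failures = []
    for cls in hierarchy:
        if cls in ancestors(hierarchy, cls):
            failures.append(f"cycle through {cls}")
    for sub, sup in expected:
        if sup not in ancestors(hierarchy, sub):
            failures.append(f"{sub} is not a subclass of {sup}")
    return failures


# Hypothetical asserted hierarchy: class -> list of direct superclasses.
hierarchy = {"HumanKaryotype": ["Karyotype"], "Karyotype": ["Entity"]}
print(check_ontology(hierarchy, [("HumanKaryotype", "Entity")]))
```

Returning a list of failures, rather than stopping at the first one, mirrors how test runners report every broken expectation in a single build.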

AI · Sep 23, 2013
An evolutionary approach to Function

Phillip Lord

Background: Understanding the distinction between function and role is vexing and difficult. While the distinction appears to be useful, in practice it is hard to apply, particularly within biology. Results: I take an evolutionary approach, considering a series of examples, to develop definitions for these concepts. I test them in practice against the Ontology for Biomedical Investigations (OBI). Finally, I give an axiomatisation and discuss methods for applying these definitions in practice. Conclusions: The definitions in this paper are applicable and formalise current practice. As such, they make a significant contribution to the use of these concepts within biomedical ontologies.

CL · Aug 21, 2013
Can inferred provenance and its visualisation be used to detect erroneous annotation? A case study using UniProtKB

Michael J. Bell, Matthew Collison, Phillip Lord

A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which is often the richest source of knowledge. Many databases reuse existing knowledge: during the curation process, annotations are often propagated between entries. However, this is often not made explicit. Therefore, it can be hard, or even impossible, for a reader to identify where an annotation originated. Within this work we attempt to identify annotation provenance and track its subsequent propagation. Specifically, we exploit annotation reuse within the UniProt Knowledgebase (UniProtKB) at the level of individual sentences. We describe a visualisation approach for the provenance and propagation of sentences in UniProtKB, which enables a large-scale statistical analysis. Initially, levels of sentence reuse within UniProtKB were analysed, showing that reuse is highly prevalent, which enables the tracking of provenance and propagation. By analysing sentences throughout UniProtKB, a number of interesting propagation patterns were identified, covering over 100,000 sentences. Over 8,000 sentences remain in the database after they have been removed from the entries where they originally occurred. Analysing a subset of these sentences suggests that approximately 30% are erroneous, whilst 35% appear to be inconsistent. These results suggest that being able to visualise sentence propagation and provenance can aid in determining the accuracy and quality of textual annotation. Source code and supplementary data are available from the authors' website.
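Sentence-level provenance of the kind described above can be sketched as follows, assuming each database release is available as a mapping from entry identifier to annotation text, ordered oldest first. The sketch records the release and entry where each sentence first appears, then flags sentences that survive in the latest release only in entries other than their origin, the situation the abstract reports for over 8,000 sentences. The naive regex sentence splitter, the alphabetical tie-break between entries in the same release and the example records are all assumptions, not the authors' method.

```python
import re
from collections import defaultdict


def sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def trace_provenance(releases):
    """releases: list of {entry_id: annotation_text}, oldest first.

    Returns (origin, orphaned): origin maps each sentence to the (release index,
    entry id) where it was first seen; orphaned holds sentences still present in
    the latest release, but no longer in their entry of origin.
    """
    origin = {}
    for i, release in enumerate(releases):
        for entry in sorted(release):  # deterministic tie-break within a release
            for s in sentences(release[entry]):
                origin.setdefault(s, (i, entry))
    holders = defaultdict(set)
    for entry, text in releases[-1].items():
        for s in sentences(text):
            holders[s].add(entry)
    orphaned = {s for s, (_, entry) in origin.items()
                if holders[s] and entry not in holders[s]}
    return origin, orphaned


# Hypothetical releases: a sentence is copied from P1 to P2, then removed from P1.
releases = [
    {"P1": "Binds ATP. Involved in DNA repair."},
    {"P1": "Binds ATP. Involved in DNA repair.", "P2": "Involved in DNA repair."},
    {"P1": "Binds ATP.", "P2": "Involved in DNA repair."},
]
origin, orphaned = trace_provenance(releases)
print(orphaned)
```

Orphaned sentences are natural candidates for manual review, since their original supporting context has been withdrawn while the propagated copies remain.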

AI · Mar 1, 2013
The Semantic Web takes Wing: Programming Ontologies with Tawny-OWL

Phillip Lord

The Tawny-OWL library provides a fully programmatic environment for ontology building; it enables the use of a rich set of tools for ontology development by recasting development as a form of programming. It is built in Clojure, a modern Lisp dialect, and is backed by the OWL API. Used simply, its syntax is similar to OWL Manchester syntax, but it provides arbitrary extensibility and abstraction. It builds on existing facilities for Clojure, which provide a rich and modern programming tool chain for versioning, distributed development, build, testing and continuous integration. In this paper, we describe the library, this environment and its potential implications for the ontology development process.
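The benefit of abstraction in a programmatic environment is that a recurring modelling pattern can become an ordinary function. The sketch below illustrates that idea in Python rather than Clojure: small helpers render Manchester-syntax restrictions, and a user-defined `some_only` pattern combines them into a set of existential restrictions plus a closing universal restriction. None of these names are Tawny-OWL's API; the helpers, the pizza example and the output format are hypothetical.

```python
def some(prop, filler):
    """Existential restriction in Manchester syntax."""
    return f"({prop} some {filler})"


def only(prop, filler):
    """Universal restriction in Manchester syntax."""
    return f"({prop} only {filler})"


def some_only(prop, *fillers):
    """User-defined pattern: one existential per filler plus a closing universal."""
    union = fillers[0] if len(fillers) == 1 else "(" + " or ".join(fillers) + ")"
    return [some(prop, f) for f in fillers] + [only(prop, union)]


def owl_class(name, *, subclass_of=()):
    """Render a Manchester-syntax class frame with one SubClassOf section per expression."""
    lines = [f"Class: {name}"]
    lines += [f"    SubClassOf: {expr}" for expr in subclass_of]
    return "\n".join(lines)


# Hypothetical usage: the pattern expands into four restrictions.
print(owl_class("MargheritaPizza",
                subclass_of=["Pizza", *some_only("hasTopping", "Tomato", "Mozzarella")]))
```

Once the pattern is a function, fixing or extending it in one place updates every class that uses it, which is the kind of extensibility a purely declarative syntax does not offer directly.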