CV, IR, LG, OT · Nov 18, 2022

Toward a Flexible Metadata Pipeline for Fish Specimen Images

arXiv:2211.15472v1 · 2 citations · h-index: 22
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for FAIR-compliant metadata pipelines in biology, specifically for fish specimen images, but its contribution is incremental, since it builds on existing standards and methods.

The paper develops a flexible metadata pipeline for more than 300,000 digital fish specimen images to support AI research such as automated species identification. The result is a four-phase approach and an RDF graph prototype.

Flexible metadata pipelines are crucial for supporting the FAIR data principles. Despite this need, researchers seldom report their approaches for identifying metadata standards and protocols that support optimal flexibility. This paper reports on an initiative targeting the development of a flexible metadata pipeline for a collection of over 300,000 digital fish specimen images, harvested from multiple data repositories and fish collections. The images and their associated metadata are being used for AI-related scientific research involving automated species identification, segmentation, and trait extraction. The paper provides contextual background, followed by the presentation of a four-phase approach involving: 1. Assessment of the Problem, 2. Investigation of Solutions, 3. Implementation, and 4. Refinement. The work is part of the NSF Harnessing the Data Revolution, Biology Guided Neural Networks (NSF/HDR-BGNN) project and the HDR Imageomics Institute. An RDF graph prototype pipeline is presented, followed by a discussion of research implications and a conclusion summarizing the results.
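The abstract does not specify the schema of the RDF graph prototype. As a rough illustration of what an RDF representation of image metadata can look like, here is a minimal sketch that maps one flat metadata record to N-Triples. The Darwin Core and Dublin Core term URIs are real vocabularies, but it is an assumption that the BGNN pipeline uses them. The `example.org` base URI, the record field names, and the helpers `record_to_triples` and `to_ntriples` are all hypothetical.

```python
# Hypothetical sketch: maps one flat image-metadata record to RDF triples.
# Darwin Core / Dublin Core are real vocabularies; their use here, the
# base URI, and the field names are illustrative assumptions only.
DWC = "http://rs.tdwg.org/dwc/terms/"
DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/fish-image/"  # hypothetical base URI

# Hypothetical mapping from record fields to RDF predicates.
FIELD_TO_PREDICATE = {
    "scientific_name": f"<{DWC}scientificName>",
    "institution_code": f"<{DWC}institutionCode>",
    "format": f"<{DCTERMS}format>",
}


def _literal(value: str) -> str:
    """Quote a plain literal, escaping only backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_triples(record: dict) -> list[tuple[str, str, str]]:
    """Turn one record into (subject, predicate, object) triples."""
    subject = f"<{BASE}{record['image_id']}>"
    triples = []
    for field, predicate in FIELD_TO_PREDICATE.items():
        if field in record:  # skip missing fields instead of emitting empty literals
            triples.append((subject, predicate, _literal(record[field])))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples as N-Triples lines, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    record = {  # hypothetical example record
        "image_id": "img-0001",
        "scientific_name": "Notropis atherinoides",
        "format": "image/jpeg",
    }
    print(to_ntriples(record_to_triples(record)))
```

Building N-Triples as plain strings keeps the sketch dependency-free; a production pipeline would more likely use an RDF library such as rdflib, which handles full literal escaping, datatypes, and other serializations.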
