ERMrest: an entity-relationship data storage service for web-based, data-oriented collaboration
ERMrest addresses the problem of data integration and sharing for scientists engaged in collaborative research, though the contribution appears incremental because it builds on existing relational and web-based approaches.
The authors tackle the challenge of managing complex, evolving, relationally structured scientific data in collaborative environments by introducing ERMrest, a service that supports entity-relationship modeling of metadata through RESTful access. ERMrest has been deployed to hundreds of users across multiple projects.
Scientific discovery is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. While the details vary from domain to domain, these data often consist of diverse digital assets (e.g., image files, sequence data, or simulation outputs) that are organized with complex relationships and context, both of which may evolve over the course of an investigation. In addition, discovery is often collaborative, such that sharing of the data and its organizational context is highly desirable. Common systems for managing file or asset metadata hide their inherent relational structures, while traditional relational database systems do not extend to the distributed collaborative environment often seen in scientific investigations. To address these issues, we introduce ERMrest, a collaborative data management service that allows general entity-relationship modeling of metadata, which is manipulated through RESTful access methods. We present the design criteria, architecture, and service implementation, and describe an ecosystem of tools and services that we have created to integrate metadata into an end-to-end scientific data life cycle. ERMrest has been deployed to hundreds of users across multiple scientific research communities and projects. We present two representative use cases: an international consortium and an early-phase, multidisciplinary research project.