COPO: a metadata platform for brokering FAIR data in the life sciences
Scientific innovation is increasingly reliant on data and computational resources. Much of today’s
life science research involves generating, processing, and reusing heterogeneous datasets that
are growing exponentially in size. Demand for technical experts (data scientists and
bioinformaticians) to process these data is at an all-time high, but these experts are not typically trained in
good data management practices. That said, we have come a long way in the last decade, with
funders, publishers, and researchers themselves making the case for open, interoperable data as
a key component of an open science philosophy. In response, recognition of the FAIR Principles
(that data should be Findable, Accessible, Interoperable and Reusable) has become
commonplace. However, both technical and cultural challenges for the implementation of these
principles still exist when storing, managing, analysing and disseminating both legacy and new
data.
COPO is a computational system that attempts to address some of these challenges by enabling
scientists to describe their research objects (raw or processed data, publications, samples, images,
etc.) using community-sanctioned metadata sets and vocabularies, and then use public or
institutional repositories to share them with the wider scientific community. COPO encourages data
generators to adhere to appropriate metadata standards when publishing research objects, using
semantic terms to add meaning to them and specify relationships between them. This allows data
consumers, be they people or machines, to find, aggregate, and analyse data which would
otherwise be private or invisible. COPO builds upon existing standards to push the state of the art in
scientific data dissemination whilst minimising the burden of data publication and sharing.
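To give a flavour of this style of description, the sketch below shows a hypothetical research object annotated with a community metadata standard, ontology terms, and relationships to other objects before being brokered to a public repository. The field names, schema, standard, ontology identifiers, and repository target are illustrative assumptions for this example, not COPO's actual data model.

```python
# Minimal sketch (illustrative only, not COPO's actual schema): a research object
# described with community-sanctioned metadata fields and semantic terms, ready
# for deposition in a public repository.
import json

research_object = {
    "title": "RNA-seq of Triticum aestivum under drought stress",
    "type": "raw data",                  # kind of research object (data, sample, image, ...)
    "metadata_standard": "MIxS",         # assumed community metadata standard
    "attributes": {
        # ontology terms add machine-readable meaning to free-text labels
        "organism": {"label": "Triticum aestivum", "term": "NCBITaxon:4565"},
        "tissue":   {"label": "leaf",              "term": "PO:0025034"},
    },
    "relationships": [
        # explicit links between research objects (e.g. data derived from a sample)
        {"type": "derived_from", "target": "sample/XYZ-001"},
    ],
    "target_repository": "ENA",          # hypothetical public archive for deposition
}

print(json.dumps(research_object, indent=2))
```

Because the description is structured and uses shared vocabularies, a data consumer (human or machine) can query across many such records, for example retrieving all leaf-derived datasets for a given species, without needing to contact the original data generator.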