Dryad2Dataverse project

Connecting two vital sources of research data

Overview

The Dryad2Dataverse project consists of two components — a specialized Python software library and an application based on the library. Sitting between Dryad repository software and Dataverse repository software, it can:

  • Translate metadata from Dryad to Dataverse
  • Transfer Dryad information to Dataverse
  • Monitor changes over time

Benefits

  • Dryad2Dataverse helps consolidate UBC’s research-data holdings on the Scholars Portal Dataverse platform (https://dataverse.scholarsportal.info/dataverse/ubc), where they can be digitally stewarded by UBC Library. These efforts help ensure that UBC manages its own research output and prepares it for digital preservation.
  • As a secondary benefit, UBC’s research data becomes less scattered and therefore easier to find in a single location. By integrating Dryad holdings with UBC’s research-data collection at Scholars Portal, the data also becomes discoverable through the FRDR (Federated Research Data Repository) discovery service and through Geodisy, a spatially aware data search tool also developed by the Research Commons.
  • Tertiary benefits include reuse by other individuals and institutions who wish to use Dataverse, as Dryad2Dataverse is designed to be modular and simple to use.

The project is open source, released under the permissive MIT licence, and is freely available for download at https://github.com/ubc-library-rc/dryad2dataverse.

Data platforms and their non-connectedness

Publishers, organizations, institutions, and others have repositories which are not necessarily connected. Researchers use different repositories based on disciplines or publishing requirements. This makes finding the research output of a university such as UBC more difficult than it could be.

Currently, UBC has much of its research data hosted at a repository curated by UBC Library — the Scholars Portal Dataverse (https://dataverse.scholarsportal.info/ubc), which is powered by the eponymous Dataverse software. This is the preferred UBC repository.

Dryad (https://datadryad.org) also hosts a large collection of data created by UBC researchers: 566 datasets as of 12 March 2021. Most of the data hosted in Dryad comes from the life sciences. UBC recently became the first Canadian institutional member of Dryad, which highlights the platform’s importance to researchers. Like Dataverse, Dryad is a platform powered by software of the same name.

All material deposited into Dryad is required to have a Creative Commons Zero (CC0) licence. This means that the data can be harvested and transferred without legal restriction.

Dryad software and Dataverse software do not communicate with one another by default. As a result, the work of researchers who prefer Dryad is not represented in UBC’s Scholars Portal Dataverse repository, and UBC’s data collection is awkwardly split.

The Dryad2Dataverse project attempts to solve this problem by sitting between the two pieces of software (or web applications): it takes material from Dryad and places it into Dataverse, without any effort on the part of the researcher.

At its most basic, it can be reduced to one line, where the middle did not previously exist:

Dryad -> Dryad2Dataverse -> Dataverse

Technical overview

The Dryad2Dataverse project works with the Dryad and Dataverse software, not necessarily the platforms of the same name. If platform A runs Dryad software and platform B runs Dataverse software, Dryad2Dataverse fills the gap between the two.

The project comes as two separate but related components. The first is a Python library called Dryad2Dataverse, which allows users with Python programming expertise to translate metadata, transfer data, and monitor changes without extensive knowledge of either Dryad’s or Dataverse’s data formats or application programming interfaces (APIs).
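To give a sense of what metadata translation involves, here is a minimal sketch in Python. It is not the Dryad2Dataverse library’s own API: the function names `dryad_to_dataverse` and `_field` are hypothetical, the input is a deliberately simplified Dryad-style record (the keys `title`, `abstract`, and `authors` are assumptions made for illustration), and the output only approximates the nested citation-field layout that Dataverse expects for dataset metadata.

```python
from __future__ import annotations

from typing import Any


def _field(type_name: str, value: Any, type_class: str = "primitive",
           multiple: bool = False) -> dict[str, Any]:
    """Build one metadata field in a Dataverse-style shape (hypothetical helper)."""
    return {
        "typeName": type_name,
        "typeClass": type_class,
        "multiple": multiple,
        "value": value,
    }


def dryad_to_dataverse(dryad: dict[str, Any]) -> dict[str, Any]:
    """Translate a simplified Dryad-style record into Dataverse-style JSON.

    The input keys used here are assumptions, not a full Dryad schema.
    """
    # Dataverse-style authors are compound fields: one dict per author.
    authors = [
        {
            "authorName": _field(
                "authorName", f"{a['lastName']}, {a['firstName']}"
            ),
            "authorAffiliation": _field(
                "authorAffiliation", a.get("affiliation", "")
            ),
        }
        for a in dryad.get("authors", [])
    ]
    # The abstract becomes a single compound description entry.
    description = [
        {
            "dsDescriptionValue": _field(
                "dsDescriptionValue", dryad.get("abstract", "")
            )
        }
    ]
    fields = [
        _field("title", dryad["title"]),
        _field("author", authors, type_class="compound", multiple=True),
        _field("dsDescription", description, type_class="compound",
               multiple=True),
    ]
    return {"datasetVersion": {"metadataBlocks": {"citation": {"fields": fields}}}}


# Made-up sample record, for illustration only.
sample = {
    "title": "Example dataset",
    "abstract": "An example abstract.",
    "authors": [
        {"firstName": "Example", "lastName": "Researcher", "affiliation": "UBC"}
    ],
}
converted = dryad_to_dataverse(sample)
```

The flat Dryad-style record becomes a list of typed fields, which is the kind of format knowledge the library is meant to spare its users.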

The second component is an application that converts Dryad data and metadata and then transfers them to a Dataverse repository. It can monitor Dryad for changes and updates with minimal supervision and intervention. This second component requires no programming knowledge, and is designed so that UBC and others who decide to use the software can easily transfer information from one system to the other.
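The monitoring idea can be illustrated with a small sketch. This is an assumption about one possible approach, not a description of how the application is actually implemented: it supposes that each Dryad dataset exposes an identifier and a last-modified timestamp, and it keeps the last seen timestamps in a plain dictionary. The function name `classify_change` and its parameters are hypothetical.

```python
from __future__ import annotations


def classify_change(seen: dict[str, str], doi: str, last_modified: str) -> str:
    """Return 'new', 'updated', or 'unchanged', then remember the latest state.

    `seen` maps a dataset identifier to the last-modified value recorded on
    the previous check (a stand-in for whatever persistent store is used).
    """
    previous = seen.get(doi)
    if previous is None:
        status = "new"          # never transferred before
    elif previous != last_modified:
        status = "updated"      # changed in Dryad since the last check
    else:
        status = "unchanged"    # nothing to do
    seen[doi] = last_modified
    return status


# Made-up identifier and dates, for illustration only.
state: dict[str, str] = {}
first = classify_change(state, "doi:10.5061/dryad.example", "2021-03-12")
second = classify_change(state, "doi:10.5061/dryad.example", "2021-03-12")
third = classify_change(state, "doi:10.5061/dryad.example", "2021-04-01")
```

Under these assumptions, only datasets classified as new or updated would need to be converted and transferred again, which is what keeps supervision minimal.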

The application is designed to be portable and does not require complex infrastructure. It can be run on any computer, from a cell phone to a supercomputer, that has an internet connection and sufficient storage space for the data transfers.

The software is an open-source release under the MIT licence (https://opensource.org/licenses/MIT), and is freely available to all who wish to use and extend it.

Credits

The Dryad2Dataverse project is a creation of the UBC Library’s Research Commons (https://researchcommons.library.ubc.ca). Primary programming was by Paul Lesack, with input from the Research Commons team, including Eugene Barsky, Doug Brigham, and Paul Dante.