Synchronization with external sources

Video

In this video (02:56), GBIF Data Administrator Marie Grosjean describes how external sources are used in GRSciColl. If you are unable to watch the embedded Vimeo video, you can download it from the Files for download page.

Introduction

GRSciColl institution and collection entries can have an external primary source of information, meaning another registry or website. Edits to such a primary source prompt updates to the corresponding GRSciColl entries, eliminating the need to manage the same information across multiple registries.

Currently, the two possible sources of information for GRSciColl entries are Index Herbariorum and the GBIF dataset and publisher metadata. Data for entries drawn from these sources should be edited at the source. In practice, GRSciColl’s editing interface doesn’t allow users to update fields that use information from an external source.

Index Herbariorum

Every week, GRSciColl synchronizes with the Index Herbariorum API, updating information for existing entries whose source is Index Herbariorum and creating suggestions for new entries as needed.

By default, one Index Herbariorum entry corresponds to both an institution entry and a collection entry in GRSciColl, because herbaria are often botanical collections within other institutions. You can read more about the rationale in this GitHub issue. The synchronization process can generate duplicate institution entries when a single institution is home to several herbaria. This is why the synchronization now generates suggestions instead of creating new entries directly. Reviewers are asked to check each suggested entry to make sure that it does not duplicate an existing one.

Editors can disconnect institution entries from Index Herbariorum and choose to edit the institution directly in the GRSciColl editing interface.
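
The synchronization logic described in this section can be pictured with a minimal Python sketch. It is not the actual GRSciColl implementation: the `Entry` class, the `sync_with_ih` function, the `"IH"` and `"GRSCICOLL"` source labels, and the identifiers are all hypothetical, and only a name field is synchronized. The sketch also assumes that a disconnected entry is still recognized as related to its Index Herbariorum record, so that it is neither updated nor suggested again.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a GRSciColl institution or collection entry."""
    kind: str                    # "institution" or "collection"
    name: str
    master_source: str           # "IH" while linked, "GRSCICOLL" once disconnected
    ih_id: Optional[str] = None  # identifier of the related Index Herbariorum record


def sync_with_ih(ih_records, entries):
    """Run one synchronization pass and return the suggestions it produced.

    ih_records: dict mapping an Index Herbariorum identifier to its current name.
    entries: list of Entry objects already in the registry (updated in place).
    """
    known_ids = {e.ih_id for e in entries if e.ih_id is not None}
    suggestions = []
    for ih_id, name in ih_records.items():
        if ih_id not in known_ids:
            # Unknown herbarium: propose it to a reviewer instead of creating it.
            suggestions.append({"ih_id": ih_id, "name": name})
            continue
        for entry in entries:
            # Only entries still sourced from Index Herbariorum are overwritten.
            if entry.ih_id == ih_id and entry.master_source == "IH":
                entry.name = name
    return suggestions


entries = [
    Entry("institution", "Edited locally", "GRSCICOLL", "ih-1"),  # disconnected
    Entry("collection", "Old name", "IH", "ih-1"),                # still linked
]
suggestions = sync_with_ih({"ih-1": "New name", "ih-2": "Another herbarium"}, entries)
```

In this example the linked collection picks up the new name, the disconnected institution keeps its local edits, and the unknown herbarium becomes a suggestion for a reviewer rather than a new entry.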

GBIF dataset metadata and GBIF publisher pages

The metadata of datasets published on GBIF can be used as primary sources for collection entries in GRSciColl. Unlike with the Index Herbariorum synchronization, there is no weekly schedule and new entries aren’t automatically created. Instead, editors must link GRSciColl collections to their sources manually, because the scope of GBIF includes data beyond the scope of GRSciColl. Editors and mediators also have the option to create a collection entry from a dataset. When the metadata of a dataset is updated, the corresponding collection entry is updated immediately. Similarly, the publisher information available on GBIF can be used as a primary source for institution entries in GRSciColl.
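
The contrast with the weekly Index Herbariorum synchronization can be sketched as follows. This is an illustrative model only, not GBIF code: the `LinkedCollections` class, its method names, and the keys and fields used are hypothetical. It shows the two properties described above: links exist only when an editor creates them, and an update to a linked dataset reaches the collection entry immediately.

```python
class LinkedCollections:
    """Hypothetical in-memory model of collections linked to GBIF datasets."""

    def __init__(self):
        self.collections = {}  # collection key -> dict of metadata fields
        self.links = {}        # dataset key -> collection key

    def link(self, dataset_key, collection_key):
        """Manual step: an editor declares a dataset as the collection's source."""
        self.collections.setdefault(collection_key, {})
        self.links[dataset_key] = collection_key

    def on_dataset_updated(self, dataset_key, metadata):
        """Propagate a dataset metadata change; return True if an entry changed."""
        collection_key = self.links.get(dataset_key)
        if collection_key is None:
            # Most GBIF datasets are outside GRSciColl's scope: nothing is created.
            return False
        self.collections[collection_key].update(metadata)
        return True


registry = LinkedCollections()
registry.link("dataset-a", "collection-1")
updated = registry.on_dataset_updated("dataset-a", {"name": "Fossil collection"})
ignored = registry.on_dataset_updated("dataset-b", {"name": "Bird observations"})
```

Here `dataset-b` has no link, so its update changes nothing in the registry, while the update to `dataset-a` is applied to `collection-1` at once.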

Review

Quiz yourself on the concepts covered in this module.

For the following GRSciColl entries, find their master source.

  1. Which is the master source for the collection record for Herbarium - MNHN - Museum national d’Histoire naturelle collection?

  2. Which is the master source for the institution record for MNHN - Museum national d’Histoire naturelle?

  3. Which is the master source for the collection record for The fossil collection (F) of the Muséum national d’Histoire naturelle (MNHN - Paris)?