Use Case I - Herbarium Specimens
Familiarize yourself with the use case scenario.
Scenario
A Data Mobilization Project in a Regional Herbarium
This narrative was developed as a basis for practical exercises in the biodiversity data mobilization course. The exercise concept and content were developed by Alberto González-Talaván, Néstor Beltrán, Nicolas Noé and Sharon Grant. The data come from a real dataset but have been modified for the purposes of these exercises. The scenario is fictionalized and is meant only for instructional purposes.
Description
The University of White Plains is a well-recognized tertiary education institution in Guatemala and a national reference for biodiversity research. The Plant Biology Department keeps a medium-sized herbarium containing approximately 80,000 specimens collected in and around the local region and dating from the mid-20th century to the present. The collection holds important specimens, including types and endemics.
Currently, the care of the collection is assigned to the Professor of Plant Systematics, who performs curatorial tasks alongside their regular research and teaching work. The Departmental Admin is responsible for day-to-day administrative tasks for the herbarium, such as purchasing consumables including paper and labels. Faculty staff and students collaborating within the department occasionally work on the specimens and update their identifications. Two retired botanists regularly volunteer in the collection, assisting the Professor in preparing loans.
The University already has an online search for its library collections, which is maintained by the University’s central web team on externally hosted servers. No natural history specimens are currently served via this platform.
The Head of the Plant Biology Department has recently secured a two-year grant of 50,000 USD to image the botanical collections and publish their information online. The team wants to use this opportunity to establish a permanent digitization and publishing protocol that will give higher visibility to the herbarium and continue attracting funding.
Data collection
The herbarium comprises approximately 80,000 specimens collected from 1960 to 2015, concentrated mainly in a biodiversity hot spot, Short Mountain National Park. The collection is still growing as a result of exchanges, donations and several active research projects. For each specimen, information about collector, time, date, location and taxonomy is documented. So far, no quality control measures have been carried out on the data and there has been no systematic imaging of specimens.
Exercises
Download the exercise sheet (MS Word, 345 KB).
Exercise 1
Data capture
Imagine that you are the person assigned to transcribe the data found on the herbarium sheets.
- Download UC-HS-1-base-material.zip (34.4 MB). It contains 10 images: two images per specimen, for a total of five specimens. The herbarium sheets are in Spanish (data may come to you by various means and in languages other than your own), but you should be able to recognize the data contained in the fields on the labels. Remember to use both images of each specimen to compile the information for its record.
- Download the spreadsheet template UC-HS-1-occurrence-template.xlsx (57.3 KB) and use it to transcribe the information found on the images for each of the five specimens.
- Use the exercise sheet to provide your answers and submit the spreadsheet.
Exercise 2
Data management
Your institution is part of the “Global Poales Association (GPA)”. This association has secured funding to publish an up-to-date flora on the group and has requested your herbarium to participate and provide any high quality records you may have on this order of plants. The order is well represented in your collection so you think you could contribute substantially to this effort.
- Download UC-HS-2-data-cleaning.csv (213 KB).
- Evaluate the dataset and identify which types of errors are present.
- Identify possible ways to correct those issues and perform those corrections for as many of the errors as you can.
- Use the exercise sheet to provide your answers and submit the spreadsheet.
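As one possible starting point for the evaluation step, the sketch below scans a CSV for a few common error types: stray whitespace, missing values, coordinates outside their valid range, and dates that are not in ISO 8601 form. The column names are Darwin Core terms, and it is only an assumption that the exercise file uses these exact headers. The sample rows are invented for illustration and are not taken from the exercise dataset.

```python
import csv
import io
from datetime import date

# Column names below are Darwin Core terms; whether the exercise file uses
# exactly these headers is an assumption. Adjust them to match the CSV.
REQUIRED = ("scientificName", "eventDate")


def find_issues(rows):
    """Return (line_number, column, problem) tuples for common data errors."""
    issues = []
    for n, row in enumerate(rows, start=2):  # line 1 is the header
        # Stray whitespace around any text value.
        for col, value in row.items():
            if isinstance(value, str) and value != value.strip():
                issues.append((n, col, "leading/trailing whitespace"))
        # Fields that should never be empty.
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                issues.append((n, col, "missing value"))
        # Coordinates must be numeric and within their valid range.
        for col, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
            text = (row.get(col) or "").strip()
            if not text:
                continue
            try:
                if abs(float(text)) > limit:
                    issues.append((n, col, "out of range"))
            except ValueError:
                issues.append((n, col, "not a number"))
        # Dates must parse as real calendar dates in ISO 8601 form.
        text = (row.get("eventDate") or "").strip()
        if text:
            try:
                date.fromisoformat(text)
            except ValueError:
                issues.append((n, "eventDate", "not a valid ISO 8601 date"))
    return issues


# Hypothetical sample rows, invented for illustration only.
SAMPLE = """scientificName,eventDate,decimalLatitude,decimalLongitude
Poa annua,1998-07-14,15.5,-90.25
 Zea mays,14/07/1998,95.0,-90.25
,2001-02-30,abc,-190
"""

issues = find_issues(csv.DictReader(io.StringIO(SAMPLE)))
for issue in issues:
    print(issue)
```

To run the same checks on the downloaded file, the `io.StringIO(SAMPLE)` argument could be replaced with a file opened via `open("UC-HS-2-data-cleaning.csv", newline="", encoding="utf-8")`; the encoding is also an assumption. A report like this only flags candidates, so each flagged value still needs to be judged against the label data before it is corrected.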
Exercise 3
Data publishing
After cleaning the data in the Poales dataset, the team considers that publishing the data online through the GBIF network could be a good way to make this effort visible. You have been requested to lead that publishing work, based on the dataset.
- Download UC-HS-3-poales-publishing.csv (233.5 KB).
- Use the previously provided IPT installation to publish the given dataset.
- Use the exercise sheet to provide your answers and link to the published dataset.
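Before uploading a file to the IPT, it can help to confirm that the terms GBIF requires for occurrence data (`occurrenceID`, `basisOfRecord`, `scientificName`, `eventDate`) are present and filled in, and that `occurrenceID` values are unique. The following is a minimal sketch of such a pre-flight check, not part of the IPT itself. It assumes the CSV headers already match the Darwin Core term names, and the sample rows and identifiers are hypothetical.

```python
import csv
import io
from collections import Counter

# Terms GBIF lists as required for occurrence data. That the exercise CSV
# uses exactly these headers is an assumption.
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")


def publishing_problems(rows):
    """Return human-readable problems that would hinder publishing."""
    rows = list(rows)
    problems = []
    # With no data rows the header is unknown, so every term is reported.
    header = set(rows[0].keys()) if rows else set()
    for term in REQUIRED_TERMS:
        if term not in header:
            problems.append(f"missing column: {term}")
            continue
        empty = sum(1 for r in rows if not (r[term] or "").strip())
        if empty:
            problems.append(f"{term}: {empty} empty value(s)")
    # occurrenceID must identify each record uniquely within the dataset.
    if "occurrenceID" in header:
        counts = Counter((r["occurrenceID"] or "").strip() for r in rows)
        dupes = sorted(k for k, v in counts.items() if k and v > 1)
        if dupes:
            problems.append(f"duplicate occurrenceID: {', '.join(dupes)}")
    return problems


# Hypothetical sample rows; the identifiers are invented for illustration.
SAMPLE = """occurrenceID,basisOfRecord,scientificName
UWP:H:0001,PreservedSpecimen,Poa annua
UWP:H:0001,PreservedSpecimen,
"""

problems = publishing_problems(csv.DictReader(io.StringIO(SAMPLE)))
for problem in problems:
    print(problem)
```

An empty result does not guarantee that the IPT will accept the dataset; it only rules out the most basic gaps before the mapping step.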