Collection descriptors
Video
In this video (05:33), GBIF Data Administrator, Marie Grosjean, describes collection descriptors and how to use them within GRSciColl. If you are unable to watch the embedded Vimeo video, you can download it locally on the Files for download page.

What are collection descriptors in GRSciColl?

GRSciColl collection descriptors are meant to share structured information about collections. They can contain relevant details about collections and sub-collections as well as quantitative data which cannot be shared on collection pages (for example, the number of type specimens for a particular taxon). Some collection descriptors are used for indexing collections. This means that they improve collection discoverability. For example, a collection entry associated with dragonfly species names will be found by users looking for “Odonata” in the scientific name field of the collection search.
Here are some examples of searches based on collection descriptors:

Currently, only a handful of collection descriptors are indexed and searchable: scientific name, country or area of coverage of the specimens (based on the dwc:country term), recorded by, and type status. However, GBIF will be able to add filters for more standardized terms as more descriptors are shared in GRSciColl.
When to upload collection descriptors in GRSciColl:
- The collection is not part of the GBIF scope (for example, mineralogy, archaeology or anthropology).
- The information about the collection is structured but not at the specimen level (that is, you cannot create occurrences).
- You have information that does not fit the Darwin Core standard.
How to format collection descriptors

Each GRSciColl collection entry can have one or several collection descriptor groups. A group can correspond to descriptors for a particular aspect of the collection or a sub-collection. Each group requires:
- A title for the set of descriptors. For example, “Taxonomic breakdown of the algae sub-collection”.
- A description of the set of descriptors. For example, “These descriptors are based on the 2008 inventory of the algarium. This inventory focused mainly on type specimens”.
- A comma-separated file containing the descriptors, where each column is a descriptor and each row is a subset of the collection being described. The header of the table is used to map its content to Darwin Core and Latimer Core (see more details below).
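The three parts of a descriptor group can be pictured as a small data structure. The following is a minimal sketch, not a GRSciColl API: the class name `DescriptorGroup`, its field names and the sample title and description are hypothetical, and the table content simply reuses the example table shown later on this page.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DescriptorGroup:
    """Hypothetical container for one descriptor group (not a GRSciColl API)."""

    title: str        # title of the set of descriptors
    description: str  # free-text description of the set
    csv_text: str     # comma-separated table: one column per descriptor

    def rows(self) -> list[dict[str, str]]:
        """Parse the table; each row describes one subset of the collection."""
        return list(csv.DictReader(io.StringIO(self.csv_text)))


# Illustrative values only: the title and description are made up for this sketch.
group = DescriptorGroup(
    title="Example descriptor group",
    description="Example description of where these descriptors come from.",
    csv_text=(
        "ltc:biomeType,dwc:scientificName,dwc:country,"
        "Number of identified specimens at genus level\n"
        "Freshwater,Perciformes,Colombia,300\n"
        "Freshwater,Perciformes,Brazil,145\n"
    ),
)

for row in group.rows():
    print(row["dwc:country"], row["Number of identified specimens at genus level"])
```

Keeping the title and description next to the table mirrors how a group is presented on a collection page: the table carries the descriptors, while the two text fields explain what the table covers.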
The descriptor table

As mentioned above, the descriptor tables are CSV files where each row is a subset of the collection (or a group of specimens) and each column is a descriptor. When possible, the data should be mapped to the Darwin Core and Latimer Core standards, but it is possible to share data that is not mapped to any standard. When a column is mapped to one of the standards, its header should contain the prefix of the standard (ltc: for Latimer Core and dwc: for Darwin Core) followed by the name of the term.
ltc:biomeType | dwc:scientificName | dwc:country | Number of identified specimens at genus level |
---|---|---|---|
Freshwater | Perciformes | Colombia | 300 |
Freshwater | Perciformes | Brazil | 145 |
In the example above, the last column couldn’t be mapped to any Darwin Core or Latimer Core term, so it was left with a descriptive title. This column will not be indexed, and users will not be able to search data based on its values, but it will be displayed alongside the other descriptors on the collection page. See an example of collection descriptors for The New York Botanical Garden collection where not every column is mapped to a standard:

Tables might contain overlapping information or different descriptions for the same subset of specimens.
There is no template to download because descriptor tables can include many different headers. You are welcome to download any table you like from GRSciColl and use it as your own template.
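Before uploading a table, it can help to check which column headers carry a standard prefix and which will be treated as free-text descriptors. The sketch below is only an illustration of the prefix convention described above: the function name `classify_header` is hypothetical, and the check looks at the prefix only, so it does not verify that the term after the prefix actually exists in Darwin Core or Latimer Core.

```python
import csv
import io

# The example table from this page, written out as CSV.
TABLE = """ltc:biomeType,dwc:scientificName,dwc:country,Number of identified specimens at genus level
Freshwater,Perciformes,Colombia,300
Freshwater,Perciformes,Brazil,145
"""

# Prefixes named in the text, with the standard each one refers to.
PREFIXES = {"ltc:": "Latimer Core", "dwc:": "Darwin Core"}


def classify_header(header: str) -> str:
    """Return the standard a header claims to map to, or 'unmapped'.

    Only the prefix is inspected; the term name itself is not validated.
    """
    for prefix, standard in PREFIXES.items():
        # Require at least one character after the prefix (a term name).
        if header.startswith(prefix) and len(header) > len(prefix):
            return standard
    return "unmapped"


reader = csv.reader(io.StringIO(TABLE))
headers = next(reader)  # the first CSV row holds the column headers
mapping = {header: classify_header(header) for header in headers}

for header, standard in mapping.items():
    print(f"{header!r}: {standard}")
```

Columns reported as unmapped here correspond to those that, per the text, are displayed on the collection page but not indexed for search.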

Here are some examples that we compiled from real data while working on the implementation of descriptors. The data could certainly be mapped differently; these examples are only meant to give an idea of the type of mapping we expect:

The Latimer Core term objectClassificationName is very convenient for describing subsets of collections that do not necessarily have other ways of being grouped. For example, it is helpful for groups of non-monophyletic taxa (such as Algae). Ideally, the names used in this field should follow a controlled vocabulary, but we have not set one up yet. If you need some guidance, see the proposed vocabulary.
Activity
Given the following text describing a (fictional) collection and with the help of the Latimer Core quick reference guide, rank the tables from the poorest to the richest amount of information mapped. This collection contains bony fish specimens from the African coast gathered during two expeditions: one in the 19th century and one in 1940-1952. About half of these specimens are identified to the genus level. The collection contains a few type specimens, including one for Eustomias monoclonoides and one for Bothus podas.