One of the first steps of performing a collection analysis is to define the scope of the collection. While I am focused on analyzing the corpus of BHL for my project, this collection only represents a subset of all biodiversity literature. After defining the scope of biodiversity literature, we can start to understand the coverage of the BHL collection and identify its gaps to target future digitization.
The term “biodiversity” is a contraction of “biological diversity,” first used in 1986 during the planning meeting for the National Forum on BioDiversity.1 Simply put, biodiversity is “the variability among living organisms from all sources including, inter alia, terrestrial, marine and other aquatic ecosystems and the ecological complexes of which they are part; this includes diversity within species, between species and of ecosystems.”2 That covers all living things and their environments, which is quite a large scope.
To define this subject area for digitization purposes, the BHL Collections Committee created a Collection Development Policy that identifies specific areas of interest to digitize based on BHL user needs. The committee defines BHL users as an interdisciplinary audience composed of “zoologists, botanists, evolutionary biologists, taxonomists, systematists, ecologists, natural history collections managers, scientific illustrators, biological science historiographers, and amateur scientists & hobbyists.”3 Based on these interests, the committee created an infographic to help distinguish between relevant content or “core literature” topics (blue) and supporting literature topics (green).
The types of information these subjects may cover include species descriptions, distribution records, climate records, history of scientific discovery, information on extinct species, scientific observations, scientific illustrations, and ecosystem profiles. These types of data can be published in monographs (books) and serials (journals), or remain unpublished in field notebooks and diaries (handwritten records4).5
Now that we know what biodiversity literature is, how much of it is out there? That is the big question: how do you count all of the written material, both in and out of copyright, about all living things and all the environments they inhabit? We can make some educated guesses.
In 2010, members of the BHL Collections Committee estimated the core literature (botanical, mycological and zoological) to consist of 495,000,000 pages.6 This estimate was calculated by first identifying the core literature in botany, which was chosen over zoology and mycology because of its superior documentation. Two extensive bibliographies were chosen for assessment: Taxonomic Literature, 2nd Edition (TL-2),7 which documents monographs published from 1753 to 1940, and Botanico-Periodicum-Huntianum (BPH), which documents serials published from 1665 to the present. The number of volumes in each was estimated (37,600 in TL-2 and 30,000 in BPH8). The average number of pages per volume, based on a sample, was then used to estimate the total number of pages in each (15,040,000 in TL-2 and 96,000,000 in BPH).
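The arithmetic behind the botanical figures can be checked with a short sketch. The volume counts and page totals are the ones quoted above; the variable names, the derived per-volume averages, and the assumption that the botanical total is simply the sum of the two bibliographies are illustrative and are not stated by the committee.

```python
# Figures quoted in the 2010 estimate described above.
tl2_volumes = 37_600       # monograph volumes documented in TL-2
bph_volumes = 30_000       # serials documented in BPH (footnote 8 counts these as titles)
tl2_pages = 15_040_000     # estimated total pages for TL-2
bph_pages = 96_000_000     # estimated total pages for BPH

# Implied sample averages (derived here, not quoted in the post).
tl2_avg_pages = tl2_pages / tl2_volumes
bph_avg_pages = bph_pages / bph_volumes

# Assumption: botanical core literature = TL-2 pages + BPH pages.
botany_pages = tl2_pages + bph_pages

print(f"TL-2: {tl2_avg_pages:,.0f} pages per volume on average")
print(f"BPH:  {bph_avg_pages:,.0f} pages per serial on average")
print(f"Botanical total: {botany_pages:,} pages")
```

Under these assumptions the sample averages work out to 400 pages per TL-2 volume and 3,200 pages per BPH serial, for a botanical total of 111,040,000 pages.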
The estimate of botanical literature is then used to estimate mycological and zoological literature based on the number of scientifically described species. By determining the ratio of known botanical species to known mycological and zoological species9 (310,129 botanical species, 98,988 mycological species, and 1,424,153 zoological species), the number of pages per species can be estimated for each category. Using these estimates, total pages for all species would be 497,574,779.
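The species-ratio step can be sketched as a simple proportional scaling. The species counts come from the text above and the botanical page total is the sum of the TL-2 and BPH page figures quoted earlier. The assumption that every group has the same pages-per-species rate as botany belongs to this sketch, not to the committee.

```python
# Known species counts quoted above (from Chapman 2009, footnote 9).
species = {
    "botanical": 310_129,
    "mycological": 98_988,
    "zoological": 1_424_153,
}

# Botanical pages: TL-2 plus BPH totals quoted earlier in the post.
botany_pages = 15_040_000 + 96_000_000

# Assumption of this sketch: one uniform pages-per-species rate, taken from botany.
pages_per_species = botany_pages / species["botanical"]

estimated_pages = {
    group: round(count * pages_per_species) for group, count in species.items()
}
total_pages = sum(estimated_pages.values())

for group, pages in estimated_pages.items():
    ratio = species[group] / species["botanical"]
    print(f"{group:12s} {ratio:5.2f}x botanical species  ~{pages:,} pages")
print(f"all groups   ~{total_pages:,} pages")
```

This naive version comes out at roughly 656 million pages rather than 497,574,779, so the committee's calculation presumably included adjustments that are not described here. The sketch only illustrates the scaling idea.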
This estimate can give us an idea of the size of the scope of biodiversity literature (and a number of pages to aspire to), but BHL is also interested in assessing areas of coverage based on geographic, taxonomic, and subject/discipline data points. Some methods I am exploring include:
using Library of Congress Subject Headings to determine distribution of BHL collections (there is some very exciting research about turning LCSH into hierarchical trees for browsing and searching collections that I would like to replicate for BHL!)
using taxonomic name data10 to analyze species coverage
using comprehensive subject bibliographies to assess coverage (a pilot study was performed in 2015 using pteridological literature by BHL librarians).
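As a rough illustration of the bibliography-based approach, coverage can be expressed as the share of a subject bibliography's titles that are already held. This is a minimal sketch, not the method used in the 2015 pilot: the `normalize` and `coverage` helpers are hypothetical, the titles are placeholders, and real matching would need to handle title variants, editions, and identifiers.

```python
def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(title.lower().split())


def coverage(bibliography, holdings):
    """Return (share of bibliography titles held, sorted list of missing titles)."""
    wanted = {normalize(t) for t in bibliography}
    held = {normalize(t) for t in holdings}
    if not wanted:
        return 0.0, []
    missing = sorted(wanted - held)
    return len(wanted & held) / len(wanted), missing


# Placeholder data only; these are not real bibliography entries.
bibliography = ["Placeholder Title A", "Placeholder Title B",
                "Placeholder Title C", "Placeholder Title D"]
holdings = ["placeholder title a", "Placeholder  Title C", "Unrelated Title"]

share, missing = coverage(bibliography, holdings)
print(f"Coverage: {share:.0%}")
print("Gaps to target:", missing)
```

The list of missing titles is the useful part for digitization planning, since it names the gaps directly rather than only reporting a percentage.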
Stay tuned to read more about these projects as they develop!
1. [Issues in Science and Technology Librarianship http://www.istl.org/12-fall/internet.html]
2. [Convention on Biological Diversity http://www.cbd.int/convention/articles/?a=cbd-02]
3. [Collection Development Policy http://biodivlib.wikispaces.com/Collection+Development+Policy]
4. [See Katie’s previous post about using transcription tools for handwritten documents!]
5. [A large part of biodiversity information is held in datasets, which fall outside the BHL collecting scope because it focuses on literature.]
6. [For reference, BHL currently (as of 2/22/17) has 51,362,213 pages online. Check the counter at the bottom right of the BHL portal page for updated stats!]
7. [For further info on turning TL-2 into a database (like BPH online)]
8. [BPH online estimates 34,000 titles. This increase may be due both to new journals being indexed in the 6 years since this estimate and to some degree of error in the BHL estimate. But hey, they got pretty close!]
9. [According to A.D. Chapman’s Numbers of Living Species in Australia and the World, 2nd edn (2009).]
10. [BHL uses Global Names Architecture’s Global Names Recognition and Discovery (GNRD), a taxonomic name recognition algorithm, to search through all of the texts digitized in BHL and extract the scientific names.]
Alicia Esquivel is the National Digital Stewardship Resident at the Chicago Botanic Garden, working on "Foundations to Actions: Extending Innovations in Digital Libraries in Partnership with NDSR Learners", an IMLS grant awarded to the Ernst Mayr Library of the MCZ with partner libraries at the Chicago Botanic Garden, Smithsonian Institution, Missouri Botanical Garden and Natural History Museum of LA County.