Biodiversity Data Journal :
General research article
Corresponding author:
Received: 14 Jul 2015
© Thomas Horn
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Horn T () Integrating Biodiversity Data into Botanic Garden Collections. . https://doi.org/
Biological research always revolves around a form of life, an individual or a group that belongs to a certain taxonomic affinity. Species, as fundamental entities of biodiversity, are described, grouped and reorganized. The stream of research documentation is frequently interrupted by new discoveries, new technologies and subsequent new evidence on the delimitation of certain taxonomic groups. Plant genetic resources of botanic gardens and similar facilities can be improved by connecting them to a dynamic web of knowledge. Such a connection helps to recover all relevant information for future research, the assessment of botanic garden collections, the identification of potentially invasive taxa, the construction of priority lists or the development of DNA-based species authentication.
Scientific names extracted from seed catalogues provided by 135 botanic gardens were evaluated using the Encyclopedia of Life (EOL), The Catalogue of Life (COL) and The Plant List (TPL). Of the taxon names, 98.5 % were verified, and discrepancies of taxonomic status among providers were evaluated. For retrieving geographic information, COL appeared to be the most valuable source, while EOL was a major contributor primarily of Asian and American occurrences. A compiled flora database made contributions similar to those of COL, EOL and TPL. Only 7 % of the verified names were found to be included in the International Union for Conservation of Nature (IUCN) Red List, including one officially "extinct" species (Euphrasia mendoncae Samp.) and three taxa (Bromus bromoideus (Lej.) Crép., Lysimachia minoricensis J.J.Rodr. and Mangifera casturi Kosterm.) with the status "extinct in the wild". Potential invasiveness, the second most important factor for biodiversity loss, was determined using the Global Invasive Species Information Network (GISIN) and Delivering Alien Invasive Species Inventories for Europe (DAISIE). Approximately 4 % of the verified names were found in GISIN, with 577 names representing introduced (exotic) taxa and 183 representing invasive taxa. According to DAISIE, around 20 % of the verified names represent European alien taxa. Of the 18 worst European invasive plant taxa, 15 were also found among the verified taxon names.
Botanic Gardens, Conservation, Invasiveness, Encyclopedia of Life, Catalogue of Life, The Plant List, Biodiversity, Bioinformatics
Before the introduction of binomial nomenclature, scientific names were intended to combine identity and diagnostic description of a species. While this may have been feasible for a small group, it proved to be rather impractical for cataloguing complex groups. Starting with Species Plantarum [
As a consequence of progress in taxonomic research, the arrangement of organisms is modified, which subsequently changes or invalidates scientific names and creates discontinuities among documented empirical data. Thus, the history of a scientific name contains relevant information for future research projects. An entry point for discovering the history of a taxon is a current species checklist which includes synonyms that are the milestones along the road to the present name.
To collate a uniform and validated index to the world's known species, two organizations joined forces to set up The Catalogue of Life (COL) in 2001: the Integrated Taxonomic Information System (ITIS), a partnership of federal agencies and other organizations from the United States, Canada and Mexico, with data stewards and experts from around the world, and Species 2000, an autonomous federation of taxonomic database custodians involving taxonomists throughout the world. They have been further developing it [
With sufficient information on the establishment of the accepted scientific name, the story of a species unfolds and literature can be screened for relevant information, including names that are no longer in use. Beyond literature, there are freely available scientific data repositories of different kinds containing additional information. Members of the International Nucleotide Sequence Database Collaboration (INSDC) have been collecting and providing sequence information for 30 years, accumulating about 178 million sequences of 340 thousand species and infraspecific epithets. The Barcode of Life Data Systems (BOLD) supports the generation and application of DNA barcode data and, as of May 2015, offers over 4 million DNA barcode sequences supporting specimen identification. The Global Biodiversity Information Facility (GBIF) provides a single point of access to more than 500 million records, shared freely by hundreds of institutions worldwide, making it the biggest biodiversity database on the Internet. Names are also critical when building priority lists, e.g. the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), the International Union for Conservation of Nature (IUCN) or invasive species lists (e.g. Global Invasive Species Information Network, GISIN or Delivering Alien Invasive Species Inventories for Europe, DAISIE).
The vision of building a species database combining names with all kinds of useful data [
All mentioned data providers offer a website where the user can search for the respective information. Some of them also offer the possibility to retrieve information through an application programming interface (API). By using dynamic high-level general-purpose languages (e.g. Perl, Python, PHP and others), stakeholders can include the respective data in their own (web) application. Additionally, scientists are able to retrieve and analyze data from many different taxa in little time by using either the API or third-party software that uses the API (e.g.
The primary sources of all the described information are, nevertheless, physical beings: individual members of the respective species that are found in their natural habitat, cultivated on experimental fields or in botanic gardens.
Using science as the fundamental criterion for the definition of a botanic garden, the first European botanic gardens were established in the mid-16th century [
Today the role of botanic gardens is much more diverse. Support of scientific research and economic endeavors (e.g. Centre of Economic Botany at Royal Botanic Gardens Kew, founded in 1847 [
Botanic Gardens Conservation International (BGCI) is the world authority on botanic gardens and plant conservation. It represents about 700 members, mostly botanic gardens, from 118 countries. A traditional practice among botanic gardens is the exchange of plant (genetic) resources by annually offering seed catalogues (indices seminum) from which other gardens can order to develop their own collection. This practice is believed to have started in the late 16th century at the Oxford BG [
The possibility of developing a collection or scientifically utilizing the respective species without the need for expensive expeditions, while simultaneously complying with the CBD, is appealing. However, the exchange and cultivation of plant species also has less favorable consequences. The introduction of exotic species by botanic gardens has frequently been associated with the potential for escape and evolution of invasiveness [
The aim of this study was to gain an overview of the genetic resources currently offered by botanic gardens using taxonomic names from seed catalogues, to gather relevant information (i.e. taxonomic status, IUCN Red List status, geographical distribution and invasiveness reports) and to evaluate the information and its sources in the context of improving facilities dealing with plant genetic resources.
Botanic gardens hold a vast array of genetic resources (BGCI. 2015. PlantSearch online database. Botanic Gardens Conservation International. Richmond, U.K.) that is freely accessible for research and education. I analyzed the content of electronically available seed catalogues (i.e. indices seminum, IS) from 135, mostly European, botanic gardens (Suppl. material
Schematic representation of the analysis: All extracted taxon names were compiled into a unique name list (UNL). Based on this list taxon verification using EOL, COL and TPL was performed yielding separate unique identifier lists (UIL) containing all verified taxa with source specific identifiers. The UNL was also used for the retrieval of information on conservation (IUCN Red List), invasiveness (GISIN, DAISIE) and geographic distribution (Flora database). The UIL was used to retrieve information on geographic distribution directly via EOL, COL and TPL.
PDF documents were converted into XML, parsed to extract taxa names which then were saved into a local database (IS Taxa, Fig.
To verify the existence and to retrieve the taxonomic status of each name I used data from TPL (The Plant List Version 1.1, available from http://www.theplantlist.org/), EOL (Encyclopedia of Life. Available from http://www.eol.org. Accessed March 2015) and COL (Catalogue of Life. Available from http://www.catalogueoflife.org/,
Name status information was retrieved by respective API functions summarized in Table
Source | API function | Output format
EOL | eol.org/api/search/1.0.json | JSON
EOL | eol.org/api/pages/1.0/[pageId].json | JSON
EOL | eol.org/api/hierarchy_entries/1.0/[taxonConceptId].json | JSON
TPL | local MySQL database | SQL
COL | www.catalogueoflife.org/webservices/status/query/ | XML
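As an illustration of how the EOL search function listed in the table might be addressed, the following minimal sketch only builds a request URL and sends nothing. The `https` scheme, the query parameter name `q` and the helper name `eol_search_url` are assumptions made for this example; the parameter `exact` is the one referred to in the description of the status retrieval.

```python
from urllib.parse import urlencode

# Endpoint as listed in the table; the scheme is an assumption.
EOL_SEARCH = "https://eol.org/api/search/1.0.json"


def eol_search_url(name: str, exact: bool = True) -> str:
    """Build a search URL for one taxon name (hypothetical helper, no request is sent)."""
    # "q" is an assumed parameter name; "exact" switches exact matching on or off.
    query = urlencode({"q": name, "exact": "true" if exact else "false"})
    return f"{EOL_SEARCH}?{query}"


print(eol_search_url("Lysimachia minoricensis"))
```

Keeping URL construction separate from the actual request makes it easy to log and re-run the second, non-exact pass for names that return no result.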
Statuses of taxon names provided by The Plant List (TPL) and other sources. *1 In case the taxon name searched for with EOL is returned in the title of the result, the name is the preferred name (set by the EOL curators). In case the name is returned in the content of the result, the name is considered a synonym.
TPL Status | Source Status | Source
Accepted | Preferred (Title*1) | EOL
Accepted | Accepted name | COL
Accepted | Provisionally accepted name | COL
Synonym | Not-Preferred (Content*1) | EOL
Synonym | Synonym | COL
Unresolved | |
Misapplied | |
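The status correspondence in the table can be expressed as a simple lookup. The sketch below is one possible encoding, not the code used in this study: the names `STATUS_MAP`, `eol_source_status` and `to_tpl_status` are hypothetical, and the plain substring test used for the EOL title/content rule is an assumption.

```python
# (source, source status) -> TPL status vocabulary, following the table.
STATUS_MAP = {
    ("EOL", "Preferred"): "Accepted",
    ("COL", "Accepted name"): "Accepted",
    ("COL", "Provisionally accepted name"): "Accepted",
    ("EOL", "Not-Preferred"): "Synonym",
    ("COL", "Synonym"): "Synonym",
}


def eol_source_status(name, title, content):
    """Footnote *1: name in the result title -> preferred; only in the content -> synonym."""
    if name in title:  # substring test is an assumption of this sketch
        return "Preferred"
    if name in content:
        return "Not-Preferred"
    return None


def to_tpl_status(source, source_status):
    """Translate a source-specific status into the TPL vocabulary (None if unmapped)."""
    if source == "TPL":
        return source_status  # TPL statuses (incl. Unresolved, Misapplied) are kept as-is
    return STATUS_MAP.get((source, source_status))


print(to_tpl_status("COL", "Provisionally accepted name"))
```

Mapping all sources onto one vocabulary first is what makes a later pair-wise comparison of statuses meaningful.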
In detail, the retrieval of status information was done in two steps. First, only exact matches were used to retrieve the taxon status. In the case of EOL, the parameter exact was set to true. For COL and TPL, only exact matches are returned by default. If more than one exact match was returned, the respective status was set to “ambiguous”. Second, all names that did not return any result (NA) were re-evaluated. In the case of EOL, the parameter exact was first set to false (default) and, if still no result was returned, the name of the next higher hierarchical level (i.e. the species in the case of infraspecific epithets or the genus in the case of species names) was searched for. If found, the respective taxon page was used to retrieve all available names of children (i.e. infraspecifics or species) stored in the hierarchies available through that page. If any of the returned names was an exact match or very similar (i.e. Levenshtein distance < 3), it was retained as an alternative. If more than one possible alternative was found, the name status was set to “ambiguous” unless the closest match was a perfect match (see discussion for details). If a single perfect match or alternative match was found, the corresponding status was used as the result. For COL and TPL, the same approach was implemented using the API or MySQL queries to retrieve and test the respective data.
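The decision rule of the second pass can be sketched as follows. This is a minimal illustration under stated assumptions rather than the original implementation: the function names `levenshtein` and `resolve` and the outcome labels are hypothetical, and the candidate names are assumed to have already been retrieved from the parent taxon.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def resolve(name, candidates, max_distance=3):
    """Pick a match among child names of the parent taxon.

    Returns (matched_name, outcome) with outcome in
    {"exact", "alternative", "ambiguous", "NA"} (labels are assumptions).
    """
    # Keep only candidates with a distance below the threshold (< 3 in the text).
    close = sorted(
        (d, c) for d, c in ((levenshtein(name, c), c) for c in set(candidates))
        if d < max_distance
    )
    if not close:
        return None, "NA"
    if close[0][0] == 0:
        return close[0][1], "exact"  # a perfect match overrides other alternatives
    if len(close) > 1:
        return None, "ambiguous"
    return close[0][1], "alternative"


print(resolve("Bromus bromoides", ["Bromus bromoideus", "Bromus arvensis"]))
```

Restricting candidates to the children of the next higher taxon keeps the number of distance computations small and avoids matching similar epithets in unrelated genera.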
For subsequent analyses the results of the taxon name status evaluation were merged and a list of unique names was compiled (Unique Name List, Fig.
To evaluate status discrepancies between the sources, the results from all three data providers were combined. The status of each unique name was compared between pairs of data sources (i.e. TPL vs. COL, TPL vs. EOL, COL vs. EOL). The status was either identical or not. In the latter case the comparison was saved and used to evaluate discrepancies in more detail.
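The pair-wise comparison can be sketched as shown below. The function name `status_discrepancies`, the input layout and the placeholder taxon names are assumptions made for illustration; in particular, the sketch assumes that a pair is only compared when both sources returned a status for the name.

```python
from itertools import combinations


def status_discrepancies(statuses):
    """Collect names whose status differs between two sources.

    statuses: {name: {source: status}}
    returns:  {(source_a, source_b): [names with differing status]}
    """
    result = {}
    for name, by_source in statuses.items():
        # Sorting gives a stable pair order: (COL, EOL), (COL, TPL), (EOL, TPL).
        for a, b in combinations(sorted(by_source), 2):
            if by_source[a] != by_source[b]:
                result.setdefault((a, b), []).append(name)
    return result


# Placeholder names and statuses, not data from this study.
example = {
    "Taxon a": {"TPL": "Accepted", "COL": "Accepted", "EOL": "Synonym"},
    "Taxon b": {"TPL": "Unresolved", "COL": "Accepted"},
}
for pair, names in status_discrepancies(example).items():
    print(pair, len(names))
```

Storing the names per pair, rather than only a count, allows the discrepancies to be inspected in more detail afterwards.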
The list of unique verified taxon names (UNL, Fig.
To compile an overview of the geographic distribution of available genetic resources, I used two approaches. Firstly, I used the list of verified taxon names with unique identifiers from TPL, EOL and COL (UIL, Fig.
The Plant List Distribution Data
In the case of TPL, only some of the data sources offer information about distribution that I could use for this analysis. If the TPL data source was the World Checklist of Selected Plant Families (WCSP), the International Organization for Plant Information (IOPI) or iPlants, the stored identifier (UIL) was used to access taxon information at WCSP, which is based on the TDWG scheme. If the TPL data source was the International Legume Database & Information Service (ILDIS), I retrieved the taxon report from their website using the stored identifier and parsed it for geographical records with the respective region names. If Tropicos was the TPL data source, I used their web service to retrieve distribution data for the stored identifier. For TPL data provided by The International Compositae Alliance (TICA), a locally installed database containing occurrence information of the respective taxa was used (data was kindly provided by Kevin Richards).
The Catalogue of Life and EOL Distribution Data
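The routing described above amounts to a dispatch on the TPL data source. The sketch below is only a schematic restatement: the route labels and the names `ROUTE_BY_TPL_SOURCE` and `retrieval_route` are hypothetical and do not correspond to actual service or function names.

```python
# TPL data source -> retrieval route (labels are hypothetical placeholders).
ROUTE_BY_TPL_SOURCE = {
    "WCSP": "wcsp_lookup",           # taxon information at WCSP (TDWG scheme)
    "IOPI": "wcsp_lookup",
    "iPlants": "wcsp_lookup",
    "ILDIS": "ildis_report_parse",   # taxon report from the website, parsed for regions
    "Tropicos": "tropicos_web_service",
    "TICA": "local_tica_database",   # locally installed occurrence database
}


def retrieval_route(tpl_source):
    """Return the route for a TPL data source, or None if no usable distribution data."""
    return ROUTE_BY_TPL_SOURCE.get(tpl_source)


for source in ("iPlants", "ILDIS", "Other"):
    print(source, retrieval_route(source))
```

A table-driven dispatch of this kind keeps the source-specific retrieval logic separate and makes it explicit which TPL data sources contribute no distribution data.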
The web services of COL and EOL could be accessed directly using stored identifiers (UIL).
Flora DB Approach
As a second approach and to supplement the geographical information retrieved from the previously mentioned sources, I compiled a flora database using information of different available floras and regional name checklists (Table
Region | Name | Reference / Date of retrieval
Africa | African Plant Database | African Plants Database (version 3.4.0). Conservatoire et Jardin botaniques de la Ville de Genève and South African National Biodiversity Institute, Pretoria, "Retrieved August 2014", from <http://www.ville-ge.ch/musinfo/bd/cjb/africa/>.
Africa | Catalogue of the Vascular Plants of Madagascar | Tropicos, Madagascar Catalogue, 2013. Catalogue of the Vascular Plants of Madagascar. Missouri Botanical Garden, St. Louis, U.S.A. & Antananarivo, Madagascar [http://www.efloras.org/madagascar. Accessed: April, 2014].
Asia | Flora of China | Tropicos, botanical information system at the Missouri Botanical Garden - www.tropicos.org, May 2014
Asia | Flora of Taiwan Checklist | eFloras, August 2014
Asia | Ornamental Plants From Russia And Adjacent States Of The Former Soviet Union | eFloras, August 2014
Asia | Annotated Checklist of the Flowering Plants of Nepal (eFloras) | Tropicos classic, May 2014
Australasia | Australian Plant Name Index (APNI) | April 2014
Australasia | New Zealand indigenous plant list | New Zealand Plant Conservation Network, 2010
Europe | World Checklist of Selected Plant Families (WCSP) | WCSP (2014). 'World Checklist of Selected Plant Families. Facilitated by the Royal Botanic Gardens, Kew. Published on the Internet; http://apps.kew.org/wcsp/ Retrieved August 2014.'
North America | Flora of North America | Tropicos, botanical information system at the Missouri Botanical Garden - www.tropicos.org, May 2014
South America | Flora of Chile | Tropicos, botanical information system at the Missouri Botanical Garden - www.tropicos.org, ?
South America | Flora Brasiliensis | Centro de Referência em Informação Ambiental, CRIA, September 2014
South America | Cubensis prima flora | Biodiversity Heritage Library, Biblioteca Digital del Real Jardin Botanico de Madrid
South America | Catalogue of the Vascular Plants of Ecuador | Tropicos, botanical information system at the Missouri Botanical Garden - www.tropicos.org, April 2014
To determine the number and names of potentially invasive taxa cultivated by botanic gardens, I used the Global Invasive Species Information Network (GISIN) web service (http://www.gisin.org/GISIN/GISINWebService) to query each taxon name (UNL, Fig.
Seed lists of 135 botanic gardens, of which 124 are located in Europe, 6 in North America, 4 in Asia and 1 in South America (Fig.
In the exact match pass, 13'764, 12'939 and 14'338 names could be verified using COL, EOL and TPL respectively, constituting a success rate of 79.8 to 88.4 % (Fig.
Results of the taxonomic name verification and status check (Suppl. materials
A comparison of the status information retrieved from each source during name verification revealed 3'552 and 3'997 discrepancies between TPL and COL and between TPL and EOL, respectively, while only 1'816 discrepancies were detected between COL and EOL (Fig.
Taxon name status discrepancies between TPL, EOL and COL (Suppl. material
Out of the 15'973 unique taxon names (UNL), 1'066 (7 %) were found to be assessed by the IUCN Red List. One of the taxon names returned the status extinct (EX): Euphrasia minima Jacq. ex DC., according to IUCN a synonym of Euphrasia mendoncae Samp., can be found in the seed list of BGU Lautaret. The same name with a different authority (Euphrasia minima Schleich.) can be found in the seed list of CJB Geneva. Three of the taxon names led to the status extinct in the wild (EW): Bromus bromoideus, Lysimachia minoricensis and Mangifera casturi. 267 (1.7 %) names fell into one of the IUCN Red List threatened categories (vulnerable, endangered and critically endangered), and the remaining 795 included 84 near threatened (27 in the lower risk category), 6 lower risk conservation dependent, 620 least concern (39 in the lower risk category) and 85 data deficient taxon names (Fig.
IUCN Red List Status of 1'076 taxon names found in seed catalogues of 135 botanic gardens (Suppl. material
To retrieve information about the geographic distribution using a locally installed flora database, a list of unique names (UNL, Fig.
Geographic distribution of studied taxa (Suppl. materials
Most of the evaluated taxa occur in Eurasia (Fig.
Geographic (Europe, Asia, Africa, North America, South America, Australasia and Pacific) distribution of plant genetic resources according to Flora, EOL, TPL and COL information (Suppl. materials
COL appears to be the major contributor of distribution data for most regions, while EOL is a major contributor primarily of Asian and American occurrences. The flora database makes contributions similar to those of COL (European, African and Australasian occurrences), EOL (Asian occurrences) and TPL (American occurrences).
Of the 15'973 unique taxon names (UNL), 3.9 % (616) have records in the GISIN database. 577 of these taxa have been reported as exotic, 183 also as harmful (invasive) and 39 neither as exotic nor as harmful. All exotic and invasive taxa are contained in Suppl. material
Of 15'973 unique taxon names 51 % (8'489) can be found in the BOLD taxonomy database. 7'691 of these taxon names also have one or more public barcode record registered with BOLD.
Species, the units of biodiversity, have been described, named, grouped and rearranged for centuries. Scientific research has been recorded under one or another name depending on current opinion. Today's names have a history that comprises valuable hints for constructing a web of knowledge for public and scientific purposes. Names are fundamental for the correct determination of identity and for subsequent research that is based on the premise that data is obtained from a specific taxon.
Botanic gardens cultivate and store many different plant species for scientific research and education. They should maintain a high degree of consistency with current systematic research and should have adequately documented specimens of correct species identity. Factors that most likely compromise these points are insufficient financial commitment and the decline of expertise in classical botany due to reduction or omission of respective topics and training in the biological curriculum.
An approach aiming to resolve this issue, promising specimen identification on the species level using a small standardized region of the (plant) genome [DNA barcoding sensu
Botanic gardens are in their nature places where many different species are brought together that under normal circumstances would never meet (artificial sympatry). Additionally their cultivation inevitably means that they are put into a novel environment. Both factors entail complex consequences that are relevant for our understanding of plant evolution and for conservation biological projects [
How can botanic gardens maintain a high degree of consistency with current systematic research and provide authentic genetic resources? With internet access, a collection can easily be connected to all available data and thus provide local staff and scientists alike with information of particular interest to them (e.g. information on natural habitat, taxonomy, invasiveness, conservation status, all known common and scientific names, traditional usage, medicinal potential, etc.). As demonstrated in this study, the different sources agree on taxonomic names in most cases, but there is also a significant number of conflicting cases. Creating broad awareness of these cases increases the chance of research that will ultimately resolve the conflicts. In this regard, it is favorable to integrate taxonomic name status information that clearly indicates if a name should be used with caution (e.g. the status "Unresolved" of TPL) and that at the same time indicates the need for further study.
Judging from the analysed seed lists there are still many botanic collections with specimens of mostly unknown origin. This fact alone reduces scientific value considerably [
Results of taxon name verification and status check are based on data provided by third parties to one or more of the data portals used in this study. In the case of EOL, the name status is determined by EOL curators. A name that is part of one or more hierarchies is either the preferred name or it is not. Only COL and TPL provide a name with a status that is based on the respective data source. According to TPL, 98 % of all status values were directly derived from the data source that supplied the name record, while only 2 % are the result of automated conflict resolution processes. Information provided by COL is based on global species databases that "have been validated for inclusion by independent peer reviews". How many peers are necessary to validate databases that contain names of thousands of plant species from hundreds of families? Is it the few or the many peers that bring about an objective assessment of taxonomy?
The coverage of taxon names is highest (92 %) using the plant-specific portal (TPL), closely followed by that of EOL (91 %) and COL (88 %). Using the Levenshtein distance as a recovery approach is adequate for simple typos. In the case of more complex spelling variants of a name, the fuzzy algorithm [
Considering name statuses, TPL apparently offers a more heterogeneous view. The status "Unresolved", which in most cases indicates that the respective name status has not yet been assessed by the data provider, is unique and makes TPL a distinguished source for plant names. In addition, a much higher degree of name ambiguity can be detected using TPL. About 4 - 6 times more names are considered ambiguous because of the existence of homonyms, many of which are not found using COL and EOL. The primary concern of this study was to find names and associated information. Having an exhaustive name space increases the number of hits when mining for data. This, however, could also result in a more contaminated dataset. Nevertheless, having complete information on a name means being able to take all information into consideration when evaluating associated data.
It is curious to find a presumably extinct species (Euphrasia mendoncae Samp.) in a seed list of a botanic garden. The species was described as a Portuguese endemic in 1936 and was never found again. IUCN therefore classed the species as extinct. Along with this information, doubt about the identity of the Portuguese taxon is expressed in the IUCN dataset, and Euphrasia minima Jacq. ex DC. is introduced as a synonym of E. mendoncae. The presumed synonym E. minima Jacq. ex DC. appears in the IS of BGU Lautaret 2014, which reports the collection site at 2500 m altitude at Col du Galibier, France. While COL lists E. mendoncae Samp. as an accepted name, TPL lists it as Unresolved, supporting the notion of insufficient data on the identity of the species described in 1936. While E. minima Jacq. ex DC. is not listed in TPL, the name appeared as a synonym of E. officinalis subsp. officinalis L. in COL (2014) and has in the meantime (2015) been classed as a doubtful synonym of E. officinalis subsp. officinalis Jacq. ex DC. This example demonstrates that names, as important as they are, sometimes are too ambiguous to be meaningful.
Of the taxon names surveyed, 1.7 % fell into an IUCN threatened category. This proportion appears to be very low and might suggest failure of conservation initiatives that aimed to draw botanic gardens closer into the conservation network (e.g. IUCN / WWF Plants Conservation Program, 1984). However, conservation efforts are not expressed by the quantity of threatened taxa maintained by a botanic garden alone (for performance indicators for ex situ conservation facilities see
Reasons for a low proportion of threatened taxa distributed by botanic gardens might be that the distribution itself is considered to be counterproductive because it leads to extreme genetic depletion (i.e. gardens prefer to order seeds instead of collecting them from natural habitats). Another point may be the necessary expertise, which can only be established through considerable commitment of time and resources. Additionally, due to a lack of sufficient documentation and prior knowledge of conservational aspects [
Depending on the source, the distribution of a taxon can be determined at the continental, country or even regional level. Unfortunately, the form of geographic information is not consistent. Besides the TDWG scheme [
On the 1st of January 2015 an EU regulation on the prevention and management of the introduction and spread of invasive alien species (No. 1143/2014) came into force. It aims to address the adverse impact alien invasive species have on biodiversity, ecosystem services, human health and the economy in the EU Member States. Botanic gardens, without doubt, create artificial situations for species. Hybridization, as one possible consequence, has been shown to be an important factor in evolution [
Botanic gardens represent one of the major sources of plant genetic resources. To be useful for education or scientific research, the material needs to be authentic and properly documented. Particularly in a highly specialized scientific world, where experience in the determination of species has become rare, other methods need to be considered. The true (i.e. accepted) name of a taxon is essential in many ways. To verify the identity of a specimen, the true name will lead to a diagnostic description. With a verified specimen, studies can commence and DNA-based authentication assays can be established. True names lead to alternative names that lead to additional information. To assess the conservational, horticultural or medicinal value of a collection, true names lead to the respective data that can be evaluated. To measure progress towards a certain goal, only true names are relevant. In this study, I evaluated names of taxa offered in seed catalogues, demonstrating that there are still significant numbers of taxonomic issues that need to be resolved before true names will be available. Furthermore, I demonstrated that The Plant List Version 1.1 is currently the most complete and informative checklist for plant species names. Providers of critical data (e.g. CITES, IUCN Red List, invasive or toxic species catalogues) should make sure to include a complete list of known names for a particular taxon to maximize data exposure.
With increasing level of publicly available data through portals like EOL and publishers supporting open data sharing [e.g.
Thanks go to all the people involved in the creation and maintenance of the data portals used, as well as to those standing behind the actual data providers, to all the botanic gardens listed in Suppl. material 1 for providing the seed lists and finally to Prof. Peter Nick for supporting my work.
Table containing all botanic gardens included in this study
Table containing name status information of 16'224 taxonomic names retrieved by exact match inquiry from EOL, TPL and COL.
Table containing name status information of names that were not found at EOL, TPL and COL using exact name search.
Table containing discrepancies of pair-wise comparison of taxonomic name status information provided by EOL, TPL and COL
Table containing taxon names with IUCN Red List status of taxa detected in seed catalogues of botanic gardens
Table containing information on geographic distribution of plant genetic resources distributed by botanic gardens. Retrieved from EOL, TPL and COL.
Table containing information on geographic distribution of plant genetic resources distributed by botanic gardens. Merged data retrieved from EOL, TPL and COL.
Table containing taxon names and GISIN status information of exotic and invasive (exotic and harmful) taxa detected in seed catalogues of botanic gardens
Table containing taxon names of European alien species detected in seed catalogues of botanic gardens