Bees Only Please: Selecting Hundreds of Thousands of Possible Bee Interactions Using a Laptop, Open Datasets, and Small (but Mighty) Commandline Tools.
As pollinators, bees play a major role in food production and ecosystems. Many datasets describing bee-plant interactions are becoming openly available. In their partnership, the Big Bee project [1] and Global Biotic Interactions (GloBI) [2] work together to facilitate access to these valuable bee interaction records. Here, we outline a method for accessing biotic interactions that involve Anthophila, commonly known as bees. These interactions include records of bees visiting flowers, parasites of bees, and hosts of socially parasitic bees. Using commonly available tools and known open resources, we show how to build an interaction dataset that involves only Anthophila (bees).
Ingredients
- GloBI’s verbatim interaction records [3]
- DiscoverLife Bee Checklist as made available via Nomer [4], [5]
- Lenovo T480s Laptop running Ubuntu 22.04.1 LTS
- Familiarity with the Unix Shell [6], [7]
- GloBI’s Interaction Data Review Report Framework
Selecting Bee Interactions
Figure 1 shows an approach to selecting biotic interactions that involve bees from a verbatim interactions table in GloBI’s Interpreted Data Products.
First (Fig 1a), the taxonomic names are parsed using GBIF’s taxonomic name parser.
Next (Fig 1b), the parsed names are aligned with the DiscoverLife Bee Taxonomic Checklist as defined in Nomer’s Corpus of Taxonomic Resources.
Then (Fig 1c), interactions are only included when a known bee family name (i.e., Andrenidae, Apidae, Colletidae, Halictidae, Megachilidae, Melittidae, Stenotritidae) is mentioned.
Finally (Fig 1d), a review report is generated by re-using GloBI’s review script check-dataset.sh @09d2f68 with the subset of biotic interactions that mention at least one bee name per interaction.
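The alignment and selection steps (Fig 1b and 1c) can be illustrated with a toy sketch. This is not the author's script and it does not call Nomer: a plain awk lookup against a hypothetical two-column checklist (name, family) stands in for the name alignment, and the three-column row layout (source name, interaction type, target name) is an assumption made for illustration. Only the list of bee family names is taken from the text; the function names `append_family` and `select_bee_rows` are made up.

```shell
#!/usr/bin/env bash
# Toy stand-in for steps (b) and (c). The real workflow aligns names with
# Nomer; here an awk lookup against a hypothetical checklist plays that role.

# Bee family names as listed in the text.
BEE_FAMILIES='Andrenidae|Apidae|Colletidae|Halictidae|Megachilidae|Melittidae|Stenotritidae'

# append_family CHECKLIST
# Reads TSV rows on stdin (assumed layout: source name, interaction type,
# target name) and appends the checklist family of the source and of the
# target name. Names missing from the checklist get an empty field.
append_family() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { family[$1] = $2; next }    # first input: load the checklist
    { print $0, family[$1], family[$3] }   # stdin: append both lookups
  ' "$1" -
}

# select_bee_rows
# Keeps rows that mention at least one bee family name.
# -E enables the alternation; -w matches whole words only.
select_bee_rows() {
  grep -wE "$BEE_FAMILIES"
}

# Example with made-up rows and a one-entry toy checklist.
checklist=$(mktemp)
printf 'Apis mellifera\tApidae\n' > "$checklist"
printf '%s\t%s\t%s\n' \
  'Apis mellifera' 'visitsFlowersOf' 'Trifolium repens' \
  'Ixodes scapularis' 'parasiteOf' 'Peromyscus leucopus' \
  'Varroa destructor' 'parasiteOf' 'Apis mellifera' \
  | append_family "$checklist" | select_bee_rows
rm -f "$checklist"
```

In this example the two rows that involve Apis mellifera pass the filter, whether the bee is the source or the target of the interaction, and the tick row is dropped. Matching on family names after alignment, rather than on the raw names, is what lets a short fixed pattern cover every bee species in the checklist.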
This approach was implemented in a bash script, select-bee-interactions.sh. This script combines openly available tools (e.g., GloBI’s Nomer [5], GNU’s Grep) to process openly available interaction data [3].
This script was run on two different interaction datasets (GloBI v0.6/v0.7) and the results can be found at https://github.com/Big-Bee-Network/select-bee-interactions.sh and in the table below.
output / version | GloBI Data v0.7 | GloBI Data v0.6 |
input data | verbatim-interactions.tsv 13.9M records | verbatim-interactions.tsv 20.0M records |
filtered interactions | bees-only-interactions.tsv 1.4M records (also here) | bees-only-interactions.tsv 1.1M records (also here) |
review report | bees-only-review.pdf (also here) | bees-only-review.pdf (also here) |
These results suggest that relatively complex and custom biodiversity data integrations with large datasets (>1M records) can be achieved on modest hardware by combining carefully selected tools, openly available datasets, the nimble yet powerful Unix shell, and available data publication platforms.
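One reason this works on a laptop is that shell pipelines stream their input line by line, so a table with millions of rows never has to fit in memory. The sketch below counts records in a compressed table and computes the share of records kept by a filter. It assumes a gzip-compressed TSV with a single header line; the function names `count_records` and `bee_share` are hypothetical and are not part of select-bee-interactions.sh.

```shell
#!/usr/bin/env bash
# count_records FILE
# Counts data rows in a gzip-compressed TSV, assuming line 1 is a header.
# The file is decompressed as a stream and never fully loaded into memory.
count_records() {
  gzip -dc < "$1" | tail -n +2 | wc -l | tr -d ' '
}

# bee_share TOTAL SUBSET
# Prints the percentage of TOTAL records that SUBSET represents.
bee_share() {
  awk -v total="$1" -v subset="$2" \
    'BEGIN { printf "%.1f%%\n", 100 * subset / total }'
}

# Rounded record counts from the table for GloBI Data v0.6.
bee_share 20000000 1100000   # prints 5.5%
```

With the rounded counts reported for v0.7 (1.4M of 13.9M records), the same calculation gives about 10.1%.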
The output from this work is currently being used by Big Bee to create a global dataset of bee interactions that includes taxon name alignment based on DiscoverLife. That dataset currently includes over half a million records of global bee species interacting with plants, parasites and other organisms [8].
Future Work
I suspect I’ll be reusing the select-bee-interactions.sh script in the future to create similar custom data processing workflows... and I am curious how others approach the challenge of integrating biodiversity data to answer a specific (research) question. So, I’ll close out by asking a question:
What other approach can you think of to get an extensive list of species interaction data involving bees compiled from known and open datasets?
References
[1] Seltmann, K., Allen, J., Brown, B. V., Carper, A., Engel, M. S., Franz, N., et al. (2021). Announcing Big-Bee: An initiative to promote understanding of bees through image and trait digitization. Biodiversity Information Science and Standards, 5(e74037). doi:10.3897/biss.5.74037 Retrieved from https://escholarship.org/uc/item/0937b5gp
[2] Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.
[3] GloBI Community. (2024). Global Biotic Interactions: Interpreted Data Products hash://md5/946f7666667d60657dc89d9af8ffb909 hash://sha256/4e83d2daee05a4fa91819d58259ee58ffc5a29ec37aa7e84fd5ffbb2f92aa5b8 (0.7) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11552565
[4] Ascher, J. S. and J. Pickering. 2024. Discover Life bee species guide and world checklist (Hymenoptera: Apoidea: Anthophila). http://www.discoverlife.org/mp/20q?guide=Apoidea_species.
[5] Poelen, J. H. (Ed.). (2024). Nomer Corpus of Taxonomic Resources hash://sha256/83617875e84bb8ae7ac2a257ad50eb8e82d8935d975f465b8ee8f3a803f72b48 hash://md5/c639d7e3fcd5603f6c48e9d5e6c49672 (0.24) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11105453
[6] Danielle Kane (Ed.), Anna Oates (Ed.), John Wright (Ed.), Nilani Ganeshwaran (Ed.), Tim Dennis (Ed.), Belinda Weaver (Ed.), James Baker, Christopher Erdmann, Dan Michael Heggø, Katrin Leinweber, hugolio, … Vikram Chhatre. (2019, July). LibraryCarpentry/lc-shell: Library Carpentry: The UNIX Shell, June 2019 (Version v2019.06.1). Zenodo. https://doi.org/10.5281/zenodo.3266085 Accessed at https://librarycarpentry.org/lc-shell/ on 2024-06-11.
[7] Seltmann & Poelen. 2021. A Practical Exploration of Biotic Interaction Data Management and Information Retrieval through Terrestrial Parasite Tracker (TPT) and Global Biotic Interactions (GloBI) [Workshop]. Zenodo. doi:10.5281/zenodo.4759060 Accessed at https://www.globalbioticinteractions.org/interaction-data-workshop/03-ixodes-whole-dataset/ on 2024-06-11.
[8] Katja C. Seltmann, & Global Biotic Interaction Community. (2024). Global Bee Interaction Data (v3.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10552937