Main Page: Difference between revisions

From Protist-Prokaryote Symbiosis Database
(updates section)
 
(31 intermediate revisions by the same user not shown)
Line 1: Line 1:
== Introduction ==
== Introduction ==


This project aims to describe symbiotic interactions between protists and prokaryotes as structured data.
This project aims to describe symbiotic interactions between protists and prokaryotes as [https://en.wikipedia.org/wiki/Linked_data Linked Open Data]. Information in the database is compiled from the scientific literature; ~700 symbiotic interactions described in ~380 scientific publications are represented at present. The database is hosted on [https://www.wikibase.cloud/ Wikibase Cloud], a hosting service for Wikibase instances provided by [https://www.wikimedia.de/ Wikimedia Deutschland].


Envisioned use cases include:
What can the database be used for?
* Search and browse symbiotic interactions by biological taxonomy, leveraging cross-references to external taxonomies (e.g. NCBI Taxonomy, Catalogue of Life)
* Search and browse symbiotic interactions by biological taxonomy, leveraging cross-references to external taxonomies and databases
* Find interactions that are described in earlier literature but not yet studied with modern methods
* Find interactions that were described in earlier literature but not yet studied with modern methods
* Programmatically find new literature to update the database, by querying the NCBI databases using linked NCBI taxon IDs
* Programmatically find sequence data, literature, etc. by querying the NCBI databases using linked NCBI taxon IDs
* Share data with [https://www.globalbioticinteractions.org/ GloBI] through periodic data exports
* Share data to be indexed in [https://www.globalbioticinteractions.org/ GloBI], through [https://github.com/kbseah/ppsdb-globi-export periodic data exports]


Interactions are described with the following statements, roughly aligned with the GloBI terms:
Documentation of the workflow and project administration is linked from [[Project:Main]]. Your questions may be answered at [[Project:Q&A]].
* Taxonomy of hosts and symbionts, with links to external databases (primarily NCBI)
* Localization of symbionts in cellular compartments of the host cell, using Gene Ontology terms
* Nature of biotic interactions, if this is known, using Relation Ontology terms (although there are some limitations in this ontology for describing mutualistic interactions)
* Analytical methods used to study the symbioses


This project originated as part of my [http://nbn-resolving.de/urn:nbn:de:gbv:46-00106172-12 doctoral dissertation] (2017).
==Updates==
 
Similar projects elsewhere:
* [https://www.globalbioticinteractions.org Global Biotic Interactions (GloBI)], an aggregator for biotic interactions datasets across all domains of life
* [https://github.com/ramalok/PIDA Protist Interaction Database (PIDA)], [https://doi.org/10.1038/s41396-019-0542-5 Bjorbækmo et al., 2019] (last updated 2018)
* [https://github.com/FloraVincent/DIDB Diatom Interaction Database (DIDB)] (last updated 2019)


* 2024-09-21 : Addshore, the original developer of the precursor to Wikibase.cloud (WBStack), wrote a [https://addshore.com/2024/09/2-years-of-wikibase-cloud-by-wmde/ blog post] reflecting on the past two years since the project was transferred to Wikimedia Deutschland. I contributed a short user testimonial.
* 2024-08-26 : Released a [https://doi.org/10.32942/X2ZW4S preprint manuscript] describing the database motivation and design, intended for biologists who may use the database or want to build similar projects


==Explore the data==
==Explore the data==
Line 30: Line 23:
* [[Item:Q206|''Pelomyxa palustris'']], a freshwater amoebozoan
* [[Item:Q206|''Pelomyxa palustris'']], a freshwater amoebozoan
* [[Item:Q296|''Mixotricha paradoxa'']], itself a hindgut symbiont of the termite [[Item:Q298|''Mastotermes darwiniensis'']]
* [[Item:Q296|''Mixotricha paradoxa'']], itself a hindgut symbiont of the termite [[Item:Q298|''Mastotermes darwiniensis'']]
* [[Item:Q141|''Candidatus'' Megaira polyxenophila]], a bacterial endosymbiont of a diverse array of protists and algae
Each interaction statement is supported by one or more references to the scientific literature, linked by their DOI if available.


Each interaction statement is supported by one or more references to the scientific literature, linked by their DOI.
Entities (e.g. taxa, publications) can be found by a '''free-text search''' of their labels with the search bar (top of each page) or at [[Special:Search]].


Use the Query Service (link on menu bar) to launch SPARQL queries; try the [[Project:SPARQL/examples|example queries]] to get started.
'''Semantic queries''', which make use of the data model (e.g. "find host species where symbionts are Alphaproteobacteria"), require the use of the SPARQL query language. Use the Query Service (link on menu bar) to launch SPARQL queries; try the [[Project:SPARQL/examples|example queries]] to get started.


== Q & A ==
==Data linking==


; Why use a single 'interacts with' statement, with qualifiers for interaction type, instead of different properties for each interaction type?
We wish to capture different facets of symbiotic interactions, and link these to other databases and ontologies.
: The nature of an interaction is often not fully understood, or may have multiple facets. Coding interaction types as qualifiers allows us to stack multiple functional roles on a single interaction.
 
; What is a [[Item:Q56|placeholder taxon]]?
{| class="wikitable"
: We would like to model taxonomic relationships ("find taxa that are members of Ciliophora") and also link out to external databases, particularly NCBI. However, there is often a discrepancy between NCBI Taxonomy and the "actual" taxonomy.
! Information
: For example, the [[Item:Q22|brown ciliate]] is reported as a Parduczia sp. based on sequence analysis, but the sequences from that study are published under an environmental "ciliate metagenome" identifier on NCBI.
! Relevant database or ontology
: Therefore, the item "brown ciliate" is modeled here as a provisional taxon, because it does not have an exact equivalent in the NCBI Taxonomy.
|-
: For [[Item:Q301|Bacteroidales sp. Cc3-010 ectosymbiont of Caduceia versatilis]], an SSU rRNA sequence has been published, but it is currently placed in a provisional "taxon" in the NCBI Taxonomy. The property [[Property:P28|P28]] is used to link to a representative sequence for disambiguation.
| Taxonomy of interacting organisms
; Why do some "interacts with" statements link to "unknown value"?
| [https://www.ncbi.nlm.nih.gov/taxonomy NCBI Taxonomy], [https://www.wikidata.org/ Wikidata]
: If a symbiont is reported only on the basis of microscopy, without any information on its phylogenetic affiliation, "unknown value" is used for the object of the statement.
|-
: If some information is known about its likely taxonomy, e.g. through use of group-specific FISH probes, then a placeholder taxon is created with a temporary name.
| Localization of symbionts in host organism
: The entry [[Item:Q448|''Metopus contortus'']] has both an "unknown value" statement and a placeholder symbiont taxon [[Item:Q442|Q442]].
| [https://geneontology.org/ Gene Ontology], [https://www.ebi.ac.uk/ols4/ontologies/uberon Uberon]
; Why host this on Wikibase?
|-
: This database has seen a number of iterations: it started as a table in a word processor file, then moved to spreadsheets, a custom SQLite database, and an attempt to homebrew a structured database with XML files and Python scripts. After getting some experience on Wikidata, I found that Wikibase offers the key features that I wanted: flexible and extensible schemata, a graphical frontend for manual data entry, options for programmatic data import from tables, integration with external databases, and a sophisticated query interface.
| Nature of biotic interactions, if known/inferred
; What is the beautiful organism depicted in the logo?
| [https://oborel.github.io/ Relation Ontology]
: [[Item:Q7|Kentrophoros sp. H]]
|-
: (The logo may not be visible in the mobile version of this site.)
| Analytical methods used to identify organisms or interaction type
| [https://obi-ontology.org/ OBI], [https://www.evidenceontology.org/ Evidence Ontology]
|-
| Environment where organisms were isolated
| [https://sites.google.com/site/environmentontology/ Environment Ontology]
|-
| Representative SSU rRNA or genome sequences
| [https://www.ncbi.nlm.nih.gov/genbank/ GenBank]
|-
| Scientific publications describing symbiosis
| DOI, [https://www.wikidata.org/ Wikidata]
|}
 
Terms will be linked to other linked open data or ontologies, if there is a suitable exact match. The [https://www.ebi.ac.uk/ols4/ EMBL-EBI Ontology Lookup Service] is a useful resource for browsing life science related ontologies.
 
==Links==
 
This project originated as part of my [http://nbn-resolving.de/urn:nbn:de:gbv:46-00106172-12 doctoral dissertation] (2017).
 
Similar projects elsewhere:
* [https://www.globalbioticinteractions.org Global Biotic Interactions (GloBI)] ([https://doi.org/10.1016/j.ecoinf.2014.08.005 Poelen et al., 2014]), an aggregator for biotic interactions datasets across all domains of life
* [https://github.com/ramalok/PIDA Protist Interaction Database (PIDA)] ([https://doi.org/10.1038/s41396-019-0542-5 Bjorbækmo et al., 2019]) (last updated 2018)
* [https://github.com/FloraVincent/DIDB Diatom Interaction Database (DIDB)] ([https://doi.org/10.1128/msystems.00444-19 Vincent & Bowler, 2020]) (last updated 2019)
* [http://www.aquasymbio.fr/ AQUASYMBIO] (last updated 2017)

Latest revision as of 09:41, 23 September 2024

Introduction

This project aims to describe symbiotic interactions between protists and prokaryotes as Linked Open Data. Information in the database is compiled from the scientific literature; ~700 symbiotic interactions described in ~380 scientific publications are represented at present. The database is hosted on Wikibase Cloud, a hosting service for Wikibase instances provided by Wikimedia Deutschland.

What can the database be used for?

  • Search and browse symbiotic interactions by biological taxonomy, leveraging cross-references to external taxonomies and databases
  • Find interactions that were described in earlier literature but not yet studied with modern methods
  • Programmatically find sequence data, literature, etc. by querying the NCBI databases using linked NCBI taxon IDs
  • Share data to be indexed in GloBI, through periodic data exports
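
As a rough illustration of the NCBI use case in the list above, the sketch below builds an NCBI E-utilities `esearch` request for sequence records assigned to a taxon ID and its descendants. The function names and the choice of the `nuccore` database are assumptions made for this example, and the taxon ID 9606 (Homo sapiens) is only a well-known placeholder, not an entry taken from this database.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(taxon_id: int, db: str = "nuccore", retmax: int = 20) -> str:
    """Return an esearch URL for records under an NCBI taxon ID.

    `txid<ID>[Organism:exp]` matches the taxon and all of its descendants.
    The helper name and defaults are hypothetical, chosen for this sketch.
    """
    if taxon_id <= 0:
        raise ValueError("NCBI taxon IDs are positive integers")
    params = {
        "db": db,
        "term": f"txid{taxon_id}[Organism:exp]",
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def fetch_record_ids(taxon_id: int, db: str = "nuccore") -> list[str]:
    """Run the search over the network and return the matching record IDs."""
    with urlopen(build_esearch_url(taxon_id, db), timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Placeholder taxon ID; substitute an NCBI taxon ID linked from an item.
    print(build_esearch_url(9606))
```

In practice the taxon ID would be read from the NCBI Taxonomy cross-reference on an item, and requests should respect NCBI's usage guidelines for E-utilities.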

Documentation of the workflow and project administration is linked from Project:Main. Your questions may be answered at Project:Q&A.

Updates

  • 2024-09-21 : Addshore, the original developer of the precursor to Wikibase.cloud (WBStack), wrote a blog post reflecting on the past two years since the project was transferred to Wikimedia Deutschland. I contributed a short user testimonial.
  • 2024-08-26 : Released a preprint manuscript describing the database motivation and design, intended for biologists who may use the database or want to build similar projects

Explore the data

Some example entries to see how the data are modeled:

Each interaction statement is supported by one or more references to the scientific literature, linked by their DOI if available.

Entities (e.g. taxa, publications) can be found by a free-text search of their labels with the search bar (top of each page) or at Special:Search.

Semantic queries, which make use of the data model (e.g. "find host species where symbionts are Alphaproteobacteria"), require the use of the SPARQL query language. Use the Query Service (link on menu bar) to launch SPARQL queries; try the example queries to get started.
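
To give a sense of what such a semantic query might look like, the sketch below assembles a SPARQL query for "hosts whose symbionts belong to a given clade" and the corresponding request URL. Everything specific here is an assumption for illustration: the base URL `https://example.wikibase.cloud`, the property IDs `P1` ("interacts with") and `P2` ("parent taxon"), and the clade item `Q3` are placeholders, not the identifiers actually used in this database, and the `/query/sparql` path follows the usual Wikibase Cloud layout. See the example queries page for the real ones.

```python
import re
from urllib.parse import urlencode

PROPERTY_ID = re.compile(r"^P\d+$")
ITEM_ID = re.compile(r"^Q\d+$")


def build_host_query(base_url: str, interacts_with: str,
                     parent_taxon: str, clade: str) -> str:
    """Build a SPARQL query listing hosts whose symbionts fall within a clade.

    All identifiers are supplied by the caller; the values used in the demo
    below are hypothetical placeholders.
    """
    if not (PROPERTY_ID.match(interacts_with) and PROPERTY_ID.match(parent_taxon)):
        raise ValueError("property IDs must look like 'P123'")
    if not ITEM_ID.match(clade):
        raise ValueError("item IDs must look like 'Q123'")
    base = base_url.rstrip("/")
    # The '*' property path follows the parent-taxon link zero or more times,
    # so symbionts at any rank below the clade are matched.
    return f"""PREFIX wd: <{base}/entity/>
PREFIX wdt: <{base}/prop/direct/>
SELECT DISTINCT ?host ?symbiont WHERE {{
  ?host wdt:{interacts_with} ?symbiont .
  ?symbiont wdt:{parent_taxon}* wd:{clade} .
}}"""


def build_request_url(base_url: str, query: str) -> str:
    """Return a GET URL for the query service (endpoint path is an assumption)."""
    params = urlencode({"query": query, "format": "json"})
    return f"{base_url.rstrip('/')}/query/sparql?{params}"


if __name__ == "__main__":
    query = build_host_query("https://example.wikibase.cloud", "P1", "P2", "Q3")
    print(query)
    print(build_request_url("https://example.wikibase.cloud", query))
```

The same query text can be pasted directly into the Query Service once the placeholder identifiers are replaced.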

Data linking

We wish to capture different facets of symbiotic interactions, and link these to other databases and ontologies.

Information                                                       | Relevant database or ontology
Taxonomy of interacting organisms                                 | NCBI Taxonomy, Wikidata
Localization of symbionts in host organism                        | Gene Ontology, Uberon
Nature of biotic interactions, if known/inferred                  | Relation Ontology
Analytical methods used to identify organisms or interaction type | OBI, Evidence Ontology
Environment where organisms were isolated                         | Environment Ontology
Representative SSU rRNA or genome sequences                       | GenBank
Scientific publications describing symbiosis                      | DOI, Wikidata

Terms will be linked to other linked open data or ontologies, if there is a suitable exact match. The EMBL-EBI Ontology Lookup Service is a useful resource for browsing life science related ontologies.

Links

This project originated as part of my doctoral dissertation (2017).

Similar projects elsewhere: