Project:Curation workflow
Manual annotation steps
- Find descriptions of symbiotic interactions in the literature, see current backlog: Project:New studies to add
- Check whether taxon items for the symbiotic interaction partners already exist in the database (search for taxon names and potential alternative names)
- If not, create taxon items, linking the NCBI taxon ID Property:P11 if available
- From the item representing the host, link to the symbiont with an 'interacts with' Property:P19 statement
- Add the following information about the symbiotic interaction as qualifiers to the 'interacts with' statement. Skip if not known or not specified in the publication.
  - Localization of symbiont in host cell/body, Property:P20
  - Analytical methods used to identify the symbiont, Property:P22, e.g. light microscopy, metagenome sequencing, phylogenetic marker sequencing
  - Analytical methods used to identify the host, Property:P42, e.g. microbiological culture, light microscopy
  - Nature/outcome of the interaction, Property:P26 (under development)
- If the interaction is not found in nature but the result of experimental manipulation, use the 'interacts experimentally with' Property:P41 statement instead, with the same qualifiers.
- Cite the source of the information in a reference, using the reference DOI Property:P27 (without the 'doi:' prefix). Reference items will be semi-automatically created/linked later.
- Add information about the environment where the organism was found. For cultured organisms, this should reflect the original environment where the isolate was collected, if known.
  - Environmental material of origin Property:P36
  - Broad environmental context Property:P38
  - Local environmental context Property:P40
- The order in which properties are displayed is set at MediaWiki:Wikibase-SortedProperties; otherwise, statements are displayed in the order they are added.
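The annotation steps above can be sketched as plain data. This is a minimal, hypothetical sketch and not the format expected by the Wikibase API: the function names, the dictionary layout and the item IDs `Q1`, `Q2` and `Q5` are assumptions made for illustration. Only the property IDs are taken from the steps above.

```python
import re

# Property IDs as listed in the manual annotation steps.
P_INTERACTS_WITH = "P19"
P_INTERACTS_EXPERIMENTALLY_WITH = "P41"
P_REFERENCE_DOI = "P27"

# Matches a leading 'doi:' label or a doi.org resolver URL prefix.
_DOI_PREFIX = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI, without a 'doi:' or resolver prefix."""
    return _DOI_PREFIX.sub("", raw.strip())


def build_interaction_claim(host, symbiont, doi, qualifiers=None, experimental=False):
    """Describe one host-to-symbiont statement as a plain dict (hypothetical layout)."""
    prop = P_INTERACTS_EXPERIMENTALLY_WITH if experimental else P_INTERACTS_WITH
    # Qualifiers that are unknown or not specified in the publication are skipped.
    kept = {p: v for p, v in (qualifiers or {}).items() if v}
    return {
        "subject": host,
        "property": prop,
        "value": symbiont,
        "qualifiers": kept,
        "reference": {P_REFERENCE_DOI: normalize_doi(doi)},
    }


if __name__ == "__main__":
    # Q1 (host), Q2 (symbiont) and Q5 (body part) are placeholder item IDs.
    claim = build_interaction_claim(
        "Q1", "Q2", "doi:10.1000/xyz", qualifiers={"P20": "Q5", "P26": None}
    )
    print(claim)
```

Passing `experimental=True` switches the statement to Property:P41 while keeping the same qualifiers, mirroring the rule for experimentally induced interactions.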
Automated maintenance tasks
(Under development) The following tasks are (semi-)automated, so they do not need to be done manually. Scripts are currently run ad hoc.
- Scripts to create new reference items from DOIs in reference statements and link them
Annotation guidance
NCBI Taxon IDs and representative sequences
(details to come)
Higher taxonomy (P29, P32)
Interaction claims (P19, P41) and their qualifiers
Subject body part (P20)
Terms should be mapped to Gene Ontology or Uberon.
Methods used to identify interaction partners (P22, P42)
Interaction type (function, outcome) (P26)
Methods used to characterize interaction type (P45)
References that support or refute a given claim (P23, P43)
Environment terms (P36, P38, P40)
Follow the guidelines for using EnvO terms in the MIxS standard: https://github.com/EnvironmentOntology/envo/wiki/ENVO-annotations-for-MIxS-v5
Use subclasses/instances of the following classes with the respective properties. The items within each class should be mapped to subclasses/instances of the corresponding EnvO terms with Property:P37.
Create new environment term items if necessary. These should be drawn from EnvO and mapped to the equivalent EnvO term. Within PPSDB, the environment term should be filed as a subclass of one of the three classes below. For now, it is not necessary to replicate the subclass structure of EnvO within PPSDB.
Property | Class | EnvO term |
---|---|---|
Property:P36 environmental material of origin | Item:Q1597 environmental material | http://purl.obolibrary.org/obo/ENVO_00010483 |
Property:P38 environmental system of origin | Item:Q1602 environmental system | http://purl.obolibrary.org/obo/ENVO_01000254 |
Property:P40 local environmental context | Item:Q1673 astronomical body part | http://purl.obolibrary.org/obo/ENVO_01000813 |
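
The table above can also be expressed as a lookup for checking that an environment term is used with the right property. This is a hedged sketch: the function names and the idea of passing in the set of classes a term is filed under are assumptions for illustration, and retrieving that set from PPSDB is not shown.

```python
# Property -> (PPSDB class item, corresponding EnvO term), copied from the table.
ENVIRONMENT_CLASSES = {
    "P36": ("Q1597", "http://purl.obolibrary.org/obo/ENVO_00010483"),
    "P38": ("Q1602", "http://purl.obolibrary.org/obo/ENVO_01000254"),
    "P40": ("Q1673", "http://purl.obolibrary.org/obo/ENVO_01000813"),
}


def expected_class(prop: str) -> str:
    """Return the PPSDB class item that values of an environment property fall under."""
    try:
        return ENVIRONMENT_CLASSES[prop][0]
    except KeyError:
        raise ValueError(f"{prop} is not an environment property") from None


def is_valid_environment_value(prop: str, filed_under: set) -> bool:
    """Check a term against the table, given the classes it is filed under.

    `filed_under` is a hypothetical input: the set of class item IDs of which
    the term is a subclass or instance.
    """
    return expected_class(prop) in filed_under


if __name__ == "__main__":
    # A term filed under Q1597 (environmental material) fits P36 but not P38.
    print(is_valid_environment_value("P36", {"Q1597"}))
    print(is_valid_environment_value("P38", {"Q1597"}))
```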
References without DOIs
If a publication does not have a DOI, a reference item has to be created and linked manually. If the full text is available online, specify the URL Property:P15, otherwise give the citation as free text Property:P14.
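
The choice between the two properties can be written as a small rule. This is only an illustrative sketch: the function name and the returned dictionary are hypothetical, and creating the reference item itself still has to be done manually.

```python
def manual_reference_fields(url=None, citation=None):
    """Pick the property for a reference item that has no DOI.

    Prefers the full-text URL (P15) and falls back to a free-text citation (P14).
    """
    if url:
        return {"P15": url}
    if citation:
        return {"P14": citation}
    raise ValueError("a reference without a DOI needs a URL or a free-text citation")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(manual_reference_fields(url="https://example.org/fulltext"))
    print(manual_reference_fields(citation="Author (year) Title. Journal."))
```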
Representative sequence records (P34, P46)
- A representative sequence is important to clarify the identity of taxa, particularly informally named ones, in the case of updates or changes to classification and taxonomy.
- The representative sequence record must be one that is stated in a publication that describes the interaction/organism. The publication DOI/item should be cited in a reference statement.
- Instances of taxon without an equivalent NCBI taxon must have a representative SSU rRNA or genome sequence accession, to ensure that the taxon concept can be reconstructed later.
- For SSU rRNA sequences Property:P34, use the GenBank accession, with or without the version suffix. This is typically one that is directly stated in a publication as a single accession or within a range of accession IDs.
- For genome data Property:P46, publications may cite various identifiers (BioProject, BioSample, WGS, GenBank), so the accession for the genome sequence itself may not appear directly in the cited publication. The preferred accession to cite is either the assembly (GCA_ prefix) or the WGS contig set accession.
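
A rough format check can catch typos before an accession is entered. The sketch below is an approximation under stated assumptions: the regular expressions reflect commonly documented accession shapes (GenBank nucleotide: one letter plus five digits, or two letters plus six or eight digits; assembly: `GCA_` plus nine digits; WGS: four or six letters followed by digits), the function name is hypothetical, and a match does not prove that the record exists.

```python
import re

# Optional version suffix such as ".1"
_VERSION = r"(?:\.\d+)?"

# Approximate accession shapes (assumptions, see lead-in).
_PATTERNS = [
    ("assembly", re.compile(r"GCA_\d{9}" + _VERSION)),
    ("genbank", re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})" + _VERSION)),
    ("wgs", re.compile(r"(?:[A-Z]{4}\d{8,10}|[A-Z]{6}\d{9,11})" + _VERSION)),
]


def classify_accession(accession: str):
    """Return 'assembly', 'genbank', 'wgs', or None if no pattern matches."""
    acc = accession.strip().upper()
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(acc):
            return kind
    return None


if __name__ == "__main__":
    # Example strings chosen only for their shape.
    for acc in ["AB123456", "AB123456.1", "GCA_000001405.15", "AAAA01000001", "not-an-id"]:
        print(acc, classify_accession(acc))
```

Under this sketch, an accession classified as `assembly` or `wgs` would be a candidate for Property:P46, while a `genbank` accession may suit Property:P34 if the record is an SSU rRNA sequence.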
Images (P33)
(Experimental)
- Open-licensed images hosted on Wikimedia Commons may be linked to taxon items. A thumbnail of that image should appear as a preview in PPSDB, and can also be used in visualizations in the SPARQL query service.
- The images should be micrographs or illustrations that depict the named organisms.
- The corresponding item in Commons should be annotated with the organisms it depicts (under Structured Data).
- If the organism is described in an open access publication published under a suitable license (e.g. CC-BY-4.0), the figures can be uploaded to Commons and then linked to the PPSDB item.