Project:Curation workflow

From Protist-Prokaryote Symbiosis Database

Manual annotation steps

  • Find descriptions of symbiotic interactions in the literature; see the current backlog at Project:New studies to add
  • Check whether taxon items for the symbiotic interaction partners already exist in the database (search for taxon names and potential alternative names)
  • If not, create taxon items, linking the NCBI taxon ID Property:P11 if available
  • From the item representing host, link to the symbiont with an 'interacts with' Property:P19 statement
  • Add the following information about the symbiotic interaction as qualifiers to the 'interacts with' statement. Skip any that are not known or not specified in the publication:
      • Localization of symbiont in host cell/body, Property:P20
      • Analytical methods used to identify the symbiont, Property:P22, e.g. light microscopy, metagenome sequencing, phylogenetic marker sequencing
      • Analytical methods used to identify the host, Property:P42, e.g. microbiological culture, light microscopy
      • Nature/outcome of the interaction, Property:P26 (under development)
  • If the interaction is not found in nature but is the result of experimental manipulation, use the 'interacts experimentally with' Property:P41 statement instead, with the same qualifiers.
  • Cite the source of the information in a reference, using the reference DOI Property:P27 (without the 'doi:' prefix). Reference items will be semi-automatically created/linked later.
  • Add information about the environment where the organism was found. For cultured organisms, this should reflect the original environment where the isolate was collected, if known.
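
The shape of a single interaction claim produced by the steps above can be sketched as plain data. This is a minimal illustration in Python, not the tooling actually used for PPSDB: the function names, the dictionary layout and the item IDs such as `Q_HOST` are hypothetical placeholders, and only the property IDs are taken from this page. Stripping resolver URLs from DOIs goes slightly beyond the instruction above, which mentions only the 'doi:' prefix.

```python
# Hypothetical sketch: function names and dict layout are illustrative only.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Return a bare DOI without 'doi:' prefix, resolver URL or padding."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def build_interaction_claim(host, symbiont, doi, experimental=False,
                            localization=None, symbiont_methods=(),
                            host_methods=(), interaction_type=None):
    """Assemble one host -> symbiont claim with qualifiers and a reference."""
    # P41 for experimentally induced interactions, P19 otherwise.
    prop = "P41" if experimental else "P19"
    candidates = {
        "P20": [localization] if localization else [],
        "P22": list(symbiont_methods),
        "P42": list(host_methods),
        "P26": [interaction_type] if interaction_type else [],  # under development
    }
    # Qualifiers that are unknown or unspecified are skipped entirely.
    qualifiers = {pid: vals for pid, vals in candidates.items() if vals}
    return {
        "subject": host,
        "property": prop,
        "value": symbiont,
        "qualifiers": qualifiers,
        "references": [{"P27": normalize_doi(doi)}],
    }
```

For example, `build_interaction_claim("Q_HOST", "Q_SYMBIONT", "doi:10.1000/xyz123", localization="Q_CYTOPLASM")` yields a P19 claim with a single P20 qualifier and a P27 reference holding the bare DOI (the DOI here is a placeholder, not a real publication).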

Automated maintenance tasks

(under development) The following tasks are (semi-)automated, so it is not necessary to do them manually. Scripts are currently run ad hoc.

  • Scripts to create new reference items from DOIs in reference statements and link them
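
As a rough idea of the selection step such a script needs, the sketch below picks out DOIs that are cited in reference statements but do not yet have a reference item. It is an assumption about how the task could be approached, not a description of the actual scripts: `dois_needing_items` and its inputs are hypothetical, and fetching the DOIs from the database or creating the items is left out. DOIs are compared case-insensitively.

```python
def dois_needing_items(statement_dois, existing_items):
    """Return DOIs cited in statements that have no reference item yet.

    statement_dois: iterable of DOI strings found in reference statements.
    existing_items: dict mapping DOI -> reference item ID (hypothetical input).
    The first spelling seen for each DOI is kept; order of appearance is preserved.
    """
    known = {doi.strip().lower() for doi in existing_items}
    seen = set()
    pending = []
    for doi in statement_dois:
        cleaned = doi.strip()
        key = cleaned.lower()
        if key and key not in known and key not in seen:
            seen.add(key)
            pending.append(cleaned)
    return pending
```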

Annotation guidance

NCBI Taxon IDs and representative sequences

(details to come)

Higher taxonomy (P29, P32)

Interaction claims (P19, P41) and their qualifiers

Subject body part (P20)

Terms should be mapped to Gene Ontology or Uberon.

Methods used to identify interaction partners (P22, P42)

Interaction type (function, outcome) (P26)

Methods used to characterize interaction type (P45)

References that support or refute a given claim (P23, P43)

Environment terms (P36, P38, P40)

Follow the guidelines for using EnvO terms in the MIxS standard: https://github.com/EnvironmentOntology/envo/wiki/ENVO-annotations-for-MIxS-v5

Use subclasses/instances of the following classes with the respective properties. The items within each class should be mapped to subclasses/instances of the corresponding EnvO terms with Property:P37.

Property | Class | EnvO term
Property:P36 environmental material of origin | Item:Q1597 environmental material | http://purl.obolibrary.org/obo/ENVO_00010483
Property:P38 environmental system of origin | Item:Q1602 environmental system | http://purl.obolibrary.org/obo/ENVO_01000254
Property:P40 local environmental context | Item:Q1673 astronomical body part | http://purl.obolibrary.org/obo/ENVO_01000813
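
The property-to-class pairing in the table can be expressed as a lookup for checking annotations. This is a sketch only: `check_environment_value` is a hypothetical helper, and it assumes the set of classes that a value item belongs to (directly or through subclass/instance links) has already been retrieved by some other means.

```python
# Property -> expected class item and corresponding EnvO term (from the table).
ENVIRONMENT_PROPERTIES = {
    "P36": {
        "label": "environmental material of origin",
        "class_item": "Q1597",
        "envo": "http://purl.obolibrary.org/obo/ENVO_00010483",
    },
    "P38": {
        "label": "environmental system of origin",
        "class_item": "Q1602",
        "envo": "http://purl.obolibrary.org/obo/ENVO_01000254",
    },
    "P40": {
        "label": "local environmental context",
        "class_item": "Q1673",
        "envo": "http://purl.obolibrary.org/obo/ENVO_01000813",
    },
}


def check_environment_value(prop, value_ancestors):
    """True if the value falls under the class expected for this property.

    value_ancestors: set of item IDs the value is a subclass/instance of
    (assumed to be looked up elsewhere; not implemented here).
    """
    expected = ENVIRONMENT_PROPERTIES[prop]["class_item"]
    return expected in value_ancestors
```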

References without DOIs

If a publication does not have a DOI, a reference item has to be created and linked manually. If the full text is available online, specify the URL Property:P15, otherwise give the citation as free text Property:P14.
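
The choice between the three reference properties can be summarized as a simple order of preference. The helper below is a hypothetical sketch of that rule; only the property IDs (P27, P15, P14) come from this page.

```python
def reference_property(doi=None, url=None, citation=None):
    """Pick the property and value used to cite a publication.

    Preference: DOI (P27), then full-text URL (P15), then free-text citation (P14).
    """
    if doi:
        return ("P27", doi)
    if url:
        return ("P15", url)
    if citation:
        return ("P14", citation)
    raise ValueError("a DOI, a URL or a free-text citation is required")
```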

Representative sequence records (P34, P46)

  • A representative sequence is important to clarify the identity of taxa, particularly informally named ones, in the case of updates or changes to classification and taxonomy.
  • The representative sequence record must be one that is stated in a publication that describes the interaction/organism. The publication DOI/item should be cited in a reference statement.
  • Taxon items without an equivalent NCBI taxon must have a representative SSU rRNA or genome sequence accession, to ensure that the taxon concept can be reconstructed later.
  • For SSU rRNA sequences Property:P34, use the GenBank accession, with or without the version suffix. This is typically one that is directly stated in a publication as a single accession or within a range of accession IDs.
  • For genome data Property:P46, publications may cite various identifiers (BioProject, BioSample, WGS, GenBank), so the accession for the genome sequence itself may not appear directly in the cited publication. The preferred accession to cite is either the assembly (GCA_ prefix) or the WGS contig set accession.

Images (P33)

(Experimental)

  • Open-licensed images hosted on Wikimedia Commons may be linked to taxon items. A thumbnail of that image should appear as a preview in PPSDB, and can also be used in visualizations in the SPARQL query service.
  • The images should be micrographs or illustrations that depict the named organisms.
  • The corresponding item in Commons should be annotated with the organisms it depicts (under Structured Data).
  • If the organism is described in an open access publication published under a suitable license (e.g. CC-BY-4.0), the figures can be uploaded to Commons and then linked to the PPSDB item.

Taxon name with Open Nomenclature signs (P12)

Schema validation