Project:Main

From Protist-Prokaryote Symbiosis Database

Lists of pages

Project pages

EntitySchema pages

MediaWiki pages

Software

  • ppsdb-utils -- Python scripts for various maintenance tasks (private GitHub repo)


To do lists

NCBI taxon IDs requiring attention from NCBI Taxonomy team

Items that need some attention

  • Item:Q448 - should be split into two entries
  • Item:Q994 - check that all listed host species are included

Taxonomic groups that need updates

  • Euplotes symbionts
  • Ciliate symbionts generally

Project maintenance to-dos

In progress:

  • Add P42 statements "Method used to identify subject"
  • Add environmental origin statements: P36, P38, P40
  • Check for statements that should be moved from P19 to P41 "interacts experimentally with"

Chores (to be cleared periodically):

  • Process backlog of new references
  • Add higher taxonomy for newly added taxa that are not yet represented (PR2 for eukaryotes)
  • Add parent taxa for all taxon items semi-automatically: try to match the first word of the taxon name to a genus name; the remaining items have to be added manually
  • Ensure that all items have a class, check that class "placeholder taxon" is consistently used
  • Find references that have only DOIs and link them to reference items, or create new reference items for them by Wikidata lookup
  • Link taxon items to Wikidata by their NCBI taxon IDs
  • Find (prokaryote) taxon items that have an NCBI taxon ID and an LPSN record, but which are not in Wikidata, and export them to Wikidata
  • Add formatted citations to reference items; get these from Crossref using the DOI: https://citation.crosscite.org/docs.html
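
The genus-matching chore above could be sketched roughly as follows. This is a minimal illustration, not the code in ppsdb-utils: the function name `propose_parent_taxa`, the input structures, and the item IDs are all hypothetical, and it assumes the genus names and their item IDs have already been fetched into a dictionary.

```python
def propose_parent_taxa(taxon_names, genus_items):
    """Split taxon names into auto-matched proposals and a manual-review list.

    taxon_names: iterable of taxon name strings
    genus_items: dict mapping genus name -> item ID (hypothetical IDs here)
    """
    proposals = {}
    manual = []
    for name in taxon_names:
        words = name.split()
        first = words[0] if words else ""
        # Require at least two words so that a genus item is never
        # proposed as its own parent.
        if len(words) > 1 and first in genus_items:
            proposals[name] = genus_items[first]
        else:
            manual.append(name)
    return proposals, manual


# Placeholder item IDs, for illustration only.
genera = {"Euplotes": "Q9001", "Paramecium": "Q9002"}
names = [
    "Euplotes aediculatus",
    "Paramecium bursaria",
    "Candidatus Megaira polyxenophila",
    "Euplotes",
]
proposals, manual = propose_parent_taxa(names, genera)
```

Names whose first word is not a known genus (for example those starting with "Candidatus") end up in `manual`, which matches the note that remaining items have to be added by hand.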

Ideas:

  • Better documentation of the data modeling, example workflow for adding a new entry based on information in a publication
  • Interaction statement: Qualifier for whether the interaction was identified in cultured strains (vs. in samples taken directly from the environment)
  • Copyright statement, privacy policy

NB: The taxonomy version (e.g. PR2 v5.0.0 for eukaryotes) is tagged by linking a reference item for that particular taxonomy version to the 'parent taxon' statement via a 'stated in' reference. Ideally we should be able to specify which taxonomy version we want to use, in case branches get moved around in the future.

Modeling:

  • Sane way to model placeholder taxa and non-specific taxon statements? E.g. general statements like "all members of this family are associated with methanogens", without creating items for each individual?
  • Modeling interactions: RO is insufficient?
  • Add statements about metabolism ("phototroph", "nitrogen fixer") to symbiont items?
  • Dummy taxon for "microbiome" to enable us to add references for microbiome studies?
  • Which environmental material is most appropriate for protists that are symbionts located in a digestive tract environment?

Draft annotation guidelines:


Export for GloBI

SPARQL query to export a table with the fields used by GloBI. The output may need to be processed further.

# List all interactions, optionally with the localization, interaction type, and references
PREFIX pp: <https://ppsdb.wikibase.cloud/entity/>
PREFIX ppt: <https://ppsdb.wikibase.cloud/prop/direct/>
PREFIX pps: <https://ppsdb.wikibase.cloud/prop/>
PREFIX ppss: <https://ppsdb.wikibase.cloud/prop/statement/>
PREFIX ppsq: <https://ppsdb.wikibase.cloud/prop/qualifier/>
PREFIX ppsr: <https://ppsdb.wikibase.cloud/prop/reference/>


SELECT DISTINCT ?argumentTypeName ?sourceTaxon ?sourceTaxonName ?sourceWdmap ?sourceTaxonId ?typeLabel ?interactionTypeId ?targetTaxon ?targetTaxonName ?targetWdmap ?targetTaxonId ?sourceBodyPartName ?sourceBodyPartId ?referenceDoi ?referenceCitation WHERE {
  ?sourceTaxon pps:P19 ?interaction.
  ?interaction ppss:P19 ?targetTaxon.
  OPTIONAL {
    ?interaction ppsq:P20 ?sourceBodyPart. 
    ?sourceBodyPart rdfs:label ?sourceBodyPartName.
    OPTIONAL { ?sourceBodyPart ppt:P17 ?sourceBodyPartId. }
    OPTIONAL { ?sourceBodyPart ppt:P44 ?sourceBodyPartId. }
  }
  OPTIONAL {
    ?interaction ppsq:P26 ?type. 
    OPTIONAL { ?type ppt:P16 ?interactionTypeId. }
  }
  OPTIONAL {
    ?interaction prov:wasDerivedFrom ?refnode.
    # OPTIONAL { ?refnode ppsr:P27 ?doi }
    OPTIONAL {
      ?refnode ppsr:P23 ?statedIn.
      OPTIONAL { ?statedIn ppt:P13 ?referenceDoi. }
      OPTIONAL { ?statedIn ppt:P14 ?referenceCitation. }
      BIND (STR("support") AS ?argumentTypeName)
    }
    OPTIONAL {
      ?refnode ppsr:P43 ?statedIn.
      OPTIONAL { ?statedIn ppt:P13 ?referenceDoi. }
      OPTIONAL { ?statedIn ppt:P14 ?referenceCitation. }
      BIND (STR("refute") AS ?argumentTypeName)
    }
  }
  OPTIONAL {
    ?sourceTaxon ppt:P11 ?sourceTaxon_ncbi. 
    BIND ( CONCAT("NCBI:txid", STR(?sourceTaxon_ncbi)) as ?sourceTaxonId )
  }
  OPTIONAL {
    ?targetTaxon ppt:P11 ?targetTaxon_ncbi. 
    BIND ( CONCAT("NCBI:txid", STR(?targetTaxon_ncbi)) as ?targetTaxonId )
  }
  ?sourceTaxon rdfs:label ?sourceTaxonName .
  OPTIONAL { ?targetTaxon rdfs:label ?targetTaxonName. }
  OPTIONAL { ?sourceTaxon ppt:P2 ?sourceWdmap . }
  OPTIONAL { ?targetTaxon ppt:P2 ?targetWdmap . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?sourceTaxonName ?targetTaxonName
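
As one possible form of further processing, the sketch below flattens a result in the standard SPARQL JSON results format into tab-separated text, using the column order of the SELECT clause above. It is only an illustration: the function name `bindings_to_tsv` and the sample values are hypothetical, and fetching the JSON from the query service is not shown.

```python
import csv
import io

# Column order follows the SELECT clause of the query above.
FIELDS = [
    "argumentTypeName", "sourceTaxon", "sourceTaxonName", "sourceWdmap",
    "sourceTaxonId", "typeLabel", "interactionTypeId", "targetTaxon",
    "targetTaxonName", "targetWdmap", "targetTaxonId", "sourceBodyPartName",
    "sourceBodyPartId", "referenceDoi", "referenceCitation",
]


def bindings_to_tsv(results, fields=FIELDS):
    """Flatten SPARQL JSON results into TSV text.

    Variables left unbound by an OPTIONAL block become empty cells.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for binding in results["results"]["bindings"]:
        writer.writerow([binding.get(f, {}).get("value", "") for f in fields])
    return out.getvalue()


# Made-up sample in the SPARQL JSON results layout (placeholder values only).
sample = {
    "head": {"vars": FIELDS},
    "results": {"bindings": [{
        "sourceTaxonName": {"type": "literal", "value": "Example host"},
        "targetTaxonName": {"type": "literal", "value": "Example symbiont"},
        "sourceTaxonId": {"type": "literal", "value": "NCBI:txid0"},
    }]},
}
tsv = bindings_to_tsv(sample)
```

Writing every row with the full field list keeps the columns aligned even when most of the OPTIONAL parts of the query return nothing for a given interaction.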
