Project:Main: Difference between revisions

From Protist-Prokaryote Symbiosis Database
(Created page with "== Lists of pages == === Project pages === * Project:Main -- This page * Project:Curation workflow -- Workflow guidelines for adding new entries and annotations * Project:New studies to add -- List of papers with potentially relevant information for the database * Project:SPARQL/examples -- Example SPARQL queries added here will appear in the examples list in the Query Service interface * Project:Cradle -- Cradle templates (not really used at the mo...")
 
 
(27 intermediate revisions by the same user not shown)
Line 5: Line 5:
* [[Project:Main]] -- This page
* [[Project:Curation workflow]] -- Workflow guidelines for adding new entries and annotations
* [[Project:Q&A]]
* [[Project:New studies to add]] -- List of papers with potentially relevant information for the database
* [[Project:SPARQL/examples]] -- Example SPARQL queries added here will appear in the examples list in the Query Service interface
* [[Project:SPARQL/other]] -- Other queries and SPARQL code snippets
* [[Project:Cradle]] -- Cradle templates (not really used at the moment, prefer to define EntitySchema)
* [[Project:Ball 1969 revisited]]


=== EntitySchema pages ===
Line 20: Line 23:


* [https://github.com/kbseah/ppsdb-utils/ ppsdb-utils] -- Python scripts for various maintenance tasks (private GitHub repo)
 
* [https://github.com/kbseah/ppsdb-globi-export ppsdb-globi-export] -- Export of core interaction information from PPSDB for indexing by GloBI


== To do lists ==
Line 33: Line 36:


* [[Item:Q448]] - should be split to two entries
* [[Item:Q994]] - check that all listed host species included


=== Taxonomic groups that need updates ===


* Euplotes symbionts
* Historical literature on "Greek letter" ciliate symbionts
* Ciliate symbionts generally
* Forams!


=== Project maintenance to-dos ===
Line 46: Line 49:
* Add environmental origin statements: P36, P38, P40
* Check for statements that should be moved from P19 to P41 "interacts experimentally with"
* Interaction statements: If different aspects of the same symbiosis are described in different publications (e.g. phylogenetic identity vs. interaction type), encode these as separate statements with different qualifier values and references, instead of merging them into one statement where it is not clear which reference is cited in support of which claim. Example: [[Item:Q501]]. Use qualifier P45 "method used to determine interaction type".


Chores (have to periodically clear them):
Chores handled by ppsdb-utils:
* Process backlog of new references
* Add higher taxonomy for newly added taxa that are not yet represented (PR2 for eukaryotes)
* Add parent taxa for all taxon items semi-automatically - try to match first word in taxon name to a genus name, remaining items have to be added manually
* Ensure that all items have a class, check that class "placeholder taxon" is consistently used
* Find references that have only DOIs and link them to reference items, or create new reference items for them by Wikidata lookup
* Add formatted citations to reference items; get these from Crossref using the DOI: https://citation.crosscite.org/docs.html
Other chores:
* Add parent taxa for all taxon items
* Ensure that all items have a class
* Find (prokaryote) taxon items that have NCBI taxon ID and a LPSN record, but which are not in Wikidata, and export them to Wikidata


Ideas:
* Decouple taxonomy from interaction partners? Class "interaction partner" that is independent of "taxon", though interaction partners are also instances of taxon. "taxon" can then have a new property "part of taxonomic system" with a "reference" item.
* Link prokaryote names to LPSN?
* <s>Add SILVA taxonomy for prokaryotes by mapping to NCBI taxon ID. Prefer SILVA to GTDB because more spp. represented by 16S, not just genomes</s>
* Better documentation of the data modeling, example workflow for adding a new entry based on information in a publication
* Interaction statement: Qualifier if identified in cultured strains (vs. in samples that were directly sampled from environment)
* Which environmental material is most appropriate for protists that are symbionts located in digestive tract environment?
* Copyright statement, privacy policy


NB: The taxonomy version (e.g. PR2 v5.0.0 for eukaryotes) is tagged by attaching a reference item for that particular taxonomy version to the 'parent taxon' statement as a 'stated in' reference. Ideally we should be able to specify which taxonomy version we want to use, in case branches get moved around in the future...
Line 68: Line 69:
* Sane way to model placeholder taxa and non-specific taxon statements? E.g. general statements like "all members of this family are associated with methanogens", without creating items for each individual?
* Modeling interactions: RO is insufficient?
* How to model "attached to" vs. "adjacent to" relations? Perhaps add subproperties of http://purl.obolibrary.org/obo/RO_0002323 as values of a "topological relationship to subject body part" qualifier
* Add statements about metabolism ("phototroph", "nitrogen fixer") to symbiont items?
* Dummy taxon for "microbiome" to enable us to add references for microbiome studies?
 
* Which environmental material is most appropriate for protists that are symbionts located in digestive tract environment?
Draft annotation guidelines:
* Reference items should always have the DOI as an alias, so that users can easily check with the search bar whether it is already used somewhere in the database
* Environmental origin of the organisms using EnvO terms, use guidelines aligned to MIxS: https://github.com/EnvironmentOntology/envo/wiki/ENVO-annotations-for-MIxS-v5

Latest revision as of 08:56, 19 August 2024

Lists of pages

Project pages

EntitySchema pages

MediaWiki pages

Software

  • ppsdb-utils -- Python scripts for various maintenance tasks (private GitHub repo)
  • ppsdb-globi-export -- Export of core interaction information from PPSDB for indexing by GloBI

To do lists

NCBI taxon IDs requiring attention from NCBI Taxonomy team

Items that need some attention

  • Item:Q448 - should be split to two entries
  • Item:Q994 - check that all listed host species included

Taxonomic groups that need updates

  • Historical literature on "Greek letter" ciliate symbionts
  • Forams!

Project maintenance to-dos

In progress:

  • Add P42 statements "Method used to identify subject"
  • Add environmental origin statements: P36, P38, P40
  • Check for statements that should be moved from P19 to P41 "interacts experimentally with"
  • Interaction statements: If different aspects of the same symbiosis are described in different publications (e.g. phylogenetic identity vs. interaction type), encode these as separate statements with different qualifier values and references, instead of merging them into one statement where it is not clear which reference is cited in support of which claim. Example: Item:Q501. Use qualifier P45 "method used to determine interaction type".

Chores handled by ppsdb-utils:

  • Process backlog of new references
  • Find references that have only DOIs and link them to reference items, or create new reference items for them by Wikidata lookup
  • Add formatted citations to reference items; get these from Crossref using the DOI: https://citation.crosscite.org/docs.html
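
The Crossref lookup in the last item can be sketched with DOI content negotiation, as described in the crosscite documentation linked above: a request to https://doi.org/ followed by the DOI, sent with an Accept header of text/x-bibliography, returns a formatted citation. This is a minimal sketch and not the actual ppsdb-utils code; the function names and the default "apa" style are assumptions chosen for illustration.

```python
import urllib.parse
import urllib.request


def build_citation_request(doi, style="apa"):
    """Build a content-negotiation request for a formatted citation.

    The function name and the default style are illustrative assumptions.
    """
    # Percent-encode unusual characters but keep the slash that separates
    # the DOI prefix from the suffix.
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_formatted_citation(doi, style="apa", timeout=30):
    """Fetch the formatted citation text for a DOI (needs network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling fetch_formatted_citation with the DOI string of a reference item should return a single citation string that could then be written back to that item. How it is stored (which property or field) is left open here.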

Other chores:

  • Add parent taxa for all taxon items
  • Ensure that all items have a class
  • Find (prokaryote) taxon items that have NCBI taxon ID and a LPSN record, but which are not in Wikidata, and export them to Wikidata
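
An earlier revision of this list described adding parent taxa semi-automatically: match the first word of the taxon name to a genus name and add the remaining items manually. A minimal sketch of that matching step follows. The function name and the shape of the inputs (a list of taxon names and a mapping from genus name to item ID) are assumptions for illustration, not the project's actual tooling.

```python
def match_parent_genera(taxon_names, genus_items):
    """Split taxon names into those whose first word is a known genus and the rest.

    taxon_names: iterable of taxon name strings.
    genus_items: dict mapping genus name -> item ID of the genus item
                 (hypothetical input format).
    Returns (matched, unmatched): matched maps taxon name -> genus item ID,
    unmatched lists the names left for manual curation.
    """
    matched = {}
    unmatched = []
    for name in taxon_names:
        words = name.split()
        first = words[0] if words else ""
        # Require more than one word so a genus item is never assigned
        # itself as its own parent taxon.
        if first in genus_items and len(words) > 1:
            matched[name] = genus_items[first]
        else:
            unmatched.append(name)
    return matched, unmatched
```

Names that do not start with a genus name, such as those beginning with "Candidatus", end up in the unmatched list and would still need to be handled manually.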

Ideas:

  • Better documentation of the data modeling, example workflow for adding a new entry based on information in a publication

NB: The taxonomy version (e.g. PR2 v5.0.0 for eukaryotes) is tagged by attaching a reference item for that particular taxonomy version to the 'parent taxon' statement as a 'stated in' reference. Ideally we should be able to specify which taxonomy version we want to use, in case branches get moved around in the future...

Modeling:

  • Sane way to model placeholder taxa and non-specific taxon statements? E.g. general statements like "all members of this family are associated with methanogens", without creating items for each individual?
  • Modeling interactions: RO is insufficient?
  • How to model "attached to" vs. "adjacent to" relations? Perhaps add subproperties of http://purl.obolibrary.org/obo/RO_0002323 as values of a "topological relationship to subject body part" qualifier
  • Add statements about metabolism ("phototroph", "nitrogen fixer") to symbiont items?
  • Dummy taxon for "microbiome" to enable us to add references for microbiome studies?
  • Which environmental material is most appropriate for protists that are symbionts located in digestive tract environment?