Project:Main: Difference between revisions
Revision as of 15:21, 17 June 2024
Lists of pages
Project pages
- Project:Main -- This page
- Project:Curation workflow -- Workflow guidelines for adding new entries and annotations
- Project:New studies to add -- List of papers with potentially relevant information for the database
- Project:SPARQL/examples -- Example SPARQL queries added here will appear in the examples list in the Query Service interface
- Project:Cradle -- Cradle templates (not really used at the moment, prefer to define EntitySchema)
EntitySchema pages
- EntitySchema:E1 -- Taxon
MediaWiki pages
- MediaWiki:Wikibase-SortedProperties -- Set the order in which statements are displayed on item pages
Software
- ppsdb-utils -- Python scripts for various maintenance tasks (private GitHub repo)
To do lists
NCBI taxon IDs requiring attention from NCBI Taxonomy team
- Item:Q251
- Item:Q297 and related
- Instances of Item:Q657
- Item:Q557 name is misspelled in NCBI Taxonomy database -- reported 2024-06-13
Items that need some attention
Taxonomic groups that need updates
- Euplotes symbionts
- Ciliate symbionts generally
Project maintenance to-dos
In progress:
- Add P42 statements "Method used to identify subject"
- Add environmental origin statements: P36, P38, P40
- Check for statements that should be moved from P19 to P41 "interacts experimentally with"
Chores (have to periodically clear them):
- Process backlog of new references
- Add higher taxonomy for newly added taxa that are not yet represented (PR2 for eukaryotes)
- Add parent taxa for all taxon items semi-automatically - try to match first word in taxon name to a genus name, remaining items have to be added manually
- Ensure that all items have a class, check that class "placeholder taxon" is consistently used
- Find references that have only DOIs and link them to reference items, or create new reference items for them by Wikidata lookup
- Link taxon items to Wikidata by their NCBI taxon IDs
- Find (prokaryote) taxon items that have an NCBI taxon ID and an LPSN record but are not in Wikidata, and export them to Wikidata
- Add formatted citations to reference items; get these from Crossref using the DOI: https://citation.crosscite.org/docs.html
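For the last chore in the list above (formatted citations from a DOI), a minimal Python sketch using DOI content negotiation as described in the crosscite documentation. The function names, the default "apa" style, and the placeholder DOI in the usage note are assumptions for illustration; this is not the code in ppsdb-utils.

```python
import urllib.parse
import urllib.request

# DOI resolver; content negotiation is done through the Accept header.
DOI_RESOLVER = "https://doi.org/"


def build_citation_request(doi, style="apa"):
    """Build (but do not send) a request for a formatted citation.

    `style` is a CSL style name; "apa" is only an illustrative default.
    """
    url = DOI_RESOLVER + urllib.parse.quote(doi, safe="/")
    accept = f"text/x-bibliography; style={style}"
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_citation(doi, style="apa", timeout=30):
    """Send the request and return the formatted citation as plain text."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Usage would be along the lines of fetch_citation("10.1000/xyz"), where the DOI is a hypothetical placeholder; the returned string could then be added to the reference item as its formatted citation.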
Ideas:
- Better documentation of the data modeling, example workflow for adding a new entry based on information in a publication
- Interaction statement: add a qualifier indicating whether the interaction was identified in cultured strains (vs. in samples taken directly from the environment)
- Copyright statement, privacy policy
NB: The taxonomy version (e.g. PR2 v5.0.0 for eukaryotes) is recorded by attaching a 'stated in' reference to the 'parent taxon' statement; the reference points to the reference item for that particular taxonomy version. Ideally we should be able to specify which taxonomy version we want to use, in case branches get moved around in the future.
Modeling:
- Sane way to model placeholder taxa and non-specific taxon statements? E.g. general statements like "all members of this family are associated with methanogens", without creating items for each individual?
- Modeling interactions: RO is insufficient?
- Add statements about metabolism ("phototroph", "nitrogen fixer") to symbiont items?
- Dummy taxon for "microbiome" to enable us to add references for microbiome studies?
- Which environmental material is most appropriate for protists that are symbionts located in a digestive tract environment?
Draft annotation guidelines:
- Reference items should always have the DOI as an alias, so that users can easily check with the search bar whether the reference is already used somewhere in the database
- Annotate the environmental origin of organisms using EnvO terms, following the guidelines aligned to MIxS: https://github.com/EnvironmentOntology/envo/wiki/ENVO-annotations-for-MIxS-v5
Export for Globi
SPARQL query to export table with fields used by Globi. The output may need to be processed further.
#List all interactions, optionally the localization, interaction type, and references
PREFIX pp: <https://ppsdb.wikibase.cloud/entity/>
PREFIX ppt: <https://ppsdb.wikibase.cloud/prop/direct/>
PREFIX pps: <https://ppsdb.wikibase.cloud/prop/>
PREFIX ppss: <https://ppsdb.wikibase.cloud/prop/statement/>
PREFIX ppsq: <https://ppsdb.wikibase.cloud/prop/qualifier/>
PREFIX ppsr: <https://ppsdb.wikibase.cloud/prop/reference/>
SELECT DISTINCT
  ?argumentTypeName
  ?sourceTaxon ?sourceTaxonName ?sourceWdmap ?sourceTaxonId
  ?typeLabel ?interactionTypeId
  ?targetTaxon ?targetTaxonName ?targetWdmap ?targetTaxonId
  ?sourceBodyPartName ?sourceBodyPartId
  ?referenceDoi ?referenceCitation
WHERE {
  ?sourceTaxon pps:P19 ?interaction.
  ?interaction ppss:P19 ?targetTaxon.
  OPTIONAL {
    ?interaction ppsq:P20 ?sourceBodyPart.
    ?sourceBodyPart rdfs:label ?sourceBodyPartName.
    OPTIONAL { ?sourceBodyPart ppt:P17 ?sourceBodyPartId. }
    OPTIONAL { ?sourceBodyPart ppt:P44 ?sourceBodyPartId. }
  }
  OPTIONAL {
    ?interaction ppsq:P26 ?type.
    OPTIONAL { ?type ppt:P16 ?interactionTypeId. }
  }
  OPTIONAL {
    ?interaction prov:wasDerivedFrom ?refnode.
    # OPTIONAL { ?refnode ppsr:P27 ?doi }
    OPTIONAL {
      ?refnode ppsr:P23 ?statedIn.
      OPTIONAL { ?statedIn ppt:P13 ?referenceDoi. }
      OPTIONAL { ?statedIn ppt:P14 ?referenceCitation. }
      BIND (STR("support") AS ?argumentTypeName)
    }
    OPTIONAL {
      ?refnode ppsr:P43 ?statedIn.
      OPTIONAL { ?statedIn ppt:P13 ?referenceDoi. }
      OPTIONAL { ?statedIn ppt:P14 ?referenceCitation. }
      BIND (STR("refute") AS ?argumentTypeName)
    }
  }
  OPTIONAL {
    ?sourceTaxon ppt:P11 ?sourceTaxon_ncbi.
    BIND ( CONCAT("NCBI:txid", STR(?sourceTaxon_ncbi)) as ?sourceTaxonId )
  }
  OPTIONAL {
    ?targetTaxon ppt:P11 ?targetTaxon_ncbi.
    BIND ( CONCAT("NCBI:txid", STR(?targetTaxon_ncbi)) as ?targetTaxonId )
  }
  ?sourceTaxon rdfs:label ?sourceTaxonName .
  OPTIONAL { ?targetTaxon rdfs:label ?targetTaxonName. }
  OPTIONAL { ?sourceTaxon ppt:P2 ?sourceWdmap . }
  OPTIONAL { ?targetTaxon ppt:P2 ?targetWdmap . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?sourceTaxonName ?targetTaxonName
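
As one possible form of the further processing mentioned above, a minimal Python sketch that flattens the query result into a tab-separated table. It assumes the result was downloaded in the standard SPARQL JSON results format; the column list is copied from the SELECT clause, and the function names are hypothetical. Whether Globi needs additional or renamed columns is not covered here.

```python
import csv
import io

# Column order copied from the SELECT clause of the export query.
COLUMNS = [
    "argumentTypeName", "sourceTaxon", "sourceTaxonName", "sourceWdmap",
    "sourceTaxonId", "typeLabel", "interactionTypeId", "targetTaxon",
    "targetTaxonName", "targetWdmap", "targetTaxonId", "sourceBodyPartName",
    "sourceBodyPartId", "referenceDoi", "referenceCitation",
]


def bindings_to_rows(results, columns=COLUMNS):
    """Flatten SPARQL JSON results into one list of strings per row.

    Variables left unbound by an OPTIONAL block become empty strings.
    """
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append([binding.get(col, {}).get("value", "") for col in columns])
    return rows


def rows_to_tsv(rows, columns=COLUMNS):
    """Serialize a header line plus the rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()
```

The input would typically come from json.load() on a file saved from the Query Service; writing the string returned by rows_to_tsv to a .tsv file gives a table with one interaction/reference combination per line.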