# Life Science ontologies and Knowledge Graphs, HANDS-ON
*HANDS-ON session given as part of ETBII 2024*

At the end of this hands-on session, you will be able to:
 - explore and search publicly available biomedical ontologies, and combine knowledge provided by multiple ontologies,
 - computationally exploit these ontologies: explore node neighborhoods, navigate class hierarchies, retrieve synonyms,
 - understand how biochemical regulations are modeled in BioPAX,
 - assemble/summarize new graphs based on graph patterns.

# 1. Querying gene regulation resources  


We are interested in regulators of SCN5A, a gene involved in cardiac arrhythmias. We would like to build a diagram like this one:

![:scale 50%](fig/viz.png)

We will now use PathwayCommons (http://www.pathwaycommons.org), an RDF dataset that integrates biological signaling pathways (5,772 pathways, 2,424,055 interactions) from 22 regulation data sources. We will use this SPARQL endpoint: https://abromics.gcp.glicid.fr/sparql  

PathwayCommons uses the BioPAX ontology to represent regulation and signaling knowledge. Have a look at Figure 3 and Figure 4 of the BioPAX paper (https://www.researchgate.net/publication/46191859_BioPAX_-_A_community_standard_for_pathway_data_sharing) for a quick overview of BioPAX. 

We are interested in **activation** or **inhibition** gene regulations. The following snippet, written in *Turtle* syntax, shows how they can be represented in BioPAX. 

```
@prefix rdf:	<http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix pc:	<http://pathwaycommons.org/pc12/#> .
@prefix bp:	<http://www.biopax.org/release/biopax-level3.owl#> .
@prefix xsd:	<http://www.w3.org/2001/XMLSchema#> .

pc:TemplateReactionRegulation_3906f4646d89f7cf2b1316601aa946d4
	rdf:type	bp:TemplateReactionRegulation ;
	bp:comment	"REPLACED http://pathwaycommons.org/pc12/TemplateReactionRegulation_3906f4646d89f7cf2b1316601aa946d4"^^xsd:string ;
	bp:controlType	"ACTIVATION"^^xsd:string ;
	bp:controlled	pc:TemplateReaction_8baa2a60259c4221da8a2304b1d9f1fd ;
	bp:controller	pc:SmallMolecule_59bf598d207e0612414e22fd11aa3ccb ;
	bp:dataSource	pc:Provenance_b3c525b2c0027b58914b1a79ecc37320 ;
	bp:displayName	"Phenobarbital results in increased expression of CYP2B6 protein"^^xsd:string ;
	bp:xref	<http://identifiers.org/pubmed/19084549> ,
            <http://identifiers.org/pubmed/21778469> ,
            <http://identifiers.org/pubmed/21227907> ,
            <http://identifiers.org/pubmed/15548381> ,
            <http://identifiers.org/pubmed/12571232> ,
            <http://identifiers.org/pubmed/14977870> ,
            <http://identifiers.org/pubmed/19952500> ,
            <http://identifiers.org/pubmed/20361990> ,
            <http://identifiers.org/pubmed/25512232> ,
            <http://identifiers.org/pubmed/24224465> .
```
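
In Turtle, `;` repeats the subject with a new predicate, and `,` repeats both the subject and the predicate with a new object. As a minimal sketch, a subset of the snippet above can be expanded into explicit triples, here stored as plain Python tuples with the prefixes left abbreviated. The helper `as_edges` is only an illustration and not part of the session material.

```python
# A subset of the Turtle snippet above, as explicit (subject, predicate, object) triples.
REGULATION = "pc:TemplateReactionRegulation_3906f4646d89f7cf2b1316601aa946d4"

triples = [
    (REGULATION, "rdf:type", "bp:TemplateReactionRegulation"),
    (REGULATION, "bp:controlType", '"ACTIVATION"'),
    (REGULATION, "bp:controlled", "pc:TemplateReaction_8baa2a60259c4221da8a2304b1d9f1fd"),
    (REGULATION, "bp:controller", "pc:SmallMolecule_59bf598d207e0612414e22fd11aa3ccb"),
    # The "," in the bp:xref statement yields one triple per PubMed URI.
    (REGULATION, "bp:xref", "<http://identifiers.org/pubmed/19084549>"),
    (REGULATION, "bp:xref", "<http://identifiers.org/pubmed/21778469>"),
]


def as_edges(triples):
    """Format each triple as a labelled edge: subject --predicate--> object."""
    return [f"{s} --{p}--> {o}" for s, p, o in triples]


for edge in as_edges(triples):
    print(edge)
```

Each printed line corresponds to one arrow of a directed labelled graph: the subject and object are nodes, and the predicate is the edge label.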


## Question 1
On a piece of paper, draw the corresponding directed labelled graph. 

## Question 2
1. Test the `DESCRIBE pc:TemplateReactionRegulation_3906f4646d89f7cf2b1316601aa946d4` query directly at <https://abromics.gcp.glicid.fr/sparql>
2. Test the same query directly in this notebook: 
```python
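from SPARQLWrapper import SPARQLWrapper

query = "YOUR QUERY"
# Point the wrapper at the endpoint, attach the query, then run it.
sparql = SPARQLWrapper("https://abromics.gcp.glicid.fr/sparql")
sparql.setQuery(query)
results = sparql.queryAndConvert()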
```
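
Under the hood, the query travels to the endpoint over HTTP. As a rough sketch using only the standard library, and assuming the endpoint accepts the `query` parameter defined by the SPARQL 1.1 Protocol, the request URL can be assembled as follows. The helper `build_request_url` and the sample query are illustrative assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "https://abromics.gcp.glicid.fr/sparql"


def build_request_url(endpoint, query):
    """Return the GET URL carrying a SPARQL query (hypothetical helper)."""
    return endpoint + "?" + urlencode({"query": query})


# Illustrative query, unrelated to the exercise: any three triples.
sample_query = "SELECT * WHERE { ?s ?p ?o } LIMIT 3"
url = build_request_url(ENDPOINT, sample_query)
print(url)

# Decoding the URL gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```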

In [None]:
from SPARQLWrapper import SPARQLWrapper

query = """
...
"""
#sparql = SPARQLWrapper("https://abromics.gcp.glicid.fr/sparql")
#sparql.setQuery(query)
#results = sparql.queryAndConvert()
#print(results.serialize(format="turtle"))

**For the remaining questions, we will use the SPARQL endpoint web interface (<https://abromics.gcp.glicid.fr/sparql>).**

## Question 3
Based on this description, write a query to show the names of all genes that regulate (activation or inhibition) SCN5A. We will proceed in multiple progressive steps. 

1. identify regulation reactions with resources of type `bp:TemplateReactionRegulation` (don’t forget to use a LIMIT 10 to get fast results)
2. show their control type (`bp:controlType` property) and keep only “activation” or “inhibition”.
3. show the associated scientific publication with the `bp:xref` property. Make sure that “pubmed” is contained in its URI (you can use a FILTER function: `FILTER (regex(?publi, "pubmed"))`).
4. identify the source of the regulation (`bp:controller`) and its display name (`bp:displayName`).
5. identify the target of the regulation (`bp:controlled`) and its display name (`bp:displayName`). Make sure (FILTER) that the display name is our target gene: SCN5A.

###  Question 3.1
Identify regulation reactions with resources of type `bp:TemplateReactionRegulation` (don’t forget to use a LIMIT 10 to get fast results)

In [None]:
query = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>

SELECT * WHERE {
    ...
} 
LIMIT 10
"""

In [None]:
# SOLUTION:
!echo "ClBSRUZJWCBicDogPGh0dHA6Ly93d3cuYmlvcGF4Lm9yZy9yZWxlYXNlL2Jpb3BheC1sZXZlbDMub3dsIz4KClNFTEVDVCAqIFdIRVJFIHsKICAgID9yZWd1bCByZGY6dHlwZSBicDpUZW1wbGF0ZVJlYWN0aW9uUmVndWxhdGlvbiAuIAp9IApMSU1JVCAxMAo=" | base64 --decode

### Question 3.2
Show their control type (`bp:controlType` property) and keep only “ACTIVATION” or “INHIBITION”. 
Use a FILTER clause, the `||` (or) operator, and the `regex(variable, "pattern")` function. 
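
The general shape of such a filter can be sketched with a small Python helper that assembles the clause as text. The helper name `regex_filter`, the variable `?label`, and the patterns are made up for illustration; adapt them to your own query. The optional third argument of `regex`, `"i"`, makes the match case-insensitive.

```python
def regex_filter(variable, patterns, flags="i"):
    """Build a SPARQL FILTER that holds if any pattern matches the variable."""
    tests = [f'regex({variable}, "{p}", "{flags}")' for p in patterns]
    return "FILTER (" + " || ".join(tests) + ")"


# Hypothetical variable and patterns, unrelated to the exercise.
clause = regex_filter("?label", ["heart", "cardiac"])
print(clause)
# FILTER (regex(?label, "heart", "i") || regex(?label, "cardiac", "i"))
```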

In [None]:
query = """
...
"""

In [None]:
# SOLUTION:
!echo "UFJFRklYIGJwOiA8aHR0cDovL3d3dy5iaW9wYXgub3JnL3JlbGVhc2UvYmlvcGF4LWxldmVsMy5vd2wjPgoKU0VMRUNUICogV0hFUkUgewogICAgP3JlZ3VsIHJkZjp0eXBlIGJwOlRlbXBsYXRlUmVhY3Rpb25SZWd1bGF0aW9uIDsgCiAgICAgICAgICAgICAgICBicDpjb250cm9sVHlwZSA/dCAuCiAgICBGSUxURVIgKHJlZ2V4KD90LCBBQ1RJVikpCn0gCkxJTUlUIDEw" | base64 --decode

### Question 3.3
Show the associated scientific publication with the `bp:xref` property. Make sure that “pubmed” is contained in its URI, using a FILTER clause and the `regex(variable, "pattern")` function. We want to keep reactions even if no publication is associated; to do so, enclose the triple pattern within an `OPTIONAL {}` clause.
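
`OPTIONAL` behaves like a left outer join: every solution of the mandatory part is kept, and the optional variables are left unbound when nothing matches. Here is a toy Python sketch of that behaviour on invented data (the reaction identifiers and URIs are placeholders, not PathwayCommons content).

```python
# Mandatory part: three hypothetical regulation reactions.
reactions = ["regul_1", "regul_2", "regul_3"]

# Optional part: cross-references known for some reactions only.
xrefs = {
    "regul_1": ["http://example.org/pubmed/111", "http://example.org/other/42"],
    "regul_3": ["http://example.org/pubmed/333"],
}

rows = []
for regul in reactions:
    # The filter sits inside the optional block: it restricts the optional match only.
    publis = [x for x in xrefs.get(regul, []) if "pubmed" in x]
    if publis:
        rows.extend((regul, p) for p in publis)
    else:
        rows.append((regul, None))  # reaction kept, publication left unbound

for row in rows:
    print(row)
```

In SPARQL, placing the FILTER outside the `OPTIONAL {}` block would instead discard the reactions without a publication, because `regex` applied to an unbound variable raises an error and the solution is dropped.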

In [None]:
query = """
...
"""

In [None]:
# SOLUTION:
!echo "ClBSRUZJWCBicDogPGh0dHA6Ly93d3cuYmlvcGF4Lm9yZy9yZWxlYXNlL2Jpb3BheC1sZXZlbDMub3dsIz4KClNFTEVDVCAqIFdIRVJFIHsKICAgID9yZWd1bCByZGY6dHlwZSBicDpUZW1wbGF0ZVJlYWN0aW9uUmVndWxhdGlvbiA7IAogICAgICAgICAgICAgICAgYnA6Y29udHJvbFR5cGUgP3QgLgogICAgRklMVEVSIChyZWdleCg/dCwgQUNUSVYpKQoKICAgID9yZWd1bCBicDp4cmVmID9wdWJsaSAuIAogICAgRklMVEVSIChyZWdleCg/cHVibGksIHB1Ym1lZCkpCn0gCg==" | base64 --decode

### Question 3.4
Identify the source of the regulation (`bp:controller`) and its display name (`bp:displayName`).

In [None]:
query = """
...
"""

In [None]:
# SOLUTION:
!echo "UFJFRklYIGJwOiA8aHR0cDovL3d3dy5iaW9wYXgub3JnL3JlbGVhc2UvYmlvcGF4LWxldmVsMy5vd2wjPgoKU0VMRUNUICogV0hFUkUgewogICAgP3JlZ3VsIHJkZjp0eXBlIGJwOlRlbXBsYXRlUmVhY3Rpb25SZWd1bGF0aW9uIDsgCiAgICAgICAgICAgICAgICBicDpjb250cm9sVHlwZSA/dCAuCiAgICBGSUxURVIgKHJlZ2V4KD90LCBBQ1RJVikpCgogICAgP3JlZ3VsIGJwOnhyZWYgP3B1YmxpIC4gCiAgICBGSUxURVIgKHJlZ2V4KD9wdWJsaSwgcHVibWVkKSkKCiAgICA/cmVndWwgYnA6Y29udHJvbGxlciA/c291cmNlIC4gCiAgICA/c291cmNlIGJwOmRpc3BsYXlOYW1lID9zb3VyY2VfbmFtZSAuIAp9IApMSU1JVCAxMA==" | base64 --decode

### Question 3.5
Identify the target of the regulation (`bp:controlled`) and its display name (`bp:displayName`). Make sure (FILTER) that the display name is our target gene: SCN5A.

In [None]:
query = """
...
"""

In [None]:
# SOLUTION:
!echo "UFJFRklYIGJwOiA8aHR0cDovL3d3dy5iaW9wYXgub3JnL3JlbGVhc2UvYmlvcGF4LWxldmVsMy5vd2wjPgoKU0VMRUNUICogV0hFUkUgewogICAgP3JlZ3VsIHJkZjp0eXBlIGJwOlRlbXBsYXRlUmVhY3Rpb25SZWd1bGF0aW9uIC4gCiAgICA/cmVndWwgYnA6ZGF0YVNvdXJjZSA/ZHMgLiAKICAgID9yZWd1bCBicDpjb250cm9sVHlwZSA/dHlwZSAuCiAgICBGSUxURVIgKHJlZ2V4KD90eXBlLCBBQ1RJViwgaSkgfHwgcmVnZXgoP3R5cGUsIElOSElCLCBpKSkKCiAgICA/cmVndWwgYnA6eHJlZiA/cHVibGkgLiAKICAgIEZJTFRFUiAocmVnZXgoP3B1YmxpLCBwdWJtZWQpKQoKICAgID9yZWd1bCBicDpjb250cm9sbGVyID9zb3VyY2UgLiAKICAgID9zb3VyY2UgYnA6ZGlzcGxheU5hbWUgP3NvdXJjZV9uYW1lIC4gCiAgICA/cmVndWwgYnA6Y29udHJvbGxlZCA/dGFyZ2V0IC4gCiAgICA/dGFyZ2V0IGJwOmRpc3BsYXlOYW1lID90YXJnZXRfbmFtZSAuIAogICAgRklMVEVSIChyZWdleCg/dGFyZ2V0X25hbWUsIFNDTjVBKSkKfSA=" | base64 --decode

## Question 4
From the previous query, retrieve a tabular file (CSV) with 3 columns for the source name, the regulation type, and the target name. Use the http://app.rawgraphs.io web tool to generate an alluvial flow chart which displays the relations between the source and target nodes. 
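
If you prefer to produce the CSV from Python, SPARQL SELECT results in JSON follow the W3C layout in which `results.bindings` is a list, and each binding maps a variable name to an object with `type` and `value` keys. The following is a minimal sketch, assuming variables named `source_name`, `type`, and `target_name` (rename them to match your own query) and using a made-up binding in place of real results.

```python
import csv
import io

# Made-up result in the SPARQL 1.1 JSON results layout; real data would come
# from the endpoint.
results = {
    "head": {"vars": ["source_name", "type", "target_name"]},
    "results": {"bindings": [
        {"source_name": {"type": "literal", "value": "GENE_A"},
         "type": {"type": "literal", "value": "ACTIVATION"},
         "target_name": {"type": "literal", "value": "GENE_B"}},
    ]},
}


def to_csv(results, columns):
    """Serialize the chosen variables of a SPARQL JSON result as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding: write an empty cell.
        writer.writerow([binding.get(c, {}).get("value", "") for c in columns])
    return out.getvalue()


csv_text = to_csv(results, ["source_name", "type", "target_name"])
print(csv_text)
```

The resulting text can be saved to a file or pasted into the RAWGraphs data input.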

# 2. Exploring Life Science ontologies

BioPortal (https://bioportal.bioontology.org) is a large repository of biomedical ontologies gathering 600+ ontologies and 8+ million classes. We will use this web resource to navigate and retrieve biomedical knowledge.

## Question 5
Search for two definitions of “mitral valve prolapse”, coming from two different ontologies.

## Question 6
In the Human Phenotype Ontology, search for all sub-classes of “mitral stenosis”. Use the “jump to” search box to directly display the corresponding class.

## Question 7
Still in the Human Phenotype Ontology, list the class mappings of “mitral valve prolapse”. Based on its corresponding class in the OMIM ontology (Online Mendelian Inheritance in Man), retrieve the genes possibly involved. You will need to navigate through the “manifestation of” and “gene symbol” properties.