{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Life Science ontologies and Knowledge Graphs, HANDS-ON\n", "\n", "*Alban Gaignard, alban.gaignard@univ-nantes.fr, Olivier Dameron, olivier.dameron@univ-rennes1.fr*\n", "\n", "*HANDS-ON session given as part of ETBII 2023*\n", "\n", "At the end of this hands-on session, you will be able to\n", " - explore and search publicly available biomedical ontologies, combine knowledge provided by multiple ontologies,\n", " - computationally exploit these ontologies: explore node neighborhood, navigate class hierarchies, retrieve synonyms,\n", " - understand how biochemical regulations are modeled in BioPAX,\n", " - assemble/summarize new graphs based on graph patterns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Exploring Life Science ontologies\n", "\n", "BioPortal (https://bioportal.bioontology.org) is a large repository of biomedical ontologies gathering 600+ ontologies and 8+ million classes. We will use this web resource to navigate and retrieve biomedical knowledge." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1\n", "Search for two definitions of “mitral valve prolapse”, coming from two different ontologies." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2\n", "In the human phenotype ontology, search for all sub-classes of “mitral stenosis”. You will use the “jump to” search box to directly display the corresponding class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3\n", "Still from the Human Phenotype Ontology, list “mitral valve prolapse” class mappings. Based on its corresponding class in the OMIM ontology (Online Mendelian Inheritance in Man), retrieve possibly involved genes. You will need to navigate through “manifestation of” and “gene symbol” properties." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Up to now we browsed web pages to retrieve biomedical knowledge. In the following questions, we will use the *beta* BioPortal SPARQL endpoint (http://sparql.bioontology.org) to automate information retrieval." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4\n", "Execute a SPARQL DESCRIBE query to display all the outgoing properties of “mitral valve prolapse” (```http://purl.bioontology.org/ontology/OMIM/MTHU001468```)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "# (Optionnal) Declare your namespaces here\n", "# General syntax: \n", "# PREFIX rdf: \n", "# For the usual namespaces, you can use http://prefix.cc/\n", "\n", "DESCRIBE \n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 5. \n", "Write a SPARQL SELECT query to list all genes possibly involved in mitral valve prolapse (you can execute a SPARQL DESCRIBE on a `manifestation` to find the property URI linking gene symbols." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 6\n", "Write a SPARQL SELECT query to list the direct subclasses of heart valve diseases and their synonyms. You will use the `DOID ontology`, the `rdfs:subClassOf` and `obo:hasExactSynonym` properties. Don’t forget to start by describing the `DOID_4079` resource (`DESCRIBE` query)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#DESCRIBE \n", "\n", "query = \"\"\"\n", "PREFIX rdfs: \n", "PREFIX obo: \n", "\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Querying gene regulation resources \n", "\n", "We will now use PathwayCommons (http://www.pathwaycommons.org), an RDF dataset used to integrated biological signaling pathways (5,772 pathways, 2,424,055 interactions) from 22 regulation data sources. PathwayCommons can be queried through its SPARQL endpoint (http://rdf.pathwaycommons.org/sparql). Since this server is down (jan 2023), we will use this mirror deployed at IFB : http://134.214.213.234/sparql \n", "\n", "PathwayCommons uses the BioPAX ontology to represent regulation and signaling knowledge. Have a look on Figure 3 and Figure 4 of the BioPAX paper (https://www.researchgate.net/publication/46191859_BioPAX_-_A_community_standard_for_pathway_data_sharing) to have a quick overview of BioPAX. \n", "\n", "We are interested in **activation** or **inhibition** gene regulations. The following *turtle* syntax shows how they can be represented in BioPAX. \n", "\n", "```\n", "@prefix rdf: . \n", "@prefix pc2: .\n", "@prefix bp: .\n", "@prefix xsd: .\n", "\n", "pc2:TemplateReactionRegulation_145afc203ffa1cb5a00fa445a5a63c64 rdf:type bp:TemplateReactionRegulation .\n", "\n", "pc2:TemplateReactionRegulation_145afc203ffa1cb5a00fa445a5a63c64 bp:comment \" REPLACED http://www.ctdbase.org/#EXP_2556811\"^^xsd:string ;\n", " bp:controlType \"ACTIVATION\"^^xsd:string ;\n", " bp:controlled pc2:TemplateReaction_9129e01e3c570005d993c44e75b2134c ; \n", " bp:controller pc2:SmallMolecule_38f43f385b04215c6ac89a44cdc22f67 ; \n", " bp:dataSource pc2:ctd ;\n", " bp:displayName \"Phenobarbital results in increased expression of CYP2B6 protein \"^^xsd:string ;\n", " bp:xref , \n", " , \n", " , \n", " ,\n", " , \n", " , \n", " , \n", " , \n", " , \n", " .\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 7\n", "On a piece of paper, draw the corresponding directed labelled graph. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 8\n", "1. Test the `DESCRIBE pc2:TemplateReactionRegulation_145afc203ffa1cb5a00fa445a5a63c64` query directly at \n", "2. Test the same query directly in this notebook: \n", "```python\n", "query = \"YOUR QUERY\"\n", "sparql = SPARQLWrapper(\"http://134.214.213.234/sparql\")\n", "sparql.setQuery(query)\n", "results = sparql.queryAndConvert()\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'SPARQLWrapper' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn [4], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m query \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\"\"\u001b[39m\n\u001b[1;32m 2\u001b[0m \n\u001b[1;32m 3\u001b[0m \u001b[38;5;124m\"\"\"\u001b[39m\n\u001b[0;32m----> 4\u001b[0m sparql \u001b[38;5;241m=\u001b[39m \u001b[43mSPARQLWrapper\u001b[49m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhttp://134.214.213.234/sparql\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 5\u001b[0m sparql\u001b[38;5;241m.\u001b[39msetQuery(query)\n\u001b[1;32m 6\u001b[0m results \u001b[38;5;241m=\u001b[39m sparql\u001b[38;5;241m.\u001b[39mqueryAndConvert()\n", "\u001b[0;31mNameError\u001b[0m: name 'SPARQLWrapper' is not defined" ] } ], "source": [ "query = \"\"\"\n", "\n", "\"\"\"\n", "sparql = SPARQLWrapper(\"http://134.214.213.234/sparql\")\n", "sparql.setQuery(query)\n", "results = sparql.queryAndConvert()\n", "print(results.serialize(format=\"turtle\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**In the remainder of the question we will use the SPARQL endpoint web interface ().**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 9\n", "Based on this description, write a query to show the names of all genes that regulate (activation or inhibition) SCN5A. We will proceed in multiple progressive steps. \n", "\n", "1. identify regulation reactions with resources of type `bp:TemplateReactionRegulation` (don’t forget to use a LIMIT 10 to get fast results)\n", "2. show their control type (`bp:controlType` property) and filter only “activation” or “inhibition”.\n", "3. show the associated scientific publication with the `bp:xref` property. Make sure that “pubmed” is contained in its URI (you can use a FILTER fonction: `FILTER (regex(?publi, \"pubmed\"))`).\n", "4. identify the source of the regulation (`bp:controller`) and its display name (`bp:displayName`).\n", "5. identify the target of the regulation (`bp:controlled`) and its display name (`bp:displayName`). Make sure (FILTER) that the display name is our target gene: SCN5A." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 9.1\n", "Identify regulation reactions with resources of type `bp:TemplateReactionRegulation` (don’t forget to use a LIMIT 10 to get fast results)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "PREFIX bp: \n", "\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 9.2\n", "Show their control type (`bp:controlType` property). You can filter only “ACTIVATION” or “INHIBITION”. \n", "Use a FILTER clause, the or `||` operator and `regex(variable,\"pattern\")` function. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "PREFIX bp: \n", "\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 9.3\n", "Show the associated scientific publication with the `bp:xref` property. Make sure that “pubmed” is contained in its URI. Use a FILTER clause, and a `regex(variable,\"pattern\")` function. We want to get reaction, even if no publication isassociated, for that we will enclose the triple pattern within an `OPTIONAL {}` clause." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "PREFIX bp: \n", "\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Question 9.4\n", "Identify the source of the regulation (`bp:controller`) and its display name (`bp:displayName`)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "PREFIX bp: \n", "\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Question 9.5\n", "Identify the target of the regulation (`bp:controlled`) and its display name (`bp:displayName`). Make sure (FILTER) that the display name is our target gene: SCN5A." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "PREFIX bp: \n", "\n", "\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 10\n", "From the previous query, retrieve a tabular file (CSV) with 3 columns for the source name, the regulation type, and the target name. Use the http://app.rawgraphs.io web tool to generate an alluvial flow chart which displays the relations between the source and target nodes. You should obtain something like : \n", "![:scale 50%](fig/viz.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 11\n", "Use what you learned (recently) to produce a SIF file for this regulation information and display it through Cytoscape. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "queryString = \"\"\"\n", "\n", "\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "## Here we create the output SIF file:\n", "import csv\n", "\n", "with open('output.sif', 'wt') as out_file:\n", " tsv_writer = csv.writer(out_file, delimiter='\\t')\n", " # ..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }