Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save slarson/6745890 to your computer and use it in GitHub Desktop.
Save slarson/6745890 to your computer and use it in GitHub Desktop.
iPython notebook example showing how to create simple RDF from the OpenWorm Semantic Database via Python
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "Creating RDF from OpenWorm Semantic Database via Python"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "An example of using a [Python MySQL library](https://github.com/PyMySQL/PyMySQL) to access the [OpenWorm Semantic Database](http://www.interintelligence.org/openworm/). Further down we use RDFlib to begin generating RDF from this database."
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Make sure we have the right libraries installed:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "!sudo pip install PyMySQL",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Downloading/unpacking PyMySQL\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " Downloading PyMySQL-0.5.tar.gz\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " Running setup.py egg_info for package PyMySQL\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \r\nInstalling collected packages: PyMySQL\r\n Running setup.py install for PyMySQL\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "Successfully installed PyMySQL\r\nCleaning up...\r\n"
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": "!sudo pip install rdflib",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Requirement already satisfied (use --upgrade to upgrade): rdflib in /usr/local/lib/python2.7/dist-packages\r\nRequirement already satisfied (use --upgrade to upgrade): isodate in /usr/local/lib/python2.7/dist-packages (from rdflib)\r\nRequirement already satisfied (use --upgrade to upgrade): pyparsing<=1.5.7 in /usr/local/lib/python2.7/dist-packages (from rdflib)\r\nRequirement already satisfied (use --upgrade to upgrade): SPARQLWrapper in /usr/local/lib/python2.7/dist-packages (from rdflib)\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "Requirement already satisfied (use --upgrade to upgrade): html5lib in /usr/local/lib/python2.7/dist-packages (from rdflib)\r\nRequirement already satisfied (use --upgrade to upgrade): six in /usr/local/lib/python2.7/dist-packages (from html5lib->rdflib)\r\nCleaning up...\r\n"
}
],
"prompt_number": 37
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Simple example of pulling data out of the database -- no RDF yet."
},
{
"cell_type": "code",
"collapsed": false,
"input": "#!/usr/bin/env python\n\nimport pymysql\n\n#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')\n\nconn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')\n \ncur = conn.cursor()\n\n#let's look at IDs and names coming from the database\ncur.execute(\"SELECT DISTINCT ID, Entity FROM tblentity LIMIT 10\")\n\n# r = cur.fetchall()\n# print r\n# ...or...\nfor r in cur.fetchall():\n print(r)\n\ncur.close()\nconn.close()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "(1L, 'Neuron')\n(2L, 'Interneuron')\n(3L, 'Sensory')\n(4L, 'Motor')\n(5L, 'Interneuron/Motor')\n(6L, 'Pharyngeal interneuron')\n(7L, 'Sensory/Interneuron/Motor')\n(8L, 'Sensory/Interneuron')\n(9L, 'Ring Motor')\n(10L, 'Ring/Pharynx')\n"
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now a super simple example drawing from a basic approach to [create RDF triples](https://rdflib.readthedocs.org/en/latest/intro_to_creating_rdf.html#adding-triples). This demonstrates we know how to build up a simple URIRef with an OpenWorm namespace."
},
{
"cell_type": "code",
"collapsed": false,
"input": "from rdflib import Graph\nfrom rdflib import Namespace\nfrom rdflib.namespace import RDF\nfrom rdflib import URIRef, BNode, Literal\n\ng = Graph()\n\nn = Namespace(\"http://openworm.org/entities/\")\n\nprint(n['worm'])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "http://openworm.org/entities/worm\n"
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now we combine the last two examples together and we make a simple RDF document"
},
{
"cell_type": "code",
"collapsed": false,
"input": "#!/usr/bin/env python\n\nimport pymysql\nfrom rdflib import Graph\nfrom rdflib import Namespace\nfrom rdflib.namespace import RDF, RDFS\nfrom rdflib import URIRef, BNode, Literal\nimport urllib\n\n#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')\n\nconn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')\n \ncur = conn.cursor()\n\ncur.execute(\"SELECT DISTINCT ID, Entity FROM tblentity LIMIT 10\")\n\nn = Namespace(\"http://openworm.org/entities/\")\n\n# print cur.description\n\ng = Graph()\n\n# r = cur.fetchall()\n# print r\n# ...or...\nfor r in cur.fetchall():\n #first item is a number -- needs to be converted to a string\n first = str(r[0])\n #second item is text \n second = str(r[1])\n \n # This is the backbone of any RDF graph. The unique\n # ID for each entity is encoded as a URI and every other piece of \n # knowledge about that entity is connected via triples to that URI\n # In this case, we connect the common name of that entity to the \n # root URI via the RDFS label property.\n g.add( (n[first], RDFS.label, Literal(second)) )\n\ncur.close()\nconn.close()\n\nprint g.serialize(format='turtle')",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n@prefix xml: <http://www.w3.org/XML/1998/namespace> .\n@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n<http://openworm.org/entities/1> rdfs:label \"Neuron\" .\n\n<http://openworm.org/entities/10> rdfs:label \"Ring/Pharynx\" .\n\n<http://openworm.org/entities/2> rdfs:label \"Interneuron\" .\n\n<http://openworm.org/entities/3> rdfs:label \"Sensory\" .\n\n<http://openworm.org/entities/4> rdfs:label \"Motor\" .\n\n<http://openworm.org/entities/5> rdfs:label \"Interneuron/Motor\" .\n\n<http://openworm.org/entities/6> rdfs:label \"Pharyngeal interneuron\" .\n\n<http://openworm.org/entities/7> rdfs:label \"Sensory/Interneuron/Motor\" .\n\n<http://openworm.org/entities/8> rdfs:label \"Sensory/Interneuron\" .\n\n<http://openworm.org/entities/9> rdfs:label \"Ring Motor\" .\n\n\n"
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This example pulls data from the relationship table in the OpenWorm semantic database"
},
{
"cell_type": "code",
"collapsed": false,
"input": "#!/usr/bin/env python\n\nimport pymysql\nfrom rdflib import Graph\nfrom rdflib import Namespace\nfrom rdflib.namespace import RDF, RDFS\nfrom rdflib import URIRef, BNode, Literal\nimport urllib\n\n#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')\n\nconn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')\n \ncur = conn.cursor()\n\n#first step, grab all entities and add them to the graph\n\ncur.execute(\"SELECT DISTINCT ID, Entity FROM tblentity LIMIT 10\")\n\nn = Namespace(\"http://openworm.org/entities/\")\n\n# print cur.description\n\ng = Graph()\n\n# r = cur.fetchall()\n# print r\n# ...or...\nfor r in cur.fetchall():\n #first item is a number -- needs to be converted to a string\n first = str(r[0])\n #second item is text \n second = str(r[1])\n \n # This is the backbone of any RDF graph. The unique\n # ID for each entity is encoded as a URI and every other piece of \n # knowledge about that entity is connected via triples to that URI\n # In this case, we connect the common name of that entity to the \n # root URI via the RDFS label property.\n g.add( (n[first], RDFS.label, Literal(second)) )\n\n\n#second stem, get the relationships between them and add them to the graph\ncur.execute(\"SELECT DISTINCT EnID1, Relation, EnID2 FROM tblrelationship LIMIT 10\")\n\n# r = cur.fetchall()\n# print r\n# ...or...\nfor r in cur.fetchall():\n print r\n #all items are numbers -- need to be converted to a string\n first = str(r[0])\n second = str(r[1])\n third = str(r[2])\n \n g.add( (n[first], n[second], n[third]) )\n\ncur.close()\nconn.close()\n\nprint g.serialize(format='turtle')",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "(39L, 1516L, 1438L)\n(39L, 1516L, 1441L)\n(48L, 1516L, 1428L)\n(48L, 1516L, 1429L)\n(48L, 1516L, 1452L)\n(48L, 1516L, 1453L)\n(49L, 1516L, 1430L)\n(49L, 1516L, 1431L)\n(49L, 1516L, 1432L)\n(49L, 1516L, 1433L)\n@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n@prefix xml: <http://www.w3.org/XML/1998/namespace> .\n@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n<http://openworm.org/entities/1> rdfs:label \"Neuron\" .\n\n<http://openworm.org/entities/10> rdfs:label \"Ring/Pharynx\" .\n\n<http://openworm.org/entities/2> rdfs:label \"Interneuron\" .\n\n<http://openworm.org/entities/3> rdfs:label \"Sensory\" .\n\n<http://openworm.org/entities/39> <http://openworm.org/entities/1516> <http://openworm.org/entities/1438>,\n <http://openworm.org/entities/1441> .\n\n<http://openworm.org/entities/4> rdfs:label \"Motor\" .\n\n<http://openworm.org/entities/48> <http://openworm.org/entities/1516> <http://openworm.org/entities/1428>,\n <http://openworm.org/entities/1429>,\n <http://openworm.org/entities/1452>,\n <http://openworm.org/entities/1453> .\n\n<http://openworm.org/entities/49> <http://openworm.org/entities/1516> <http://openworm.org/entities/1430>,\n <http://openworm.org/entities/1431>,\n <http://openworm.org/entities/1432>,\n <http://openworm.org/entities/1433> .\n\n<http://openworm.org/entities/5> rdfs:label \"Interneuron/Motor\" .\n\n<http://openworm.org/entities/6> rdfs:label \"Pharyngeal interneuron\" .\n\n<http://openworm.org/entities/7> rdfs:label \"Sensory/Interneuron/Motor\" .\n\n<http://openworm.org/entities/8> rdfs:label \"Sensory/Interneuron\" .\n\n<http://openworm.org/entities/9> rdfs:label \"Ring Motor\" .\n\n\n"
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This example pulls information from the entity table and the relationship table together in the same graph and performs a very simple SPARQL query on the new graph to demonstrate it works"
},
{
"cell_type": "code",
"collapsed": false,
"input": "#!/usr/bin/env python\n\nimport pymysql\nfrom rdflib import Graph\nfrom rdflib import Namespace\nfrom rdflib.namespace import RDF, RDFS\nfrom rdflib import URIRef, BNode, Literal\nimport urllib\n\n#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')\n\nconn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')\n \ncur = conn.cursor()\n\n#first step, grab all entities and add them to the graph\n\ncur.execute(\"SELECT DISTINCT ID, Entity FROM tblentity\")\n\nn = Namespace(\"http://openworm.org/entities/\")\n\n# print cur.description\n\ng = Graph()\n\n# r = cur.fetchall()\n# print r\n# ...or...\nfor r in cur.fetchall():\n #first item is a number -- needs to be converted to a string\n first = str(r[0])\n #second item is text \n second = str(r[1])\n \n # This is the backbone of any RDF graph. The unique\n # ID for each entity is encoded as a URI and every other piece of \n # knowledge about that entity is connected via triples to that URI\n # In this case, we connect the common name of that entity to the \n # root URI via the RDFS label property.\n g.add( (n[first], RDFS.label, Literal(second)) )\n\n\n#second stem, get the relationships between them and add them to the graph\ncur.execute(\"SELECT DISTINCT EnID1, Relation, EnID2 FROM tblrelationship\")\n\n# r = cur.fetchall()\n# print r\n# ...or...\nfor r in cur.fetchall():\n #print r\n #all items are numbers -- need to be converted to a string\n first = str(r[0])\n second = str(r[1])\n third = str(r[2])\n \n g.add( (n[first], n[second], n[third]) )\n\ncur.close()\nconn.close()\n\nprint(\"going to get results...\")\n\nqres = g.query(\n \"\"\"SELECT ?subject ?predicate ?object\n WHERE {\n ?subject ?predicate ?object.\n \t} LIMIT 5\"\"\")\n\nprint(\"printing results\")\n\n\nprint(\"The graph has \" + str(len(g)) + \" items in it\")\n\nprint(\"Name--not necessarily in strign format: \")\nprint(qres.result[0])\n\n# when done!\ng.close()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "going to get results...\nprinting results\nThe graph has 5773 items in it\nName--not necessarily in strign format: \n(rdflib.term.URIRef(u'http://openworm.org/entities/190'), rdflib.term.URIRef(u'http://openworm.org/entities/356'), rdflib.term.URIRef(u'http://openworm.org/entities/11'))\n"
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": "#!/usr/bin/env python\n\nimport pymysql\nfrom rdflib import Graph\nfrom rdflib import Namespace\nfrom rdflib.namespace import RDF, RDFS\nfrom rdflib import URIRef, BNode, Literal\nimport urllib\n\n#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')\n\nconn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')\n \ncur = conn.cursor()\n\n#first step, grab all entities and add them to the graph\n\ncur.execute(\"SELECT DISTINCT ID, Entity FROM tblentity\")\n\nn = Namespace(\"http://openworm.org/entities/\")\n\n# print cur.description\n\ng = Graph()\n\n# r = cur.fetchall()\n# print r\n# ...or...\nfor r in cur.fetchall():\n #first item is a number -- needs to be converted to a string\n first = str(r[0])\n #second item is text \n second = str(r[1])\n \n # This is the backbone of any RDF graph. The unique\n # ID for each entity is encoded as a URI and every other piece of \n # knowledge about that entity is connected via triples to that URI\n # In this case, we connect the common name of that entity to the \n # root URI via the RDFS label property.\n g.add( (n[first], RDFS.label, Literal(second)) )\n\n\n#second stem, get the relationships between them and add them to the graph\ncur.execute(\"SELECT DISTINCT EnID1, Relation, EnID2 FROM tblrelationship\")\n\n# r = cur.fetchall()\n# print r\n# ...or...\nfor r in cur.fetchall():\n #print r\n #all items are numbers -- need to be converted to a string\n first = str(r[0])\n second = str(r[1])\n third = str(r[2])\n \n g.add( (n[first], n[second], n[third]) )\n\ncur.close()\nconn.close()\n\nprint(\"going to get results...\")\n\nqres = g.query(\n \"\"\"SELECT ?predLabel ?objLabel #we want to get out the labels associated with the predicates and the objects\n WHERE {\n ?AVALnode ?p \"AVAL\". #we are looking first for the 'AVALnode' that is the anchor of all information about the AVAL neuron\n ?AVALnode ?predicate ?object .# having identified that node, here we match any predicate and object associated with the AVALnode\n ?predicate rdfs:label ?predLabel .#for the predicates, look up their plain text label (because otherwise we only have URIs)\n ?object rdfs:label ?objLabel #for the object, look up their plain text label.\n \t} ORDER BY ?predLabel #sort by the predicate\"\"\") \n\nprint(\"printing results\")\n\n\nprint(\"The graph has \" + str(len(g)) + \" items in it\\n\\n\")\n\nprint \"AVAL has the following information stored about it: \\n\"\nfor r in qres.result:\n print str(r[0]), str(r[1])\n\n# when done!\ng.close()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "going to get results...\nprinting results\nThe graph has 5773 items in it\n\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nAVAL has the following information stored about it: \n\nNeuropeptide FLP-1\nNeuropeptide FLP-18\nReceptor GLR-5\nReceptor NMR-1\nReceptor NMR-2\nReceptor GLR-2\nReceptor GLR-1\nReceptor GGR-3\nReceptor GLR-4\nReceptor UNC-8\ngap junction VA8\ngap junction DA7\ngap junction SABD\ngap junction PVCL\ngap junction AVAR\ngap junction URYDL\ngap junction URYVR\ngap junction VA4\ngap junction AS5\ngap junction VA3\ngap junction VB9\ngap junction DA5\ngap junction SDQR\ngap junction VA1\ngap junction VA5\ngap junction VA6\ngap junction DA6\ngap junction PVPL\ngap junction VA7\ngap junction AS8\ngap junction VA11\ngap junction PVCR\ngap junction AS1\ngap junction DB6\ngap junction AVJL\ngap junction RIMR\ngap junction VA10\ngap junction VA9\ngap junction SABVR\ngap junction FLPL\ngap junction AS3\ngap junction DA2\ngap junction VA2\ngap junction DA1\ngap junction LUAL\ngap junction AS9\ngap junction DA4\ngap junction AS10\ngap junction AS7\ngap junction AS6\nis a Neuron\nsend AS5\nsend AVBR\nsend DA6\nsend LUAL\nsend AS4\nsend AS11\nsend DB5\nsend AS3\nsend VA11\nsend VA9\nsend VA8\nsend AVDL\nsend VA2\nsend DA3\nsend AS10\nsend AVHL\nsend VA6\nsend VA3\nsend AS2\nsend DA9\nsend AS9\nsend AS7\nsend VA10\nsend DB6\nsend DA5\nsend VA5\nsend DA1\nsend DA2\nsend DA7\nsend AS8\nsend DA4\nsend DA8\nsend VA4\nsend VA12\nsend PVCL\nsend AVAR\nsend PVCR\n"
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment