Created
October 3, 2013 22:30
| { | |
| "metadata": { | |
| "name": "BAMS Thesaurus examples" | |
| }, | |
| "nbformat": 3, | |
| "nbformat_minor": 0, | |
| "worksheets": [ | |
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Some examples of running SPARQL queries against the [BAMS Thesaurus](http://brancusi1.usc.edu/thesaurus/list/) RDF with rdflib" | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Navigating to the code:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "%cd src", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "/home/ubuntu/ipython/BAMS-to-NeuroLex/src\n" | |
| } | |
| ], | |
| "prompt_number": 3 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "!ls", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "BAMS_Data_Queries.py\t\t HelloWorld_On_BAMS_VF.py\r\nBAMS_Experimental_Queries.py\t HelloWorld.py\r\nExperimental_BAMS_Data_Search.py HelloWorld_to_BAMS_Final.py\r\nget-pip.py\t\t\t rdflib_graph.pickle\r\nHelloWorld2.py\t\t\t SPARQL_BAMS_Basal_Ganglia.py\r\nHelloWorld2_V2.py\t\t SPARQL_BAMS_EXAMPLE.py\r\nHelloWorld3_V1.py\t\t SPARQL_BAMS_Store_Persist_Example.py\r\nHelloWorld_On_BAMS.py\t\t SPARQL_BAMS_Store_Query_Example.py\r\nHelloWorld_On_BAMS_V1.py\t Unzip_BAMS_Example.py\r\nHelloWorld_On_Bams_V2.py\r\n" | |
| } | |
| ], | |
| "prompt_number": 4 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Load up the example that persists the BAMS thesaurus RDF graph to disk for fast queries:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "%load SPARQL_BAMS_Store_Persist_Example.py", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 5 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "#SPARQL_BAMS_Store_Persist_Example.py\n#This program opens the BAMS data and persists it to a store to speed up querying\n\n\n#For Parsing\nimport rdflib\nfrom rdflib import plugin\n\n#for getting the length of the files\nimport os\n\n#for working with tempfiles\nimport os.path as op\nimport tempfile\n\n#For Unzipping\nimport zipfile\nfrom StringIO import StringIO\n\nplugin.register(\n 'sparql', rdflib.query.Processor,\n 'rdfextras.sparql.processor', 'Processor')\nplugin.register(\n 'sparql', rdflib.query.Result,\n 'rdfextras.sparql.query', 'SPARQLQueryResult')\n\nzipdata = StringIO()\n\n# open the file using a relative path\n#r = open(\"../Data/BAMS1.zip\")\n\n# adding the BAMS Thesaurus instead of the more limited set of data:\nr = open(\"../Data/bams_thesaurus_2013-09-24_17-12-40.xml.zip\")\n\n# zipdata is a buffer holding the contents of the zip file in memory\nzipdata.write(r.read())\n\nprint(\"~40 seconds for zip to open...\")\n\n#myzipfile opens the contents of the zip file as an object that knows how to unzip\nmyzipfile = zipfile.ZipFile(zipdata)\n\n#grab the contents out of myzipfile by name\n#foofile = myzipfile.open('bams_ontology_2013-07-10_03-20-00.xml')\n\n#changing foofile to be the file inside the zip we opened above in r = open(...)\nfoofile = myzipfile.open('bams_thesaurus_2013-09-24_17-12-40.xml')\n\nprint(\"loading up the BAMS file in memory...\")\n\n#Get a Graph object using a Sleepycat persistent store\ng = rdflib.Graph('Sleepycat',identifier='BAMS')\n\n# first time create the store\n# put the store in a temp directory so it doesn't get confused with stuff we should commit\ntempStore = op.join( tempfile.gettempdir(), 'myRDF_BAMS_Thesaurus_Store')\ng.open(tempStore, create = True)\n\n#pull in the BAMS RDF document, parse, and store.\n#result = g.parse(file=myzipfile.open('bams_ontology_2013-07-10_03-20-00.xml'), format=\"application/rdf+xml\")\n\n#do the same thing but with the BAMS thesaurus file\nresult = g.parse(file=myzipfile.open('bams_thesaurus_2013-09-24_17-12-40.xml'), format=\"application/rdf+xml\")\n\n\nfoofile.close()\n\n# when done!\ng.close()\n\nprint(\"Graph stored to disk\")", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "~40 seconds for zip to open...\nloading up the BAMS file in memory...\nGraph stored to disk" | |
| }, | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "\n" | |
| } | |
| ], | |
| "prompt_number": 6 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Now that the graph is stored to disk, let's play with some queries:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "%load SPARQL_BAMS_Store_Query_Example.py", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 7 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "#SPARQL_BAMS_Store_Query_Example.py\n\n#Accessing Python Interactive Mode:\n#python -i SPARQL_BAMS_Store_Query_Example.py\n#This program demonstrates a basic SPARQL query pulling data out of a persisted RDF store\n\n#For Parsing\nimport rdflib\nfrom rdflib import plugin\n\n#for getting the length of the files\nimport os\n\n#for working with tempfiles\nimport os.path as op\nimport tempfile\n\nplugin.register(\n 'sparql', rdflib.query.Processor,\n 'rdfextras.sparql.processor', 'Processor')\nplugin.register(\n 'sparql', rdflib.query.Result,\n 'rdfextras.sparql.query', 'SPARQLQueryResult')\n\n#Get a Graph object\ng = rdflib.Graph('Sleepycat',identifier='BAMS')\n\nprint(\"loading up the BAMS file in memory...\")\n\n# assumes myRDF_BAMS_Thesaurus_Store has been created by the persist example\ntempStore = op.join( tempfile.gettempdir(), 'myRDF_BAMS_Thesaurus_Store')\ng.open(tempStore)\n\nprint(\"going to get results...\")\n\nqres = g.query(\n \"\"\"SELECT ?subject ?predicate ?object\n WHERE {\n ?subject ?predicate ?object.\n \t} LIMIT 5\"\"\")\n\nprint(\"printing results\")\n\nprint(\"The graph has \" + str(len(g)) + \" items in it\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1]), str(r[2])\n\n# when done!\n#g.close()\n\n\n", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "loading up the BAMS file in memory...\ngoing to get results...\nprinting results" | |
| }, | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "\nThe graph has 9518 items in it\nN81484be1904d45e8835cfe8ccfb8a0a6 http://www.w3.org/1999/02/22-rdf-syntax-ns#type file:///anchor\nN81484be1904d45e8835cfe8ccfb8a0a6 http://www.w3.org/1999/xlinktype simple\nN81484be1904d45e8835cfe8ccfb8a0a6 http://www.w3.org/1999/xlinkhref http://brancusi1.usc.edu/thesaurus/definition/tectum/\nhttp://brancusi1.usc.edu/thesaurus/definition/corpora-quadrigemina/ http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/thesaurus/definition/corpora-quadrigemina/ http://brancusi1.usc.edu/RDF/definition N81484be1904d45e8835cfe8ccfb8a0a6\n" | |
| } | |
| ], | |
| "prompt_number": 1 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "The simplest possible query returns the first few triples in the graph. Next, let's use full URIs DIRECTLY in a triple pattern, without worrying about prefix nonsense.\n\nBy wrapping it in `< >` brackets, we can always plug a complete URI straight into a query to find out what is connected to it:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?predicate ?object\n WHERE {\n <http://brancusi1.usc.edu/thesaurus/definition/corpora-quadrigemina/> ?predicate ?object.\n \t} LIMIT 5\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/RDF/definition N81484be1904d45e8835cfe8ccfb8a0a6\nhttp://brancusi1.usc.edu/RDF/entry Corpora quadrigemina\nhttp://brancusi1.usc.edu/RDF/slug corpora-quadrigemina\nhttp://brancusi1.usc.edu/RDF/workspace 0\n" | |
| } | |
| ], | |
| "prompt_number": 10 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "In theory, the prefix notation for this graph would be this:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "##DOES NOT WORK\nqres = g.query(\n \"\"\"PREFIX bams: <http://brancusi1.usc.edu/thesaurus/definition/>\n SELECT ?predicate ?object\n WHERE {\n bams:corpora-quadrigemina/ ?predicate ?object.\n \t} LIMIT 5\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])\n\n## DOES NOT RETURN RESULTS", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "ename": "AssertionError", | |
| "evalue": "None", | |
| "output_type": "pyerr", | |
| "traceback": [ | |
| "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mAssertionError\u001b[0m Traceback (most recent call last)", | |
| "\u001b[1;32m<ipython-input-3-fd73a343ad9f>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 5\u001b[0m WHERE {\n\u001b[0;32m 6\u001b[0m \u001b[0mbams\u001b[0m\u001b[1;33m:\u001b[0m\u001b[0mcorpora\u001b[0m\u001b[1;33m-\u001b[0m\u001b[0mquadrigemina\u001b[0m\u001b[1;33m/\u001b[0m\u001b[0;31m \u001b[0m\u001b[0;31m?\u001b[0m\u001b[0mpredicate\u001b[0m\u001b[0;31m \u001b[0m\u001b[0;31m?\u001b[0m\u001b[0mobject\u001b[0m\u001b[1;33m.\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 7\u001b[1;33m \t} LIMIT 5\"\"\")\n\u001b[0m\u001b[0;32m 8\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 9\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mr\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mqres\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", | |
| "\u001b[1;32m/usr/local/lib/python2.7/dist-packages/rdflib/graph.pyc\u001b[0m in \u001b[0;36mquery\u001b[1;34m(self, query_object, processor, result, initNs, initBindings, use_store_provided, **kwargs)\u001b[0m\n\u001b[0;32m 1043\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1044\u001b[0m return result(processor.query(\n\u001b[1;32m-> 1045\u001b[1;33m query_object, initBindings, initNs, **kwargs))\n\u001b[0m\u001b[0;32m 1046\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1047\u001b[0m def update(self, update_object, processor='sparql',\n", | |
| "\u001b[1;32m/usr/local/lib/python2.7/dist-packages/rdfextras/sparql/processor.pyc\u001b[0m in \u001b[0;36mquery\u001b[1;34m(self, strOrQuery, initBindings, initNs, DEBUG, PARSE_DEBUG, dataSetBase, extensionFunctions, USE_PYPARSING, dSCompliance, loadContexts)\u001b[0m\n\u001b[0;32m 47\u001b[0m \u001b[0mextensionFunctions\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mextensionFunctions\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 48\u001b[0m \u001b[0mdSCompliance\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mdSCompliance\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 49\u001b[1;33m loadContexts=loadContexts)\n\u001b[0m", | |
| "\u001b[1;32m/usr/local/lib/python2.7/dist-packages/rdfextras/sparql/algebra.pyc\u001b[0m in \u001b[0;36mTopEvaluate\u001b[1;34m(query, dataset, passedBindings, DEBUG, exportTree, dataSetBase, extensionFunctions, dSCompliance, loadContexts)\u001b[0m\n\u001b[0;32m 404\u001b[0m \u001b[0mresult\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0msparql_query\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mQuery\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtop\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mtripleStore\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 405\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 406\u001b[1;33m \u001b[1;32massert\u001b[0m \u001b[0misinstance\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mexpr\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mAlgebraExpression\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mrepr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mexpr\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 407\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mDEBUG\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 408\u001b[0m \u001b[0mlog\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mdebug\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"## Full SPARQL Algebra expression ##\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", | |
| "\u001b[1;31mAssertionError\u001b[0m: None" | |
| ] | |
| }, | |
| { | |
| "output_type": "stream", | |
| "stream": "stderr", | |
| "text": "ERROR: An unexpected error occurred while tokenizing input\nThe following traceback may be corrupted or invalid\nThe error message is: ('EOF in multi-line string', (1, 14))\n\n" | |
| } | |
| ], | |
| "prompt_number": 3 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "However, this doesn't work: the query fails with an `AssertionError` instead of returning results. The culprit is the final forward slash (`/`) in the URI. The SPARQL grammar does not allow a bare `/` in the local part of a prefixed name, so `bams:corpora-quadrigemina/` does not parse. Trying several versions of this, like quoting or wrapping the prefixed portion of the query, doesn't work either. The URIs themselves are valid RDF, so this is a limitation of prefix notation rather than broken data, but it would be more convenient if we could get Mihail to drop the trailing slash. Until then, nodes like this one have to be written as full URIs in `< >` brackets.\n\nThe other URI nodes, the ones that DON'T have a final forward slash, do allow prefixes. When we did our initial graph exploration search, we found a URI `http://brancusi1.usc.edu/RDF/definition` as a predicate. Let's explore what else in the graph is related to this, in another example without prefixes:\n\n" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?subject ?object\n WHERE {\n ?subject <http://brancusi1.usc.edu/RDF/definition> ?object.\n \t} LIMIT 5\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://brancusi1.usc.edu/thesaurus/definition/corpora-quadrigemina/ N81484be1904d45e8835cfe8ccfb8a0a6\nhttp://brancusi1.usc.edu/thesaurus/definition/corpus-striatum/ Naa7e64f1d16e42b7836acef8f53339e1\nhttp://brancusi1.usc.edu/thesaurus/definition/cranial-nerve-ganglia/ Nabd5295a0a944224ba72b1a0f226cbfe\nhttp://brancusi1.usc.edu/thesaurus/definition/cranial-nerves/ N816139a4dce14ac58c1ce1bab60414ea\nhttp://brancusi1.usc.edu/thesaurus/definition/craniospinal-ganglia/ N90636d89301d403d99fbe29a1ce8aeed\n" | |
| } | |
| ], | |
| "prompt_number": 2 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Interesting. A lot of blank nodes come back as the definitions for these brain regions. \n\nBefore we explore that, let's look at how we could rewrite that exact query using prefixes:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"PREFIX bamsProp: <http://brancusi1.usc.edu/RDF/>\n SELECT ?subject ?object\n WHERE {\n ?subject bamsProp:definition ?object.\n \t} LIMIT 5\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://brancusi1.usc.edu/thesaurus/definition/corpora-quadrigemina/ N81484be1904d45e8835cfe8ccfb8a0a6\nhttp://brancusi1.usc.edu/thesaurus/definition/corpus-striatum/ Naa7e64f1d16e42b7836acef8f53339e1\nhttp://brancusi1.usc.edu/thesaurus/definition/cranial-nerve-ganglia/ Nabd5295a0a944224ba72b1a0f226cbfe\nhttp://brancusi1.usc.edu/thesaurus/definition/cranial-nerves/ N816139a4dce14ac58c1ce1bab60414ea\nhttp://brancusi1.usc.edu/thesaurus/definition/craniospinal-ganglia/ N90636d89301d403d99fbe29a1ce8aeed\n" | |
| } | |
| ], | |
| "prompt_number": 3 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Okay, now let's explore what's up with these blank nodes.\n\nLet's try to query for one of them explicitly by its label. The first one in the result above is named \n\n`N81484be1904d45e8835cfe8ccfb8a0a6`\n\nA first attempt at querying for it looks like this:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?subject ?predicate\n WHERE {\n ?subject ?predicate _:N81484be1904d45e8835cfe8ccfb8a0a6 .\n \t} LIMIT 5\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "N81484be1904d45e8835cfe8ccfb8a0a6 http://www.w3.org/1999/02/22-rdf-syntax-ns#type\nN81484be1904d45e8835cfe8ccfb8a0a6 http://www.w3.org/1999/xlinktype\nN81484be1904d45e8835cfe8ccfb8a0a6 http://www.w3.org/1999/xlinkhref\nhttp://brancusi1.usc.edu/thesaurus/definition/corpora-quadrigemina/ http://www.w3.org/1999/02/22-rdf-syntax-ns#type\nhttp://brancusi1.usc.edu/thesaurus/definition/corpora-quadrigemina/ http://brancusi1.usc.edu/RDF/definition\n" | |
| } | |
| ], | |
| "prompt_number": 19 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
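| "source": "Look closely at this output: it is just the subject and predicate columns of the same five triples that the unrestricted query returned at the start. In a SPARQL query, a blank node label such as `_:N81484be1904d45e8835cfe8ccfb8a0a6` acts like a variable rather than a reference to the stored node with that name, so the pattern matches ANY triple. These are therefore not specifically the items \"upstream\" of the blank node, where the blank node is the 'object'. Let's see what happens when we put the label in the subject position instead:" | |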
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?predicate ?object\n WHERE {\n _:N81484be1904d45e8835cfe8ccfb8a0a6 ?predicate ?object .\n \t} LIMIT 5\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type file:///anchor\nhttp://www.w3.org/1999/xlinktype simple\nhttp://www.w3.org/1999/xlinkhref http://brancusi1.usc.edu/thesaurus/definition/tectum/\nhttp://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/RDF/definition N81484be1904d45e8835cfe8ccfb8a0a6\n" | |
| } | |
| ], | |
| "prompt_number": 18 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
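| "source": "Comparing with the very first query, this output is the predicate and object columns of the first five triples in the whole graph, not just triples that have this blank node as their subject. So this query doesn't isolate the blank node. Blank nodes are kind of weird. Let's come back to them.\n\nWhat we would really like is to find some text that contains actual definitions. Let's look at some more examples of what the definition predicate points to by raising the limit on the earlier query that used the definition predicate: " | |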
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"PREFIX bamsProp: <http://brancusi1.usc.edu/RDF/>\n SELECT ?subject ?object\n WHERE {\n ?subject bamsProp:definition ?object.\n \t} LIMIT 50\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://brancusi1.usc.edu/thesaurus/definition/corpora-quadrigemina/ N81484be1904d45e8835cfe8ccfb8a0a6\nhttp://brancusi1.usc.edu/thesaurus/definition/corpus-striatum/ Naa7e64f1d16e42b7836acef8f53339e1\nhttp://brancusi1.usc.edu/thesaurus/definition/cranial-nerve-ganglia/ Nabd5295a0a944224ba72b1a0f226cbfe\nhttp://brancusi1.usc.edu/thesaurus/definition/cranial-nerves/ N816139a4dce14ac58c1ce1bab60414ea\nhttp://brancusi1.usc.edu/thesaurus/definition/craniospinal-ganglia/ N90636d89301d403d99fbe29a1ce8aeed\nhttp://brancusi1.usc.edu/thesaurus/definition/craniospinal-nerves/ N10015200b8a64c68b7a84ae0b68565ad\nhttp://brancusi1.usc.edu/thesaurus/definition/cycle/ N31a8c4363ba14a868f543540257664a0\nhttp://brancusi1.usc.edu/thesaurus/definition/decussation/ N955aa9fe28e042c6b90ec1679acbee44\nhttp://brancusi1.usc.edu/thesaurus/definition/deep/ N4939c07f6a714419bb88e390bc95af44\nhttp://brancusi1.usc.edu/thesaurus/definition/dendrite/ N6d023443834649639f2197660b087dd5\nhttp://brancusi1.usc.edu/thesaurus/definition/dendritic-spine/ N20f0f895bbd544e6bb29f455089dbed7\nhttp://brancusi1.usc.edu/thesaurus/definition/diencephalon/ N0a20133c456245b88203fff0cf3256ab\nhttp://brancusi1.usc.edu/thesaurus/definition/anterior/ N8c4a6b0004ba476dbb4f36b62a1184a8\nhttp://brancusi1.usc.edu/thesaurus/definition/diffuse-nerve-net/ N2e735a6f084348eca78e6404414db2d2\nhttp://brancusi1.usc.edu/thesaurus/definition/distal/ N37c0d73abb894ad89f0506787d14e770\nhttp://brancusi1.usc.edu/thesaurus/definition/division/ N8786180d521645a4877cad5164636195\nhttp://brancusi1.usc.edu/thesaurus/definition/dorsal/ N669c4eccddd2431bbefa08ef5fb8906a\nhttp://brancusi1.usc.edu/thesaurus/definition/dorsoventral-axis/ Ne466ff0667fa4407a3541226c3cb27c7\nhttp://brancusi1.usc.edu/thesaurus/definition/dura/ N597154a05790493cb0a437bdd3b455b0\nhttp://brancusi1.usc.edu/thesaurus/definition/dura-mater/ N8dbf6e4fa1284b08a7519fda9e3dffba\nhttp://brancusi1.usc.edu/thesaurus/definition/effector/ 
N3fec5134e0a94bcfbe5bfa34069ddf68\nhttp://brancusi1.usc.edu/thesaurus/definition/efferent/ N31b5ffc08f4f4c42a1c82e66d9bb9d47\nhttp://brancusi1.usc.edu/thesaurus/definition/electrical-synapse/ N8a0018016eb54c28b80c445cb9fc28e4\nhttp://brancusi1.usc.edu/thesaurus/definition/electrotonic-synapse/ N13dee4b0b2c74e809ca3846669cef860\nhttp://brancusi1.usc.edu/thesaurus/definition/encephalon/ N4e63f0aa66c04b3f8932b99f9cae5485\nhttp://brancusi1.usc.edu/thesaurus/definition/end-bulb/ Nd9de117309534039a1ed0fa8cf6ba423\nhttp://brancusi1.usc.edu/thesaurus/definition/anterior-2/ Nf52d986d0a164d1888cea15f199dcdf2\nhttp://brancusi1.usc.edu/thesaurus/definition/endbrain/ N7b5b63bb64f243968dfd87767c50c306\nhttp://brancusi1.usc.edu/thesaurus/definition/epencephalon-3/ N40e91e04ec6543a0914ffc562e5debda\nhttp://brancusi1.usc.edu/thesaurus/definition/epencephalon-2/ N966216af36ad4d48ab52b4f19fbbb641\nhttp://brancusi1.usc.edu/thesaurus/definition/epencephalon/ N19ce501ad59b40fc87c65f05da91dea7\nhttp://brancusi1.usc.edu/thesaurus/definition/ephapse/ N8983ca3ef38143f29f40bc2a6afbeff0\nhttp://brancusi1.usc.edu/thesaurus/definition/external/ N82ba834d315f4f13985eec9f8ea5fae2\nhttp://brancusi1.usc.edu/thesaurus/definition/extrinsic-connection/ N697bb8de907d4cb089972d1403d2ba9a\nhttp://brancusi1.usc.edu/thesaurus/definition/extrinsic-pathway/ N4ef021a578fd45d99bfd40424da9627e\nhttp://brancusi1.usc.edu/thesaurus/definition/fiber-of-passage/ N27f32f0ac19f4629bd47cce8e5b72717\nhttp://brancusi1.usc.edu/thesaurus/definition/forebrain/ Nc9449b7560444217bd79aa35874bc597\nhttp://brancusi1.usc.edu/thesaurus/definition/forebrain-2/ Nff5f4f9b1e814e16819985b4e50c3e14\nhttp://brancusi1.usc.edu/thesaurus/definition/foundational-model-of-connectivity/ Abbreviated form of \"Foundational Model of Structural Connectivity in the Nervous System\". 
The term is derived from Foundational Model of Anatomy (FMA); see brinkley-jf-1991|Brinkley (1991).\nhttp://brancusi1.usc.edu/thesaurus/definition/fourth-ventricle/ Nfc2af83105154d0ba7ade315297953fb\nhttp://brancusi1.usc.edu/thesaurus/definition/anteroposterior-axis/ Na8642728b5d94a6d8ddff73075846357\nhttp://brancusi1.usc.edu/thesaurus/definition/frontal-plane/ N418aac9763fe4c13a7123fc10b34439c\nhttp://brancusi1.usc.edu/thesaurus/definition/frontal-plane-2/ Nead6d95c7d484a81880f70ec5b350187\nhttp://brancusi1.usc.edu/thesaurus/definition/functional-connection/ Nb071d129c7e44e219919b6adad7f3874\nhttp://brancusi1.usc.edu/thesaurus/definition/ganglia/ N49a99bc8eafb4ac38aee0f7886112d5d\nhttp://brancusi1.usc.edu/thesaurus/definition/ganglionic-ring/ Nf4ef382432ce4875968127edc3b25c8c\nhttp://brancusi1.usc.edu/thesaurus/definition/glia/ N44e9574e56074e2db2399a73e3bcedd8\nhttp://brancusi1.usc.edu/thesaurus/definition/glial-cells/ Nba0d4e7fc7bb4d779de274b9ebd73565\nhttp://brancusi1.usc.edu/thesaurus/definition/gray-matter/ N7a6267df4d1d4fb08875618d42a2ecaf\nhttp://brancusi1.usc.edu/thesaurus/definition/gray-matter-nucleus/ N45c775efbd7843e987ac3b9df6f32d74\n" | |
| } | |
| ], | |
| "prompt_number": 11 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Weird. There is a text literal in the object of one node, but all the rest are blank nodes.\n\nAt this point you gotta start wondering if the RDF is potentially messed up. BUT, blank nodes are allowed, so maybe we're just not digging deeply enough into them.\n\nSince we did want to find stuff specifically about basal ganglia, let's take a minute to locate THE node that holds stuff on basal ganglia and see if there is some other property that holds simple text definitions. Maybe we can sidestep this whole blank node nonsense.\n\nWe now know enough about how the URIs are formed to do that. The URIs look like:\n\n`http://brancusi1.usc.edu/thesaurus/definition/[brain-region]/`\n\nwith dashes in place of spaces. Let's try a simple query to get basal ganglia this way:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?predicate ?object\n WHERE {\n <http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/> ?predicate ?object.\n \t} LIMIT 5\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Cool. Let's take the limit off to see everything:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?predicate ?object\n WHERE {\n <http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/> ?predicate ?object.\n \t}\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/RDF/definition N95d8b25d90a84c72a14645f33ebc0791\nhttp://brancusi1.usc.edu/RDF/entry Basal ganglia\nhttp://brancusi1.usc.edu/RDF/slug basal-ganglia\nhttp://brancusi1.usc.edu/RDF/workspace 0\nhttp://brancusi1.usc.edu/RDF/reference Nde6fff3b1f9f4ef592c521b6b6af3667\n" | |
| } | |
| ], | |
| "prompt_number": 14 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "That's it? Weird. Well, now it looks like at least the simple text name of the term is stored under the property called \"entry\". \n\nLet's confirm we can get this same node back out again using a simple string search:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"PREFIX bamsProp: <http://brancusi1.usc.edu/RDF/>\n SELECT ?subject ?predicate\n WHERE {\n ?subject bamsProp:entry \"Basal ganglia\".\n \t}\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/ None\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/ None\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/ None\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/ None\n" | |
| } | |
| ], | |
| "prompt_number": 15 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Indeed. In fact we find that there is not 1 master node with all basal ganglia info, but 4! (The second column prints `None` because the query selects `?predicate` without ever binding it in the `WHERE` clause.)\n\nAt this point we could either look into all 4 separately with four separate queries, or we can explore ALL 4 at once with one query. The separate queries just require modifying what we've done before and subbing in these new URIs. Here's what that looks like for basal-ganglia-2:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?predicate ?object\n WHERE {\n <http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/> ?predicate ?object.\n \t}\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/RDF/definition Nb8e2f07b1ba643fea0e63a141fdf1873\nhttp://brancusi1.usc.edu/RDF/entry Basal ganglia\nhttp://brancusi1.usc.edu/RDF/slug basal-ganglia-2\nhttp://brancusi1.usc.edu/RDF/workspace 0\nhttp://brancusi1.usc.edu/RDF/reference Neabfdaf0d46b4f688be93b3d0667d70f\n" | |
| } | |
| ], | |
| "prompt_number": 16 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Here's what that looks like for ALL 4 at once:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"PREFIX bamsProp: <http://brancusi1.usc.edu/RDF/>\n SELECT ?subject ?predicate ?object\n WHERE {\n ?subject bamsProp:entry \"Basal ganglia\" .\n ?subject ?predicate ?object\n \t}\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1]), str(r[2])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/ http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/ http://brancusi1.usc.edu/RDF/definition Nd50ba8bdabb84579a70e59ec6519e200\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/ http://brancusi1.usc.edu/RDF/entry Basal ganglia\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/ http://brancusi1.usc.edu/RDF/slug basal-ganglia-4\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/ http://brancusi1.usc.edu/RDF/workspace 0\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/ http://brancusi1.usc.edu/RDF/reference Na61ba614ac9a43ea867fec1d2f932a59\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/ http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/ http://brancusi1.usc.edu/RDF/definition N95d8b25d90a84c72a14645f33ebc0791\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/ http://brancusi1.usc.edu/RDF/entry Basal ganglia\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/ http://brancusi1.usc.edu/RDF/slug basal-ganglia\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/ http://brancusi1.usc.edu/RDF/workspace 0\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/ http://brancusi1.usc.edu/RDF/reference Nde6fff3b1f9f4ef592c521b6b6af3667\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/ http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/ http://brancusi1.usc.edu/RDF/definition Nb8e2f07b1ba643fea0e63a141fdf1873\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/ http://brancusi1.usc.edu/RDF/entry Basal ganglia\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/ 
http://brancusi1.usc.edu/RDF/slug basal-ganglia-2\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/ http://brancusi1.usc.edu/RDF/workspace 0\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/ http://brancusi1.usc.edu/RDF/reference Neabfdaf0d46b4f688be93b3d0667d70f\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/ http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/ http://brancusi1.usc.edu/RDF/definition Ndbc450108e654cef9350b1dd50b65b0a\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/ http://brancusi1.usc.edu/RDF/entry Basal ganglia\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/ http://brancusi1.usc.edu/RDF/slug basal-ganglia-3\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/ http://brancusi1.usc.edu/RDF/workspace 0\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/ http://brancusi1.usc.edu/RDF/reference Nf6ae1bae8698432a93b2c700b7a993fe\n" | |
| } | |
| ], | |
| "prompt_number": 17 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "This is why chaining triple patterns together in a SPARQL query is so powerful. Compare these results with those from the first simple string search, which brought up just the URIs. Here we have taken that same list of URIs and additionally asked for every triple in which one of them is the subject." | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "STILL though, we don't have actual textual definitions. Mihail, why have you hidden them behind blank nodes?! :)\n\nI guess we have to dig into these darn blank nodes.\n\nOK, here's the blank node reference for the definition of the simple `basal-ganglia`:\n\n`N95d8b25d90a84c72a14645f33ebc0791`\n\nWhat's in there?" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?predicate ?object\n WHERE {\n _:N95d8b25d90a84c72a14645f33ebc0791 ?predicate ?object .\n \t} LIMIT 5\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": " http://www.w3.org/1999/02/22-rdf-syntax-ns#type file:///anchor\nhttp://www.w3.org/1999/xlinktype simple\nhttp://www.w3.org/1999/xlinkhref http://brancusi1.usc.edu/thesaurus/definition/tectum/\nhttp://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/RDF/definition N81484be1904d45e8835cfe8ccfb8a0a6\n" | |
| } | |
| ], | |
| "prompt_number": 21 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Not good. This is identical to the first five results from the last time we ran the same query with the DIFFERENT blank node:\n\n`\nhttp://www.w3.org/1999/02/22-rdf-syntax-ns#type file:///anchor\nhttp://www.w3.org/1999/xlinktype simple\nhttp://www.w3.org/1999/xlinkhref http://brancusi1.usc.edu/thesaurus/definition/tectum/\nhttp://www.w3.org/1999/02/22-rdf-syntax-ns#type http://brancusi1.usc.edu/RDF/thesaurus\nhttp://brancusi1.usc.edu/RDF/definition N81484be1904d45e8835cfe8ccfb8a0a6`\n\nNow I'm pretty concerned that something is either wrong with the way I'm querying blank nodes (haven't messed with them much before so maybe I'm doing something wrong) or maybe this RDF is pretty messed up. One likely culprit on the query side: in SPARQL, a blank node label like `_:abc` written in a query pattern acts as a variable, not as a reference to that particular node in the data, so both queries would simply match arbitrary triples and return the same first five.\n\nBefore going back to Mihail, let's do one more thing. Let's assume that SOMEWHERE in here, there's a definition of basal ganglia.\n\nFrom [the website](http://brancusi1.usc.edu/thesaurus/list/), we know one of the definitions is: \n\n\"For macrodissected adult humans it includes the caudate and lenticular nuclei and the amygdala, and is thus not synonymous with cerebral nuclei (Swanson, 2000)\"\n\nOne more useful way to search via SPARQL is just a simple string search using [regex](http://www.w3.org/TR/sparql11-query/#func-regex). Let's take a few words from the definition and try to find that text anywhere in the graph. Here's what that looks like:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?subject ?predicate \n WHERE {\n ?subject ?predicate ?text .\n FILTER regex(?text, \"^For macrodissected\", \"i\")\n \t}\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 28 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Hmm no results... Maybe the query is wrong? Let's sub in some text that we know we found earlier as a simple string to test the query:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "qres = g.query(\n \"\"\"SELECT ?subject ?predicate \n WHERE {\n ?subject ?predicate ?text .\n FILTER regex(?text, \"^basal\", \"i\")\n \t}\"\"\")\n\nfor r in qres.result:\n print str(r[0]), str(r[1])", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/ http://brancusi1.usc.edu/RDF/entry\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/ http://brancusi1.usc.edu/RDF/slug\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/ http://brancusi1.usc.edu/RDF/entry\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/ http://brancusi1.usc.edu/RDF/slug\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/ http://brancusi1.usc.edu/RDF/entry\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/ http://brancusi1.usc.edu/RDF/slug\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/ http://brancusi1.usc.edu/RDF/entry\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/ http://brancusi1.usc.edu/RDF/slug\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-of-telencephalon/ http://brancusi1.usc.edu/RDF/entry\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-of-telencephalon/ http://brancusi1.usc.edu/RDF/slug\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-nuclei/ http://brancusi1.usc.edu/RDF/entry\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-nuclei/ http://brancusi1.usc.edu/RDF/slug\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-nuclei-2/ http://brancusi1.usc.edu/RDF/entry\nhttp://brancusi1.usc.edu/thesaurus/definition/basal-nuclei-2/ http://brancusi1.usc.edu/RDF/slug\n" | |
| } | |
| ], | |
| "prompt_number": 25 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Whoops -- that brings back results, so the query itself works. Looks like that definition sentence may not be present AT all in the RDF :(\n\nAs a last resort to confirm, we have to go to the original data file and see if there's anything in there. It is probably most straightforward to do this on the command line. Note that all lines preceded by `!` are shell commands, not Python." | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "!mkdir -p /tmp/bams", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 29 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "!unzip BAMS-to-NeuroLex/Data/bams_thesaurus_2013-09-24_17-12-40.xml.zip -d /tmp/bams/ ", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "Archive: BAMS-to-NeuroLex/Data/bams_thesaurus_2013-09-24_17-12-40.xml.zip\r\n inflating: /tmp/bams/bams_thesaurus_2013-09-24_17-12-40.xml \r\n creating: /tmp/bams/__MACOSX/\r\n inflating: /tmp/bams/__MACOSX/._bams_thesaurus_2013-09-24_17-12-40.xml \r\n" | |
| } | |
| ], | |
| "prompt_number": 40 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "!grep 'For macrodissected' /tmp/bams/bams_thesaurus_2013-09-24_17-12-40.xml", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": " <bams:definition>For macrodissected adult humans it includes the caudate and lenticular nuclei and the amygdala, and is thus not synonymous with <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei (Swanson, 2000)</anchor>; p. 496.</bams:definition>\r\n <bams:definition>For macrodissected adult humans it includes the caudate and lentiform (putamen and globus pallidus) nuclei, amygdala, and claustrum (p. 252) and is thus not synonymous with <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei (Swanson, 2000)</anchor>. More recently it was used in Ranson's sense by for example Clark (1951, p. 968).</bams:definition>\r\n" | |
| } | |
| ], | |
| "prompt_number": 44 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Uh oh. It is in the RDF. But it is not coming out of the query.\n\nIt is looking suspiciously like the presence of other hyperlinks inside the RDF is causing a big problem. Specifically I mean this stuff:\n\n`<anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">`\n\nThat doesn't look like it should be within the `bams:definition` XML tags.\n\nLet's look for 'basal' in there:" | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "!grep 'basal' /tmp/bams/bams_thesaurus_2013-09-24_17-12-40.xml", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": " <rdf:Description rdf:about=\"http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-4/\">\r\n <bams:slug>basal-ganglia-4</bams:slug>\r\n <rdf:Description rdf:about=\"http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia/\">\r\n <bams:slug>basal-ganglia</bams:slug>\r\n <rdf:Description rdf:about=\"http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-2/\">\r\n <bams:definition>Synonym for basal ganglia of telencephalon (Ranson, 1920) in macrodissected adult humans, and thus not synonymous with <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei (Swanson, 2000)</anchor>; p. 319.</bams:definition>\r\n <bams:slug>basal-ganglia-2</bams:slug>\r\n <rdf:Description rdf:about=\"http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/\">\r\n <bams:definition>Synonym for <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei (Swanson, 2000)</anchor>; see Warwick & Williams (1973, p. 805; and Williams & Warwick, 1980, p. 864). 
Its use is discouraged because reference to <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/ganglia/\">ganglia (Galen, c173)</anchor> in the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebrospinal-axis/\">cerebrospinal axis (Meckel, 1817)</anchor> is archaic; and because \"basal ganglia\" today usually refers to a functional system that includes components in the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/forebrain-2/\">forebrain (Goette, 1873)</anchor> and <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/midbrain/\">midbrain (Baer, 1837)</anchor>, rather than to a <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/topographic-division/\">topographic division</anchor> of the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/endbrain/\">endbrain (Kuhlenbeck, 1927)</anchor>; see Anthoney (1994, pp. 106-109), DeLong & Wichmann (2007), and Federative Committee on Anatomical Terminology (1998, *A14.1.09.501).</bams:definition>\r\n <bams:slug>basal-ganglia-3</bams:slug>\r\n <rdf:Description rdf:about=\"http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-of-telencephalon/\">\r\n <bams:slug>basal-ganglia-of-telencephalon</bams:slug>\r\n <rdf:Description rdf:about=\"http://brancusi1.usc.edu/thesaurus/definition/basal-nuclei/\">\r\n <bams:definition>Synonym for basal ganglia (Strong & Elwyn, 1943) in macrodissected adult humans, and thus not synonymous with <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei (Swanson, 2000)</anchor>; p. 968. Others employing this usage include Warwick & Williams (1973, p. 976; and Williams & Warwick, 1980, p. 1032), International Anatomical Nomenclature Committee (1983, p. 
A72).</bams:definition>\r\n <bams:slug>basal-nuclei</bams:slug>\r\n <rdf:Description rdf:about=\"http://brancusi1.usc.edu/thesaurus/definition/basal-nuclei-2/\">\r\n <bams:slug>basal-nuclei-2</bams:slug>\r\n <bams:definition>Basically a combination of the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/brainstem-2/\">brainstem (Schwalbe, 1881)</anchor> and the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei (Swanson, 2000)</anchor>, or basal ganglia (Warwick &amp; Williams, 1973), as originally defined for macrodissected adult humans; p. 11. It corresponds to the oblong marrow (Willis, 1664), and has been used more recently in Burdach's sense by for example Herrick (1915, p. 114), Ranson &amp; Clarke (1959, Fig. 32).</bams:definition>\r\n <bams:definition>The <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/ventral/\">ventral (Schulze, 1893)</anchor> <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/topographic-division/\">topographic division</anchor> of the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/endbrain/\">endbrain (Kuhlenbeck, 1927)</anchor>, with a basically nonlaminated architecture; the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/dorsal/\">dorsal (Barclay, 1803)</anchor> division is the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-cortex/\">cerebral cortex (Bauhin, 1605)</anchor>. The general outlines of the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei</anchor> were described for macrodissected adult humans by Bartholin (1651; see English translation 1662, p. 
141), and a basic distinction during embryogenesis between <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-cortex/\">cerebral cortex</anchor> and <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei</anchor> was stressed by Baer (1837) and Reichert (1859-1861). The most common synonym today for <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei</anchor>, which was clearly defined by Swanson (2000, p. 117; 2004, pp. 166-170), is <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-3/\">basal ganglia (Warwick &amp; Williams, 1973)</anchor>; also see <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/ganglia/\">ganglion (Galen, c173)</anchor>. Other synonyms include corpus striatum (Willis, 1664), cerebral ganglia (Reil, 1809), and basal nuclei (Warwick &amp; Williams, 1973). <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">Cerebral nuclei (Swanson, 2000)</anchor> is preferred to the synonym basal nuclei (Warwick &amp; Williams, 1973) because it pairs naturally with <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-cortex/\">cerebral cortex (Bauhin, 1605)</anchor>.</bams:definition>\r\n <bams:definition>A ganglion is a recognizable aggregation of <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/neuron/\">neurons (Waldeyer, 1891)</anchor>. 
There are <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/marginal-ganglion/\">marginal ganglia</anchor> associated with invertebrate <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/nerve-net/\">nerve nets</anchor>, <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/central-ganglion/\">central ganglia</anchor> associated with invertebrate <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/central-nerve-cord/\">central nerve cords</anchor>, and <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/peripheral-ganglia/\">peripheral ganglia</anchor> in the invertebrate and vertebrate <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/peripheral-nervous-system/\">peripheral nervous system (Meckel, 1817)</anchor>. For vertebrates it has long been best practice to restrict the term ganglion (and terms derived from ganglion) to structures of the <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/peripheral-nervous-system/\">peripheral nervous system (Meckel, 1817)</anchor>. As Herrick wrote, \"The term \u2018ganglion' is also sometimes used for nuclei or centers within the brain\u2026but this usage is objectionable, for the use of the word ganglion in vertebrate neurology should be restricted to collections of neurons outside the central nervous system, such as the ganglia of the cranial and spinal nerves and the sympathetic [autonomic] ganglia.\" (1915, p. 108). A prime example is the use of \"basal ganglia\" for <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei (Swanson, 2000)</anchor>. Discovered and named in macrodissected adult mammals by Galen (c173; see translation by May , 1968, pp. 
695-696).</bams:definition>\r\n <bams:definition>Synonym for <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/basal-ganglia-of-telencephalon/\">basal ganglia of telencephalon (Ranson, 1920)</anchor> in macrodissected adult humans, and is thus not synonymous with <anchor xlink:type=\"simple\" xlink:href=\"http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/\">cerebral nuclei (Swanson, 2000)</anchor>; swanson-lw-2000|p. 356.</bams:definition>\r\n <bams:title>Circuits and circuit disorders of the basal ganglia. Arch Neurol 64:20-24.</bams:title>\r\n" | |
| } | |
| ], | |
| "prompt_number": 42 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Yup -- looks like there's a lot of that stuff.\n\nSIGH. OK I'll get back to Mihail and find out what's up with this stuff. In the meantime, hopefully this was still a useful guide to how to think through exploring graphs with unknown structures." | |
| } | |
| ], | |
| "metadata": {} | |
| } | |
| ] | |
| } |
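While waiting on a fix to the RDF, one possible way to get at the definition text is to skip the RDF parser and read the `bams:definition` elements straight from the XML with Python's standard `xml.etree.ElementTree`. The sketch below is only an illustration under a few assumptions: `SAMPLE` is a trimmed-down, hypothetical fragment modelled on the grep output above, its `rdf:about` URI is a placeholder, the namespace URIs are inferred from the predicate URIs printed by the earlier queries, and `definitions` is a made-up helper name. It also assumes the real file is well-formed XML. The idea is that `itertext()` walks the text inside the nested `<anchor>` tags as well as the text around them, so the whole sentence comes back as one string.

```python
import xml.etree.ElementTree as ET

# Namespace URIs inferred from the predicates seen in the query output
# (an assumption; check them against the header of the real file).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BAMS_NS = "http://brancusi1.usc.edu/RDF/"

# Hypothetical fragment modelled on the grep output; the rdf:about URI
# is a placeholder, not a real thesaurus entry.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bams="http://brancusi1.usc.edu/RDF/"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <rdf:Description rdf:about="http://example.org/thesaurus/definition/example-entry/">
    <bams:definition>For macrodissected adult humans it includes the caudate and lenticular nuclei and the amygdala, and is thus not synonymous with <anchor xlink:type="simple" xlink:href="http://brancusi1.usc.edu/thesaurus/definition/cerebral-nuclei/">cerebral nuclei (Swanson, 2000)</anchor>; p. 496.</bams:definition>
  </rdf:Description>
</rdf:RDF>"""


def definitions(xml_text):
    """Return (subject URI, flattened definition text) pairs."""
    root = ET.fromstring(xml_text)
    results = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        about = desc.get("{%s}about" % RDF_NS)
        for d in desc.findall("{%s}definition" % BAMS_NS):
            # itertext() yields the element's own text plus the text and
            # tails of nested elements such as <anchor>, in document order.
            text = " ".join("".join(d.itertext()).split())
            results.append((about, text))
    return results


for about, text in definitions(SAMPLE):
    print(about)
    print(text)
```

To try this on the real data, the same function could be given the contents of `/tmp/bams/bams_thesaurus_2013-09-24_17-12-40.xml`. This flattens the hyperlinks into plain text, so the `xlink:href` targets are lost; if those links matter, they would need to be collected separately from the `<anchor>` elements.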