{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convert an XGMML file to an igraph object"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"options(jupyter.plot_mimetypes = c(\"text/plain\", \"image/png\" ))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Author: Burak Kutlu\n",
"\n",
"License: https://creativecommons.org/licenses/by/4.0/\n",
"\n",
"This is a demo of how I convert an XGMML file to an igraph object in R. At the time of writing, the igraph R package (1.2.1 in this session) cannot read the XGMML format directly. An XGMML file defines a graph with nodes, edges, and their attributes. For the format specification, see https://en.wikipedia.org/wiki/XGMML\n",
"\n",
"First, let's load the required packages."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──\n",
"✔ ggplot2 3.0.0 ✔ readr 1.1.1\n",
"✔ tibble 1.4.2 ✔ purrr 0.2.5\n",
"✔ tidyr 0.8.1 ✔ dplyr 0.7.5\n",
"✔ ggplot2 3.0.0 ✔ forcats 0.3.0\n",
"── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──\n",
"✖ dplyr::filter() masks stats::filter()\n",
"✖ dplyr::lag() masks stats::lag()\n",
"\n",
"Attaching package: ‘igraph’\n",
"\n",
"The following objects are masked from ‘package:dplyr’:\n",
"\n",
" as_data_frame, groups, union\n",
"\n",
"The following objects are masked from ‘package:purrr’:\n",
"\n",
" compose, simplify\n",
"\n",
"The following object is masked from ‘package:tidyr’:\n",
"\n",
" crossing\n",
"\n",
"The following object is masked from ‘package:tibble’:\n",
"\n",
" as_data_frame\n",
"\n",
"The following objects are masked from ‘package:stats’:\n",
"\n",
" decompose, spectrum\n",
"\n",
"The following object is masked from ‘package:base’:\n",
"\n",
" union\n",
"\n"
]
}
],
"source": [
"library(\"xml2\")\n",
"library(\"stringr\")\n",
"library(\"tidyverse\")\n",
"library(\"igraph\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I am going to use an example file: http://www.cgl.ucsf.edu/cytoscape/structureViz2/pte.xgmml"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{xml_document}\n",
"<graph label=\"Phosphotriesterases\" id=\"Phosphotriesterases\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" xmlns=\"http://www.cs.rpi.edu/XGMML\">\n",
" [1] <att value=\"1.0\" name=\"documentVersion\"/>\n",
" [2] <att name=\"networkMetadata\">\\n <rdf:RDF>\\n <rdf:Description rdf:abou ...\n",
" [3] <att value=\"#ffffff\" name=\"backgroundColor\"/>\n",
" [4] <att type=\"real\" value=\"1.051702522972862\" name=\"GRAPH_VIEW_ZOOM\" label= ...\n",
" [5] <att type=\"real\" value=\"15421.0322265625\" name=\"GRAPH_VIEW_CENTER_X\" lab ...\n",
" [6] <att type=\"real\" value=\"17076.3984375\" name=\"GRAPH_VIEW_CENTER_Y\" label= ...\n",
" [7] <node name=\"base\" label=\"gi15607371\" id=\"-237\">\\n <att type=\"string\" na ...\n",
" [8] <node name=\"base\" label=\"gi16131257\" id=\"-442\">\\n <att type=\"string\" na ...\n",
" [9] <node name=\"base\" label=\"gi15839610\" id=\"-794\">\\n <att type=\"string\" na ...\n",
"[10] <node name=\"base\" label=\"gi13786719\" id=\"-214\">\\n <att type=\"string\" va ...\n",
"[11] <node name=\"base\" label=\"gi12084365\" id=\"-411\">\\n <att type=\"string\" va ...\n",
"[12] <node name=\"base\" label=\"gi15613153\" id=\"-371\">\\n <att type=\"string\" na ...\n",
"[13] <node name=\"base\" label=\"gi5542102\" id=\"-870\">\\n <att type=\"string\" val ...\n",
"[14] <node name=\"base\" label=\"gi1310974\" id=\"-851\">\\n <att type=\"string\" val ...\n",
"[15] <node name=\"base\" label=\"gi2098312\" id=\"-413\">\\n <att type=\"string\" val ...\n",
"[16] <node name=\"base\" label=\"gi2392286\" id=\"-866\">\\n <att type=\"string\" val ...\n",
"[17] <node name=\"base\" label=\"gi15212234\" id=\"-146\">\\n <att type=\"string\" na ...\n",
"[18] <node name=\"base\" label=\"gi17977844\" id=\"-737\">\\n <att type=\"string\" na ...\n",
"[19] <node name=\"base\" label=\"gi14719485\" id=\"-491\">\\n <att type=\"string\" va ...\n",
"[20] <node name=\"base\" label=\"gi15805954\" id=\"-475\">\\n <att type=\"string\" na ...\n",
"...\n"
]
}
],
"source": [
"url <- \"http://www.cgl.ucsf.edu/cytoscape/structureViz2/pte.xgmml\"\n",
"x1 <- read_xml(url)\n",
"print(x1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Good, we were able to import the XML file into our session! Now let's access the nodes and edges."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"'/d1:graph/d1:node'"
],
"text/latex": [
"'/d1:graph/d1:node'"
],
"text/markdown": [
"'/d1:graph/d1:node'"
],
"text/plain": [
"[1] \"/d1:graph/d1:node\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"{xml_nodeset (16)}\n",
" [1] <node name=\"base\" label=\"gi15607371\" id=\"-237\">\\n <att type=\"string\" na ...\n",
" [2] <node name=\"base\" label=\"gi16131257\" id=\"-442\">\\n <att type=\"string\" na ...\n",
" [3] <node name=\"base\" label=\"gi15839610\" id=\"-794\">\\n <att type=\"string\" na ...\n",
" [4] <node name=\"base\" label=\"gi13786719\" id=\"-214\">\\n <att type=\"string\" va ...\n",
" [5] <node name=\"base\" label=\"gi12084365\" id=\"-411\">\\n <att type=\"string\" va ...\n",
" [6] <node name=\"base\" label=\"gi15613153\" id=\"-371\">\\n <att type=\"string\" na ...\n",
" [7] <node name=\"base\" label=\"gi5542102\" id=\"-870\">\\n <att type=\"string\" val ...\n",
" [8] <node name=\"base\" label=\"gi1310974\" id=\"-851\">\\n <att type=\"string\" val ...\n",
" [9] <node name=\"base\" label=\"gi2098312\" id=\"-413\">\\n <att type=\"string\" val ...\n",
"[10] <node name=\"base\" label=\"gi2392286\" id=\"-866\">\\n <att type=\"string\" val ...\n",
"[11] <node name=\"base\" label=\"gi15212234\" id=\"-146\">\\n <att type=\"string\" na ...\n",
"[12] <node name=\"base\" label=\"gi17977844\" id=\"-737\">\\n <att type=\"string\" na ...\n",
"[13] <node name=\"base\" label=\"gi14719485\" id=\"-491\">\\n <att type=\"string\" va ...\n",
"[14] <node name=\"base\" label=\"gi15805954\" id=\"-475\">\\n <att type=\"string\" na ...\n",
"[15] <node name=\"base\" label=\"gi15899258\" id=\"-672\">\\n <att type=\"string\" na ...\n",
"[16] <node name=\"base\" label=\"gi129176\" id=\"-888\">\\n <att type=\"string\" valu ..."
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# XPath expression for the nodes\n",
"xpath_nodes <- \"/graph/node\"\n",
"# the file declares a default namespace, which xml2 exposes as d1,\n",
"# so each path step needs the d1: prefix\n",
"(xpath_nodes <- str_replace_all(xpath_nodes, '/', '/d1:'))\n",
"\n",
"xml_find_all(x1, xpath_nodes, xml_ns(x1))"
]
},
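{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `d1:` prefix is needed because the XGMML root element declares a default namespace, and xml2 exposes default namespaces under generated prefixes (`d1`, `d2`, ...). The cell below is a minimal sketch on a made-up two-node document: the `toy_xgmml` string and its ids are assumptions for illustration and are not taken from `pte.xgmml`. It compares an unprefixed XPath, the prefixed one, and the alternative of calling `xml_ns_strip()` so that plain paths work."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(xml2)\n",
"\n",
"# Hypothetical toy document; only the namespace URI matches the real file.\n",
"toy_xgmml <- paste0(\n",
"  '<graph label=\"toy\" xmlns=\"http://www.cs.rpi.edu/XGMML\">',\n",
"  '<node label=\"a\" id=\"-1\"/>',\n",
"  '<node label=\"b\" id=\"-2\"/>',\n",
"  '<edge label=\"a (isrelated) b\" source=\"-1\" target=\"-2\"/>',\n",
"  '</graph>'\n",
")\n",
"toy <- read_xml(toy_xgmml)\n",
"\n",
"# The default namespace is registered under the generated prefix d1.\n",
"xml_ns(toy)\n",
"\n",
"# An unprefixed path only matches elements without a namespace.\n",
"n_plain <- length(xml_find_all(toy, '/graph/node'))\n",
"# The prefixed path matches the namespaced <node> elements.\n",
"n_prefixed <- length(xml_find_all(toy, '/d1:graph/d1:node', xml_ns(toy)))\n",
"\n",
"# Alternative: strip the default namespace (this modifies toy in place),\n",
"# after which plain paths work.\n",
"xml_ns_strip(toy)\n",
"n_stripped <- length(xml_find_all(toy, '/graph/node'))\n",
"\n",
"c(plain = n_plain, prefixed = n_prefixed, stripped = n_stripped)"
]
},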
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th scope=col>name</th><th scope=col>label</th><th scope=col>id</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><td>base </td><td>gi15607371</td><td>-237 </td></tr>\n",
"\t<tr><td>base </td><td>gi16131257</td><td>-442 </td></tr>\n",
"\t<tr><td>base </td><td>gi15839610</td><td>-794 </td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{lll}\n",
" name & label & id\\\\\n",
"\\hline\n",
"\t base & gi15607371 & -237 \\\\\n",
"\t base & gi16131257 & -442 \\\\\n",
"\t base & gi15839610 & -794 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"name | label | id | \n",
"|---|---|---|\n",
"| base | gi15607371 | -237 | \n",
"| base | gi16131257 | -442 | \n",
"| base | gi15839610 | -794 | \n",
"\n",
"\n"
],
"text/plain": [
" name label id \n",
"[1,] base gi15607371 -237\n",
"[2,] base gi16131257 -442\n",
"[3,] base gi15839610 -794"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"xml_find_all(x1, xpath_nodes, xml_ns(x1)) %>%\n",
" xml_attrs() %>%\n",
" do.call(rbind, .) -> nodes\n",
"head(nodes, n = 3)"
]
},
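{
"cell_type": "markdown",
"metadata": {},
"source": [
"`do.call(rbind, .)` relies on every `<node>` carrying the same attributes in the same order, which appears to hold for this file. A more defensive variant, sketched below, pulls each attribute by name with `xml_attr()`, which returns `NA` where an attribute is missing. The toy document and the choice of columns are assumptions for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(xml2)\n",
"\n",
"# Hypothetical toy document: the second node has no name attribute\n",
"# and lists its attributes in a different order.\n",
"toy <- read_xml(paste0(\n",
"  '<graph xmlns=\"http://www.cs.rpi.edu/XGMML\">',\n",
"  '<node name=\"base\" label=\"a\" id=\"-1\"/>',\n",
"  '<node id=\"-2\" label=\"b\"/>',\n",
"  '</graph>'\n",
"))\n",
"toy_nodes <- xml_find_all(toy, '/d1:graph/d1:node', xml_ns(toy))\n",
"\n",
"# Extract each attribute by name; missing attributes become NA.\n",
"node_df <- data.frame(\n",
"  id    = xml_attr(toy_nodes, 'id'),\n",
"  label = xml_attr(toy_nodes, 'label'),\n",
"  name  = xml_attr(toy_nodes, 'name'),\n",
"  stringsAsFactors = FALSE\n",
")\n",
"node_df"
]
},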
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"'/d1:graph/d1:edge'"
],
"text/latex": [
"'/d1:graph/d1:edge'"
],
"text/markdown": [
"'/d1:graph/d1:edge'"
],
"text/plain": [
"[1] \"/d1:graph/d1:edge\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"{xml_nodeset (120)}\n",
" [1] <edge label=\"gi15607371 (isrelated) gi16131257\" id=\"gi15607371 (isrelate ...\n",
" [2] <edge label=\"gi15607371 (isrelated) gi5542102\" id=\"gi15607371 (isrelated ...\n",
" [3] <edge label=\"gi15607371 (isrelated) gi129176\" id=\"gi15607371 (isrelated) ...\n",
" [4] <edge label=\"gi15607371 (isrelated) gi17977844\" id=\"gi15607371 (isrelate ...\n",
" [5] <edge label=\"gi15607371 (isrelated) gi12084365\" id=\"gi15607371 (isrelate ...\n",
" [6] <edge label=\"gi15607371 (isrelated) gi13786719\" id=\"gi15607371 (isrelate ...\n",
" [7] <edge label=\"gi15607371 (isrelated) gi14719485\" id=\"gi15607371 (isrelate ...\n",
" [8] <edge label=\"gi15607371 (isrelated) gi1310974\" id=\"gi15607371 (isrelated ...\n",
" [9] <edge label=\"gi15607371 (isrelated) gi2392286\" id=\"gi15607371 (isrelated ...\n",
"[10] <edge label=\"gi15607371 (isrelated) gi15212234\" id=\"gi15607371 (isrelate ...\n",
"[11] <edge label=\"gi15607371 (isrelated) gi15899258\" id=\"gi15607371 (isrelate ...\n",
"[12] <edge label=\"gi15607371 (isrelated) gi15839610\" id=\"gi15607371 (isrelate ...\n",
"[13] <edge label=\"gi16131257 (isrelated) gi12084365\" id=\"gi16131257 (isrelate ...\n",
"[14] <edge label=\"gi16131257 (isrelated) gi14719485\" id=\"gi16131257 (isrelate ...\n",
"[15] <edge label=\"gi16131257 (isrelated) gi15212234\" id=\"gi16131257 (isrelate ...\n",
"[16] <edge label=\"gi16131257 (isrelated) gi1310974\" id=\"gi16131257 (isrelated ...\n",
"[17] <edge label=\"gi16131257 (isrelated) gi15839610\" id=\"gi16131257 (isrelate ...\n",
"[18] <edge label=\"gi16131257 (isrelated) gi15899258\" id=\"gi16131257 (isrelate ...\n",
"[19] <edge label=\"gi16131257 (isrelated) gi5542102\" id=\"gi16131257 (isrelated ...\n",
"[20] <edge label=\"gi16131257 (isrelated) gi2392286\" id=\"gi16131257 (isrelated ...\n",
"..."
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# XPath expression for the edges\n",
"xpath_edges <- \"/graph/edge\"\n",
"# as with the nodes, each path step needs the d1: namespace prefix\n",
"(xpath_edges <- str_replace_all(xpath_edges, '/', '/d1:'))\n",
"\n",
"xml_find_all(x1, xpath_edges, xml_ns(x1))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th scope=col>label</th><th scope=col>id</th><th scope=col>target</th><th scope=col>source</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><td>gi15607371 (isrelated) gi16131257</td><td>gi15607371 (isrelated) gi16131257</td><td>-442 </td><td>-237 </td></tr>\n",
"\t<tr><td>gi15607371 (isrelated) gi5542102 </td><td>gi15607371 (isrelated) gi5542102 </td><td>-870 </td><td>-237 </td></tr>\n",
"\t<tr><td>gi15607371 (isrelated) gi129176 </td><td>gi15607371 (isrelated) gi129176 </td><td>-888 </td><td>-237 </td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{llll}\n",
" label & id & target & source\\\\\n",
"\\hline\n",
"\t gi15607371 (isrelated) gi16131257 & gi15607371 (isrelated) gi16131257 & -442 & -237 \\\\\n",
"\t gi15607371 (isrelated) gi5542102 & gi15607371 (isrelated) gi5542102 & -870 & -237 \\\\\n",
"\t gi15607371 (isrelated) gi129176 & gi15607371 (isrelated) gi129176 & -888 & -237 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"label | id | target | source | \n",
"|---|---|---|---|\n",
"| gi15607371 (isrelated) gi16131257 | gi15607371 (isrelated) gi16131257 | -442 | -237 | \n",
"| gi15607371 (isrelated) gi5542102 | gi15607371 (isrelated) gi5542102 | -870 | -237 | \n",
"| gi15607371 (isrelated) gi129176 | gi15607371 (isrelated) gi129176 | -888 | -237 | \n",
"\n",
"\n"
],
"text/plain": [
" label id target\n",
"[1,] gi15607371 (isrelated) gi16131257 gi15607371 (isrelated) gi16131257 -442 \n",
"[2,] gi15607371 (isrelated) gi5542102 gi15607371 (isrelated) gi5542102 -870 \n",
"[3,] gi15607371 (isrelated) gi129176 gi15607371 (isrelated) gi129176 -888 \n",
" source\n",
"[1,] -237 \n",
"[2,] -237 \n",
"[3,] -237 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"xml_find_all(x1, xpath_edges, xml_ns(x1)) %>% \n",
" xml_attrs() %>%\n",
" do.call(rbind, .) -> edges\n",
"head(edges, n = 3)"
]
},
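{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before building the graph it can help to check that every edge endpoint refers to a known node id, because `graph_from_data_frame()` stops with an error when the edge list mentions a vertex that is not listed in `vertices`. The cell below is a minimal sketch on made-up matrices with the same column layout as `nodes` and `edges`; the values are assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical stand-ins with the same columns as nodes and edges.\n",
"toy_nodes <- cbind(name = 'base', label = c('a', 'b', 'c'), id = c('-1', '-2', '-3'))\n",
"toy_edges <- cbind(\n",
"  label  = c('a (isrelated) b', 'a (isrelated) x'),\n",
"  id     = c('a (isrelated) b', 'a (isrelated) x'),\n",
"  target = c('-2', '-9'),\n",
"  source = c('-1', '-1')\n",
")\n",
"\n",
"# Endpoints that do not match any node id.\n",
"endpoints <- unique(c(toy_edges[, 'source'], toy_edges[, 'target']))\n",
"unknown <- setdiff(endpoints, toy_nodes[, 'id'])\n",
"unknown\n",
"\n",
"# Keep only the edges whose two endpoints are both known nodes.\n",
"ok <- toy_edges[, 'source'] %in% toy_nodes[, 'id'] &\n",
"  toy_edges[, 'target'] %in% toy_nodes[, 'id']\n",
"clean_edges <- toy_edges[ok, , drop = FALSE]\n",
"clean_edges"
]
},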
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"IGRAPH 56c7f0d UN-- 16 120 -- \n",
"+ attr: name (v/c), label (v/c), label (e/c), id (e/c)\n",
"+ edges from 56c7f0d (vertex names):\n",
" [1] -237---442 -237---870 -237---888 -237---737 -237---411 -237---214\n",
" [7] -237---491 -237---851 -237---866 -237---146 -237---672 -237---794\n",
"[13] -442---411 -442---491 -442---146 -442---851 -442---794 -442---672\n",
"[19] -442---870 -442---866 -442---214 -794---866 -794---411 -794---491\n",
"[25] -794---851 -794---214 -794---672 -411---866 -411---851 -411---491\n",
"[31] -214---411 -371---146 -794---371 -371---888 -214---371 -371---866\n",
"[37] -411---371 -371---851 -371---413 -371---870 -237---371 -371---491\n",
"[43] -371---737 -371---672 -442---371 -411---870 -870---851 -794---870\n",
"+ ... omitted several edges"
]
},
"metadata": {},
"output_type": "display_data"
}
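],
"source": [
"# edge endpoints are columns 3:4 of edges (target, source); label and id become edge attributes\n",
"# vertex names come from column 3 of nodes (id); label becomes a vertex attribute\n",
"igraph::graph_from_data_frame(d = edges[, c(3:4, 1:2)], vertices = nodes[, c(3, 2)], directed = FALSE)"
]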
},
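{
"cell_type": "markdown",
"metadata": {},
"source": [
"The steps above can be wrapped into one helper. The function name `xgmml_to_igraph()` and its arguments are hypothetical. The sketch assumes the document declares a default namespace (so the `d1` prefix exists) and only carries over the `id` and `label` attributes of `<node>` and the `source`, `target` and `label` attributes of `<edge>`, not the nested `<att>` children. It is exercised on a made-up three-node document so that it runs without network access."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(xml2)\n",
"library(igraph)\n",
"\n",
"# Hypothetical helper: turn a parsed XGMML document into an igraph object.\n",
"xgmml_to_igraph <- function(doc, directed = FALSE) {\n",
"  ns <- xml_ns(doc)\n",
"  node_set <- xml_find_all(doc, '/d1:graph/d1:node', ns)\n",
"  edge_set <- xml_find_all(doc, '/d1:graph/d1:edge', ns)\n",
"  # The first column of vertices must hold the ids that the edges refer to.\n",
"  vertices <- data.frame(\n",
"    name  = xml_attr(node_set, 'id'),\n",
"    label = xml_attr(node_set, 'label'),\n",
"    stringsAsFactors = FALSE\n",
"  )\n",
"  # The first two columns of d are the edge endpoints.\n",
"  d <- data.frame(\n",
"    from  = xml_attr(edge_set, 'source'),\n",
"    to    = xml_attr(edge_set, 'target'),\n",
"    label = xml_attr(edge_set, 'label'),\n",
"    stringsAsFactors = FALSE\n",
"  )\n",
"  graph_from_data_frame(d = d, vertices = vertices, directed = directed)\n",
"}\n",
"\n",
"# Made-up document for illustration only.\n",
"toy <- read_xml(paste0(\n",
"  '<graph label=\"toy\" xmlns=\"http://www.cs.rpi.edu/XGMML\">',\n",
"  '<node label=\"a\" id=\"-1\"/>',\n",
"  '<node label=\"b\" id=\"-2\"/>',\n",
"  '<node label=\"c\" id=\"-3\"/>',\n",
"  '<edge label=\"a (isrelated) b\" source=\"-1\" target=\"-2\"/>',\n",
"  '<edge label=\"b (isrelated) c\" source=\"-2\" target=\"-3\"/>',\n",
"  '</graph>'\n",
"))\n",
"g <- xgmml_to_igraph(toy)\n",
"vcount(g)\n",
"ecount(g)\n",
"V(g)$label"
]
},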
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"R version 3.4.4 (2018-03-15)\n",
"Platform: x86_64-pc-linux-gnu (64-bit)\n",
"Running under: Ubuntu 18.04 LTS\n",
"\n",
"Matrix products: default\n",
"BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3\n",
"LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3\n",
"\n",
"locale:\n",
" [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C \n",
" [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 \n",
" [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 \n",
" [7] LC_PAPER=en_US.UTF-8 LC_NAME=C \n",
" [9] LC_ADDRESS=C LC_TELEPHONE=C \n",
"[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C \n",
"\n",
"attached base packages:\n",
"[1] stats graphics grDevices utils datasets methods base \n",
"\n",
"other attached packages:\n",
" [1] igraph_1.2.1 forcats_0.3.0 dplyr_0.7.5 purrr_0.2.5 \n",
" [5] readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0 \n",
" [9] tidyverse_1.2.1 stringr_1.3.1 xml2_1.2.0 \n",
"\n",
"loaded via a namespace (and not attached):\n",
" [1] pbdZMQ_0.3-3 tidyselect_0.2.4 repr_0.15.0 \n",
" [4] reshape2_1.4.3 haven_1.1.2 lattice_0.20-35 \n",
" [7] colorspace_1.3-2 htmltools_0.3.6 base64enc_0.1-3 \n",
"[10] rlang_0.2.1 pillar_1.2.3 withr_2.1.2 \n",
"[13] foreign_0.8-70 glue_1.2.0 modelr_0.1.2 \n",
"[16] readxl_1.1.0 bindrcpp_0.2.2 uuid_0.1-2 \n",
"[19] bindr_0.1.1 plyr_1.8.4 munsell_0.5.0 \n",
"[22] gtable_0.2.0 cellranger_1.1.0 rvest_0.3.2 \n",
"[25] psych_1.8.4 evaluate_0.10.1 curl_3.2 \n",
"[28] parallel_3.4.4 broom_0.4.5 IRdisplay_0.5.0 \n",
"[31] Rcpp_0.12.17 scales_0.5.0.9000 IRkernel_0.8.12.9000 \n",
"[34] jsonlite_1.5 mnormt_1.5-5 hms_0.4.2 \n",
"[37] digest_0.6.15 stringi_1.2.3 grid_3.4.4 \n",
"[40] cli_1.0.0 tools_3.4.4 magrittr_1.5 \n",
"[43] lazyeval_0.2.1 crayon_1.3.4 pkgconfig_2.0.1 \n",
"[46] lubridate_1.7.4 rstudioapi_0.7.0-9000 assertthat_0.2.0 \n",
"[49] httr_1.3.1 R6_2.2.2 nlme_3.1-137 \n",
"[52] compiler_3.4.4 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sessionInfo()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.4.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}