Last active
March 26, 2020 15:46
Untitled.ipynb
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# CSV generator for yFiles ETL\n", | |
"\n", | |
"The code below produces a CSV file you can use with the ETL designer.\n", | |
"The graph contained in the CSV is based on the [Barabási-Albert model](https://en.wikipedia.org/wiki/Barab%C3%A1si%E2%80%93Albert_model), but any of [the graph generators in NetworkX](https://networkx.github.io/documentation/networkx-1.9.1/reference/generators.html) will do.\n", | |
"\n", | |
"The [Faker](https://github.com/joke2k/faker) package is used to generate random data.\n", | |
"\n", | |
"You need Python (v3.6+) installed. The following commands install the necessary packages:\n", | |
"\n", | |
"`pip install networkx`\n", | |
"`pip install faker`\n" | |
], | |
"metadata": { | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
} | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"import networkx as nx\n", | |
"import csv, copy\n", | |
"import matplotlib.pyplot as plt\n", | |
"from faker import Faker\n", | |
"faker = Faker()\n" | |
], | |
"outputs": [], | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false, | |
"jupyter": { | |
"source_hidden": false, | |
"outputs_hidden": false | |
}, | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
}, | |
"execution": { | |
"iopub.status.busy": "2020-03-26T15:32:37.332Z", | |
"iopub.execute_input": "2020-03-26T15:32:37.335Z", | |
"iopub.status.idle": "2020-03-26T15:32:37.392Z", | |
"shell.execute_reply": "2020-03-26T15:32:37.394Z" | |
} | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"The following generates a random graph:" | |
], | |
"metadata": { | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
} | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"N = 50\n", | |
"ba = nx.barabasi_albert_graph(N, 5)" | |
], | |
"outputs": [], | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": false, | |
"jupyter": { | |
"source_hidden": false, | |
"outputs_hidden": false | |
}, | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
}, | |
"execution": { | |
"iopub.status.busy": "2020-03-26T15:24:31.317Z", | |
"iopub.execute_input": "2020-03-26T15:24:31.321Z", | |
"iopub.status.idle": "2020-03-26T15:24:31.326Z", | |
"shell.execute_reply": "2020-03-26T15:24:31.330Z" | |
} | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"The graph generated needs to be augmented with some data:" | |
], | |
"metadata": { | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
} | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"def createNode(i):\n", | |
" return {\n", | |
" \"id\": str(i),\n", | |
" \"firstName\": faker.first_name(),\n", | |
" \"lastName\": faker.last_name()\n", | |
" }\n", | |
"nodes = [createNode(i) for i in range(N)]\n" | |
], | |
"outputs": [], | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false, | |
"jupyter": { | |
"source_hidden": false, | |
"outputs_hidden": false | |
}, | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
}, | |
"execution": { | |
"iopub.status.busy": "2020-03-26T15:35:21.584Z", | |
"iopub.execute_input": "2020-03-26T15:35:21.588Z", | |
"iopub.status.idle": "2020-03-26T15:35:21.593Z", | |
"shell.execute_reply": "2020-03-26T15:35:21.598Z" | |
} | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Storing a graph in a table inevitably denormalizes the data. Every time we define an edge from a node, we have to repeat that node's information:" | |
], | |
"metadata": { | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
} | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"rows = []\n", | |
"for sourceId, targetId in ba.edges:\n", | |
" # Copy the source node so every edge row carries its own attributes.\n", | |
" obj = copy.copy(nodes[sourceId])\n", | |
" obj[\"target\"] = str(targetId)\n", | |
" rows.append(obj)" | |
], | |
"outputs": [], | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": false, | |
"jupyter": { | |
"source_hidden": false, | |
"outputs_hidden": false | |
}, | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
}, | |
"execution": { | |
"iopub.status.busy": "2020-03-26T15:35:23.974Z", | |
"iopub.execute_input": "2020-03-26T15:35:23.977Z", | |
"iopub.status.idle": "2020-03-26T15:35:23.982Z", | |
"shell.execute_reply": "2020-03-26T15:35:23.984Z" | |
} | |
} | |
}, | |
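{ | |
"cell_type": "code", | |
"source": [ | |
"\n", | |
"# newline=\"\" lets the csv module control line endings (avoids blank rows on Windows).\n", | |
"with open(\"/data/barabsi.csv\", \"w\", newline=\"\") as f:\n", | |
" w = csv.writer(f)\n", | |
"\n", | |
" # Write the CSV header; remove this line if you don't need it.\n", | |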
" w.writerow([\"id\", \"firstName\", \"lastName\", \"target\"])\n", | |
"\n", | |
" for x in rows:\n", | |
" w.writerow([x[\"id\"],\n", | |
" x[\"firstName\"],\n", | |
" x[\"lastName\"],\n", | |
" x[\"target\"]])" | |
], | |
"outputs": [], | |
"execution_count": 23, | |
"metadata": { | |
"collapsed": false, | |
"jupyter": { | |
"source_hidden": false, | |
"outputs_hidden": false | |
}, | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
}, | |
"execution": { | |
"iopub.status.busy": "2020-03-26T15:35:26.279Z", | |
"iopub.execute_input": "2020-03-26T15:35:26.281Z", | |
"iopub.status.idle": "2020-03-26T15:35:26.285Z", | |
"shell.execute_reply": "2020-03-26T15:35:26.287Z" | |
} | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [], | |
"outputs": [], | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false, | |
"jupyter": { | |
"source_hidden": false, | |
"outputs_hidden": false | |
}, | |
"nteract": { | |
"transient": { | |
"deleting": false | |
} | |
} | |
} | |
} | |
], | |
"metadata": { | |
"kernel_info": { | |
"name": "python3" | |
}, | |
"language_info": { | |
"name": "python", | |
"version": "3.7.2", | |
"mimetype": "text/x-python", | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"pygments_lexer": "ipython3", | |
"nbconvert_exporter": "python", | |
"file_extension": ".py" | |
}, | |
"kernelspec": { | |
"argv": [ | |
"/Users/swa/conda/bin/python", | |
"-m", | |
"ipykernel_launcher", | |
"-f", | |
"{connection_file}" | |
], | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
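
The denormalization step in the notebook can be tried out without NetworkX or Faker. The sketch below is not part of the original notebook: it uses a hypothetical three-node graph with made-up names in place of `nodes` and `ba.edges`, writes the rows to an in-memory CSV instead of a file, and reads them back to show what the table layout preserves. The helper names `denormalize`, `to_csv` and `from_csv` are assumptions introduced for illustration.

```python
import csv
import io

# Hypothetical stand-ins for the notebook's `nodes` list and `ba.edges`.
nodes = [
    {"id": "0", "firstName": "Ada", "lastName": "Lovelace"},
    {"id": "1", "firstName": "Alan", "lastName": "Turing"},
    {"id": "2", "firstName": "Grace", "lastName": "Hopper"},
]
edges = [(0, 1), (0, 2), (1, 2)]

FIELDS = ["id", "firstName", "lastName", "target"]


def denormalize(nodes, edges):
    """Return one row per edge, repeating the source node's attributes."""
    return [{**nodes[source], "target": str(target)} for source, target in edges]


def to_csv(rows):
    """Serialize the rows to CSV text, header included."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Recover the edge list and the attributes of every source node."""
    edge_list, node_table = [], {}
    for row in csv.DictReader(io.StringIO(text)):
        edge_list.append((row["id"], row["target"]))
        node_table[row["id"]] = {
            "firstName": row["firstName"],
            "lastName": row["lastName"],
        }
    return edge_list, node_table


rows = denormalize(nodes, edges)
csv_text = to_csv(rows)
edge_list, node_table = from_csv(csv_text)
```

In this toy graph node `2` only ever appears as a target, so its name never reaches the CSV: `node_table` contains entries for `0` and `1` only. With this row layout the same would presumably hold in the notebook for any node that never shows up as the source of an edge.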