{
"cells": [
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import tvm\nfrom tvm import relay\nimport logging\nlogging.basicConfig(level=logging.DEBUG)\n",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "X = tvm.placeholder((10,), name=\"X\")\nY = tvm.placeholder((10,), name=\"Y\")\n\nZ = tvm.compute(X.shape, lambda i: X[i] + Y[i], name=\"Z\")\nZ_relu = tvm.compute(Z.shape, lambda i: tvm.max(Z[i], 0), name=\"Z_relu\")",
"execution_count": 2,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Default Schedule\n\nFirst, we see that the default schedule makes two separate passes over the data: one to compute the sum, and another to compute the ReLU."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "s = tvm.create_schedule(Z_relu.op)\nprint(tvm.lower(s, [X, Y, Z_relu], simple_mode=True))",
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": "// attr [Z] storage_scope = \"global\"\nallocate Z[float32 * 10]\nproduce Z {\n for (i, 0, 10) {\n Z[i] = (X[i] + Y[i])\n }\n}\nproduce Z_relu {\n for (i, 0, 10) {\n Z_relu[i] = max(Z[i], 0.000000f)\n }\n}\n\n",
"name": "stdout"
}
]
},
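{
"metadata": {},
"cell_type": "markdown",
"source": "As a sanity check, we can build and run this default schedule and compare against NumPy. (This cell is an illustrative addition: a minimal sketch using the standard `tvm.build` / `tvm.nd.array` APIs; it was not part of the original run.)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\n\n# Build the two-pass kernel for CPU and run it on random inputs.\nfadd_relu = tvm.build(s, [X, Y, Z_relu], \"llvm\")\nctx = tvm.cpu(0)\nx_np = np.random.uniform(-1, 1, size=10).astype(\"float32\")\ny_np = np.random.uniform(-1, 1, size=10).astype(\"float32\")\nz_nd = tvm.nd.array(np.zeros(10, dtype=\"float32\"), ctx)\nfadd_relu(tvm.nd.array(x_np, ctx), tvm.nd.array(y_np, ctx), z_nd)\n\n# The result should match the NumPy reference: max(x + y, 0).\nnp.testing.assert_allclose(z_nd.asnumpy(), np.maximum(x_np + y_np, 0))",
"execution_count": null,
"outputs": []
},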
{
"metadata": {},
"cell_type": "markdown",
"source": "## Fused Schedule\n\nNow we compute the addition 'inline': that is, we compute it at the point where it is used (inside the ReLU). This allows us to compute the entire expression in a single pass over the input data."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "s = tvm.create_schedule(Z_relu.op)\ns[Z].compute_inline()\nprint(tvm.lower(s, [X, Y, Z_relu], simple_mode=True))",
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": "produce Z_relu {\n for (i, 0, 10) {\n Z_relu[i] = max((X[i] + Y[i]), 0.000000f)\n }\n}\n\n",
"name": "stdout"
}
]
},
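{
"metadata": {},
"cell_type": "markdown",
"source": "`compute_inline` is not the only fusion primitive. As an illustrative aside (not part of the original notebook), `compute_at` anchors the producer's computation inside a chosen loop of the consumer; attaching `Z` at the `i` axis of `Z_relu` yields a similarly fused loop for this 1-D case."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Alternative fusion primitive: attach Z's computation inside Z_relu's loop over i.\ns = tvm.create_schedule(Z_relu.op)\ns[Z].compute_at(s[Z_relu], Z_relu.op.axis[0])\nprint(tvm.lower(s, [X, Y, Z_relu], simple_mode=True))",
"execution_count": null,
"outputs": []
},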
{
"metadata": {},
"cell_type": "markdown",
"source": "## Fusion at the Relay Level\n\nNow, let's construct a small graph of Relay IR: Add -> Exp -> ReLU."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "x = relay.var(\"x\", shape=(10, 32))\ny = relay.add(x, relay.const(1, \"float32\"))\nz = relay.exp(y)\nw = relay.maximum(z, relay.const(0, \"float32\"))\n\nf = relay.Function([x], w)\nf = relay.ir_pass.infer_type(f)",
"execution_count": 5,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "We see that the graph is a single function with three instructions (add, exp, maximum)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(f.astext(show_meta_data=False))",
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"text": "v0.0.1\n%3 = fn (%x: Tensor[(10, 32), float32]) -> Tensor[(10, 32), float32] {\n %0 = add(%x, 1f) // ty=Tensor[(10, 32), float32]\n %1 = exp(%0) // ty=Tensor[(10, 32), float32]\n %2 = maximum(%1, 0f) // ty=Tensor[(10, 32), float32]\n %2\n}\n%3\n",
"name": "stdout"
}
]
},
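{
"metadata": {},
"cell_type": "markdown",
"source": "Before fusing, we can sanity-check the function's semantics against NumPy. (A minimal sketch, assuming the Relay debug interpreter exposed via `relay.create_executor` in this TVM version; this cell is an illustrative addition.)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\n\n# Evaluate the unfused function with the Relay interpreter.\nx_np = np.random.uniform(-1, 1, size=(10, 32)).astype(\"float32\")\nintrp = relay.create_executor(kind=\"debug\", ctx=tvm.cpu(0), target=\"llvm\")\nout = intrp.evaluate(f)(x_np)\n\n# NumPy reference for add -> exp -> maximum.\nnp.testing.assert_allclose(out.asnumpy(), np.maximum(np.exp(x_np + 1), 0), rtol=1e-5)",
"execution_count": null,
"outputs": []
},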
{
"metadata": {},
"cell_type": "markdown",
"source": "## Fusion Pass\n\nNow, when we invoke operator fusion, we see that the graph is decomposed into a separate Relay function:\n\n```\n %3 = fn (%p0: Tensor[(10, 32), float32], __dict__=meta[StrMap][0]) -> Tensor[(10, 32), float32] {\n %0 = add(%p0, 1f) // ty=Tensor[(10, 32), float32]\n %1 = exp(%0) // ty=Tensor[(10, 32), float32]\n %2 = maximum(%1, 0f) // ty=Tensor[(10, 32), float32]\n %2\n }\n```\n\nWe will then generate the fused HalideIR for this subgraph directly."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "ff = relay.ir_pass.fuse_ops(f, opt_level=2)\nff = relay.ir_pass.infer_type(ff)\n\nprint(ff.astext(show_meta_data=False))",
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": "v0.0.1\n%5 = fn (%x: Tensor[(10, 32), float32]) -> Tensor[(10, 32), float32] {\n %3 = fn (%p0: Tensor[(10, 32), float32], __dict__=meta[StrMap][0]) -> Tensor[(10, 32), float32] {\n %0 = add(%p0, 1f) // ty=Tensor[(10, 32), float32]\n %1 = exp(%0) // ty=Tensor[(10, 32), float32]\n %2 = maximum(%1, 0f) // ty=Tensor[(10, 32), float32]\n %2\n }\n %4 = %3(%x) // ty=Tensor[(10, 32), float32]\n %4\n}\n%5\n// meta data omitted. you can use show_meta_data=True to include meta data\n",
"name": "stdout"
}
]
},
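{
"metadata": {},
"cell_type": "markdown",
"source": "Fusion decisions depend on the `opt_level` passed to the pass. As a quick, illustrative experiment (its output was not captured in the original run), we would expect `opt_level=0` to disable cross-operator fusion, leaving each operator in its own primitive function."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# For comparison: at opt_level=0 we expect no cross-operator fusion.\nf0 = relay.ir_pass.fuse_ops(f, opt_level=0)\nf0 = relay.ir_pass.infer_type(f0)\nprint(f0.astext(show_meta_data=False))",
"execution_count": null,
"outputs": []
},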
{
"metadata": {},
"cell_type": "markdown",
"source": "## Generated HalideIR for fused blocks\n\nWe can now invoke the compilation flow and see the exact HalideIR we generate for our fused block. We produce a function called `fused_add_exp_maximum`, where the HalideIR is what we'd expect:\n\n```\nproduce tensor {\n parallel (ax0, 0, 10) {\n for (ax1.outer, 0, 2) {\n tensor[ramp((((ax0*2) + ax1.outer)*16), 1, 16)] = max(exp((placeholder[ramp((((ax0*2) + ax1.outer)*16), 1, 16)] + x16(1.000000f))), x16(0.000000f))\n }\n }\n}\n```"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "_ = relay.build(ff, target=\"llvm -mcpu=core-avx2\")",
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": "DEBUG:autotvm:Finish loading 35 records\nDEBUG:root:lower function fused_add_exp_maximum\nDEBUG:root:produce tensor {\n parallel (ax0, 0, 10) {\n for (ax1.outer, 0, 2) {\n tensor[ramp((((ax0*2) + ax1.outer)*16), 1, 16)] = max(exp((placeholder[ramp((((ax0*2) + ax1.outer)*16), 1, 16)] + x16(1.000000f))), x16(0.000000f))\n }\n }\n}\n\n",
"name": "stderr"
}
]
},
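{
"metadata": {},
"cell_type": "markdown",
"source": "We can also run the compiled module end to end. (A minimal sketch, assuming the graph-runtime API of this TVM version, where `relay.build` returns a `(graph, lib, params)` triple; this cell is an illustrative addition.)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\nfrom tvm.contrib import graph_runtime\n\n# Compile again, keeping the build artifacts this time.\ngraph, lib, params = relay.build(ff, target=\"llvm -mcpu=core-avx2\")\n\n# Run the fused kernel through the graph runtime.\nmodule = graph_runtime.create(graph, lib, tvm.cpu(0))\nx_np = np.random.uniform(-1, 1, size=(10, 32)).astype(\"float32\")\nmodule.set_input(\"x\", x_np)\nmodule.set_input(**params)\nmodule.run()\nout = module.get_output(0)\n\n# Compare against the NumPy reference max(exp(x + 1), 0).\nnp.testing.assert_allclose(out.asnumpy(), np.maximum(np.exp(x_np + 1), 0), rtol=1e-5)",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Fusing a More Complex Graph\n\nFinally, let's look at a graph that mixes compute-heavy operators with elementwise ones: MaxPool -> Exp -> Conv -> ReLU."
},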
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "x = relay.var(\"x\", shape=(1, 16, 32, 32))\nk = relay.var(\"k\", shape=(32, 16, 3, 3))\n\ny = relay.nn.max_pool2d(x, pool_size=[2, 2])\nz = relay.exp(y)\n\nz_conv = relay.nn.conv2d(z, k)\nz_conv_relu = relay.nn.relu(z_conv)\n\nf = relay.Function([x, k], z_conv_relu)\nf = relay.ir_pass.infer_type(f)",
"execution_count": 9,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(f.astext(show_meta_data=False))",
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"text": "v0.0.1\n%4 = fn (%x: Tensor[(1, 16, 32, 32), float32], %k: Tensor[(32, 16, 3, 3), float32]) -> Tensor[(1, 32, 29, 29), float32] {\n %0 = nn.max_pool2d(%x, pool_size=[2, 2]) // ty=Tensor[(1, 16, 31, 31), float32]\n %1 = exp(%0) // ty=Tensor[(1, 16, 31, 31), float32]\n %2 = nn.conv2d(%1, %k) // ty=Tensor[(1, 32, 29, 29), float32]\n %3 = nn.relu(%2) // ty=Tensor[(1, 32, 29, 29), float32]\n %3\n}\n%4\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "ff = relay.ir_pass.fuse_ops(f, opt_level=2)\nff = relay.ir_pass.infer_type(ff)\n\nprint(ff.astext(show_meta_data=False))",
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"text": "v0.0.1\n%8 = fn (%x: Tensor[(1, 16, 32, 32), float32], %k: Tensor[(32, 16, 3, 3), float32]) -> Tensor[(1, 32, 29, 29), float32] {\n %2 = fn (%p0: Tensor[(1, 16, 32, 32), float32], __dict__=meta[StrMap][0]) -> Tensor[(1, 16, 31, 31), float32] {\n %0 = nn.max_pool2d(%p0, pool_size=[2, 2]) // ty=Tensor[(1, 16, 31, 31), float32]\n %1 = exp(%0) // ty=Tensor[(1, 16, 31, 31), float32]\n %1\n }\n %3 = %2(%x) // ty=Tensor[(1, 16, 31, 31), float32]\n %6 = fn (%p01: Tensor[(1, 16, 31, 31), float32], %p1: Tensor[(32, 16, 3, 3), float32], __dict__=meta[StrMap][1]) -> Tensor[(1, 32, 29, 29), float32] {\n %4 = nn.conv2d(%p01, %p1) // ty=Tensor[(1, 32, 29, 29), float32]\n %5 = nn.relu(%4) // ty=Tensor[(1, 32, 29, 29), float32]\n %5\n }\n %7 = %6(%3, %k) // ty=Tensor[(1, 32, 29, 29), float32]\n %7\n}\n%8\n// meta data omitted. you can use show_meta_data=True to include meta data\n",
"name": "stdout"
}
]
},
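{
"metadata": {},
"cell_type": "markdown",
"source": "Fusion now produces two groups: `max_pool2d` with the elementwise `exp` folded into it, and `conv2d` with `relu` folded into it. This matches Relay's fusion rules: elementwise operators fuse into the output of a preceding fusable operator, while an operator like `conv2d` starts a new group (it cannot absorb `exp` at its input). As before, we can compile the fused function; with DEBUG logging enabled, one lowered kernel per fused group should be printed. (This build cell is an illustrative addition; its output was not captured in the original run.)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Compile the fused pool/exp and conv/relu groups.\n_ = relay.build(ff, target=\"llvm -mcpu=core-avx2\")",
"execution_count": null,
"outputs": []
},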
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "python2",
"display_name": "Python 2",
"language": "python"
},
"_draft": {
"nbviewer_url": "https://gist.github.com/7d3ff88981f0aab03ac4a8e0538e1844"
},
"language_info": {
"mimetype": "text/x-python",
"nbconvert_exporter": "python",
"name": "python",
"pygments_lexer": "ipython2",
"version": "2.7.15",
"file_extension": ".py",
"codemirror_mode": {
"version": 2,
"name": "ipython"
}
},
"gist": {
"id": "7d3ff88981f0aab03ac4a8e0538e1844",
"data": {
"description": "RelayTVMFusionE2E.ipynb",
"public": false
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}