{
"cells": [
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import tvm\nfrom tvm import relay\nimport logging\nlogging.basicConfig(level=logging.DEBUG)\n",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "X = tvm.placeholder((10,), name=\"X\")\nY = tvm.placeholder((10,), name=\"Y\")\n\nZ = tvm.compute(X.shape, lambda i: X[i] + Y[i], name=\"Z\")\nZ_relu = tvm.compute(Z.shape, lambda i: tvm.max(Z[i], 0), name=\"Z_relu\")",
"execution_count": 2,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Default Schedule\n\nFirst, we see that the default schedule makes two separate passes over the data: one to compute the sum, and another to compute the ReLU."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "s = tvm.create_schedule(Z_relu.op)\nprint(tvm.lower(s, [X, Y, Z_relu], simple_mode=True))",
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": "// attr [Z] storage_scope = \"global\"\nallocate Z[float32 * 10]\nproduce Z {\n for (i, 0, 10) {\n Z[i] = (X[i] + Y[i])\n }\n}\nproduce Z_relu {\n for (i, 0, 10) {\n Z_relu[i] = max(Z[i], 0.000000f)\n }\n}\n\n",
"name": "stdout"
}
]
},
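{
"metadata": {},
"cell_type": "markdown",
"source": "As a sanity check, we can build and run this default schedule and compare against NumPy. (This cell is an illustrative addition: a minimal sketch using the standard `tvm.build` / `tvm.nd.array` APIs; it was not part of the original run.)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\n\n# Build the two-pass kernel for CPU and run it on random inputs.\nfadd_relu = tvm.build(s, [X, Y, Z_relu], \"llvm\")\nctx = tvm.cpu(0)\nx_np = np.random.uniform(-1, 1, size=10).astype(\"float32\")\ny_np = np.random.uniform(-1, 1, size=10).astype(\"float32\")\nz_nd = tvm.nd.array(np.zeros(10, dtype=\"float32\"), ctx)\nfadd_relu(tvm.nd.array(x_np, ctx), tvm.nd.array(y_np, ctx), z_nd)\n\n# The result should match the NumPy reference: max(x + y, 0).\nnp.testing.assert_allclose(z_nd.asnumpy(), np.maximum(x_np + y_np, 0))",
"execution_count": null,
"outputs": []
},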
{
"metadata": {},
"cell_type": "markdown",
"source": "## Fused Schedule\n\nNow we compute the addition 'inline': that is, we compute it at the point where it is used (inside the ReLU). This allows us to compute the entire expression in a single pass over the input data."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "s = tvm.create_schedule(Z_relu.op)\ns[Z].compute_inline()\nprint(tvm.lower(s, [X, Y, Z_relu], simple_mode=True))",
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": "produce Z_relu {\n for (i, 0, 10) {\n Z_relu[i] = max((X[i] + Y[i]), 0.000000f)\n }\n}\n\n",
"name": "stdout"
}
]
},
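{
"metadata": {},
"cell_type": "markdown",
"source": "`compute_inline` is not the only fusion primitive. As an illustrative aside (not part of the original notebook), `compute_at` anchors the producer's computation inside a chosen loop of the consumer; attaching `Z` at the `i` axis of `Z_relu` yields a similarly fused loop for this 1-D case."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Alternative fusion primitive: attach Z's computation inside Z_relu's loop over i.\ns = tvm.create_schedule(Z_relu.op)\ns[Z].compute_at(s[Z_relu], Z_relu.op.axis[0])\nprint(tvm.lower(s, [X, Y, Z_relu], simple_mode=True))",
"execution_count": null,
"outputs": []
},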
{
"metadata": {},
"cell_type": "markdown",
"source": "## Fusion at the Relay Level\n\nNow, let's construct a small graph of Relay IR: Add -> Exp -> ReLU."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "x = relay.var(\"x\", shape=(10, 32))\ny = relay.add(x, relay.const(1, \"float32\"))\nz = relay.exp(y)\nw = relay.maximum(z, relay.const(0, \"float32\"))\n\nf = relay.Function([x], w)\nf = relay.ir_pass.infer_type(f)",
"execution_count": 5,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "We see that the graph is a single function with three instructions (add, exp, maximum)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(f.astext(show_meta_data=False))",
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"text": "v0.0.1\n%3 = fn (%x: Tensor[(10, 32), float32]) -> Tensor[(10, 32), float32] {\n %0 = add(%x, 1f) // ty=Tensor[(10, 32), float32]\n %1 = exp(%0) // ty=Tensor[(10, 32), float32]\n %2 = maximum(%1, 0f) // ty=Tensor[(10, 32), float32]\n %2\n}\n%3\n",
"name": "stdout"
}
]
},
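{
"metadata": {},
"cell_type": "markdown",
"source": "Before fusing, we can sanity-check the function's semantics against NumPy. (A minimal sketch, assuming the Relay debug interpreter exposed via `relay.create_executor` in this TVM version; this cell is an illustrative addition.)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\n\n# Evaluate the unfused function with the Relay interpreter.\nx_np = np.random.uniform(-1, 1, size=(10, 32)).astype(\"float32\")\nintrp = relay.create_executor(kind=\"debug\", ctx=tvm.cpu(0), target=\"llvm\")\nout = intrp.evaluate(f)(x_np)\n\n# NumPy reference for add -> exp -> maximum.\nnp.testing.assert_allclose(out.asnumpy(), np.maximum(np.exp(x_np + 1), 0), rtol=1e-5)",
"execution_count": null,
"outputs": []
},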
{
"metadata": {},
"cell_type": "markdown",
"source": "## Fusion Pass\n\nNow, when we invoke operator fusion, we see that the graph is decomposed into a separate Relay function:\n\n```\n %3 = fn (%p0: Tensor[(10, 32), float32], __dict__=meta[StrMap][0]) -> Tensor[(10, 32), float32] {\n %0 = add(%p0, 1f) // ty=Tensor[(10, 32), float32]\n %1 = exp(%0) // ty=Tensor[(10, 32), float32]\n %2 = maximum(%1, 0f) // ty=Tensor[(10, 32), float32]\n %2\n }\n```\n\nWe will then generate the fused HalideIR for this subgraph directly."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "ff = relay.ir_pass.fuse_ops(f, opt_level=2)\nff = relay.ir_pass.infer_type(ff)\n\nprint(ff.astext(show_meta_data=False))",
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": "v0.0.1\n%5 = fn (%x: Tensor[(10, 32), float32]) -> Tensor[(10, 32), float32] {\n %3 = fn (%p0: Tensor[(10, 32), float32], __dict__=meta[StrMap][0]) -> Tensor[(10, 32), float32] {\n %0 = add(%p0, 1f) // ty=Tensor[(10, 32), float32]\n %1 = exp(%0) // ty=Tensor[(10, 32), float32]\n %2 = maximum(%1, 0f) // ty=Tensor[(10, 32), float32]\n %2\n }\n %4 = %3(%x) // ty=Tensor[(10, 32), float32]\n %4\n}\n%5\n// meta data omitted. you can use show_meta_data=True to include meta data\n",
"name": "stdout"
}
]
},
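{
"metadata": {},
"cell_type": "markdown",
"source": "Fusion decisions depend on the `opt_level` passed to the pass. As a quick, illustrative experiment (its output was not captured in the original run), we would expect `opt_level=0` to disable cross-operator fusion, leaving each operator in its own primitive function."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# For comparison: at opt_level=0 we expect no cross-operator fusion.\nf0 = relay.ir_pass.fuse_ops(f, opt_level=0)\nf0 = relay.ir_pass.infer_type(f0)\nprint(f0.astext(show_meta_data=False))",
"execution_count": null,
"outputs": []
},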
{
"metadata": {},
"cell_type": "markdown",
"source": "## Generated HalideIR for fused blocks\n\nWe can now invoke the compilation flow and see the exact HalideIR we generate for our fused block. We produce a function called `fused_add_exp_maximum`, where the HalideIR is what we'd expect:\n\n```\nproduce tensor {\n parallel (ax0, 0, 10) {\n for (ax1.outer, 0, 2) {\n tensor[ramp((((ax0*2) + ax1.outer)*16), 1, 16)] = max(exp((placeholder[ramp((((ax0*2) + ax1.outer)*16), 1, 16)] + x16(1.000000f))), x16(0.000000f))\n }\n }\n}\n```"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "_ = relay.build(ff, target=\"llvm -mcpu=core-avx2\")",
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": "DEBUG:autotvm:Finish loading 35 records\nDEBUG:root:lower function fused_add_exp_maximum\nDEBUG:root:produce tensor {\n parallel (ax0, 0, 10) {\n for (ax1.outer, 0, 2) {\n tensor[ramp((((ax0*2) + ax1.outer)*16), 1, 16)] = max(exp((placeholder[ramp((((ax0*2) + ax1.outer)*16), 1, 16)] + x16(1.000000f))), x16(0.000000f))\n }\n }\n}\n\n",
"name": "stderr"
}
]
},
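{
"metadata": {},
"cell_type": "markdown",
"source": "We can also run the compiled module end to end. (A minimal sketch, assuming the graph-runtime API of this TVM version, where `relay.build` returns a `(graph, lib, params)` triple; this cell is an illustrative addition.)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\nfrom tvm.contrib import graph_runtime\n\n# Compile again, keeping the build artifacts this time.\ngraph, lib, params = relay.build(ff, target=\"llvm -mcpu=core-avx2\")\n\n# Run the fused kernel through the graph runtime.\nmodule = graph_runtime.create(graph, lib, tvm.cpu(0))\nx_np = np.random.uniform(-1, 1, size=(10, 32)).astype(\"float32\")\nmodule.set_input(\"x\", x_np)\nmodule.set_input(**params)\nmodule.run()\nout = module.get_output(0)\n\n# Compare against the NumPy reference max(exp(x + 1), 0).\nnp.testing.assert_allclose(out.asnumpy(), np.maximum(np.exp(x_np + 1), 0), rtol=1e-5)",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Fusing a More Complex Graph\n\nFinally, let's look at a graph that mixes compute-heavy operators with elementwise ones: MaxPool -> Exp -> Conv -> ReLU."
},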
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "x = relay.var(\"x\", shape=(1, 16, 32, 32))\nk = relay.var(\"k\", shape=(32, 16, 3, 3))\n\ny = relay.nn.max_pool2d(x, pool_size=[2, 2])\nz = relay.exp(y)\n\nz_conv = relay.nn.conv2d(z, k)\nz_conv_relu = relay.nn.relu(z_conv)\n\nf = relay.Function([x, k], z_conv_relu)\nf = relay.ir_pass.infer_type(f)",
"execution_count": 9,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(f.astext(show_meta_data=False))",
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"text": "v0.0.1\n%4 = fn (%x: Tensor[(1, 16, 32, 32), float32], %k: Tensor[(32, 16, 3, 3), float32]) -> Tensor[(1, 32, 29, 29), float32] {\n %0 = nn.max_pool2d(%x, pool_size=[2, 2]) // ty=Tensor[(1, 16, 31, 31), float32]\n %1 = exp(%0) // ty=Tensor[(1, 16, 31, 31), float32]\n %2 = nn.conv2d(%1, %k) // ty=Tensor[(1, 32, 29, 29), float32]\n %3 = nn.relu(%2) // ty=Tensor[(1, 32, 29, 29), float32]\n %3\n}\n%4\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "ff = relay.ir_pass.fuse_ops(f, opt_level=2)\nff = relay.ir_pass.infer_type(ff)\n\nprint(ff.astext(show_meta_data=False))",
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"text": "v0.0.1\n%8 = fn (%x: Tensor[(1, 16, 32, 32), float32], %k: Tensor[(32, 16, 3, 3), float32]) -> Tensor[(1, 32, 29, 29), float32] {\n %2 = fn (%p0: Tensor[(1, 16, 32, 32), float32], __dict__=meta[StrMap][0]) -> Tensor[(1, 16, 31, 31), float32] {\n %0 = nn.max_pool2d(%p0, pool_size=[2, 2]) // ty=Tensor[(1, 16, 31, 31), float32]\n %1 = exp(%0) // ty=Tensor[(1, 16, 31, 31), float32]\n %1\n }\n %3 = %2(%x) // ty=Tensor[(1, 16, 31, 31), float32]\n %6 = fn (%p01: Tensor[(1, 16, 31, 31), float32], %p1: Tensor[(32, 16, 3, 3), float32], __dict__=meta[StrMap][1]) -> Tensor[(1, 32, 29, 29), float32] {\n %4 = nn.conv2d(%p01, %p1) // ty=Tensor[(1, 32, 29, 29), float32]\n %5 = nn.relu(%4) // ty=Tensor[(1, 32, 29, 29), float32]\n %5\n }\n %7 = %6(%3, %k) // ty=Tensor[(1, 32, 29, 29), float32]\n %7\n}\n%8\n// meta data omitted. you can use show_meta_data=True to include meta data\n",
"name": "stdout"
}
]
},
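{
"metadata": {},
"cell_type": "markdown",
"source": "Fusion now produces two groups: `max_pool2d` with the elementwise `exp` folded into it, and `conv2d` with `relu` folded into it. This matches Relay's fusion rules: elementwise operators fuse into the output of a preceding fusable operator, while an operator like `conv2d` starts a new group (it cannot absorb `exp` at its input). As before, we can compile the fused function; with DEBUG logging enabled, one lowered kernel per fused group should be printed. (This build cell is an illustrative addition; its output was not captured in the original run.)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Compile the fused pool/exp and conv/relu groups.\n_ = relay.build(ff, target=\"llvm -mcpu=core-avx2\")",
"execution_count": null,
"outputs": []
},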
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "python2",
"display_name": "Python 2",
"language": "python"
},
"_draft": {
"nbviewer_url": "https://gist.github.com/7d3ff88981f0aab03ac4a8e0538e1844"
},
"language_info": {
"mimetype": "text/x-python",
"nbconvert_exporter": "python",
"name": "python",
"pygments_lexer": "ipython2",
"version": "2.7.15",
"file_extension": ".py",
"codemirror_mode": {
"version": 2,
"name": "ipython"
}
},
"gist": {
"id": "7d3ff88981f0aab03ac4a8e0538e1844",
"data": {
"description": "RelayTVMFusionE2E.ipynb",
"public": false
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}