amaarora · June 2, 2021 01:09
diff --git a/03_data.core.ipynb b/03_data.core.ipynb
 {
  "cells": [
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\n#skip\n! [ -e /content ] && pip install -Uqq fastai  # upgrade fastai on colab",
      "execution_count": 1,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#default_exp data.core",
      "execution_count": 2,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\nfrom fastai.torch_basics import *\nfrom fastai.data.load import *\nfrom torch.utils.data import DataLoader",
      "execution_count": 3,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\nfrom nbdev.showdoc import *",
      "execution_count": 4,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "# Data core\n\n> Core functionality for gathering data"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "The classes here provide functionality for applying a list of transforms to a set of items (`TfmdLists`, `Datasets`) or a `DataLoader` (`TfmdDl`) as well as the base class used to gather the data for model training: `DataLoaders`."
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## TfmdDL -"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\n@typedispatch\ndef show_batch(x, y, samples, ctxs=None, max_n=9, **kwargs):\n    if ctxs is None: ctxs = Inf.nones\n    if hasattr(samples[0], 'show'):\n        ctxs = [s.show(ctx=c, **kwargs) for s,c,_ in zip(samples,ctxs,range(max_n))]\n    else:\n        for i in range_of(samples[0]):\n            ctxs = [b.show(ctx=c, **kwargs) for b,c,_ in zip(samples.itemgot(i),ctxs,range(max_n))]\n    return ctxs",
      "execution_count": 5,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "`show_batch` is a type-dispatched function that is responsible for showing decoded `samples`. `x` and `y` are the input and the target in the batch to be shown, and are passed along to dispatch on their types. There is a different implementation of `show_batch` if `x` is a `TensorImage` or a `TensorText` for instance (see vision.core or text.data for more details). `ctxs` can be passed but the function is responsible to create them if necessary. `kwargs` depend on the specific implementation."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\n@typedispatch\ndef show_results(x, y, samples, outs, ctxs=None, max_n=9, **kwargs):\n    if ctxs is None: ctxs = Inf.nones\n    for i in range(len(samples[0])):\n        ctxs = [b.show(ctx=c, **kwargs) for b,c,_ in zip(samples.itemgot(i),ctxs,range(max_n))]\n    for i in range(len(outs[0])):\n        ctxs = [b.show(ctx=c, **kwargs) for b,c,_ in zip(outs.itemgot(i),ctxs,range(max_n))]\n    return ctxs",
      "execution_count": 6,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "`show_results` is a type-dispatched function that is responsible for showing decoded `samples` and their corresponding `outs`. Like in `show_batch`, `x` and `y` are the input and the target in the batch to be shown, and are passed along to dispatch on their types. `ctxs` can be passed but the function is responsible to create them if necessary. `kwargs` depend on the specific implementation."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\n_all_ = [\"show_batch\", \"show_results\"]",
      "execution_count": 7,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\n_batch_tfms = ('after_item','before_batch','after_batch')",
      "execution_count": 8,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\n_collate_types = (ndarray, Tensor, typing.Mapping, str)",
      "execution_count": 9,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\ndef fa_collate(t):\n    \"A replacement for PyTorch `default_collate` which maintains types and handles `Sequence`s\"\n    b = t[0]\n    return (default_collate(t) if isinstance(b, _collate_types)\n            else type(t[0])([fa_collate(s) for s in zip(*t)]) if isinstance(b, Sequence)\n            else default_collate(t))",
      "execution_count": 10,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\n@delegates()\nclass TfmdDL(DataLoader, GetAttr):\n    \"Transformed `DataLoader`\"\n    _default='dataset'\n    def __init__(self, dataset, bs=64, shuffle=False, num_workers=None, collate_fn=fa_collate, verbose=False, do_setup=True, indexed=None, device=None, **kwargs):\n        if num_workers is None: num_workers = min(16, defaults.cpus)\n        for nm in _batch_tfms: kwargs[nm] = Pipeline(kwargs.get(nm,None))\n        self.after_item = kwargs.pop('after_item')\n        self.after_batch = kwargs.pop('after_batch')\n        self.before_batch = kwargs.pop('before_batch')\n        self.indexed = hasattr(dataset, '__getitem__') and not isinstance(dataset, IterableDataset)\n        \n        store_attr('shuffle, device, bs')\n        if do_setup:\n            for nm in _batch_tfms:\n                pv(f\"Setting up {nm}: {getattr(self, nm)}\", verbose)\n                getattr(self, nm).setup(self)\n        \n        if self.after_item is not noop:\n            dataset = TfmdDataset(dataset, after_item=self.after_item)\n        super().__init__(dataset=dataset, batch_size=bs, shuffle=shuffle, num_workers=num_workers, \n                         collate_fn=collate_fn, **kwargs)\n\n        \n    def __iter__(self):\n        self.before_iter()\n        for batch in super().__iter__():\n            if self.device is not None: batch = to_device(batch, self.device)\n            yield self.after_batch(batch)   \n            \n    def before_iter(self):\n        split_idx = getattr(self.dataset, 'split_idx', None)\n        for nm in _batch_tfms:\n            f = getattr(self,nm)\n            if isinstance(f,Pipeline): f.split_idx=split_idx\n\n    def _one_pass(self):\n        its = next(iter(self))\n        self._n_inp = 1 if not isinstance(its, (list,tuple)) or len(its)==1 else len(its)-1\n        self._types = explode_types(its)       \n        \n    def new(self, dataset=None, cls=None, **kwargs):\n        if dataset is None: dataset=self.dataset\n 
       if cls is None: cls = type(self)\n        cur_kwargs = dict(dataset=dataset, num_workers=self.num_workers, pin_memory=self.pin_memory, timeout=self.timeout,\n                          bs=self.batch_size, shuffle=self.shuffle, drop_last=self.drop_last, indexed=self.indexed, device=self.device)\n        for n in _batch_tfms:\n            o = getattr(self, n)\n            if not isinstance(o, MethodType): cur_kwargs[n] = o\n        res = cls(**merge(cur_kwargs, kwargs))\n\n        if not hasattr(self, '_n_inp') or not hasattr(self, '_types'):\n            try:\n                self._one_pass()\n                res._n_inp,res._types = self._n_inp,self._types\n            except: print(\"Could not do one pass in your dataloader, there is something wrong in it\")\n        else: res._n_inp,res._types = self._n_inp,self._types\n        return res\n\n\n    def to(self, device):\n        self.device = device\n        for tfm in self.after_batch.fs:\n            for a in L(getattr(tfm, 'parameters', None)): setattr(tfm, a, getattr(tfm, a).to(device))\n        return self\n\n    def _retain_dl(self,b):\n        if not getattr(self, '_types', None): self._one_pass()\n        return retain_types(b, typs=self._types)\n\n    def decode(self, b): return to_cpu(self.after_batch.decode(self._retain_dl(b)))\n    def decode_batch(self, b, max_n=9, full=True): return self._decode_batch(self.decode(b), max_n, full)\n\n    def _decode_batch(self, b, max_n=9, full=True):\n        f = self.after_item.decode\n        f1 = self.before_batch.decode\n        f = compose(f1, f, partial(getattr(self.dataset,'decode',noop), full = full))\n        return L(batch_to_samples(b, max_n=max_n)).map(f)\n\n    def _pre_show_batch(self, b, max_n=9):\n        \"Decode `b` to be ready for `show_batch`\"\n        b = self.decode(b)\n        if hasattr(b, 'show'): return b,None,None\n        its = self._decode_batch(b, max_n, full=False)\n        if not is_listy(b): b,its = [b],L((o,) for o in its)\n    
    return detuplify(b[:self.n_inp]),detuplify(b[self.n_inp:]),its\n\n    def show_batch(self, b=None, max_n=9, ctxs=None, show=True, unique=False, **kwargs):\n        if unique:\n            old_get_idxs = self.get_idxs\n            self.get_idxs = lambda: Inf.zeros\n        if b is None: b = self.one_batch()\n        if not show: return self._pre_show_batch(b, max_n=max_n)\n        show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)\n        if unique: self.get_idxs = old_get_idxs\n\n    def show_results(self, b, out, max_n=9, ctxs=None, show=True, **kwargs):\n        x,y,its = self.show_batch(b, max_n=max_n, show=False)\n        b_out = type(b)(b[:self.n_inp] + (tuple(out) if is_listy(out) else (out,)))\n        x1,y1,outs = self.show_batch(b_out, max_n=max_n, show=False)\n        res = (x,x1,None,None) if its is None else (x, y, its, outs.itemgot(slice(self.n_inp,None)))\n        if not show: return res\n        show_results(*res, ctxs=ctxs, max_n=max_n, **kwargs)\n\n    def one_batch(self):\n        res = first(self)\n        return res\n    \n    @property\n    def n_inp(self):\n        if hasattr(self.dataset, 'n_inp'): return self.dataset.n_inp\n        if not hasattr(self, '_n_inp'): self._one_pass()\n        return self._n_inp",
      "execution_count": 11,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "A `TfmdDL` is a `DataLoader` that creates `Pipeline` from a list of `Transform`s for the callbacks `after_item`, `before_batch` and `after_batch`. As a result, it can decode or show a processed `batch`."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\nadd_docs(TfmdDL,\n         decode=\"Decode `b` using `tfms`\",\n         decode_batch=\"Decode `b` entirely\",\n         new=\"Create a new version of self with a few changed attributes\",\n         show_batch=\"Show `b` (defaults to `one_batch`), a list of lists of pipeline outputs (i.e. output of a `DataLoader`)\",\n         show_results=\"Show each item of `b` and `out`\",\n         one_batch=\"Return one batch from `DataLoader`.\",\n         to=\"Put self and its transforms state on `device`\", \n         before_iter=\"override\")",
      "execution_count": 12,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class _Category(int, ShowTitle): pass",
      "execution_count": 13,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#Test retain type\nclass NegTfm(Transform):\n    def encodes(self, x): return torch.neg(x)\n    def decodes(self, x): return torch.neg(x)\n    \ntdl = TfmdDL([(TensorImage([1]),)] * 4, after_batch=NegTfm(), bs=4, num_workers=4)\nb = tdl.one_batch()\ntest_eq(type(b[0]), TensorImage)\nb = (tensor([1.,1.,1.,1.]),)\ntest_eq(type(tdl.decode_batch(b)[0][0]), TensorImage)",
      "execution_count": 14,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class A(Transform): \n    def encodes(self, x): return x\n    def decodes(self, x): return TitledInt(x) \n\n@Transform\ndef f(x)->None: return fastuple((x,x))\n\nstart = torch.arange(50)\ntest_eq_type(f(2), fastuple((2,2)))",
      "execution_count": 15,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "a = A()\ntdl = TfmdDL(start, after_item=lambda x: (a(x), f(x)), bs=4)\nx,y = tdl.one_batch()\ntest_eq(type(y), fastuple)\n\ns = tdl.decode_batch((x,y))\ntest_eq(type(s[0][1]), fastuple)",
      "execution_count": 16,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "tdl = TfmdDL(torch.arange(0,50), after_item=A(), after_batch=NegTfm(), bs=4)\ntest_eq(tdl.dataset[0], start[0])\ntest_eq(len(tdl), (50-1)//4+1)\ntest_eq(tdl.bs, 4)\ntest_stdout(tdl.show_batch, '0\\n1\\n2\\n3')",
      "execution_count": 17,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class B(Transform):\n    parameters = 'a'\n    def __init__(self): self.a = torch.tensor(0.)\n    def encodes(self, x): x\n    \ntdl = TfmdDL([(TensorImage([1]),)] * 4, after_batch=B(), bs=4)\ntest_eq(tdl.after_batch.fs[0].a.device, torch.device('cpu'))\ntdl.to(default_device())\ntest_eq(tdl.after_batch.fs[0].a.device, default_device())",
      "execution_count": 18,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Methods"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(TfmdDL.one_batch)",
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"TfmdDL.one_batch\" class=\"doc_header\"><code>TfmdDL.one_batch</code><a href=\"__main__.py#L106\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>TfmdDL.one_batch</code>()\n\nReturn one batch from `DataLoader`."
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "tfm = NegTfm()\ntdl = TfmdDL(start, after_batch=tfm, bs=4)",
      "execution_count": 20,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "b = tdl.one_batch()\ntest_eq(tensor([0,-1,-2,-3]), b)",
      "execution_count": 21,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(TfmdDL.decode)",
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"TfmdDL.decode\" class=\"doc_header\"><code>TfmdDL.decode</code><a href=\"__main__.py#L72\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>TfmdDL.decode</code>(**`b`**)\n\nDecode `b` using `tfms`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "test_eq(tdl.decode(b), tensor(0,1,2,3))",
      "execution_count": 23,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(TfmdDL.decode_batch)",
      "execution_count": 24,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"TfmdDL.decode_batch\" class=\"doc_header\"><code>TfmdDL.decode_batch</code><a href=\"__main__.py#L73\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>TfmdDL.decode_batch</code>(**`b`**, **`max_n`**=*`9`*, **`full`**=*`True`*)\n\nDecode `b` entirely"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "test_eq(tdl.decode_batch(b), [0,1,2,3])",
      "execution_count": 25,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(TfmdDL.show_batch)",
      "execution_count": 26,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"TfmdDL.show_batch\" class=\"doc_header\"><code>TfmdDL.show_batch</code><a href=\"__main__.py#L89\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>TfmdDL.show_batch</code>(**`b`**=*`None`*, **`max_n`**=*`9`*, **`ctxs`**=*`None`*, **`show`**=*`True`*, **`unique`**=*`False`*, **\\*\\*`kwargs`**)\n\nShow `b` (defaults to `one_batch`), a list of lists of pipeline outputs (i.e. output of a `DataLoader`)"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(TfmdDL.to)",
      "execution_count": 27,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"TfmdDL.to\" class=\"doc_header\"><code>TfmdDL.to</code><a href=\"__main__.py#L62\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>TfmdDL.to</code>(**`device`**)\n\nPut self and its transforms state on `device`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## DataLoaders -"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# export\n@docs\nclass DataLoaders(GetAttr):\n    \"Basic wrapper around several `DataLoader`s.\"\n    _default='train'\n    def __init__(self, *loaders, path='.', device=None):\n        self.loaders,self.path = list(loaders),Path(path)\n        if device is not None or hasattr(loaders[0],'to'): self.device = device\n\n    def __getitem__(self, i): return self.loaders[i]\n    def new_empty(self):\n        loaders = [dl.new(dl.dataset.new_empty()) for dl in self.loaders]\n        return type(self)(*loaders, path=self.path, device=self.device)\n\n    def _set(i, self, v): self.loaders[i] = v\n    train   ,valid    = add_props(lambda i,x: x[i], _set)\n    train_ds,valid_ds = add_props(lambda i,x: x[i].dataset)\n\n    @property\n    def device(self): return self._device\n\n    @device.setter\n    def device(self, d):\n        for dl in self.loaders: dl.to(d)\n        self._device = d\n\n    def to(self, device):\n        self.device = device\n        return self\n            \n    def _add_tfms(self, tfms, event, dl_idx):\n        \"Adds `tfms` to `event` on `dl`\"\n        if(isinstance(dl_idx,str)): dl_idx = 0 if(dl_idx=='train') else 1\n        dl_tfms = getattr(self[dl_idx], event)\n        apply(dl_tfms.add, tfms)\n        \n    def add_tfms(self,tfms,event,loaders=None):\n        \"Adds `tfms` to `events` on `loaders`\"\n        if(loaders is None): loaders=range(len(self.loaders))\n        if not is_listy(loaders): loaders = listify(loaders)\n        for loader in loaders:\n            self._add_tfms(tfms,event,loader)      \n\n    def cuda(self): return self.to(device=default_device())\n    def cpu(self):  return self.to(device=torch.device('cpu'))\n\n    @classmethod\n    def from_dsets(cls, *ds, path='.',  bs=64, device=None, dl_type=TfmdDL, **kwargs):\n        default = (True,) + (False,) * (len(ds)-1)\n        defaults = {'shuffle': default, 'drop_last': default}\n        for nm in _batch_tfms:\n            if nm in kwargs: kwargs[nm] = 
Pipeline(kwargs[nm])\n        kwargs = merge(defaults, {k: tuplify(v, match=ds) for k,v in kwargs.items()})\n        kwargs = [{k: v[i] for k,v in kwargs.items()} for i in range_of(ds)]\n        return cls(*[dl_type(d, bs=bs, **k) for d,k in zip(ds, kwargs)], path=path, device=device)\n\n    @classmethod\n    def from_dblock(cls, dblock, source, path='.',  bs=64, val_bs=None, shuffle=True, device=None, **kwargs):\n        return dblock.dataloaders(source, path=path, bs=bs, val_bs=val_bs, shuffle=shuffle, device=device, **kwargs)\n\n    _docs=dict(__getitem__=\"Retrieve `DataLoader` at `i` (`0` is training, `1` is validation)\",\n               train=\"Training `DataLoader`\",\n               valid=\"Validation `DataLoader`\",\n               train_ds=\"Training `Dataset`\",\n               valid_ds=\"Validation `Dataset`\",\n               to=\"Use `device`\",\n               add_tfms=\"Add `tfms` to `loaders` for `event`\",\n               cuda=\"Use the gpu if available\",\n               cpu=\"Use the cpu\",\n               new_empty=\"Create a new empty version of `self` with the same transforms\",\n               from_dblock=\"Create a `DataLoaders` from a given `dblock`\")",
      "execution_count": 28,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "dls = DataLoaders(tdl,tdl)\nx = dls.train.one_batch()\nx2 = first(tdl)\ntest_eq(x,x2)\nx2 = dls.one_batch()\ntest_eq(x,x2)",
      "execution_count": 29,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "dls[0].after_batch, dls[1].after_batch",
      "execution_count": 30,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 30,
          "data": {
            "text/plain": "(Pipeline: NegTfm, Pipeline: NegTfm)"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\n#test assignment works\ndls.train = dls.train.new(bs=4)",
      "execution_count": 31,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "Multiple transforms can by added to multiple dataloaders using `Dataloaders.add_tfms`. You can specify the dataloaders by list of names `dls.add_tfms(...,'valid',...)` or by index `dls.add_tfms(...,1,....)`, by default transforms are added to all dataloaders. `event` is a required argument and determined when the transform will be run, for more information on events please refer to `TfmdDL`. `tfms` is a list of `Transform`, and is a required argument. "
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class _TestTfm(Transform):\n    def encodes(self, o):  return torch.ones_like(o)\n    def decodes(self, o):  return o\ntdl1,tdl2 = TfmdDL(start, bs=4),TfmdDL(start, bs=4)\ndls2 = DataLoaders(tdl1,tdl2)\ndls2.add_tfms([_TestTfm()],'after_batch',['valid'])\ndls2.add_tfms([_TestTfm()],'after_batch',[1])\ndls2.train.after_batch,dls2.valid.after_batch,",
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 32,
          "data": {
            "text/plain": "(Pipeline: , Pipeline: _TestTfm -> _TestTfm)"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\ntest_eq(len(dls2.train.after_batch.fs),0)\ntest_eq(len(dls2.valid.after_batch.fs),2)\ntest_eq(next(iter(dls2.valid)),tensor([1,1,1,1]))",
      "execution_count": 33,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Methods"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(DataLoaders.__getitem__)",
      "execution_count": 34,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"DataLoaders.__getitem__\" class=\"doc_header\"><code>DataLoaders.__getitem__</code><a href=\"__main__.py#L10\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>DataLoaders.__getitem__</code>(**`i`**)\n\nRetrieve `DataLoader` at `i` (`0` is training, `1` is validation)"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "x2 = dls[0].one_batch()\ntest_eq(x,x2)",
      "execution_count": 35,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(DataLoaders.train, name=\"DataLoaders.train\")",
      "execution_count": 36,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"DataLoaders.train\" class=\"doc_header\"><code>DataLoaders.train</code><a href=\"__main__.py#L16\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\nTraining `DataLoader`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(DataLoaders.valid, name=\"DataLoaders.valid\")",
      "execution_count": 37,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"DataLoaders.valid\" class=\"doc_header\"><code>DataLoaders.valid</code><a href=\"__main__.py#L16\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\nValidation `DataLoader`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(DataLoaders.train_ds, name=\"DataLoaders.train_ds\")",
      "execution_count": 38,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"DataLoaders.train_ds\" class=\"doc_header\"><code>DataLoaders.train_ds</code><a href=\"__main__.py#L17\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\nTraining `Dataset`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(DataLoaders.valid_ds, name=\"DataLoaders.valid_ds\")",
      "execution_count": 39,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"DataLoaders.valid_ds\" class=\"doc_header\"><code>DataLoaders.valid_ds</code><a href=\"__main__.py#L17\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\nValidation `Dataset`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## TfmdLists -"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\nclass FilteredBase:\n    \"Base class for lists with subsets\"\n    _dl_type,_dbunch_type = TfmdDL,DataLoaders\n    def __init__(self, *args, dl_type=None, **kwargs):\n        if dl_type is not None: self._dl_type = dl_type\n        self.dataloaders = delegates(self._dl_type.__init__)(self.dataloaders)\n        super().__init__(*args, **kwargs)\n\n    @property\n    def n_subsets(self): return len(self.splits)\n    def _new(self, items, **kwargs): return super()._new(items, splits=self.splits, **kwargs)\n    def subset(self): raise NotImplemented\n\n    def dataloaders(self, bs=64, shuffle_train=None, shuffle=True, val_shuffle=False,n=None, path='.', dl_type=None, dl_kwargs=None,\n                    device=None,drop_last=None,val_bs=None, **kwargs):\n        if shuffle_train is not None: \n            shuffle=shuffle_train\n            warnings.warn('`shuffle_train` is deprecated. Use `shuffle` instead.',DeprecationWarning)\n        if device is None: device=default_device()\n        if dl_kwargs is None: dl_kwargs = [{}] * self.n_subsets\n        if dl_type is None: dl_type = self._dl_type\n        if drop_last is None: drop_last = shuffle\n        val_kwargs={k[4:]:v for k,v in kwargs.items() if k.startswith('val_')}\n        def_kwargs = {'bs':bs,'shuffle':shuffle,'drop_last':drop_last,'device':device}\n        dl = dl_type(self.subset(0), **merge(kwargs,def_kwargs, dl_kwargs[0]))\n        def_kwargs = {'bs':bs if val_bs is None else val_bs,'shuffle':val_shuffle,'drop_last':False}\n        dls = [dl] + [dl.new(self.subset(i), **merge(kwargs,def_kwargs,val_kwargs,dl_kwargs[i]))\n                      for i in range(1, self.n_subsets)]\n        return self._dbunch_type(*dls, path=path, device=device)    \n\nFilteredBase.train,FilteredBase.valid = add_props(lambda i,x: x.subset(i))",
      "execution_count": 40,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(FilteredBase().dataloaders)",
      "execution_count": 41,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"FilteredBase.dataloaders\" class=\"doc_header\"><code>FilteredBase.dataloaders</code><a href=\"__main__.py#L15\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>FilteredBase.dataloaders</code>(**`bs`**=*`64`*, **`shuffle_train`**=*`None`*, **`shuffle`**=*`True`*, **`val_shuffle`**=*`False`*, **`n`**=*`None`*, **`path`**=*`'.'`*, **`dl_type`**=*`None`*, **`dl_kwargs`**=*`None`*, **`device`**=*`None`*, **`drop_last`**=*`None`*, **`val_bs`**=*`None`*, **`num_workers`**=*`None`*, **`collate_fn`**=*`fa_collate`*, **`verbose`**=*`False`*, **`do_setup`**=*`True`*, **`indexed`**=*`None`*, **`batch_size`**:`Optional`\\[`int`\\]=*`1`*, **`sampler`**:`Optional`\\[`Sampler'>`\\[`int`\\]\\]=*`None`*, **`batch_sampler`**:`Optional`\\[`Sampler'>`\\[`Sequence`\\[`int`\\]\\]\\]=*`None`*, **`pin_memory`**:`bool`=*`False`*, **`timeout`**:`float`=*`0`*, **`worker_init_fn`**:`Optional`\\[`int`\\]=*`None`*, **`multiprocessing_context`**=*`None`*, **`generator`**=*`None`*, **`prefetch_factor`**:`int`=*`2`*, **`persistent_workers`**:`bool`=*`False`*)\n\n"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\nclass TfmdLists(FilteredBase, L, GetAttr):\n    \"A `Pipeline` of `tfms` applied to a collection of `items`\"\n    _default='tfms'\n    def __init__(self, items, tfms, use_list=None, do_setup=True, split_idx=None, train_setup=True,\n                 splits=None, types=None, verbose=False, dl_type=None):\n        super().__init__(items, use_list=use_list)\n        if dl_type is not None: self._dl_type = dl_type\n        self.splits = L([slice(None),[]] if splits is None else splits).map(mask2idxs)\n        if isinstance(tfms,TfmdLists): tfms = tfms.tfms\n        if isinstance(tfms,Pipeline): do_setup=False\n        self.tfms = Pipeline(tfms, split_idx=split_idx)\n        store_attr('types,split_idx')\n        if do_setup:\n            pv(f\"Setting up {self.tfms}\", verbose)\n            self.setup(train_setup=train_setup)\n\n    def _new(self, items, split_idx=None, **kwargs):\n        split_idx = ifnone(split_idx,self.split_idx)\n        return super()._new(items, tfms=self.tfms, do_setup=False, types=self.types, split_idx=split_idx, **kwargs)\n    def subset(self, i): return self._new(self._get(self.splits[i]), split_idx=i)\n    def _after_item(self, o): return self.tfms(o)\n    def __repr__(self): return f\"{self.__class__.__name__}: {self.items}\\ntfms - {self.tfms.fs}\"\n    def __iter__(self): return (self[i] for i in range(len(self)))\n    def show(self, o, **kwargs): return self.tfms.show(o, **kwargs)\n    def decode(self, o, **kwargs): return self.tfms.decode(o, **kwargs)\n    def __call__(self, o, **kwargs): return self.tfms.__call__(o, **kwargs)\n    def overlapping_splits(self): return L(Counter(self.splits.concat()).values()).filter(gt(1))\n    def new_empty(self): return self._new([])\n\n    def setup(self, train_setup=True):\n        self.tfms.setup(self, train_setup)\n        if len(self) != 0:\n            x = super().__getitem__(0) if self.splits is None else super().__getitem__(self.splits[0])[0]\n            self.types 
= []\n            for f in self.tfms.fs:\n                self.types.append(getattr(f, 'input_types', type(x)))\n                x = f(x)\n            self.types.append(type(x))\n        types = L(t if is_listy(t) else [t] for t in self.types).concat().unique()\n        self.pretty_types = '\\n'.join([f'  - {t}' for t in types])\n\n    def infer_idx(self, x):\n        # TODO: check if we really need this, or can simplify\n        idx = 0\n        for t in self.types:\n            if isinstance(x, t): break\n            idx += 1\n        types = L(t if is_listy(t) else [t] for t in self.types).concat().unique()\n        pretty_types = '\\n'.join([f'  - {t}' for t in types])\n        assert idx < len(self.types), f\"Expected an input of type in \\n{pretty_types}\\n but got {type(x)}\"\n        return idx\n\n    def infer(self, x):\n        return compose_tfms(x, tfms=self.tfms.fs[self.infer_idx(x):], split_idx=self.split_idx)\n\n    def __getitem__(self, idx):\n        res = super().__getitem__(idx)\n        if self._after_item is None: return res\n        return self._after_item(res) if is_indexer(idx) else res.map(self._after_item)",
      "execution_count": 42,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\nadd_docs(TfmdLists,\n         setup=\"Transform setup with self\",\n         decode=\"From `Pipeline`\",\n         show=\"From `Pipeline`\",\n         overlapping_splits=\"All splits that are in more than one split\",\n         subset=\"New `TfmdLists` with same tfms that only includes items in `i`th split\",\n         infer_idx=\"Finds the index where `self.tfms` can be applied to `x`, depending on the type of `x`\",\n         infer=\"Apply `self.tfms` to `x` starting at the right tfm depending on the type of `x`\",\n         new_empty=\"A new version of `self` but with no items\")",
      "execution_count": 43,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#exports\ndef decode_at(o, idx):\n    \"Decoded item at `idx`\"\n    return o.decode(o[idx])",
      "execution_count": 44,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#exports\ndef show_at(o, idx, **kwargs):\n    \"Show item at `idx`\"\n    return o.show(o[idx], **kwargs)",
      "execution_count": 45,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "A `TfmdLists` combines a collection of objects with a `Pipeline`. `tfms` can either be a `Pipeline` or a list of transforms, in which case it will be wrapped in a `Pipeline`. `use_list` is passed along to `L` together with `items`, and `split_idx` is passed to the `Pipeline`, which forwards it to each of its transforms. `do_setup` indicates whether the `Pipeline.setup` method should be called during initialization; setup is skipped when `tfms` is already a `Pipeline`."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class _IntFloatTfm(Transform):\n    def encodes(self, o):  return TitledInt(o)\n    def decodes(self, o):  return TitledFloat(o)\nint2f_tfm=_IntFloatTfm()\n\ndef _neg(o): return -o\nneg_tfm = Transform(_neg, _neg)",
      "execution_count": 46,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "items = L([1.,2.,3.]); tfms = [neg_tfm, int2f_tfm]\ntl = TfmdLists(items, tfms=tfms)\ntest_eq_type(tl[0], TitledInt(-1))\ntest_eq_type(tl[1], TitledInt(-2))\ntest_eq_type(tl.decode(tl[2]), TitledFloat(3.))\ntest_stdout(lambda: show_at(tl, 2), '-3')\ntest_eq(tl.types, [float, float, TitledInt])\ntl",
      "execution_count": 47,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 47,
          "data": {
            "text/plain": "TfmdLists: [1.0, 2.0, 3.0]\ntfms - [_neg:\nencodes: (object,object) -> _negdecodes: (object,object) -> _neg, _IntFloatTfm:\nencodes: (object,object) -> encodes\ndecodes: (object,object) -> decodes\n]"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# add splits to TfmdLists\nsplits = [[0,2],[1]]\ntl = TfmdLists(items, tfms=tfms, splits=splits)\ntest_eq(tl.n_subsets, 2)\ntest_eq(tl.train, tl.subset(0))\ntest_eq(tl.valid, tl.subset(1))\ntest_eq(tl.train.items, items[splits[0]])\ntest_eq(tl.valid.items, items[splits[1]])\ntest_eq(tl.train.tfms.split_idx, 0)\ntest_eq(tl.valid.tfms.split_idx, 1)\ntest_eq(tl.train.new_empty().split_idx, 0)\ntest_eq(tl.valid.new_empty().split_idx, 1)\ntest_eq_type(tl.splits, L(splits))\nassert not tl.overlapping_splits()",
      "execution_count": 48,
      "outputs": []
    },
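    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "When an index appears in more than one split, `overlapping_splits` is no longer empty. A minimal sketch, assuming the `items` and `tfms` defined above; `_tl_overlap` is a name introduced only for this illustration. Going by the implementation above, the method returns how many times each repeated index occurs rather than the index itself:"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# index 1 is listed in both splits, so it is counted twice\n_tl_overlap = TfmdLists(items, tfms=tfms, splits=[[0,1],[1,2]])\ntest_eq(_tl_overlap.overlapping_splits(), [2])",
      "execution_count": null,
      "outputs": []
    },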
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df = pd.DataFrame(dict(a=[1,2,3],b=[2,3,4]))\ntl = TfmdLists(df, lambda o: o.a+1, splits=[[0],[1,2]])\ntest_eq(tl[1,2], [3,4])\ntr = tl.subset(0)\ntest_eq(tr[:], [2])\nval = tl.subset(1)\ntest_eq(val[:], [3,4])",
      "execution_count": 49,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class _B(Transform):\n    def __init__(self): self.m = 0\n    def encodes(self, o): return o+self.m\n    def decodes(self, o): return o-self.m\n    def setups(self, items): \n        print(items)\n        self.m = tensor(items).float().mean().item()\n\n# test for setup, which updates `self.m`\ntl = TfmdLists(items, _B())\ntest_eq(tl.m, 2)",
      "execution_count": 50,
      "outputs": [
        {
          "output_type": "stream",
          "text": "TfmdLists: [1.0, 2.0, 3.0]\ntfms - []\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "Here's how we can use `TfmdLists.setup` to implement a simple category list, getting labels from a mock file list:"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class _Cat(Transform):\n    order = 1\n    def encodes(self, o):    return int(self.o2i[o])\n    def decodes(self, o):    return TitledStr(self.vocab[o])\n    def setups(self, items): self.vocab,self.o2i = uniqueify(L(items), sort=True, bidir=True)\ntcat = _Cat()\n\ndef _lbl(o): return TitledStr(o.split('_')[0])\n\n# Check that tfms are sorted by `order` & `_lbl` is called first\nfns = ['dog_0.jpg','cat_0.jpg','cat_2.jpg','cat_1.jpg','dog_1.jpg']\ntl = TfmdLists(fns, [tcat,_lbl])\nexp_voc = ['cat','dog']\ntest_eq(tcat.vocab, exp_voc)\ntest_eq(tl.tfms.vocab, exp_voc)\ntest_eq(tl.vocab, exp_voc)\ntest_eq(tl, (1,0,0,0,1))\ntest_eq([tl.decode(o) for o in tl], ('dog','cat','cat','cat','dog'))",
      "execution_count": 51,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#Check only the training set is taken into account for setup\ntl = TfmdLists(fns, [tcat,_lbl], splits=[[0,4], [1,2,3]])\ntest_eq(tcat.vocab, ['dog'])",
      "execution_count": 52,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "tfm = NegTfm(split_idx=1)\ntds = TfmdLists(start, A())\ntdl = TfmdDL(tds, after_batch=tfm, bs=4)\nx = tdl.one_batch()\ntest_eq(x, torch.arange(4))\ntds.split_idx = 1\nx = tdl.one_batch()\ntest_eq(x, -torch.arange(4))\ntds.split_idx = 0\nx = tdl.one_batch()\ntest_eq(x, torch.arange(4))",
      "execution_count": 53,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "tds = TfmdLists(start, A())\ntdl = TfmdDL(tds, after_batch=NegTfm(), bs=4)\ntest_eq(tdl.dataset[0], start[0])\ntest_eq(len(tdl), (len(tds)-1)//4+1)\ntest_eq(tdl.bs, 4)\ntest_stdout(tdl.show_batch, '0\\n1\\n2\\n3')",
      "execution_count": 54,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(TfmdLists.subset)",
      "execution_count": 55,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"TfmdLists.subset\" class=\"doc_header\"><code>TfmdLists.subset</code><a href=\"__main__.py#L21\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>TfmdLists.subset</code>(**`i`**)\n\nNew [`TfmdLists`](/data.core.html#TfmdLists) with same tfms that only includes items in `i`th split"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(TfmdLists.infer_idx)",
      "execution_count": 56,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"TfmdLists.infer_idx\" class=\"doc_header\"><code>TfmdLists.infer_idx</code><a href=\"__main__.py#L43\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>TfmdLists.infer_idx</code>(**`x`**)\n\nFinds the index where `self.tfms` can be applied to `x`, depending on the type of `x`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(TfmdLists.infer)",
      "execution_count": 57,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"TfmdLists.infer\" class=\"doc_header\"><code>TfmdLists.infer</code><a href=\"__main__.py#L54\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>TfmdLists.infer</code>(**`x`**)\n\nApply `self.tfms` to `x` starting at the right tfm depending on the type of `x`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "def mult(x): return x*2\nmult.order = 2\n\nfns = ['dog_0.jpg','cat_0.jpg','cat_2.jpg','cat_1.jpg','dog_1.jpg']\ntl = TfmdLists(fns, [_lbl,_Cat(),mult])\n\ntest_eq(tl.infer_idx('dog_45.jpg'), 0)\ntest_eq(tl.infer('dog_45.jpg'), 2)\n\ntest_eq(tl.infer_idx(4), 2)\ntest_eq(tl.infer(4), 8)\n\ntest_fail(lambda: tl.infer_idx(2.0))\ntest_fail(lambda: tl.infer(2.0))",
      "execution_count": 58,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\n#Test input_types works on a Transform\ncat = _Cat()\ncat.input_types = (str, float)\ntl = TfmdLists(fns, [_lbl,cat,mult])\ntest_eq(tl.infer_idx(2.0), 1)\n\n#Test type annotations work on a function\ndef mult(x:(int,float)): return x*2\nmult.order = 2\ntl = TfmdLists(fns, [_lbl,_Cat(),mult])\ntest_eq(tl.infer_idx(2.0), 2)",
      "execution_count": 59,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Datasets -"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\n@docs\n@delegates(TfmdLists)\nclass Datasets(FilteredBase):\n    \"A dataset that creates a tuple from each `tfms`, passed through `item_tfms`\"\n    def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):\n        super().__init__(dl_type=dl_type)\n        self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])\n        self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))\n\n    def __getitem__(self, it):\n        res = tuple([tl[it] for tl in self.tls])\n        return res if is_indexer(it) else list(zip(*res))\n\n    def __getattr__(self,k): return gather_attrs(self, k, 'tls')\n    def __dir__(self): return super().__dir__() + gather_attr_names(self, 'tls')\n    def __len__(self): return len(self.tls[0])\n    def __iter__(self): return (self[i] for i in range(len(self)))\n    def __repr__(self): return coll_repr(self)\n    def decode(self, o, full=True): return tuple(tl.decode(o_, full=full) for o_,tl in zip(o,tuplify(self.tls, match=o)))\n    def subset(self, i): return type(self)(tls=L(tl.subset(i) for tl in self.tls), n_inp=self.n_inp)\n    def _new(self, items, *args, **kwargs): return super()._new(items, tfms=self.tfms, do_setup=False, **kwargs)\n    def overlapping_splits(self): return self.tls[0].overlapping_splits()\n    def new_empty(self): return type(self)(tls=[tl.new_empty() for tl in self.tls], n_inp=self.n_inp)\n    @property\n    def splits(self): return self.tls[0].splits\n    @property\n    def split_idx(self): return self.tls[0].tfms.split_idx\n    @property\n    def items(self): return self.tls[0].items\n    @items.setter\n    def items(self, v):\n        for tl in self.tls: tl.items = v\n\n    def show(self, o, ctx=None, **kwargs):\n        for o_,tl in zip(o,self.tls): ctx = tl.show(o_, ctx=ctx, **kwargs)\n        return ctx\n\n    @contextmanager\n    def set_split_idx(self, i):\n        old_split_idx = self.split_idx\n        for tl in self.tls: 
tl.tfms.split_idx = i\n        try: yield self\n        finally:\n            for tl in self.tls: tl.tfms.split_idx = old_split_idx\n\n    _docs=dict(\n        decode=\"Compose `decode` of all `tuple_tfms` then all `tfms` on `i`\",\n        show=\"Show item `o` in `ctx`\",\n        dataloaders=\"Get a `DataLoaders`\",\n        overlapping_splits=\"All splits that are in more than one split\",\n        subset=\"New `Datasets` that only includes subset `i`\",\n        new_empty=\"Create a new empty version of the `self`, keeping only the transforms\",\n        set_split_idx=\"Contextmanager to use the same `Datasets` with another `split_idx`\"\n    )",
      "execution_count": 60,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "A `Datasets` creates a tuple from each of its `items` (typically input,target) by applying each list of `Transform` (or `Pipeline`) in `tfms` to it. Note that if `tfms` contains only one list of transforms, the items given by `Datasets` will be tuples of one element.\n\n`n_inp` is the number of elements in the tuples that should be considered part of the input. It defaults to 1 if `tfms` consists of one set of transforms, and to `len(tfms)-1` otherwise. In most cases, the tuples returned by `Datasets` have 2 elements (input,target), but there can be 3 (Siamese networks or tabular data), in which case we need to be able to determine where the inputs end and the targets begin."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "items = [1,2,3,4]\ndsets = Datasets(items, [[neg_tfm,int2f_tfm], [add(1)]])\nt = dsets[0]\ntest_eq(t, (-1,2))\ntest_eq(dsets[0,1,2], [(-1,2),(-2,3),(-3,4)])\ntest_eq(dsets.n_inp, 1)\ndsets.decode(t)",
      "execution_count": 61,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 61,
          "data": {
            "text/plain": "(1.0, 2)"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class Norm(Transform):\n    def encodes(self, o): return (o-self.m)/self.s\n    def decodes(self, o): return (o*self.s)+self.m\n    def setups(self, items):\n        its = tensor(items).float()\n        self.m,self.s = its.mean(),its.std()",
      "execution_count": 62,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "items = [1,2,3,4]\nnrm = Norm()\ndsets = Datasets(items, [[neg_tfm,int2f_tfm], [neg_tfm,nrm]])\n\nx,y = zip(*dsets)\ntest_close(tensor(y).mean(), 0)\ntest_close(tensor(y).std(), 1)\ntest_eq(x, (-1,-2,-3,-4,))\ntest_eq(nrm.m, -2.5)\ntest_stdout(lambda:show_at(dsets, 1), '-2')\n\ntest_eq(dsets.m, nrm.m)\ntest_eq(dsets.norm.m, nrm.m)\ntest_eq(dsets.train.norm.m, nrm.m)",
      "execution_count": 63,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\n#Check filtering is properly applied\nclass B(Transform):\n    def encodes(self, x)->None:  return int(x+1)\n    def decodes(self, x):        return TitledInt(x-1)\nadd1 = B(split_idx=1)\n\ndsets = Datasets(items, [neg_tfm, [neg_tfm,int2f_tfm,add1]], splits=[[3],[0,1,2]])\ntest_eq(dsets[1], [-2,-2])\ntest_eq(dsets.valid[1], [-2,-1])\ntest_eq(dsets.valid[[1,1]], [[-2,-1], [-2,-1]])\ntest_eq(dsets.train[0], [-4,-4])",
      "execution_count": 64,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "test_fns = ['dog_0.jpg','cat_0.jpg','cat_2.jpg','cat_1.jpg','kid_1.jpg']\ntcat = _Cat()\ndsets = Datasets(test_fns, [[tcat,_lbl]], splits=[[0,1,2], [3,4]])\ntest_eq(tcat.vocab, ['cat','dog'])\ntest_eq(dsets.train, [(1,),(0,),(0,)])\ntest_eq(dsets.valid[0], (0,))\ntest_stdout(lambda: show_at(dsets.train, 0), \"dog\")",
      "execution_count": 65,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "inp = [0,1,2,3,4]\ndsets = Datasets(inp, tfms=[None])\n\ntest_eq(*dsets[2], 2)          # Retrieve one item (subset 0 is the default)\ntest_eq(dsets[1,2], [(1,),(2,)])    # Retrieve two items by index\nmask = [True,False,False,True,False]\ntest_eq(dsets[mask], [(0,),(3,)])   # Retrieve two items by mask",
      "execution_count": 66,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "inp = pd.DataFrame(dict(a=[5,1,2,3,4]))\ndsets = Datasets(inp, tfms=attrgetter('a')).subset(0)\ntest_eq(*dsets[2], 2)          # Retrieve one item (subset 0 is the default)\ntest_eq(dsets[1,2], [(1,),(2,)])    # Retrieve two items by index\nmask = [True,False,False,True,False]\ntest_eq(dsets[mask], [(5,),(3,)])   # Retrieve two items by mask",
      "execution_count": 67,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#test n_inp\ninp = [0,1,2,3,4]\ndsets = Datasets(inp, tfms=[None])\ntest_eq(dsets.n_inp, 1)\ndsets = Datasets(inp, tfms=[[None],[None],[None]])\ntest_eq(dsets.n_inp, 2)\ndsets = Datasets(inp, tfms=[[None],[None],[None]], n_inp=1)\ntest_eq(dsets.n_inp, 1)",
      "execution_count": 68,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# splits can be indices\ndsets = Datasets(range(5), tfms=[None], splits=[tensor([0,2]), [1,3,4]])\n\ntest_eq(dsets.subset(0), [(0,),(2,)])\ntest_eq(dsets.train, [(0,),(2,)])       # Subset 0 is aliased to `train`\ntest_eq(dsets.subset(1), [(1,),(3,),(4,)])\ntest_eq(dsets.valid, [(1,),(3,),(4,)])     # Subset 1 is aliased to `valid`\ntest_eq(*dsets.valid[2], 4)\n#assert '[(1,),(3,),(4,)]' in str(dsets) and '[(0,),(2,)]' in str(dsets)\ndsets",
      "execution_count": 69,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 69,
          "data": {
            "text/plain": "(#5) [(0,),(1,),(2,),(3,),(4,)]"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# splits can be boolean masks (they don't have to cover all items, but must be disjoint)\nsplits = [[False,True,True,False,True], [True,False,False,False,False]]\ndsets = Datasets(range(5), tfms=[None], splits=splits)\n\ntest_eq(dsets.train, [(1,),(2,),(4,)])\ntest_eq(dsets.valid, [(0,)])",
      "execution_count": 70,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# apply transforms to all items\ntfm = [[lambda x: x*2,lambda x: x+1]]\nsplits = [[1,2],[0,3,4]]\ndsets = Datasets(range(5), tfm, splits=splits)\ntest_eq(dsets.train,[(3,),(5,)])\ntest_eq(dsets.valid,[(1,),(7,),(9,)])\ntest_eq(dsets.train[False,True], [(5,)])",
      "execution_count": 71,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# only transform subset 1\nclass _Tfm(Transform):\n    split_idx=1\n    def encodes(self, x): return x*2\n    def decodes(self, x): return TitledStr(x//2)",
      "execution_count": 72,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "dsets = Datasets(range(5), [_Tfm()], splits=[[1,2],[0,3,4]])\ntest_eq(dsets.train,[(1,),(2,)])\ntest_eq(dsets.valid,[(0,),(6,),(8,)])\ntest_eq(dsets.train[False,True], [(2,)])\ndsets",
      "execution_count": 73,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 73,
          "data": {
            "text/plain": "(#5) [(0,),(1,),(2,),(3,),(4,)]"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#A context manager to change the split_idx and apply the validation transform on the training set\nds = dsets.train\nwith ds.set_split_idx(1):\n    test_eq(ds,[(2,),(4,)])\ntest_eq(dsets.train,[(1,),(2,)])",
      "execution_count": 74,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\n#Test Datasets pickles\ndsrc1 = pickle.loads(pickle.dumps(dsets))\ntest_eq(dsets.train, dsrc1.train)\ntest_eq(dsets.valid, dsrc1.valid)",
      "execution_count": 75,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "dsets = Datasets(range(5), [_Tfm(),noop], splits=[[1,2],[0,3,4]])\ntest_eq(dsets.train,[(1,1),(2,2)])\ntest_eq(dsets.valid,[(0,0),(6,3),(8,4)])",
      "execution_count": 76,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "start = torch.arange(0,50)\ntds = Datasets(start, [A()])\ntdl = TfmdDL(tds, after_item=NegTfm(), bs=4)\nb = tdl.one_batch()\ntest_eq(tdl.decode_batch(b), ((0,),(1,),(2,),(3,)))\ntest_stdout(tdl.show_batch, \"0\\n1\\n2\\n3\")",
      "execution_count": 77,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# only transform subset 1\nclass _Tfm(Transform):\n    split_idx=1\n    def encodes(self, x): return x*2\n\ndsets = Datasets(range(8), [None], splits=[[1,2,5,7],[0,3,4,6]])",
      "execution_count": 78,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# only transform subset 1\nclass _Tfm(Transform):\n    split_idx=1\n    def encodes(self, x): return x*2\n\ndsets = Datasets(range(8), [None], splits=[[1,2,5,7],[0,3,4,6]])\ndls = dsets.dataloaders(bs=4, after_batch=_Tfm(), shuffle=False, device=torch.device('cpu'))\ntest_eq(dls.train, [(tensor([1,2,5, 7]),)])\ntest_eq(dls.valid, [(tensor([0,6,8,12]),)])\ntest_eq(dls.n_inp, 1)",
      "execution_count": 79,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Methods"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "items = [1,2,3,4]\ndsets = Datasets(items, [[neg_tfm,int2f_tfm]])",
      "execution_count": 80,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide_input\n_dsrc = Datasets([1,2])\nshow_doc(_dsrc.dataloaders, name=\"Datasets.dataloaders\")",
      "execution_count": 81,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"Datasets.dataloaders\" class=\"doc_header\"><code>Datasets.dataloaders</code><a href=\"__main__.py#L15\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>Datasets.dataloaders</code>(**`bs`**=*`64`*, **`shuffle_train`**=*`None`*, **`shuffle`**=*`True`*, **`val_shuffle`**=*`False`*, **`n`**=*`None`*, **`path`**=*`'.'`*, **`dl_type`**=*`None`*, **`dl_kwargs`**=*`None`*, **`device`**=*`None`*, **`drop_last`**=*`None`*, **`val_bs`**=*`None`*, **`num_workers`**=*`None`*, **`collate_fn`**=*`fa_collate`*, **`verbose`**=*`False`*, **`do_setup`**=*`True`*, **`indexed`**=*`None`*, **`batch_size`**:`Optional`\\[`int`\\]=*`1`*, **`sampler`**:`Optional`\\[`Sampler'>`\\[`int`\\]\\]=*`None`*, **`batch_sampler`**:`Optional`\\[`Sampler'>`\\[`Sequence`\\[`int`\\]\\]\\]=*`None`*, **`pin_memory`**:`bool`=*`False`*, **`timeout`**:`float`=*`0`*, **`worker_init_fn`**:`Optional`\\[`int`\\]=*`None`*, **`multiprocessing_context`**=*`None`*, **`generator`**=*`None`*, **`prefetch_factor`**:`int`=*`2`*, **`persistent_workers`**:`bool`=*`False`*)\n\nGet a [`DataLoaders`](/data.core.html#DataLoaders)"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "Used to create dataloaders. Prefix an argument with `val_` (as in `val_shuffle` or `val_bs`) to override its value for the validation set. `dl_kwargs` gives finer, per-dataloader control if you need to work with more than one dataloader."
    },
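    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "A minimal sketch of the `val_` convention, assuming the behaviour described above. `_dsets_bs` and `_dls_bs` are names introduced only for this illustration, and the example relies on the `val_bs` argument from the signature shown above to give the validation `DataLoader` its own batch size:"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "_dsets_bs = Datasets(range(8), [None], splits=[[1,2,5,7],[0,3,4,6]])\n# `bs` applies to the training set, `val_bs` overrides it for the validation set\n_dls_bs = _dsets_bs.dataloaders(bs=2, val_bs=4, shuffle=False, device=torch.device('cpu'))\ntest_eq(_dls_bs.train.bs, 2)\ntest_eq(_dls_bs.valid.bs, 4)",
      "execution_count": null,
      "outputs": []
    },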
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(Datasets.decode)",
      "execution_count": 82,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"Datasets.decode\" class=\"doc_header\"><code>Datasets.decode</code><a href=\"__main__.py#L20\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>Datasets.decode</code>(**`o`**, **`full`**=*`True`*)\n\nCompose `decode` of all `tuple_tfms` then all `tfms` on `i`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "test_eq(*dsets[0], -1)\ntest_eq(*dsets.decode((-1,)), 1)",
      "execution_count": 83,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(Datasets.show)",
      "execution_count": 84,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"Datasets.show\" class=\"doc_header\"><code>Datasets.show</code><a href=\"__main__.py#L35\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>Datasets.show</code>(**`o`**, **`ctx`**=*`None`*, **\\*\\*`kwargs`**)\n\nShow item `o` in `ctx`"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "test_stdout(lambda:dsets.show(dsets[1]), '-2')",
      "execution_count": 85,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "show_doc(Datasets.new_empty)",
      "execution_count": 86,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "<IPython.core.display.Markdown object>",
            "text/markdown": "<h4 id=\"Datasets.new_empty\" class=\"doc_header\"><code>Datasets.new_empty</code><a href=\"__main__.py#L24\" class=\"source_link\" style=\"float:right\">[source]</a></h4>\n\n> <code>Datasets.new_empty</code>()\n\nCreate a new empty version of the `self`, keeping only the transforms"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "items = [1,2,3,4]\nnrm = Norm()\ndsets = Datasets(items, [[neg_tfm,int2f_tfm], [neg_tfm]])\nempty = dsets.new_empty()\ntest_eq(empty.items, [])",
      "execution_count": 87,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\n#test it works for dataframes too\ndf = pd.DataFrame({'a':[1,2,3,4,5], 'b':[6,7,8,9,10]})\ndsets = Datasets(df, [[attrgetter('a')], [attrgetter('b')]])\nempty = dsets.new_empty()",
      "execution_count": 88,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Add test set for inference"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class _Tfm(Transform):\n    split_idx=1\n    def encodes(self, x): return x*2",
      "execution_count": 89,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# only transform subset 0\nclass _Tfm1(Transform):\n    split_idx=0\n    def encodes(self, x): return x*3\n\ndsets = Datasets(range(8), [[_Tfm(),_Tfm1()]], splits=[[1,2,5,7],[0,3,4,6]])\ntest_eq(dsets.train, [(3,),(6,),(15,),(21,)])\ntest_eq(dsets.valid, [(0,),(6,),(8,),(12,)])",
      "execution_count": 90,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\ndef test_set(dsets, test_items, rm_tfms=None, with_labels=False):\n    \"Create a test set from `test_items` using validation transforms of `dsets`\"\n    if isinstance(dsets, (Datasets, TfmdDataset)):\n        tls = dsets.tls if with_labels else dsets.tls[:dsets.n_inp]\n        test_tls = [tl._new(test_items, split_idx=1) for tl in tls]\n        if rm_tfms is None: rm_tfms = [tl.infer_idx(get_first(test_items)) for tl in test_tls]\n        else:               rm_tfms = tuplify(rm_tfms, match=test_tls)\n        for i,j in enumerate(rm_tfms): test_tls[i].tfms.fs = test_tls[i].tfms.fs[j:]\n        return Datasets(tls=test_tls)\n    elif isinstance(dsets, TfmdLists):\n        test_tl = dsets._new(test_items, split_idx=1)\n        if rm_tfms is None: rm_tfms = dsets.infer_idx(get_first(test_items))\n        test_tl.tfms.fs = test_tl.tfms.fs[rm_tfms:]\n        return test_tl\n    else: raise Exception(f\"This method requires using the fastai library to assemble your data. Expected a `Datasets` or a `TfmdLists` but got {dsets.__class__.__name__}\")",
      "execution_count": 91,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "class _Tfm1(Transform):\n    split_idx=0\n    def encodes(self, x): return x*3\n\ndsets = Datasets(range(8), [[_Tfm(),_Tfm1()]], splits=[[1,2,5,7],[0,3,4,6]])\ntest_eq(dsets.train, [(3,),(6,),(15,),(21,)])\ntest_eq(dsets.valid, [(0,),(6,),(8,),(12,)])\n\n#Transforms of the validation set are applied\ntst = test_set(dsets, [1,2,3])\ntest_eq(tst, [(2,),(4,),(6,)])",
      "execution_count": 92,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\n#Test with different types\ntfm = _Tfm1()\ntfm.split_idx,tfm.order = None,2\ndsets = Datasets(['dog', 'cat', 'cat', 'dog'], [[_Cat(),tfm]])\n\n#With strings\ntest_eq(test_set(dsets, ['dog', 'cat', 'cat']), [(3,), (0,), (0,)])\n#With ints\ntest_eq(test_set(dsets, [1,2]), [(3,), (6,)])",
      "execution_count": 93,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\n#Test with various input lengths\ndsets = Datasets(range(8), [[_Tfm(),_Tfm1()],[_Tfm(),_Tfm1()],[_Tfm(),_Tfm1()]], splits=[[1,2,5,7],[0,3,4,6]])\ntst = test_set(dsets, [1,2,3])\ntest_eq(tst, [(2,2),(4,4),(6,6)])\n\ndsets = Datasets(range(8), [[_Tfm(),_Tfm1()],[_Tfm(),_Tfm1()],[_Tfm(),_Tfm1()]], splits=[[1,2,5,7],[0,3,4,6]], n_inp=1)\ntst = test_set(dsets, [1,2,3])\ntest_eq(tst, [(2,),(4,),(6,)])",
      "execution_count": 94,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\n#Test with rm_tfms\ndsets = Datasets(range(8), [[_Tfm(),_Tfm()]], splits=[[1,2,5,7],[0,3,4,6]])\ntst = test_set(dsets, [1,2,3])\ntest_eq(tst, [(4,),(8,),(12,)])\n\ndsets = Datasets(range(8), [[_Tfm(),_Tfm()]], splits=[[1,2,5,7],[0,3,4,6]])\ntst = test_set(dsets, [1,2,3], rm_tfms=1)\ntest_eq(tst, [(2,),(4,),(6,)])\n\ndsets = Datasets(range(8), [[_Tfm(),_Tfm()], [_Tfm(),_Tfm()]], splits=[[1,2,5,7],[0,3,4,6]], n_inp=2)\ntst = test_set(dsets, [1,2,3], rm_tfms=(1,0))\ntest_eq(tst, [(2,4),(4,8),(6,12)])",
      "execution_count": 95,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#export\n@patch\n@delegates(TfmdDL.__init__)\ndef test_dl(self:DataLoaders, test_items, rm_type_tfms=None, with_labels=False, **kwargs):\n    \"Create a test dataloader from `test_items` using validation transforms of `dls`\"\n    test_ds = test_set(self.valid_ds, test_items, rm_tfms=rm_type_tfms, with_labels=with_labels\n                      ) if isinstance(self.valid_ds, (Datasets, TfmdLists, TfmdDataset)) else test_items\n    return self.valid.new(test_ds, **kwargs)",
      "execution_count": 96,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Export -"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#hide\nfrom nbdev.export import notebook2script\nnotebook2script()",
      "execution_count": 97,
      "outputs": [
        {
          "output_type": "stream",
          "text": "Converted 00_torch_core.ipynb.\nConverted 01_layers.ipynb.\nConverted 01a_losses.ipynb.\nConverted 02_data.load.ipynb.\nConverted 02b_data.pytorch_load.ipynb.\nConverted 02c_data.pytorch.ipynb.\nConverted 03_data.core.ipynb.\nConverted 04_data.external.ipynb.\nConverted 05_data.transforms.ipynb.\nConverted 06_data.block.ipynb.\nConverted 07_vision.core.ipynb.\nConverted 08_vision.data.ipynb.\nConverted 09_vision.augment.ipynb.\nConverted 09b_vision.utils.ipynb.\nConverted 09c_vision.widgets.ipynb.\nConverted 10_tutorial.pets.ipynb.\nConverted 10b_tutorial.albumentations.ipynb.\nConverted 11_vision.models.xresnet.ipynb.\nConverted 12_optimizer.ipynb.\nConverted 13_callback.core.ipynb.\nConverted 13a_learner.ipynb.\nConverted 13b_metrics.ipynb.\nConverted 14_callback.schedule.ipynb.\nConverted 14a_callback.data.ipynb.\nConverted 15_callback.hook.ipynb.\nConverted 15a_vision.models.unet.ipynb.\nConverted 16_callback.progress.ipynb.\nConverted 17_callback.tracker.ipynb.\nConverted 18_callback.fp16.ipynb.\nConverted 18a_callback.training.ipynb.\nConverted 18b_callback.preds.ipynb.\nConverted 19_callback.mixup.ipynb.\nConverted 20_interpret.ipynb.\nConverted 20a_distributed.ipynb.\nConverted 21_vision.learner.ipynb.\nConverted 22_tutorial.imagenette.ipynb.\nConverted 23_tutorial.vision.ipynb.\nConverted 24_tutorial.image_sequence.ipynb.\nConverted 24_tutorial.siamese.ipynb.\nConverted 24_vision.gan.ipynb.\nConverted 30_text.core.ipynb.\nConverted 31_text.data.ipynb.\nConverted 32_text.models.awdlstm.ipynb.\nConverted 33_text.models.core.ipynb.\nConverted 34_callback.rnn.ipynb.\nConverted 35_tutorial.wikitext.ipynb.\nConverted 36_text.models.qrnn.ipynb.\nConverted 37_text.learner.ipynb.\nConverted 38_tutorial.text.ipynb.\nConverted 39_tutorial.transformers.ipynb.\nConverted 40_tabular.core.ipynb.\nConverted 41_tabular.data.ipynb.\nConverted 42_tabular.model.ipynb.\nConverted 43_tabular.learner.ipynb.\nConverted 44_tutorial.tabular.ipynb.\nConverted 
45_collab.ipynb.\nConverted 46_tutorial.collab.ipynb.\nConverted 50_tutorial.datablock.ipynb.\nConverted 60_medical.imaging.ipynb.\nConverted 61_tutorial.medical_imaging.ipynb.\nConverted 65_medical.text.ipynb.\nConverted 70_callback.wandb.ipynb.\nConverted 71_callback.tensorboard.ipynb.\nConverted 72_callback.neptune.ipynb.\nConverted 73_callback.captum.ipynb.\nConverted 74_callback.azureml.ipynb.\nConverted 97_test_utils.ipynb.\nConverted 99_pytorch_doc.ipynb.\nConverted dev-setup.ipynb.\nConverted index.ipynb.\nConverted quick_start.ipynb.\nConverted tutorial.ipynb.\n",
          "name": "stdout"
        }
      ]
    }
  ],
  "metadata": {
    "jupytext": {
      "split_at_heading": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3",
      "language": "python"
    },
    "language_info": {
      "name": "python",
      "version": "3.7.10",
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "nbconvert_exporter": "python",
      "file_extension": ".py"
    },
    "gist": {
      "id": "",
      "data": {
        "description": "git_repos/fastai/nbs/03_data.core.ipynb",
        "public": true
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
 }