

@verdimrc
Last active November 1, 2024 14:06
scratch-mermaid-trt-debug.md

Mermaid Diagram for TRT Debugging

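This Mermaid diagram captures my personal understanding of the TensorRT (TRT) accuracy-debugging workflow described in Polygraphy's `debug_accuracy.md` how-to guide. Most nodes link to the corresponding section of that guide.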

Commit: 40e8b7d.

```mermaid
flowchart TD
    is_real_input[Real input fixes accuracy issue?]
    click is_real_input "https://github.com/NVIDIA/TensorRT/blob/main/tools/Polygraphy/how-to/debug_accuracy.md#does-real-input-data-make-a-difference"

    nab([Not a bug;
        model sensitive to input])

    is_intermittent[Intermittent?]
    click is_intermittent "https://github.com/NVIDIA/TensorRT/blob/main/tools/Polygraphy/how-to/debug_accuracy.md#intermittent-or-not"

    is_real_input-->|yes|nab
    is_real_input-->|no|is_intermittent

    is_failing_tactic["`debug build
        finds failing tactic?`"]
    click is_failing_tactic "https://github.com/NVIDIA/TensorRT/blob/main/tools/Polygraphy/how-to/debug_accuracy.md#debugging-intermittent-accuracy-issues"

    is_intermittent-->|yes|is_failing_tactic
    is_intermittent-->|no|is_layerwise

    min_failing_case[Minimal failing
        case found]
    click min_failing_case "https://github.com/NVIDIA/TensorRT/blob/main/tools/Polygraphy/how-to/debug_accuracy.md#you-have-a-minimal-failing-case"

    is_failing_tactic-->|yes|min_failing_case
    is_failing_tactic-->|no|is_layerwise

    is_layerwise[Layer-wise outputs
        reveal source
        of inaccuracy?]
    click is_layerwise "https://github.com/NVIDIA/TensorRT/blob/main/tools/Polygraphy/how-to/debug_accuracy.md#is-layerwise-an-option"

    is_layerwise-->|yes|Extract
    is_layerwise-->|no|Reduce

    Subgraph@{ shape: doc }
    Extract-->Subgraph-->does_subgraph_repro
    click Extract "https://github.com/NVIDIA/TensorRT/blob/main/tools/Polygraphy/how-to/debug_accuracy.md#extracting-a-failing-subgraph"

    does_subgraph_repro[Error reproduced?]
    does_subgraph_repro-->|yes|min_failing_case
    does_subgraph_repro-->|no|Reduce

    Reduced@{ shape: doc, label: "Reduced model"}
    does_reduced_model_repro[Failure reproduced?]
    Reduce-->Reduced-->does_reduced_model_repro
    click Reduce "https://github.com/NVIDIA/TensorRT/blob/main/tools/Polygraphy/how-to/debug_accuracy.md#reducing-a-failing-onnx-model"

    does_reduced_model_repro-->|yes|min_failing_case
    does_reduced_model_repro-->|no|verify_reduce_options

    verify_reduce_options[Ensure --check
        command is correct]
    click verify_reduce_options "https://github.com/NVIDIA/TensorRT/blob/main/tools/Polygraphy/how-to/debug_accuracy.md#double-check-your-reduce-options"

    Retry@{ shape: diamond, label: "Retry?" }
    verify_reduce_options-->Retry

    Retry-->|yes|Reduce
    Retry-->|Give up|report_bug

    min_failing_case-->troubleshoot_subgraph
    min_failing_case-->is_dev
    troubleshoot_subgraph([Troubleshoot subgraph])

    is_dev[TRT developer?]

    deep_dive([Dive into the code])
    report_bug([Report your bug])

    is_dev-->|yes|deep_dive
    is_dev-->|no|report_bug
```
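To make the branching of the flowchart explicit, here is a minimal Python sketch that encodes it as lookup tables and walks it given yes/no answers. The node ids are copied from the Mermaid source; the `walk` function, the `QUESTIONS`/`STEPS`/`TERMINALS` tables and the `max_steps` guard are hypothetical helpers for illustration only, not part of Polygraphy or TensorRT.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding of the flowchart; node ids match the Mermaid source.
# This is NOT part of Polygraphy or TensorRT.

# Decision nodes: node -> (target if "yes", target if "no").
# For Retry, "no" stands for the "Give up" edge.
QUESTIONS: Dict[str, Tuple[str, str]] = {
    "is_real_input": ("nab", "is_intermittent"),
    "is_intermittent": ("is_failing_tactic", "is_layerwise"),
    "is_failing_tactic": ("min_failing_case", "is_layerwise"),
    "is_layerwise": ("Extract", "Reduce"),
    "does_subgraph_repro": ("min_failing_case", "Reduce"),
    "does_reduced_model_repro": ("min_failing_case", "verify_reduce_options"),
    "Retry": ("Reduce", "report_bug"),
    "is_dev": ("deep_dive", "report_bug"),
}

# Unconditional edges: node -> successors, in diagram order.
# min_failing_case has two outgoing edges, so both successors are visited.
STEPS: Dict[str, List[str]] = {
    "Extract": ["Subgraph"],
    "Subgraph": ["does_subgraph_repro"],
    "Reduce": ["Reduced"],
    "Reduced": ["does_reduced_model_repro"],
    "verify_reduce_options": ["Retry"],
    "min_failing_case": ["troubleshoot_subgraph", "is_dev"],
}

# End states (the stadium-shaped nodes in the diagram).
TERMINALS = {"nab", "troubleshoot_subgraph", "deep_dive", "report_bug"}


def walk(answer: Callable[[str], bool], start: str = "is_real_input",
         max_steps: int = 100) -> List[str]:
    """Follow the flowchart depth-first and return every node visited, in order.

    `answer(node)` is called once per visit to a decision node and must return
    True for "yes". `max_steps` guards against answering "yes" to Retry forever.
    """
    visited: List[str] = []
    stack: List[str] = [start]
    while stack:
        if len(visited) >= max_steps:
            raise RuntimeError("step limit reached; is Retry always answered 'yes'?")
        node = stack.pop()
        visited.append(node)
        if node in QUESTIONS:
            yes_target, no_target = QUESTIONS[node]
            stack.append(yes_target if answer(node) else no_target)
        elif node in STEPS:
            # Push in reverse so the first listed successor is visited first.
            stack.extend(reversed(STEPS[node]))
        elif node not in TERMINALS:
            raise KeyError(f"unknown node: {node}")
    return visited


if __name__ == "__main__":
    # Example: intermittent failure, `debug build` finds the failing tactic,
    # and the person debugging is not a TRT developer.
    answers = {
        "is_real_input": False,
        "is_intermittent": True,
        "is_failing_tactic": True,
        "is_dev": False,
    }
    print(" -> ".join(walk(answers.__getitem__)))
```

The example prints `is_real_input -> is_intermittent -> is_failing_tactic -> min_failing_case -> troubleshoot_subgraph -> is_dev -> report_bug`. Because `min_failing_case` fans out to both `troubleshoot_subgraph` and `is_dev`, mirroring its two outgoing edges in the diagram, a single walk can end at more than one terminal.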