OVERVIEW: MLIR modular optimizer driver
Available Dialects: acc, affine, amdgpu, amx, arith, arm_neon, arm_sve, async, bufferization, builtin, cf, complex, dlti, emitc, func, gpu, linalg, llvm, math, memref, ml_program, nvgpu, nvvm, omp, pdl, pdl_interp, quant, rocdl, scf, shape, sparse_tensor, spirv, tensor, tm_tensor, torch, torch_c, tosa, transform, vector, x86vector
USAGE: torch-mlir-opt [options] <input file>
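For reference, a minimal invocation looks like the following. This is a hypothetical sketch: input.mlir and output.mlir are placeholder file names, and any registered pass flag from the listing below could be substituted for the one shown.

  # Run a single conversion pass over an input file and write the result.
  torch-mlir-opt --convert-torch-to-linalg input.mlir -o output.mlir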
OPTIONS:
Color Options:
--color - Use colors in output (default=autodetect)
General options:
--allow-unregistered-dialect - Allow operation with no registered dialects
--disable-i2p-p2i-opt - Disables inttoptr/ptrtoint roundtrip optimization
--dot-cfg-mssa=<file name for generated dot file> - file name for generated dot file
--emit-bytecode - Emit bytecode when generating output
--generate-merged-base-profiles - When generating nested context-sensitive profiles, always generate extra base profile for function with all its context profiles merged into it.
--mlir-debug-counter=<string> - Comma separated list of debug counter skip and count arguments
--mlir-disable-threading - Disable multi-threading within MLIR, overrides any further call to MLIRContext::enableMultiThreading()
--mlir-elide-elementsattrs-if-larger=<uint> - Elide ElementsAttrs with "..." that have more elements than the given upper limit
--mlir-pass-pipeline-crash-reproducer=<string> - Generate a .mlir reproducer file at the given output path if the pass manager crashes or fails
--mlir-pass-pipeline-local-reproducer - When generating a crash reproducer, attempt to generate a reproducer with the smallest pipeline.
--mlir-pass-statistics - Display the statistics of each pass
--mlir-pass-statistics-display=<value> - Display method for pass statistics
=list - display the results in a merged list sorted by pass name
=pipeline - display the results with a nested pipeline view
--mlir-pretty-debuginfo - Print pretty debug info in MLIR output
--mlir-print-debug-counter - Print out debug counter information after all counters have been accumulated
--mlir-print-debuginfo - Print debug info in MLIR output
--mlir-print-elementsattrs-with-hex-if-larger=<long> - Print DenseElementsAttrs with a hex string that have more elements than the given upper limit (use -1 to disable)
--mlir-print-ir-after=<pass-arg> - Print IR after specified passes
--mlir-print-ir-after-all - Print IR after each pass
--mlir-print-ir-after-change - When printing the IR after a pass, only print if the IR changed
--mlir-print-ir-after-failure - When printing the IR after a pass, only print if the pass failed
--mlir-print-ir-before=<pass-arg> - Print IR before specified passes
--mlir-print-ir-before-all - Print IR before each pass
--mlir-print-ir-module-scope - When printing IR for print-ir-[before|after]{-all} always print the top-level operation
--mlir-print-local-scope - Print with local scope and inline information (eliding aliases for attributes, types, and locations)
--mlir-print-op-on-diagnostic - When a diagnostic is emitted on an operation, also print the operation as an attached note
--mlir-print-stacktrace-on-diagnostic - When a diagnostic is emitted, also print the stack trace as an attached note
--mlir-print-value-users - Print users of operation results and block arguments as a comment
--mlir-timing - Display execution times
--mlir-timing-display=<value> - Display method for timing data
=list - display the results in a list sorted by total time
=tree - display the results with a nested tree view
--no-implicit-module - Disable implicit addition of a top-level module op during parsing
-o <filename> - Output filename
--opaque-pointers - Use opaque pointers
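The IR-printing and timing options above are useful when debugging a pass. A hedged sketch of combining them (input.mlir is a placeholder name):

  # Dump the IR after every pass and report where the time goes.
  torch-mlir-opt --canonicalize --cse \
    --mlir-print-ir-after-all \
    --mlir-timing --mlir-timing-display=tree \
    input.mlir -o /dev/null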
Compiler passes to run
--pass-pipeline - A textual description of a pass pipeline to run
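As a sketch, a textual pipeline can be passed directly via this option. Note that newer MLIR releases require the pipeline to be anchored on the top-level op (e.g. builtin.module), while older ones accept a bare comma-separated pass list, so the exact form depends on the torch-mlir build in use (input.mlir is a placeholder):

  # Run canonicalization followed by CSE as one explicit pipeline.
  torch-mlir-opt --pass-pipeline='builtin.module(canonicalize,cse)' input.mlir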
Passes:
--affine-data-copy-generate - Generate explicit copying for affine memory operations
--fast-mem-capacity=<ulong> - Set fast memory space capacity in KiB (default: unlimited)
--fast-mem-space=<uint> - Fast memory space identifier for copy generation (default: 1)
--generate-dma - Generate DMA instead of point-wise copy
--min-dma-transfer=<int> - Minimum DMA transfer size supported by the target in bytes
--skip-non-unit-stride-loops - Testing purposes: avoid non-unit stride loop choice depths for copy placement
--slow-mem-space=<uint> - Slow memory space identifier for copy generation (default: 0)
--tag-mem-space=<uint> - Tag memory space identifier for copy generation (default: 0)
--affine-expand-index-ops - Lower affine operations operating on indices into more fundamental operations
--affine-loop-coalescing - Coalesce nested loops with independent bounds into a single loop
--affine-loop-fusion - Fuse affine loop nests
--fusion-compute-tolerance=<number> - Fractional increase in additional computation tolerated while fusing
--fusion-fast-mem-space=<uint> - Faster memory space number to promote fusion buffers to
--fusion-local-buf-threshold=<ulong> - Threshold size (KiB) for promoting local buffers to fast memory space
--fusion-maximal - Enables maximal loop fusion
--mode=<value> - fusion mode to attempt
=greedy - Perform greedy (both producer-consumer and sibling) fusion
=producer - Perform only producer-consumer fusion
=sibling - Perform only sibling fusion
--affine-loop-invariant-code-motion - Hoist loop invariant instructions outside of affine loops
--affine-loop-normalize - Apply normalization transformations to affine loop-like ops
--affine-loop-tile - Tile affine loop nests
--cache-size=<ulong> - Set size of cache to tile for in KiB (default: 512)
--separate - Separate full and partial tiles (default: false)
--tile-size=<uint> - Use this tile size for all loops
--tile-sizes=<uint> - List of tile sizes for each perfect nest (overridden by -tile-size)
--affine-loop-unroll - Unroll affine loops
--cleanup-unroll - Fully unroll the cleanup loop when possible.
--unroll-factor=<uint> - Use this unroll factor for all loops being unrolled
--unroll-full - Fully unroll loops
--unroll-full-threshold=<uint> - Unroll all loops with trip count less than or equal to this
--unroll-num-reps=<uint> - Unroll innermost loops repeatedly this many times
--unroll-up-to-factor - Allow unrolling up to the factor specified
--affine-loop-unroll-jam - Unroll and jam affine loops
--unroll-jam-factor=<uint> - Use this unroll jam factor for all loops (default 4)
--affine-parallelize - Convert affine.for ops into 1-D affine.parallel
--max-nested=<uint> - Maximum number of nested parallel loops to produce. Defaults to unlimited (UINT_MAX).
--parallel-reductions - Whether to parallelize reduction loops. Defaults to false.
--affine-pipeline-data-transfer - Pipeline non-blocking data transfers between explicitly managed levels of the memory hierarchy
--affine-scalrep - Replace affine memref accesses by scalars by forwarding stores to loads and eliminating redundant loads
--affine-simplify-structures - Simplify affine expressions in maps/sets and normalize memrefs
--affine-super-vectorize - Vectorize to a target independent n-D vector abstraction
--test-fastest-varying=<long> - Specify a 1-D, 2-D or 3-D pattern of fastest varying memory dimensions to match. See defaultPatterns in Vectorize.cpp for a description and examples. This is used for testing purposes
--vectorize-reductions - Vectorize known reductions expressed via iter_args. Switched off by default.
--virtual-vector-size=<long> - Specify an n-D virtual vector size for vectorization
--arith-bufferize - Bufferize Arith dialect ops.
--alignment=<uint> - Create global memrefs with a specified alignment
--arith-emulate-wide-int - Emulate 2*N-bit integer operations using N-bit operations
--widest-int-supported=<uint> - Widest integer type supported by the target
--arith-expand - Legalize Arith ops to be convertible to LLVM.
--arith-unsigned-when-equivalent - Replace signed ops with unsigned ones where they are proven equivalent
--arm-neon-2d-to-intr - Convert Arm NEON structured ops to intrinsics
--async-parallel-for - Convert scf.parallel operations to multiple async compute ops executed concurrently for non-overlapping iteration ranges
--async-dispatch - Dispatch async compute tasks using recursive work splitting. If `false` async compute tasks will be launched using a simple for loop in the caller thread.
--min-task-size=<int> - The minimum task size for sharding parallel operation.
--num-workers=<int> - The number of available workers to execute async operations. If `-1` the value will be retrieved from the runtime.
--async-runtime-policy-based-ref-counting - Policy based reference counting for Async runtime operations
--async-runtime-ref-counting - Automatic reference counting for Async runtime operations
--async-runtime-ref-counting-opt - Optimize automatic reference counting operations for the Async runtime by removing redundant operations
--async-to-async-runtime - Lower high level async operations (e.g. async.execute) to the explicit async.runtime and async.coro operations
--eliminate-blocking-await-ops - Rewrite functions with blocking async.runtime.await as coroutines with async.runtime.await_and_resume.
--buffer-deallocation - Adds all required dealloc operations for all allocations in the input program
--buffer-hoisting - Optimizes placement of allocation operations by moving them into common dominators and out of nested regions
--buffer-loop-hoisting - Optimizes placement of allocation operations by moving them out of loop nests
--buffer-results-to-out-params - Converts memref-typed function results to out-params
--bufferization-bufferize - Bufferize the `bufferization` dialect
--canonicalize - Canonicalize operations
--disable-patterns=<string> - Labels of patterns that should be filtered out during application
--enable-patterns=<string> - Labels of patterns that should be used during application, all other patterns are filtered out
--max-iterations=<long> - Seed the worklist in general top-down order
--region-simplify - Seed the worklist in general top-down order
--top-down - Seed the worklist in general top-down order
--control-flow-sink - Sink operations into conditional blocks
--convert-affine-for-to-gpu - Convert top-level AffineFor Ops to GPU kernels
--gpu-block-dims=<uint> - Number of GPU block dimensions for mapping
--gpu-thread-dims=<uint> - Number of GPU thread dimensions for mapping
--convert-amdgpu-to-rocdl - Convert AMDGPU dialect to ROCDL dialect
--chipset=<string> - Chipset that these operations will run on
--convert-arith-to-llvm - Convert Arith dialect to LLVM dialect
--index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
--convert-arith-to-spirv - Convert Arith dialect to SPIR-V dialect
--emulate-non-32-bit-scalar-types - Emulate non-32-bit scalar types with 32-bit ones if missing native support
--enable-fast-math - Enable fast math mode (assuming no NaN and infinity for floating point values) when performing conversion
--convert-async-to-llvm - Convert the operations from the async dialect into the LLVM dialect
--convert-bufferization-to-memref - Convert operations from the Bufferization dialect to the MemRef dialect
--convert-cf-to-llvm - Convert ControlFlow operations to the LLVM dialect
--index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
--convert-cf-to-spirv - Convert ControlFlow dialect to SPIR-V dialect
--emulate-non-32-bit-scalar-types - Emulate non-32-bit scalar types with 32-bit ones if missing native support
--convert-complex-to-libm - Convert Complex dialect to libm calls
--convert-complex-to-llvm - Convert Complex dialect to LLVM dialect
--convert-complex-to-standard - Convert Complex dialect to standard dialect
--convert-elementwise-to-linalg - Convert ElementwiseMappable ops to linalg
--convert-func-to-llvm - Convert from the Func dialect to the LLVM dialect
--data-layout=<string> - String description (LLVM format) of the data layout that is expected on the produced module
--index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
--use-bare-ptr-memref-call-conv - Replace FuncOp's MemRef arguments with bare pointers to the MemRef element types
--convert-func-to-spirv - Convert Func dialect to SPIR-V dialect
--emulate-non-32-bit-scalar-types - Emulate non-32-bit scalar types with 32-bit ones if missing native support
--convert-gpu-launch-to-vulkan-launch - Convert gpu.launch_func to vulkanLaunch external call
--convert-gpu-to-nvvm - Generate NVVM operations for gpu operations
--index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
--convert-gpu-to-rocdl - Generate ROCDL operations for gpu operations
--chipset=<string> - Chipset that these operations will run on
--index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
--runtime=<value> - Runtime code will be run on (default is Unknown, can also use HIP or OpenCL)
=unknown - Unknown (default)
=HIP - HIP
=OpenCL - OpenCL
--use-bare-ptr-memref-call-conv - Replace memref arguments in GPU functions with bare pointers. All memrefs must have static shape
--convert-gpu-to-spirv - Convert GPU dialect to SPIR-V dialect
--convert-linalg-to-affine-loops - Lower the operations from the linalg dialect into affine loops
--convert-linalg-to-llvm - Convert the operations from the linalg dialect into the LLVM dialect
--convert-linalg-to-loops - Lower the operations from the linalg dialect into loops
--convert-linalg-to-parallel-loops - Lower the operations from the linalg dialect into parallel loops
--convert-linalg-to-spirv - Convert Linalg dialect to SPIR-V dialect
--convert-linalg-to-std - Convert the operations from the linalg dialect into the Standard dialect
--convert-math-to-funcs - Convert Math operations to calls of outlined implementations.
--convert-math-to-libm - Convert Math dialect to libm calls
--convert-math-to-llvm - Convert Math dialect to LLVM dialect
--convert-math-to-spirv - Convert Math dialect to SPIR-V dialect
--convert-memref-to-llvm - Convert operations from the MemRef dialect to the LLVM dialect
--index-bitwidth=<uint> - Bitwidth of the index type, 0 to use size of machine word
--use-aligned-alloc - Use aligned_alloc in place of malloc for heap allocations
--use-generic-functions - Use generic allocation and deallocation functions instead of the classic 'malloc', 'aligned_alloc' and 'free' functions
--convert-memref-to-spirv - Convert MemRef dialect to SPIR-V dialect
--bool-num-bits=<int> - The number of bits to store a boolean value
--convert-nvgpu-to-nvvm - Convert NVGPU dialect to NVVM dialect
--convert-openacc-to-llvm - Convert the OpenACC ops to LLVM dialect
--convert-openacc-to-scf - Convert the OpenACC ops to OpenACC with SCF dialect
--convert-openmp-to-llvm - Convert the OpenMP ops to OpenMP ops with LLVM dialect
--convert-parallel-loops-to-gpu - Convert mapped scf.parallel ops to gpu launch operations
--convert-pdl-to-pdl-interp - Convert PDL ops to PDL interpreter ops
--convert-scf-to-cf - Convert SCF dialect to ControlFlow dialect, replacing structured control flow with a CFG
--convert-scf-to-openmp - Convert SCF parallel loop to OpenMP parallel + workshare constructs.
--convert-scf-to-spirv - Convert SCF dialect to SPIR-V dialect.
--convert-shape-constraints - Convert shape constraint operations to the standard dialect
--convert-shape-to-std - Convert operations from the shape dialect into the standard dialect
--convert-spirv-to-llvm - Convert SPIR-V dialect to LLVM dialect
--convert-tensor-to-linalg - Convert some Tensor dialect ops to Linalg dialect
--convert-tensor-to-spirv - Convert Tensor dialect to SPIR-V dialect
--emulate-non-32-bit-scalar-types - Emulate non-32-bit scalar types with 32-bit ones if missing native support
--convert-torch-to-arith - Convert recognized Torch ops to Std ops
--convert-torch-to-linalg - Convert recognized Torch ops to Linalg ops
--convert-torch-to-mhlo - Convert Torch ops to MHLO ops
--enable-i32-index - Enable truncating index from i64 to i32 (unsafely)
--enable-static-shape - Enable static shape conversion
--convert-torch-to-scf - Convert recognized Torch ops to SCF ops
--convert-torch-to-tmtensor - Convert recognized Torch ops to TMTensor/Linalg ops
--convert-torch-to-tosa - Convert Torch ops to TOSA ops
--convert-vector-to-gpu - Lower the operations from the vector dialect into the GPU dialect
--use-nvgpu - convert to NvGPU ops instead of GPU dialect ops
--convert-vector-to-llvm - Lower the operations from the vector dialect into the LLVM dialect
--enable-amx - Enables the use of AMX dialect while lowering the vector dialect.
--enable-arm-neon - Enables the use of ArmNeon dialect while lowering the vector dialect.
--enable-arm-sve - Enables the use of ArmSVE dialect while lowering the vector dialect.
--enable-x86vector - Enables the use of X86Vector dialect while lowering the vector dialect.
--force-32bit-vector-indices - Allows compiler to assume vector indices fit in 32-bit if that yields faster code
--reassociate-fp-reductions - Allows llvm to reassociate floating-point reductions for speed
--convert-vector-to-scf - Lower the operations from the vector dialect into the SCF dialect
--full-unroll - Perform full unrolling when converting vector transfers to SCF
--lower-permutation-maps - Replace permutation maps with vector transposes/broadcasts before lowering transfer ops
--lower-tensors - Lower transfer ops that operate on tensors
--target-rank=<uint> - Target vector rank to which transfer ops should be lowered
--convert-vector-to-spirv - Convert Vector dialect to SPIR-V dialect
--cse - Eliminate common sub-expressions
--decorate-spirv-composite-type-layout - Decorate SPIR-V composite type with layout info
--drop-equivalent-buffer-results - Remove MemRef return values that are equivalent to a bbArg
--eliminate-alloc-tensors - Try to eliminate all alloc_tensor ops.
--empty-tensor-to-alloc-tensor - Replace all empty ops by alloc_tensor ops.
--finalizing-bufferize - Finalize a partial bufferization
--fold-memref-alias-ops - Fold memref alias ops into consumer load/store ops
--func-bufferize - Bufferize func/call/return ops
--gpu-async-region - Make GPU ops async
--gpu-kernel-outlining - Outline gpu.launch bodies to kernel functions
--data-layout-str=<string> - String containing the data layout specification to be attached to the GPU kernel module
--gpu-launch-sink-index-computations - Sink index computations into gpu.launch body
--gpu-map-parallel-loops - Greedily maps loops to GPU hardware dimensions.
--gpu-to-llvm - Convert GPU dialect to LLVM dialect with GPU runtime calls
--gpu-binary-annotation=<string> - Annotation attribute string for GPU binary
--use-bare-pointers-for-kernels - Use bare pointers to pass memref arguments to kernels. The kernel must use the same setting for this option.
--hlo-legalize-to-linalg - Legalize from HLO dialect to Linalg dialect.
--inline - Inline function calls
--default-pipeline=<string> - The default optimizer pipeline used for callables
--max-iterations=<uint> - Maximum number of iterations when inlining within an SCC
--op-pipelines=<pass-manager> - Callable operation specific optimizer pipelines (in the form of `dialect.op(pipeline)`)
--launch-func-to-vulkan - Convert vulkanLaunch external call to Vulkan runtime external calls
--linalg-bufferize - Bufferize the linalg dialect
--linalg-detensorize - Detensorize linalg ops
--aggressive-mode - Detensorize all ops that qualify for detensoring along with branch operands and basic-block arguments.
--linalg-fold-unit-extent-dims - Remove unit-extent dimension in Linalg ops on tensors
--fold-one-trip-loops-only - Only folds the one-trip loops from Linalg ops on tensors (for testing purposes only)
--linalg-fuse-elementwise-ops - Fuse elementwise operations on tensors
--linalg-generalize-named-ops - Convert named ops into generic ops
--linalg-inline-scalar-operands - Inline scalar operands into linalg generic ops
--linalg-named-op-conversion - Convert from one named linalg op to another.
--llvm-legalize-for-export - Legalize LLVM dialect to be convertible to LLVM IR
--llvm-optimize-for-nvvm-target - Optimize NVVM IR
--llvm-request-c-wrappers - Request C wrapper emission for all functions
--loop-invariant-code-motion - Hoist loop invariant instructions outside of the loop
--lower-affine - Lower Affine operations to a combination of Standard and SCF operations
--lower-host-to-llvm - Lowers the host module code and `gpu.launch_func` to LLVM
--map-memref-spirv-storage-class - Map numeric MemRef memory spaces to SPIR-V storage classes
--client-api=<string> - The client API to use for populating mappings
--memref-emulate-wide-int - Emulate 2*N-bit integer operations using N-bit operations
--widest-int-supported=<uint> - Widest integer type supported by the target
--memref-expand - Legalize memref operations to be convertible to LLVM.
--normalize-memrefs - Normalize memrefs
--nvgpu-optimize-shared-memory - Optimizes accesses to shared memory memrefs in order to reduce bank conflicts.
--one-shot-bufferize - One-Shot Bufferize
--allow-return-allocs - Allows returning/yielding new allocations from a block.
--allow-unknown-ops - Allows unknown (not bufferizable) ops in the input IR.
--analysis-fuzzer-seed=<uint> - Test only: Analyze ops in random order with a given seed (fuzzer)
--analysis-heuristic=<string> - Heuristic that controls the IR traversal during analysis
--bufferize-function-boundaries - Bufferize function boundaries (experimental).
--copy-before-write - Skip the analysis. Make a buffer copy on every write.
--create-deallocs - Specify if buffers should be deallocated. For compatibility with core bufferization passes.
--dialect-filter=<string> - Restrict bufferization to ops from these dialects.
--function-boundary-type-conversion=<string> - Controls layout maps when bufferizing function signatures.
--must-infer-memory-space - The memory space of memref types must always be inferred. If unset, a default memory space of 0 is used.
--print-conflicts - Test only: Annotate IR with RaW conflicts. Requires test-analysis-only.
--test-analysis-only - Test only: Only run inplaceability analysis and annotate IR
--unknown-type-conversion=<string> - Controls layout maps for non-inferrable memref types.
--outline-shape-computation - Use shape.func to preserve shape computation
--print-op-stats - Print statistics of operations
--json - print the stats as JSON
--promote-buffers-to-stack - Promotes heap-based allocations to automatically managed stack-based allocations
--max-alloc-size-in-bytes=<uint> - Maximal size in bytes to promote allocations to stack.
--max-rank-of-allocated-memref=<uint> - Maximal memref rank to promote dynamic buffers.
--reconcile-unrealized-casts - Simplify and eliminate unrealized conversion casts
--refback-expand-ops-for-llvm - Expand ops into more primitive ops before LLVM lowering.
--refback-generalize-tensor-pad - Convert tensor.pad to linalg ops
--refback-insert-rng-globals - Insert global variables and sequence to get the next global seed for RNG ops
--refback-munge-calling-conventions - Munge calling conventions for calling via ExecutionEngine
--refback-munge-memref-copy - Munge memref.copy to linalg.copy
--remove-shape-constraints - Replace all cstr_ ops with a true witness
--resolve-ranked-shaped-type-result-dims - Resolve memref.dim of result values of ranked shape type
--resolve-shaped-type-result-dims - Resolve memref.dim of result values
--sccp - Sparse Conditional Constant Propagation
--scf-bufferize - Bufferize the scf dialect.
--scf-for-loop-canonicalization - Canonicalize operations within scf.for loop bodies
--scf-for-loop-peeling - Peel `for` loops at their upper bounds.
--skip-partial - Do not peel loops inside of the last, partial iteration of another already peeled loop.
--scf-for-loop-range-folding - Fold add/mul ops into loop range
--scf-for-loop-specialization - Specialize `for` loops for vectorization
--scf-for-to-while - Convert SCF for loops to SCF while loops
--scf-parallel-loop-collapsing - Collapse parallel loops to use fewer induction variables
--collapsed-indices-0=<uint> - Which loop indices to combine into the position 0 loop index
--collapsed-indices-1=<uint> - Which loop indices to combine into the position 1 loop index
--collapsed-indices-2=<uint> - Which loop indices to combine into the position 2 loop index
--scf-parallel-loop-fusion - Fuse adjacent parallel loops
--scf-parallel-loop-specialization - Specialize parallel loops for vectorization
--scf-parallel-loop-tiling - Tile parallel loops
--no-min-max-bounds - Perform tiling with fixed upper bound with inbound check inside the internal loops
--parallel-loop-tile-sizes=<long> - Factors to tile parallel loops by
--shape-bufferize - Bufferize the shape dialect.
--shape-to-shape-lowering - Legalize Shape dialect to be convertible to Arith
--simplify-extract-strided-metadata - Simplify extract_strided_metadata ops
--snapshot-op-locations - Generate new locations from the current IR
--filename=<string> - The filename to print the generated IR
--tag=<string> - A tag to use when fusing the new locations with the original. If unset, the locations are replaced.
--sparse-buffer-rewrite - Rewrite sparse primitives on buffers to actual code
--sparse-tensor-codegen - Convert sparse tensors and primitives to actual code
--sparse-tensor-conversion - Convert sparse tensors and primitives to library calls
--s2s-strategy=<int> - Set the strategy for sparse-to-sparse conversion
--sparse-tensor-rewrite - Applies sparse tensor rewriting rules prior to sparsification
--enable-runtime-library - Enable runtime library for manipulating sparse tensors
--sparsification - Automatically generate sparse tensor code from sparse tensor types
--enable-runtime-library - Enable runtime library for manipulating sparse tensors
--enable-simd-index32 - Enable i32 indexing into vectors (for efficiency)
--enable-vla-vectorization - Enable vector length agnostic vectorization
--parallelization-strategy=<value> - Set the parallelization strategy
=none - Turn off sparse parallelization.
=dense-outer-loop - Enable dense outer loop sparse parallelization.
=any-storage-outer-loop - Enable sparse parallelization regardless of storage for the outer loop.
=dense-any-loop - Enable dense parallelization for any loop.
=any-storage-any-loop - Enable sparse parallelization for any storage and loop.
--vectorization-strategy=<value> - Set the vectorization strategy
=none - Turn off sparse vectorization.
=dense-inner-loop - Enable vectorization for dense inner loops.
=any-storage-inner-loop - Enable sparse vectorization for inner loops with any storage.
--vl=<int> - Set the vector length
--spirv-canonicalize-gl - Run canonicalization involving GLSL ops
--spirv-lower-abi-attrs - Decorate SPIR-V composite type with layout info
--spirv-rewrite-inserts - Rewrite sequential chains of spirv.CompositeInsert operations into spirv.CompositeConstruct operations
--spirv-unify-aliased-resource - Unify access of multiple aliased resources into access of one single resource
--spirv-update-vce - Deduce and attach minimal (version, capabilities, extensions) requirements to spirv.module ops
--strip-debuginfo - Strip debug info from all operations
--symbol-dce - Eliminate dead symbols
--symbol-privatize - Mark symbols private
--exclude=<string> - Comma separated list of symbols that should not be marked private
--symbolic-shape-optimization - Analyzes shapes and performs shape-related optimizations
--tensor-bufferize - Bufferize the `tensor` dialect
--tensor-copy-insertion - Make all tensor IR inplaceable by inserting copies
--allow-return-allocs - Allows returning/yielding new allocations from a block.
--bufferize-function-boundaries - Bufferize function boundaries (experimental).
--create-deallocs - Specify if new allocations should be deallocated.
--must-infer-memory-space - The memory space of memref types must always be inferred. If unset, a default memory space of 0 is used.
--tm-tensor-bufferize - Bufferize the TMTensor dialect
--tm-tensor-to-loops - Convert TMTensor ops to loops and Linalg ops.
--topological-sort - Sort regions without SSA dominance in topological order
--torch-adjust-calling-conventions - Adjust the calling conventions of functions
--torch-decompose-complex-ops - Decompose complicated torch operations
--legal-ops=<string> - List of operation names that should be considered legal
--torch-drop-shape-calculations - Drop reified shape calculations.
--torch-erase-module-initializer - Erase the `torch.global_slot.module_initializer` op.
--torch-finalizing-backend-type-conversion - Finalizes a partial conversion to builtin tensors
--torch-func-backend-type-conversion - Convert functions to operate on builtin tensors
--torch-globalize-object-graph - Converts TorchScript object graphs to a globalized form
--torch-inline-global-slots - Inlines torch.global_slot ops.
--torch-lower-to-backend-contract - Perform simplifications until the backend contract is satisfied.
--backend-legal-ops=<string> - List of ops to be considered legal for the backend.
--decompose - Decompose ops.
--max-iterations=<int> - Maximum number of invocations of the simplification pipeline.
--torch-maximize-value-semantics - Use value-semantic tensors where possible.
--torch-prepare-for-globalize-object-graph - Lowering in preparation for globalizing
--torch-reduce-op-variants - Reduces variants of ops to a smaller set of ops.
--torch-refine-public-return - Refine public return
--torch-refine-types - Refine types
--torch-reify-shape-calculations - Decompose complicated torch operations
--torch-simplify-shape-calculations - Simplify reified shape calculations.
--torch-verify-backend-contract - Check that program satisfies backend contract.
--torch-verify-linalg-on-tensors-backend-contract - Verifies conformity to the linalg-on-tensors backend contract
--torch-verify-mhlo-backend-contract - Verifies conformity to the mhlo backend contract
--torch-verify-tosa-backend-contract - Verifies conformity to the TOSA backend contract
--tosa-infer-shapes - Propagate shapes across TOSA operations
--tosa-layerwise-constant-fold - Fold layerwise operations on constant tensors
--tosa-make-broadcastable - TOSA rank Reshape to enable Broadcasting
--tosa-optional-decompositions - Applies optional decompositions of TOSA operations
--tosa-to-arith - Lower TOSA to the Arith dialect
--include-apply-rescale - Whether to include the lowering for tosa.apply_rescale to arith
--use-32-bit - Whether to prioritize lowering to 32-bit operations
--tosa-to-linalg - Lower TOSA to LinAlg on tensors
--tosa-to-linalg-named - Lower TOSA to LinAlg named operations
--tosa-to-scf - Lower TOSA to the SCF dialect
--tosa-to-tensor - Lower TOSA to the Tensor dialect
--transform-dialect-check-uses - Warn about potential use-after-free in the transform dialect
--vector-bufferize - Bufferize Vector dialect ops
--view-op-graph - Print Graphviz visualization of an operation
--max-label-len=<uint> - Limit attribute/type length to number of chars
--print-attrs - Print attributes of operations
--print-control-flow-edges - Print control flow edges
--print-data-flow-edges - Print data flow edges
--print-result-types - Print result types of operations
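Individual pass flags from the list above can also be chained on the command line; they are applied in the order given. A hedged sketch (forward.mlir is a placeholder file holding Torch-dialect IR):

  # Lower recognized Torch ops to Linalg, then clean up the result.
  torch-mlir-opt --convert-torch-to-linalg --canonicalize --cse forward.mlir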
Pass Pipelines:
--sparse-compiler - The standard pipeline for taking sparsity-agnostic IR using the sparse-tensor type, and lowering it to LLVM IR with concrete representations and algorithms for sparse tensors.
--enable-amx - Enables the use of AMX dialect while lowering the vector dialect.
--enable-arm-neon - Enables the use of ArmNeon dialect while lowering the vector dialect.
--enable-arm-sve - Enables the use of ArmSVE dialect while lowering the vector dialect.
--enable-index-optimizations - Allows compiler to assume indices fit in 32-bit if that yields faster code
--enable-runtime-library - Enable runtime library for manipulating sparse tensors
--enable-simd-index32 - Enable i32 indexing into vectors (for efficiency)
--enable-vla-vectorization - Enable vector length agnostic vectorization
--enable-x86vector - Enables the use of X86Vector dialect while lowering the vector dialect.
--parallelization-strategy=<value> - Set the parallelization strategy
=none - Turn off sparse parallelization.
=dense-outer-loop - Enable dense outer loop sparse parallelization.
=any-storage-outer-loop - Enable sparse parallelization regardless of storage for the outer loop.
=dense-any-loop - Enable dense parallelization for any loop.
=any-storage-any-loop - Enable sparse parallelization for any storage and loop.
--reassociate-fp-reductions - Allows llvm to reassociate floating-point reductions for speed
--s2s-strategy=<int> - Set the strategy for sparse-to-sparse conversion
--test-bufferization-analysis-only - Run only the inplaceability analysis
--vectorization-strategy=<value> - Set the vectorization strategy
=none - Turn off sparse vectorization.
=dense-inner-loop - Enable vectorization for dense inner loops.
=any-storage-inner-loop - Enable sparse vectorization for inner loops with any storage.
--vl=<int> - Set the vector length
--torch-backend-to-linalg-on-tensors-backend-pipeline - Pipeline lowering torch backend contract to linalg-on-tensors backend contract.
--torch-backend-to-mhlo-backend-pipeline - Pipeline lowering torch backend contract to MHLO backend contract.
--enable-i32-index - Enable truncating index from i64 to i32 (unsafely)
--enable-static-shape - Enable static shape conversion.
--torch-backend-to-tosa-backend-pipeline - Pipeline lowering torch backend contract to TOSA backend contract.
--torch-function-to-torch-backend-pipeline - Pipeline lowering a Torch function to Torch backend form.
--backend-legal-ops=<string> - List of ops to be considered legal for the backend.
--decompose-complex-ops - Decompose complex operations.
--max-iterations=<int> - Maximum number of invocations of the simplification pipeline.
--torch-shape-refinement-pipeline - Pipeline refining shapes of tensors.
--torch-simplification-pipeline - Pipeline simplifying computations in the program.
--backend-legal-ops=<string> - List of ops to be considered legal for the backend.
--decompose-complex-ops - Decompose complex operations.
--max-iterations=<int> - Maximum number of invocations of the simplification pipeline.
--torchscript-module-to-torch-backend-pipeline - Pipeline lowering TorchScript object graph IR to Torch backend form.
--backend-legal-ops=<string> - List of ops to be considered legal for the backend.
--decompose-complex-ops - Decompose complex operations.
--max-iterations=<int> - Maximum number of invocations of the simplification pipeline.
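Registered pass pipelines such as those above are exposed as ordinary flags, so an entire backend lowering can be requested in one step. A sketch under that assumption (module.mlir is a placeholder file already in the torch backend contract form):

  # Lower torch backend-contract IR to the linalg-on-tensors backend contract.
  torch-mlir-opt --torch-backend-to-linalg-on-tensors-backend-pipeline module.mlir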
--show-dialects - Print the list of registered dialects
--split-input-file - Split the input file into pieces and process each chunk independently
--verify-diagnostics - Check that emitted diagnostics match expected-* lines on the corresponding line
--verify-each - Run the verifier after each transformation pass
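The --split-input-file and --verify-diagnostics options are what the lit test suites use: chunks separated by a `// -----` line are processed independently, and emitted diagnostics are checked against expected-error/expected-note annotations. A short sketch (test.mlir is a placeholder test file):

  # Typical lit-style invocation for diagnostic tests.
  torch-mlir-opt --split-input-file --verify-diagnostics test.mlir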
Generic Options:
--help - Display available options (--help-hidden for more)
--help-list - Display list of available options (--help-list-hidden for more)
--version - Display the version of this program