If nothing is specified, all argument combinations should be considered.
- copy_ no_sparse && no_quantize && self!=source && not_copy_transpose
- gather
- gather(out=)
- scatter_(Tensor)
- scatter(Tensor)
- scatter_(value)
- scatter(value)
- index_put_ all calls
- index
- index(out=)
- where
- logical_not_ CPU + GPU
- logical_not CPU + GPU
- logical_not(out=) CPU + GPU
- backward of unfold with step >= size
- backward of unfold with step < size
- sum / sum(out=)
- prod / prod(out=)
- mean / mean(out=)
- norm / norm(out=), except the no-dim and sparse variants
- all / all(out=)
- any / any(out=)
- min_values
- max_values
- argmax / argmax(out=)
- argmin / argmin(out=)
- var / var(out=)
- std / std(out=)
- var_mean / var_mean(out=)
- std_mean / std_mean(out=)
- cumsum normal + dimname
- cumsum(out=) normal + dimname
- cumprod normal + dimname
- cumprod(out=) normal + dimname
- scatter_add_(Tensor)
- scatter_add(Tensor)
- _min
- _min(out=)
- min
- min(out=)
- _max
- _max(out=)
- max
- max(out=)
- index_select
- index_select(out=)
- masked_fill_(Tensor)
- masked_fill_(value)
- masked_select / masked_select(out=) byte mask
- masked_select / masked_select(out=) boolean mask
- _fused_dropout_backward (backward of _fused_dropout used for dropout when train + cuda + 0<p<1 + numel!=0 + inplace=false)
- int_repr
- Created as output of quantized ops like qadd / qmul / qconv
- quant/dequant in most quantization ops
- More quant for "fake_*" functions
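
For reference, a minimal sketch of the suffix notation used in this list: a trailing `_` is in-place, `(out=)` writes into a preallocated tensor, and `(Tensor)` vs `(value)` distinguishes the src overloads (shapes here are arbitrary):

```python
import torch

t = torch.zeros(3, 4)
idx = torch.tensor([[0, 1, 2, 0]])
src = torch.ones(1, 4)

t.scatter_(0, idx, src)                      # scatter_(Tensor): in-place, Tensor src
u = torch.zeros(3, 4).scatter(0, idx, 2.0)   # scatter(value): out-of-place, scalar src

g = torch.gather(t, 0, idx)                  # gather
out = torch.empty_like(g)
torch.gather(t, 0, idx, out=out)             # gather(out=)
```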
copy_impl
- copy_
- copy_ no_sparse + no_quantize + self!=source + not_copy_transpose
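
A sketch of a call that takes this path (dense, non-quantized, self and source distinct, no transpose fast path; shapes and dtypes are arbitrary):

```python
import torch

dst = torch.empty(2, 3)
src = torch.arange(3, dtype=torch.int64)
dst.copy_(src)   # dense path: broadcasts src into dst and converts int64 -> float32
```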
cpu_cum_base_kernel
- cumsum_cpu_kernel
- _cumsum_cpu
- native: _cumsum if CPU
- _cumsum_out_cpu
- native: _cumsum.out if CPU
- cumprod_cpu_kernel
- same as above
- cumsum CPU + dimname
- cumsum(out=) CPU + dimname
- cumprod CPU + dimname
- cumprod(out=) CPU + dimname
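
The dimname variants resolve a name to a dimension first; a sketch (assuming named-tensor support for cumsum/cumprod, as the dimname entries above imply):

```python
import torch

t = torch.rand(2, 3, names=('N', 'C'))
t.cumsum(1)     # normal: integer dim
t.cumsum('C')   # dimname: 'C' resolves to dim 1

u = t.rename(None)
out = torch.empty(2, 3)
torch.cumsum(u, dim=1, out=out)   # (out=) variant on an unnamed tensor
```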
cpu_scatter_gather_base_kernel
- gather_cpu_kernel
- scatter_cpu_kernel
- scatter_fill_cpu_kernel
- scatter_add_cpu_kernel
cuda_scatter_gather_base_kernel
- gather_cuda_kernel
- scatter_cuda_kernel
cuda_scatter_fill_base_kernel
- scatter_fill_cuda_kernel
- gather CPU + GPU
- gather(out=) CPU + GPU
- scatter_(Tensor) CPU + GPU
- scatter(Tensor) CPU + GPU
- scatter_(value) CPU + GPU
- scatter(value) CPU + GPU
- scatter_add_(Tensor) CPU
- scatter_add(Tensor) CPU
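
A sketch of the scatter_add_ semantics these kernels implement (duplicate indices accumulate):

```python
import torch

t = torch.zeros(3)
idx = torch.tensor([0, 1, 0, 2])
src = torch.ones(4)
t.scatter_add_(0, idx, src)   # t == [2., 1., 1.]: index 0 is hit twice
```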
compare_base_kernel
- min_kernel_impl
- _min_out_cpu + _min(out=)
- _min_cpu + _min
- max_kernel_impl
- _min CPU
- _min(out=) CPU
- min CPU
- min(out=) CPU
- _max CPU
- _max(out=) CPU
- max CPU
- max(out=) CPU
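
The dim overloads return (values, indices) pairs; the no-dim min()/max() go through the *_all kernels in the reduction section instead. A sketch:

```python
import torch

t = torch.tensor([[1., 5.], [4., 2.]])
torch.min(t)                           # no dim: min_all path, 0-dim result
values, indices = torch.min(t, dim=1)  # dim path: compare_base_kernel

v = torch.empty(2)
i = torch.empty(2, dtype=torch.long)
torch.min(t, dim=1, out=(v, i))        # min(out=) variant
```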
masked_scale_kernel
- masked_scale_cuda
- native: _masked_scale
- _fused_dropout_backward (backward of _fused_dropout used for dropout when train + cuda + 0<p<1 + numel!=0 + inplace=false)
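
This path only triggers on CUDA in training with 0 < p < 1; the math it implements is just a masked scale, mirrored here on CPU for illustration (names are illustrative, not the internal API):

```python
import torch

p = 0.5
x = torch.randn(4)
mask = (torch.rand_like(x) > p).to(x.dtype)     # the mask saved by _fused_dropout
grad_out = torch.ones_like(x)
grad_in = grad_out * mask * (1.0 / (1.0 - p))   # what _masked_scale computes
```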
index_put_impl
- index_put_
- index_put_ all calls
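
index_put_ covers both explicit calls and advanced-indexing assignment; a sketch:

```python
import torch

t = torch.zeros(5)
idx = torch.tensor([0, 1, 1])
t.index_put_((idx,), torch.tensor(1.0))                   # plain write
t.index_put_((idx,), torch.tensor(1.0), accumulate=True)  # duplicates accumulate
t[idx] = 2.0                                              # lowers to index_put_
```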
index index_out
- index
- index(out=)
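
These back advanced-indexing reads; a sketch:

```python
import torch

t = torch.arange(6).reshape(2, 3)
idx = torch.tensor([1, 0])
t[idx]   # advanced indexing with a tensor index dispatches to index
```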
index_select_out_cpu_ index_select_cpu_
- index_select CPU
- index_select(out=) CPU
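
A sketch of both variants:

```python
import torch

t = torch.arange(6).reshape(2, 3)
i = torch.tensor([2, 0])
torch.index_select(t, 1, i)              # picks columns 2 and 0
out = torch.empty(2, 2, dtype=t.dtype)
torch.index_select(t, 1, i, out=out)     # index_select(out=)
```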
masked_fill_impl_cpu
- masked_fill__cpu
- masked_fill_(Tensor) CPU
- masked_fill_(value) CPU
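
The (Tensor) overload takes a 0-dim tensor, the (value) overload a scalar; a sketch:

```python
import torch

t = torch.zeros(4)
mask = torch.tensor([True, False, True, False])
t.masked_fill_(mask, 3.0)                 # masked_fill_(value)
t.masked_fill_(mask, torch.tensor(5.0))   # masked_fill_(Tensor): 0-dim fill value
```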
_s_where
- where
- where CPU + GPU
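
A sketch (where broadcasts its arguments before reaching _s_where):

```python
import torch

cond = torch.tensor([True, False, True])
a = torch.tensor([1., 2., 3.])
b = torch.tensor([10., 20., 30.])
torch.where(cond, a, b)   # -> [1., 20., 3.]
```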
logical_not_out
- logical_not
- logical_not_
- logical_not_ CPU + GPU
- logical_not CPU + GPU
- logical_not(out=) CPU + GPU
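
All three variants funnel through logical_not_out; a sketch:

```python
import torch

t = torch.tensor([0, 1, 2])
torch.logical_not(t)            # returns a bool tensor
out = torch.empty(3, dtype=torch.int64)
torch.logical_not(t, out=out)   # logical_not(out=): result cast to out's dtype
t.bool().logical_not_()         # in-place variant
```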
_make_unfold_backward_iter_over_grad_out
- unfold_backward_cpu_kernel
- unfold_backward_cuda_kernel
- backward of unfold with step >= size
_make_unfold_backward_iter_over_grad_in
- unfold_backward_cpu_kernel
- unfold_backward_cuda_kernel
- backward of unfold with step < size
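
The step/size split is about whether the unfolded windows overlap; a sketch:

```python
import torch

x = torch.randn(8, requires_grad=True)
x.unfold(0, 3, 4).sum().backward()   # step >= size: disjoint windows (iter over grad_out)
x.grad = None
x.unfold(0, 3, 2).sum().backward()   # step < size: overlapping windows (iter over grad_in)
```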
int_repr_quant_cpu int_repr_quant_cuda
- int_repr CPU + GPU
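
A sketch:

```python
import torch

q = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=10,
                              dtype=torch.quint8)
q.int_repr()   # the underlying uint8 values as a regular tensor
```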
quantize_tensor_per_tensor_affine_cuda
- quantize_tensor_per_tensor_affine GPU
- PerTensorAffineQuantizer::quantize()
dequantize_tensor_per_tensor_affine_cuda
- dequantize_tensor_per_tensor_affine GPU
- PerTensorAffineQuantizer::dequantize()
- Created as output of quantized ops like qadd / qmul / qconv
- quant/dequant in most quantization ops
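
A sketch of the per-tensor affine round trip these kernels implement:

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0])
q = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.quint8)
q.dequantize()   # (int_repr - zero_point) * scale, back to float
```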
fake_quantize_tensor_kernel_cuda fake_quantize_grad_tensor_kernel_cuda make_per_tensor_quantized_tensor_cuda fake_quantize_per_channel_affine fake_quantize_per_channel_affine_backward
- More quant for "fake_*" functions
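
A sketch of the per-tensor fake-quant op (the per-channel and backward kernels follow the same pattern):

```python
import torch

x = torch.randn(4, requires_grad=True)
y = torch.fake_quantize_per_tensor_affine(x, scale=0.1, zero_point=0,
                                          quant_min=0, quant_max=255)
y.sum().backward()   # straight-through gradient inside [quant_min, quant_max]
```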
comparison_op
reduce_op
- make_reduction
- sum / sum(out=)
- prod / prod(out=)
- mean / mean(out=)
- norm / norm(out=), except the no-dim and sparse variants
- all / all(out=)
- any / any(out=)
- min_values
- max_values
- argmax / argmax(out=)
- argmin / argmin(out=)
- var / var(out=) with dim provided
- std / std(out=) with dim provided
- var_mean / var_mean(out=)
- std_mean / std_mean(out=)
- two_pass_reduction
- parallel_reduce if output.nelement() == 1
- prelu backward with weight.nelement() == 1
- binary_kernel_reduce_vec
- sum_kernel_impl CPU
- prod_kernel_impl CPU
- and_kernel_impl CPU
- or_kernel_impl CPU
- min_values_kernel_impl CPU
- max_values_kernel_impl CPU
- min_all_kernel_impl if not bool + min() without dim provided CPU (see the sketch after this list)
- max_all_kernel_impl if not bool + max() without dim provided CPU
- TH_TENSOR_APPLY_REDUCTION_SUM_PARALLEL
- THTensor_(sumall)() (NOT the THC version!)
- THTensor_(maskedSelect)() -> _th_masked_select
- masked_select / masked_select(out=) byte mask CPU
- THTensor_(maskedSelectBool)() -> _th_masked_select_bool
- masked_select / masked_select(out=) boolean mask CPU
- THTensor_(meanall)() -> Not bound to TH
- var_all -> _th_var
- var no dim
- std_all -> _th_std
- std no dim
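
A sketch of a few of the paths in this reduction section: a dim reduction through make_reduction, the no-dim var/min paths, and the byte- vs boolean-mask split in masked_select (the byte overload is deprecated and may be rejected by newer releases):

```python
import torch

t = torch.randn(2, 3)
t.sum(dim=1)        # dim reduction: make_reduction path
t.var()             # no dim: var_all -> _th_var
t.min()             # no dim: min_all_kernel_impl

x = torch.tensor([1., 2., 3.])
bool_mask = torch.tensor([True, False, True])
x.masked_select(bool_mask)                   # boolean-mask path
x.masked_select(bool_mask.to(torch.uint8))   # legacy byte-mask path (warns)
```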