If nothing is specified, all argument combinations should be considered.
- copy_ no_sparse && no_quantize && self!=source && not_copy_transpose
- gather
- gather(out=)
- scatter_(Tensor)
- scatter(Tensor)
- scatter_(value)
- scatter(value)
- index_put_ all calls
- index
- index(out=)
- where
- logical_not_ CPU + GPU
- logical_not CPU + GPU
- logical_not(out=) CPU + GPU
- backward of unfold with step >= size
- backward of unfold with step < size
- sum / sum(out=)
- prod / prod(out=)
- mean / mean(out=)
- norm / norm(out=), except the no-dim and sparse variants
- all / all(out=)
- any / any(out=)
- min_values
- max_values
- argmax / argmax(out=)
- argmin / argmin(out=)
- var / var(out=)
- std / std(out=)
- var_mean / var_mean(out=)
- std_mean / std_mean(out=)
- cumsum normal + dimname
- cumsum(out=) normal + dimname
- cumprod normal + dimname
- cumprod(out=) normal + dimname
- scatter_add_(Tensor)
- scatter_add(Tensor)
- _min
- _min(out=)
- min
- min(out=)
- _max
- _max(out=)
- max
- max(out=)
- index_select
- index_select(out=)
- masked_fill_(Tensor)
- masked_fill_(value)
- masked_select / masked_select(out=) byte mask
- masked_select / masked_select(out=) boolean mask
- _fused_dropout_backward (backward of _fused_dropout used for dropout when train + cuda + 0<p<1 + numel!=0 + inplace=false)
- int_repr
- Created as output of quantized ops like qadd / qmul / qconv
- quant/dequant in most quantization ops
- More quant for "fake_*" functions
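
For reference, a minimal sketch of the suffix notation used in this list: a trailing `_` is in-place, `(out=)` writes into a preallocated tensor, and `(Tensor)` vs `(value)` distinguishes the src overloads (shapes here are arbitrary):

```python
import torch

t = torch.zeros(3, 4)
idx = torch.tensor([[0, 1, 2, 0]])
src = torch.ones(1, 4)

t.scatter_(0, idx, src)                      # scatter_(Tensor): in-place, Tensor src
u = torch.zeros(3, 4).scatter(0, idx, 2.0)   # scatter(value): out-of-place, scalar src

g = torch.gather(t, 0, idx)                  # gather
out = torch.empty_like(g)
torch.gather(t, 0, idx, out=out)             # gather(out=)
```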
copy_impl
- copy_
- copy_ no_sparse + no_quantize + self!=source + not_copy_transpose
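
A sketch of a call that takes this path (dense, non-quantized, self and source distinct, no transpose fast path; shapes and dtypes are arbitrary):

```python
import torch

dst = torch.empty(2, 3)
src = torch.arange(3, dtype=torch.int64)
dst.copy_(src)   # dense path: broadcasts src into dst and converts int64 -> float32
```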
cpu_cum_base_kernel
- cumsum_cpu_kernel
- _cumsum_cpu
- native: _cumsum if CPU
- _cumsum_out_cpu
- native: _cumsum.out if CPU
- cumprod_cpu_kernel
- same as above
- cumsum CPU + dimname
- cumsum(out=) CPU + dimname
- cumprod CPU + dimname
- cumprod(out=) CPU + dimname
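
The dimname variants resolve a name to a dimension first; a sketch (assuming named-tensor support for cumsum/cumprod, as the dimname entries above imply):

```python
import torch

t = torch.rand(2, 3, names=('N', 'C'))
t.cumsum(1)     # normal: integer dim
t.cumsum('C')   # dimname: 'C' resolves to dim 1

u = t.rename(None)
out = torch.empty(2, 3)
torch.cumsum(u, dim=1, out=out)   # (out=) variant on an unnamed tensor
```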
cpu_scatter_gather_base_kernel
- gather_cpu_kernel
- scatter_cpu_kernel
- scatter_fill_cpu_kernel
- scatter_add_cpu_kernel
cuda_scatter_gather_base_kernel
- gather_cuda_kernel
- scatter_cuda_kernel
cuda_scatter_fill_base_kernel
- scatter_fill_cuda_kernel
- gather CPU + GPU
- gather(out=) CPU + GPU
- scatter_(Tensor) CPU + GPU
- scatter(Tensor) CPU + GPU
- scatter_(value) CPU + GPU
- scatter(value) CPU + GPU
- scatter_add_(Tensor) CPU
- scatter_add(Tensor) CPU
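
A sketch of the scatter_add_ semantics these kernels implement (duplicate indices accumulate):

```python
import torch

t = torch.zeros(3)
idx = torch.tensor([0, 1, 0, 2])
src = torch.ones(4)
t.scatter_add_(0, idx, src)   # t == [2., 1., 1.]: index 0 is hit twice
```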
compare_base_kernel
- min_kernel_impl
- _min_out_cpu + _min(out=)
- _min_cpu + _min
- max_kernel_impl
- _min CPU
- _min(out=) CPU
- min CPU
- min(out=) CPU
- _max CPU
- _max(out=) CPU
- max CPU
- max(out=) CPU
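
The dim overloads return (values, indices) pairs; the no-dim min()/max() go through the *_all kernels in the reduction section instead. A sketch:

```python
import torch

t = torch.tensor([[1., 5.], [4., 2.]])
torch.min(t)                           # no dim: min_all path, 0-dim result
values, indices = torch.min(t, dim=1)  # dim path: compare_base_kernel

v = torch.empty(2)
i = torch.empty(2, dtype=torch.long)
torch.min(t, dim=1, out=(v, i))        # min(out=) variant
```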
masked_scale_kernel
- masked_scale_cuda
- native: _masked_scale
- _fused_dropout_backward (backward of _fused_dropout used for dropout when train + cuda + 0<p<1 + numel!=0 + inplace=false)
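
This path only triggers on CUDA in training with 0 < p < 1; the math it implements is just a masked scale, mirrored here on CPU for illustration (names are illustrative, not the internal API):

```python
import torch

p = 0.5
x = torch.randn(4)
mask = (torch.rand_like(x) > p).to(x.dtype)     # the mask saved by _fused_dropout
grad_out = torch.ones_like(x)
grad_in = grad_out * mask * (1.0 / (1.0 - p))   # what _masked_scale computes
```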
index_put_impl
- index_put_
- index_put_ all calls
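
index_put_ covers both explicit calls and advanced-indexing assignment; a sketch:

```python
import torch

t = torch.zeros(5)
idx = torch.tensor([0, 1, 1])
t.index_put_((idx,), torch.tensor(1.0))                   # plain write
t.index_put_((idx,), torch.tensor(1.0), accumulate=True)  # duplicates accumulate
t[idx] = 2.0                                              # lowers to index_put_
```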
index index_out
- index
- index(out=)
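
These back advanced-indexing reads; a sketch:

```python
import torch

t = torch.arange(6).reshape(2, 3)
idx = torch.tensor([1, 0])
t[idx]   # advanced indexing with a tensor index dispatches to index
```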
index_select_out_cpu_ index_select_cpu_
- index_select CPU
- index_select(out=) CPU
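
A sketch of both variants:

```python
import torch

t = torch.arange(6).reshape(2, 3)
i = torch.tensor([2, 0])
torch.index_select(t, 1, i)              # picks columns 2 and 0
out = torch.empty(2, 2, dtype=t.dtype)
torch.index_select(t, 1, i, out=out)     # index_select(out=)
```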
masked_fill_impl_cpu
- masked_fill__cpu
- masked_fill_(Tensor) CPU
- masked_fill_(value) CPU
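
The (Tensor) overload takes a 0-dim tensor, the (value) overload a scalar; a sketch:

```python
import torch

t = torch.zeros(4)
mask = torch.tensor([True, False, True, False])
t.masked_fill_(mask, 3.0)                 # masked_fill_(value)
t.masked_fill_(mask, torch.tensor(5.0))   # masked_fill_(Tensor): 0-dim fill value
```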
_s_where
- where
- where CPU + GPU
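
A sketch (where broadcasts its arguments before reaching _s_where):

```python
import torch

cond = torch.tensor([True, False, True])
a = torch.tensor([1., 2., 3.])
b = torch.tensor([10., 20., 30.])
torch.where(cond, a, b)   # -> [1., 20., 3.]
```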
logical_not_out
- logical_not
- logical_not_
- logical_not_ CPU + GPU
- logical_not CPU + GPU
- logical_not(out=) CPU + GPU
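
All three variants funnel through logical_not_out; a sketch:

```python
import torch

t = torch.tensor([0, 1, 2])
torch.logical_not(t)            # returns a bool tensor
out = torch.empty(3, dtype=torch.int64)
torch.logical_not(t, out=out)   # logical_not(out=): result cast to out's dtype
t.bool().logical_not_()         # in-place variant
```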
_make_unfold_backward_iter_over_grad_out
- unfold_backward_cpu_kernel
- unfold_backward_cuda_kernel
- backward of unfold with step >= size
_make_unfold_backward_iter_over_grad_in
- unfold_backward_cpu_kernel
- unfold_backward_cuda_kernel
- backward of unfold with step < size
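
The step/size split is about whether the unfolded windows overlap; a sketch:

```python
import torch

x = torch.randn(8, requires_grad=True)
x.unfold(0, 3, 4).sum().backward()   # step >= size: disjoint windows (iter over grad_out)
x.grad = None
x.unfold(0, 3, 2).sum().backward()   # step < size: overlapping windows (iter over grad_in)
```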
int_repr_quant_cpu int_repr_quant_cuda
- int_repr CPU + GPU
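
A sketch:

```python
import torch

q = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=10,
                              dtype=torch.quint8)
q.int_repr()   # the underlying uint8 values as a regular tensor
```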
quantize_tensor_per_tensor_affine_cuda
- quantize_tensor_per_tensor_affine GPU
- PerTensorAffineQuantizer::quantize()
dequantize_tensor_per_tensor_affine_cuda
- dequantize_tensor_per_tensor_affine GPU
- PerTensorAffineQuantizer::dequantize()
- Created as output of quantized ops like qadd / qmul / qconv
- quant/dequant in most quantization ops
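
A sketch of the per-tensor affine round trip these kernels implement:

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0])
q = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.quint8)
q.dequantize()   # (int_repr - zero_point) * scale, back to float
```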
fake_quantize_tensor_kernel_cuda fake_quantize_grad_tensor_kernel_cuda make_per_tensor_quantized_tensor_cuda fake_quantize_per_channel_affine fake_quantize_per_channel_affine_backward
- More quant for "fake_*" functions
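
A sketch of the per-tensor fake-quant op (the per-channel and backward kernels follow the same pattern):

```python
import torch

x = torch.randn(4, requires_grad=True)
y = torch.fake_quantize_per_tensor_affine(x, scale=0.1, zero_point=0,
                                          quant_min=0, quant_max=255)
y.sum().backward()   # straight-through gradient inside [quant_min, quant_max]
```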
comparison_op
reduce_op
- make_reduction
- sum / sum(out=)
- prod / prod(out=)
- mean / mean(out=)
- norm / norm(out=), except the no-dim and sparse variants
- all / all(out=)
- any / any(out=)
- min_values
- max_values
- argmax / argmax(out=)
- argmin / argmin(out=)
- var / var(out=) with dim provided
- std / std(out=) with dim provided
- var_mean / var_mean(out=)
- std_mean / std_mean(out=)
- two_pass_reduction
- parallel_reduce if output.nelement() == 1
- prelu backward with weight.nelement() == 1
- binary_kernel_reduce_vec
- sum_kernel_impl CPU
- prod_kernel_impl CPU
- and_kernel_impl CPU
- or_kernel_impl CPU
- min_values_kernel_impl CPU
- max_values_kernel_impl CPU
- min_all_kernel_impl if not bool + min() without dim provided CPU (see the sketch after this list)
- max_all_kernel_impl if not bool + max() without dim provided CPU
- TH_TENSOR_APPLY_REDUCTION_SUM_PARALLEL
- THTensor_(sumall)() (NOT the THC version!)
- THTensor_(maskedSelect)() -> _th_masked_select
- masked_select / masked_select(out=) byte mask CPU
- THTensor_(maskedSelectBool)() -> _th_masked_select_bool
- masked_select / masked_select(out=) boolean mask CPU
- THTensor_(meanall)() -> Not bound to TH
- var_all -> _th_var
- var no dim
- std_all -> _th_std
- std no dim
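
A sketch of a few of the paths in this reduction section: a dim reduction through make_reduction, the no-dim var/min paths, and the byte- vs boolean-mask split in masked_select (the byte overload is deprecated and may be rejected by newer releases):

```python
import torch

t = torch.randn(2, 3)
t.sum(dim=1)        # dim reduction: make_reduction path
t.var()             # no dim: var_all -> _th_var
t.min()             # no dim: min_all_kernel_impl

x = torch.tensor([1., 2., 3.])
bool_mask = torch.tensor([True, False, True])
x.masked_select(bool_mask)                   # boolean-mask path
x.masked_select(bool_mask.to(torch.uint8))   # legacy byte-mask path (warns)
```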