Command line:
blaze-bin/third_party/iree/experimental/runners/mlir-proto-opt -linalg-comprehensive-bufferize-inplace /tmp/a.mlir

Output:
return val does not fold: %0 = tensor.generate %arg0, %arg1, %arg2, %arg3  {
^bb0(%arg4: index, %arg5: index, %arg6: index, %arg7: index):  // no predecessors
  %1 = index_cast %arg4 : index to i32
  %2 = sitofp %1 : i32 to f32
  tensor.yield %2 : f32
} : tensor<?x?x?x?xf32>
/tmp/a.mlir:26:10: error: 'std.tensor_load' op requires the same shape for all operands and results
  %lhs = call @generate_pseudorandom_4d_f32 (%M, %M0, %K, %K0) : (index, index, index, index) -> tensor<?x?x?x?xf32>
         ^
/tmp/a.mlir:26:10: note: see current operation: %1 = "std.tensor_load"(%0) : (tensor<?x?x?x?xf32>) -> none
/tmp/a.mlir:26:10: error: 'std.tensor_load' op requires the same shape for all operands and results
  %lhs = call @generate_pseudorandom_4d_f32 (%M, %M0, %K, %K0) : (index, index, index, index) -> tensor<?x?x?x?xf32>
         ^
/tmp/a.mlir:26:10: note: see current operation: %1 = "std.tensor_load"(%0) : (tensor<?x?x?x?xf32>) -> none