@soumith
Created February 18, 2018 06:42
Scopes 0.3.1 backport https://github.com/pytorch/pytorch/pull/5153
Cherry pick dataloader issue fix to 0.3.1 https://github.com/pytorch/pytorch/pull/5140
Fixed double memory accesses of several pointwise operations. https://github.com/pytorch/pytorch/pull/5068
Broadcast output requires_grad only if corresponding input requires_grad https://github.com/pytorch/pytorch/pull/5061
Fix topk work size computation https://github.com/pytorch/pytorch/pull/5053
Fix maxpool3d / avgpool3d crashes https://github.com/pytorch/pytorch/pull/5052
Fix blas addmm (gemm) condition check https://github.com/pytorch/pytorch/pull/5048
Fix C FFI extension after moving TH to C++ https://github.com/pytorch/pytorch/pull/5005
make torch.set_num_threads also set MKL threads (take 2) https://github.com/pytorch/pytorch/pull/5002
Fix reduction functions to respect the stride of the output https://github.com/pytorch/pytorch/pull/4995
Fix refcycles in DataParallel scatter and gather https://github.com/pytorch/pytorch/pull/4988
Improve CUDA softmax performance https://github.com/pytorch/pytorch/pull/4973
Fix triu and tril for zero-strided inputs on gpu https://github.com/pytorch/pytorch/pull/4962
Make torch.cuda.empty_cache() a no-op when cuda is not initialized https://github.com/pytorch/pytorch/pull/4936
Lazy init order in set device, should not be called in getDevCount https://github.com/pytorch/pytorch/pull/4918
Add missing _lazy_init in cuda python module https://github.com/pytorch/pytorch/pull/4907
Don't throw exceptions inside OpenMP parallel blocks https://github.com/pytorch/pytorch/pull/4857
Fix typo in docs https://github.com/pytorch/pytorch/pull/4846
Backport dlpack aten changes to v0.3.1 branch https://github.com/pytorch/pytorch/pull/4823
Initialize cuda before setting cuda tensor types as default https://github.com/pytorch/pytorch/pull/4788
More documentation for CUDA stream functions. https://github.com/pytorch/pytorch/pull/4756
Legacy Padding: correct output size with nInputDim https://github.com/pytorch/pytorch/pull/4735
[ASAN] fix more load_real deletes https://github.com/pytorch/pytorch/pull/4694
updated documentation for Embedding layer https://github.com/pytorch/pytorch/pull/4684
Fix cast direction in THCBlas https://github.com/pytorch/pytorch/pull/4670
Fix wrong learning rate evaluation in CosineAnnealingLR in Python 2 https://github.com/pytorch/pytorch/pull/4656
NLLLoss: current code works with dim = 3, so I added it to dim checks https://github.com/pytorch/pytorch/pull/4654
Dataloader issues https://github.com/pytorch/pytorch/pull/4643
More strict shape check on Conv operators. https://github.com/pytorch/pytorch/pull/4637
Clean up error checking in THPTensor_(_convertToTensorIndexers) https://github.com/pytorch/pytorch/pull/4616
Tiny fix on MaxPool2d __repr__ https://github.com/pytorch/pytorch/pull/4591
Fix use after free when advanced indexing tensors with tensors https://github.com/pytorch/pytorch/pull/4559
Fix abs specialization for `uint8_t` type. https://github.com/pytorch/pytorch/pull/4521
Extract the finish check for profiler https://github.com/pytorch/pytorch/pull/4519
Improve memory access patterns for index operations. https://github.com/pytorch/pytorch/pull/4493
Fix StepLR example docs https://github.com/pytorch/pytorch/pull/4478
Improve float precision stability of `linspace` op, fix 4419. https://github.com/pytorch/pytorch/pull/4470
Fix setting using running stats in InstanceNorm*d https://github.com/pytorch/pytorch/pull/4444
Fix python gc race condition with THPVariable_traverse https://github.com/pytorch/pytorch/pull/4437
Add random_split to torch.utils.data.dataset https://github.com/pytorch/pytorch/pull/4435 (usage sketch below)
More detailed documentation. https://github.com/pytorch/pytorch/pull/4428
fix documentation of RNN weight_ih_l[k] shape https://github.com/pytorch/pytorch/pull/4407
Fix undefined FileNotFoundError https://github.com/pytorch/pytorch/pull/4384
add bias term to linear __repr__ functions, fix spacing https://github.com/pytorch/pytorch/pull/4352
Improved documentation of several index operations. https://github.com/pytorch/pytorch/pull/4345
Add check for slice shape match in index_copy_ and index_add_. https://github.com/pytorch/pytorch/pull/4342
fix MaxPool2d __repr__ (adds missing ceil_mode summary) https://github.com/pytorch/pytorch/pull/4335
fix an out of bounds hypothetical https://github.com/pytorch/pytorch/pull/4240
fix typo. https://github.com/pytorch/pytorch/pull/4206
Allow map_location in torch.load to be a string https://github.com/pytorch/pytorch/pull/4203 (usage sketch below)
Fix distributed dataloader so it pins memory to current GPU not GPU 0. https://github.com/pytorch/pytorch/pull/4196
Add function to explicitly initialize PyTorch CUDA state. https://github.com/pytorch/pytorch/pull/4180 (usage sketch below)
Fix CUDA version typo https://github.com/pytorch/pytorch/pull/4175
Rearrange dimensions for pointwise operations for better performance. https://github.com/pytorch/pytorch/pull/4174
Correct instancenorm input size https://github.com/pytorch/pytorch/pull/4171
Better error messages for blas ops with cuda.LongTensor https://github.com/pytorch/pytorch/pull/4160
Re-initialize autograd engine in child processes https://github.com/pytorch/pytorch/pull/4158
Improve svd doc https://github.com/pytorch/pytorch/pull/4155
Add cublas batched gemm support. https://github.com/pytorch/pytorch/pull/4151
Added explicit tuple dimensions to doc for Conv1d. https://github.com/pytorch/pytorch/pull/4136
improve performance of maxpooling backwards https://github.com/pytorch/pytorch/pull/4106
Add proper shape checking to torch.cat https://github.com/pytorch/pytorch/pull/4087
Fix repeat non owning https://github.com/pytorch/pytorch/pull/4084
Assert MKL ld* conditions for ger, gemm, and gemv https://github.com/pytorch/pytorch/pull/4056
Add mutex for CPU RNG and move TH to C++ https://github.com/pytorch/pytorch/pull/4041
slightly simplified math in IndexToOffset https://github.com/pytorch/pytorch/pull/4040
Implement NLLLossNd https://github.com/pytorch/pytorch/pull/4035 (usage sketch below)
Use enabled argument in torch.autograd.profiler.emit_nvtx https://github.com/pytorch/pytorch/pull/4032
allow cudnn for fp16 batch norm https://github.com/pytorch/pytorch/pull/4021
Neg n_workers are now the same as zero https://github.com/pytorch/pytorch/pull/4019
Add default PyTorch seeding and worker_init_fn to DataLoader https://github.com/pytorch/pytorch/pull/4018 (usage sketch below)
Fix CUDA Multinomial checks https://github.com/pytorch/pytorch/pull/4009
Accept longs in default_collate for dataloader in python 2 https://github.com/pytorch/pytorch/pull/4001
Improve docs for torch and torch.Tensor https://github.com/pytorch/pytorch/pull/3969
Improve Tensor.new doc https://github.com/pytorch/pytorch/pull/3954
Fix CUDA index_fill_ boundary check with small tensor size https://github.com/pytorch/pytorch/pull/3953
[docs] rnn.py: Note zero defaults for hidden state/cell https://github.com/pytorch/pytorch/pull/3951
Improve Tensor.scatter_ doc https://github.com/pytorch/pytorch/pull/3937
Add rnn args check https://github.com/pytorch/pytorch/pull/3925
Allow target.requires_grad in l1_loss and mse_loss https://github.com/pytorch/pytorch/pull/3876 (usage sketch below)
More docs for Conv1d Conv2d https://github.com/pytorch/pytorch/pull/3870
Fix padding_idx getting ignored in backward for Embedding(sparse=True) https://github.com/pytorch/pytorch/pull/3842
Fix MultiLabelMarginLoss docs https://github.com/pytorch/pytorch/pull/3836
Have __sizeof__ account for size of stored elements https://github.com/pytorch/pytorch/pull/3821
Fix cosine_similarity's output shape https://github.com/pytorch/pytorch/pull/3811
add reduce arg to PoissonNLLLoss https://github.com/pytorch/pytorch/pull/3770 (usage sketch below)
Fix DataParallel scattering for empty lists / dicts / tuples https://github.com/pytorch/pytorch/pull/3769
change doc for Adaptive Pooling https://github.com/pytorch/pytorch/pull/3746
Add missing trtrs, orgqr, ormqr docs https://github.com/pytorch/pytorch/pull/3720
Remove hard file offset reset in load() https://github.com/pytorch/pytorch/pull/3695
Fix cuBLAS arguments for fp16 dot https://github.com/pytorch/pytorch/pull/3660
fixed a typo in ConcatDataset.cumulative_sizes attribute name https://github.com/pytorch/pytorch/pull/3534
Signal handling in DataLoader workers; Timeout option https://github.com/pytorch/pytorch/pull/3474 (usage sketch below)
Add Cosine Annealing LR Scheduler https://github.com/pytorch/pytorch/pull/3311 (usage sketch below)
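
Below are usage sketches for a few of the user-facing changes flagged above. They are illustrative only, written against the PyTorch Python API as commonly documented rather than taken from the PRs themselves; in the 0.3.x series some of these calls would go through torch.autograd.Variable.

PR #4435 adds random_split to torch.utils.data. A minimal sketch, assuming the familiar random_split(dataset, lengths) signature where the lengths must sum to len(dataset):

    import torch
    from torch.utils.data import TensorDataset, random_split

    # A toy dataset of 100 (feature, label) pairs.
    features = torch.randn(100, 8)
    labels = torch.randint(0, 2, (100,))
    dataset = TensorDataset(features, labels)

    # Split into 80 training and 20 validation examples.
    train_set, val_set = random_split(dataset, [80, 20])
    print(len(train_set), len(val_set))  # 80 20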
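
PR #4203 lets map_location in torch.load be a plain device string instead of a callable or a dict. A sketch (checkpoint.pt is a made-up file name):

    import torch

    # A real use case is loading a GPU checkpoint on a CPU-only machine;
    # here a CPU tensor is saved so the example runs anywhere.
    torch.save({"w": torch.randn(3, 3)}, "checkpoint.pt")

    # Previously remapping storages needed a callable such as
    #   lambda storage, loc: storage
    # A device string is now accepted directly:
    state = torch.load("checkpoint.pt", map_location="cpu")
    print(state["w"].shape)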
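
PR #4180 adds a function to initialize the CUDA state explicitly. Assuming that function is torch.cuda.init(), which is what PyTorch exposes today, a sketch:

    import torch

    if torch.cuda.is_available():
        # Force CUDA initialization up front (for example before measuring
        # startup time or spawning workers) instead of waiting for the
        # first CUDA tensor to trigger lazy initialization.
        torch.cuda.init()
        print(torch.cuda.device_count(), "CUDA device(s) initialized")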
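
PR #4035 implements NLLLoss for inputs with extra spatial dimensions (per-pixel classification). A sketch of the spatial case, written against the current API in which plain nn.NLLLoss accepts (N, C, H, W) log-probabilities and (N, H, W) class-index targets:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    batch, classes, h, w = 2, 5, 4, 4
    logits = torch.randn(batch, classes, h, w)
    log_probs = F.log_softmax(logits, dim=1)            # NLLLoss expects log-probabilities
    target = torch.randint(0, classes, (batch, h, w))   # one class index per pixel

    loss = nn.NLLLoss()(log_probs, target)
    print(loss.item())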
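
PR #4018 gives every DataLoader worker a distinct default torch seed and adds a worker_init_fn hook. The usual reason to reach for the hook is to seed libraries torch does not control, e.g. numpy; a sketch:

    import numpy as np
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def worker_init_fn(worker_id):
        # torch itself is seeded per worker after this change; numpy is
        # seeded here by hand, derived from the per-worker torch seed.
        np.random.seed(torch.initial_seed() % 2**32)

    if __name__ == "__main__":
        dataset = TensorDataset(torch.randn(32, 4))
        loader = DataLoader(dataset, batch_size=8, num_workers=2,
                            worker_init_fn=worker_init_fn)
        for (batch,) in loader:
            pass  # numpy randomness inside the dataset is now reproducible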
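
PR #3876 allows the target of l1_loss and mse_loss to require grad, so gradients can flow into both arguments (useful for distillation-style losses). Sketch in current tensor notation; under 0.3.x both sides would be Variables:

    import torch
    import torch.nn.functional as F

    pred = torch.randn(4, 3, requires_grad=True)
    target = torch.randn(4, 3, requires_grad=True)  # no longer has to be a constant

    loss = F.mse_loss(pred, target)
    loss.backward()
    print(pred.grad.shape, target.grad.shape)       # gradients reach both tensors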
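
PR #3770 adds the reduce argument to PoissonNLLLoss, matching the other losses of that era (later releases fold it into reduction='none'). A sketch using the era-appropriate keyword, which current PyTorch still accepts with a deprecation warning:

    import torch
    import torch.nn as nn

    log_rate = torch.randn(6)                      # model output, read as log(lambda)
    counts = torch.poisson(torch.full((6,), 3.0))  # observed event counts

    # reduce=False returns one loss value per element instead of the mean.
    loss_fn = nn.PoissonNLLLoss(log_input=True, reduce=False)
    per_element = loss_fn(log_rate, counts)
    print(per_element.shape)                       # torch.Size([6])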
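
PR #3474 adds worker signal handling and a timeout option to DataLoader, so a stuck worker raises an error instead of hanging the main process forever. Sketch:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    if __name__ == "__main__":
        dataset = TensorDataset(torch.randn(64, 10))
        # timeout is the number of seconds to wait for a batch from a
        # worker before raising; 0 (the default) means wait forever.
        loader = DataLoader(dataset, batch_size=16, num_workers=2, timeout=30)
        for (batch,) in loader:
            pass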
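
PR #3311 adds the cosine annealing learning-rate scheduler. A sketch of the usual wiring, where T_max is the number of steps over which the rate decays from the initial lr down to eta_min:

    import torch
    from torch.optim.lr_scheduler import CosineAnnealingLR

    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-4)

    for epoch in range(50):
        # ... forward pass, loss.backward() ...
        optimizer.step()
        scheduler.step()  # lr follows a cosine curve from 0.1 down to 1e-4
    print(optimizer.param_groups[0]["lr"])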