In this gist, I list possible changes to existing APIs for Chainer v2. It also includes some ideas for new APIs related to existing ones. Note that this is not a complete list of future changes for v2.
- `__len__` should return the length of the first axis.
- Remove `volatile` flag?
  - It can be replaced by `no_backprop_mode`, which is easier to use (see the sketch after this list).
  - We can also remove `Flag`
    - It can be replaced by
- Remove `zerograd`
- Support uninitialized Variable
  - It can be used for better support of uninitialized parameters (used as a "parameter shape placeholder")
- Support optimizer for Variable
  - It enables us to choose an optimizer for each parameter variable.
  - We need a design document and discussions on it.
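For the `volatile` item above, a minimal sketch of how `no_backprop_mode` covers the same use case (`L.Linear` is used here only as a stand-in model):

```python
import numpy as np
import chainer
import chainer.links as L

model = L.Linear(3, 2)
x = np.zeros((4, 3), dtype=np.float32)

# v1 style, to be removed: y = model(chainer.Variable(x, volatile='on'))

# Proposed v2 style: turn off graph construction for a whole block instead.
with chainer.no_backprop_mode():
    y = model(x)  # no graph is built here, so gradients cannot flow to the parameters
```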
- `type_check_enable`: Make it a global/thread-local flag
  - Make type checking enabled only in debug mode by default
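For context, this is the kind of error type checking currently produces, and it is toggled by the `CHAINER_TYPE_CHECK` environment variable rather than by debug mode (`softmax_cross_entropy` is used here only as an example):

```python
import numpy as np
import chainer
import chainer.functions as F

x = np.zeros((2, 3), dtype=np.float32)
t = np.zeros((2,), dtype=np.float32)   # labels should be int32, so the check fails

try:
    F.softmax_cross_entropy(x, t)
except chainer.utils.type_check.InvalidType as e:
    print(e)   # explains which constraint on the input types was violated

# Setting CHAINER_TYPE_CHECK=0 before importing chainer skips this check entirely;
# the item above proposes folding it into a single debug flag instead.
```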
- `add_param` and `add_persistent` do not work well with PyCharm (see the snippet after this list)
  - An added attribute is not recognized by the IDE
  - It is better to design a new API to avoid this issue
- `add_uninitialized_param` should be redesigned to work with uninitialized Variable (see above)
- `add_link`: see the above discussions on `add_param`
- `to_gpu` should be applied to links added in the future
- We want to support duplicated parents of a link (it is currently prohibited)
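To make the PyCharm point concrete, a tiny example of the current API (the link itself is just an illustration):

```python
import chainer

class MyLink(chainer.Link):
    def __init__(self):
        super(MyLink, self).__init__()
        # The parameter only appears as an attribute at runtime...
        self.add_param('W', (3, 3))

link = MyLink()
print(link.W.data.shape)  # ...so static analysis (PyCharm, linters) cannot see `W`
```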
- Remove deprecated methods (most of which can be replaced by optimizer hooks)
- Support per-parameter configurations (see notes on Variable)
- Stop using Abstract Base Class
- Support a non-strict mode that allows the model's parameter set to differ from the set of loaded parameters
- The interface should be updated to support the updated optimizer APIs.
- Support non-scalar observations.
- Remove it.
- Remove deprecated APIs.
- Deprecate `get_device()` and add alternatives: `get_device_from_id`, `get_device_from_object`, etc.
- to_cpu / to_gpu: support Variable as an input.
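A sketch of the current, overloaded entry point versus the proposed alternatives (the `get_device_from_*` names are the ones listed above and are still under discussion):

```python
import numpy as np
from chainer import cuda

x = np.arange(4, dtype=np.float32)

# v1: a single entry point accepts device IDs, arrays, None, ..., which is easy
# to misuse (here it silently returns a dummy device for a NumPy array).
device = cuda.get_device(x)

# Proposed alternatives (names taken from the item above, not final):
#   cuda.get_device_from_id(0)
#   cuda.get_device_from_object(x)
# plus to_cpu / to_gpu accepting a Variable directly:
#   arr = cuda.to_cpu(some_variable)
```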
- snapshot/snapshot_object: Remove the trigger option, which is redundant.
- LogReport: Think of a better name for the trigger option.
- Flags: make them global/thread-local and remove them from arguments
  - use_cudnn
  - train/test
  - deterministic
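One possible shape for such flags, sketched in plain Python; `config` and `using_config` are hypothetical names used only for illustration, not existing Chainer APIs:

```python
import contextlib
import threading

# Hypothetical global, thread-local configuration object (illustration only;
# a real implementation would also need per-thread default values).
config = threading.local()
config.train = True          # would replace per-call train/test arguments
config.use_cudnn = True
config.type_check = True     # could subsume CHAINER_TYPE_CHECK / debug mode

@contextlib.contextmanager
def using_config(name, value):
    """Temporarily override a flag within a `with` block."""
    old = getattr(config, name)
    setattr(config, name, value)
    try:
        yield
    finally:
        setattr(config, name, old)

# Functions would read config.train instead of taking a train=/test= argument:
with using_config('train', False):
    pass   # e.g. y = model(x) would run dropout/BN in test mode here
```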
- batch_normalization: Think about a better interface
- softmax_cross_entropy: Rename the `normalize` option
- softmax_cross_entropy: Allow `ignore_label` to be configured by an init argument
- split_axis: Make force_tuple True by default
- initialW, initial_bias, ...: Unify the naming convention of the arguments.
- input size, input channels, ...: Make them optional (we may need to change the overall APIs)
- wscale: Remove it.
- set_state / reset_state of RNN units: Unify the interface.
- BatchNormalization: Think about a better interface
- ConvolutionND: Make the bias enabled by default
- Linear: Make the number of axes for batch dimensions configurable
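To illustrate the last item: `L.Linear` currently flattens everything after the first axis into channels, so sequence-shaped inputs need a manual reshape today (a sketch with made-up sizes):

```python
import numpy as np
import chainer.functions as F
import chainer.links as L

batch, seq_len, in_units, out_units = 2, 5, 4, 3
x = np.zeros((batch, seq_len, in_units), dtype=np.float32)

layer = L.Linear(seq_len * in_units, out_units)
y = layer(x)            # current behaviour: everything after axis 0 is
print(y.shape)          # flattened into channels -> (2, 3)

# What sequence models usually want: apply the same projection to every step,
# i.e. treat the first two axes as batch dimensions. Today this needs a reshape:
layer2 = L.Linear(in_units, out_units)
y2 = F.reshape(layer2(F.reshape(x, (batch * seq_len, in_units))),
               (batch, seq_len, out_units))
print(y2.shape)         # (2, 5, 3)
```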
I've been the only Chainer user at my company since last fall, but about two months ago the rest of our (fairly small) research team started to switch, and now everyone is using it 😃. Over the past several weeks of helping the team get up to speed with the framework, I've noticed a handful of unexpected behaviors that often trip people up. Many of them would be good things to revisit in a major-version update. In no particular order:
- The `split_axis` function defaults to `force_tuple=False`; this allows for very subtle bugs when the batch size or sequence length is 1 and the variable isn't actually split: since variables can be sliced and iterated, and behave somewhat like sequences, the error often doesn't show itself until several steps later in the computation graph. `force_tuple` should probably default to `True` and allow the user to turn it off if desired (see the snippet at the end of this comment).
- `Variable.backward()` overwrites gradients, even if `zerograds()` or `cleargrads()` isn't run first. Sometimes we want to accumulate some gradients, then accumulate some more, then make an update; this makes that effectively impossible. It looks like this is because the `needs_copy` status of a particular gradient array isn't preserved between calls to `backward()`, but there might be a way to account for this.
- `L.Linear` works when applied to 3D tensors, but assumes that the last two dimensions are both channel dimensions and only the first one is a batch dimension. It is much more common for us to want the first two dimensions to be treated as batch dimensions and only the last as a channel; I imagine this might be a difference between computer vision and NLP work? Maybe we can provide both behaviors and a parameter to choose which one.
- `CHAINER_TYPE_CHECK` is separate from `debug_mode`, even though they're both for debugging, and the former defaults to on while the latter defaults to off.
- `F.batch_normalization` and `F.dropout` both have different modes for training and testing. However, one of them takes a flag that is `True` for training and the other takes a flag that is `True` for testing; these flags are required even though in most cases the functions should be able to figure out the correct behavior from the volatility flag of their input. I think this one would be completely fixed by the move to global train/test flags, so I'm definitely +1 on that.
- The built-in converter only handles batches made of `ndarray`, `tuple`, and `dict`. This whole issue can be avoided by merging this PR chainer/chainer#1654 and simply passing the input batch unmodified if it's neither a `tuple` nor a `dict`.

Another thing to consider in an update is a more powerful data/dataset system. I've built something that lets me write declarative dataset definitions, and I'd be happy to turn it into a PR if there's interest.
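As a concrete version of the `split_axis` point in the first bullet above, a tiny reproduction:

```python
import numpy as np
import chainer.functions as F

x = np.arange(6, dtype=np.float32).reshape(2, 3)

ys = F.split_axis(x, 2, axis=0)   # tuple of two (1, 3) variables, as expected
y = F.split_axis(x, 1, axis=0)    # a single (2, 3) Variable, NOT a 1-tuple,
                                  # because force_tuple defaults to False; code
                                  # that indexes/iterates "the pieces" then
                                  # silently operates on rows of the variable

ys = F.split_axis(x, 1, axis=0, force_tuple=True)   # the proposed default
print(len(ys), ys[0].shape)       # 1 (2, 3)
```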