- RLlib `ModelV2` API is ready to use. It improves support for Keras and RNN models and allows object-oriented reuse of variables. The ModelV1 API is deprecated; no migration is needed.
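A minimal sketch of what a custom model can look like under the new API, assuming the TF/Keras variant (`TFModelV2`); the class name `MyModel` and the layer sizes are illustrative, not part of this release note:

```python
import tensorflow as tf
from ray.rllib.models.tf.tf_modelv2 import TFModelV2

class MyModel(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        super(MyModel, self).__init__(obs_space, action_space, num_outputs,
                                      model_config, name)
        # Keras layers live on the object, so variables are reused by holding
        # a reference to the model rather than via TF variable scopes.
        inputs = tf.keras.layers.Input(shape=obs_space.shape)
        hidden = tf.keras.layers.Dense(256, activation="relu")(inputs)
        logits = tf.keras.layers.Dense(num_outputs)(hidden)
        value = tf.keras.layers.Dense(1)(hidden)
        self.base_model = tf.keras.Model(inputs, [logits, value])
        self.register_variables(self.base_model.variables)

    def forward(self, input_dict, state, seq_lens):
        logits, self._value_out = self.base_model(input_dict["obs"])
        return logits, state

    def value_function(self):
        return tf.reshape(self._value_out, [-1])
```

Such a model would typically be registered via `ModelCatalog.register_custom_model` and referenced through the `custom_model` key of the trainer's `model` config.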
- `ray.experimental.sgd.pytorch.PyTorchTrainer` is ready for early adopters. Check out the doc here; we welcome your feedback!
```python
# (Assumes PyTorchTrainer, utils, and Resources are imported from the
# ray.experimental.sgd.pytorch package, and that YourPyTorchModel,
# YourTrainingSet, YourValidationSet, and NUM_EPOCHS are defined by you.)
model_creator = lambda config: YourPyTorchModel()
# Parentheses make the lambda return a (train, validation) tuple; without
# them this line parses as a tuple of (lambda, YourValidationSet()).
data_creator = lambda config: (YourTrainingSet(), YourValidationSet())

trainer = PyTorchTrainer(
    model_creator,
    data_creator,
    optimizer_creator=utils.sgd_mse_optimizer,
    config={"lr": 1e-4},
    num_replicas=2,
    resources_per_replica=Resources(num_gpus=1),
    batch_size=16,
    backend="auto")

for i in range(NUM_EPOCHS):
    trainer.train()
```
- A jobs table is added to the state API. You can query all the clients that have performed `ray.init` to connect to the current cluster. #5076
```python
>>> ray.state.jobs()
[{'JobID': '02000000',
  'NodeManagerAddress': '10.99.88.77',
  'DriverPid': 74949,
  'StartTime': 1564168784,
  'StopTime': 1564168798},
 {'JobID': '01000000',
  'NodeManagerAddress': '10.99.88.77',
  'DriverPid': 74871,
  'StartTime': 1564168742}]
```
- Improvements to memory storage handling. #5143, #5216, #4893
- Improved workflow:
  - The debugging tool `local_mode` now behaves more consistently with Ray's default mode (see the sketch below). #5060
  - Improved `KeyboardInterrupt` exception handling: the stack trace was reduced from 115 lines to 22 lines. #5237
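For reference, a minimal sketch of local mode (standard `ray` API; the task `f` is illustrative):

```python
import ray

# Local mode runs tasks serially in the driver process instead of on remote
# workers, so breakpoints and stack traces behave like ordinary Python.
ray.init(local_mode=True)

@ray.remote
def f(x):
    return x + 1

print(ray.get(f.remote(1)))  # -> 2
```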
- Ray core:
  - Experimental direct actor calls. #5140, #5184
  - Use gRPC for Raylet communication. #5120, #5054, #5121
  - Improvements to the core worker, the shared module between Python and Java. #5079, #5034, #5062
  - The GCS (global control store) was refactored. #5058, #5050
- RLlib:
  - Finished porting all major RLlib algorithms to the builder pattern. #5277, #5258, #5249
  - `learner_queue_timeout` can be configured for the async sample optimizer (see the sketch after this list). #5270
  - `reproducible_seed` can be used for reproducible experiments. #5197
  - Added entropy coefficient decay to IMPALA, APPO, and PPO. #5043
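A sketch of the timeout option above, assuming an IMPALA run launched through Tune (the env and the 300-second value are illustrative, not defaults from this release):

```python
from ray import tune

tune.run(
    "IMPALA",
    config={
        "env": "CartPole-v0",
        # Seconds the learner waits on its sample queue before timing out
        # (configurable as of this release; the value here is illustrative).
        "learner_queue_timeout": 300,
    },
)
```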
- Tune:
  - Support nested dictionaries in `CSVLogger`, so your `Trainer._train` function can return an arbitrarily nested dictionary (see the sketch below). #5295
  - Added system performance tracking for GPU, RAM, VRAM, and CPU usage statistics. #4924
  - Faster node recovery. #5053
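A sketch of the nested-result support above, assuming a custom `tune.Trainable` (the class, metric names, and the flattened column format are illustrative):

```python
from ray import tune

class MyTrainable(tune.Trainable):
    def _setup(self, config):
        self.step_count = 0

    def _train(self):
        self.step_count += 1
        # Nested dicts are now accepted; CSVLogger flattens them into
        # columns (e.g. something like "info/learner/loss").
        return {
            "episode_reward_mean": float(self.step_count),
            "info": {"learner": {"loss": 1.0 / self.step_count}},
        }
```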
- Added a `request_cores` function for manual autoscaling. You can now manually request resources for the autoscaler. #4754
- Local cluster:
  - More readable example YAML with comments. #5290
  - Multiple cluster names are supported. #4864
- Improved logging with the AWS NodeProvider: `create_instance` calls will be logged. #4998
- SGD:
  - Example for training. #5292
  - Deprecated the old distributed SGD implementation. #5160
- Kubernetes: Added a Ray namespace for k8s. #4111
- Dev experience: Added a linting pre-push hook. #5154
We thank the following contributors for their amazing contributions:
@joneswong, @1beb, @richardliaw, @pcmoritz, @raulchen, @stephanie-wang, @jiangzihao2009, @LorenzoCevolani, @kfstorm, @pschafhalter, @micafan, @simon-mo, @vipulharsh, @haje01, @ls-daniel, @hartikainen, @stefanpantic, @edoakes, @llan-ml, @alex-petrenko, @ztangent, @gravitywp, @MQQ, @dulex123, @morgangiraud, @antoine-galataud, @robertnishihara, @qxcv, @vakker, @jovany-wang, @zhijunfu, @ericl
Please leave comments here! Note that we are still blocked on ray-project/ray#5310