@simon-mo
Last active July 31, 2019 18:12
Release note draft

Ray 0.7.3 Release Note

Highlights

  • The RLlib ModelV2 API is ready to use (see the sketch below). It improves support for Keras and RNN models and allows object-oriented reuse of variables. The ModelV1 API is deprecated, but no migration is needed.
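
A minimal sketch of a custom ModelV2 model (this follows the custom-model pattern in the RLlib docs around this release; the layer sizes and the "my_modelv2" registration name are illustrative):

import tensorflow as tf
from ray.rllib.models import ModelCatalog
from ray.rllib.models.tf.tf_modelv2 import TFModelV2

class MyModelV2(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        super(MyModelV2, self).__init__(obs_space, action_space, num_outputs,
                                        model_config, name)
        # Variables are created once here and reused across calls
        # (the object-oriented reuse mentioned above).
        inputs = tf.keras.layers.Input(shape=obs_space.shape)
        hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
        logits = tf.keras.layers.Dense(num_outputs)(hidden)
        value = tf.keras.layers.Dense(1)(hidden)
        self.base_model = tf.keras.Model(inputs, [logits, value])
        self.register_variables(self.base_model.variables)

    def forward(self, input_dict, state, seq_lens):
        logits, self._value_out = self.base_model(input_dict["obs"])
        return logits, state

    def value_function(self):
        return tf.reshape(self._value_out, [-1])

ModelCatalog.register_custom_model("my_modelv2", MyModelV2)

The registered model is then selected with config={"model": {"custom_model": "my_modelv2"}}.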

  • ray.experimental.sgd.pytorch.PyTorchTrainer is ready for early adopters. Check out the doc here, and we welcome your feedback!

from ray.experimental.sgd.pytorch import PyTorchTrainer

model_creator = lambda config: YourPyTorchModel()
# The data creator must return a (train_set, validation_set) pair.
data_creator = lambda config: (YourTrainingSet(), YourValidationSet())

trainer = PyTorchTrainer(
    model_creator,
    data_creator,
    optimizer_creator=utils.sgd_mse_optimizer,  # provided with the SGD utilities
    config={"lr": 1e-4},
    num_replicas=2,
    resources_per_replica=Resources(num_gpus=1),
    batch_size=16,
    backend="auto")

for i in range(NUM_EPOCHS):
    trainer.train()

  • A jobs table was added to the state API. You can query all the clients that have called ray.init to connect to the current cluster. #5076
>>> ray.state.jobs()
[{'JobID': '02000000',
  'NodeManagerAddress': '10.99.88.77',
  'DriverPid': 74949,
  'StartTime': 1564168784,
  'StopTime': 1564168798},
 {'JobID': '01000000',
  'NodeManagerAddress': '10.99.88.77',
  'DriverPid': 74871,
  'StartTime': 1564168742}]

Core

  • Improved memory store handling. #5143, #5216, #4893
  • Improved workflow:
    • The local_mode debugging tool now behaves more consistently with Ray's default mode (see the sketch after this list). #5060
    • Improved KeyboardInterrupt exception handling; the resulting stack trace was reduced from 115 lines to 22 lines. #5237
  • Ray core:
    • Experimental direct actor calls. #5140, #5184
    • Raylet communication now uses gRPC. #5120, #5054, #5121
    • Improvements to the core worker, the module shared between Python and Java. #5079, #5034, #5062
    • Refactored the GCS (Global Control Store). #5058, #5050
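
A minimal sketch of the local_mode workflow referenced above (the remote function is illustrative):

import ray

# local_mode runs tasks serially in the driver process, so pdb and
# print-based debugging behave like ordinary Python.
ray.init(local_mode=True)

@ray.remote
def square(x):
    return x * x

assert ray.get(square.remote(4)) == 16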

RLlib

  • Finished porting all major RLlib algorithms to the trainer builder pattern. #5277, #5258, #5249
  • learner_queue_timeout can be configured for the async sample optimizer. #5270
  • reproducible_seed can be used for reproducible experiments. #5197
  • Added entropy coefficient decay to IMPALA, APPO, and PPO (see the sketch after this list). #5043
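
A minimal sketch of how these options could be passed through a trainer config (learner_queue_timeout is named above; the entropy_coeff_schedule key and all values are assumptions, and CartPole-v0 is a placeholder environment):

from ray import tune

tune.run(
    "IMPALA",
    config={
        "env": "CartPole-v0",
        # Seconds to wait on the learner queue before timing out (#5270).
        "learner_queue_timeout": 300,
        # Decay the entropy bonus from 0.01 to 0 over 1M timesteps (#5043);
        # the exact key name here is an assumption.
        "entropy_coeff_schedule": [(0, 0.01), (1000000, 0.0)],
    },
)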

Tune

  • Support nested dictionaries in CSVLogger, so your Trainable._train function can return arbitrarily nested dictionaries (see the sketch after this list). #5295
  • Added system performance tracking for GPU, RAM, VRAM, and CPU usage statistics. #4924
  • Faster node recovery. #5053
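
A minimal sketch of a Trainable returning a nested result dictionary (the metric names are illustrative):

from ray import tune

class MyTrainable(tune.Trainable):
    def _setup(self, config):
        self.iter = 0

    def _train(self):
        self.iter += 1
        # Nested dictionaries are flattened by CSVLogger (#5295).
        return {
            "loss": 1.0 / self.iter,
            "stats": {"grad_norm": 0.1, "timing": {"forward_s": 0.02}},
        }

tune.run(MyTrainable, stop={"training_iteration": 3})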

Autoscaler

  • Added a request_cores function for manual autoscaling. You can now manually request resources from the autoscaler. #4754
  • Local cluster:
    • More readable example YAML with comments. #5290
    • Multiple cluster names are supported. #4864
  • Improved logging with the AWS NodeProvider. create_instance calls are now logged. #4998

Other Libraries

  • SGD:
    • Added a training example. #5292
    • Deprecated the old distributed SGD implementation. #5160
  • Kubernetes: A Ray namespace was added for Kubernetes. #4111
  • Dev experience: Added a linting pre-push hook. #5154

Thanks

We thank the following contributors for their amazing contributions:

@joneswong, @1beb, @richardliaw, @pcmoritz, @raulchen, @stephanie-wang, @jiangzihao2009, @LorenzoCevolani, @kfstorm, @pschafhalter, @micafan, @simon-mo, @vipulharsh, @haje01, @ls-daniel, @hartikainen, @stefanpantic, @edoakes, @llan-ml, @alex-petrenko, @ztangent, @gravitywp, @MQQ, @dulex123, @morgangiraud, @antoine-galataud, @robertnishihara, @qxcv, @vakker, @jovany-wang, @zhijunfu, @ericl

@simon-mo
Author

Please leave comments here! Note that we are still blocked on ray-project/ray#5310

@robertnishihara

  • "ray.init" -> ray.init
  • use Python syntax highlighting

@ericl

ericl commented Jul 30, 2019

  • ModelV2 API for RLlib, which improves support for Keras and RNN models, as well as allowing object-oriented reuse of variables
  • Finished port of all major RLlib algorithms to builder pattern

@simon-mo
Author

@ericl would you say ModelV2 API is still experimental at this point?

@ericl

ericl commented Jul 30, 2019

No, it's intended for production use. ModelV1 is deprecated at this point.

@richardliaw

richardliaw commented Jul 31, 2019

  • Syncing behavior between head and workers can now be customized (sync_to_driver). Syncing behavior (upload_dir) between cluster and cloud is now separately customizable (sync_to_cloud). This changes the structure of the uploaded directory - now local_dir is synced with upload_dir. #4450
  • BREAKING: ExperimentAnalysis is now returned by default from tune.run. To obtain a list of trials, use analysis.trials. (#5115)
  • The Analysis object will now return all trials in a folder; ExperimentAnalysis is a subclass that returns all trials of an experiment. (#5115)
  • Bug fix: Tune CLI sorting is fixed
  • Added the missing keep_checkpoints_num argument to Tune (#5117)
  • Trials on failed nodes will be prioritized in processing (#5053)
  • Trial Checkpointing is now more flexible (#4728)
  • Add system performance tracking for gpu, ram, vram, cpu usage statistics - toggle with tune.run(log_sys_usage=True) (#4924)
  • Experiment checkpointing frequency is now less frequent and can be controlled with tune.run(global_checkpoint_period=...). (#4859)
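
A minimal sketch of the new return value of tune.run described above (the function trainable is illustrative, and the dataframe() accessor is assumed to be available on ExperimentAnalysis):

from ray import tune

def train_fn(config, reporter):
    for i in range(3):
        reporter(timesteps_total=i, mean_accuracy=i / 3.0)

analysis = tune.run(train_fn)   # now returns an ExperimentAnalysis (#5115)
trials = analysis.trials        # list of Trial objects
df = analysis.dataframe()       # per-trial results; availability assumed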
