@bveeramani
Created January 12, 2023 22:25
ray on keras-callbacks [$?] via 🐍 v3.10.8 (.venv)
❯ python repro.py --smoke-test
/Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pl_bolts/models/self_supervised/amdim/amdim_module.py:35: UnderReviewWarning: The feature generate_power_seq is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
"lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
/Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pl_bolts/models/self_supervised/amdim/amdim_module.py:93: UnderReviewWarning: The feature FeatureMapContrastiveTask is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
/Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pl_bolts/losses/self_supervised_learning.py:234: UnderReviewWarning: The feature AmdimNCELoss is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
self.nce_loss = AmdimNCELoss(tclip)
/Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pl_bolts/callbacks/vision/confused_logit.py:16: UnderReviewWarning: The feature warn_missing_pkg is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
warn_missing_pkg("matplotlib")
2023-01-12 14:22:01,477 INFO worker.py:1546 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
== Status ==
Current time: 2023-01-12 14:22:08 (running for 00:00:04.02)
Memory usage on this node: 13.9/64.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/10 CPUs, 0/0 GPUs, 0.0/44.24 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/balaji/ray_results/ray_2023-01-12-14-21-59
Number of trials: 1/1 (1 RUNNING)
+---------------------------+----------+-----------------+--------------+-----------+-----------+--------+
| Trial name                | status   | loc             |   batch_size |   layer_1 |   layer_2 |     lr |
|---------------------------+----------+-----------------+--------------+-----------+-----------+--------|
| train_lm_tune_8aa49_00000 | RUNNING  | 127.0.0.1:31409 |           32 |        32 |        64 | 0.0001 |
+---------------------------+----------+-----------------+--------------+-----------+-----------+--------+
(train_lm_tune pid=31409) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
(pid=31409) /Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pl_bolts/models/self_supervised/amdim/amdim_module.py:35: UnderReviewWarning: The feature generate_power_seq is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
(pid=31409) "lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
(pid=31409) /Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pl_bolts/models/self_supervised/amdim/amdim_module.py:93: UnderReviewWarning: The feature FeatureMapContrastiveTask is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
(pid=31409) contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
(pid=31409) /Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pl_bolts/losses/self_supervised_learning.py:234: UnderReviewWarning: The feature AmdimNCELoss is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
(pid=31409) self.nce_loss = AmdimNCELoss(tclip)
(pid=31409) /Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pl_bolts/callbacks/vision/confused_logit.py:16: UnderReviewWarning: The feature warn_missing_pkg is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
(pid=31409) warn_missing_pkg("matplotlib")
(train_lm_tune pid=31409) /Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:441: LightningDeprecationWarning: Setting `Trainer(gpus=0)` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=0)` instead.
(train_lm_tune pid=31409) rank_zero_deprecation(
(train_lm_tune pid=31409) GPU available: True (mps), used: False
(train_lm_tune pid=31409) TPU available: False, using: 0 TPU cores
(train_lm_tune pid=31409) IPU available: False, using: 0 IPUs
(train_lm_tune pid=31409) HPU available: False, using: 0 HPUs
(train_lm_tune pid=31409) /Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/setup.py:200: UserWarning: MPS available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='mps', devices=1)`.
(train_lm_tune pid=31409) rank_zero_warn(
(train_lm_tune pid=31409) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw/train-images-idx3-ubyte.gz
(train_lm_tune pid=31409) Extracting /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw/train-images-idx3-ubyte.gz to /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw
100%|██████████| 9912422/9912422 [00:01<00:00, 6301259.66it/s]
(train_lm_tune pid=31409)
(train_lm_tune pid=31409) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
(train_lm_tune pid=31409) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw/train-labels-idx1-ubyte.gz
(train_lm_tune pid=31409) Extracting /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw/train-labels-idx1-ubyte.gz to /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw
(train_lm_tune pid=31409)
(train_lm_tune pid=31409) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
(train_lm_tune pid=31409) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 8521680.89it/s]
(train_lm_tune pid=31409) Extracting /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw/t10k-images-idx3-ubyte.gz to /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw
(train_lm_tune pid=31409)
(train_lm_tune pid=31409) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
(train_lm_tune pid=31409) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 4612848.41it/s]
(train_lm_tune pid=31409) Extracting /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw/t10k-labels-idx1-ubyte.gz to /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/data/MNIST/raw
(train_lm_tune pid=31409)
100%|██████████| 4542/4542 [00:00<00:00, 56030966.96it/s]
(train_lm_tune pid=31409) Missing logger folder: /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04/lightning_logs
(train_lm_tune pid=31409)
(train_lm_tune pid=31409)   | Name     | Type               | Params
(train_lm_tune pid=31409) ------------------------------------------------
(train_lm_tune pid=31409) 0 | layer_1  | Linear             | 25.1 K
(train_lm_tune pid=31409) 1 | layer_2  | Linear             | 2.1 K
(train_lm_tune pid=31409) 2 | layer_3  | Linear             | 650
(train_lm_tune pid=31409) 3 | accuracy | MulticlassAccuracy | 0
(train_lm_tune pid=31409) ------------------------------------------------
(train_lm_tune pid=31409) 27.9 K    Trainable params
(train_lm_tune pid=31409) 0         Non-trainable params
(train_lm_tune pid=31409) 27.9 K    Total params
(train_lm_tune pid=31409) 0.112 Total estimated model params size (MB)
(train_lm_tune pid=31409) /Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 10 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
(train_lm_tune pid=31409) rank_zero_warn(
(train_lm_tune pid=31409) /Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 10 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
(train_lm_tune pid=31409) rank_zero_warn(
== Status ==
Current time: 2023-01-12 14:22:13 (running for 00:00:09.04)
Memory usage on this node: 14.4/64.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/10 CPUs, 0/0 GPUs, 0.0/44.24 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/balaji/ray_results/ray_2023-01-12-14-21-59
Number of trials: 1/1 (1 RUNNING)
+---------------------------+----------+-----------------+--------------+-----------+-----------+--------+
| Trial name                | status   | loc             |   batch_size |   layer_1 |   layer_2 |     lr |
|---------------------------+----------+-----------------+--------------+-----------+-----------+--------|
| train_lm_tune_8aa49_00000 | RUNNING  | 127.0.0.1:31409 |           32 |        32 |        64 | 0.0001 |
+---------------------------+----------+-----------------+--------------+-----------+-----------+--------+
Result for train_lm_tune_8aa49_00000:
acc: 0.8759166598320007
date: 2023-01-12_14-22-15
done: false
experiment_id: 0cf0ad0dbe5a422ebdab74e7ea9cd2d8
hostname: Balajis-MacBook-Pro-16
iterations_since_restore: 1
loss: 0.4731682538986206
node_ip: 127.0.0.1
pid: 31409
time_since_restore: 7.434058904647827
time_this_iter_s: 7.434058904647827
time_total_s: 7.434058904647827
timestamp: 1673562135
timesteps_since_restore: 0
training_iteration: 1
trial_id: 8aa49_00000
warmup_time: 0.0021829605102539062
(train_lm_tune pid=31409) `Trainer.fit` stopped: `max_epochs=1` reached.
Trial train_lm_tune_8aa49_00000 completed.
2023-01-12 14:22:15,863 WARNING trial_runner.py:295 -- Experiment checkpoint syncing has been triggered multiple times in the last 30.0 seconds. A sync will be triggered whenever a trial has checkpointed more than `num_to_keep` times since last sync or if 300 seconds have passed since last sync. If you have set `num_to_keep` in your `CheckpointConfig`, consider increasing the checkpoint frequency or keeping more checkpoints. You can supress this warning by changing the `TUNE_WARN_EXCESSIVE_EXPERIMENT_CHECKPOINT_SYNC_THRESHOLD_S` environment variable.
== Status ==
Current time: 2023-01-12 14:22:30 (running for 00:00:26.41)
Memory usage on this node: 14.5/64.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/10 CPUs, 0/0 GPUs, 0.0/44.24 GiB heap, 0.0/2.0 GiB objects
Current best trial: 8aa49_00000 with loss=0.4731682538986206 and parameters={'layer_1': 32, 'layer_2': 64, 'lr': 0.0001, 'batch_size': 32}
Result logdir: /Users/balaji/ray_results/ray_2023-01-12-14-21-59
Number of trials: 1/1 (1 TERMINATED)
+---------------------------+------------+-----------------+--------------+-----------+-----------+--------+--------+------------------+----------+----------+
| Trial name                | status     | loc             |   batch_size |   layer_1 |   layer_2 |     lr |   iter |   total time (s) |     loss |      acc |
|---------------------------+------------+-----------------+--------------+-----------+-----------+--------+--------+------------------+----------+----------|
| train_lm_tune_8aa49_00000 | TERMINATED | 127.0.0.1:31409 |           32 |        32 |        64 | 0.0001 |      1 |          7.43406 | 0.473168 | 0.875917 |
+---------------------------+------------+-----------------+--------------+-----------+-----------+--------+--------+------------------+----------+----------+
2023-01-12 14:22:30,484 INFO tune.py:774 -- Total run time: 26.78 seconds (11.81 seconds for the tuning loop).
Best hyperparameters found were: {'layer_1': 32, 'layer_2': 64, 'lr': 0.0001, 'batch_size': 32}
Best model found were: /Users/balaji/ray_results/ray_2023-01-12-14-21-59/train_lm_tune_8aa49_00000_0_batch_size=32,layer_1=32,layer_2=64,lr=0.0001_2023-01-12_14-22-04
(train_lm_tune pid=31409) /opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
(train_lm_tune pid=31409) warnings.warn('resource_tracker: There appear to be %d '