- First attempt at using Torch for some type of "deep" learning
- Take advantage of Modal to access serverless Python compute, including GPUs
I find the Torch for R documentation excellent at explaining concepts, so my goal was to follow their getting started guide, but using Python instead of R.
Because I don't have ready access to a GPU, and I wanted training to be fast despite my slow internet connection, I decided to use Modal for execution.
The structure of the code is a Modal app (sketched below), which consists of:
- an image, where the Python dependencies are specified
- an entrypoint, which calls various Modal functions:
  - the functions we want to evaluate remotely
- a Modal volume, which is a persistent disk
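As a concrete illustration, here is a minimal sketch of that shape. The app name, volume name, GPU type, and function body are placeholders I've filled in, not the actual experiment code (which, per the run log below, also defines `get_dataset`, `unzip_train`, and `unzip`), and Modal's API details may differ between versions:

```python
import modal

app = modal.App("torch-experiments")  # placeholder app name

# The image, where the Python deps are specified
image = modal.Image.debian_slim().pip_install("torch", "torchvision")

# The Modal volume: a persistent disk that survives between runs
volume = modal.Volume.from_name("scatterplot-cache", create_if_missing=True)

# A function we want to evaluate remotely; gpu= requests an accelerator
@app.function(image=image, volumes={"/data": volume}, gpu="T4")
def train_my_cnn():
    import torch  # imported inside the function so it resolves in the remote image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

# The entrypoint, which runs locally and calls the remote functions
@app.local_entrypoint()
def main():
    train_my_cnn.remote()
```

Running `modal run r_get_started_to_py.py` then executes `main()` locally while `train_my_cnn` runs on Modal's infrastructure.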
The intent of the code is to:
- download a bunch of images of x/y scatterplots from Kaggle, along with metadata giving the true correlation shown in each image
- cache the downloaded images in a Modal volume
- create a simple CNN following the R tutorial (see the sketch after this list)
- train the CNN on both a CPU and a GPU
  - running on the CPU took ~300s
  - running on the GPU took ~40s
- the loss was jumping all over the place, which isn't surprising since the model structure is garbage ... but at least it runs!
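To make that concrete, here is a rough sketch of what a "simple CNN plus training loop" of this shape can look like. Everything here is an assumption for illustration (64x64 grayscale inputs, the layer sizes, Adam with lr=1e-3, the random stand-in data); it is not the actual script, though the real one prints the same `BATCH:` / `Current loss is:` lines visible in the run log below:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class CorrelationCNN(nn.Module):
    """A deliberately simple CNN that regresses a correlation value from a scatterplot image."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 channel: grayscale plots
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16 * 16, 1))

    def forward(self, x):
        return self.head(self.features(x))

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Stand-in data; the real script reads the Kaggle images cached in the volume
images = torch.randn(640, 1, 64, 64)
targets = torch.rand(640) * 2 - 1  # true correlations lie in [-1, 1]
loader = DataLoader(TensorDataset(images, targets), batch_size=64)

model = CorrelationCNN().to(device)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for batch_idx, (x, y) in enumerate(loader, start=1):
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    pred = model(x).squeeze(1)  # (batch, 1) -> (batch,) so predictions match y exactly
    loss = loss_fn(pred, y)
    loss.backward()
    optimizer.step()
    print(f"BATCH: {batch_idx}")
    print(f"Current loss is: {loss.item()}")
```

One detail worth flagging: in the log below, the untrained model output prints as a 2-D tensor while the true values are 1-D. If the real script skipped something like the `squeeze` above, `MSELoss` would broadcast across mismatched shapes, which could plausibly contribute to the wildly jumping loss.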
Here's the output of a full run:

```
(.venv) lopp@Seans-MacBook-Pro torch_experiments % modal run r_get_started_to_py.py
✓ Initialized. View run at https://modal.com/apps/slopp/main/ap-ORU5nv0uFhIhCDqs4AHtRW
✓ Created objects.
├── 🔨 Created mount /Users/lopp/Projects/torch_experiments/r_get_started_to_py.py
├── 🔨 Created mount PythonPackage:_remote_module_non_scriptable
├── 🔨 Created function get_dataset.
├── 🔨 Created function unzip_train.
├── 🔨 Created function unzip.
└── 🔨 Created function train_my_cnn.
Getting started
Using device: cuda
Test model output with no training: tensor([[-0.3558, 0.1729, -0.2098, ..., -0.2338, 0.0097, 0.2568],
[-0.3577, 0.2050, -0.2150, ..., -0.2148, -0.0254, 0.1179],
[-0.3023, 0.2556, -0.1702, ..., -0.2177, 0.0264, 0.1928],
...,
[-0.3042, 0.1326, -0.1455, ..., -0.2130, 0.0060, 0.1976],
[-0.3430, 0.2401, -0.1778, ..., -0.1850, -0.0759, 0.1300],
[-0.3457, 0.1536, -0.1847, ..., -0.1420, -0.0363, 0.2074]],
device='cuda:0', grad_fn=<AddmmBackward0>)
True values: tensor([-0.4578, -0.5231, -0.1790, 0.2515, 0.3540, 0.8361, -0.3141, -0.1900,
-0.0079, 0.5127, -0.6961, -0.6385, 0.3890, 0.7433, 0.3033, 0.5869,
0.4751, 0.6581, -0.2687, 0.1978, 0.4256, 0.2940, -0.3028, -0.4231,
0.0289, 0.1835, 0.6971, 0.1370, 0.6549, 0.2446, -0.2694, -0.4043,
-0.6028, 0.5621, -0.3291, 0.6700, -0.2683, -0.0634, 0.0409, 0.2124,
-0.2689, 0.4382, 0.1959, -0.5694, 0.4116, -0.5908, 0.2020, -0.5625,
-0.1433, -0.5540, 0.3926, -0.2108, 0.8185, -0.5357, 0.2938, 0.4767,
0.3512, -0.1851, -0.5930, 0.0535, 0.4559, 0.4891, 0.3569, 0.1135],
device='cuda:0')
MSE with no training: 0.24444007873535156
BATCH: 1
Current loss is: 0.24444007873535156
BATCH: 2
Current loss is: 21744.98828125
BATCH: 3
Current loss is: 1392.033447265625
BATCH: 4
Current loss is: 4206.7177734375
BATCH: 5
Current loss is: 11732.6806640625
BATCH: 6
Current loss is: 9478.251953125
BATCH: 7
Current loss is: 3385.68115234375
BATCH: 8
Current loss is: 88.81295776367188
BATCH: 9
Current loss is: 1193.997802734375
BATCH: 10
Current loss is: 4003.126220703125
BATCH: 11
Current loss is: 5196.607421875
BATCH: 12
Current loss is: 4032.54931640625
BATCH: 13
Current loss is: 1814.0806884765625
BATCH: 14
Current loss is: 245.66452026367188
BATCH: 15
Current loss is: 123.73826599121094
BATCH: 16
Current loss is: 1057.9072265625
BATCH: 17
Current loss is: 2032.654541015625
BATCH: 18
Current loss is: 2302.048828125
BATCH: 19
Current loss is: 1725.1231689453125
BATCH: 20
Current loss is: 806.3592529296875
BATCH: 21
Current loss is: 139.1751251220703
BATCH: 22
Current loss is: 26.688827514648438
BATCH: 23
Current loss is: 370.3856201171875
BATCH: 24
Current loss is: 812.04833984375
BATCH: 25
Current loss is: 1035.07373046875
BATCH: 26
Current loss is: 882.0335693359375
BATCH: 27
Current loss is: 512.5167236328125
BATCH: 28
Current loss is: 161.7069091796875
BATCH: 29
Current loss is: 3.03468918800354
Took 20.191120147705078 seconds
BATCH: 30
Current loss is: 79.19509887695312
MSE at start: 0.24444007873535156
MSE at end: 283.2290344238281
```