@hamelsmu
Created May 14, 2019 21:09
For troubleshooting wandb

from fastai.text import TextLMDataBunch as lmdb
from fastai.text.transform import Tokenizer
import pandas as pd
from pathlib import Path

small_df = pd.read_csv('https://storage.googleapis.com/issue_label_bot/pre_processed_data/processed_part0000.csv').head(1000)

stokenizer = Tokenizer(pre_rules=[pass_through], n_cpus=30)

spath = Path('lang_model_test/')

sdata_lm = lmdb.from_df(path=spath,
                        train_df=small_df,
                        valid_df=small_df,
                        text_cols='text',
                        tokenizer=stokenizer)

slearn = language_model_learner(data=sdata_lm,
                                arch=AWD_LSTM,
                                drop_mult=.5,
                                pretrained=False)

wandb.init()

escb = EarlyStoppingCallback(learn=slearn, patience=5)
smcb = SaveModelCallback(learn=slearn)
rpcb = ReduceLROnPlateauCallback(learn=slearn, patience=3)
sgcb = ShowGraph(learn=slearn)
wandcb = WandbCallback(learn=slearn, log='all', save_model=True, monitor='valid_loss')
scallbacks = [escb, smcb, rpcb, sgcb, wandcb]

slearn.fit_one_cycle(cyc_len=1,
                     max_lr=1e-2,
                     tot_epochs=10,
                     callbacks=scallbacks)
@borisdayma

I made a few modifications to reproduce this code (some values were undefined; see the comments below):

from fastai.text import TextLMDataBunch as lmdb
from fastai.text.transform import Tokenizer
import pandas as pd
from pathlib import Path

# Added imports
from fastai.text import *
from fastai import *
from fastai.callbacks import *
import wandb
from wandb.fastai import WandbCallback

small_df = pd.read_csv('https://storage.googleapis.com/issue_label_bot/pre_processed_data/processed_part0000.csv').head(1000)

stokenizer = Tokenizer(n_cpus=30)   # Removed pass_through as it is undefined

spath = Path('lang_model_test/')

sdata_lm = lmdb.from_df(path=spath,
                        train_df=small_df,
                        valid_df=small_df,
                        text_cols='text',
                        tokenizer=stokenizer)

slearn = language_model_learner(data=sdata_lm,
                                arch=AWD_LSTM,
                                drop_mult=.5,
                                pretrained=False)

wandb.init()

escb = EarlyStoppingCallback(learn=slearn, patience=5)
smcb = SaveModelCallback(learn=slearn)
rpcb = ReduceLROnPlateauCallback(learn=slearn, patience=3)
sgcb = ShowGraph(learn=slearn)
wandcb = WandbCallback(learn=slearn, log='all', save_model=True, monitor='val_loss')   # "valid_loss" replaced with "val_loss"

scallbacks = [escb, smcb, rpcb, sgcb, wandcb]

slearn.fit_one_cycle(cyc_len=1,
                     max_lr=1e-2,
                     tot_epochs=10,
                     callbacks=scallbacks)

Also, it seems that this commit solved the issue: wandb/wandb@2f6ec39

Until the new version is released, you can clone the repo and install from it; the issue should then be fixed.

@lukas @raubitsj In this case (a variant of RNN) you would have to do graph.criterion = output[0][0].grad_fn, because output[0] is a list. At the moment the criterion will not be extracted; I'm not sure whether that's a problem or not.
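
A minimal sketch of the branching described above (a hypothetical helper, not the actual wandb source): when the first element of the model output is itself a list, as with this AWD_LSTM variant, the grad_fn has to be read one level deeper.

import torch

def extract_criterion(output):
    # Hypothetical helper: normally the criterion would be read from
    # output[0].grad_fn, but when output[0] is itself a list (the RNN
    # variant discussed here), the grad_fn lives at output[0][0].grad_fn.
    first = output[0]
    if isinstance(first, (list, tuple)):
        first = first[0]
    return first.grad_fn

# Flat case: a tuple whose first element is a tensor.
x = torch.randn(3, requires_grad=True)
print(extract_criterion((x * 2,)))    # <MulBackward0 ...>

# Nested case: a tuple whose first element is a list of tensors.
print(extract_criterion(([x * 2],)))  # <MulBackward0 ...>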

@borisdayma

@lukas @raubitsj Here is the modification that I think should be made in this case to extract the criterion:
borisdayma/client@d69319f

However, it seems it does not affect anything (not even the logged model), so I'm not exactly sure what the purpose of graph.criterion is.

@raubitsj

criterion controls code that is currently, in effect, commented out. The change you provided would be helpful when we enable that feature. If you submit this as a PR, it would be great to have a test for it; otherwise we can capture this in an issue and address it later when the code is active.

@borisdayma

Thanks, I just created wandb/wandb#355.
