Great work! In the paper, the label is also a trainable variable; in the code, however, the label is fixed to the ground truth. It looks like the earlier version (with a trainable label) needs more iterations to converge. Could you also give hints or code for training when the data and the label are both trainable?
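For reference, here is a minimal sketch of that setup, assuming a classifier `net` and the leaked gradients `origin_grad` already exist (these names are placeholders, not the repo's exact code). The trainable label is a logit vector passed through softmax so the cross-entropy stays differentiable, mirroring the paper's formulation:

```python
import torch
import torch.nn.functional as F

# Both the dummy image and the dummy label logits are optimized.
dummy_data = torch.randn(1, 3, 32, 32, requires_grad=True)  # assumed CIFAR-size input
dummy_label = torch.randn(1, 10, requires_grad=True)        # logits, not a class index

optimizer = torch.optim.LBFGS([dummy_data, dummy_label])

def closure():
    optimizer.zero_grad()
    pred = net(dummy_data)
    # Soften the trainable label so the loss is differentiable w.r.t. it.
    onehot = F.softmax(dummy_label, dim=-1)
    loss = torch.mean(torch.sum(-onehot * F.log_softmax(pred, dim=-1), dim=-1))
    dummy_grad = torch.autograd.grad(loss, net.parameters(), create_graph=True)
    # Match the dummy gradients to the leaked ones.
    grad_diff = sum(((dg - og) ** 2).sum() for dg, og in zip(dummy_grad, origin_grad))
    grad_diff.backward()
    return grad_diff

for _ in range(300):
    optimizer.step(closure)
```

After convergence, `dummy_label.argmax(dim=-1)` recovers the class prediction alongside the reconstructed image.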
Hi, great work! Is there any code for batched data?
I have experimented on ResNet, based on the code shared at https://github.com/mit-han-lab/dlg, and I followed the paper's advice: remove the strides and replace the ReLU activations with Sigmoid. But I found that DLG still fails on ResNet. Could you please show me the code for ResNet?
Or, if anyone has experimented on ResNet, could you show me how you did it?
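For what it's worth, here is a hedged sketch of the modification the paper suggests, applied to torchvision's resnet18 (the model choice and `num_classes` are my assumptions): replace every ReLU with Sigmoid and set all strides to 1. Whether DLG then converges on it is a separate question.

```python
import torch.nn as nn
from torchvision.models import resnet18

net = resnet18(num_classes=10)  # assumed 10-class setup

def patch(module):
    """Recursively swap ReLU -> Sigmoid and remove striding."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.Sigmoid())
        elif isinstance(child, nn.Conv2d):
            child.stride = (1, 1)  # forward() reads self.stride, so this takes effect
        elif isinstance(child, nn.MaxPool2d):
            child.stride = 1
        patch(child)

patch(net)
```

Since the 1x1 downsample convolutions are also set to stride 1, the residual-branch shapes still match after the patch.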
Hi, with this code the loss suddenly explodes after a few dozen iterations. Have you run into this? Here is a log from one run:
0: 117.4059, 10: 4.3706, 20: 0.2128, 30: 0.0191, 40: 0.0050, 50: 0.0022, 60: 0.0030, 70: 0.0008, 80: 0.0004, then from iteration 90 through 290 the loss is stuck at 213.8976.
@815961618 Not sure if you have solved it yet, but commenting out the manual seed (torch.manual_seed) fixed it for me.
Have you resolved this? I'm also getting stuck here @Harxis @Lyken17
@815961618 I think that is an issue with L-BFGS. I am now testing Adam (which is more robust). Will update a new version soon.
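Until that update lands, a minimal sketch of the Adam variant might look like the following (my assumption, not the promised code; `net`, `origin_grad`, and the ground-truth label `gt_label` are assumed to be set up as in the repo). Adam avoids the L-BFGS line-search blowups but typically needs far more iterations:

```python
import torch
import torch.nn.functional as F

dummy_data = torch.randn(1, 3, 32, 32, requires_grad=True)  # assumed input shape
optimizer = torch.optim.Adam([dummy_data], lr=0.01)

for it in range(10000):
    optimizer.zero_grad()
    loss = F.cross_entropy(net(dummy_data), gt_label)
    dummy_grad = torch.autograd.grad(loss, net.parameters(), create_graph=True)
    # Match the dummy gradients to the leaked ones.
    grad_diff = sum(((dg - og) ** 2).sum() for dg, og in zip(dummy_grad, origin_grad))
    grad_diff.backward()
    optimizer.step()
    if it % 100 == 0:
        print(it, grad_diff.item())
```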