@taylanbil
Created August 30, 2019 05:48
This file has been truncated, but you can view the full file.
Mon Aug 26 16:15:47 UTC 2019
#!/bin/bash
batch_size=512
n_words=64
data_path=/home/taylanbil/data/wmt18_en_de_bpej32k
#data_path=/home/taylanbil/data/dummy
tensors_dir=/home/taylanbil/tensors
taskname=fairseq_transformer
#conda activate pytorch
#pkill -9 python
TPU_IP_ADDRESS=10.1.2.2 # nightly
#TPU_IP_ADDRESS=10.1.4.2 # nightly
#export XLA_USE_32BIT_LONG=1
#export XLA_IR_DEBUG=1
#export XLA_HLO_DEBUG=1
#export GET_TENSORS_OPBYOP=1
#export SYNC_TENSORS_OPBYOP=1
#export XLA_SAVE_TENSORS_FILE=$tensors_dir/${taskname}_tensors_hlo.txt
#export XLA_SAVE_TENSORS_FMT=hlo
#export TRIM_GRAPH_SIZE=50000
#export XLA_SYNC_WAIT=1
export XLA_USE_BF16=1
export XRT_TPU_CONFIG="tpu_worker;0;$TPU_IP_ADDRESS:8470"
# note: --max-tokens has no effect w/ TPUs
other_flags="
--disable-validation \
--encoder-embed-dim=128 \
--decoder-embed-dim=128 \
--max-tokens=4096 \
--curriculum=4 \
--num-workers=16 \
--decoder-normalize-before \
"
#LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 python tpu-examples/fairseq_train_tpu.py \
python tpu-examples/fairseq_train_tpu.py \
$data_path \
--arch=transformer_vaswani_wmt_en_de_big \
--max-sentences=$batch_size \
--max-sentences-valid=$batch_size \
--max-source-positions=$n_words \
--max-target-positions=$n_words \
--required-batch-size-multiple=$batch_size \
--no-save \
--attention-dropout=0.1 \
--no-progress-bar \
--criterion=label_smoothed_cross_entropy \
--log-interval=100 \
--source-lang=en \
--lr-scheduler=inverse_sqrt \
--min-lr 1e-09 \
--skip-invalid-size-inputs-valid-test \
--target-lang=de \
--label-smoothing=0.1 \
--update-freq=1 \
--optimizer adam \
--adam-betas '(0.9, 0.98)' \
--warmup-init-lr 1e-07 \
--lr 0.0005 \
--warmup-updates 4000 \
--share-all-embeddings \
--dropout 0.3 \
--weight-decay 0.0 \
--valid-subset=valid \
--max-epoch=50 \
--num_cores=8 \
--metrics_debug \
--pad_to_length=$n_words \
--log_steps=100
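The `--lr-scheduler=inverse_sqrt` settings in the command (`--warmup-init-lr 1e-07`, `--lr 0.0005`, `--warmup-updates 4000`) describe a linear warmup followed by inverse-square-root decay. The following is a minimal sketch of that schedule. It assumes the schedule mirrors fairseq's `inverse_sqrt` scheduler of this period, and the helper name `inverse_sqrt_lr` is hypothetical. The sketch reproduces the `lr 0.000188562` that the epoch-1 summary reports at `num_updates 1508`.

```python
import math


def inverse_sqrt_lr(num_updates, lr=5e-4, warmup_init_lr=1e-7, warmup_updates=4000):
    """Learning rate after `num_updates` optimizer steps (hypothetical helper).

    Assumed schedule: linear warmup from warmup_init_lr to lr over
    warmup_updates steps, then decay proportional to 1/sqrt(num_updates).
    """
    if num_updates < warmup_updates:
        lr_step = (lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + num_updates * lr_step
    # chosen so the decay branch starts exactly at `lr` when warmup ends
    decay_factor = lr * math.sqrt(warmup_updates)
    return decay_factor / math.sqrt(num_updates)


if __name__ == "__main__":
    for step in (0, 1508, 4000, 16000):
        print(step, f"{inverse_sqrt_lr(step):.6g}")
```

The first epoch ends at update 1508, so this run never leaves the warmup branch during epoch 1.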
--------------
nohup: ignoring input
2019-08-26 16:15:47.750892: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:196] XRT device (LOCAL) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-08-26 16:15:47.750929: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:196] XRT device (LOCAL) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-08-26 16:15:47.750936: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:196] XRT device (LOCAL) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-08-26 16:15:47.750943: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:196] XRT device (LOCAL) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-08-26 16:15:47.750948: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:196] XRT device (LOCAL) TPU:3 -> /job:tpu_worker/replica:0/task:0/device:TPU:3
2019-08-26 16:15:47.750954: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:196] XRT device (LOCAL) TPU:4 -> /job:tpu_worker/replica:0/task:0/device:TPU:4
2019-08-26 16:15:47.750959: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:196] XRT device (LOCAL) TPU:5 -> /job:tpu_worker/replica:0/task:0/device:TPU:5
2019-08-26 16:15:47.750965: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:196] XRT device (LOCAL) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2019-08-26 16:15:47.750971: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:196] XRT device (LOCAL) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2019-08-26 16:15:47.750995: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:200] Worker grpc://10.1.2.2:8470 for /job:tpu_worker/replica:0/task:0
2019-08-26 16:15:47.751001: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:204] XRT default device: TPU:0
2019-08-26 16:15:47.753784: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1086] Configuring TPU for worker tpu_worker:0 at grpc://10.1.2.2:8470
2019-08-26 16:15:50.353795: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1102] TPU topology: mesh_shape: 2
mesh_shape: 2
mesh_shape: 2
num_tasks: 1
num_tpu_devices_per_task: 8
device_coordinates: 0
device_coordinates: 0
device_coordinates: 0
device_coordinates: 0
device_coordinates: 0
device_coordinates: 1
device_coordinates: 0
device_coordinates: 1
device_coordinates: 0
device_coordinates: 0
device_coordinates: 1
device_coordinates: 1
device_coordinates: 1
device_coordinates: 0
device_coordinates: 0
device_coordinates: 1
device_coordinates: 0
device_coordinates: 1
device_coordinates: 1
device_coordinates: 1
device_coordinates: 0
device_coordinates: 1
device_coordinates: 1
device_coordinates: 1
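The topology dump prints `mesh_shape` three times (a 2×2×2 mesh), followed by 24 flattened `device_coordinates` values, three for each of the 8 devices. The following is a small sketch that regroups the printed values into one (x, y, z) tuple per device. It assumes the flat list is ordered device by device, and `group_coordinates` is a hypothetical helper. The result shows that the eight devices occupy every point of the mesh.

```python
def group_coordinates(mesh_shape, flat_coords):
    """Split a flat coordinate list into one tuple per device (hypothetical helper)."""
    ndim = len(mesh_shape)
    if len(flat_coords) % ndim:
        raise ValueError("coordinate count is not a multiple of the mesh rank")
    devices = [tuple(flat_coords[i:i + ndim]) for i in range(0, len(flat_coords), ndim)]
    # sanity check: every coordinate must lie inside the mesh
    for dev in devices:
        if any(not 0 <= c < size for c, size in zip(dev, mesh_shape)):
            raise ValueError(f"coordinate {dev} outside mesh {mesh_shape}")
    return devices


# Values copied from the topology dump above.
MESH_SHAPE = [2, 2, 2]
DEVICE_COORDINATES = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1,
                      1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1]

if __name__ == "__main__":
    for index, coord in enumerate(group_coordinates(MESH_SHAPE, DEVICE_COORDINATES)):
        print(index, coord)
```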
2019-08-26 16:15:54.074240: I torch_xla/csrc/tensor_util.cpp:27] Using BF16 data type for floating point values
batch sizes: [1024, 512, 256]
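The per-step shapes logged during training (`[1024, 16]`, `[512, 32]`, `[256, 64]`) all contain 16,384 token slots. This suggests that each bucket trades batch size against padded length, keeping the tensor size fixed. The following is a minimal sketch of that relationship. It assumes length buckets that halve from `--pad_to_length=64` and a 16,384-token budget inferred from the log. `bucket_batch_sizes` and `pick_bucket` are hypothetical helpers and are not taken from `fairseq_train_tpu.py`.

```python
def bucket_batch_sizes(pad_to_length=64, num_buckets=3, token_budget=16384):
    """Return sorted (padded_length, batch_size) pairs with a constant token budget.

    Hypothetical: lengths halve starting from pad_to_length, and each batch size
    is token_budget // length, so every padded batch has the same number of slots.
    """
    buckets = []
    length = pad_to_length
    for _ in range(num_buckets):
        buckets.append((length, token_budget // length))
        length //= 2
    # shortest bucket first, matching the order of the logged batch sizes
    return sorted(buckets)


def pick_bucket(seq_len, buckets):
    """Smallest bucket whose padded length fits seq_len."""
    for length, batch_size in buckets:
        if seq_len <= length:
            return length, batch_size
    raise ValueError(f"sequence of length {seq_len} exceeds every bucket")


if __name__ == "__main__":
    buckets = bucket_batch_sizes()
    print("batch sizes:", [bs for _, bs in buckets])
    print(pick_bucket(20, buckets))
```

Keeping the shapes to a small fixed set matters on XLA devices because each new input shape triggers a fresh graph compilation.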
| [en] dictionary: 35662 types
| [de] dictionary: 35662 types
| /home/taylanbil/data/wmt18_en_de_bpej32k valid en-de 52385 examples
TransformerModel(
  (encoder): TransformerEncoder(
    (embed_tokens): Embedding(35662, 1024, padding_idx=1)
    (embed_positions): SinusoidalPositionalEmbedding()
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
  (decoder): TransformerDecoder(
    (embed_tokens): Embedding(35662, 1024, padding_idx=1)
    (embed_positions): SinusoidalPositionalEmbedding()
    (layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
)
| model transformer_vaswani_wmt_en_de_big, criterion LabelSmoothedCrossEntropyCriterion
| num. model params: 212875264 (num. trained: 212875264)
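The reported 212,875,264 parameters can be reproduced from the dimensions in the module printout. The following sketch walks through that arithmetic under two assumptions:

- fairseq's `MultiheadAttention` of this period keeps a packed `in_proj_weight` (3E×E) and `in_proj_bias` (3E), which the printout does not list.
- `--share-all-embeddings` makes the encoder input embedding, the decoder input embedding and the output projection share a single 35662×1024 matrix.

```python
def linear(n_in, n_out):
    # weight matrix plus bias vector
    return n_in * n_out + n_out


def layer_norm(dim):
    # elementwise_affine=True: one weight and one bias per feature
    return 2 * dim


def multihead_attention(embed_dim):
    # assumed packed q/k/v input projection (absent from the repr) plus out_proj
    return 3 * embed_dim * embed_dim + 3 * embed_dim + linear(embed_dim, embed_dim)


def transformer_params(vocab=35662, dim=1024, ffn=4096, enc_layers=6, dec_layers=6):
    ffn_block = linear(dim, ffn) + linear(ffn, dim)
    # self-attention, FFN, and two LayerNorms per encoder layer
    encoder_layer = multihead_attention(dim) + ffn_block + 2 * layer_norm(dim)
    # self-attention, encoder attention, FFN, and three LayerNorms per decoder layer
    decoder_layer = 2 * multihead_attention(dim) + ffn_block + 3 * layer_norm(dim)
    shared_embedding = vocab * dim  # counted once because all embeddings are shared
    # sinusoidal positional embeddings have no learned parameters, and the printout
    # shows no final encoder/decoder LayerNorm
    return shared_embedding + enc_layers * encoder_layer + dec_layers * decoder_layer


if __name__ == "__main__":
    print(f"{transformer_params():,}")
```

Note that the printout shows 1024-dimensional embeddings, not the 128 set in `other_flags`, because that variable is never passed to the `python` command.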
| no existing checkpoint found checkpoints/checkpoint_last.pt
| loading train data for epoch 0
| /home/taylanbil/data/wmt18_en_de_bpej32k train en-de 5186259 examples
| WARNING: 240829 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[1422704, 2718830, 2897878, 3673048, 2016896, 2200333, 3886976, 2097242, 3124502, 2871279]
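The warning reports that 240,829 of 5,186,259 training pairs (about 4.6%) are skipped. These are the pairs whose source or target exceeds `max_positions=(64, 64)`, which `--max-source-positions` and `--max-target-positions=$n_words` set in the script. The following is a minimal sketch of such a length filter over per-example token counts. `filter_by_max_positions` and the toy lengths are hypothetical, not fairseq's own code.

```python
def filter_by_max_positions(src_sizes, tgt_sizes, max_positions=(64, 64)):
    """Split example ids into kept and skipped lists by source/target length.

    Hypothetical helper: sizes are token counts per example, and an example is
    kept only if both sides fit within their respective limits.
    """
    max_src, max_tgt = max_positions
    kept, skipped = [], []
    for idx, (src_len, tgt_len) in enumerate(zip(src_sizes, tgt_sizes)):
        if src_len <= max_src and tgt_len <= max_tgt:
            kept.append(idx)
        else:
            skipped.append(idx)
    return kept, skipped


if __name__ == "__main__":
    kept, skipped = filter_by_max_positions([12, 70, 64, 30], [15, 20, 65, 64])
    print("kept:", kept, "skipped:", skipped)
    # fraction of the training set skipped in the run above
    print(f"{240829 / 5186259:.1%}")
```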
Epoch 1 begin 2019-08-26 16:16:39.737359
training torch.Size([512, 32])/ 2019-08-26 16:17:49.480330, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=1, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 16:17:49.481539, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=1, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 16:17:49.482289, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=1, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 16:17:49.483518, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=1, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 16:17:49.493231, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=1, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 16:17:49.518330, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=1, _local_scalar_dense=None
training torch.Size([1024, 16])/ 2019-08-26 16:17:49.528034, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=1, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 16:17:49.539031, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=1, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:12:08.504742, device xla:4, step 100, Rate=3.14, Global Rate=15.67, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:12:08.509413, device xla:6, step 100, Rate=3.14, Global Rate=15.67, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:12:08.516673, device xla:1, step 100, Rate=3.14, Global Rate=15.67, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:12:08.525526, device xla:3, step 100, Rate=3.14, Global Rate=15.67, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:12:08.546323, device xla:7, step 100, Rate=3.14, Global Rate=15.67, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:12:08.551893, device xla:5, step 100, Rate=3.14, Global Rate=15.67, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:12:08.536538, device xla:8, step 100, Rate=3.14, Global Rate=15.67, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:12:08.553890, device xla:2, step 100, Rate=3.14, Global Rate=15.67, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:14:54.971653, device xla:6, step 200, Rate=64.03, Global Rate=29.82, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:14:54.975935, device xla:3, step 200, Rate=64.03, Global Rate=29.82, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:14:54.990468, device xla:1, step 200, Rate=64.02, Global Rate=29.82, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:14:55.032469, device xla:5, step 200, Rate=64.02, Global Rate=29.82, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:14:55.014665, device xla:7, step 200, Rate=64.03, Global Rate=29.82, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:14:55.005190, device xla:8, step 200, Rate=64.03, Global Rate=29.82, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:14:55.040356, device xla:2, step 200, Rate=64.02, Global Rate=29.82, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:14:54.982685, device xla:4, step 200, Rate=64.02, Global Rate=29.82, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:17:39.340357, device xla:4, step 300, Rate=113.52, Global Rate=42.69, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:17:39.345317, device xla:8, step 300, Rate=113.53, Global Rate=42.69, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:17:39.363743, device xla:3, step 300, Rate=113.52, Global Rate=42.69, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:17:39.386191, device xla:5, step 300, Rate=113.52, Global Rate=42.69, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:17:39.388780, device xla:6, step 300, Rate=113.50, Global Rate=42.69, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:17:39.371545, device xla:2, step 300, Rate=113.53, Global Rate=42.69, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:17:39.354146, device xla:1, step 300, Rate=113.52, Global Rate=42.69, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:17:39.379501, device xla:7, step 300, Rate=113.52, Global Rate=42.69, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:20:24.649473, device xla:6, step 400, Rate=152.77, Global Rate=54.42, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:20:24.654163, device xla:4, step 400, Rate=152.76, Global Rate=54.42, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:20:24.659452, device xla:5, step 400, Rate=152.78, Global Rate=54.42, Compiles=22, _local_scalar_dense=None
training torch.Size([1024, 16])/ 2019-08-26 17:20:24.661693, device xla:8, step 400, Rate=152.77, Global Rate=54.42, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:20:24.685030, device xla:2, step 400, Rate=152.77, Global Rate=54.42, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:20:24.693080, device xla:1, step 400, Rate=152.75, Global Rate=54.42, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:20:24.703817, device xla:3, step 400, Rate=152.75, Global Rate=54.42, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:20:24.670289, device xla:7, step 400, Rate=152.77, Global Rate=54.42, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:23:09.515577, device xla:4, step 500, Rate=184.32, Global Rate=65.17, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:23:09.499086, device xla:2, step 500, Rate=184.34, Global Rate=65.17, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:23:09.491119, device xla:3, step 500, Rate=184.34, Global Rate=65.17, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:23:09.507325, device xla:5, step 500, Rate=184.34, Global Rate=65.17, Compiles=22, _local_scalar_dense=None
training torch.Size([1024, 16])/ 2019-08-26 17:23:09.517875, device xla:7, step 500, Rate=184.33, Global Rate=65.17, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:23:09.527054, device xla:6, step 500, Rate=184.32, Global Rate=65.17, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:23:09.533330, device xla:8, step 500, Rate=184.32, Global Rate=65.17, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:23:09.549789, device xla:1, step 500, Rate=184.31, Global Rate=65.17, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:25:56.725765, device xla:1, step 600, Rate=208.70, Global Rate=75.01, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:25:56.734212, device xla:6, step 600, Rate=208.70, Global Rate=75.01, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:25:56.765430, device xla:5, step 600, Rate=208.69, Global Rate=75.01, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:25:56.740960, device xla:7, step 600, Rate=208.70, Global Rate=75.01, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:25:56.771154, device xla:2, step 600, Rate=208.69, Global Rate=75.01, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:25:56.756326, device xla:8, step 600, Rate=208.69, Global Rate=75.01, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:25:56.782535, device xla:4, step 600, Rate=208.68, Global Rate=75.01, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:25:56.792417, device xla:3, step 600, Rate=208.68, Global Rate=75.01, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:28:42.076333, device xla:4, step 700, Rate=228.89, Global Rate=84.12, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:28:42.081032, device xla:6, step 700, Rate=228.89, Global Rate=84.12, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:28:42.093565, device xla:5, step 700, Rate=228.89, Global Rate=84.12, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:28:42.097866, device xla:7, step 700, Rate=228.89, Global Rate=84.12, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:28:42.086215, device xla:3, step 700, Rate=228.89, Global Rate=84.12, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:28:42.103775, device xla:2, step 700, Rate=228.89, Global Rate=84.12, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:28:42.114874, device xla:8, step 700, Rate=228.88, Global Rate=84.12, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:28:42.127279, device xla:1, step 700, Rate=228.87, Global Rate=84.12, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:31:29.356570, device xla:6, step 800, Rate=244.33, Global Rate=92.51, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:31:29.361677, device xla:3, step 800, Rate=244.33, Global Rate=92.51, Compiles=22, _local_scalar_dense=None
training torch.Size([1024, 16])/ 2019-08-26 17:31:29.379872, device xla:4, step 800, Rate=244.32, Global Rate=92.50, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:31:29.368918, device xla:7, step 800, Rate=244.33, Global Rate=92.51, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:31:29.395467, device xla:5, step 800, Rate=244.32, Global Rate=92.50, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:31:29.413487, device xla:8, step 800, Rate=244.31, Global Rate=92.50, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:31:29.385214, device xla:1, step 800, Rate=244.32, Global Rate=92.50, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:31:29.405105, device xla:2, step 800, Rate=244.32, Global Rate=92.50, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:34:14.495298, device xla:6, step 900, Rate=257.47, Global Rate=100.33, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:34:14.499917, device xla:3, step 900, Rate=257.47, Global Rate=100.33, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:34:14.525598, device xla:1, step 900, Rate=257.47, Global Rate=100.33, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:34:14.517362, device xla:2, step 900, Rate=257.47, Global Rate=100.33, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:34:14.536132, device xla:5, step 900, Rate=257.46, Global Rate=100.33, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:34:14.507334, device xla:7, step 900, Rate=257.47, Global Rate=100.33, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:34:14.530141, device xla:4, step 900, Rate=257.46, Global Rate=100.33, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:34:14.539850, device xla:8, step 900, Rate=257.46, Global Rate=100.33, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:36:59.357159, device xla:6, step 1000, Rate=268.09, Global Rate=107.61, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:36:59.361695, device xla:4, step 1000, Rate=268.09, Global Rate=107.61, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:36:59.366535, device xla:7, step 1000, Rate=268.09, Global Rate=107.61, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:36:59.375953, device xla:2, step 1000, Rate=268.09, Global Rate=107.61, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:36:59.425070, device xla:5, step 1000, Rate=268.07, Global Rate=107.61, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:36:59.397963, device xla:1, step 1000, Rate=268.08, Global Rate=107.61, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:36:59.383503, device xla:3, step 1000, Rate=268.08, Global Rate=107.61, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:36:59.413636, device xla:8, step 1000, Rate=268.08, Global Rate=107.61, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:39:45.723206, device xla:6, step 1100, Rate=276.02, Global Rate=114.37, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:39:45.737107, device xla:5, step 1100, Rate=276.03, Global Rate=114.37, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:39:45.750126, device xla:1, step 1100, Rate=276.02, Global Rate=114.37, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:39:45.729444, device xla:2, step 1100, Rate=276.03, Global Rate=114.37, Compiles=22, _local_scalar_dense=None
training torch.Size([1024, 16])/ 2019-08-26 17:39:45.739079, device xla:3, step 1100, Rate=276.02, Global Rate=114.37, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:39:45.758025, device xla:7, step 1100, Rate=276.01, Global Rate=114.37, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:39:45.775976, device xla:8, step 1100, Rate=276.02, Global Rate=114.37, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:39:45.768642, device xla:4, step 1100, Rate=276.01, Global Rate=114.37, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:42:29.910145, device xla:6, step 1200, Rate=283.19, Global Rate=120.75, Compiles=22, _local_scalar_dense=None
training torch.Size([1024, 16])/ 2019-08-26 17:42:29.921806, device xla:7, step 1200, Rate=283.19, Global Rate=120.74, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:42:29.931009, device xla:5, step 1200, Rate=283.19, Global Rate=120.74, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:42:29.947070, device xla:1, step 1200, Rate=283.18, Global Rate=120.74, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:42:29.914975, device xla:3, step 1200, Rate=283.19, Global Rate=120.74, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:42:29.956730, device xla:8, step 1200, Rate=283.18, Global Rate=120.74, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:42:29.938407, device xla:2, step 1200, Rate=283.18, Global Rate=120.74, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:42:29.967074, device xla:4, step 1200, Rate=283.17, Global Rate=120.74, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:45:14.674624, device xla:6, step 1300, Rate=288.70, Global Rate=126.70, Compiles=22, _local_scalar_dense=None
training torch.Size([1024, 16])/ 2019-08-26 17:45:14.697145, device xla:5, step 1300, Rate=288.70, Global Rate=126.70, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:45:14.686956, device xla:2, step 1300, Rate=288.70, Global Rate=126.70, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:45:14.679525, device xla:3, step 1300, Rate=288.70, Global Rate=126.70, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:45:14.710541, device xla:7, step 1300, Rate=288.69, Global Rate=126.70, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:45:14.699402, device xla:1, step 1300, Rate=288.70, Global Rate=126.70, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:45:14.715232, device xla:8, step 1300, Rate=288.70, Global Rate=126.70, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:45:14.734289, device xla:4, step 1300, Rate=288.69, Global Rate=126.70, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:47:57.210211, device xla:6, step 1400, Rate=293.96, Global Rate=132.36, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:47:57.238392, device xla:8, step 1400, Rate=293.96, Global Rate=132.36, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:47:57.216672, device xla:5, step 1400, Rate=293.97, Global Rate=132.36, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:47:57.229404, device xla:3, step 1400, Rate=293.96, Global Rate=132.36, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:47:57.241625, device xla:4, step 1400, Rate=293.96, Global Rate=132.36, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:47:57.247026, device xla:2, step 1400, Rate=293.95, Global Rate=132.35, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:47:57.255294, device xla:7, step 1400, Rate=293.95, Global Rate=132.35, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:47:57.224077, device xla:1, step 1400, Rate=293.96, Global Rate=132.36, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:50:41.037444, device xla:6, step 1500, Rate=297.67, Global Rate=137.65, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:50:41.050330, device xla:5, step 1500, Rate=297.68, Global Rate=137.65, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:50:41.052663, device xla:2, step 1500, Rate=297.68, Global Rate=137.65, Compiles=22, _local_scalar_dense=None
training torch.Size([512, 32])/ 2019-08-26 17:50:41.095153, device xla:4, step 1500, Rate=297.66, Global Rate=137.64, Compiles=22, _local_scalar_dense=None
training torch.Size([1024, 16])/ 2019-08-26 17:50:41.084011, device xla:3, step 1500, Rate=297.66, Global Rate=137.64, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:50:41.041947, device xla:1, step 1500, Rate=297.68, Global Rate=137.65, Compiles=22, _local_scalar_dense=None
training torch.Size([1024, 16])/ 2019-08-26 17:50:41.101317, device xla:8, step 1500, Rate=297.66, Global Rate=137.64, Compiles=22, _local_scalar_dense=None
training torch.Size([256, 64])/ 2019-08-26 17:50:41.065773, device xla:7, step 1500, Rate=297.67, Global Rate=137.65, Compiles=22, _local_scalar_dense=None
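The step lines show `Compiles` jumping from 1 to 22 during the first 100 steps and then staying flat. Over the same stretch, the time between log points drops from about 54 minutes (step 0 to 100) to under 3 minutes per 100 steps. The following is a small parser for these lines, written against the format shown here. The regex and helper names are hypothetical and not part of the training script. It makes the per-device interval easy to compute.

```python
import re
from datetime import datetime

STEP_RE = re.compile(
    r"training torch\.Size\(\[(?P<batch>\d+), (?P<length>\d+)\]\)/ "
    r"(?P<time>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+), "
    r"device (?P<device>xla:\d+), step (?P<step>\d+), "
    r"Rate=(?P<rate>[\d.]+), Global Rate=(?P<global_rate>[\d.]+), "
    r"Compiles=(?P<compiles>\d+)"
)


def parse_step_lines(text):
    """Yield one dict per step record; finditer does not assume one record per line."""
    for m in STEP_RE.finditer(text):
        yield {
            "shape": (int(m["batch"]), int(m["length"])),
            "time": datetime.strptime(m["time"], "%Y-%m-%d %H:%M:%S.%f"),
            "device": m["device"],
            "step": int(m["step"]),
            "rate": float(m["rate"]),
            "global_rate": float(m["global_rate"]),
            "compiles": int(m["compiles"]),
        }


def interval_seconds(records, device):
    """Map each logged step to the seconds elapsed since the previous logged step."""
    rows = sorted((r for r in records if r["device"] == device), key=lambda r: r["step"])
    return {b["step"]: (b["time"] - a["time"]).total_seconds() for a, b in zip(rows, rows[1:])}


# Three records for device xla:4, copied from the log above.
SAMPLE = "\n".join([
    "training torch.Size([256, 64])/ 2019-08-26 16:17:49.483518, device xla:4, step 0, "
    "Rate=0.00, Global Rate=0.00, Compiles=1, _local_scalar_dense=None",
    "training torch.Size([512, 32])/ 2019-08-26 17:12:08.504742, device xla:4, step 100, "
    "Rate=3.14, Global Rate=15.67, Compiles=22, _local_scalar_dense=None",
    "training torch.Size([512, 32])/ 2019-08-26 17:14:54.982685, device xla:4, step 200, "
    "Rate=64.02, Global Rate=29.82, Compiles=22, _local_scalar_dense=None",
])

if __name__ == "__main__":
    records = list(parse_step_lines(SAMPLE))
    print(interval_seconds(records, "xla:4"))
```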
Epoch 1 Training stats:
device xla:1
| epoch 001 | loss 4.000 | nll_loss 4.000 | ppl 16.00 | wps 2945 | ups 0 | wpb 11117.096 | bsz 416.934 | num_updates 1508 | lr 0.000188562 | gnorm 1.062 | clip 0.000 | oom 0.000 | wall 5693 | train_wall 3564
device xla:2
| epoch 001 | loss 4.000 | nll_loss 4.000 | ppl 16.00 | wps 2956 | ups 0 | wpb 11161.388 | bsz 410.992 | num_updates 1508 | lr 0.000188562 | gnorm 1.016 | clip 0.000 | oom 0.000 | wall 5693 | train_wall 4435
device xla:3
| epoch 001 | loss 3.969 | nll_loss 3.969 | ppl 15.66 | wps 2969 | ups 0 | wpb 11210.055 | bsz 410.313 | num_updates 1508 | lr 0.000188562 | gnorm 1.062 | clip 0.000 | oom 0.000 | wall 5693 | train_wall 4195
device xla:4
| epoch 001 | loss 4.000 | nll_loss 4.000 | ppl 16.00 | wps 2941 | ups 0 | wpb 11103.859 | bsz 413.708 | num_updates 1508 | lr 0.000188562 | gnorm 1.039 | clip 0.000 | oom 0.000 | wall 5693 | train_wall 4438
device xla:5
| epoch 001 | loss 4.000 | nll_loss 4.000 | ppl 16.00 | wps 2957 | ups 0 | wpb 11163.219 | bsz 408.106 | num_updates 1508 | lr 0.000188562 | gnorm 1.031 | clip 0.000 | oom 0.000 | wall 5693 | train_wall 4432
device xla:6
| epoch 001 | loss 4.000 | nll_loss 4.000 | ppl 16.00 | wps 2950 | ups 0 | wpb 11135.934 | bsz 402.504 | num_updates 1508 | lr 0.000188562 | gnorm 1.031 | clip 0.000 | oom 0.000 | wall 5693 | train_wall 4357
device xla:7
| epoch 001 | loss 4.000 | nll_loss 4.000 | ppl 16.00 | wps 2952 | ups 0 | wpb 11146.327 | bsz 404.032 | num_updates 1508 | lr 0.000188562 | gnorm 1.023 | clip 0.000 | oom 0.000 | wall 5693 | train_wall 3223
device xla:8
| epoch 001 | loss 4.000 | nll_loss 4.000 | ppl 16.00 | wps 2949 | ups 0 | wpb 11133.156 | bsz 412.011 | num_updates 1508 | lr 0.000188562 | gnorm 1.055 | clip 0.000 | oom 0.000 | wall 5693 | train_wall 3322
Epoch 1 Tracker Rates:
Rate=294.04, Global Rate=138.02
Rate=294.08, Global Rate=138.02
Rate=294.19, Global Rate=138.02
Rate=294.23, Global Rate=138.02
Rate=294.07, Global Rate=138.02
Rate=294.02, Global Rate=138.02
Rate=294.13, Global Rate=138.02
Rate=294.26, Global Rate=138.02
Epoch 1 end 2019-08-26 17:50:55.696891
Metric: CompileTime
TotalSamples: 22
Counter: 06h09m28s368ms679.254us
ValueRate: 03s090ms191.357us / second
Rate: 0.00738287 / second
Percentiles: 1%=080ms169.631us; 5%=02m04s913ms918.706us; 10%=02m04s918ms710.532us; 20%=02m04s949ms359.097us; 50%=06m09s474ms849.253us; 80%=07m09s614ms115.550us; 90%=22m13s835ms112.498us; 95%=01h00m12s899ms207.004us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 12072
Counter: 21h02m19s145ms981.105us
ValueRate: 06s058ms741.626us / second
Rate: 4.92409 / second
Percentiles: 1%=01s072ms216.005us; 5%=01s167ms466.596us; 10%=01s171ms827.205us; 20%=01s176ms984.740us; 50%=01s268ms540.426us; 80%=01s288ms665.326us; 90%=01s291ms358.962us; 95%=01s294ms525.744us; 99%=01s300ms730.538us
Metric: InboundData
TotalSamples: 40
Counter: 80.00B
ValueRate: 238.62B / second
Rate: 119.309 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 49940
Counter: 7.67GB
ValueRate: 495.46KB / second
Rate: 20.2991 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 11
Counter: 02h07m27s731ms604.816us
ValueRate: 01s158ms323.963us / second
Rate: 0.00383006 / second
Percentiles: 1%=039ms60.340us; 5%=039ms60.340us; 10%=041ms409.780us; 20%=047ms263.570us; 50%=072ms789.361us; 80%=17m24s857ms50.287us; 90%=17m24s875ms983.828us; 95%=20m20s733ms697.196us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 110580
Counter: 03h13m18s409ms864.283us
ValueRate: 369ms945.267us / second
Rate: 45.3874 / second
Percentiles: 1%=450.980us; 5%=493.944us; 10%=544.581us; 20%=615.148us; 50%=828.008us; 80%=002ms340.339us; 90%=011ms273.118us; 95%=023ms811.218us; 99%=046ms204.139us
Metric: TransferFromServerTime
TotalSamples: 40
Counter: 872ms274.519us
ValueRate: 03s602ms777.571us / second
Rate: 119.31 / second
Percentiles: 1%=750.829us; 5%=781.647us; 10%=801.396us; 20%=866.352us; 50%=001ms413.528us; 80%=063ms911.629us; 90%=069ms311.171us; 95%=071ms554.678us; 99%=073ms960.045us
Metric: TransferToServerTime
TotalSamples: 49940
Counter: 19h07m30s506ms723.193us
ValueRate: 05s053ms744.689us / second
Rate: 20.7326 / second
Percentiles: 1%=001ms63.894us; 5%=001ms209.028us; 10%=001ms273.948us; 20%=001ms411.516us; 50%=002ms143.822us; 80%=899ms691.742us; 90%=978ms221.067us; 95%=01s043ms9.282us; 99%=01s088ms226.441us
Counter: CachedSyncTensors
Value: 12050
Counter: CreateCompileHandles
Value: 22
Counter: CreateDataHandles
Value: 8954065
Counter: CreateXlaTensor
Value: 58276010
Counter: DestroyCompileHandles
Value: 11
Counter: DestroyDataHandles
Value: 8947081
Counter: DestroyXlaTensor
Value: 58270050
Counter: ReleaseCompileHandles
Value: 11
Counter: ReleaseDataHandles
Value: 8947081
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 22
Counter: XRTAllocateFromTensor_Empty
Value: 18037
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 40
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-26 17:51:00.706111, device xla:5, step 0, Compiles=22, _local_scalar_dense=40
validation/ 2019-08-26 17:51:00.709636, device xla:3, step 0, Compiles=22, _local_scalar_dense=40
validation/ 2019-08-26 17:51:00.714751, device xla:6, step 0, Compiles=22, _local_scalar_dense=40
validation/ 2019-08-26 17:51:00.723280, device xla:8, step 0, Compiles=22, _local_scalar_dense=40
validation/ 2019-08-26 17:51:00.733944, device xla:2, step 0, Compiles=22, _local_scalar_dense=40
validation/ 2019-08-26 17:51:00.735600, device xla:4, step 0, Compiles=22, _local_scalar_dense=40
validation/ 2019-08-26 17:51:00.738992, device xla:1, step 0, Compiles=22, _local_scalar_dense=40
validation/ 2019-08-26 17:51:00.743811, device xla:7, step 0, Compiles=22, _local_scalar_dense=40
validation stats on subset "valid" - 2019-08-26 18:05:44.285475
| epoch 001 | valid on 'valid' subset | loss 8.562 | nll_loss 7.438 | ppl 173.34 | num_updates 1508
| epoch 001 | valid on 'valid' subset | loss 8.688 | nll_loss 7.531 | ppl 184.98 | num_updates 1508
| epoch 001 | valid on 'valid' subset | loss 8.688 | nll_loss 7.531 | ppl 184.98 | num_updates 1508
| epoch 001 | valid on 'valid' subset | loss 8.688 | nll_loss 7.531 | ppl 184.98 | num_updates 1508
| epoch 001 | valid on 'valid' subset | loss 8.625 | nll_loss 7.500 | ppl 181.02 | num_updates 1508
| epoch 001 | valid on 'valid' subset | loss 8.688 | nll_loss 7.469 | ppl 177.14 | num_updates 1508
| epoch 001 | valid on 'valid' subset | loss 8.625 | nll_loss 7.531 | ppl 184.98 | num_updates 1508
| epoch 001 | valid on 'valid' subset | loss 8.625 | nll_loss 7.625 | ppl 197.40 | num_updates 1508
old learning rate: 1e-07
new learning rate: 0.00018856230000000002
Metric: CompileTime
TotalSamples: 52
Counter: 10h19m13s151ms937.560us
ValueRate: 02s186ms865.370us / second
Rate: 0.00805384 / second
Percentiles: 1%=080ms169.631us; 5%=28s580ms834.984us; 10%=28s171ms150.865us; 20%=02m31s104ms293.644us; 50%=03m40s716ms790.312us; 80%=06m09s474ms849.253us; 90%=07m09s613ms812.856us; 95%=22m13s835ms112.498us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 12192
Counter: 23h14m19s673ms76.935us
ValueRate: 03s051ms941.739us / second
Rate: 0.954869 / second
Percentiles: 1%=377ms63.514us; 5%=01s162ms483.160us; 10%=01s169ms594.126us; 20%=01s175ms979.665us; 50%=01s268ms540.426us; 80%=01s290ms590.005us; 90%=01s296ms706.457us; 95%=04s382ms44.494us; 99%=32s295ms92.584us
Metric: InboundData
TotalSamples: 64
Counter: 128.00B
ValueRate: 0.14B / second
Rate: 0.0719799 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 50180
Counter: 7.71GB
ValueRate: 50.50KB / second
Rate: 1.10366 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 111231
Counter: 04h14m24s653ms981.854us
ValueRate: 02s680ms514.848us / second
Rate: 1.14136 / second
Percentiles: 1%=436.748us; 5%=491.913us; 10%=526.902us; 20%=578.040us; 50%=768.648us; 80%=001ms291.052us; 90%=035ms774.511us; 95%=04s719ms953.775us; 99%=32s171ms6.430us
Metric: TransferFromServerTime
TotalSamples: 64
Counter: 02s532ms797.434us
ValueRate: 002ms722.792us / second
Rate: 0.0719799 / second
Percentiles: 1%=750.829us; 5%=794.296us; 10%=815.151us; 20%=924.009us; 50%=002ms699.847us; 80%=056ms35.594us; 90%=063ms320.584us; 95%=069ms311.171us; 99%=073ms960.045us
Metric: TransferToServerTime
TotalSamples: 50180
Counter: 21h19m11s407ms969.077us
ValueRate: 03s501ms788.749us / second
Rate: 1.10366 / second
Percentiles: 1%=001ms75.060us; 5%=001ms223.414us; 10%=001ms309.826us; 20%=001ms460.937us; 50%=002ms78.980us; 80%=920ms866.679us; 90%=01s043ms9.282us; 95%=04s220ms565.300us; 99%=32s182ms189.020us
Counter: CachedSyncTensors
Value: 12140
Counter: CreateCompileHandles
Value: 52
Counter: CreateDataHandles
Value: 8955673
Counter: CreateXlaTensor
Value: 58410826
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 8948641
Counter: DestroyXlaTensor
Value: 58404818
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 8948641
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 52
Counter: XRTAllocateFromTensor_Empty
Value: 18037
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 64
Epoch 2 begin 2019-08-26 18:05:44.498880
training torch.Size([256, 64])/ 2019-08-26 18:05:54.902679, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:05:54.941314, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:05:54.968464, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:05:55.054895, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:05:55.238419, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:05:55.267267, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:05:55.286399, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:05:55.475336, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:08:47.169225, device xla:6, step 100, Rate=59.57, Global Rate=286.84, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:08:47.173632, device xla:5, step 100, Rate=59.56, Global Rate=286.84, Compiles=52, _local_scalar_dense=64
training torch.Size([1024, 16])/ 2019-08-26 18:08:47.205832, device xla:2, step 100, Rate=59.44, Global Rate=286.78, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:08:47.217577, device xla:8, step 100, Rate=59.62, Global Rate=286.77, Compiles=52, _local_scalar_dense=64
training torch.Size([1024, 16])/ 2019-08-26 18:08:47.179875, device xla:7, step 100, Rate=59.57, Global Rate=286.83, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:08:47.235649, device xla:1, step 100, Rate=59.42, Global Rate=286.73, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:08:47.227206, device xla:4, step 100, Rate=59.48, Global Rate=286.75, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:08:47.190042, device xla:3, step 100, Rate=59.46, Global Rate=286.81, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:11:33.301421, device xla:4, step 200, Rate=109.24, Global Rate=297.13, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:11:33.329662, device xla:5, step 200, Rate=109.27, Global Rate=297.11, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:11:33.306036, device xla:1, step 200, Rate=109.20, Global Rate=297.13, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:11:33.314193, device xla:8, step 200, Rate=109.35, Global Rate=297.12, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:11:33.351583, device xla:2, step 200, Rate=109.19, Global Rate=297.09, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:11:33.321619, device xla:6, step 200, Rate=109.29, Global Rate=297.12, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:11:33.333772, device xla:3, step 200, Rate=109.20, Global Rate=297.10, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:11:33.360769, device xla:7, step 200, Rate=109.28, Global Rate=297.08, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:14:18.349070, device xla:6, step 300, Rate=149.48, Global Rate=301.37, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:14:18.353725, device xla:4, step 300, Rate=149.43, Global Rate=301.37, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:14:18.365568, device xla:7, step 300, Rate=149.48, Global Rate=301.36, Compiles=52, _local_scalar_dense=64
training torch.Size([1024, 16])/ 2019-08-26 18:14:18.374349, device xla:3, step 300, Rate=149.41, Global Rate=301.35, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:14:18.393207, device xla:2, step 300, Rate=149.39, Global Rate=301.34, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:14:18.358607, device xla:8, step 300, Rate=149.52, Global Rate=301.36, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:14:18.401180, device xla:1, step 300, Rate=149.38, Global Rate=301.34, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:14:18.408740, device xla:5, step 300, Rate=149.45, Global Rate=301.33, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:17:02.856998, device xla:6, step 400, Rate=181.83, Global Rate=303.78, Compiles=52, _local_scalar_dense=64
training torch.Size([1024, 16])/ 2019-08-26 18:17:02.863369, device xla:4, step 400, Rate=181.79, Global Rate=303.77, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:17:02.869240, device xla:5, step 400, Rate=181.82, Global Rate=303.77, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:17:02.882836, device xla:8, step 400, Rate=181.86, Global Rate=303.76, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:17:02.896891, device xla:3, step 400, Rate=181.76, Global Rate=303.76, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:17:02.906164, device xla:7, step 400, Rate=181.82, Global Rate=303.75, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:17:02.874172, device xla:1, step 400, Rate=181.77, Global Rate=303.77, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:17:02.922075, device xla:2, step 400, Rate=181.75, Global Rate=303.75, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:19:49.287353, device xla:6, step 500, Rate=206.99, Global Rate=304.54, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:19:49.291832, device xla:4, step 500, Rate=206.96, Global Rate=304.54, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:19:49.296293, device xla:2, step 500, Rate=206.95, Global Rate=304.54, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:19:49.312298, device xla:5, step 500, Rate=206.98, Global Rate=304.53, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:19:49.330436, device xla:7, step 500, Rate=206.98, Global Rate=304.52, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:19:49.303459, device xla:1, step 500, Rate=206.94, Global Rate=304.53, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:19:49.342872, device xla:3, step 500, Rate=206.93, Global Rate=304.52, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:19:49.320742, device xla:8, step 500, Rate=207.01, Global Rate=304.53, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:22:34.463657, device xla:6, step 600, Rate=227.59, Global Rate=305.43, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:22:34.492273, device xla:1, step 600, Rate=227.54, Global Rate=305.42, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:22:34.468334, device xla:8, step 600, Rate=227.61, Global Rate=305.43, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:22:34.475277, device xla:2, step 600, Rate=227.55, Global Rate=305.43, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:22:34.482647, device xla:3, step 600, Rate=227.55, Global Rate=305.43, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:22:34.518253, device xla:4, step 600, Rate=227.54, Global Rate=305.42, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:22:34.501856, device xla:7, step 600, Rate=227.58, Global Rate=305.42, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:22:34.512276, device xla:5, step 600, Rate=227.57, Global Rate=305.42, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:25:20.780145, device xla:4, step 700, Rate=243.63, Global Rate=305.77, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:25:20.786359, device xla:7, step 700, Rate=243.65, Global Rate=305.77, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:25:20.812191, device xla:8, step 700, Rate=243.65, Global Rate=305.77, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:25:20.828897, device xla:5, step 700, Rate=243.63, Global Rate=305.76, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:25:20.806163, device xla:3, step 700, Rate=243.61, Global Rate=305.77, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:25:20.814907, device xla:2, step 700, Rate=243.60, Global Rate=305.77, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:25:20.796401, device xla:1, step 700, Rate=243.61, Global Rate=305.77, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:25:20.823534, device xla:6, step 700, Rate=243.62, Global Rate=305.76, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:28:06.941084, device xla:5, step 800, Rate=256.55, Global Rate=306.07, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:28:06.972756, device xla:4, step 800, Rate=256.52, Global Rate=306.06, Compiles=52, _local_scalar_dense=64
training torch.Size([1024, 16])/ 2019-08-26 18:28:06.961756, device xla:3, step 800, Rate=256.52, Global Rate=306.06, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:28:06.953371, device xla:6, step 800, Rate=256.54, Global Rate=306.06, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:28:06.977692, device xla:8, step 800, Rate=256.55, Global Rate=306.06, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:28:06.986769, device xla:1, step 800, Rate=256.50, Global Rate=306.06, Compiles=52, _local_scalar_dense=64
training torch.Size([1024, 16])/ 2019-08-26 18:28:06.945964, device xla:2, step 800, Rate=256.52, Global Rate=306.07, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:28:07.002194, device xla:7, step 800, Rate=256.53, Global Rate=306.05, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:30:51.928356, device xla:5, step 900, Rate=267.30, Global Rate=306.53, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:30:51.972112, device xla:6, step 900, Rate=267.28, Global Rate=306.53, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:30:51.933264, device xla:2, step 900, Rate=267.28, Global Rate=306.53, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:30:51.940473, device xla:7, step 900, Rate=267.30, Global Rate=306.53, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:30:51.958529, device xla:4, step 900, Rate=267.28, Global Rate=306.53, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:30:51.964051, device xla:8, step 900, Rate=267.30, Global Rate=306.53, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:30:51.949504, device xla:1, step 900, Rate=267.28, Global Rate=306.53, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:30:51.974139, device xla:3, step 900, Rate=267.27, Global Rate=306.53, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:33:37.150099, device xla:6, step 1000, Rate=275.82, Global Rate=306.87, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:33:37.154849, device xla:5, step 1000, Rate=275.82, Global Rate=306.87, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:33:37.159793, device xla:4, step 1000, Rate=275.81, Global Rate=306.87, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:33:37.164788, device xla:7, step 1000, Rate=275.82, Global Rate=306.86, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:33:37.174496, device xla:1, step 1000, Rate=275.80, Global Rate=306.86, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:33:37.183812, device xla:3, step 1000, Rate=275.80, Global Rate=306.86, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:33:37.203191, device xla:2, step 1000, Rate=275.79, Global Rate=306.86, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:33:37.211648, device xla:8, step 1000, Rate=275.81, Global Rate=306.86, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:36:23.793555, device xla:6, step 1100, Rate=282.10, Global Rate=306.90, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:36:23.818170, device xla:8, step 1100, Rate=282.11, Global Rate=306.90, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:36:23.797878, device xla:1, step 1100, Rate=282.09, Global Rate=306.90, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:36:23.808323, device xla:3, step 1100, Rate=282.09, Global Rate=306.90, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:36:23.840669, device xla:5, step 1100, Rate=282.09, Global Rate=306.89, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:36:23.852234, device xla:7, step 1100, Rate=282.09, Global Rate=306.89, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:36:23.863959, device xla:4, step 1100, Rate=282.07, Global Rate=306.89, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:36:23.826192, device xla:2, step 1100, Rate=282.08, Global Rate=306.90, Compiles=52, _local_scalar_dense=64
training torch.Size([1024, 16])/ 2019-08-26 18:39:10.646756, device xla:6, step 1200, Rate=287.05, Global Rate=306.90, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:39:10.658384, device xla:1, step 1200, Rate=287.04, Global Rate=306.90, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:39:10.651275, device xla:2, step 1200, Rate=287.05, Global Rate=306.90, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:39:10.666186, device xla:7, step 1200, Rate=287.06, Global Rate=306.89, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:39:10.675676, device xla:3, step 1200, Rate=287.04, Global Rate=306.89, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:39:10.694082, device xla:5, step 1200, Rate=287.04, Global Rate=306.89, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:39:10.701267, device xla:4, step 1200, Rate=287.03, Global Rate=306.89, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:39:10.711941, device xla:8, step 1200, Rate=287.04, Global Rate=306.89, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:41:55.338931, device xla:5, step 1300, Rate=291.83, Global Rate=307.20, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:41:55.350423, device xla:3, step 1300, Rate=291.82, Global Rate=307.20, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:41:55.360102, device xla:1, step 1300, Rate=291.81, Global Rate=307.20, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:41:55.385297, device xla:2, step 1300, Rate=291.80, Global Rate=307.19, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:41:55.393227, device xla:6, step 1300, Rate=291.80, Global Rate=307.19, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:41:55.376835, device xla:4, step 1300, Rate=291.81, Global Rate=307.19, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:41:55.403463, device xla:7, step 1300, Rate=291.80, Global Rate=307.19, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:41:55.343423, device xla:8, step 1300, Rate=291.84, Global Rate=307.20, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:44:38.857366, device xla:6, step 1400, Rate=296.08, Global Rate=307.62, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:44:38.861870, device xla:2, step 1400, Rate=296.08, Global Rate=307.61, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:44:38.878756, device xla:8, step 1400, Rate=296.08, Global Rate=307.61, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:44:38.868943, device xla:3, step 1400, Rate=296.08, Global Rate=307.61, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:44:38.896137, device xla:4, step 1400, Rate=296.07, Global Rate=307.61, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:44:38.906231, device xla:5, step 1400, Rate=296.07, Global Rate=307.61, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:44:38.886357, device xla:7, step 1400, Rate=296.08, Global Rate=307.61, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:44:38.913936, device xla:1, step 1400, Rate=296.06, Global Rate=307.61, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:47:21.641193, device xla:6, step 1500, Rate=299.77, Global Rate=308.07, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:47:21.645760, device xla:2, step 1500, Rate=299.77, Global Rate=308.07, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:47:21.662357, device xla:8, step 1500, Rate=299.77, Global Rate=308.06, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:47:21.689338, device xla:5, step 1500, Rate=299.76, Global Rate=308.06, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:47:21.652697, device xla:3, step 1500, Rate=299.77, Global Rate=308.07, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:47:21.679536, device xla:7, step 1500, Rate=299.77, Global Rate=308.06, Compiles=52, _local_scalar_dense=64
training torch.Size([256, 64])/ 2019-08-26 18:47:21.669886, device xla:1, step 1500, Rate=299.76, Global Rate=308.06, Compiles=52, _local_scalar_dense=64
training torch.Size([512, 32])/ 2019-08-26 18:47:21.691650, device xla:4, step 1500, Rate=299.76, Global Rate=308.06, Compiles=52, _local_scalar_dense=64
Epoch 2 Training stats:
device xla:1
| epoch 002 | loss 2.000 | nll_loss 2.000 | ppl 4.00 | wps 3688 | ups 0 | wpb 11120.283 | bsz 405.560 | num_updates 3016 | lr 0.000377025 | gnorm 0.574 | clip 0.000 | oom 0.000 | wall 9093 | train_wall 5622
device xla:2
| epoch 002 | loss 1.984 | nll_loss 1.984 | ppl 3.96 | wps 3719 | ups 0 | wpb 11212.669 | bsz 408.446 | num_updates 3016 | lr 0.000377025 | gnorm 0.547 | clip 0.000 | oom 0.000 | wall 9093 | train_wall 6492
device xla:3
| epoch 002 | loss 2.000 | nll_loss 2.000 | ppl 4.00 | wps 3696 | ups 0 | wpb 11143.257 | bsz 408.700 | num_updates 3016 | lr 0.000377025 | gnorm 0.582 | clip 0.000 | oom 0.000 | wall 9093 | train_wall 6247
device xla:4
| epoch 002 | loss 2.000 | nll_loss 2.000 | ppl 4.00 | wps 3689 | ups 0 | wpb 11122.824 | bsz 413.538 | num_updates 3016 | lr 0.000377025 | gnorm 0.570 | clip 0.000 | oom 0.000 | wall 9093 | train_wall 6495
device xla:5
| epoch 002 | loss 1.984 | nll_loss 1.984 | ppl 3.96 | wps 3714 | ups 0 | wpb 11197.395 | bsz 412.095 | num_updates 3016 | lr 0.000377025 | gnorm 0.555 | clip 0.000 | oom 0.000 | wall 9093 | train_wall 6492
device xla:6
| epoch 002 | loss 2.000 | nll_loss 2.000 | ppl 4.00 | wps 3699 | ups 0 | wpb 11153.047 | bsz 406.578 | num_updates 3016 | lr 0.000377025 | gnorm 0.566 | clip 0.000 | oom 0.000 | wall 9093 | train_wall 6421
device xla:7
| epoch 002 | loss 2.016 | nll_loss 2.016 | ppl 4.04 | wps 3678 | ups 0 | wpb 11089.443 | bsz 410.143 | num_updates 3016 | lr 0.000377025 | gnorm 0.562 | clip 0.000 | oom 0.000 | wall 9093 | train_wall 5280
device xla:8
| epoch 002 | loss 2.000 | nll_loss 2.000 | ppl 4.00 | wps 3692 | ups 0 | wpb 11131.778 | bsz 413.538 | num_updates 3016 | lr 0.000377025 | gnorm 0.578 | clip 0.000 | oom 0.000 | wall 9094 | train_wall 5376
Epoch 2 Tracker Rates:
Rate=297.40, Global Rate=307.95
Rate=297.31, Global Rate=307.95
Rate=297.34, Global Rate=307.95
Rate=297.49, Global Rate=307.95
Rate=297.48, Global Rate=307.95
Rate=297.29, Global Rate=307.95
Rate=297.44, Global Rate=307.95
Rate=297.38, Global Rate=307.95
Epoch 2 end 2019-08-26 18:47:35.894135
Metric: CompileTime
TotalSamples: 52
Counter: 10h19m13s151ms937.560us
ValueRate: 02s186ms865.370us / second
Rate: 0.00805384 / second
Percentiles: 1%=080ms169.631us; 5%=28s580ms834.984us; 10%=28s171ms150.865us; 20%=02m31s104ms293.644us; 50%=03m40s716ms790.312us; 80%=06m09s474ms849.253us; 90%=07m09s613ms812.856us; 95%=22m13s835ms112.498us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 24256
Counter: 33h22m54s251ms210.799us
ValueRate: 06s105ms614.784us / second
Rate: 4.9635 / second
Percentiles: 1%=01s163ms213.318us; 5%=01s168ms80.922us; 10%=01s171ms431.270us; 20%=01s176ms937.204us; 50%=01s190ms308.726us; 80%=01s288ms496.316us; 90%=01s292ms432.824us; 95%=01s295ms669.116us; 99%=01s301ms237.423us
Metric: InboundData
TotalSamples: 104
Counter: 208.00B
ValueRate: 0.06B / second
Rate: 0.0305834 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 99888
Counter: 12.20GB
ValueRate: 486.87KB / second
Rate: 19.9694 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 222107
Counter: 04h23m54s966ms918.548us
ValueRate: 408ms733.405us / second
Rate: 43.3257 / second
Percentiles: 1%=427.272us; 5%=485.067us; 10%=528.278us; 20%=593.884us; 50%=849.467us; 80%=003ms625.755us; 90%=012ms786.018us; 95%=025ms776.258us; 99%=050ms838.562us
Metric: TransferFromServerTime
TotalSamples: 104
Counter: 02s297ms599.401us
ValueRate: 675.365us / second
Rate: 0.0305834 / second
Percentiles: 1%=723.603us; 5%=749.754us; 10%=794.296us; 20%=891.150us; 50%=002ms515.528us; 80%=056ms902.589us; 90%=066ms819.161us; 95%=069ms311.171us; 99%=071ms554.678us
Metric: TransferToServerTime
TotalSamples: 99888
Counter: 29h06m28s237ms97.223us
ValueRate: 05s030ms374.352us / second
Rate: 19.9695 / second
Percentiles: 1%=001ms58.649us; 5%=001ms155.984us; 10%=001ms254.005us; 20%=001ms388.361us; 50%=003ms739.031us; 80%=906ms162.441us; 90%=988ms728.359us; 95%=01s048ms841.590us; 99%=01s089ms934.395us
Counter: CachedSyncTensors
Value: 24204
Counter: CreateCompileHandles
Value: 52
Counter: CreateDataHandles
Value: 17908005
Counter: CreateXlaTensor
Value: 116682470
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 17900973
Counter: DestroyXlaTensor
Value: 116676462
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 17900973
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 52
Counter: XRTAllocateFromTensor_Empty
Value: 19942
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 104
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-26 18:47:40.209615, device xla:4, step 0, Compiles=52, _local_scalar_dense=104
validation/ 2019-08-26 18:47:40.211964, device xla:1, step 0, Compiles=52, _local_scalar_dense=104
validation/ 2019-08-26 18:47:40.214420, device xla:6, step 0, Compiles=52, _local_scalar_dense=104
validation/ 2019-08-26 18:47:40.216876, device xla:7, step 0, Compiles=52, _local_scalar_dense=104
validation/ 2019-08-26 18:47:40.219440, device xla:2, step 0, Compiles=52, _local_scalar_dense=104
validation/ 2019-08-26 18:47:40.222383, device xla:8, step 0, Compiles=52, _local_scalar_dense=104
validation/ 2019-08-26 18:47:40.223915, device xla:3, step 0, Compiles=52, _local_scalar_dense=104
validation/ 2019-08-26 18:47:40.229890, device xla:5, step 0, Compiles=52, _local_scalar_dense=104
validation stats on subset "valid" - 2019-08-26 18:47:46.303160
| epoch 002 | valid on 'valid' subset | loss 5.406 | nll_loss 3.672 | ppl 12.75 | num_updates 3016
| epoch 002 | valid on 'valid' subset | loss 5.406 | nll_loss 3.688 | ppl 12.88 | num_updates 3016
| epoch 002 | valid on 'valid' subset | loss 5.500 | nll_loss 3.797 | ppl 13.90 | num_updates 3016
| epoch 002 | valid on 'valid' subset | loss 5.469 | nll_loss 3.812 | ppl 14.05 | num_updates 3016
| epoch 002 | valid on 'valid' subset | loss 5.438 | nll_loss 3.672 | ppl 12.75 | num_updates 3016
| epoch 002 | valid on 'valid' subset | loss 5.406 | nll_loss 3.688 | ppl 12.88 | num_updates 3016
| epoch 002 | valid on 'valid' subset | loss 5.500 | nll_loss 3.734 | ppl 13.31 | num_updates 3016
| epoch 002 | valid on 'valid' subset | loss 5.500 | nll_loss 3.781 | ppl 13.75 | num_updates 3016
old learning rate: 0.00018856230000000002
new learning rate: 0.00037702460000000004
Metric: CompileTime
TotalSamples: 53
Counter: 10h19m13s227ms909.188us
ValueRate: 02s568ms491.121us / second
Rate: 0.00589022 / second
Percentiles: 1%=076ms971.628us; 5%=28s578ms625.257us; 10%=28s584ms463.030us; 20%=02m31s102ms802.991us; 50%=03m40s714ms918.104us; 80%=06m09s474ms849.253us; 90%=07m09s613ms812.856us; 95%=22m13s835ms112.498us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 24377
Counter: 33h23m40s208ms172.114us
ValueRate: 06s985ms175.071us / second
Rate: 5.30151 / second
Percentiles: 1%=377ms525.070us; 5%=378ms104.105us; 10%=391ms81.114us; 20%=01s171ms847.569us; 50%=01s185ms696.482us; 80%=01s288ms51.939us; 90%=01s292ms164.312us; 95%=01s295ms669.116us; 99%=01s301ms237.423us
Metric: InboundData
TotalSamples: 129
Counter: 257.00B
ValueRate: 0.08B / second
Rate: 0.0378163 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 100128
Counter: 12.25GB
ValueRate: 926.29KB / second
Rate: 20.244 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 222771
Counter: 04h23m25s454ms379.083us
ValueRate: 02s748ms222.232us / second
Rate: 54.5285 / second
Percentiles: 1%=422.351us; 5%=465.766us; 10%=497.131us; 20%=541.103us; 50%=714.334us; 80%=001ms203.651us; 90%=020ms346.922us; 95%=375ms659.523us; 99%=388ms962.802us
Metric: TransferFromServerTime
TotalSamples: 129
Counter: 03s805ms176.804us
ValueRate: 822.337us / second
Rate: 0.0378163 / second
Percentiles: 1%=712.931us; 5%=749.754us; 10%=801.396us; 20%=942.866us; 50%=002ms530.554us; 80%=056ms902.589us; 90%=064ms761.130us; 95%=069ms814.666us; 99%=071ms554.678us
Metric: TransferToServerTime
TotalSamples: 100128
Counter: 29h07m56s428ms464.188us
ValueRate: 04s264ms166.554us / second
Rate: 20.2439 / second
Percentiles: 1%=001ms70.030us; 5%=001ms190.361us; 10%=001ms285.038us; 20%=001ms418.349us; 50%=002ms186.257us; 80%=250ms225.232us; 90%=968ms196.440us; 95%=01s042ms292.603us; 99%=01s085ms632.230us
Counter: CachedSyncTensors
Value: 24324
Counter: CreateCompileHandles
Value: 53
Counter: CreateDataHandles
Value: 17909614
Counter: CreateXlaTensor
Value: 116817287
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 17902580
Counter: DestroyXlaTensor
Value: 116811279
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 17902582
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 53
Counter: XRTAllocateFromTensor_Empty
Value: 19942
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 129
Epoch 3 begin 2019-08-26 18:47:46.586578
training torch.Size([512, 32])/ 2019-08-26 18:47:55.027955, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:47:55.197450, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:47:55.222156, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 18:47:55.228516, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:47:55.310059, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:47:55.380326, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 18:47:55.431217, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 18:47:55.463575, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:50:45.735692, device xla:3, step 100, Rate=60.05, Global Rate=291.72, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 18:50:45.740641, device xla:1, step 100, Rate=60.04, Global Rate=291.71, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:50:45.745570, device xla:4, step 100, Rate=60.08, Global Rate=291.70, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:50:45.762183, device xla:8, step 100, Rate=60.13, Global Rate=291.68, Compiles=53, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 18:50:45.754551, device xla:5, step 100, Rate=60.10, Global Rate=291.69, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:50:45.772062, device xla:7, step 100, Rate=60.04, Global Rate=291.66, Compiles=53, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 18:50:45.806432, device xla:6, step 100, Rate=60.10, Global Rate=291.60, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 18:50:45.793714, device xla:2, step 100, Rate=59.97, Global Rate=291.62, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:53:32.381848, device xla:6, step 200, Rate=109.56, Global Rate=299.28, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 18:53:32.408992, device xla:4, step 200, Rate=109.51, Global Rate=299.25, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:53:32.411981, device xla:2, step 200, Rate=109.43, Global Rate=299.25, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:53:32.399889, device xla:8, step 200, Rate=109.55, Global Rate=299.26, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:53:32.394659, device xla:1, step 200, Rate=109.48, Global Rate=299.27, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 18:53:32.377112, device xla:3, step 200, Rate=109.49, Global Rate=299.28, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:53:32.421709, device xla:7, step 200, Rate=109.48, Global Rate=299.24, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:53:32.386864, device xla:5, step 200, Rate=109.54, Global Rate=299.27, Compiles=53, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 18:56:17.743311, device xla:3, step 300, Rate=149.52, Global Rate=302.65, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:56:17.747784, device xla:6, step 300, Rate=149.57, Global Rate=302.65, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 18:56:17.752457, device xla:1, step 300, Rate=149.51, Global Rate=302.64, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:56:17.758846, device xla:8, step 300, Rate=149.57, Global Rate=302.64, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:56:17.799761, device xla:5, step 300, Rate=149.53, Global Rate=302.62, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:56:17.771857, device xla:4, step 300, Rate=149.53, Global Rate=302.63, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 18:56:17.812824, device xla:7, step 300, Rate=149.50, Global Rate=302.61, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:56:17.781116, device xla:2, step 300, Rate=149.47, Global Rate=302.63, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:59:03.776667, device xla:6, step 400, Rate=181.33, Global Rate=304.06, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:59:03.781166, device xla:1, step 400, Rate=181.28, Global Rate=304.06, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:59:03.786021, device xla:3, step 400, Rate=181.28, Global Rate=304.06, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:59:03.791175, device xla:8, step 400, Rate=181.33, Global Rate=304.05, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:59:03.798845, device xla:7, step 400, Rate=181.29, Global Rate=304.05, Compiles=53, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 18:59:03.823915, device xla:4, step 400, Rate=181.29, Global Rate=304.04, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:59:03.834065, device xla:2, step 400, Rate=181.24, Global Rate=304.03, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 18:59:03.845626, device xla:5, step 400, Rate=181.30, Global Rate=304.03, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:01:49.634016, device xla:1, step 500, Rate=206.77, Global Rate=304.98, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:01:49.648739, device xla:3, step 500, Rate=206.77, Global Rate=304.97, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:01:49.679848, device xla:5, step 500, Rate=206.79, Global Rate=304.96, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:01:49.640379, device xla:4, step 500, Rate=206.79, Global Rate=304.97, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:01:49.687826, device xla:2, step 500, Rate=206.73, Global Rate=304.96, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:01:49.653754, device xla:8, step 500, Rate=206.80, Global Rate=304.97, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:01:49.661401, device xla:7, step 500, Rate=206.77, Global Rate=304.97, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:01:49.701950, device xla:6, step 500, Rate=206.78, Global Rate=304.95, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:04:34.912686, device xla:1, step 600, Rate=227.37, Global Rate=305.77, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:04:34.924778, device xla:5, step 600, Rate=227.40, Global Rate=305.76, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:04:34.917678, device xla:8, step 600, Rate=227.40, Global Rate=305.77, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:04:34.932454, device xla:7, step 600, Rate=227.38, Global Rate=305.76, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:04:34.942752, device xla:3, step 600, Rate=227.36, Global Rate=305.76, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:04:34.948795, device xla:6, step 600, Rate=227.39, Global Rate=305.76, Compiles=53, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 19:04:34.954238, device xla:2, step 600, Rate=227.35, Global Rate=305.75, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:04:34.965699, device xla:4, step 600, Rate=227.37, Global Rate=305.75, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:07:20.007134, device xla:1, step 700, Rate=243.92, Global Rate=306.38, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:07:20.011531, device xla:6, step 700, Rate=243.95, Global Rate=306.38, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:07:20.023613, device xla:4, step 700, Rate=243.93, Global Rate=306.38, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:07:20.016276, device xla:8, step 700, Rate=243.95, Global Rate=306.38, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:07:20.027194, device xla:5, step 700, Rate=243.94, Global Rate=306.38, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:07:20.042032, device xla:7, step 700, Rate=243.92, Global Rate=306.37, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:07:20.056145, device xla:3, step 700, Rate=243.91, Global Rate=306.37, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:07:20.068922, device xla:2, step 700, Rate=243.90, Global Rate=306.37, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:10:10.612713, device xla:6, step 800, Rate=255.18, Global Rate=305.58, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:10:10.617403, device xla:8, step 800, Rate=255.18, Global Rate=305.58, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:10:10.634528, device xla:2, step 800, Rate=255.15, Global Rate=305.58, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:10:10.658840, device xla:5, step 800, Rate=255.16, Global Rate=305.57, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:10:10.643264, device xla:4, step 800, Rate=255.16, Global Rate=305.58, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:10:10.652733, device xla:1, step 800, Rate=255.14, Global Rate=305.57, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:10:10.663887, device xla:3, step 800, Rate=255.15, Global Rate=305.57, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:10:10.624526, device xla:7, step 800, Rate=255.17, Global Rate=305.58, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:13:00.785725, device xla:6, step 900, Rate=264.32, Global Rate=305.05, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:13:00.790287, device xla:1, step 900, Rate=264.30, Global Rate=305.05, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:13:00.795438, device xla:3, step 900, Rate=264.31, Global Rate=305.05, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:13:00.813910, device xla:4, step 900, Rate=264.31, Global Rate=305.05, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:13:00.800359, device xla:8, step 900, Rate=264.31, Global Rate=305.05, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:13:00.843601, device xla:7, step 900, Rate=264.29, Global Rate=305.04, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:13:00.830942, device xla:2, step 900, Rate=264.29, Global Rate=305.04, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:13:00.854902, device xla:5, step 900, Rate=264.30, Global Rate=305.04, Compiles=53, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 19:15:44.285756, device xla:1, step 1000, Rate=274.07, Global Rate=305.84, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:15:44.290379, device xla:3, step 1000, Rate=274.08, Global Rate=305.84, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:15:44.291792, device xla:8, step 1000, Rate=274.08, Global Rate=305.84, Compiles=53, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:15:44.309362, device xla:2, step 1000, Rate=274.07, Global Rate=305.84, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:15:44.326376, device xla:4, step 1000, Rate=274.07, Global Rate=305.84, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:15:44.299151, device xla:7, step 1000, Rate=274.08, Global Rate=305.84, Compiles=53, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 19:15:44.339513, device xla:5, step 1000, Rate=274.07, Global Rate=305.83, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:15:44.331707, device xla:6, step 1000, Rate=274.07, Global Rate=305.83, Compiles=53, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:42:31.612014, device xla:6, step 1100, Rate=225.63, Global Rate=171.63, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:42:31.616999, device xla:5, step 1100, Rate=225.63, Global Rate=171.63, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:42:31.624046, device xla:8, step 1100, Rate=225.64, Global Rate=171.63, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:42:31.632271, device xla:3, step 1100, Rate=225.63, Global Rate=171.63, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:42:31.649727, device xla:1, step 1100, Rate=225.63, Global Rate=171.63, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:42:31.639944, device xla:2, step 1100, Rate=225.63, Global Rate=171.63, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:42:31.651896, device xla:7, step 1100, Rate=225.63, Global Rate=171.63, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:42:31.677796, device xla:4, step 1100, Rate=225.63, Global Rate=171.63, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:45:16.428274, device xla:6, step 1200, Rate=242.63, Global Rate=178.28, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:45:16.432377, device xla:8, step 1200, Rate=242.64, Global Rate=178.28, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:45:16.439411, device xla:4, step 1200, Rate=242.65, Global Rate=178.28, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:45:16.447892, device xla:5, step 1200, Rate=242.63, Global Rate=178.28, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:45:16.483216, device xla:2, step 1200, Rate=242.62, Global Rate=178.28, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:45:16.464519, device xla:7, step 1200, Rate=242.64, Global Rate=178.28, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:45:16.455773, device xla:1, step 1200, Rate=242.64, Global Rate=178.28, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:45:16.493251, device xla:3, step 1200, Rate=242.62, Global Rate=178.28, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:48:01.068337, device xla:6, step 1300, Rate=256.30, Global Rate=184.33, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:48:01.073499, device xla:5, step 1300, Rate=256.30, Global Rate=184.33, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:48:01.075129, device xla:2, step 1300, Rate=256.31, Global Rate=184.33, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:48:01.097268, device xla:1, step 1300, Rate=256.31, Global Rate=184.33, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:48:01.084106, device xla:3, step 1300, Rate=256.31, Global Rate=184.33, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:48:01.117664, device xla:4, step 1300, Rate=256.30, Global Rate=184.33, Compiles=57, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 19:48:01.124444, device xla:8, step 1300, Rate=256.29, Global Rate=184.33, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:48:01.106569, device xla:7, step 1300, Rate=256.31, Global Rate=184.33, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:50:43.767555, device xla:6, step 1400, Rate=267.98, Global Rate=189.95, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:50:43.771534, device xla:1, step 1400, Rate=267.99, Global Rate=189.95, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:50:43.790143, device xla:8, step 1400, Rate=267.98, Global Rate=189.95, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:50:43.781927, device xla:2, step 1400, Rate=267.98, Global Rate=189.95, Compiles=57, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 19:50:43.802976, device xla:4, step 1400, Rate=267.99, Global Rate=189.95, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:50:43.775413, device xla:5, step 1400, Rate=267.98, Global Rate=189.95, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:50:43.821131, device xla:3, step 1400, Rate=267.97, Global Rate=189.95, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:50:43.838929, device xla:7, step 1400, Rate=267.97, Global Rate=189.95, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:53:27.229286, device xla:1, step 1500, Rate=277.04, Global Rate=195.07, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:53:27.235256, device xla:6, step 1500, Rate=277.03, Global Rate=195.07, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:53:27.240902, device xla:7, step 1500, Rate=277.04, Global Rate=195.07, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:53:27.251356, device xla:8, step 1500, Rate=277.03, Global Rate=195.07, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:53:27.269372, device xla:5, step 1500, Rate=277.02, Global Rate=195.07, Compiles=57, _local_scalar_dense=129
training torch.Size([1024, 16])/ 2019-08-26 19:53:27.260782, device xla:3, step 1500, Rate=277.03, Global Rate=195.07, Compiles=57, _local_scalar_dense=129
training torch.Size([512, 32])/ 2019-08-26 19:53:27.272943, device xla:4, step 1500, Rate=277.03, Global Rate=195.07, Compiles=57, _local_scalar_dense=129
training torch.Size([256, 64])/ 2019-08-26 19:53:27.276942, device xla:2, step 1500, Rate=277.02, Global Rate=195.07, Compiles=57, _local_scalar_dense=129
Epoch 3 Training stats:
device xla:1
| epoch 003 | loss 1.336 | nll_loss 1.336 | ppl 2.52 | wps 3864 | ups 0 | wpb 11154.221 | bsz 406.069 | num_updates 4524 | lr 0.000470152 | gnorm 0.385 | clip 0.000 | oom 0.000 | wall 13059 | train_wall 9132
device xla:2
| epoch 003 | loss 1.328 | nll_loss 1.328 | ppl 2.51 | wps 3874 | ups 0 | wpb 11183.116 | bsz 406.578 | num_updates 4524 | lr 0.000470152 | gnorm 0.369 | clip 0.000 | oom 0.000 | wall 13059 | train_wall 9990
device xla:3
| epoch 003 | loss 1.336 | nll_loss 1.336 | ppl 2.52 | wps 3858 | ups 0 | wpb 11137.516 | bsz 404.767 | num_updates 4524 | lr 0.000470152 | gnorm 0.395 | clip 0.000 | oom 0.000 | wall 13059 | train_wall 8317
device xla:4
| epoch 003 | loss 1.336 | nll_loss 1.336 | ppl 2.52 | wps 3861 | ups 0 | wpb 11144.499 | bsz 415.179 | num_updates 4524 | lr 0.000470152 | gnorm 0.385 | clip 0.000 | oom 0.000 | wall 13059 | train_wall 9989
device xla:5
| epoch 003 | loss 1.328 | nll_loss 1.328 | ppl 2.51 | wps 3871 | ups 0 | wpb 11174.752 | bsz 411.388 | num_updates 4524 | lr 0.000470152 | gnorm 0.371 | clip 0.000 | oom 0.000 | wall 13059 | train_wall 9986
device xla:6
| epoch 003 | loss 1.336 | nll_loss 1.336 | ppl 2.52 | wps 3860 | ups 0 | wpb 11142.492 | bsz 411.218 | num_updates 4524 | lr 0.000470152 | gnorm 0.381 | clip 0.000 | oom 0.000 | wall 13059 | train_wall 9921
device xla:7
| epoch 003 | loss 1.336 | nll_loss 1.336 | ppl 2.52 | wps 3852 | ups 0 | wpb 11120.191 | bsz 412.973 | num_updates 4524 | lr 0.000470152 | gnorm 0.379 | clip 0.000 | oom 0.000 | wall 13059 | train_wall 8782
device xla:8
| epoch 003 | loss 1.336 | nll_loss 1.336 | ppl 2.52 | wps 3850 | ups 0 | wpb 11113.557 | bsz 410.370 | num_updates 4524 | lr 0.000470152 | gnorm 0.395 | clip 0.000 | oom 0.000 | wall 13059 | train_wall 8875
Epoch 3 Tracker Rates:
Rate=280.30, Global Rate=195.42
Rate=280.48, Global Rate=195.42
Rate=280.42, Global Rate=195.42
Rate=280.47, Global Rate=195.42
Rate=280.45, Global Rate=195.42
Rate=280.31, Global Rate=195.42
Rate=280.35, Global Rate=195.42
Rate=280.38, Global Rate=195.42
Epoch 3 end 2019-08-26 19:53:41.193407
Metric: CompileTime
TotalSamples: 57
Counter: 11h19m06s390ms862.509us
ValueRate: 01s269ms796.350us / second
Rate: 0.00465197 / second
Percentiles: 1%=076ms971.628us; 5%=156ms512.565us; 10%=28s578ms625.257us; 20%=02m31s090ms346.188us; 50%=02m04s949ms359.097us; 80%=06m09s474ms849.253us; 90%=07m09s614ms115.550us; 95%=24m53s516ms570.189us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 36441
Counter: 50h07m42s615ms844.157us
ValueRate: 06s081ms341.519us / second
Rate: 4.96606 / second
Percentiles: 1%=01s072ms362.998us; 5%=01s166ms291.105us; 10%=01s169ms141.780us; 20%=01s174ms533.769us; 50%=01s186ms825.814us; 80%=01s286ms653.742us; 90%=01s290ms775.782us; 95%=01s293ms249.902us; 99%=01s304ms536.515us
Metric: InboundData
TotalSamples: 169
Counter: 337.00B
ValueRate: 0.05B / second
Rate: 0.0229438 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 149791
Counter: 16.74GB
ValueRate: 499.92KB / second
Rate: 20.5017 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 332912
Counter: 04h09m36s084ms247.622us
ValueRate: 487ms758.001us / second
Rate: 41.2964 / second
Percentiles: 1%=437.289us; 5%=500.996us; 10%=541.043us; 20%=626.076us; 50%=873.841us; 80%=004ms714.381us; 90%=017ms564.103us; 95%=029ms182.000us; 99%=049ms926.910us
Metric: TransferFromServerTime
TotalSamples: 169
Counter: 03s393ms920.286us
ValueRate: 460.630us / second
Rate: 0.0229438 / second
Percentiles: 1%=679.144us; 5%=712.931us; 10%=747.062us; 20%=888.397us; 50%=001ms404.449us; 80%=055ms810.205us; 90%=063ms911.629us; 95%=068ms932.198us; 99%=071ms554.678us
Metric: TransferToServerTime
TotalSamples: 149791
Counter: 45h19m22s486ms406.704us
ValueRate: 05s150ms419.348us / second
Rate: 20.9026 / second
Percentiles: 1%=001ms64.631us; 5%=001ms193.280us; 10%=001ms271.582us; 20%=001ms379.312us; 50%=003ms625.697us; 80%=892ms336.435us; 90%=968ms792.779us; 95%=01s015ms979.976us; 99%=01s091ms335.586us
Counter: CachedSyncParamMismatch
Value: 4
Counter: CachedSyncTensors
Value: 36384
Counter: CreateCompileHandles
Value: 54
Counter: CreateDataHandles
Value: 26861901
Counter: CreateXlaTensor
Value: 175088919
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 26854869
Counter: DestroyXlaTensor
Value: 175082911
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 26854869
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 57
Counter: XRTAllocateFromTensor_Empty
Value: 20827
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 169
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-26 19:53:45.127926, device xla:5, step 0, Compiles=57, _local_scalar_dense=169
validation/ 2019-08-26 19:53:45.131800, device xla:1, step 0, Compiles=57, _local_scalar_dense=169
validation/ 2019-08-26 19:53:45.139820, device xla:2, step 0, Compiles=57, _local_scalar_dense=169
validation/ 2019-08-26 19:53:45.142661, device xla:3, step 0, Compiles=57, _local_scalar_dense=169
validation/ 2019-08-26 19:53:45.144251, device xla:8, step 0, Compiles=57, _local_scalar_dense=169
validation/ 2019-08-26 19:53:45.148342, device xla:4, step 0, Compiles=57, _local_scalar_dense=169
validation/ 2019-08-26 19:53:45.150209, device xla:7, step 0, Compiles=57, _local_scalar_dense=169
validation/ 2019-08-26 19:53:45.152033, device xla:6, step 0, Compiles=57, _local_scalar_dense=169
validation stats on subset "valid" - 2019-08-26 19:53:51.238025
| epoch 003 | valid on 'valid' subset | loss 4.562 | nll_loss 2.781 | ppl 6.87 | num_updates 4524
| epoch 003 | valid on 'valid' subset | loss 4.562 | nll_loss 2.797 | ppl 6.95 | num_updates 4524
| epoch 003 | valid on 'valid' subset | loss 4.688 | nll_loss 2.875 | ppl 7.34 | num_updates 4524
| epoch 003 | valid on 'valid' subset | loss 4.656 | nll_loss 2.875 | ppl 7.34 | num_updates 4524
| epoch 003 | valid on 'valid' subset | loss 4.594 | nll_loss 2.781 | ppl 6.87 | num_updates 4524
| epoch 003 | valid on 'valid' subset | loss 4.625 | nll_loss 2.812 | ppl 7.03 | num_updates 4524
| epoch 003 | valid on 'valid' subset | loss 4.656 | nll_loss 2.797 | ppl 6.95 | num_updates 4524
| epoch 003 | valid on 'valid' subset | loss 4.688 | nll_loss 2.906 | ppl 7.50 | num_updates 4524
old learning rate: 0.00037702460000000004
new learning rate: 0.0004701524481395373
Metric: CompileTime
TotalSamples: 57
Counter: 11h19m06s390ms862.509us
ValueRate: 01s269ms796.350us / second
Rate: 0.00465197 / second
Percentiles: 1%=076ms971.628us; 5%=156ms512.565us; 10%=28s578ms625.257us; 20%=02m31s090ms346.188us; 50%=02m04s949ms359.097us; 80%=06m09s474ms849.253us; 90%=07m09s614ms115.550us; 95%=24m53s516ms570.189us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 36562
Counter: 50h07m28s592ms453.788us
ValueRate: 06s986ms982.439us / second
Rate: 5.30979 / second
Percentiles: 1%=377ms873.479us; 5%=378ms72.980us; 10%=391ms481.860us; 20%=01s170ms924.630us; 50%=01s183ms148.453us; 80%=01s285ms213.133us; 90%=01s290ms651.657us; 95%=01s293ms30.871us; 99%=01s304ms536.515us
Metric: InboundData
TotalSamples: 194
Counter: 386.00B
ValueRate: 0.05B / second
Rate: 0.0263009 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 150031
Counter: 16.78GB
ValueRate: 931.33KB / second
Rate: 20.3542 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 333537
Counter: 04h09m08s967ms402.592us
ValueRate: 02s706ms786.699us / second
Rate: 52.2017 / second
Percentiles: 1%=445.151us; 5%=510.293us; 10%=550.684us; 20%=610.568us; 50%=802.851us; 80%=001ms350.420us; 90%=024ms399.193us; 95%=375ms547.292us; 99%=388ms893.292us
Metric: TransferFromServerTime
TotalSamples: 194
Counter: 04s917ms887.859us
ValueRate: 531.020us / second
Rate: 0.0263009 / second
Percentiles: 1%=679.144us; 5%=723.603us; 10%=754.679us; 20%=927.366us; 50%=002ms546.214us; 80%=047ms926.074us; 90%=060ms234.935us; 95%=068ms576.480us; 99%=071ms554.678us
Metric: TransferToServerTime
TotalSamples: 150031
Counter: 45h20m51s260ms541.595us
ValueRate: 04s378ms586.529us / second
Rate: 20.354 / second
Percentiles: 1%=001ms92.731us; 5%=001ms234.528us; 10%=001ms306.390us; 20%=001ms440.876us; 50%=003ms596.592us; 80%=259ms427.960us; 90%=955ms379.528us; 95%=982ms152.480us; 99%=01s086ms533.785us
Counter: CachedSyncParamMismatch
Value: 4
Counter: CachedSyncTensors
Value: 36505
Counter: CreateCompileHandles
Value: 54
Counter: CreateDataHandles
Value: 26863510
Counter: CreateXlaTensor
Value: 175223736
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 26856476
Counter: DestroyXlaTensor
Value: 175217728
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 26856478
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 57
Counter: XRTAllocateFromTensor_Empty
Value: 20827
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 194
Epoch 4 begin 2019-08-26 19:53:51.526839
training torch.Size([256, 64])/ 2019-08-26 19:54:02.040846, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:54:02.194172, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:54:02.262043, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 19:54:02.316008, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:54:02.367767, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:54:02.507223, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:54:02.531730, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:54:03.160113, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:56:54.918465, device xla:3, step 100, Rate=59.34, Global Rate=287.39, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:56:54.911508, device xla:2, step 100, Rate=59.24, Global Rate=287.40, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 19:56:54.905221, device xla:1, step 100, Rate=59.29, Global Rate=287.41, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:56:54.930549, device xla:7, step 100, Rate=59.39, Global Rate=287.37, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 19:56:54.936608, device xla:8, step 100, Rate=59.61, Global Rate=287.36, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 19:56:54.921441, device xla:6, step 100, Rate=59.40, Global Rate=287.38, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 19:56:54.956461, device xla:4, step 100, Rate=59.31, Global Rate=287.32, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:56:54.943756, device xla:5, step 100, Rate=59.30, Global Rate=287.35, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:59:42.332522, device xla:3, step 200, Rate=108.64, Global Rate=296.32, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 19:59:42.335095, device xla:7, step 200, Rate=108.68, Global Rate=296.32, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:59:42.342884, device xla:8, step 200, Rate=108.86, Global Rate=296.31, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:59:42.366790, device xla:4, step 200, Rate=108.62, Global Rate=296.29, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:59:42.325635, device xla:5, step 200, Rate=108.62, Global Rate=296.33, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:59:42.372789, device xla:6, step 200, Rate=108.67, Global Rate=296.29, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 19:59:42.381385, device xla:1, step 200, Rate=108.57, Global Rate=296.28, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 19:59:42.358504, device xla:2, step 200, Rate=108.54, Global Rate=296.30, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:02:27.981782, device xla:7, step 300, Rate=148.76, Global Rate=300.46, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:02:27.975546, device xla:1, step 300, Rate=148.70, Global Rate=300.46, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:02:27.986606, device xla:6, step 300, Rate=148.77, Global Rate=300.45, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:02:27.991979, device xla:5, step 300, Rate=148.70, Global Rate=300.45, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:02:28.017684, device xla:8, step 300, Rate=148.89, Global Rate=300.44, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:02:28.028506, device xla:3, step 300, Rate=148.71, Global Rate=300.43, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:02:28.003714, device xla:2, step 300, Rate=148.65, Global Rate=300.44, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:02:28.040704, device xla:4, step 300, Rate=148.70, Global Rate=300.42, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:05:13.516885, device xla:7, step 400, Rate=180.87, Global Rate=302.62, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:05:13.528810, device xla:6, step 400, Rate=180.87, Global Rate=302.61, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:05:13.535960, device xla:5, step 400, Rate=180.82, Global Rate=302.61, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:05:13.554391, device xla:3, step 400, Rate=180.83, Global Rate=302.60, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:05:13.521856, device xla:1, step 400, Rate=180.81, Global Rate=302.62, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:05:13.544582, device xla:2, step 400, Rate=180.78, Global Rate=302.61, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:05:13.557318, device xla:4, step 400, Rate=180.83, Global Rate=302.60, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:05:13.568403, device xla:8, step 400, Rate=180.97, Global Rate=302.60, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:07:56.338786, device xla:1, step 500, Rate=207.54, Global Rate=304.91, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:07:56.332500, device xla:5, step 500, Rate=207.56, Global Rate=304.92, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:07:56.365024, device xla:4, step 500, Rate=207.56, Global Rate=304.91, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:07:56.345185, device xla:7, step 500, Rate=207.58, Global Rate=304.91, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:07:56.352226, device xla:2, step 500, Rate=207.52, Global Rate=304.91, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:07:56.404179, device xla:6, step 500, Rate=207.57, Global Rate=304.89, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:07:56.391538, device xla:3, step 500, Rate=207.55, Global Rate=304.90, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:07:56.374539, device xla:8, step 500, Rate=207.67, Global Rate=304.90, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:10:42.060937, device xla:6, step 600, Rate=227.87, Global Rate=305.58, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:10:42.065879, device xla:7, step 600, Rate=227.86, Global Rate=305.58, Compiles=57, _local_scalar_dense=194
training torch.Size([1024, 16])/ 2019-08-26 20:10:42.088436, device xla:5, step 600, Rate=227.82, Global Rate=305.57, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:10:42.080428, device xla:3, step 600, Rate=227.84, Global Rate=305.57, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:10:42.071311, device xla:4, step 600, Rate=227.84, Global Rate=305.58, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:10:42.094493, device xla:8, step 600, Rate=227.93, Global Rate=305.57, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:10:42.112224, device xla:1, step 600, Rate=227.81, Global Rate=305.56, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:10:42.103839, device xla:2, step 600, Rate=227.80, Global Rate=305.57, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:13:26.085563, device xla:6, step 700, Rate=244.72, Global Rate=306.50, Compiles=57, _local_scalar_dense=194
training torch.Size([1024, 16])/ 2019-08-26 20:13:26.096819, device xla:3, step 700, Rate=244.71, Global Rate=306.50, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:13:26.098971, device xla:7, step 700, Rate=244.71, Global Rate=306.50, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:13:26.103760, device xla:5, step 700, Rate=244.69, Global Rate=306.50, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:13:26.116225, device xla:4, step 700, Rate=244.70, Global Rate=306.49, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:13:26.126514, device xla:8, step 700, Rate=244.77, Global Rate=306.49, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:13:26.090107, device xla:1, step 700, Rate=244.69, Global Rate=306.50, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:13:26.148029, device xla:2, step 700, Rate=244.66, Global Rate=306.49, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:16:10.512213, device xla:7, step 800, Rate=258.05, Global Rate=307.10, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:16:10.531207, device xla:6, step 800, Rate=258.05, Global Rate=307.10, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:16:10.545175, device xla:4, step 800, Rate=258.03, Global Rate=307.10, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:16:10.551656, device xla:8, step 800, Rate=258.09, Global Rate=307.09, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:16:10.523689, device xla:5, step 800, Rate=258.03, Global Rate=307.10, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:16:10.537305, device xla:2, step 800, Rate=258.02, Global Rate=307.10, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:16:10.516768, device xla:3, step 800, Rate=258.05, Global Rate=307.10, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:16:10.559036, device xla:1, step 800, Rate=258.01, Global Rate=307.09, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:18:54.086308, device xla:6, step 900, Rate=269.05, Global Rate=307.75, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:18:54.090866, device xla:2, step 900, Rate=269.02, Global Rate=307.75, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:18:54.123463, device xla:5, step 900, Rate=269.02, Global Rate=307.74, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:18:54.098025, device xla:4, step 900, Rate=269.04, Global Rate=307.75, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:18:54.107331, device xla:8, step 900, Rate=269.08, Global Rate=307.74, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:18:54.131954, device xla:3, step 900, Rate=269.02, Global Rate=307.74, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:18:54.143565, device xla:7, step 900, Rate=269.02, Global Rate=307.74, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:18:54.152972, device xla:1, step 900, Rate=269.01, Global Rate=307.73, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:21:39.204784, device xla:7, step 1000, Rate=277.25, Global Rate=307.98, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:21:39.189406, device xla:3, step 1000, Rate=277.26, Global Rate=307.98, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:21:39.219415, device xla:5, step 1000, Rate=277.24, Global Rate=307.98, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:21:39.195941, device xla:4, step 1000, Rate=277.25, Global Rate=307.98, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:21:39.238402, device xla:1, step 1000, Rate=277.23, Global Rate=307.97, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:21:39.211130, device xla:2, step 1000, Rate=277.23, Global Rate=307.98, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:21:39.246101, device xla:8, step 1000, Rate=277.28, Global Rate=307.97, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:21:39.233067, device xla:6, step 1000, Rate=277.24, Global Rate=307.98, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:24:22.464857, device xla:8, step 1100, Rate=284.56, Global Rate=308.48, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:24:22.442676, device xla:7, step 1100, Rate=284.53, Global Rate=308.49, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:24:22.467631, device xla:5, step 1100, Rate=284.52, Global Rate=308.48, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:24:22.454628, device xla:4, step 1100, Rate=284.53, Global Rate=308.49, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:24:22.447359, device xla:1, step 1100, Rate=284.53, Global Rate=308.49, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:24:22.475248, device xla:2, step 1100, Rate=284.51, Global Rate=308.48, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:24:22.487276, device xla:3, step 1100, Rate=284.51, Global Rate=308.48, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:24:22.498926, device xla:6, step 1100, Rate=284.52, Global Rate=308.48, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:27:08.246738, device xla:7, step 1200, Rate=289.39, Global Rate=308.51, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:27:08.264755, device xla:2, step 1200, Rate=289.37, Global Rate=308.51, Compiles=57, _local_scalar_dense=194
training torch.Size([1024, 16])/ 2019-08-26 20:27:08.251517, device xla:1, step 1200, Rate=289.38, Global Rate=308.51, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:27:08.272342, device xla:4, step 1200, Rate=289.37, Global Rate=308.51, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:27:08.259891, device xla:6, step 1200, Rate=289.39, Global Rate=308.51, Compiles=57, _local_scalar_dense=194
training torch.Size([1024, 16])/ 2019-08-26 20:27:08.283423, device xla:8, step 1200, Rate=289.40, Global Rate=308.51, Compiles=57, _local_scalar_dense=194
training torch.Size([1024, 16])/ 2019-08-26 20:27:08.302452, device xla:5, step 1200, Rate=289.36, Global Rate=308.50, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:27:08.293553, device xla:3, step 1200, Rate=289.37, Global Rate=308.51, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:29:52.552820, device xla:7, step 1300, Rate=293.83, Global Rate=308.75, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:29:52.557449, device xla:2, step 1300, Rate=293.82, Global Rate=308.75, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:29:52.573182, device xla:1, step 1300, Rate=293.82, Global Rate=308.75, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:29:52.588926, device xla:6, step 1300, Rate=293.82, Global Rate=308.74, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:29:52.580012, device xla:8, step 1300, Rate=293.85, Global Rate=308.75, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:29:52.564113, device xla:5, step 1300, Rate=293.83, Global Rate=308.75, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:29:52.621228, device xla:3, step 1300, Rate=293.81, Global Rate=308.74, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:29:52.596469, device xla:4, step 1300, Rate=293.82, Global Rate=308.74, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:32:33.909226, device xla:6, step 1400, Rate=298.54, Global Rate=309.35, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:32:33.914361, device xla:7, step 1400, Rate=298.53, Global Rate=309.35, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:32:33.940589, device xla:8, step 1400, Rate=298.54, Global Rate=309.34, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:32:33.919759, device xla:4, step 1400, Rate=298.53, Global Rate=309.34, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:32:33.928443, device xla:3, step 1400, Rate=298.53, Global Rate=309.34, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:32:33.958537, device xla:2, step 1400, Rate=298.50, Global Rate=309.34, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:32:33.971641, device xla:5, step 1400, Rate=298.51, Global Rate=309.34, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:32:33.979027, device xla:1, step 1400, Rate=298.50, Global Rate=309.34, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:35:13.620490, device xla:1, step 1500, Rate=302.94, Global Rate=310.07, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:35:13.642714, device xla:7, step 1500, Rate=302.93, Global Rate=310.07, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:35:13.627179, device xla:6, step 1500, Rate=302.94, Global Rate=310.07, Compiles=57, _local_scalar_dense=194
training torch.Size([512, 32])/ 2019-08-26 20:35:13.614186, device xla:5, step 1500, Rate=302.95, Global Rate=310.07, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:35:13.647762, device xla:2, step 1500, Rate=302.93, Global Rate=310.07, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:35:13.669556, device xla:3, step 1500, Rate=302.93, Global Rate=310.06, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:35:13.655382, device xla:4, step 1500, Rate=302.93, Global Rate=310.07, Compiles=57, _local_scalar_dense=194
training torch.Size([256, 64])/ 2019-08-26 20:35:13.634513, device xla:8, step 1500, Rate=302.95, Global Rate=310.07, Compiles=57, _local_scalar_dense=194
Epoch 4 Training stats:
device xla:1
| epoch 004 | loss 1.000 | nll_loss 1.000 | ppl 2.00 | wps 4328 | ups 0 | wpb 11168.377 | bsz 405.729 | num_updates 6032 | lr 0.000407164 | gnorm 0.289 | clip 0.000 | oom 0.000 | wall 15566 | train_wall 11189
device xla:2
| epoch 004 | loss 0.992 | nll_loss 0.992 | ppl 1.99 | wps 4335 | ups 0 | wpb 11187.060 | bsz 406.748 | num_updates 6032 | lr 0.000407164 | gnorm 0.277 | clip 0.000 | oom 0.000 | wall 15566 | train_wall 12035
device xla:3
| epoch 004 | loss 1.000 | nll_loss 1.000 | ppl 2.00 | wps 4316 | ups 0 | wpb 11136.560 | bsz 404.159 | num_updates 6032 | lr 0.000407164 | gnorm 0.299 | clip 0.000 | oom 0.000 | wall 15566 | train_wall 10366
device xla:4
| epoch 004 | loss 1.000 | nll_loss 1.000 | ppl 2.00 | wps 4314 | ups 0 | wpb 11132.348 | bsz 415.491 | num_updates 6032 | lr 0.000407164 | gnorm 0.291 | clip 0.000 | oom 0.000 | wall 15566 | train_wall 12032
device xla:5
| epoch 004 | loss 1.000 | nll_loss 1.000 | ppl 2.00 | wps 4322 | ups 0 | wpb 11153.536 | bsz 411.841 | num_updates 6032 | lr 0.000407164 | gnorm 0.279 | clip 0.000 | oom 0.000 | wall 15566 | train_wall 12044
device xla:6
| epoch 004 | loss 1.000 | nll_loss 1.000 | ppl 2.00 | wps 4315 | ups 0 | wpb 11134.471 | bsz 410.568 | num_updates 6032 | lr 0.000407164 | gnorm 0.285 | clip 0.000 | oom 0.000 | wall 15566 | train_wall 11980
device xla:7
| epoch 004 | loss 1.000 | nll_loss 1.000 | ppl 2.00 | wps 4315 | ups 0 | wpb 11135.873 | bsz 412.987 | num_updates 6032 | lr 0.000407164 | gnorm 0.285 | clip 0.000 | oom 0.000 | wall 15566 | train_wall 10837
device xla:8
| epoch 004 | loss 1.000 | nll_loss 1.000 | ppl 2.00 | wps 4310 | ups 0 | wpb 11122.291 | bsz 410.992 | num_updates 6032 | lr 0.000407164 | gnorm 0.303 | clip 0.000 | oom 0.000 | wall 15566 | train_wall 10923
Epoch 4 Tracker Rates:
Rate=297.76, Global Rate=309.87
Rate=297.85, Global Rate=309.87
Rate=297.93, Global Rate=309.87
Rate=297.87, Global Rate=309.87
Rate=297.73, Global Rate=309.87
Rate=297.78, Global Rate=309.87
Rate=297.83, Global Rate=309.87
Rate=297.82, Global Rate=309.87
Epoch 4 end 2019-08-26 20:35:28.407466
Metric: CompileTime
TotalSamples: 57
Counter: 11h19m06s390ms862.509us
ValueRate: 01s269ms796.350us / second
Rate: 0.00465197 / second
Percentiles: 1%=076ms971.628us; 5%=156ms512.565us; 10%=28s578ms625.257us; 20%=02m31s090ms346.188us; 50%=02m04s949ms359.097us; 80%=06m09s474ms849.253us; 90%=07m09s614ms115.550us; 95%=24m53s516ms570.189us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 48626
Counter: 01d01h15m53s099ms883.304us
ValueRate: 06s152ms257.765us / second
Rate: 5.02656 / second
Percentiles: 1%=01s073ms642.146us; 5%=01s165ms315.603us; 10%=01s169ms103.096us; 20%=01s175ms18.395us; 50%=01s187ms55.415us; 80%=01s286ms263.573us; 90%=01s290ms244.817us; 95%=01s294ms751.204us; 99%=01s300ms308.983us
Metric: InboundData
TotalSamples: 234
Counter: 466.00B
ValueRate: 0.05B / second
Rate: 0.0237009 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 199754
Counter: 21.28GB
ValueRate: 516.37KB / second
Rate: 21.1493 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 443591
Counter: 05h17m24s154ms337.163us
ValueRate: 400ms237.259us / second
Rate: 45.2101 / second
Percentiles: 1%=454.272us; 5%=510.179us; 10%=555.363us; 20%=623.030us; 50%=885.700us; 80%=003ms837.541us; 90%=012ms862.493us; 95%=027ms191.764us; 99%=047ms979.750us
Metric: TransferFromServerTime
TotalSamples: 234
Counter: 04s472ms777.431us
ValueRate: 452.928us / second
Rate: 0.0237009 / second
Percentiles: 1%=679.144us; 5%=723.603us; 10%=754.679us; 20%=891.150us; 50%=001ms483.750us; 80%=045ms15.400us; 90%=060ms234.935us; 95%=067ms800.115us; 99%=070ms495.424us
Metric: TransferToServerTime
TotalSamples: 199754
Counter: 53h09m32s921ms175.356us
ValueRate: 05s160ms797.081us / second
Rate: 21.1495 / second
Percentiles: 1%=001ms73.964us; 5%=001ms185.529us; 10%=001ms268.535us; 20%=001ms402.675us; 50%=002ms188.172us; 80%=907ms660.625us; 90%=01s023ms774.002us; 95%=01s080ms667.081us; 99%=01s092ms821.148us
Counter: CachedSyncParamMismatch
Value: 4
Counter: CachedSyncTensors
Value: 48569
Counter: CreateCompileHandles
Value: 54
Counter: CreateDataHandles
Value: 35815857
Counter: CreateXlaTensor
Value: 233495368
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 35808825
Counter: DestroyXlaTensor
Value: 233489360
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 35808825
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 57
Counter: XRTAllocateFromTensor_Empty
Value: 21282
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 234
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-26 20:35:32.433760, device xla:6, step 0, Compiles=57, _local_scalar_dense=234
validation/ 2019-08-26 20:35:32.436174, device xla:7, step 0, Compiles=57, _local_scalar_dense=234
validation/ 2019-08-26 20:35:32.590435, device xla:3, step 0, Compiles=57, _local_scalar_dense=234
validation/ 2019-08-26 20:35:32.594505, device xla:4, step 0, Compiles=57, _local_scalar_dense=234
validation/ 2019-08-26 20:35:32.602074, device xla:5, step 0, Compiles=57, _local_scalar_dense=234
validation/ 2019-08-26 20:35:32.605306, device xla:2, step 0, Compiles=57, _local_scalar_dense=234
validation/ 2019-08-26 20:35:32.608221, device xla:8, step 0, Compiles=57, _local_scalar_dense=234
validation/ 2019-08-26 20:35:32.615683, device xla:1, step 0, Compiles=57, _local_scalar_dense=234
validation stats on subset "valid" - 2019-08-26 20:35:38.640699
| epoch 004 | valid on 'valid' subset | loss 4.312 | nll_loss 2.484 | ppl 5.60 | num_updates 6032
| epoch 004 | valid on 'valid' subset | loss 4.312 | nll_loss 2.484 | ppl 5.60 | num_updates 6032
| epoch 004 | valid on 'valid' subset | loss 4.344 | nll_loss 2.594 | ppl 6.04 | num_updates 6032
| epoch 004 | valid on 'valid' subset | loss 4.375 | nll_loss 2.594 | ppl 6.04 | num_updates 6032
| epoch 004 | valid on 'valid' subset | loss 4.312 | nll_loss 2.500 | ppl 5.66 | num_updates 6032
| epoch 004 | valid on 'valid' subset | loss 4.312 | nll_loss 2.516 | ppl 5.72 | num_updates 6032
| epoch 004 | valid on 'valid' subset | loss 4.344 | nll_loss 2.531 | ppl 5.78 | num_updates 6032
| epoch 004 | valid on 'valid' subset | loss 4.375 | nll_loss 2.594 | ppl 6.04 | num_updates 6032
old learning rate: 0.0004701524481395373
new learning rate: 0.00040716396374028516
Metric: CompileTime
TotalSamples: 57
Counter: 11h19m06s390ms862.509us
ValueRate: 01s269ms796.350us / second
Rate: 0.00465197 / second
Percentiles: 1%=076ms971.628us; 5%=156ms512.565us; 10%=28s578ms625.257us; 20%=02m31s090ms346.188us; 50%=02m04s949ms359.097us; 80%=06m09s474ms849.253us; 90%=07m09s614ms115.550us; 95%=24m53s516ms570.189us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 48747
Counter: 01d01h16m39s115ms179.109us
ValueRate: 06s051ms436.419us / second
Rate: 5.37945 / second
Percentiles: 1%=377ms757.224us; 5%=378ms905.267us; 10%=392ms944.341us; 20%=01s169ms658.526us; 50%=01s183ms566.813us; 80%=01s285ms444.750us; 90%=01s290ms741.923us; 95%=01s293ms302.908us; 99%=01s300ms308.983us
Metric: InboundData
TotalSamples: 259
Counter: 515.00B
ValueRate: 0.05B / second
Rate: 0.0262053 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 199994
Counter: 21.32GB
ValueRate: 963.46KB / second
Rate: 21.0564 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 444207
Counter: 05h18m55s923ms502.188us
ValueRate: 02s594ms437.728us / second
Rate: 50.4866 / second
Percentiles: 1%=420.972us; 5%=473.309us; 10%=514.813us; 20%=573.024us; 50%=765.967us; 80%=001ms342.343us; 90%=020ms441.341us; 95%=374ms371.118us; 99%=388ms387.094us
Metric: TransferFromServerTime
TotalSamples: 259
Counter: 05s906ms821.959us
ValueRate: 496.366us / second
Rate: 0.0262053 / second
Percentiles: 1%=679.144us; 5%=723.781us; 10%=758.875us; 20%=924.009us; 50%=002ms515.528us; 80%=045ms857.826us; 90%=060ms996.139us; 95%=066ms819.161us; 99%=070ms495.424us
Metric: TransferToServerTime
TotalSamples: 199994
Counter: 53h09m01s842ms375.146us
ValueRate: 04s345ms856.250us / second
Rate: 21.0563 / second
Percentiles: 1%=001ms73.964us; 5%=001ms205.588us; 10%=001ms310.882us; 20%=001ms449.434us; 50%=002ms131.120us; 80%=253ms880.742us; 90%=969ms47.549us; 95%=01s045ms222.114us; 99%=01s091ms72.426us
Counter: CachedSyncParamMismatch
Value: 4
Counter: CachedSyncTensors
Value: 48690
Counter: CreateCompileHandles
Value: 54
Counter: CreateDataHandles
Value: 35817466
Counter: CreateXlaTensor
Value: 233630185
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 35810432
Counter: DestroyXlaTensor
Value: 233624177
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 35810434
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 57
Counter: XRTAllocateFromTensor_Empty
Value: 21282
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 259
Epoch 5 begin 2019-08-26 20:35:38.844246
training torch.Size([256, 64])/ 2019-08-26 20:35:46.541084, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:35:46.648782, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:35:46.668848, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:35:46.733403, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:35:46.786218, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:35:46.821517, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:35:46.959325, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:35:47.565237, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 20:38:35.291723, device xla:7, step 100, Rate=60.83, Global Rate=295.90, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:38:35.311347, device xla:3, step 100, Rate=60.76, Global Rate=295.87, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:38:35.296505, device xla:5, step 100, Rate=60.72, Global Rate=295.89, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:38:35.313664, device xla:8, step 100, Rate=61.04, Global Rate=295.86, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:38:35.319039, device xla:1, step 100, Rate=60.67, Global Rate=295.85, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:38:35.325296, device xla:6, step 100, Rate=60.72, Global Rate=295.84, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:38:35.304101, device xla:4, step 100, Rate=60.78, Global Rate=295.88, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:38:35.334948, device xla:2, step 100, Rate=60.73, Global Rate=295.82, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:41:19.948962, device xla:1, step 200, Rate=110.74, Global Rate=303.24, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:41:19.942064, device xla:5, step 200, Rate=110.77, Global Rate=303.24, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:41:19.954096, device xla:6, step 200, Rate=110.77, Global Rate=303.23, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 20:41:19.955756, device xla:7, step 200, Rate=110.85, Global Rate=303.23, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:41:19.976779, device xla:3, step 200, Rate=110.80, Global Rate=303.21, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:41:19.988855, device xla:2, step 200, Rate=110.78, Global Rate=303.20, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:41:19.980649, device xla:8, step 200, Rate=111.02, Global Rate=303.21, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:41:19.962308, device xla:4, step 200, Rate=110.81, Global Rate=303.23, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:44:03.461658, device xla:1, step 300, Rate=151.21, Global Rate=306.46, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:44:03.466285, device xla:7, step 300, Rate=151.31, Global Rate=306.46, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:44:03.486253, device xla:2, step 300, Rate=151.25, Global Rate=306.45, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:44:03.478325, device xla:4, step 300, Rate=151.27, Global Rate=306.45, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 20:44:03.471137, device xla:6, step 300, Rate=151.24, Global Rate=306.46, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:44:03.491557, device xla:8, step 300, Rate=151.44, Global Rate=306.45, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:44:03.498869, device xla:5, step 300, Rate=151.22, Global Rate=306.44, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:44:03.507263, device xla:3, step 300, Rate=151.26, Global Rate=306.44, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:46:45.177323, device xla:1, step 400, Rate=184.29, Global Rate=308.94, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:46:45.181712, device xla:8, step 400, Rate=184.49, Global Rate=308.94, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:46:45.186247, device xla:7, step 400, Rate=184.37, Global Rate=308.93, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:46:45.190824, device xla:2, step 400, Rate=184.33, Global Rate=308.93, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:46:45.207637, device xla:4, step 400, Rate=184.33, Global Rate=308.92, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:46:45.200236, device xla:3, step 400, Rate=184.33, Global Rate=308.93, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:46:45.219392, device xla:6, step 400, Rate=184.30, Global Rate=308.92, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 20:46:45.233103, device xla:5, step 400, Rate=184.29, Global Rate=308.91, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:49:29.491828, device xla:7, step 500, Rate=209.82, Global Rate=309.47, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:49:29.496514, device xla:3, step 500, Rate=209.79, Global Rate=309.46, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:49:29.512913, device xla:8, step 500, Rate=209.90, Global Rate=309.46, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:49:29.503565, device xla:2, step 500, Rate=209.78, Global Rate=309.46, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:49:29.522735, device xla:6, step 500, Rate=209.76, Global Rate=309.45, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:49:29.532522, device xla:1, step 500, Rate=209.74, Global Rate=309.45, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:49:29.524724, device xla:4, step 500, Rate=209.79, Global Rate=309.45, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:49:29.534714, device xla:5, step 500, Rate=209.76, Global Rate=309.45, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:52:13.016189, device xla:7, step 600, Rate=230.47, Global Rate=310.07, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 20:52:13.035739, device xla:1, step 600, Rate=230.42, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:52:13.048647, device xla:8, step 600, Rate=230.54, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:52:13.028010, device xla:3, step 600, Rate=230.45, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:52:13.055655, device xla:6, step 600, Rate=230.43, Global Rate=310.05, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 20:52:13.020679, device xla:5, step 600, Rate=230.44, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:52:13.040925, device xla:4, step 600, Rate=230.45, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:52:13.060498, device xla:2, step 600, Rate=230.43, Global Rate=310.05, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:54:58.161633, device xla:8, step 700, Rate=246.45, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:54:58.166169, device xla:1, step 700, Rate=246.35, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:54:58.154858, device xla:5, step 700, Rate=246.36, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:54:58.147927, device xla:3, step 700, Rate=246.38, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:54:58.185122, device xla:7, step 700, Rate=246.38, Global Rate=310.05, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:54:58.206179, device xla:6, step 700, Rate=246.35, Global Rate=310.05, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:54:58.172912, device xla:2, step 700, Rate=246.37, Global Rate=310.06, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:54:58.193105, device xla:4, step 700, Rate=246.37, Global Rate=310.05, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 20:57:42.451803, device xla:1, step 800, Rate=259.41, Global Rate=310.26, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:57:42.467845, device xla:7, step 800, Rate=259.43, Global Rate=310.25, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:57:42.437771, device xla:4, step 800, Rate=259.44, Global Rate=310.26, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:57:42.444889, device xla:8, step 800, Rate=259.49, Global Rate=310.26, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:57:42.473355, device xla:6, step 800, Rate=259.42, Global Rate=310.25, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:57:42.431219, device xla:5, step 800, Rate=259.42, Global Rate=310.26, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 20:57:42.456699, device xla:2, step 800, Rate=259.42, Global Rate=310.26, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 20:57:42.481155, device xla:3, step 800, Rate=259.41, Global Rate=310.25, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:00:25.548635, device xla:7, step 900, Rate=270.34, Global Rate=310.66, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:00:25.553263, device xla:8, step 900, Rate=270.37, Global Rate=310.66, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:00:25.558625, device xla:1, step 900, Rate=270.31, Global Rate=310.66, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:00:25.542042, device xla:4, step 900, Rate=270.33, Global Rate=310.66, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:00:25.605900, device xla:6, step 900, Rate=270.30, Global Rate=310.65, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:00:25.593560, device xla:5, step 900, Rate=270.30, Global Rate=310.65, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:00:25.584328, device xla:3, step 900, Rate=270.31, Global Rate=310.65, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:00:25.563358, device xla:2, step 900, Rate=270.32, Global Rate=310.66, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:03:10.120782, device xla:4, step 1000, Rate=278.49, Global Rate=310.71, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:03:10.127470, device xla:5, step 1000, Rate=278.48, Global Rate=310.70, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:03:10.143415, device xla:3, step 1000, Rate=278.48, Global Rate=310.70, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:03:10.146457, device xla:8, step 1000, Rate=278.51, Global Rate=310.70, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:03:10.154727, device xla:6, step 1000, Rate=278.47, Global Rate=310.70, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:03:10.158398, device xla:1, step 1000, Rate=278.46, Global Rate=310.70, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:03:10.133949, device xla:2, step 1000, Rate=278.48, Global Rate=310.70, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:03:10.163632, device xla:7, step 1000, Rate=278.48, Global Rate=310.70, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:05:52.894608, device xla:1, step 1100, Rate=285.69, Global Rate=311.05, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:05:52.887612, device xla:4, step 1100, Rate=285.70, Global Rate=311.05, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:05:52.906246, device xla:8, step 1100, Rate=285.72, Global Rate=311.05, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:05:52.899355, device xla:6, step 1100, Rate=285.70, Global Rate=311.05, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:05:52.922481, device xla:3, step 1100, Rate=285.69, Global Rate=311.05, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:05:52.935415, device xla:5, step 1100, Rate=285.68, Global Rate=311.04, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:05:52.944409, device xla:7, step 1100, Rate=285.69, Global Rate=311.04, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 21:05:52.912605, device xla:2, step 1100, Rate=285.69, Global Rate=311.05, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:08:36.412693, device xla:8, step 1200, Rate=291.21, Global Rate=311.22, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:08:36.417300, device xla:1, step 1200, Rate=291.17, Global Rate=311.22, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:08:36.440822, device xla:6, step 1200, Rate=291.17, Global Rate=311.22, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:08:36.450856, device xla:7, step 1200, Rate=291.18, Global Rate=311.22, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:08:36.456092, device xla:2, step 1200, Rate=291.17, Global Rate=311.22, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:08:36.443456, device xla:5, step 1200, Rate=291.17, Global Rate=311.22, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:08:36.421780, device xla:4, step 1200, Rate=291.18, Global Rate=311.22, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:08:36.429187, device xla:3, step 1200, Rate=291.18, Global Rate=311.22, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:11:20.681258, device xla:8, step 1300, Rate=295.30, Global Rate=311.26, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:11:20.699877, device xla:1, step 1300, Rate=295.27, Global Rate=311.25, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:11:20.706375, device xla:7, step 1300, Rate=295.28, Global Rate=311.25, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 21:11:20.711406, device xla:3, step 1300, Rate=295.28, Global Rate=311.25, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:11:20.719356, device xla:5, step 1300, Rate=295.27, Global Rate=311.25, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:11:20.726784, device xla:2, step 1300, Rate=295.27, Global Rate=311.25, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 21:11:20.685966, device xla:6, step 1300, Rate=295.28, Global Rate=311.26, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:11:20.692920, device xla:4, step 1300, Rate=295.28, Global Rate=311.26, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:14:01.227600, device xla:1, step 1400, Rate=300.01, Global Rate=311.79, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:14:01.238053, device xla:4, step 1400, Rate=300.00, Global Rate=311.79, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:14:01.253146, device xla:6, step 1400, Rate=300.00, Global Rate=311.79, Compiles=57, _local_scalar_dense=259
training torch.Size([1024, 16])/ 2019-08-26 21:14:01.233199, device xla:7, step 1400, Rate=300.02, Global Rate=311.79, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:14:01.245055, device xla:5, step 1400, Rate=300.01, Global Rate=311.79, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:14:01.256363, device xla:2, step 1400, Rate=300.00, Global Rate=311.79, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:14:01.268663, device xla:3, step 1400, Rate=300.00, Global Rate=311.79, Compiles=57, _local_scalar_dense=259
training torch.Size([512, 32])/ 2019-08-26 21:14:01.283338, device xla:8, step 1400, Rate=300.00, Global Rate=311.78, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:16:43.215509, device xla:7, step 1500, Rate=303.23, Global Rate=312.07, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:16:43.219368, device xla:8, step 1500, Rate=303.24, Global Rate=312.07, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:16:43.224347, device xla:6, step 1500, Rate=303.22, Global Rate=312.07, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:16:43.230407, device xla:1, step 1500, Rate=303.21, Global Rate=312.07, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:16:43.237960, device xla:3, step 1500, Rate=303.22, Global Rate=312.07, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:16:43.250606, device xla:5, step 1500, Rate=303.21, Global Rate=312.07, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:16:43.264684, device xla:4, step 1500, Rate=303.20, Global Rate=312.07, Compiles=57, _local_scalar_dense=259
training torch.Size([256, 64])/ 2019-08-26 21:16:43.280426, device xla:2, step 1500, Rate=303.20, Global Rate=312.07, Compiles=57, _local_scalar_dense=259
Epoch 5 Training stats:
device xla:1
| epoch 005 | loss 0.797 | nll_loss 0.797 | ppl 1.74 | wps 4664 | ups 0 | wpb 11167.067 | bsz 408.412 | num_updates 7540 | lr 0.000364179 | gnorm 0.230 | clip 0.000 | oom 0.000 | wall 18055 | train_wall 13240
device xla:2
| epoch 005 | loss 0.797 | nll_loss 0.797 | ppl 1.74 | wps 4667 | ups 0 | wpb 11176.055 | bsz 406.646 | num_updates 7540 | lr 0.000364179 | gnorm 0.222 | clip 0.000 | oom 0.000 | wall 18055 | train_wall 14073
device xla:3
| epoch 005 | loss 0.801 | nll_loss 0.801 | ppl 1.74 | wps 4655 | ups 0 | wpb 11145.393 | bsz 407.767 | num_updates 7540 | lr 0.000364179 | gnorm 0.238 | clip 0.000 | oom 0.000 | wall 18055 | train_wall 12423
device xla:4
| epoch 005 | loss 0.801 | nll_loss 0.801 | ppl 1.74 | wps 4644 | ups 0 | wpb 11119.313 | bsz 413.844 | num_updates 7540 | lr 0.000364179 | gnorm 0.236 | clip 0.000 | oom 0.000 | wall 18055 | train_wall 14077
device xla:5
| epoch 005 | loss 0.797 | nll_loss 0.797 | ppl 1.74 | wps 4669 | ups 0 | wpb 11180.975 | bsz 411.399 | num_updates 7540 | lr 0.000364179 | gnorm 0.225 | clip 0.000 | oom 0.000 | wall 18055 | train_wall 14093
device xla:6
| epoch 005 | loss 0.801 | nll_loss 0.801 | ppl 1.74 | wps 4643 | ups 0 | wpb 11117.006 | bsz 408.955 | num_updates 7540 | lr 0.000364179 | gnorm 0.228 | clip 0.000 | oom 0.000 | wall 18055 | train_wall 14019
device xla:7
| epoch 005 | loss 0.801 | nll_loss 0.801 | ppl 1.74 | wps 4653 | ups 0 | wpb 11141.577 | bsz 410.924 | num_updates 7540 | lr 0.000364179 | gnorm 0.228 | clip 0.000 | oom 0.000 | wall 18055 | train_wall 12896
device xla:8
| epoch 005 | loss 0.801 | nll_loss 0.801 | ppl 1.74 | wps 4645 | ups 0 | wpb 11122.895 | bsz 410.551 | num_updates 7540 | lr 0.000364179 | gnorm 0.243 | clip 0.000 | oom 0.000 | wall 18055 | train_wall 12974
Epoch 5 Tracker Rates:
Rate=301.78, Global Rate=311.98
Rate=301.98, Global Rate=311.98
Rate=301.81, Global Rate=311.98
Rate=301.91, Global Rate=311.98
Rate=301.86, Global Rate=311.98
Rate=301.76, Global Rate=311.98
Rate=301.73, Global Rate=311.98
Rate=301.75, Global Rate=311.98
Epoch 5 end 2019-08-26 21:16:57.067182
Metric: CompileTime
TotalSamples: 57
Counter: 11h19m06s390ms862.509us
ValueRate: 01s269ms796.350us / second
Rate: 0.00465197 / second
Percentiles: 1%=076ms971.628us; 5%=156ms512.565us; 10%=28s578ms625.257us; 20%=02m31s090ms346.188us; 50%=02m04s949ms359.097us; 80%=06m09s474ms849.253us; 90%=07m09s614ms115.550us; 95%=24m53s516ms570.189us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 60811
Counter: 01d11h23m23s577ms414.125us
ValueRate: 06s152ms852.303us / second
Rate: 4.99616 / second
Percentiles: 1%=01s161ms845.255us; 5%=01s168ms324.691us; 10%=01s172ms562.595us; 20%=01s176ms466.028us; 50%=01s272ms789.898us; 80%=01s288ms231.560us; 90%=01s291ms305.079us; 95%=01s293ms218.161us; 99%=01s298ms318.045us
Metric: InboundData
TotalSamples: 299
Counter: 595.00B
ValueRate: 0.05B / second
Rate: 0.0241876 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 249731
Counter: 25.81GB
ValueRate: 498.89KB / second
Rate: 20.5121 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 553415
Counter: 05h02m11s194ms814.716us
ValueRate: 344ms228.706us / second
Rate: 45.8777 / second
Percentiles: 1%=445.898us; 5%=502.680us; 10%=534.036us; 20%=590.738us; 50%=808.495us; 80%=003ms556.526us; 90%=010ms489.558us; 95%=026ms229.712us; 99%=047ms94.984us
Metric: TransferFromServerTime
TotalSamples: 299
Counter: 05s187ms196.792us
ValueRate: 419.618us / second
Rate: 0.0241876 / second
Percentiles: 1%=630.348us; 5%=712.931us; 10%=750.829us; 20%=866.352us; 50%=001ms346.922us; 80%=043ms629.680us; 90%=058ms759.246us; 95%=064ms799.204us; 99%=070ms495.424us
Metric: TransferToServerTime
TotalSamples: 249731
Counter: 01d02h00m09s402ms49.943us
ValueRate: 05s090ms46.999us / second
Rate: 20.9718 / second
Percentiles: 1%=001ms68.477us; 5%=001ms202.645us; 10%=001ms292.528us; 20%=001ms415.668us; 50%=003ms652.679us; 80%=883ms869.764us; 90%=997ms801.701us; 95%=01s071ms987.269us; 99%=01s099ms690.324us
Counter: CachedSyncParamMismatch
Value: 4
Counter: CachedSyncTensors
Value: 60754
Counter: CreateCompileHandles
Value: 54
Counter: CreateDataHandles
Value: 44769827
Counter: CreateXlaTensor
Value: 291901817
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 44762795
Counter: DestroyXlaTensor
Value: 291895809
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 44762795
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 57
Counter: XRTAllocateFromTensor_Empty
Value: 21652
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 299
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-26 21:17:00.944790, device xla:3, step 0, Compiles=57, _local_scalar_dense=299
validation/ 2019-08-26 21:17:00.952628, device xla:1, step 0, Compiles=57, _local_scalar_dense=299
validation/ 2019-08-26 21:17:00.954935, device xla:2, step 0, Compiles=57, _local_scalar_dense=299
validation/ 2019-08-26 21:17:01.115648, device xla:4, step 0, Compiles=57, _local_scalar_dense=299
validation/ 2019-08-26 21:17:01.117293, device xla:5, step 0, Compiles=57, _local_scalar_dense=299
validation/ 2019-08-26 21:17:01.120034, device xla:8, step 0, Compiles=57, _local_scalar_dense=299
validation/ 2019-08-26 21:17:01.129290, device xla:7, step 0, Compiles=57, _local_scalar_dense=299
validation/ 2019-08-26 21:17:01.141335, device xla:6, step 0, Compiles=57, _local_scalar_dense=299
validation stats on subset "valid" - 2019-08-26 21:17:07.161609
| epoch 005 | valid on 'valid' subset | loss 4.156 | nll_loss 2.344 | ppl 5.08 | num_updates 7540
| epoch 005 | valid on 'valid' subset | loss 4.188 | nll_loss 2.344 | ppl 5.08 | num_updates 7540
| epoch 005 | valid on 'valid' subset | loss 4.250 | nll_loss 2.438 | ppl 5.42 | num_updates 7540
| epoch 005 | valid on 'valid' subset | loss 4.250 | nll_loss 2.453 | ppl 5.48 | num_updates 7540
| epoch 005 | valid on 'valid' subset | loss 4.188 | nll_loss 2.344 | ppl 5.08 | num_updates 7540
| epoch 005 | valid on 'valid' subset | loss 4.188 | nll_loss 2.359 | ppl 5.13 | num_updates 7540
| epoch 005 | valid on 'valid' subset | loss 4.219 | nll_loss 2.391 | ppl 5.24 | num_updates 7540
| epoch 005 | valid on 'valid' subset | loss 4.281 | nll_loss 2.453 | ppl 5.48 | num_updates 7540
old learning rate: 0.00040716396374028516
new learning rate: 0.00036417852036461484
Metric: CompileTime
TotalSamples: 57
Counter: 11h19m06s390ms862.509us
ValueRate: 01s269ms796.350us / second
Rate: 0.00465197 / second
Percentiles: 1%=076ms971.628us; 5%=156ms512.565us; 10%=28s578ms625.257us; 20%=02m31s090ms346.188us; 50%=02m04s949ms359.097us; 80%=06m09s474ms849.253us; 90%=07m09s614ms115.550us; 95%=24m53s516ms570.189us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 60932
Counter: 01d11h00m09s694ms914.508us
ValueRate: 06s068ms477.373us / second
Rate: 5.35569 / second
Percentiles: 1%=377ms40.518us; 5%=379ms305.022us; 10%=392ms765.630us; 20%=01s171ms375.965us; 50%=01s189ms580.907us; 80%=01s288ms582.032us; 90%=01s291ms911.271us; 95%=01s293ms968.689us; 99%=01s298ms318.045us
Metric: InboundData
TotalSamples: 324
Counter: 644.00B
ValueRate: 0.05B / second
Rate: 0.0261885 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 249971
Counter: 25.86GB
ValueRate: 948.55KB / second
Rate: 20.7305 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 554030
Counter: 05h03m40s484ms387.509us
ValueRate: 02s595ms201.345us / second
Rate: 52.8927 / second
Percentiles: 1%=434.534us; 5%=491.433us; 10%=514.803us; 20%=559.272us; 50%=745.626us; 80%=001ms338.357us; 90%=020ms99.030us; 95%=374ms364.523us; 99%=388ms78.098us
Metric: TransferFromServerTime
TotalSamples: 324
Counter: 05s306ms509.569us
ValueRate: 428.838us / second
Rate: 0.0261885 / second
Percentiles: 1%=641.086us; 5%=712.931us; 10%=754.679us; 20%=874.360us; 50%=001ms337.107us; 80%=042ms902.229us; 90%=057ms323.173us; 95%=063ms320.584us; 99%=070ms297.500us
Metric: TransferToServerTime
TotalSamples: 249971
Counter: 01d02h01m36s461ms305.928us
ValueRate: 04s384ms554.767us / second
Rate: 21.1341 / second
Percentiles: 1%=001ms106.856us; 5%=001ms242.075us; 10%=001ms335.773us; 20%=002ms506.880us; 50%=003ms589.765us; 80%=252ms68.673us; 90%=957ms744.685us; 95%=01s017ms664.932us; 99%=01s085ms838.856us
Counter: CachedSyncParamMismatch
Value: 4
Counter: CachedSyncTensors
Value: 60875
Counter: CreateCompileHandles
Value: 54
Counter: CreateDataHandles
Value: 44771436
Counter: CreateXlaTensor
Value: 292036634
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 44764402
Counter: DestroyXlaTensor
Value: 292030626
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 44764404
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 57
Counter: XRTAllocateFromTensor_Empty
Value: 21652
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 324
Epoch 6 begin 2019-08-26 21:17:07.183809
training torch.Size([512, 32])/ 2019-08-26 21:17:15.293581, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:17:15.316755, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:17:15.414639, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:17:15.563380, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:17:15.587119, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:17:15.601854, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:17:16.016203, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:17:16.345357, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:20:02.890591, device xla:3, step 100, Rate=61.11, Global Rate=297.24, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:20:02.913975, device xla:7, step 100, Rate=61.35, Global Rate=297.20, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:20:02.916465, device xla:1, step 100, Rate=61.13, Global Rate=297.20, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:20:02.904263, device xla:4, step 100, Rate=61.19, Global Rate=297.22, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:20:02.927278, device xla:6, step 100, Rate=61.19, Global Rate=297.18, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:20:02.921790, device xla:5, step 100, Rate=61.20, Global Rate=297.19, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:20:02.935129, device xla:2, step 100, Rate=61.08, Global Rate=297.16, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:20:02.896819, device xla:8, step 100, Rate=61.48, Global Rate=297.23, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:22:46.714607, device xla:5, step 200, Rate=111.48, Global Rate=304.70, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:22:46.726471, device xla:1, step 200, Rate=111.42, Global Rate=304.68, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:22:46.719160, device xla:6, step 200, Rate=111.47, Global Rate=304.69, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:22:46.741432, device xla:3, step 200, Rate=111.38, Global Rate=304.67, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:22:46.748204, device xla:7, step 200, Rate=111.59, Global Rate=304.67, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:22:46.729552, device xla:8, step 200, Rate=111.69, Global Rate=304.68, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:22:46.765817, device xla:4, step 200, Rate=111.45, Global Rate=304.65, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:22:46.750538, device xla:2, step 200, Rate=111.38, Global Rate=304.66, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:25:30.359038, device xla:3, step 300, Rate=151.69, Global Rate=307.37, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:25:30.370603, device xla:8, step 300, Rate=151.93, Global Rate=307.37, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:25:30.377915, device xla:1, step 300, Rate=151.71, Global Rate=307.36, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:25:30.384421, device xla:5, step 300, Rate=151.75, Global Rate=307.36, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:25:30.363862, device xla:6, step 300, Rate=151.75, Global Rate=307.37, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:25:30.389456, device xla:7, step 300, Rate=151.84, Global Rate=307.36, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:25:30.393455, device xla:4, step 300, Rate=151.74, Global Rate=307.35, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:25:30.403610, device xla:2, step 300, Rate=151.67, Global Rate=307.35, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:28:14.060228, device xla:1, step 400, Rate=183.93, Global Rate=308.70, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:28:14.071659, device xla:3, step 400, Rate=183.90, Global Rate=308.70, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:28:14.088794, device xla:7, step 400, Rate=184.03, Global Rate=308.69, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:28:14.064823, device xla:6, step 400, Rate=183.96, Global Rate=308.70, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:28:14.096189, device xla:2, step 400, Rate=183.89, Global Rate=308.69, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:28:14.077745, device xla:8, step 400, Rate=184.09, Global Rate=308.70, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:28:14.110047, device xla:5, step 400, Rate=183.94, Global Rate=308.68, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:28:14.118657, device xla:4, step 400, Rate=183.93, Global Rate=308.68, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:30:58.708795, device xla:1, step 500, Rate=209.33, Global Rate=309.15, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:30:58.713191, device xla:8, step 500, Rate=209.47, Global Rate=309.15, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:30:58.720154, device xla:6, step 500, Rate=209.35, Global Rate=309.15, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:30:58.758931, device xla:7, step 500, Rate=209.41, Global Rate=309.13, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:30:58.772619, device xla:4, step 500, Rate=209.34, Global Rate=309.13, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:30:58.751117, device xla:5, step 500, Rate=209.35, Global Rate=309.14, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:30:58.726614, device xla:2, step 500, Rate=209.31, Global Rate=309.15, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:30:58.743437, device xla:3, step 500, Rate=209.31, Global Rate=309.14, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:33:42.301291, device xla:5, step 600, Rate=230.09, Global Rate=309.78, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:33:42.305777, device xla:1, step 600, Rate=230.06, Global Rate=309.78, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:33:42.317821, device xla:2, step 600, Rate=230.05, Global Rate=309.78, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:33:42.334795, device xla:3, step 600, Rate=230.04, Global Rate=309.77, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:33:42.339938, device xla:7, step 600, Rate=230.13, Global Rate=309.77, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:33:42.310519, device xla:8, step 600, Rate=230.17, Global Rate=309.78, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:33:42.321336, device xla:4, step 600, Rate=230.08, Global Rate=309.78, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:33:42.349240, device xla:6, step 600, Rate=230.06, Global Rate=309.77, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:36:25.227621, device xla:5, step 700, Rate=246.92, Global Rate=310.41, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:36:25.232008, device xla:3, step 700, Rate=246.89, Global Rate=310.41, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:36:25.243912, device xla:2, step 700, Rate=246.89, Global Rate=310.41, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:36:25.247622, device xla:6, step 700, Rate=246.91, Global Rate=310.41, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:36:25.236896, device xla:8, step 700, Rate=246.99, Global Rate=310.41, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:36:25.254801, device xla:4, step 700, Rate=246.91, Global Rate=310.41, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:36:25.266666, device xla:1, step 700, Rate=246.88, Global Rate=310.40, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:36:25.275090, device xla:7, step 700, Rate=246.95, Global Rate=310.40, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:39:08.740453, device xla:3, step 800, Rate=260.14, Global Rate=310.75, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:39:08.745180, device xla:1, step 800, Rate=260.15, Global Rate=310.75, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:39:08.757974, device xla:7, step 800, Rate=260.19, Global Rate=310.75, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:39:08.760263, device xla:5, step 800, Rate=260.16, Global Rate=310.75, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:39:08.750305, device xla:8, step 800, Rate=260.21, Global Rate=310.75, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:39:08.761915, device xla:6, step 800, Rate=260.15, Global Rate=310.75, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:39:08.774966, device xla:2, step 800, Rate=260.13, Global Rate=310.74, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:39:08.786118, device xla:4, step 800, Rate=260.15, Global Rate=310.74, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:41:52.333820, device xla:3, step 900, Rate=270.71, Global Rate=311.00, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:41:52.345190, device xla:5, step 900, Rate=270.72, Global Rate=310.99, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:41:52.350764, device xla:2, step 900, Rate=270.70, Global Rate=310.99, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:41:52.353076, device xla:1, step 900, Rate=270.71, Global Rate=310.99, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:41:52.360421, device xla:7, step 900, Rate=270.75, Global Rate=310.99, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:41:52.364462, device xla:8, step 900, Rate=270.76, Global Rate=310.99, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:41:52.338398, device xla:6, step 900, Rate=270.72, Global Rate=310.99, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:41:52.374286, device xla:4, step 900, Rate=270.72, Global Rate=310.99, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:44:34.658510, device xla:1, step 1000, Rate=279.66, Global Rate=311.43, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:44:34.663958, device xla:8, step 1000, Rate=279.70, Global Rate=311.43, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:44:34.670832, device xla:3, step 1000, Rate=279.64, Global Rate=311.43, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:44:34.701131, device xla:2, step 1000, Rate=279.64, Global Rate=311.42, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:44:34.677209, device xla:4, step 1000, Rate=279.66, Global Rate=311.43, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:44:34.707707, device xla:7, step 1000, Rate=279.67, Global Rate=311.42, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:44:34.693669, device xla:5, step 1000, Rate=279.65, Global Rate=311.43, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:44:34.719796, device xla:6, step 1000, Rate=279.64, Global Rate=311.42, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:47:18.010960, device xla:5, step 1100, Rate=286.42, Global Rate=311.61, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:47:18.023435, device xla:8, step 1100, Rate=286.44, Global Rate=311.61, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:47:18.016622, device xla:6, step 1100, Rate=286.42, Global Rate=311.61, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:47:18.032777, device xla:2, step 1100, Rate=286.40, Global Rate=311.61, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:47:18.059416, device xla:7, step 1100, Rate=286.42, Global Rate=311.60, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:47:18.053168, device xla:1, step 1100, Rate=286.39, Global Rate=311.61, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:47:18.068088, device xla:3, step 1100, Rate=286.38, Global Rate=311.60, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:47:18.041447, device xla:4, step 1100, Rate=286.41, Global Rate=311.61, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:50:03.781958, device xla:1, step 1200, Rate=290.90, Global Rate=311.38, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:50:03.817249, device xla:3, step 1200, Rate=290.89, Global Rate=311.38, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:50:03.802115, device xla:2, step 1200, Rate=290.90, Global Rate=311.38, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:50:03.786392, device xla:8, step 1200, Rate=290.93, Global Rate=311.38, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:50:03.830294, device xla:5, step 1200, Rate=290.89, Global Rate=311.37, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:50:03.793348, device xla:4, step 1200, Rate=290.91, Global Rate=311.38, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:50:03.822915, device xla:6, step 1200, Rate=290.90, Global Rate=311.38, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:50:03.840891, device xla:7, step 1200, Rate=290.91, Global Rate=311.37, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:52:46.739148, device xla:5, step 1300, Rate=295.57, Global Rate=311.60, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:52:46.752209, device xla:8, step 1300, Rate=295.58, Global Rate=311.59, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:52:46.744163, device xla:6, step 1300, Rate=295.57, Global Rate=311.60, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:52:46.755982, device xla:7, step 1300, Rate=295.58, Global Rate=311.59, Compiles=57, _local_scalar_dense=324
training torch.Size([1024, 16])/ 2019-08-26 21:52:46.766298, device xla:4, step 1300, Rate=295.56, Global Rate=311.59, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:52:46.776912, device xla:3, step 1300, Rate=295.55, Global Rate=311.59, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:52:46.789375, device xla:1, step 1300, Rate=295.54, Global Rate=311.59, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:52:46.799181, device xla:2, step 1300, Rate=295.54, Global Rate=311.59, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:55:27.588226, device xla:7, step 1400, Rate=300.13, Global Rate=312.07, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:55:27.601227, device xla:2, step 1400, Rate=300.11, Global Rate=312.06, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:55:27.594597, device xla:8, step 1400, Rate=300.13, Global Rate=312.07, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:55:27.633881, device xla:5, step 1400, Rate=300.10, Global Rate=312.06, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:55:27.617321, device xla:6, step 1400, Rate=300.11, Global Rate=312.06, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:55:27.610137, device xla:1, step 1400, Rate=300.11, Global Rate=312.06, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:55:27.624909, device xla:3, step 1400, Rate=300.10, Global Rate=312.06, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:55:27.641183, device xla:4, step 1400, Rate=300.10, Global Rate=312.06, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:58:10.669043, device xla:1, step 1500, Rate=302.88, Global Rate=312.19, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:58:10.673690, device xla:3, step 1500, Rate=302.88, Global Rate=312.19, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:58:10.678628, device xla:8, step 1500, Rate=302.89, Global Rate=312.19, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:58:10.688205, device xla:7, step 1500, Rate=302.89, Global Rate=312.19, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:58:10.691283, device xla:2, step 1500, Rate=302.88, Global Rate=312.19, Compiles=57, _local_scalar_dense=324
training torch.Size([512, 32])/ 2019-08-26 21:58:10.680692, device xla:6, step 1500, Rate=302.88, Global Rate=312.19, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:58:10.711904, device xla:4, step 1500, Rate=302.87, Global Rate=312.19, Compiles=57, _local_scalar_dense=324
training torch.Size([256, 64])/ 2019-08-26 21:58:10.705355, device xla:5, step 1500, Rate=302.87, Global Rate=312.19, Compiles=57, _local_scalar_dense=324
Epoch 6 Training stats:
device xla:1
| epoch 006 | loss 0.664 | nll_loss 0.664 | ppl 1.58 | wps 4920 | ups 0 | wpb 11169.801 | bsz 408.248 | num_updates 9048 | lr 0.000332448 | gnorm 0.193 | clip 0.000 | oom 0.000 | wall 20543 | train_wall 15293
device xla:2
| epoch 006 | loss 0.664 | nll_loss 0.664 | ppl 1.58 | wps 4924 | ups 0 | wpb 11179.408 | bsz 407.880 | num_updates 9048 | lr 0.000332448 | gnorm 0.186 | clip 0.000 | oom 0.000 | wall 20543 | train_wall 16115
device xla:3
| epoch 006 | loss 0.668 | nll_loss 0.668 | ppl 1.59 | wps 4893 | ups 0 | wpb 11108.156 | bsz 408.757 | num_updates 9048 | lr 0.000332448 | gnorm 0.199 | clip 0.000 | oom 0.000 | wall 20543 | train_wall 14470
device xla:4
| epoch 006 | loss 0.668 | nll_loss 0.668 | ppl 1.59 | wps 4905 | ups 0 | wpb 11137.040 | bsz 413.029 | num_updates 9048 | lr 0.000332448 | gnorm 0.197 | clip 0.000 | oom 0.000 | wall 20543 | train_wall 16117
device xla:5
| epoch 006 | loss 0.664 | nll_loss 0.664 | ppl 1.58 | wps 4924 | ups 0 | wpb 11180.304 | bsz 411.388 | num_updates 9048 | lr 0.000332448 | gnorm 0.188 | clip 0.000 | oom 0.000 | wall 20543 | train_wall 16141
device xla:6
| epoch 006 | loss 0.668 | nll_loss 0.668 | ppl 1.59 | wps 4905 | ups 0 | wpb 11135.991 | bsz 409.040 | num_updates 9048 | lr 0.000332448 | gnorm 0.190 | clip 0.000 | oom 0.000 | wall 20543 | train_wall 16062
device xla:7
| epoch 006 | loss 0.668 | nll_loss 0.668 | ppl 1.59 | wps 4908 | ups 0 | wpb 11144.304 | bsz 409.860 | num_updates 9048 | lr 0.000332448 | gnorm 0.190 | clip 0.000 | oom 0.000 | wall 20543 | train_wall 14934
device xla:8
| epoch 006 | loss 0.668 | nll_loss 0.668 | ppl 1.59 | wps 4896 | ups 0 | wpb 11114.951 | bsz 410.313 | num_updates 9048 | lr 0.000332448 | gnorm 0.204 | clip 0.000 | oom 0.000 | wall 20543 | train_wall 15023
Epoch 6 Tracker Rates:
Rate=299.19, Global Rate=312.03
Rate=299.27, Global Rate=312.03
Rate=299.21, Global Rate=312.03
Rate=299.35, Global Rate=312.03
Rate=299.33, Global Rate=312.03
Rate=299.24, Global Rate=312.03
Rate=299.27, Global Rate=312.03
Rate=299.24, Global Rate=312.03
Epoch 6 end 2019-08-26 21:58:25.070440
Metric: CompileTime
TotalSamples: 57
Counter: 11h19m06s390ms862.509us
ValueRate: 01s269ms796.350us / second
Rate: 0.00465197 / second
Percentiles: 1%=076ms971.628us; 5%=156ms512.565us; 10%=28s578ms625.257us; 20%=02m31s090ms346.188us; 50%=02m04s949ms359.097us; 80%=06m09s474ms849.253us; 90%=07m09s614ms115.550us; 95%=24m53s516ms570.189us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 72996
Counter: 01d21h07m20s253ms975.456us
ValueRate: 06s117ms188.001us / second
Rate: 4.95459 / second
Percentiles: 1%=01s165ms166.267us; 5%=01s169ms756.175us; 10%=01s171ms159.435us; 20%=01s176ms107.249us; 50%=01s275ms817.867us; 80%=01s288ms797.088us; 90%=01s291ms461.406us; 95%=01s294ms751.918us; 99%=01s298ms386.309us
Metric: InboundData
TotalSamples: 364
Counter: 724.00B
ValueRate: 0.05B / second
Rate: 0.0245123 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 299640
Counter: 30.35GB
ValueRate: 495.43KB / second
Rate: 20.3043 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 663473
Counter: 05h11m07s643ms428.812us
ValueRate: 320ms244.462us / second
Rate: 45.8625 / second
Percentiles: 1%=443.058us; 5%=491.123us; 10%=533.625us; 20%=589.262us; 50%=802.818us; 80%=002ms168.562us; 90%=010ms747.551us; 95%=023ms626.776us; 99%=050ms41.353us
Metric: TransferFromServerTime
TotalSamples: 364
Counter: 06s638ms675.836us
ValueRate: 379.649us / second
Rate: 0.0245123 / second
Percentiles: 1%=641.086us; 5%=714.432us; 10%=754.679us; 20%=856.449us; 50%=001ms310.966us; 80%=040ms723.960us; 90%=056ms318.094us; 95%=062ms362.348us; 99%=070ms297.500us
Metric: TransferToServerTime
TotalSamples: 299640
Counter: 01d11h14m03s390ms898.679us
ValueRate: 05s033ms949.597us / second
Rate: 20.3044 / second
Percentiles: 1%=001ms80.499us; 5%=001ms184.717us; 10%=001ms275.464us; 20%=001ms402.610us; 50%=003ms511.959us; 80%=894ms272.439us; 90%=01s004ms575.388us; 95%=01s075ms716.209us; 99%=01s094ms865.370us
Counter: CachedSyncParamMismatch
Value: 4
Counter: CachedSyncTensors
Value: 72939
Counter: CreateCompileHandles
Value: 54
Counter: CreateDataHandles
Value: 53723729
Counter: CreateXlaTensor
Value: 350308278
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 53716697
Counter: DestroyXlaTensor
Value: 350302270
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 53716697
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 57
Counter: XRTAllocateFromTensor_Empty
Value: 21917
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 364
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-26 21:58:29.172039, device xla:2, step 0, Compiles=57, _local_scalar_dense=364
validation/ 2019-08-26 21:58:29.181537, device xla:8, step 0, Compiles=57, _local_scalar_dense=364
validation/ 2019-08-26 21:58:29.185279, device xla:4, step 0, Compiles=57, _local_scalar_dense=364
validation/ 2019-08-26 21:58:29.188051, device xla:7, step 0, Compiles=57, _local_scalar_dense=364
validation/ 2019-08-26 21:58:29.337976, device xla:5, step 0, Compiles=57, _local_scalar_dense=364
validation/ 2019-08-26 21:58:29.340449, device xla:1, step 0, Compiles=57, _local_scalar_dense=364
validation/ 2019-08-26 21:58:29.357170, device xla:6, step 0, Compiles=57, _local_scalar_dense=364
validation/ 2019-08-26 21:58:29.360848, device xla:3, step 0, Compiles=57, _local_scalar_dense=364
validation stats on subset "valid" - 2019-08-26 21:58:35.330266
| epoch 006 | valid on 'valid' subset | loss 4.094 | nll_loss 2.250 | ppl 4.76 | num_updates 9048
| epoch 006 | valid on 'valid' subset | loss 4.125 | nll_loss 2.281 | ppl 4.86 | num_updates 9048
| epoch 006 | valid on 'valid' subset | loss 4.188 | nll_loss 2.328 | ppl 5.02 | num_updates 9048
| epoch 006 | valid on 'valid' subset | loss 4.188 | nll_loss 2.375 | ppl 5.19 | num_updates 9048
| epoch 006 | valid on 'valid' subset | loss 4.062 | nll_loss 2.281 | ppl 4.86 | num_updates 9048
| epoch 006 | valid on 'valid' subset | loss 4.062 | nll_loss 2.281 | ppl 4.86 | num_updates 9048
| epoch 006 | valid on 'valid' subset | loss 4.094 | nll_loss 2.281 | ppl 4.86 | num_updates 9048
| epoch 006 | valid on 'valid' subset | loss 4.125 | nll_loss 2.359 | ppl 5.13 | num_updates 9048
old learning rate: 0.00036417852036461484
new learning rate: 0.0003324479842709235
Metric: CompileTime
TotalSamples: 57
Counter: 11h19m06s390ms862.509us
ValueRate: 01s269ms796.350us / second
Rate: 0.00465197 / second
Percentiles: 1%=076ms971.628us; 5%=156ms512.565us; 10%=28s578ms625.257us; 20%=02m31s090ms346.188us; 50%=02m04s949ms359.097us; 80%=06m09s474ms849.253us; 90%=07m09s614ms115.550us; 95%=24m53s516ms570.189us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 73117
Counter: 01d21h08m06s230ms832.028us
ValueRate: 06s011ms288.993us / second
Rate: 5.30992 / second
Percentiles: 1%=377ms666.005us; 5%=378ms304.019us; 10%=391ms123.584us; 20%=01s171ms602.541us; 50%=01s184ms900.205us; 80%=01s287ms101.026us; 90%=01s291ms217.409us; 95%=01s294ms751.918us; 99%=01s298ms386.309us
Metric: InboundData
TotalSamples: 389
Counter: 773.00B
ValueRate: 0.05B / second
Rate: 0.0261777 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 299879
Counter: 30.39GB
ValueRate: 928.32KB / second
Rate: 20.2885 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 664073
Counter: 05h12m37s001ms454.802us
ValueRate: 02s591ms596.256us / second
Rate: 51.0376 / second
Percentiles: 1%=442.101us; 5%=489.937us; 10%=522.071us; 20%=569.120us; 50%=757.951us; 80%=001ms316.966us; 90%=019ms508.392us; 95%=374ms390.720us; 99%=388ms938.027us
Metric: TransferFromServerTime
TotalSamples: 389
Counter: 06s757ms343.219us
ValueRate: 387.439us / second
Rate: 0.0261777 / second
Percentiles: 1%=641.086us; 5%=714.432us; 10%=755.536us; 20%=856.449us; 50%=001ms292.074us; 80%=038ms992.975us; 90%=056ms58.859us; 95%=062ms169.467us; 99%=070ms297.500us
Metric: TransferToServerTime
TotalSamples: 299879
Counter: 01d11h15m31s895ms202.532us
ValueRate: 04s294ms475.325us / second
Rate: 20.2905 / second
Percentiles: 1%=001ms84.058us; 5%=001ms204.367us; 10%=001ms314.142us; 20%=001ms475.584us; 50%=002ms218.108us; 80%=242ms974.450us; 90%=968ms121.363us; 95%=01s059ms499.820us; 99%=01s092ms273.173us
Counter: CachedSyncParamMismatch
Value: 4
Counter: CachedSyncTensors
Value: 73060
Counter: CreateCompileHandles
Value: 54
Counter: CreateDataHandles
Value: 53725337
Counter: CreateXlaTensor
Value: 350443095
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 53718303
Counter: DestroyXlaTensor
Value: 350437087
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 53718305
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 57
Counter: XRTAllocateFromTensor_Empty
Value: 21917
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 389
Epoch 7 begin 2019-08-26 21:58:35.352174
training torch.Size([512, 32])/ 2019-08-26 21:58:43.654628, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 21:58:43.698507, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 21:58:43.774827, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 21:58:43.782395, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 21:58:43.808449, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 21:58:43.930245, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 21:58:43.937293, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 21:58:44.479287, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:01:33.252973, device xla:2, step 100, Rate=60.48, Global Rate=293.64, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:01:33.258952, device xla:1, step 100, Rate=60.42, Global Rate=293.63, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:01:33.263835, device xla:4, step 100, Rate=60.39, Global Rate=293.62, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:01:33.269099, device xla:8, step 100, Rate=60.67, Global Rate=293.62, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:01:33.279001, device xla:6, step 100, Rate=60.41, Global Rate=293.60, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:01:33.270993, device xla:3, step 100, Rate=60.37, Global Rate=293.61, Compiles=57, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:01:33.286934, device xla:7, step 100, Rate=60.42, Global Rate=293.59, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:01:33.303708, device xla:5, step 100, Rate=60.46, Global Rate=293.56, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:04:18.616701, device xla:4, step 200, Rate=110.24, Global Rate=301.42, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:04:18.645850, device xla:1, step 200, Rate=110.25, Global Rate=301.39, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:04:18.628812, device xla:7, step 200, Rate=110.27, Global Rate=301.41, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:04:18.638548, device xla:6, step 200, Rate=110.26, Global Rate=301.40, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:04:18.652211, device xla:8, step 200, Rate=110.45, Global Rate=301.39, Compiles=57, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:04:18.656174, device xla:2, step 200, Rate=110.29, Global Rate=301.38, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:04:18.621436, device xla:3, step 200, Rate=110.23, Global Rate=301.42, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:04:18.667117, device xla:5, step 200, Rate=110.29, Global Rate=301.38, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:07:03.918480, device xla:6, step 300, Rate=150.16, Global Rate=304.14, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:07:03.923694, device xla:1, step 300, Rate=150.16, Global Rate=304.14, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:07:03.928367, device xla:2, step 300, Rate=150.19, Global Rate=304.14, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:07:03.956112, device xla:8, step 300, Rate=150.31, Global Rate=304.12, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:07:03.932938, device xla:5, step 300, Rate=150.19, Global Rate=304.13, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:07:03.949286, device xla:4, step 300, Rate=150.13, Global Rate=304.12, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:07:03.939701, device xla:7, step 300, Rate=150.16, Global Rate=304.13, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:07:03.960391, device xla:3, step 300, Rate=150.11, Global Rate=304.12, Compiles=57, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:09:48.943390, device xla:4, step 400, Rate=182.16, Global Rate=305.65, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:09:48.948098, device xla:1, step 400, Rate=182.18, Global Rate=305.65, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:09:48.952790, device xla:3, step 400, Rate=182.16, Global Rate=305.64, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:09:48.966808, device xla:7, step 400, Rate=182.18, Global Rate=305.64, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:09:48.981752, device xla:6, step 400, Rate=182.17, Global Rate=305.63, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:09:48.989970, device xla:5, step 400, Rate=182.20, Global Rate=305.63, Compiles=57, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:09:48.959320, device xla:8, step 400, Rate=182.30, Global Rate=305.64, Compiles=57, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:09:49.002998, device xla:2, step 400, Rate=182.19, Global Rate=305.62, Compiles=57, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:14:57.424778, device xla:8, step 500, Rate=179.04, Global Rate=261.62, Compiles=63, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:14:57.392689, device xla:6, step 500, Rate=178.94, Global Rate=261.62, Compiles=63, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:14:57.414419, device xla:7, step 500, Rate=178.94, Global Rate=261.62, Compiles=63, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:14:57.405451, device xla:5, step 500, Rate=178.96, Global Rate=261.62, Compiles=63, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:14:57.433351, device xla:1, step 500, Rate=178.94, Global Rate=261.61, Compiles=63, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:14:57.427541, device xla:4, step 500, Rate=178.93, Global Rate=261.62, Compiles=63, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:14:57.398121, device xla:3, step 500, Rate=178.92, Global Rate=261.62, Compiles=63, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:14:57.438313, device xla:2, step 500, Rate=178.95, Global Rate=261.61, Compiles=63, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:17:41.076004, device xla:4, step 600, Rate=205.71, Global Rate=268.96, Compiles=63, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:17:41.085991, device xla:7, step 600, Rate=205.72, Global Rate=268.96, Compiles=63, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:17:41.111362, device xla:2, step 600, Rate=205.72, Global Rate=268.95, Compiles=63, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:17:41.116652, device xla:8, step 600, Rate=205.79, Global Rate=268.95, Compiles=63, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:17:41.105670, device xla:1, step 600, Rate=205.71, Global Rate=268.95, Compiles=63, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:17:41.080644, device xla:6, step 600, Rate=205.71, Global Rate=268.96, Compiles=63, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:17:41.097919, device xla:3, step 600, Rate=205.69, Global Rate=268.95, Compiles=63, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:17:41.122606, device xla:5, step 600, Rate=205.71, Global Rate=268.95, Compiles=63, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:20:36.063473, device xla:4, step 700, Rate=223.09, Global Rate=272.10, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:20:36.068099, device xla:2, step 700, Rate=223.11, Global Rate=272.10, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:20:36.111474, device xla:8, step 700, Rate=223.15, Global Rate=272.09, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:20:36.088977, device xla:7, step 700, Rate=223.09, Global Rate=272.09, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:20:36.069746, device xla:5, step 700, Rate=223.10, Global Rate=272.10, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:20:36.114931, device xla:1, step 700, Rate=223.08, Global Rate=272.09, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:20:36.078176, device xla:3, step 700, Rate=223.07, Global Rate=272.10, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:20:36.104356, device xla:6, step 700, Rate=223.07, Global Rate=272.09, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:23:21.828559, device xla:2, step 800, Rate=240.26, Global Rate=276.21, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:23:21.833662, device xla:7, step 800, Rate=240.25, Global Rate=276.21, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:23:21.860450, device xla:4, step 800, Rate=240.23, Global Rate=276.20, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:23:21.823337, device xla:6, step 800, Rate=240.25, Global Rate=276.21, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:23:21.865725, device xla:1, step 800, Rate=240.24, Global Rate=276.20, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:23:21.852557, device xla:3, step 800, Rate=240.23, Global Rate=276.20, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:23:21.874191, device xla:8, step 800, Rate=240.29, Global Rate=276.20, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:23:21.843007, device xla:5, step 800, Rate=240.25, Global Rate=276.21, Compiles=70, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:26:06.797379, device xla:1, step 900, Rate=254.28, Global Rate=279.63, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:26:06.802111, device xla:2, step 900, Rate=254.28, Global Rate=279.63, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:26:06.809642, device xla:3, step 900, Rate=254.26, Global Rate=279.63, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:26:06.803775, device xla:6, step 900, Rate=254.27, Global Rate=279.63, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:26:06.820361, device xla:8, step 900, Rate=254.32, Global Rate=279.62, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:26:06.842335, device xla:4, step 900, Rate=254.25, Global Rate=279.62, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:26:06.811428, device xla:5, step 900, Rate=254.28, Global Rate=279.63, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:26:06.830331, device xla:7, step 900, Rate=254.26, Global Rate=279.62, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:28:50.902411, device xla:1, step 1000, Rate=265.82, Global Rate=282.56, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:28:50.906835, device xla:4, step 1000, Rate=265.82, Global Rate=282.56, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:28:50.949965, device xla:8, step 1000, Rate=265.84, Global Rate=282.55, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:28:50.941961, device xla:6, step 1000, Rate=265.80, Global Rate=282.55, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:28:50.954782, device xla:2, step 1000, Rate=265.80, Global Rate=282.55, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:28:50.911568, device xla:3, step 1000, Rate=265.81, Global Rate=282.56, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:28:50.927283, device xla:7, step 1000, Rate=265.81, Global Rate=282.56, Compiles=70, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:28:50.918707, device xla:5, step 1000, Rate=265.82, Global Rate=282.56, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:31:33.968786, device xla:1, step 1100, Rate=275.46, Global Rate=285.15, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:31:33.973204, device xla:2, step 1100, Rate=275.46, Global Rate=285.15, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:31:34.000587, device xla:4, step 1100, Rate=275.44, Global Rate=285.15, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:31:33.989036, device xla:8, step 1100, Rate=275.48, Global Rate=285.15, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:31:33.979628, device xla:7, step 1100, Rate=275.45, Global Rate=285.15, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:31:34.024147, device xla:3, step 1100, Rate=275.43, Global Rate=285.15, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:31:34.018397, device xla:6, step 1100, Rate=275.43, Global Rate=285.15, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:31:34.005893, device xla:5, step 1100, Rate=275.44, Global Rate=285.15, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:34:20.197522, device xla:1, step 1200, Rate=281.97, Global Rate=286.93, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:34:20.201746, device xla:2, step 1200, Rate=281.97, Global Rate=286.93, Compiles=70, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:34:20.207609, device xla:6, step 1200, Rate=281.96, Global Rate=286.93, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:34:20.213186, device xla:8, step 1200, Rate=281.99, Global Rate=286.93, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:34:20.215279, device xla:4, step 1200, Rate=281.96, Global Rate=286.93, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:34:20.224707, device xla:7, step 1200, Rate=281.96, Global Rate=286.92, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:34:20.242605, device xla:3, step 1200, Rate=281.95, Global Rate=286.92, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:34:20.260600, device xla:5, step 1200, Rate=281.95, Global Rate=286.92, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:37:05.547476, device xla:2, step 1300, Rate=287.51, Global Rate=288.56, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:37:05.563159, device xla:1, step 1300, Rate=287.50, Global Rate=288.55, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:37:05.581190, device xla:8, step 1300, Rate=287.51, Global Rate=288.55, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:37:05.570139, device xla:3, step 1300, Rate=287.49, Global Rate=288.55, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:37:05.585091, device xla:7, step 1300, Rate=287.49, Global Rate=288.55, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:37:05.555029, device xla:5, step 1300, Rate=287.51, Global Rate=288.56, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:37:05.597436, device xla:4, step 1300, Rate=287.48, Global Rate=288.55, Compiles=70, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:37:05.609170, device xla:6, step 1300, Rate=287.48, Global Rate=288.55, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:39:47.686677, device xla:1, step 1400, Rate=293.16, Global Rate=290.34, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:39:47.696342, device xla:4, step 1400, Rate=293.16, Global Rate=290.34, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:39:47.690861, device xla:6, step 1400, Rate=293.16, Global Rate=290.34, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:39:47.708727, device xla:7, step 1400, Rate=293.15, Global Rate=290.34, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:39:47.714881, device xla:5, step 1400, Rate=293.15, Global Rate=290.34, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:39:47.701087, device xla:3, step 1400, Rate=293.15, Global Rate=290.34, Compiles=70, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:39:47.724482, device xla:2, step 1400, Rate=293.15, Global Rate=290.34, Compiles=70, _local_scalar_dense=389
training torch.Size([1024, 16])/ 2019-08-26 22:39:47.732429, device xla:8, step 1400, Rate=293.16, Global Rate=290.34, Compiles=70, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:42:27.919631, device xla:1, step 1500, Rate=298.43, Global Rate=292.12, Compiles=76, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:42:27.925170, device xla:6, step 1500, Rate=298.44, Global Rate=292.12, Compiles=76, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:42:27.945714, device xla:4, step 1500, Rate=298.43, Global Rate=292.12, Compiles=76, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:42:27.951807, device xla:2, step 1500, Rate=298.43, Global Rate=292.12, Compiles=76, _local_scalar_dense=389
training torch.Size([256, 64])/ 2019-08-26 22:42:27.938691, device xla:3, step 1500, Rate=298.43, Global Rate=292.12, Compiles=76, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:42:27.956857, device xla:8, step 1500, Rate=298.44, Global Rate=292.12, Compiles=76, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:42:27.930418, device xla:5, step 1500, Rate=298.44, Global Rate=292.12, Compiles=76, _local_scalar_dense=389
training torch.Size([512, 32])/ 2019-08-26 22:42:27.966234, device xla:7, step 1500, Rate=298.42, Global Rate=292.12, Compiles=76, _local_scalar_dense=389
Epoch 7 Training stats:
device xla:1
| epoch 007 | loss 0.570 | nll_loss 0.570 | ppl 1.48 | wps 5082 | ups 0 | wpb 11167.823 | bsz 406.700 | num_updates 10556 | lr 0.000307787 | gnorm 0.165 | clip 0.000 | oom 0.000 | wall 23199 | train_wall 17502
device xla:2
| epoch 007 | loss 0.570 | nll_loss 0.570 | ppl 1.48 | wps 5084 | ups 0 | wpb 11173.027 | bsz 408.082 | num_updates 10556 | lr 0.000307787 | gnorm 0.158 | clip 0.000 | oom 0.000 | wall 23199 | train_wall 18319
device xla:3
| epoch 007 | loss 0.570 | nll_loss 0.570 | ppl 1.48 | wps 5056 | ups 0 | wpb 11112.338 | bsz 409.319 | num_updates 10556 | lr 0.000307787 | gnorm 0.171 | clip 0.000 | oom 0.000 | wall 23199 | train_wall 16667
device xla:4
| epoch 007 | loss 0.570 | nll_loss 0.570 | ppl 1.48 | wps 5067 | ups 0 | wpb 11135.140 | bsz 413.805 | num_updates 10556 | lr 0.000307787 | gnorm 0.169 | clip 0.000 | oom 0.000 | wall 23199 | train_wall 18331
device xla:5
| epoch 007 | loss 0.570 | nll_loss 0.570 | ppl 1.48 | wps 5085 | ups 0 | wpb 11174.786 | bsz 411.744 | num_updates 10556 | lr 0.000307787 | gnorm 0.161 | clip 0.000 | oom 0.000 | wall 23199 | train_wall 18206
device xla:6
| epoch 007 | loss 0.570 | nll_loss 0.570 | ppl 1.48 | wps 5070 | ups 0 | wpb 11143.601 | bsz 410.046 | num_updates 10556 | lr 0.000307787 | gnorm 0.163 | clip 0.000 | oom 0.000 | wall 23199 | train_wall 18272
device xla:7
| epoch 007 | loss 0.570 | nll_loss 0.570 | ppl 1.48 | wps 5069 | ups 0 | wpb 11141.086 | bsz 408.906 | num_updates 10556 | lr 0.000307787 | gnorm 0.163 | clip 0.000 | oom 0.000 | wall 23199 | train_wall 17133
device xla:8
| epoch 007 | loss 0.570 | nll_loss 0.570 | ppl 1.48 | wps 5061 | ups 0 | wpb 11121.822 | bsz 409.901 | num_updates 10556 | lr 0.000307787 | gnorm 0.174 | clip 0.000 | oom 0.000 | wall 23199 | train_wall 17220
Epoch 7 Tracker Rates:
Rate=297.82, Global Rate=292.14
Rate=297.95, Global Rate=292.14
Rate=297.90, Global Rate=292.14
Rate=297.93, Global Rate=292.14
Rate=297.87, Global Rate=292.14
Rate=297.85, Global Rate=292.14
Rate=298.01, Global Rate=292.14
Rate=297.99, Global Rate=292.14
Epoch 7 end 2019-08-26 22:42:41.786840
Metric: CompileTime
TotalSamples: 76
Counter: 11h21m21s338ms471.581us
ValueRate: 682ms856.501us / second
Rate: 0.00330463 / second
Percentiles: 1%=063ms970.208us; 5%=080ms169.631us; 10%=132ms454.929us; 20%=175ms68.686us; 50%=02m32s044ms245.953us; 80%=06m45s357ms180.148us; 90%=07m09s556ms196.779us; 95%=22m13s835ms112.498us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 85181
Counter: 02d32h10m49s852ms844.994us
ValueRate: 06s191ms672.034us / second
Rate: 5.04384 / second
Percentiles: 1%=01s163ms799.239us; 5%=01s168ms57.781us; 10%=01s170ms467.659us; 20%=01s174ms266.658us; 50%=01s185ms570.921us; 80%=01s288ms229.723us; 90%=01s292ms38.001us; 95%=01s294ms332.450us; 99%=01s390ms337.001us
Metric: InboundData
TotalSamples: 429
Counter: 853.00B
ValueRate: 0.05B / second
Rate: 0.0245053 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 349525
Counter: 34.89GB
ValueRate: 510.01KB / second
Rate: 20.9055 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 773679
Counter: 06h20m43s408ms68.395us
ValueRate: 365ms778.026us / second
Rate: 43.3649 / second
Percentiles: 1%=437.215us; 5%=501.766us; 10%=545.850us; 20%=626.088us; 50%=851.053us; 80%=003ms88.125us; 90%=012ms585.393us; 95%=025ms1.717us; 99%=051ms907.300us
Metric: TransferFromServerTime
TotalSamples: 429
Counter: 06s954ms981.123us
ValueRate: 340.103us / second
Rate: 0.0245053 / second
Percentiles: 1%=651.667us; 5%=710.846us; 10%=749.754us; 20%=831.614us; 50%=001ms236.732us; 80%=035ms134.431us; 90%=056ms793.445us; 95%=061ms140.954us; 99%=070ms942.544us
Metric: TransferToServerTime
TotalSamples: 349525
Counter: 01d20h22m30s247ms809.035us
ValueRate: 05s169ms517.500us / second
Rate: 21.3255 / second
Percentiles: 1%=001ms42.886us; 5%=001ms154.087us; 10%=001ms223.700us; 20%=001ms359.094us; 50%=002ms51.012us; 80%=923ms741.974us; 90%=987ms281.783us; 95%=01s056ms185.109us; 99%=01s097ms958.326us
Counter: CachedSyncParamMismatch
Value: 23
Counter: CachedSyncTensors
Value: 85105
Counter: CreateCompileHandles
Value: 55
Counter: CreateDataHandles
Value: 62677607
Counter: CreateXlaTensor
Value: 408714727
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 62670575
Counter: DestroyXlaTensor
Value: 408708719
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 62670575
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 76
Counter: XRTAllocateFromTensor_Empty
Value: 22162
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 429
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-26 22:42:45.701555, device xla:7, step 0, Compiles=76, _local_scalar_dense=429
validation/ 2019-08-26 22:42:45.710502, device xla:4, step 0, Compiles=76, _local_scalar_dense=429
validation/ 2019-08-26 22:42:45.718964, device xla:5, step 0, Compiles=76, _local_scalar_dense=429
validation/ 2019-08-26 22:42:45.729468, device xla:1, step 0, Compiles=76, _local_scalar_dense=429
validation/ 2019-08-26 22:42:45.737484, device xla:2, step 0, Compiles=76, _local_scalar_dense=429
validation/ 2019-08-26 22:42:45.739911, device xla:3, step 0, Compiles=76, _local_scalar_dense=429
validation/ 2019-08-26 22:42:45.742363, device xla:6, step 0, Compiles=76, _local_scalar_dense=429
validation/ 2019-08-26 22:42:45.874449, device xla:8, step 0, Compiles=76, _local_scalar_dense=429
validation stats on subset "valid" - 2019-08-26 22:42:51.826577
| epoch 007 | valid on 'valid' subset | loss 4.031 | nll_loss 2.234 | ppl 4.71 | num_updates 10556
| epoch 007 | valid on 'valid' subset | loss 4.062 | nll_loss 2.219 | ppl 4.65 | num_updates 10556
| epoch 007 | valid on 'valid' subset | loss 4.125 | nll_loss 2.312 | ppl 4.97 | num_updates 10556
| epoch 007 | valid on 'valid' subset | loss 4.156 | nll_loss 2.344 | ppl 5.08 | num_updates 10556
| epoch 007 | valid on 'valid' subset | loss 4.031 | nll_loss 2.219 | ppl 4.65 | num_updates 10556
| epoch 007 | valid on 'valid' subset | loss 4.062 | nll_loss 2.219 | ppl 4.65 | num_updates 10556
| epoch 007 | valid on 'valid' subset | loss 4.062 | nll_loss 2.250 | ppl 4.76 | num_updates 10556
| epoch 007 | valid on 'valid' subset | loss 4.125 | nll_loss 2.344 | ppl 5.08 | num_updates 10556
old learning rate: 0.0003324479842709235
new learning rate: 0.00030778702596688995
Metric: CompileTime
TotalSamples: 76
Counter: 11h21m21s338ms471.581us
ValueRate: 682ms856.501us / second
Rate: 0.00330463 / second
Percentiles: 1%=063ms970.208us; 5%=080ms169.631us; 10%=132ms454.929us; 20%=175ms68.686us; 50%=02m32s044ms245.953us; 80%=06m45s357ms180.148us; 90%=07m09s556ms196.779us; 95%=22m13s835ms112.498us; 99%=01h00m17s484ms593.625us
Metric: ExecuteTime
TotalSamples: 85302
Counter: 02d32h11m35s914ms986.032us
ValueRate: 06s093ms701.789us / second
Rate: 5.40523 / second
Percentiles: 1%=377ms33.424us; 5%=379ms473.239us; 10%=392ms650.351us; 20%=01s170ms220.172us; 50%=01s180ms266.359us; 80%=01s287ms834.733us; 90%=01s291ms232.767us; 95%=01s294ms905.813us; 99%=01s390ms337.001us
Metric: InboundData
TotalSamples: 454
Counter: 902.00B
ValueRate: 0.05B / second
Rate: 0.0259184 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 349763
Counter: 34.93GB
ValueRate: 953.93KB / second
Rate: 20.8481 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 774287
Counter: 06h20m16s732ms660.980us
ValueRate: 02s741ms219.366us / second
Rate: 52.1159 / second
Percentiles: 1%=437.403us; 5%=479.860us; 10%=521.382us; 20%=582.181us; 50%=782.293us; 80%=002ms613.122us; 90%=029ms97.467us; 95%=376ms110.637us; 99%=389ms701.741us
Metric: TransferFromServerTime
TotalSamples: 454
Counter: 06s082ms503.228us
ValueRate: 347.187us / second
Rate: 0.0259184 / second
Percentiles: 1%=641.086us; 5%=710.182us; 10%=749.754us; 20%=831.614us; 50%=001ms231.654us; 80%=034ms337.979us; 90%=055ms908.022us; 95%=060ms299.299us; 99%=070ms942.544us
Metric: TransferToServerTime
TotalSamples: 349763
Counter: 01d20h22m59s691ms753.328us
ValueRate: 04s431ms194.645us / second
Rate: 20.848 / second
Percentiles: 1%=001ms54.996us; 5%=001ms179.407us; 10%=001ms261.785us; 20%=001ms423.914us; 50%=002ms271.537us; 80%=244ms42.608us; 90%=978ms769.655us; 95%=01s007ms309.092us; 99%=01s090ms245.048us
Counter: CachedSyncParamMismatch
Value: 23
Counter: CachedSyncTensors
Value: 85226
Counter: CreateCompileHandles
Value: 55
Counter: CreateDataHandles
Value: 62679214
Counter: CreateXlaTensor
Value: 408849544
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 62672180
Counter: DestroyXlaTensor
Value: 408843536
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 62672182
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 76
Counter: XRTAllocateFromTensor_Empty
Value: 22162
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 454
Epoch 8 begin 2019-08-26 22:42:51.847189
training torch.Size([256, 64])/ 2019-08-26 22:42:59.562768, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:42:59.583575, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=76, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:42:59.610627, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:42:59.672663, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:42:59.799745, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=76, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:43:00.195847, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:43:00.224832, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:43:00.252358, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:45:45.994935, device xla:8, step 100, Rate=61.78, Global Rate=299.83, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:45:45.999342, device xla:5, step 100, Rate=61.61, Global Rate=299.83, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:45:46.013727, device xla:1, step 100, Rate=61.52, Global Rate=299.80, Compiles=76, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:45:46.004509, device xla:4, step 100, Rate=61.56, Global Rate=299.82, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:45:46.019742, device xla:3, step 100, Rate=61.54, Global Rate=299.79, Compiles=76, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 22:45:46.024164, device xla:7, step 100, Rate=61.75, Global Rate=299.78, Compiles=76, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:45:46.040482, device xla:6, step 100, Rate=61.76, Global Rate=299.75, Compiles=76, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 22:45:46.045405, device xla:2, step 100, Rate=61.52, Global Rate=299.74, Compiles=76, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:55:23.783099, device xla:6, step 200, Rate=67.13, Global Rate=136.80, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:55:23.784974, device xla:5, step 200, Rate=67.01, Global Rate=136.80, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:55:23.786551, device xla:8, step 200, Rate=67.15, Global Rate=136.80, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:55:23.796640, device xla:7, step 200, Rate=67.12, Global Rate=136.80, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:55:23.802507, device xla:1, step 200, Rate=66.94, Global Rate=136.79, Compiles=86, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 22:55:23.787811, device xla:3, step 200, Rate=66.95, Global Rate=136.80, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:55:23.811220, device xla:2, step 200, Rate=66.94, Global Rate=136.79, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:55:23.817437, device xla:4, step 200, Rate=66.97, Global Rate=136.79, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:58:11.192002, device xla:6, step 300, Rate=114.87, Global Rate=167.69, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:58:11.243998, device xla:2, step 300, Rate=114.71, Global Rate=167.68, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:58:11.227168, device xla:3, step 300, Rate=114.72, Global Rate=167.69, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:58:11.196333, device xla:7, step 300, Rate=114.87, Global Rate=167.69, Compiles=86, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 22:58:11.217347, device xla:8, step 300, Rate=114.88, Global Rate=167.69, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:58:11.209814, device xla:1, step 300, Rate=114.72, Global Rate=167.69, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 22:58:11.204167, device xla:5, step 300, Rate=114.77, Global Rate=167.69, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 22:58:11.247983, device xla:4, step 300, Rate=114.74, Global Rate=167.68, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:00:56.796156, device xla:6, step 400, Rate=153.73, Global Rate=189.36, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:00:56.800845, device xla:8, step 400, Rate=153.74, Global Rate=189.35, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:00:56.821808, device xla:2, step 400, Rate=153.61, Global Rate=189.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:00:56.806947, device xla:5, step 400, Rate=153.65, Global Rate=189.35, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:00:56.829586, device xla:4, step 400, Rate=153.63, Global Rate=189.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:00:56.812603, device xla:7, step 400, Rate=153.73, Global Rate=189.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:00:56.857759, device xla:3, step 400, Rate=153.60, Global Rate=189.34, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:00:56.849951, device xla:1, step 400, Rate=153.60, Global Rate=189.35, Compiles=86, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:03:41.819166, device xla:8, step 500, Rate=185.05, Global Rate=205.36, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:03:41.823719, device xla:5, step 500, Rate=184.98, Global Rate=205.36, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:03:41.835758, device xla:1, step 500, Rate=184.94, Global Rate=205.36, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:03:41.850546, device xla:4, step 500, Rate=184.96, Global Rate=205.36, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:03:41.868437, device xla:7, step 500, Rate=185.02, Global Rate=205.35, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:03:41.878008, device xla:6, step 500, Rate=185.01, Global Rate=205.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:03:41.828835, device xla:2, step 500, Rate=184.95, Global Rate=205.36, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:03:41.885762, device xla:3, step 500, Rate=184.93, Global Rate=205.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:06:27.969396, device xla:6, step 600, Rate=209.66, Global Rate=217.45, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:06:27.973865, device xla:2, step 600, Rate=209.59, Global Rate=217.45, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:06:28.006495, device xla:8, step 600, Rate=209.66, Global Rate=217.44, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:06:27.996997, device xla:5, step 600, Rate=209.60, Global Rate=217.45, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:06:27.989829, device xla:1, step 600, Rate=209.58, Global Rate=217.45, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:06:27.980791, device xla:4, step 600, Rate=209.61, Global Rate=217.45, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:06:28.011897, device xla:7, step 600, Rate=209.65, Global Rate=217.44, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:06:28.025288, device xla:3, step 600, Rate=209.58, Global Rate=217.44, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:09:11.617584, device xla:6, step 700, Rate=230.30, Global Rate=227.36, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:09:11.627520, device xla:7, step 700, Rate=230.31, Global Rate=227.35, Compiles=86, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:09:11.652456, device xla:8, step 700, Rate=230.30, Global Rate=227.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:09:11.644851, device xla:2, step 700, Rate=230.24, Global Rate=227.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:09:11.622316, device xla:5, step 700, Rate=230.27, Global Rate=227.36, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:09:11.634743, device xla:4, step 700, Rate=230.26, Global Rate=227.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:09:11.659223, device xla:3, step 700, Rate=230.24, Global Rate=227.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:09:11.669402, device xla:1, step 700, Rate=230.23, Global Rate=227.35, Compiles=86, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:11:56.885668, device xla:6, step 800, Rate=246.20, Global Rate=235.18, Compiles=92, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:11:56.890015, device xla:1, step 800, Rate=246.16, Global Rate=235.18, Compiles=92, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:11:56.931779, device xla:7, step 800, Rate=246.19, Global Rate=235.17, Compiles=92, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:11:56.896713, device xla:3, step 800, Rate=246.16, Global Rate=235.18, Compiles=92, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:11:56.911087, device xla:8, step 800, Rate=246.20, Global Rate=235.18, Compiles=92, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:11:56.905320, device xla:5, step 800, Rate=246.17, Global Rate=235.18, Compiles=92, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:11:56.919375, device xla:4, step 800, Rate=246.16, Global Rate=235.17, Compiles=92, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:11:56.940010, device xla:2, step 800, Rate=246.14, Global Rate=235.17, Compiles=92, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:14:42.678654, device xla:6, step 900, Rate=258.73, Global Rate=241.58, Compiles=92, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:14:42.691463, device xla:8, step 900, Rate=258.73, Global Rate=241.58, Compiles=92, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:14:42.684594, device xla:1, step 900, Rate=258.69, Global Rate=241.58, Compiles=92, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:14:42.696921, device xla:5, step 900, Rate=258.70, Global Rate=241.58, Compiles=92, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:14:42.714126, device xla:2, step 900, Rate=258.68, Global Rate=241.58, Compiles=92, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:14:42.717952, device xla:7, step 900, Rate=258.72, Global Rate=241.57, Compiles=92, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:14:42.726935, device xla:4, step 900, Rate=258.68, Global Rate=241.57, Compiles=92, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:14:42.703063, device xla:3, step 900, Rate=258.69, Global Rate=241.58, Compiles=92, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:17:38.926678, device xla:2, step 1000, Rate=265.06, Global Rate=245.72, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:17:38.933948, device xla:1, step 1000, Rate=265.05, Global Rate=245.72, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:17:38.959037, device xla:8, step 1000, Rate=265.08, Global Rate=245.71, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:17:38.975621, device xla:6, step 1000, Rate=265.07, Global Rate=245.71, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:17:38.948663, device xla:4, step 1000, Rate=265.06, Global Rate=245.71, Compiles=99, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:17:38.940886, device xla:7, step 1000, Rate=265.08, Global Rate=245.72, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:17:38.965605, device xla:3, step 1000, Rate=265.05, Global Rate=245.71, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:17:38.921482, device xla:5, step 1000, Rate=265.07, Global Rate=245.72, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:20:22.937989, device xla:8, step 1100, Rate=274.51, Global Rate=250.57, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:20:22.942688, device xla:3, step 1100, Rate=274.49, Global Rate=250.57, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:20:22.958754, device xla:1, step 1100, Rate=274.47, Global Rate=250.56, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:20:22.990525, device xla:2, step 1100, Rate=274.46, Global Rate=250.56, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:20:22.951419, device xla:7, step 1100, Rate=274.50, Global Rate=250.57, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:20:22.983375, device xla:5, step 1100, Rate=274.47, Global Rate=250.56, Compiles=99, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:20:22.971560, device xla:4, step 1100, Rate=274.48, Global Rate=250.56, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:20:22.997219, device xla:6, step 1100, Rate=274.48, Global Rate=250.56, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:23:09.013534, device xla:6, step 1200, Rate=281.27, Global Rate=254.54, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:23:09.017775, device xla:5, step 1200, Rate=281.25, Global Rate=254.54, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:23:09.023058, device xla:3, step 1200, Rate=281.25, Global Rate=254.54, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:23:09.031573, device xla:2, step 1200, Rate=281.24, Global Rate=254.54, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:23:09.044268, device xla:8, step 1200, Rate=281.26, Global Rate=254.54, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:23:09.070833, device xla:7, step 1200, Rate=281.24, Global Rate=254.53, Compiles=99, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:23:09.060138, device xla:4, step 1200, Rate=281.23, Global Rate=254.53, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:23:09.051972, device xla:1, step 1200, Rate=281.23, Global Rate=254.53, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:25:54.231640, device xla:6, step 1300, Rate=286.99, Global Rate=258.08, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:25:54.242796, device xla:4, step 1300, Rate=286.98, Global Rate=258.08, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:25:54.271341, device xla:8, step 1300, Rate=286.98, Global Rate=258.08, Compiles=99, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:25:54.236087, device xla:1, step 1300, Rate=286.98, Global Rate=258.08, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:25:54.262274, device xla:5, step 1300, Rate=286.97, Global Rate=258.08, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:25:54.252083, device xla:3, step 1300, Rate=286.97, Global Rate=258.08, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:25:54.278887, device xla:7, step 1300, Rate=286.98, Global Rate=258.08, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:25:54.288318, device xla:2, step 1300, Rate=286.96, Global Rate=258.08, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:28:37.766644, device xla:8, step 1400, Rate=292.22, Global Rate=261.36, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:28:37.778800, device xla:2, step 1400, Rate=292.20, Global Rate=261.36, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:28:37.780688, device xla:3, step 1400, Rate=292.20, Global Rate=261.36, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:28:37.771226, device xla:7, step 1400, Rate=292.21, Global Rate=261.36, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:28:37.806785, device xla:6, step 1400, Rate=292.20, Global Rate=261.36, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:28:37.799240, device xla:5, step 1400, Rate=292.19, Global Rate=261.36, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:28:37.813573, device xla:1, step 1400, Rate=292.18, Global Rate=261.36, Compiles=99, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:28:37.789294, device xla:4, step 1400, Rate=292.20, Global Rate=261.36, Compiles=99, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:31:21.228539, device xla:8, step 1500, Rate=296.42, Global Rate=264.28, Compiles=104, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:31:21.233069, device xla:2, step 1500, Rate=296.41, Global Rate=264.28, Compiles=104, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:31:21.240083, device xla:4, step 1500, Rate=296.41, Global Rate=264.28, Compiles=104, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:31:21.250430, device xla:1, step 1500, Rate=296.40, Global Rate=264.28, Compiles=104, _local_scalar_dense=454
training torch.Size([256, 64])/ 2019-08-26 23:31:21.260002, device xla:3, step 1500, Rate=296.39, Global Rate=264.28, Compiles=104, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:31:21.284546, device xla:7, step 1500, Rate=296.40, Global Rate=264.28, Compiles=104, _local_scalar_dense=454
training torch.Size([1024, 16])/ 2019-08-26 23:31:21.291378, device xla:6, step 1500, Rate=296.39, Global Rate=264.28, Compiles=104, _local_scalar_dense=454
training torch.Size([512, 32])/ 2019-08-26 23:31:21.276226, device xla:5, step 1500, Rate=296.39, Global Rate=264.28, Compiles=104, _local_scalar_dense=454
Epoch 8 Training stats:
device xla:1
| epoch 008 | loss 0.496 | nll_loss 0.496 | ppl 1.41 | wps 5161 | ups 0 | wpb 11179.671 | bsz 408.085 | num_updates 12064 | lr 0.000287908 | gnorm 0.145 | clip 0.000 | oom 0.000 | wall 26133 | train_wall 19580
device xla:2
| epoch 008 | loss 0.500 | nll_loss 0.500 | ppl 1.41 | wps 5152 | ups 0 | wpb 11160.815 | bsz 407.130 | num_updates 12064 | lr 0.000287908 | gnorm 0.139 | clip 0.000 | oom 0.000 | wall 26133 | train_wall 20798
device xla:3
| epoch 008 | loss 0.500 | nll_loss 0.500 | ppl 1.41 | wps 5129 | ups 0 | wpb 11109.623 | bsz 409.464 | num_updates 12064 | lr 0.000287908 | gnorm 0.149 | clip 0.000 | oom 0.000 | wall 26133 | train_wall 19139
device xla:4
| epoch 008 | loss 0.500 | nll_loss 0.500 | ppl 1.41 | wps 5142 | ups 0 | wpb 11138.977 | bsz 413.093 | num_updates 12064 | lr 0.000287908 | gnorm 0.148 | clip 0.000 | oom 0.000 | wall 26133 | train_wall 20807
device xla:5
| epoch 008 | loss 0.496 | nll_loss 0.496 | ppl 1.41 | wps 5157 | ups 0 | wpb 11171.799 | bsz 413.093 | num_updates 12064 | lr 0.000287908 | gnorm 0.142 | clip 0.000 | oom 0.000 | wall 26133 | train_wall 20683
device xla:6
| epoch 008 | loss 0.500 | nll_loss 0.500 | ppl 1.41 | wps 5153 | ups 0 | wpb 11162.635 | bsz 409.422 | num_updates 12064 | lr 0.000287908 | gnorm 0.143 | clip 0.000 | oom 0.000 | wall 26133 | train_wall 20753
device xla:7
| epoch 008 | loss 0.500 | nll_loss 0.500 | ppl 1.41 | wps 5142 | ups 0 | wpb 11139.010 | bsz 408.424 | num_updates 12064 | lr 0.000287908 | gnorm 0.143 | clip 0.000 | oom 0.000 | wall 26133 | train_wall 19610
device xla:8
| epoch 008 | loss 0.500 | nll_loss 0.500 | ppl 1.41 | wps 5127 | ups 0 | wpb 11106.984 | bsz 409.804 | num_updates 12064 | lr 0.000287908 | gnorm 0.153 | clip 0.000 | oom 0.000 | wall 26133 | train_wall 19699
Epoch 8 Tracker Rates:
Rate=293.70, Global Rate=264.37
Rate=293.64, Global Rate=264.37
Rate=293.74, Global Rate=264.37
Rate=293.67, Global Rate=264.37
Rate=293.80, Global Rate=264.37
Rate=293.86, Global Rate=264.37
Rate=293.83, Global Rate=264.37
Rate=293.63, Global Rate=264.37
Epoch 8 end 2019-08-26 23:31:35.728243
Metric: CompileTime
TotalSamples: 104
Counter: 11h04m07s264ms89.914us
ValueRate: 619ms329.894us / second
Rate: 0.00400381 / second
Percentiles: 1%=049ms149.898us; 5%=063ms970.208us; 10%=073ms669.112us; 20%=110ms188.871us; 50%=28s580ms414.981us; 80%=06m45s337ms595.036us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 97366
Counter: 02d45h20m05s242ms188.742us
ValueRate: 06s080ms724.511us / second
Rate: 4.9377 / second
Percentiles: 1%=01s163ms914.572us; 5%=01s168ms856.311us; 10%=01s171ms795.297us; 20%=01s176ms842.732us; 50%=01s197ms31.623us; 80%=01s287ms819.093us; 90%=01s290ms24.232us; 95%=01s293ms680.044us; 99%=01s448ms538.889us
Metric: InboundData
TotalSamples: 494
Counter: 982.00B
ValueRate: 0.05B / second
Rate: 0.0241679 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 399360
Counter: 39.42GB
ValueRate: 489.78KB / second
Rate: 20.073 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 884038
Counter: 06h12m40s493ms972.653us
ValueRate: 373ms138.987us / second
Rate: 42.4786 / second
Percentiles: 1%=478.653us; 5%=537.945us; 10%=586.274us; 20%=647.884us; 50%=872.853us; 80%=002ms486.923us; 90%=011ms371.164us; 95%=026ms735.470us; 99%=054ms135.334us
Metric: TransferFromServerTime
TotalSamples: 494
Counter: 06s244ms485.493us
ValueRate: 305.498us / second
Rate: 0.0241679 / second
Percentiles: 1%=641.086us; 5%=710.182us; 10%=750.207us; 20%=815.151us; 50%=001ms203.054us; 80%=032ms427.229us; 90%=054ms104.948us; 95%=060ms75.361us; 99%=070ms942.544us
Metric: TransferToServerTime
TotalSamples: 399360
Counter: 02d31h12m04s740ms30.748us
ValueRate: 05s895ms446.844us / second
Rate: 20.0733 / second
Percentiles: 1%=001ms89.110us; 5%=001ms240.525us; 10%=001ms314.781us; 20%=001ms481.306us; 50%=002ms415.595us; 80%=869ms831.056us; 90%=983ms860.870us; 95%=01s055ms643.716us; 99%=01s271ms670.912us
Counter: CachedSyncParamMismatch
Value: 51
Counter: CachedSyncTensors
Value: 97262
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 71631435
Counter: CreateXlaTensor
Value: 467121188
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 71624403
Counter: DestroyXlaTensor
Value: 467115180
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 71624403
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 104
Counter: XRTAllocateFromTensor_Empty
Value: 22332
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 494
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-26 23:31:39.802352, device xla:7, step 0, Compiles=104, _local_scalar_dense=494
validation/ 2019-08-26 23:31:39.808008, device xla:6, step 0, Compiles=104, _local_scalar_dense=494
validation/ 2019-08-26 23:31:39.815077, device xla:3, step 0, Compiles=104, _local_scalar_dense=494
validation/ 2019-08-26 23:31:39.957583, device xla:8, step 0, Compiles=104, _local_scalar_dense=494
validation/ 2019-08-26 23:31:39.978546, device xla:5, step 0, Compiles=104, _local_scalar_dense=494
validation/ 2019-08-26 23:31:39.981656, device xla:2, step 0, Compiles=104, _local_scalar_dense=494
validation/ 2019-08-26 23:31:39.988613, device xla:4, step 0, Compiles=104, _local_scalar_dense=494
validation/ 2019-08-26 23:31:39.995568, device xla:1, step 0, Compiles=104, _local_scalar_dense=494
validation stats on subset "valid" - 2019-08-26 23:31:45.987297
| epoch 008 | valid on 'valid' subset | loss 3.969 | nll_loss 2.172 | ppl 4.51 | num_updates 12064
| epoch 008 | valid on 'valid' subset | loss 4.000 | nll_loss 2.188 | ppl 4.56 | num_updates 12064
| epoch 008 | valid on 'valid' subset | loss 4.062 | nll_loss 2.266 | ppl 4.81 | num_updates 12064
| epoch 008 | valid on 'valid' subset | loss 4.062 | nll_loss 2.297 | ppl 4.91 | num_updates 12064
| epoch 008 | valid on 'valid' subset | loss 3.969 | nll_loss 2.172 | ppl 4.51 | num_updates 12064
| epoch 008 | valid on 'valid' subset | loss 4.000 | nll_loss 2.203 | ppl 4.60 | num_updates 12064
| epoch 008 | valid on 'valid' subset | loss 4.031 | nll_loss 2.203 | ppl 4.60 | num_updates 12064
| epoch 008 | valid on 'valid' subset | loss 4.094 | nll_loss 2.281 | ppl 4.86 | num_updates 12064
old learning rate: 0.00030778702596688995
new learning rate: 0.0002879083998155492
Metric: CompileTime
TotalSamples: 104
Counter: 11h04m07s264ms89.914us
ValueRate: 619ms329.894us / second
Rate: 0.00400381 / second
Percentiles: 1%=049ms149.898us; 5%=063ms970.208us; 10%=073ms669.112us; 20%=110ms188.871us; 50%=28s580ms414.981us; 80%=06m45s337ms595.036us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 97487
Counter: 02d45h21m51s257ms969.221us
ValueRate: 06s992ms892.789us / second
Rate: 5.29247 / second
Percentiles: 1%=377ms606.707us; 5%=379ms670.791us; 10%=392ms846.696us; 20%=01s170ms418.008us; 50%=01s186ms579.977us; 80%=01s286ms186.211us; 90%=01s290ms847.262us; 95%=01s292ms408.459us; 99%=01s448ms538.889us
Metric: InboundData
TotalSamples: 519
Counter: 1.01KB
ValueRate: 0.05B / second
Rate: 0.0253781 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 399598
Counter: 39.47GB
ValueRate: 928.89KB / second
Rate: 20.301 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 884678
Counter: 07h12m12s673ms869.695us
ValueRate: 02s646ms386.182us / second
Rate: 51.4578 / second
Percentiles: 1%=453.539us; 5%=493.090us; 10%=528.675us; 20%=581.623us; 50%=770.719us; 80%=001ms301.818us; 90%=022ms392.560us; 95%=375ms516.602us; 99%=389ms607.659us
Metric: TransferFromServerTime
TotalSamples: 519
Counter: 06s433ms984.083us
ValueRate: 314.560us / second
Rate: 0.0253781 / second
Percentiles: 1%=651.667us; 5%=710.306us; 10%=751.722us; 20%=821.740us; 50%=001ms221.587us; 80%=032ms773.897us; 90%=050ms817.210us; 95%=060ms996.139us; 99%=069ms311.171us
Metric: TransferToServerTime
TotalSamples: 399598
Counter: 02d31h12m29s287ms767.344us
ValueRate: 04s102ms502.219us / second
Rate: 20.301 / second
Percentiles: 1%=001ms126.565us; 5%=001ms282.166us; 10%=001ms375.446us; 20%=002ms535.798us; 50%=002ms412.402us; 80%=236ms323.872us; 90%=935ms173.783us; 95%=998ms820.164us; 99%=01s070ms729.862us
Counter: CachedSyncParamMismatch
Value: 51
Counter: CachedSyncTensors
Value: 97383
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 71633042
Counter: CreateXlaTensor
Value: 467256005
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 71626008
Counter: DestroyXlaTensor
Value: 467249997
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 71626010
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 104
Counter: XRTAllocateFromTensor_Empty
Value: 22332
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 519
Epoch 9 begin 2019-08-26 23:31:46.079320
training torch.Size([256, 64])/ 2019-08-26 23:31:54.613184, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:31:54.660835, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:31:54.673531, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:31:54.680970, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:31:54.774906, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:31:54.998185, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:31:55.583085, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:31:55.667164, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:34:45.309265, device xla:3, step 100, Rate=59.99, Global Rate=291.63, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:34:45.342282, device xla:1, step 100, Rate=60.00, Global Rate=291.58, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:34:45.311082, device xla:5, step 100, Rate=60.05, Global Rate=291.63, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:34:45.316781, device xla:4, step 100, Rate=60.00, Global Rate=291.62, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:34:45.301821, device xla:8, step 100, Rate=60.34, Global Rate=291.65, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:34:45.345878, device xla:7, step 100, Rate=60.35, Global Rate=291.58, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:34:45.330547, device xla:2, step 100, Rate=60.01, Global Rate=291.60, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:34:45.322717, device xla:6, step 100, Rate=60.12, Global Rate=291.61, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:37:31.250305, device xla:6, step 200, Rate=109.81, Global Rate=299.85, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:37:31.254896, device xla:3, step 200, Rate=109.70, Global Rate=299.85, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:37:31.267236, device xla:2, step 200, Rate=109.72, Global Rate=299.84, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:37:31.277444, device xla:4, step 200, Rate=109.70, Global Rate=299.83, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:37:31.283086, device xla:1, step 200, Rate=109.71, Global Rate=299.82, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:37:31.262034, device xla:5, step 200, Rate=109.74, Global Rate=299.84, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:37:31.291136, device xla:7, step 200, Rate=109.99, Global Rate=299.82, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:37:31.300221, device xla:8, step 200, Rate=109.96, Global Rate=299.81, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:40:17.408050, device xla:7, step 300, Rate=149.63, Global Rate=302.56, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:40:17.413889, device xla:6, step 300, Rate=149.47, Global Rate=302.56, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:40:17.424296, device xla:8, step 300, Rate=149.61, Global Rate=302.56, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:40:17.440390, device xla:3, step 300, Rate=149.38, Global Rate=302.54, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:40:17.418743, device xla:4, step 300, Rate=149.40, Global Rate=302.56, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:40:17.445986, device xla:2, step 300, Rate=149.39, Global Rate=302.54, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:40:17.431590, device xla:5, step 300, Rate=149.42, Global Rate=302.55, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:40:17.458258, device xla:1, step 300, Rate=149.39, Global Rate=302.53, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:43:03.762231, device xla:6, step 400, Rate=181.14, Global Rate=303.85, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:43:03.767010, device xla:7, step 400, Rate=181.26, Global Rate=303.85, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:43:03.756899, device xla:4, step 400, Rate=181.08, Global Rate=303.85, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:43:03.773465, device xla:8, step 400, Rate=181.24, Global Rate=303.85, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:43:03.780702, device xla:5, step 400, Rate=181.09, Global Rate=303.84, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-26 23:43:03.786342, device xla:3, step 400, Rate=181.06, Global Rate=303.84, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:43:03.797624, device xla:1, step 400, Rate=181.07, Global Rate=303.83, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:43:03.806159, device xla:2, step 400, Rate=181.07, Global Rate=303.83, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:45:49.652539, device xla:7, step 500, Rate=206.74, Global Rate=304.80, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:45:49.647139, device xla:4, step 500, Rate=206.59, Global Rate=304.80, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:45:49.664705, device xla:6, step 500, Rate=206.63, Global Rate=304.79, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:45:49.657457, device xla:8, step 500, Rate=206.72, Global Rate=304.79, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:45:49.666789, device xla:5, step 500, Rate=206.60, Global Rate=304.79, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:45:49.675505, device xla:3, step 500, Rate=206.58, Global Rate=304.79, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:45:49.696402, device xla:1, step 500, Rate=206.58, Global Rate=304.78, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:45:49.685243, device xla:2, step 500, Rate=206.59, Global Rate=304.78, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:48:33.382084, device xla:7, step 600, Rate=227.93, Global Rate=306.09, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:48:33.393666, device xla:3, step 600, Rate=227.81, Global Rate=306.08, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-26 23:48:33.395745, device xla:4, step 600, Rate=227.81, Global Rate=306.08, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:48:33.403021, device xla:5, step 600, Rate=227.82, Global Rate=306.08, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:48:33.386661, device xla:8, step 600, Rate=227.92, Global Rate=306.09, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:48:33.412557, device xla:2, step 600, Rate=227.81, Global Rate=306.08, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:48:33.427208, device xla:1, step 600, Rate=227.81, Global Rate=306.07, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:48:33.440540, device xla:6, step 600, Rate=227.83, Global Rate=306.07, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:51:18.953291, device xla:6, step 700, Rate=244.13, Global Rate=306.53, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:51:18.958235, device xla:7, step 700, Rate=244.19, Global Rate=306.53, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-26 23:51:18.947994, device xla:5, step 700, Rate=244.11, Global Rate=306.53, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:51:18.978263, device xla:3, step 700, Rate=244.09, Global Rate=306.53, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:51:18.970150, device xla:1, step 700, Rate=244.10, Global Rate=306.53, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:51:18.963359, device xla:8, step 700, Rate=244.18, Global Rate=306.53, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:51:18.981074, device xla:4, step 700, Rate=244.09, Global Rate=306.53, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:51:18.987818, device xla:2, step 700, Rate=244.09, Global Rate=306.52, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:54:05.543582, device xla:7, step 800, Rate=256.82, Global Rate=306.63, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:54:05.538320, device xla:4, step 800, Rate=256.75, Global Rate=306.63, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:54:05.555747, device xla:1, step 800, Rate=256.75, Global Rate=306.63, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-26 23:54:05.548475, device xla:8, step 800, Rate=256.81, Global Rate=306.63, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:54:05.581415, device xla:3, step 800, Rate=256.73, Global Rate=306.62, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:54:05.570523, device xla:2, step 800, Rate=256.75, Global Rate=306.63, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:54:05.583686, device xla:6, step 800, Rate=256.76, Global Rate=306.62, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:54:05.595427, device xla:5, step 800, Rate=256.74, Global Rate=306.62, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:56:52.011455, device xla:7, step 900, Rate=266.97, Global Rate=306.74, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-26 23:56:52.006018, device xla:4, step 900, Rate=266.91, Global Rate=306.74, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:56:52.016745, device xla:3, step 900, Rate=266.91, Global Rate=306.74, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:56:52.000727, device xla:5, step 900, Rate=266.93, Global Rate=306.74, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-26 23:56:52.029727, device xla:6, step 900, Rate=266.93, Global Rate=306.73, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:56:52.039898, device xla:8, step 900, Rate=266.96, Global Rate=306.73, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:56:52.052660, device xla:1, step 900, Rate=266.90, Global Rate=306.73, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:56:52.063805, device xla:2, step 900, Rate=266.90, Global Rate=306.73, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:59:39.590963, device xla:6, step 1000, Rate=274.66, Global Rate=306.62, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-26 23:59:39.595952, device xla:7, step 1000, Rate=274.68, Global Rate=306.61, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:59:39.585714, device xla:5, step 1000, Rate=274.64, Global Rate=306.62, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:59:39.600949, device xla:8, step 1000, Rate=274.68, Global Rate=306.61, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:59:39.623268, device xla:1, step 1000, Rate=274.63, Global Rate=306.61, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:59:39.631088, device xla:3, step 1000, Rate=274.62, Global Rate=306.61, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-26 23:59:39.645932, device xla:4, step 1000, Rate=274.61, Global Rate=306.61, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-26 23:59:39.613297, device xla:2, step 1000, Rate=274.64, Global Rate=306.61, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-27 00:02:26.088529, device xla:6, step 1100, Rate=281.23, Global Rate=306.70, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:02:26.093208, device xla:5, step 1100, Rate=281.21, Global Rate=306.70, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:02:26.107722, device xla:3, step 1100, Rate=281.21, Global Rate=306.69, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:02:26.098360, device xla:2, step 1100, Rate=281.22, Global Rate=306.70, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:02:26.121158, device xla:1, step 1100, Rate=281.21, Global Rate=306.69, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:02:26.115406, device xla:4, step 1100, Rate=281.20, Global Rate=306.69, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:02:26.142498, device xla:7, step 1100, Rate=281.23, Global Rate=306.69, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:02:26.134640, device xla:8, step 1100, Rate=281.23, Global Rate=306.69, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:05:13.173836, device xla:7, step 1200, Rate=286.29, Global Rate=306.67, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:05:13.178657, device xla:5, step 1200, Rate=286.26, Global Rate=306.67, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:05:13.194586, device xla:8, step 1200, Rate=286.28, Global Rate=306.67, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:05:13.184380, device xla:2, step 1200, Rate=286.26, Global Rate=306.67, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:05:13.202313, device xla:3, step 1200, Rate=286.25, Global Rate=306.67, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:05:13.168090, device xla:4, step 1200, Rate=286.26, Global Rate=306.68, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:05:13.219283, device xla:1, step 1200, Rate=286.25, Global Rate=306.67, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:05:13.212644, device xla:6, step 1200, Rate=286.25, Global Rate=306.67, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-27 00:07:59.590576, device xla:5, step 1300, Rate=290.54, Global Rate=306.75, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:07:59.596026, device xla:7, step 1300, Rate=290.56, Global Rate=306.75, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:07:59.635539, device xla:3, step 1300, Rate=290.53, Global Rate=306.74, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:07:59.624846, device xla:2, step 1300, Rate=290.53, Global Rate=306.75, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:07:59.602072, device xla:1, step 1300, Rate=290.54, Global Rate=306.75, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:07:59.610984, device xla:4, step 1300, Rate=290.53, Global Rate=306.75, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:07:59.643361, device xla:8, step 1300, Rate=290.54, Global Rate=306.74, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:07:59.616810, device xla:6, step 1300, Rate=290.54, Global Rate=306.75, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:10:42.978007, device xla:6, step 1400, Rate=295.11, Global Rate=307.21, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:10:42.997346, device xla:1, step 1400, Rate=295.10, Global Rate=307.21, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:10:43.016682, device xla:3, step 1400, Rate=295.10, Global Rate=307.21, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:10:42.987707, device xla:2, step 1400, Rate=295.11, Global Rate=307.21, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:10:43.025768, device xla:8, step 1400, Rate=295.11, Global Rate=307.21, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:10:43.020215, device xla:7, step 1400, Rate=295.11, Global Rate=307.21, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:10:42.982427, device xla:4, step 1400, Rate=295.10, Global Rate=307.21, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:10:43.005240, device xla:5, step 1400, Rate=295.09, Global Rate=307.21, Compiles=104, _local_scalar_dense=519
training torch.Size([1024, 16])/ 2019-08-27 00:13:23.820577, device xla:6, step 1500, Rate=299.76, Global Rate=307.93, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:13:23.825053, device xla:3, step 1500, Rate=299.75, Global Rate=307.93, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:13:23.842102, device xla:4, step 1500, Rate=299.74, Global Rate=307.93, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:13:23.831899, device xla:2, step 1500, Rate=299.75, Global Rate=307.93, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:13:23.863182, device xla:1, step 1500, Rate=299.74, Global Rate=307.92, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:13:23.847984, device xla:5, step 1500, Rate=299.74, Global Rate=307.93, Compiles=104, _local_scalar_dense=519
training torch.Size([256, 64])/ 2019-08-27 00:13:23.853912, device xla:8, step 1500, Rate=299.76, Global Rate=307.93, Compiles=104, _local_scalar_dense=519
training torch.Size([512, 32])/ 2019-08-27 00:13:23.877585, device xla:7, step 1500, Rate=299.75, Global Rate=307.92, Compiles=104, _local_scalar_dense=519
Epoch 9 Training stats:
device xla:1
| epoch 009 | loss 0.441 | nll_loss 0.441 | ppl 1.36 | wps 5294 | ups 0 | wpb 11178.093 | bsz 408.163 | num_updates 13572 | lr 0.000271443 | gnorm 0.129 | clip 0.000 | oom 0.000 | wall 28656 | train_wall 21637
device xla:2
| epoch 009 | loss 0.445 | nll_loss 0.445 | ppl 1.36 | wps 5284 | ups 0 | wpb 11157.271 | bsz 407.804 | num_updates 13572 | lr 0.000271443 | gnorm 0.123 | clip 0.000 | oom 0.000 | wall 28656 | train_wall 22864
device xla:3
| epoch 009 | loss 0.445 | nll_loss 0.445 | ppl 1.36 | wps 5258 | ups 0 | wpb 11101.046 | bsz 408.729 | num_updates 13572 | lr 0.000271443 | gnorm 0.133 | clip 0.000 | oom 0.000 | wall 28656 | train_wall 21197
device xla:4
| epoch 009 | loss 0.445 | nll_loss 0.445 | ppl 1.36 | wps 5278 | ups 0 | wpb 11143.532 | bsz 412.162 | num_updates 13572 | lr 0.000271443 | gnorm 0.132 | clip 0.000 | oom 0.000 | wall 28656 | train_wall 22872
device xla:5
| epoch 009 | loss 0.441 | nll_loss 0.441 | ppl 1.36 | wps 5289 | ups 0 | wpb 11166.221 | bsz 413.652 | num_updates 13572 | lr 0.000271443 | gnorm 0.126 | clip 0.000 | oom 0.000 | wall 28656 | train_wall 22749
device xla:6
| epoch 009 | loss 0.445 | nll_loss 0.445 | ppl 1.36 | wps 5287 | ups 0 | wpb 11163.932 | bsz 409.030 | num_updates 13572 | lr 0.000271443 | gnorm 0.127 | clip 0.000 | oom 0.000 | wall 28656 | train_wall 22814
device xla:7
| epoch 009 | loss 0.445 | nll_loss 0.445 | ppl 1.36 | wps 5276 | ups 0 | wpb 11139.455 | bsz 408.597 | num_updates 13572 | lr 0.000271443 | gnorm 0.127 | clip 0.000 | oom 0.000 | wall 28656 | train_wall 21669
device xla:8
| epoch 009 | loss 0.445 | nll_loss 0.445 | ppl 1.36 | wps 5267 | ups 0 | wpb 11120.095 | bsz 410.370 | num_updates 13572 | lr 0.000271443 | gnorm 0.136 | clip 0.000 | oom 0.000 | wall 28656 | train_wall 21749
Epoch 9 Tracker Rates:
Rate=296.58, Global Rate=307.79
Rate=296.46, Global Rate=307.79
Rate=296.44, Global Rate=307.79
Rate=296.50, Global Rate=307.79
Rate=296.52, Global Rate=307.79
Rate=296.42, Global Rate=307.79
Rate=296.64, Global Rate=307.79
Rate=296.56, Global Rate=307.79
Epoch 9 end 2019-08-27 00:13:38.289413
Metric: CompileTime
TotalSamples: 104
Counter: 11h04m07s264ms89.914us
ValueRate: 619ms329.894us / second
Rate: 0.00400381 / second
Percentiles: 1%=049ms149.898us; 5%=063ms970.208us; 10%=073ms669.112us; 20%=110ms188.871us; 50%=28s580ms414.981us; 80%=06m45s337ms595.036us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 109551
Counter: 02d55h05m44s570ms943.312us
ValueRate: 06s135ms721.304us / second
Rate: 4.99551 / second
Percentiles: 1%=01s163ms938.075us; 5%=01s169ms148.562us; 10%=01s173ms711.979us; 20%=01s178ms633.457us; 50%=01s189ms380.937us; 80%=01s287ms369.945us; 90%=01s291ms847.424us; 95%=01s293ms478.001us; 99%=01s297ms126.340us
Metric: InboundData
TotalSamples: 559
Counter: 1.08KB
ValueRate: 0.05B / second
Rate: 0.0243436 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 449451
Counter: 43.96GB
ValueRate: 496.85KB / second
Rate: 20.3401 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 995552
Counter: 07h21m38s119ms507.882us
ValueRate: 434ms942.440us / second
Rate: 42.8383 / second
Percentiles: 1%=452.800us; 5%=515.978us; 10%=558.904us; 20%=625.230us; 50%=869.295us; 80%=003ms304.505us; 90%=013ms90.303us; 95%=027ms800.636us; 99%=051ms58.115us
Metric: TransferFromServerTime
TotalSamples: 559
Counter: 07s606ms580.407us
ValueRate: 287.663us / second
Rate: 0.0243436 / second
Percentiles: 1%=641.086us; 5%=710.846us; 10%=753.414us; 20%=818.424us; 50%=001ms189.179us; 80%=031ms631.736us; 90%=048ms723.860us; 95%=059ms398.470us; 99%=069ms311.171us
Metric: TransferToServerTime
TotalSamples: 449451
Counter: 02d39h01m31s613ms50.762us
ValueRate: 05s939ms71.870us / second
Rate: 20.34 / second
Percentiles: 1%=001ms68.731us; 5%=001ms180.699us; 10%=001ms268.276us; 20%=001ms378.907us; 50%=002ms214.176us; 80%=895ms624.035us; 90%=969ms308.827us; 95%=01s056ms816.491us; 99%=01s091ms141.667us
Counter: CachedSyncParamMismatch
Value: 51
Counter: CachedSyncTensors
Value: 109447
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 80585519
Counter: CreateXlaTensor
Value: 525527637
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 80578487
Counter: DestroyXlaTensor
Value: 525521629
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 80578487
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 104
Counter: XRTAllocateFromTensor_Empty
Value: 22507
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 559
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 00:13:42.401927, device xla:1, step 0, Compiles=104, _local_scalar_dense=559
validation/ 2019-08-27 00:13:42.403765, device xla:8, step 0, Compiles=104, _local_scalar_dense=559
validation/ 2019-08-27 00:13:42.417288, device xla:6, step 0, Compiles=104, _local_scalar_dense=559
validation/ 2019-08-27 00:13:42.421389, device xla:5, step 0, Compiles=104, _local_scalar_dense=559
validation/ 2019-08-27 00:13:42.557863, device xla:4, step 0, Compiles=104, _local_scalar_dense=559
validation/ 2019-08-27 00:13:42.560615, device xla:3, step 0, Compiles=104, _local_scalar_dense=559
validation/ 2019-08-27 00:13:42.569731, device xla:7, step 0, Compiles=104, _local_scalar_dense=559
validation/ 2019-08-27 00:13:42.598132, device xla:2, step 0, Compiles=104, _local_scalar_dense=559
validation stats on subset "valid" - 2019-08-27 00:13:48.535757
| epoch 009 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 13572
| epoch 009 | valid on 'valid' subset | loss 4.000 | nll_loss 2.172 | ppl 4.51 | num_updates 13572
| epoch 009 | valid on 'valid' subset | loss 4.031 | nll_loss 2.234 | ppl 4.71 | num_updates 13572
| epoch 009 | valid on 'valid' subset | loss 4.062 | nll_loss 2.250 | ppl 4.76 | num_updates 13572
| epoch 009 | valid on 'valid' subset | loss 3.953 | nll_loss 2.156 | ppl 4.46 | num_updates 13572
| epoch 009 | valid on 'valid' subset | loss 3.969 | nll_loss 2.172 | ppl 4.51 | num_updates 13572
| epoch 009 | valid on 'valid' subset | loss 3.969 | nll_loss 2.188 | ppl 4.56 | num_updates 13572
| epoch 009 | valid on 'valid' subset | loss 4.062 | nll_loss 2.250 | ppl 4.76 | num_updates 13572
old learning rate: 0.0002879083998155492
new learning rate: 0.00027144264249352344
Metric: CompileTime
TotalSamples: 104
Counter: 11h04m07s264ms89.914us
ValueRate: 619ms329.894us / second
Rate: 0.00400381 / second
Percentiles: 1%=049ms149.898us; 5%=063ms970.208us; 10%=073ms669.112us; 20%=110ms188.871us; 50%=28s580ms414.981us; 80%=06m45s337ms595.036us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 109672
Counter: 02d55h05m30s591ms66.832us
ValueRate: 06s046ms543.078us / second
Rate: 5.36817 / second
Percentiles: 1%=376ms486.643us; 5%=378ms499.656us; 10%=391ms227.322us; 20%=01s172ms243.688us; 50%=01s185ms26.311us; 80%=01s286ms891.912us; 90%=01s290ms329.691us; 95%=01s293ms283.722us; 99%=01s297ms126.340us
Metric: InboundData
TotalSamples: 584
Counter: 1.13KB
ValueRate: 0.05B / second
Rate: 0.0254209 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 449691
Counter: 44.00GB
ValueRate: 931.44KB / second
Rate: 20.3567 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 996166
Counter: 07h21m10s663ms251.621us
ValueRate: 02s670ms920.293us / second
Rate: 51.5716 / second
Percentiles: 1%=448.378us; 5%=498.542us; 10%=535.542us; 20%=598.456us; 50%=774.206us; 80%=001ms430.326us; 90%=022ms393.229us; 95%=375ms16.100us; 99%=389ms165.721us
Metric: TransferFromServerTime
TotalSamples: 584
Counter: 07s649ms96.646us
ValueRate: 289.428us / second
Rate: 0.0254209 / second
Percentiles: 1%=641.086us; 5%=712.931us; 10%=753.987us; 20%=822.474us; 50%=001ms203.054us; 80%=028ms58.063us; 90%=046ms173.709us; 95%=058ms759.246us; 99%=069ms311.171us
Metric: TransferToServerTime
TotalSamples: 449691
Counter: 02d39h01m57s649ms347.524us
ValueRate: 04s206ms588.816us / second
Rate: 20.3576 / second
Percentiles: 1%=001ms100.519us; 5%=001ms221.461us; 10%=001ms308.079us; 20%=001ms442.340us; 50%=002ms88.352us; 80%=239ms983.902us; 90%=948ms37.848us; 95%=01s019ms903.364us; 99%=01s079ms362.865us
Counter: CachedSyncParamMismatch
Value: 51
Counter: CachedSyncTensors
Value: 109568
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 80587128
Counter: CreateXlaTensor
Value: 525662454
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 80580095
Counter: DestroyXlaTensor
Value: 525656446
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 80580096
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 104
Counter: XRTAllocateFromTensor_Empty
Value: 22507
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 584
Epoch 10 begin 2019-08-27 00:13:48.558644
training torch.Size([256, 64])/ 2019-08-27 00:13:57.372533, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:13:57.410030, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:13:57.501469, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:13:57.751533, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:13:57.899011, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:13:58.102646, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:13:58.284466, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:13:58.432998, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:16:48.537045, device xla:4, step 100, Rate=59.96, Global Rate=290.47, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:16:48.550054, device xla:3, step 100, Rate=59.87, Global Rate=290.45, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:16:48.526276, device xla:1, step 100, Rate=59.83, Global Rate=290.48, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:16:48.554147, device xla:2, step 100, Rate=59.83, Global Rate=290.44, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:16:48.531544, device xla:7, step 100, Rate=60.15, Global Rate=290.48, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:16:48.562223, device xla:6, step 100, Rate=60.07, Global Rate=290.43, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:16:48.541978, device xla:5, step 100, Rate=60.01, Global Rate=290.46, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:16:48.572260, device xla:8, step 100, Rate=60.19, Global Rate=290.41, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:19:35.237862, device xla:7, step 200, Rate=109.54, Global Rate=298.57, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:19:35.243162, device xla:4, step 200, Rate=109.39, Global Rate=298.57, Compiles=104, _local_scalar_dense=584
training torch.Size([1024, 16])/ 2019-08-27 00:19:35.271024, device xla:8, step 200, Rate=109.58, Global Rate=298.54, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:19:35.255157, device xla:2, step 200, Rate=109.29, Global Rate=298.55, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:19:35.248021, device xla:5, step 200, Rate=109.43, Global Rate=298.56, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:19:35.276299, device xla:6, step 200, Rate=109.48, Global Rate=298.54, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:19:35.263157, device xla:3, step 200, Rate=109.32, Global Rate=298.55, Compiles=104, _local_scalar_dense=584
training torch.Size([1024, 16])/ 2019-08-27 00:19:35.295999, device xla:1, step 200, Rate=109.27, Global Rate=298.52, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:22:21.085212, device xla:1, step 300, Rate=149.18, Global Rate=301.88, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:22:21.091669, device xla:8, step 300, Rate=149.42, Global Rate=301.87, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:22:21.098148, device xla:3, step 300, Rate=149.20, Global Rate=301.87, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:22:21.101354, device xla:6, step 300, Rate=149.34, Global Rate=301.87, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:22:21.142018, device xla:4, step 300, Rate=149.24, Global Rate=301.84, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:22:21.116697, device xla:5, step 300, Rate=149.28, Global Rate=301.86, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:22:21.128955, device xla:2, step 300, Rate=149.17, Global Rate=301.85, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:22:21.148166, device xla:7, step 300, Rate=149.36, Global Rate=301.84, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:25:06.328637, device xla:4, step 400, Rate=181.38, Global Rate=303.83, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:25:06.338221, device xla:3, step 400, Rate=181.33, Global Rate=303.83, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:25:06.340234, device xla:8, step 400, Rate=181.50, Global Rate=303.83, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:25:06.332983, device xla:1, step 400, Rate=181.31, Global Rate=303.83, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:25:06.370525, device xla:5, step 400, Rate=181.39, Global Rate=303.81, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:25:06.378256, device xla:7, step 400, Rate=181.46, Global Rate=303.81, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:25:06.344982, device xla:2, step 400, Rate=181.31, Global Rate=303.82, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:25:06.352811, device xla:6, step 400, Rate=181.44, Global Rate=303.82, Compiles=104, _local_scalar_dense=584
training torch.Size([1024, 16])/ 2019-08-27 00:27:52.141725, device xla:4, step 500, Rate=206.86, Global Rate=304.81, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:27:52.152194, device xla:6, step 500, Rate=206.91, Global Rate=304.80, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:27:52.161808, device xla:5, step 500, Rate=206.88, Global Rate=304.80, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:27:52.174545, device xla:7, step 500, Rate=206.93, Global Rate=304.80, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:27:52.145991, device xla:3, step 500, Rate=206.82, Global Rate=304.81, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:27:52.184205, device xla:1, step 500, Rate=206.79, Global Rate=304.79, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:27:52.210634, device xla:8, step 500, Rate=206.93, Global Rate=304.78, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:27:52.197410, device xla:2, step 500, Rate=206.79, Global Rate=304.79, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:30:36.311612, device xla:7, step 600, Rate=227.93, Global Rate=305.96, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:30:36.317938, device xla:4, step 600, Rate=227.86, Global Rate=305.96, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:30:36.353657, device xla:1, step 600, Rate=227.81, Global Rate=305.95, Compiles=104, _local_scalar_dense=584
training torch.Size([1024, 16])/ 2019-08-27 00:30:36.322538, device xla:2, step 600, Rate=227.83, Global Rate=305.96, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:30:36.362536, device xla:8, step 600, Rate=227.93, Global Rate=305.95, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:30:36.329632, device xla:3, step 600, Rate=227.83, Global Rate=305.96, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:30:36.336729, device xla:6, step 600, Rate=227.90, Global Rate=305.96, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:30:36.372675, device xla:5, step 600, Rate=227.86, Global Rate=305.94, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:33:20.610274, device xla:1, step 700, Rate=244.59, Global Rate=306.76, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:33:20.615441, device xla:7, step 700, Rate=244.67, Global Rate=306.76, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:33:20.620699, device xla:4, step 700, Rate=244.61, Global Rate=306.76, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:33:20.625646, device xla:3, step 700, Rate=244.59, Global Rate=306.76, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:33:20.635735, device xla:2, step 700, Rate=244.58, Global Rate=306.75, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:33:20.656623, device xla:6, step 700, Rate=244.63, Global Rate=306.75, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:33:20.644845, device xla:5, step 700, Rate=244.62, Global Rate=306.75, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:33:20.627725, device xla:8, step 700, Rate=244.68, Global Rate=306.76, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:36:06.667981, device xla:8, step 800, Rate=257.42, Global Rate=306.96, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:36:06.678725, device xla:4, step 800, Rate=257.36, Global Rate=306.95, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:36:06.672377, device xla:5, step 800, Rate=257.38, Global Rate=306.95, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:36:06.686059, device xla:7, step 800, Rate=257.39, Global Rate=306.95, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:36:06.695254, device xla:1, step 800, Rate=257.32, Global Rate=306.95, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:36:06.720161, device xla:3, step 800, Rate=257.32, Global Rate=306.94, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:36:06.706221, device xla:2, step 800, Rate=257.33, Global Rate=306.95, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:36:06.726626, device xla:6, step 800, Rate=257.37, Global Rate=306.94, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:38:50.858022, device xla:1, step 900, Rate=268.24, Global Rate=307.49, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:38:50.862771, device xla:4, step 900, Rate=268.25, Global Rate=307.49, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:38:50.868533, device xla:3, step 900, Rate=268.24, Global Rate=307.49, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:38:50.872600, device xla:2, step 900, Rate=268.24, Global Rate=307.49, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:38:50.894012, device xla:6, step 900, Rate=268.27, Global Rate=307.48, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:38:50.903272, device xla:5, step 900, Rate=268.25, Global Rate=307.48, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:38:50.884691, device xla:7, step 900, Rate=268.28, Global Rate=307.48, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:38:50.918429, device xla:8, step 900, Rate=268.28, Global Rate=307.48, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:41:35.840277, device xla:4, step 1000, Rate=276.67, Global Rate=307.77, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:41:35.858145, device xla:8, step 1000, Rate=276.71, Global Rate=307.77, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:41:35.849933, device xla:6, step 1000, Rate=276.69, Global Rate=307.77, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:41:35.861789, device xla:5, step 1000, Rate=276.68, Global Rate=307.77, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:41:35.886830, device xla:3, step 1000, Rate=276.65, Global Rate=307.76, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:41:35.844583, device xla:7, step 1000, Rate=276.70, Global Rate=307.77, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:41:35.874201, device xla:1, step 1000, Rate=276.64, Global Rate=307.77, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:41:35.892751, device xla:2, step 1000, Rate=276.64, Global Rate=307.76, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:44:22.397752, device xla:1, step 1100, Rate=282.81, Global Rate=307.74, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:44:22.402931, device xla:8, step 1100, Rate=282.85, Global Rate=307.74, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:44:22.407215, device xla:7, step 1100, Rate=282.84, Global Rate=307.74, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:44:22.412434, device xla:6, step 1100, Rate=282.83, Global Rate=307.74, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:44:22.422897, device xla:3, step 1100, Rate=282.81, Global Rate=307.73, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:44:22.449771, device xla:4, step 1100, Rate=282.80, Global Rate=307.73, Compiles=104, _local_scalar_dense=584
training torch.Size([1024, 16])/ 2019-08-27 00:44:22.435990, device xla:5, step 1100, Rate=282.82, Global Rate=307.73, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:44:22.460113, device xla:2, step 1100, Rate=282.79, Global Rate=307.73, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:47:07.817501, device xla:7, step 1200, Rate=288.18, Global Rate=307.89, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:47:07.822818, device xla:4, step 1200, Rate=288.16, Global Rate=307.88, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:47:07.849855, device xla:3, step 1200, Rate=288.14, Global Rate=307.88, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:47:07.836055, device xla:2, step 1200, Rate=288.15, Global Rate=307.88, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:47:07.827291, device xla:5, step 1200, Rate=288.17, Global Rate=307.88, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:47:07.859293, device xla:8, step 1200, Rate=288.17, Global Rate=307.88, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:47:07.864824, device xla:6, step 1200, Rate=288.16, Global Rate=307.88, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:47:07.872955, device xla:1, step 1200, Rate=288.13, Global Rate=307.88, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:49:53.083501, device xla:4, step 1300, Rate=292.49, Global Rate=308.03, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:49:53.094944, device xla:3, step 1300, Rate=292.48, Global Rate=308.03, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:49:53.096995, device xla:8, step 1300, Rate=292.51, Global Rate=308.03, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:49:53.111960, device xla:6, step 1300, Rate=292.49, Global Rate=308.03, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:49:53.087752, device xla:2, step 1300, Rate=292.49, Global Rate=308.03, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:49:53.102574, device xla:5, step 1300, Rate=292.49, Global Rate=308.03, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:49:53.137772, device xla:7, step 1300, Rate=292.48, Global Rate=308.02, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:49:53.127839, device xla:1, step 1300, Rate=292.47, Global Rate=308.03, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:52:35.485971, device xla:8, step 1400, Rate=297.06, Global Rate=308.54, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:52:35.491627, device xla:4, step 1400, Rate=297.04, Global Rate=308.54, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:52:35.513723, device xla:5, step 1400, Rate=297.04, Global Rate=308.53, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:52:35.501643, device xla:2, step 1400, Rate=297.04, Global Rate=308.54, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:52:35.521283, device xla:6, step 1400, Rate=297.05, Global Rate=308.53, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:52:35.496225, device xla:1, step 1400, Rate=297.04, Global Rate=308.54, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:52:35.533164, device xla:7, step 1400, Rate=297.04, Global Rate=308.53, Compiles=104, _local_scalar_dense=584
training torch.Size([1024, 16])/ 2019-08-27 00:52:35.542730, device xla:3, step 1400, Rate=297.02, Global Rate=308.53, Compiles=104, _local_scalar_dense=584
training torch.Size([1024, 16])/ 2019-08-27 00:55:15.927937, device xla:8, step 1500, Rate=301.47, Global Rate=309.22, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:55:15.946557, device xla:4, step 1500, Rate=301.45, Global Rate=309.22, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:55:15.938639, device xla:5, step 1500, Rate=301.46, Global Rate=309.22, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:55:15.951350, device xla:7, step 1500, Rate=301.47, Global Rate=309.22, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:55:15.932300, device xla:3, step 1500, Rate=301.46, Global Rate=309.22, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:55:15.962345, device xla:1, step 1500, Rate=301.45, Global Rate=309.22, Compiles=104, _local_scalar_dense=584
training torch.Size([256, 64])/ 2019-08-27 00:55:15.968260, device xla:6, step 1500, Rate=301.46, Global Rate=309.22, Compiles=104, _local_scalar_dense=584
training torch.Size([512, 32])/ 2019-08-27 00:55:15.985998, device xla:2, step 1500, Rate=301.44, Global Rate=309.21, Compiles=104, _local_scalar_dense=584
Epoch 10 Training stats:
device xla:1
| epoch 010 | loss 0.398 | nll_loss 0.398 | ppl 1.32 | wps 5403 | ups 0 | wpb 11165.940 | bsz 408.547 | num_updates 15080 | lr 0.000257513 | gnorm 0.115 | clip 0.000 | oom 0.000 | wall 31167 | train_wall 23693
device xla:2
| epoch 010 | loss 0.400 | nll_loss 0.400 | ppl 1.32 | wps 5399 | ups 0 | wpb 11159.360 | bsz 408.344 | num_updates 15080 | lr 0.000257513 | gnorm 0.111 | clip 0.000 | oom 0.000 | wall 31167 | train_wall 24914
device xla:3
| epoch 010 | loss 0.400 | nll_loss 0.400 | ppl 1.32 | wps 5367 | ups 0 | wpb 11092.345 | bsz 409.159 | num_updates 15080 | lr 0.000257513 | gnorm 0.119 | clip 0.000 | oom 0.000 | wall 31167 | train_wall 23246
device xla:4
| epoch 010 | loss 0.400 | nll_loss 0.400 | ppl 1.32 | wps 5391 | ups 0 | wpb 11142.272 | bsz 411.060 | num_updates 15080 | lr 0.000257513 | gnorm 0.118 | clip 0.000 | oom 0.000 | wall 31167 | train_wall 24930
device xla:5
| epoch 010 | loss 0.398 | nll_loss 0.398 | ppl 1.32 | wps 5403 | ups 0 | wpb 11167.808 | bsz 413.403 | num_updates 15080 | lr 0.000257513 | gnorm 0.113 | clip 0.000 | oom 0.000 | wall 31167 | train_wall 24800
device xla:6
| epoch 010 | loss 0.398 | nll_loss 0.398 | ppl 1.32 | wps 5401 | ups 0 | wpb 11163.405 | bsz 409.040 | num_updates 15080 | lr 0.000257513 | gnorm 0.114 | clip 0.000 | oom 0.000 | wall 31167 | train_wall 24859
device xla:7
| epoch 010 | loss 0.398 | nll_loss 0.398 | ppl 1.32 | wps 5401 | ups 0 | wpb 11162.400 | bsz 409.108 | num_updates 15080 | lr 0.000257513 | gnorm 0.114 | clip 0.000 | oom 0.000 | wall 31167 | train_wall 23734
device xla:8
| epoch 010 | loss 0.400 | nll_loss 0.400 | ppl 1.32 | wps 5379 | ups 0 | wpb 11116.267 | bsz 409.855 | num_updates 15080 | lr 0.000257513 | gnorm 0.122 | clip 0.000 | oom 0.000 | wall 31167 | train_wall 23806
Epoch 10 Tracker Rates:
Rate=301.03, Global Rate=309.16
Rate=301.13, Global Rate=309.16
Rate=300.91, Global Rate=309.16
Rate=300.97, Global Rate=309.16
Rate=300.94, Global Rate=309.16
Rate=301.07, Global Rate=309.16
Rate=301.00, Global Rate=309.16
Rate=300.90, Global Rate=309.16
Epoch 10 end 2019-08-27 00:55:29.644307
Metric: CompileTime
TotalSamples: 104
Counter: 11h04m07s264ms89.914us
ValueRate: 619ms329.894us / second
Rate: 0.00400381 / second
Percentiles: 1%=049ms149.898us; 5%=063ms970.208us; 10%=073ms669.112us; 20%=110ms188.871us; 50%=28s580ms414.981us; 80%=06m45s337ms595.036us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 121736
Counter: 02d06h13m15s552ms0.443us
ValueRate: 06s161ms477.190us / second
Rate: 5.02457 / second
Percentiles: 1%=01s071ms791.276us; 5%=01s167ms642.344us; 10%=01s171ms82.760us; 20%=01s176ms170.067us; 50%=01s189ms50.111us; 80%=01s287ms703.122us; 90%=01s290ms583.623us; 95%=01s292ms706.540us; 99%=01s296ms834.053us
Metric: InboundData
TotalSamples: 624
Counter: 1.21KB
ValueRate: 0.05B / second
Rate: 0.0244953 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 499402
Counter: 48.50GB
ValueRate: 507.01KB / second
Rate: 20.7991 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1105946
Counter: 07h06m42s717ms999.891us
ValueRate: 370ms725.626us / second
Rate: 43.417 / second
Percentiles: 1%=444.509us; 5%=490.155us; 10%=535.272us; 20%=606.650us; 50%=843.899us; 80%=003ms113.780us; 90%=012ms556.243us; 95%=026ms844.684us; 99%=054ms64.198us
Metric: TransferFromServerTime
TotalSamples: 624
Counter: 07s721ms940.886us
ValueRate: 263.832us / second
Rate: 0.0244953 / second
Percentiles: 1%=617.094us; 5%=691.819us; 10%=735.771us; 20%=803.201us; 50%=001ms169.451us; 80%=023ms133.105us; 90%=045ms15.400us; 95%=058ms646.581us; 99%=069ms814.666us
Metric: TransferToServerTime
TotalSamples: 499402
Counter: 02d48h12m19s487ms37.752us
ValueRate: 05s010ms482.932us / second
Rate: 20.7989 / second
Percentiles: 1%=001ms69.685us; 5%=001ms169.613us; 10%=001ms255.334us; 20%=001ms385.015us; 50%=002ms170.040us; 80%=856ms271.999us; 90%=970ms800.738us; 95%=01s045ms790.434us; 99%=01s093ms877.683us
Counter: CachedSyncParamMismatch
Value: 51
Counter: CachedSyncTensors
Value: 121632
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 89539463
Counter: CreateXlaTensor
Value: 583934098
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 89532431
Counter: DestroyXlaTensor
Value: 583928090
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 89532431
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 104
Counter: XRTAllocateFromTensor_Empty
Value: 22677
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 624
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 00:55:33.580797, device xla:6, step 0, Compiles=104, _local_scalar_dense=624
validation/ 2019-08-27 00:55:33.583308, device xla:4, step 0, Compiles=104, _local_scalar_dense=624
validation/ 2019-08-27 00:55:33.586565, device xla:8, step 0, Compiles=104, _local_scalar_dense=624
validation/ 2019-08-27 00:55:33.588211, device xla:3, step 0, Compiles=104, _local_scalar_dense=624
validation/ 2019-08-27 00:55:33.724180, device xla:2, step 0, Compiles=104, _local_scalar_dense=624
validation/ 2019-08-27 00:55:33.726546, device xla:1, step 0, Compiles=104, _local_scalar_dense=624
validation/ 2019-08-27 00:55:33.731988, device xla:7, step 0, Compiles=104, _local_scalar_dense=624
validation/ 2019-08-27 00:55:33.735018, device xla:5, step 0, Compiles=104, _local_scalar_dense=624
validation stats on subset "valid" - 2019-08-27 00:55:39.687135
| epoch 010 | valid on 'valid' subset | loss 3.922 | nll_loss 2.109 | ppl 4.32 | num_updates 15080
| epoch 010 | valid on 'valid' subset | loss 3.953 | nll_loss 2.141 | ppl 4.41 | num_updates 15080
| epoch 010 | valid on 'valid' subset | loss 4.000 | nll_loss 2.203 | ppl 4.60 | num_updates 15080
| epoch 010 | valid on 'valid' subset | loss 3.969 | nll_loss 2.219 | ppl 4.65 | num_updates 15080
| epoch 010 | valid on 'valid' subset | loss 3.922 | nll_loss 2.125 | ppl 4.36 | num_updates 15080
| epoch 010 | valid on 'valid' subset | loss 3.953 | nll_loss 2.125 | ppl 4.36 | num_updates 15080
| epoch 010 | valid on 'valid' subset | loss 3.969 | nll_loss 2.141 | ppl 4.41 | num_updates 15080
| epoch 010 | valid on 'valid' subset | loss 4.031 | nll_loss 2.234 | ppl 4.71 | num_updates 15080
old learning rate: 0.00027144264249352344
new learning rate: 0.00025751310131230236
Metric: CompileTime
TotalSamples: 104
Counter: 11h04m07s264ms89.914us
ValueRate: 619ms329.894us / second
Rate: 0.00400381 / second
Percentiles: 1%=049ms149.898us; 5%=063ms970.208us; 10%=073ms669.112us; 20%=110ms188.871us; 50%=28s580ms414.981us; 80%=06m45s337ms595.036us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 121857
Counter: 02d06h14m01s605ms631.362us
ValueRate: 06s071ms233.090us / second
Rate: 5.40218 / second
Percentiles: 1%=377ms623.610us; 5%=379ms82.273us; 10%=391ms400.562us; 20%=01s170ms323.618us; 50%=01s183ms438.908us; 80%=01s286ms976.007us; 90%=01s289ms983.422us; 95%=01s292ms552.661us; 99%=01s296ms834.053us
Metric: InboundData
TotalSamples: 649
Counter: 1.26KB
ValueRate: 0.05B / second
Rate: 0.0254665 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 499642
Counter: 48.54GB
ValueRate: 956.52KB / second
Rate: 20.9049 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1106558
Counter: 07h06m12s093ms501.389us
ValueRate: 02s650ms560.100us / second
Rate: 52.7982 / second
Percentiles: 1%=427.511us; 5%=480.414us; 10%=515.522us; 20%=567.116us; 50%=749.029us; 80%=001ms334.656us; 90%=022ms26.804us; 95%=375ms633.916us; 99%=388ms410.387us
Metric: TransferFromServerTime
TotalSamples: 649
Counter: 07s825ms636.754us
ValueRate: 267.797us / second
Rate: 0.0254665 / second
Percentiles: 1%=617.094us; 5%=692.894us; 10%=736.811us; 20%=805.991us; 50%=001ms166.721us; 80%=011ms47.402us; 90%=045ms604.272us; 95%=057ms323.173us; 99%=069ms814.666us
Metric: TransferToServerTime
TotalSamples: 499642
Counter: 02d48h13m46s931ms558.213us
ValueRate: 04s284ms749.267us / second
Rate: 20.9071 / second
Percentiles: 1%=001ms88.295us; 5%=001ms190.755us; 10%=001ms295.105us; 20%=001ms427.094us; 50%=002ms45.393us; 80%=235ms373.668us; 90%=949ms95.061us; 95%=998ms614.580us; 99%=01s083ms646.205us
Counter: CachedSyncParamMismatch
Value: 51
Counter: CachedSyncTensors
Value: 121753
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 89541072
Counter: CreateXlaTensor
Value: 584068915
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 89534038
Counter: DestroyXlaTensor
Value: 584062906
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 89534039
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 104
Counter: XRTAllocateFromTensor_Empty
Value: 22677
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 649
Epoch 11 begin 2019-08-27 00:55:39.773011
training torch.Size([256, 64])/ 2019-08-27 00:55:47.934751, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 00:55:47.954262, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 00:55:48.035425, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 00:55:48.127558, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 00:55:48.362474, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 00:55:48.587856, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 00:55:48.679917, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 00:55:48.765203, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 00:58:40.861419, device xla:4, step 100, Rate=59.28, Global Rate=288.33, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 00:58:40.866463, device xla:7, step 100, Rate=59.47, Global Rate=288.32, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 00:58:40.878473, device xla:2, step 100, Rate=59.22, Global Rate=288.30, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 00:58:40.891513, device xla:6, step 100, Rate=59.43, Global Rate=288.28, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 00:58:40.909356, device xla:1, step 100, Rate=59.20, Global Rate=288.25, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 00:58:40.912198, device xla:5, step 100, Rate=59.35, Global Rate=288.25, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 00:58:40.896845, device xla:8, step 100, Rate=59.49, Global Rate=288.27, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 00:58:40.871073, device xla:3, step 100, Rate=59.25, Global Rate=288.31, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:01:28.341821, device xla:7, step 200, Rate=108.72, Global Rate=296.77, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:01:28.346102, device xla:6, step 200, Rate=108.69, Global Rate=296.76, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:01:28.352725, device xla:8, step 200, Rate=108.74, Global Rate=296.76, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:01:28.362909, device xla:2, step 200, Rate=108.51, Global Rate=296.75, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:01:28.375420, device xla:4, step 200, Rate=108.55, Global Rate=296.74, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:01:28.365231, device xla:1, step 200, Rate=108.51, Global Rate=296.74, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:01:28.381326, device xla:3, step 200, Rate=108.53, Global Rate=296.73, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:01:28.394105, device xla:5, step 200, Rate=108.62, Global Rate=296.72, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:04:15.238591, device xla:7, step 300, Rate=148.33, Global Rate=300.03, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:04:15.249674, device xla:1, step 300, Rate=148.17, Global Rate=300.02, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:04:15.264551, device xla:4, step 300, Rate=148.20, Global Rate=300.01, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:04:15.257178, device xla:3, step 300, Rate=148.19, Global Rate=300.02, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:04:15.243021, device xla:2, step 300, Rate=148.17, Global Rate=300.03, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:04:15.273464, device xla:5, step 300, Rate=148.26, Global Rate=300.01, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:04:15.293953, device xla:8, step 300, Rate=148.33, Global Rate=300.00, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:04:15.311333, device xla:6, step 300, Rate=148.29, Global Rate=299.99, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:06:59.916360, device xla:6, step 400, Rate=180.84, Global Rate=302.68, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:06:59.935294, device xla:5, step 400, Rate=180.79, Global Rate=302.67, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:06:59.959272, device xla:1, step 400, Rate=180.70, Global Rate=302.66, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:06:59.944694, device xla:3, step 400, Rate=180.73, Global Rate=302.66, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:06:59.926961, device xla:8, step 400, Rate=180.86, Global Rate=302.67, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:06:59.921661, device xla:4, step 400, Rate=180.75, Global Rate=302.67, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:06:59.975545, device xla:2, step 400, Rate=180.70, Global Rate=302.65, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:06:59.965612, device xla:7, step 400, Rate=180.83, Global Rate=302.66, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:09:46.216288, device xla:4, step 500, Rate=206.18, Global Rate=303.70, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:09:46.236390, device xla:5, step 500, Rate=206.21, Global Rate=303.70, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:09:46.249829, device xla:7, step 500, Rate=206.24, Global Rate=303.69, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:09:46.221515, device xla:2, step 500, Rate=206.15, Global Rate=303.70, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:09:46.228794, device xla:3, step 500, Rate=206.16, Global Rate=303.70, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:09:46.252520, device xla:1, step 500, Rate=206.14, Global Rate=303.69, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:09:46.240587, device xla:6, step 500, Rate=206.24, Global Rate=303.69, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:09:46.262260, device xla:8, step 500, Rate=206.25, Global Rate=303.69, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:12:30.525280, device xla:4, step 600, Rate=227.26, Global Rate=304.99, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:12:30.535830, device xla:2, step 600, Rate=227.24, Global Rate=304.99, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:12:30.530504, device xla:6, step 600, Rate=227.32, Global Rate=304.99, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:12:30.550853, device xla:3, step 600, Rate=227.25, Global Rate=304.98, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:12:30.558621, device xla:8, step 600, Rate=227.33, Global Rate=304.98, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:12:30.576917, device xla:1, step 600, Rate=227.23, Global Rate=304.98, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:12:30.543246, device xla:7, step 600, Rate=227.32, Global Rate=304.99, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:12:30.579561, device xla:5, step 600, Rate=227.28, Global Rate=304.98, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:15:17.387353, device xla:6, step 700, Rate=243.22, Global Rate=305.26, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:15:17.431845, device xla:1, step 700, Rate=243.15, Global Rate=305.24, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:15:17.413845, device xla:5, step 700, Rate=243.20, Global Rate=305.25, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:15:17.438564, device xla:2, step 700, Rate=243.15, Global Rate=305.24, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:15:17.397943, device xla:8, step 700, Rate=243.24, Global Rate=305.25, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:15:17.392587, device xla:4, step 700, Rate=243.18, Global Rate=305.25, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:15:17.406909, device xla:3, step 700, Rate=243.17, Global Rate=305.25, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:15:17.447567, device xla:7, step 700, Rate=243.21, Global Rate=305.24, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:18:04.384492, device xla:6, step 800, Rate=255.90, Global Rate=305.42, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:18:04.389859, device xla:2, step 800, Rate=255.85, Global Rate=305.42, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:18:04.414321, device xla:1, step 800, Rate=255.85, Global Rate=305.41, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:18:04.416681, device xla:7, step 800, Rate=255.90, Global Rate=305.41, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:18:04.398171, device xla:5, step 800, Rate=255.88, Global Rate=305.42, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:18:04.434773, device xla:8, step 800, Rate=255.90, Global Rate=305.41, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:18:04.407684, device xla:4, step 800, Rate=255.85, Global Rate=305.42, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:18:04.421712, device xla:3, step 800, Rate=255.85, Global Rate=305.41, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:20:51.154613, device xla:6, step 900, Rate=266.12, Global Rate=305.60, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:20:51.159842, device xla:2, step 900, Rate=266.08, Global Rate=305.60, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:20:51.176454, device xla:3, step 900, Rate=266.08, Global Rate=305.59, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:20:51.196198, device xla:7, step 900, Rate=266.12, Global Rate=305.59, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:20:51.190188, device xla:4, step 900, Rate=266.08, Global Rate=305.59, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:20:51.166703, device xla:5, step 900, Rate=266.11, Global Rate=305.59, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:20:51.212002, device xla:1, step 900, Rate=266.07, Global Rate=305.59, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:20:51.200828, device xla:8, step 900, Rate=266.12, Global Rate=305.59, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:23:36.265915, device xla:8, step 1000, Rate=274.93, Global Rate=306.04, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:23:36.283243, device xla:4, step 1000, Rate=274.89, Global Rate=306.04, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:23:36.275198, device xla:1, step 1000, Rate=274.89, Global Rate=306.04, Compiles=104, _local_scalar_dense=649
training torch.Size([1024, 16])/ 2019-08-27 01:23:36.303908, device xla:3, step 1000, Rate=274.88, Global Rate=306.03, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:23:36.311905, device xla:6, step 1000, Rate=274.90, Global Rate=306.03, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:23:36.295994, device xla:7, step 1000, Rate=274.92, Global Rate=306.04, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:23:36.289373, device xla:5, step 1000, Rate=274.90, Global Rate=306.04, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:23:36.258325, device xla:2, step 1000, Rate=274.89, Global Rate=306.04, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:26:21.341371, device xla:6, step 1100, Rate=281.97, Global Rate=306.41, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:26:21.353134, device xla:3, step 1100, Rate=281.95, Global Rate=306.41, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:26:21.346340, device xla:4, step 1100, Rate=281.95, Global Rate=306.41, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:26:21.360880, device xla:5, step 1100, Rate=281.95, Global Rate=306.41, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:26:21.378663, device xla:2, step 1100, Rate=281.93, Global Rate=306.40, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:26:21.370832, device xla:1, step 1100, Rate=281.94, Global Rate=306.41, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:26:21.390260, device xla:7, step 1100, Rate=281.96, Global Rate=306.40, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:26:21.401995, device xla:8, step 1100, Rate=281.96, Global Rate=306.40, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:29:06.407740, device xla:7, step 1200, Rate=287.62, Global Rate=306.72, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:29:06.419030, device xla:1, step 1200, Rate=287.59, Global Rate=306.72, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:29:06.412405, device xla:3, step 1200, Rate=287.60, Global Rate=306.72, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:29:06.435691, device xla:2, step 1200, Rate=287.58, Global Rate=306.72, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:29:06.443457, device xla:8, step 1200, Rate=287.61, Global Rate=306.72, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:29:06.459907, device xla:4, step 1200, Rate=287.58, Global Rate=306.71, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:29:06.469470, device xla:6, step 1200, Rate=287.59, Global Rate=306.71, Compiles=104, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:29:06.426017, device xla:5, step 1200, Rate=287.60, Global Rate=306.72, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:31:51.988699, device xla:6, step 1300, Rate=291.94, Global Rate=306.91, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:31:51.999402, device xla:7, step 1300, Rate=291.94, Global Rate=306.91, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:31:51.993989, device xla:4, step 1300, Rate=291.92, Global Rate=306.91, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:31:52.012007, device xla:1, step 1300, Rate=291.91, Global Rate=306.91, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:31:52.004652, device xla:2, step 1300, Rate=291.91, Global Rate=306.91, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:31:52.020021, device xla:5, step 1300, Rate=291.92, Global Rate=306.91, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:31:52.038122, device xla:3, step 1300, Rate=291.90, Global Rate=306.90, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:31:52.062775, device xla:8, step 1300, Rate=291.92, Global Rate=306.90, Compiles=104, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:34:36.139738, device xla:7, step 1400, Rate=295.93, Global Rate=307.26, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:34:36.156350, device xla:1, step 1400, Rate=295.91, Global Rate=307.26, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:34:36.144080, device xla:4, step 1400, Rate=295.92, Global Rate=307.26, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:34:36.149280, device xla:3, step 1400, Rate=295.92, Global Rate=307.26, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:34:36.164622, device xla:2, step 1400, Rate=295.91, Global Rate=307.26, Compiles=107, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:34:36.159106, device xla:6, step 1400, Rate=295.92, Global Rate=307.26, Compiles=107, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:34:36.171034, device xla:5, step 1400, Rate=295.92, Global Rate=307.26, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:34:36.183638, device xla:8, step 1400, Rate=295.93, Global Rate=307.26, Compiles=107, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:37:17.219349, device xla:6, step 1500, Rate=300.32, Global Rate=307.95, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:37:17.231488, device xla:1, step 1500, Rate=300.30, Global Rate=307.95, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:37:17.233514, device xla:3, step 1500, Rate=300.30, Global Rate=307.95, Compiles=107, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:37:17.240600, device xla:5, step 1500, Rate=300.31, Global Rate=307.94, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:37:17.250736, device xla:4, step 1500, Rate=300.30, Global Rate=307.94, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:37:17.224692, device xla:2, step 1500, Rate=300.31, Global Rate=307.95, Compiles=107, _local_scalar_dense=649
training torch.Size([256, 64])/ 2019-08-27 01:37:17.258933, device xla:7, step 1500, Rate=300.30, Global Rate=307.94, Compiles=107, _local_scalar_dense=649
training torch.Size([512, 32])/ 2019-08-27 01:37:17.268297, device xla:8, step 1500, Rate=300.31, Global Rate=307.94, Compiles=107, _local_scalar_dense=649
Epoch 11 Training stats:
device xla:1
| epoch 011 | loss 0.361 | nll_loss 0.361 | ppl 1.28 | wps 5495 | ups 0 | wpb 11160.154 | bsz 409.541 | num_updates 16588 | lr 0.000245529 | gnorm 0.105 | clip 0.000 | oom 0.000 | wall 33689 | train_wall 25743
device xla:2
| epoch 011 | loss 0.361 | nll_loss 0.361 | ppl 1.28 | wps 5496 | ups 0 | wpb 11162.312 | bsz 408.152 | num_updates 16588 | lr 0.000245529 | gnorm 0.101 | clip 0.000 | oom 0.000 | wall 33689 | train_wall 26971
device xla:3
| epoch 011 | loss 0.363 | nll_loss 0.363 | ppl 1.29 | wps 5464 | ups 0 | wpb 11096.180 | bsz 410.298 | num_updates 16588 | lr 0.000245529 | gnorm 0.108 | clip 0.000 | oom 0.000 | wall 33689 | train_wall 25306
device xla:4
| epoch 011 | loss 0.363 | nll_loss 0.363 | ppl 1.29 | wps 5488 | ups 0 | wpb 11146.375 | bsz 410.236 | num_updates 16588 | lr 0.000245529 | gnorm 0.107 | clip 0.000 | oom 0.000 | wall 33689 | train_wall 26991
device xla:5
| epoch 011 | loss 0.361 | nll_loss 0.361 | ppl 1.28 | wps 5497 | ups 0 | wpb 11164.942 | bsz 412.381 | num_updates 16588 | lr 0.000245529 | gnorm 0.103 | clip 0.000 | oom 0.000 | wall 33689 | train_wall 26848
device xla:6
| epoch 011 | loss 0.363 | nll_loss 0.363 | ppl 1.29 | wps 5493 | ups 0 | wpb 11155.554 | bsz 409.048 | num_updates 16588 | lr 0.000245529 | gnorm 0.104 | clip 0.000 | oom 0.000 | wall 33689 | train_wall 26928
device xla:7
| epoch 011 | loss 0.361 | nll_loss 0.361 | ppl 1.28 | wps 5497 | ups 0 | wpb 11164.285 | bsz 409.063 | num_updates 16588 | lr 0.000245529 | gnorm 0.104 | clip 0.000 | oom 0.000 | wall 33689 | train_wall 25801
device xla:8
| epoch 011 | loss 0.363 | nll_loss 0.363 | ppl 1.29 | wps 5475 | ups 0 | wpb 11119.740 | bsz 409.788 | num_updates 16588 | lr 0.000245529 | gnorm 0.110 | clip 0.000 | oom 0.000 | wall 33689 | train_wall 25858
Epoch 11 Tracker Rates:
Rate=298.28, Global Rate=307.85
Rate=298.25, Global Rate=307.85
Rate=298.29, Global Rate=307.85
Rate=298.35, Global Rate=307.85
Rate=298.32, Global Rate=307.85
Rate=298.24, Global Rate=307.85
Rate=298.39, Global Rate=307.85
Rate=298.44, Global Rate=307.85
Epoch 11 end 2019-08-27 01:37:31.346811
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 133921
Counter: 02d16h22m42s620ms376.232us
ValueRate: 06s165ms636.505us / second
Rate: 4.99306 / second
Percentiles: 1%=01s166ms783.558us; 5%=01s170ms696.457us; 10%=01s173ms845.334us; 20%=01s176ms228.995us; 50%=01s276ms593.882us; 80%=01s288ms130.174us; 90%=01s291ms343.167us; 95%=01s294ms37.032us; 99%=01s298ms587.246us
Metric: InboundData
TotalSamples: 689
Counter: 1.34KB
ValueRate: 0.05B / second
Rate: 0.0246107 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 549341
Counter: 53.03GB
ValueRate: 502.74KB / second
Rate: 20.5778 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1216857
Counter: 08h15m04s230ms819.382us
ValueRate: 388ms738.518us / second
Rate: 43.1637 / second
Percentiles: 1%=443.496us; 5%=492.056us; 10%=542.535us; 20%=622.002us; 50%=864.020us; 80%=002ms482.633us; 90%=013ms157.643us; 95%=026ms145.340us; 99%=057ms316.362us
Metric: TransferFromServerTime
TotalSamples: 689
Counter: 07s887ms685.739us
ValueRate: 245.988us / second
Rate: 0.0246107 / second
Percentiles: 1%=617.094us; 5%=687.920us; 10%=727.796us; 20%=796.461us; 50%=001ms138.567us; 80%=009ms242.738us; 90%=043ms543.770us; 95%=057ms984.905us; 99%=069ms814.666us
Metric: TransferToServerTime
TotalSamples: 549341
Counter: 02d56h24m47s873ms855.124us
ValueRate: 05s106ms943.758us / second
Rate: 20.5785 / second
Percentiles: 1%=001ms89.204us; 5%=001ms195.302us; 10%=001ms262.125us; 20%=001ms383.229us; 50%=002ms153.171us; 80%=920ms170.154us; 90%=982ms828.587us; 95%=01s057ms560.519us; 99%=01s089ms433.058us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 133814
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 98493395
Counter: CreateXlaTensor
Value: 642340547
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 98486362
Counter: DestroyXlaTensor
Value: 642334538
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 98486362
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 22822
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 689
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 01:37:35.814693, device xla:3, step 0, Compiles=107, _local_scalar_dense=689
validation/ 2019-08-27 01:37:35.816347, device xla:5, step 0, Compiles=107, _local_scalar_dense=689
validation/ 2019-08-27 01:37:35.821353, device xla:8, step 0, Compiles=107, _local_scalar_dense=689
validation/ 2019-08-27 01:37:35.823158, device xla:2, step 0, Compiles=107, _local_scalar_dense=689
validation/ 2019-08-27 01:37:35.960498, device xla:1, step 0, Compiles=107, _local_scalar_dense=689
validation/ 2019-08-27 01:37:35.962677, device xla:6, step 0, Compiles=107, _local_scalar_dense=689
validation/ 2019-08-27 01:37:35.969364, device xla:7, step 0, Compiles=107, _local_scalar_dense=689
validation/ 2019-08-27 01:37:35.989857, device xla:4, step 0, Compiles=107, _local_scalar_dense=689
validation stats on subset "valid" - 2019-08-27 01:37:41.957076
| epoch 011 | valid on 'valid' subset | loss 3.891 | nll_loss 2.109 | ppl 4.32 | num_updates 16588
| epoch 011 | valid on 'valid' subset | loss 3.953 | nll_loss 2.141 | ppl 4.41 | num_updates 16588
| epoch 011 | valid on 'valid' subset | loss 4.000 | nll_loss 2.188 | ppl 4.56 | num_updates 16588
| epoch 011 | valid on 'valid' subset | loss 3.969 | nll_loss 2.203 | ppl 4.60 | num_updates 16588
| epoch 011 | valid on 'valid' subset | loss 3.922 | nll_loss 2.109 | ppl 4.32 | num_updates 16588
| epoch 011 | valid on 'valid' subset | loss 3.953 | nll_loss 2.125 | ppl 4.36 | num_updates 16588
| epoch 011 | valid on 'valid' subset | loss 3.953 | nll_loss 2.141 | ppl 4.41 | num_updates 16588
| epoch 011 | valid on 'valid' subset | loss 4.031 | nll_loss 2.234 | ppl 4.71 | num_updates 16588
old learning rate: 0.00025751310131230236
new learning rate: 0.00024552910834189034
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 134042
Counter: 02d16h22m28s570ms736.044us
ValueRate: 06s072ms684.340us / second
Rate: 5.36049 / second
Percentiles: 1%=376ms410.586us; 5%=378ms909.433us; 10%=391ms241.056us; 20%=01s173ms589.964us; 50%=01s187ms557.500us; 80%=01s287ms202.558us; 90%=01s290ms223.104us; 95%=01s294ms905.954us; 99%=01s298ms587.246us
Metric: InboundData
TotalSamples: 714
Counter: 1.38KB
ValueRate: 0.05B / second
Rate: 0.025494 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 549581
Counter: 53.08GB
ValueRate: 933.10KB / second
Rate: 20.393 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1217472
Counter: 08h16m36s074ms769.495us
ValueRate: 02s779ms82.230us / second
Rate: 54.4864 / second
Percentiles: 1%=434.165us; 5%=488.155us; 10%=519.651us; 20%=580.001us; 50%=759.777us; 80%=001ms365.714us; 90%=024ms228.282us; 95%=374ms207.846us; 99%=388ms46.562us
Metric: TransferFromServerTime
TotalSamples: 714
Counter: 07s968ms572.153us
ValueRate: 248.783us / second
Rate: 0.025494 / second
Percentiles: 1%=619.099us; 5%=687.920us; 10%=728.959us; 20%=796.461us; 50%=001ms155.775us; 80%=009ms925.895us; 90%=042ms425.704us; 95%=056ms436.868us; 99%=069ms563.927us
Metric: TransferToServerTime
TotalSamples: 549581
Counter: 02d56h00m15s872ms774.838us
ValueRate: 04s330ms743.405us / second
Rate: 20.3938 / second
Percentiles: 1%=001ms90.555us; 5%=001ms210.670us; 10%=001ms287.400us; 20%=001ms432.010us; 50%=002ms310.147us; 80%=247ms346.458us; 90%=975ms804.329us; 95%=01s020ms542.025us; 99%=01s084ms413.219us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 133935
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 98495004
Counter: CreateXlaTensor
Value: 642475364
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 98487970
Counter: DestroyXlaTensor
Value: 642469356
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 98487972
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 22822
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 714
Epoch 12 begin 2019-08-27 01:37:41.978615
training torch.Size([256, 64])/ 2019-08-27 01:37:50.414115, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:37:50.698903, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 01:37:50.778805, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:37:50.799783, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:37:51.089701, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:37:51.120837, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:37:51.292536, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:37:51.438833, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:40:39.971946, device xla:2, step 100, Rate=60.52, Global Rate=293.62, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:40:39.976528, device xla:8, step 100, Rate=60.76, Global Rate=293.62, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:40:39.981309, device xla:3, step 100, Rate=60.53, Global Rate=293.61, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:40:39.992949, device xla:4, step 100, Rate=60.49, Global Rate=293.59, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:40:40.002127, device xla:7, step 100, Rate=60.70, Global Rate=293.57, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 01:40:40.009783, device xla:1, step 100, Rate=60.38, Global Rate=293.56, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:40:40.020733, device xla:6, step 100, Rate=60.62, Global Rate=293.54, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:40:39.987234, device xla:5, step 100, Rate=60.64, Global Rate=293.60, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:43:25.876974, device xla:8, step 200, Rate=110.33, Global Rate=300.93, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:43:25.881480, device xla:2, step 200, Rate=110.14, Global Rate=300.93, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:43:25.871698, device xla:3, step 200, Rate=110.15, Global Rate=300.93, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:43:25.886285, device xla:6, step 200, Rate=110.23, Global Rate=300.92, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:43:25.897951, device xla:4, step 200, Rate=110.11, Global Rate=300.91, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:43:25.905855, device xla:1, step 200, Rate=110.03, Global Rate=300.90, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:43:25.923873, device xla:7, step 200, Rate=110.27, Global Rate=300.89, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:43:25.891862, device xla:5, step 200, Rate=110.23, Global Rate=300.92, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:46:11.614947, device xla:8, step 300, Rate=150.05, Global Rate=303.55, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:46:11.620792, device xla:3, step 300, Rate=149.90, Global Rate=303.54, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:46:11.632473, device xla:2, step 300, Rate=149.89, Global Rate=303.54, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:46:11.625896, device xla:5, step 300, Rate=149.97, Global Rate=303.54, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 01:46:11.640081, device xla:6, step 300, Rate=149.96, Global Rate=303.53, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:46:11.657768, device xla:4, step 300, Rate=149.87, Global Rate=303.52, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:46:11.670577, device xla:1, step 300, Rate=149.80, Global Rate=303.51, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:46:11.650142, device xla:7, step 300, Rate=150.01, Global Rate=303.53, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:48:56.555504, device xla:2, step 400, Rate=182.00, Global Rate=305.24, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:48:56.560636, device xla:1, step 400, Rate=181.94, Global Rate=305.23, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:48:56.576982, device xla:8, step 400, Rate=182.11, Global Rate=305.23, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:48:56.590385, device xla:3, step 400, Rate=181.99, Global Rate=305.22, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:48:56.569249, device xla:4, step 400, Rate=181.99, Global Rate=305.23, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:48:56.584286, device xla:6, step 400, Rate=182.05, Global Rate=305.22, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:48:56.600940, device xla:7, step 400, Rate=182.08, Global Rate=305.22, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:48:56.550191, device xla:5, step 400, Rate=182.07, Global Rate=305.24, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:51:41.336201, device xla:8, step 500, Rate=207.84, Global Rate=306.32, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:51:41.346635, device xla:2, step 500, Rate=207.74, Global Rate=306.31, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:51:41.356902, device xla:7, step 500, Rate=207.82, Global Rate=306.31, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:51:41.351132, device xla:3, step 500, Rate=207.74, Global Rate=306.31, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 01:51:41.341201, device xla:5, step 500, Rate=207.79, Global Rate=306.31, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:51:41.364486, device xla:1, step 500, Rate=207.69, Global Rate=306.31, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:51:41.375976, device xla:4, step 500, Rate=207.72, Global Rate=306.30, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:51:41.384432, device xla:6, step 500, Rate=207.78, Global Rate=306.30, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:54:25.971436, device xla:8, step 600, Rate=228.47, Global Rate=307.09, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:54:25.990359, device xla:7, step 600, Rate=228.45, Global Rate=307.08, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:54:25.992317, device xla:4, step 600, Rate=228.38, Global Rate=307.08, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:54:25.981206, device xla:1, step 600, Rate=228.35, Global Rate=307.08, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:54:26.015360, device xla:5, step 600, Rate=228.42, Global Rate=307.07, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 01:54:26.000117, device xla:2, step 600, Rate=228.38, Global Rate=307.08, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:54:26.007011, device xla:6, step 600, Rate=228.42, Global Rate=307.07, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:54:25.975850, device xla:3, step 600, Rate=228.40, Global Rate=307.08, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:57:12.907033, device xla:8, step 700, Rate=244.12, Global Rate=307.03, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:57:12.919525, device xla:7, step 700, Rate=244.11, Global Rate=307.03, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:57:12.912000, device xla:4, step 700, Rate=244.05, Global Rate=307.03, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:57:12.927773, device xla:5, step 700, Rate=244.08, Global Rate=307.03, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:57:12.921325, device xla:3, step 700, Rate=244.06, Global Rate=307.03, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:57:12.934233, device xla:1, step 700, Rate=244.02, Global Rate=307.02, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:57:12.943973, device xla:2, step 700, Rate=244.04, Global Rate=307.02, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:57:12.951091, device xla:6, step 700, Rate=244.08, Global Rate=307.02, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:59:57.717238, device xla:8, step 800, Rate=257.43, Global Rate=307.48, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:59:57.721759, device xla:2, step 800, Rate=257.38, Global Rate=307.48, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:59:57.733334, device xla:3, step 800, Rate=257.38, Global Rate=307.48, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:59:57.742039, device xla:6, step 800, Rate=257.40, Global Rate=307.47, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:59:57.747723, device xla:1, step 800, Rate=257.35, Global Rate=307.47, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 01:59:57.757412, device xla:5, step 800, Rate=257.39, Global Rate=307.47, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 01:59:57.726209, device xla:7, step 800, Rate=257.42, Global Rate=307.48, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 01:59:57.767506, device xla:4, step 800, Rate=257.36, Global Rate=307.47, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:02:43.015024, device xla:2, step 900, Rate=267.85, Global Rate=307.73, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:02:43.020237, device xla:6, step 900, Rate=267.88, Global Rate=307.73, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:02:43.009450, device xla:3, step 900, Rate=267.86, Global Rate=307.73, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 02:02:43.039997, device xla:8, step 900, Rate=267.88, Global Rate=307.72, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:02:43.030212, device xla:1, step 900, Rate=267.83, Global Rate=307.73, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:02:43.022588, device xla:4, step 900, Rate=267.85, Global Rate=307.73, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:02:43.004186, device xla:5, step 900, Rate=267.88, Global Rate=307.73, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:02:43.048476, device xla:7, step 900, Rate=267.88, Global Rate=307.72, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:05:29.534244, device xla:2, step 1000, Rate=275.78, Global Rate=307.70, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 02:05:29.528830, device xla:3, step 1000, Rate=275.78, Global Rate=307.71, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:05:29.539433, device xla:7, step 1000, Rate=275.80, Global Rate=307.70, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:05:29.547298, device xla:1, step 1000, Rate=275.76, Global Rate=307.70, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:05:29.565573, device xla:4, step 1000, Rate=275.77, Global Rate=307.70, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:05:29.574840, device xla:8, step 1000, Rate=275.79, Global Rate=307.70, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:05:29.580992, device xla:5, step 1000, Rate=275.78, Global Rate=307.70, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:05:29.559493, device xla:6, step 1000, Rate=275.79, Global Rate=307.70, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:08:14.787623, device xla:8, step 1100, Rate=282.62, Global Rate=307.90, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:08:14.797562, device xla:2, step 1100, Rate=282.58, Global Rate=307.89, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:08:14.792133, device xla:5, step 1100, Rate=282.60, Global Rate=307.90, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:08:14.815471, device xla:1, step 1100, Rate=282.57, Global Rate=307.89, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:08:14.809823, device xla:3, step 1100, Rate=282.58, Global Rate=307.89, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:08:14.802532, device xla:4, step 1100, Rate=282.58, Global Rate=307.89, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:08:14.833111, device xla:6, step 1100, Rate=282.59, Global Rate=307.89, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:08:14.824640, device xla:7, step 1100, Rate=282.60, Global Rate=307.89, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:10:59.490742, device xla:6, step 1200, Rate=288.26, Global Rate=308.14, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:10:59.495887, device xla:8, step 1200, Rate=288.26, Global Rate=308.14, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:10:59.518142, device xla:7, step 1200, Rate=288.25, Global Rate=308.14, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:10:59.520803, device xla:4, step 1200, Rate=288.23, Global Rate=308.14, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:10:59.507745, device xla:1, step 1200, Rate=288.23, Global Rate=308.14, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:10:59.528881, device xla:2, step 1200, Rate=288.23, Global Rate=308.13, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 02:10:59.534199, device xla:3, step 1200, Rate=288.23, Global Rate=308.13, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:10:59.500672, device xla:5, step 1200, Rate=288.25, Global Rate=308.14, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:13:45.737913, device xla:8, step 1300, Rate=292.21, Global Rate=308.13, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:13:45.748166, device xla:2, step 1300, Rate=292.19, Global Rate=308.13, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:13:45.742641, device xla:6, step 1300, Rate=292.20, Global Rate=308.13, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:13:45.753473, device xla:1, step 1300, Rate=292.18, Global Rate=308.13, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:13:45.762552, device xla:3, step 1300, Rate=292.18, Global Rate=308.12, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:13:45.770909, device xla:7, step 1300, Rate=292.20, Global Rate=308.12, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:13:45.774262, device xla:5, step 1300, Rate=292.19, Global Rate=308.12, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:13:45.780349, device xla:4, step 1300, Rate=292.18, Global Rate=308.12, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:16:27.811086, device xla:8, step 1400, Rate=296.95, Global Rate=308.67, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:16:27.815258, device xla:3, step 1400, Rate=296.94, Global Rate=308.67, Compiles=107, _local_scalar_dense=714
training torch.Size([1024, 16])/ 2019-08-27 02:16:27.820479, device xla:1, step 1400, Rate=296.93, Global Rate=308.67, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:16:27.849156, device xla:7, step 1400, Rate=296.94, Global Rate=308.67, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:16:27.843147, device xla:6, step 1400, Rate=296.93, Global Rate=308.67, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:16:27.829125, device xla:4, step 1400, Rate=296.93, Global Rate=308.67, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:16:27.836282, device xla:5, step 1400, Rate=296.94, Global Rate=308.67, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:16:27.852438, device xla:2, step 1400, Rate=296.92, Global Rate=308.67, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:19:09.667503, device xla:2, step 1500, Rate=300.82, Global Rate=309.17, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:19:09.671579, device xla:6, step 1500, Rate=300.82, Global Rate=309.17, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:19:09.678400, device xla:7, step 1500, Rate=300.83, Global Rate=309.17, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:19:09.681586, device xla:5, step 1500, Rate=300.82, Global Rate=309.17, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:19:09.689376, device xla:4, step 1500, Rate=300.81, Global Rate=309.17, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:19:09.697158, device xla:3, step 1500, Rate=300.81, Global Rate=309.17, Compiles=107, _local_scalar_dense=714
training torch.Size([512, 32])/ 2019-08-27 02:19:09.707925, device xla:8, step 1500, Rate=300.81, Global Rate=309.17, Compiles=107, _local_scalar_dense=714
training torch.Size([256, 64])/ 2019-08-27 02:19:09.721871, device xla:1, step 1500, Rate=300.79, Global Rate=309.16, Compiles=107, _local_scalar_dense=714
Epoch 12 Training stats:
device xla:1
| epoch 012 | loss 0.332 | nll_loss 0.332 | ppl 1.26 | wps 5577 | ups 0 | wpb 11156.341 | bsz 409.945 | num_updates 18096 | lr 0.000235076 | gnorm 0.097 | clip 0.000 | oom 0.000 | wall 36201 | train_wall 27790
device xla:2
| epoch 012 | loss 0.332 | nll_loss 0.332 | ppl 1.26 | wps 5580 | ups 0 | wpb 11163.185 | bsz 408.021 | num_updates 18096 | lr 0.000235076 | gnorm 0.093 | clip 0.000 | oom 0.000 | wall 36201 | train_wall 29031
device xla:3
| epoch 012 | loss 0.334 | nll_loss 0.334 | ppl 1.26 | wps 5551 | ups 0 | wpb 11105.435 | bsz 410.851 | num_updates 18096 | lr 0.000235076 | gnorm 0.100 | clip 0.000 | oom 0.000 | wall 36201 | train_wall 27362
device xla:4
| epoch 012 | loss 0.334 | nll_loss 0.334 | ppl 1.26 | wps 5568 | ups 0 | wpb 11138.880 | bsz 410.907 | num_updates 18096 | lr 0.000235076 | gnorm 0.099 | clip 0.000 | oom 0.000 | wall 36201 | train_wall 29044
device xla:5
| epoch 012 | loss 0.332 | nll_loss 0.332 | ppl 1.26 | wps 5580 | ups 0 | wpb 11162.443 | bsz 411.501 | num_updates 18096 | lr 0.000235076 | gnorm 0.094 | clip 0.000 | oom 0.000 | wall 36201 | train_wall 28903
device xla:6
| epoch 012 | loss 0.334 | nll_loss 0.334 | ppl 1.26 | wps 5573 | ups 0 | wpb 11148.280 | bsz 409.153 | num_updates 18096 | lr 0.000235076 | gnorm 0.095 | clip 0.000 | oom 0.000 | wall 36201 | train_wall 28978
device xla:7
| epoch 012 | loss 0.332 | nll_loss 0.332 | ppl 1.26 | wps 5584 | ups 0 | wpb 11170.833 | bsz 408.898 | num_updates 18096 | lr 0.000235076 | gnorm 0.095 | clip 0.000 | oom 0.000 | wall 36201 | train_wall 27854
device xla:8
| epoch 012 | loss 0.334 | nll_loss 0.334 | ppl 1.26 | wps 5561 | ups 0 | wpb 11124.070 | bsz 409.195 | num_updates 18096 | lr 0.000235076 | gnorm 0.102 | clip 0.000 | oom 0.000 | wall 36201 | train_wall 27920
Epoch 12 Tracker Rates:
Rate=298.67, Global Rate=309.06
Rate=298.47, Global Rate=309.06
Rate=298.58, Global Rate=309.06
Rate=298.56, Global Rate=309.06
Rate=298.53, Global Rate=309.06
Rate=298.49, Global Rate=309.06
Rate=298.52, Global Rate=309.06
Rate=298.63, Global Rate=309.06
Epoch 12 end 2019-08-27 02:19:23.836375
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 146106
Counter: 02d26h06m15s130ms875.589us
ValueRate: 06s141ms351.263us / second
Rate: 4.98825 / second
Percentiles: 1%=01s166ms172.513us; 5%=01s170ms285.622us; 10%=01s174ms817.262us; 20%=01s178ms517.240us; 50%=01s193ms421.789us; 80%=01s288ms424.346us; 90%=01s292ms689.668us; 95%=01s296ms581.595us; 99%=01s302ms966.311us
Metric: InboundData
TotalSamples: 754
Counter: 1.46KB
ValueRate: 0.05B / second
Rate: 0.0247144 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 599356
Counter: 57.57GB
ValueRate: 499.73KB / second
Rate: 20.4446 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1327274
Counter: 08h00m16s940ms497.501us
ValueRate: 420ms867.812us / second
Rate: 42.8336 / second
Percentiles: 1%=457.094us; 5%=517.618us; 10%=563.351us; 20%=637.316us; 50%=876.008us; 80%=003ms806.819us; 90%=013ms150.550us; 95%=028ms213.318us; 99%=053ms281.375us
Metric: TransferFromServerTime
TotalSamples: 754
Counter: 07s145ms448.942us
ValueRate: 234.212us / second
Rate: 0.0247144 / second
Percentiles: 1%=619.099us; 5%=684.649us; 10%=727.007us; 20%=794.035us; 50%=001ms138.567us; 80%=009ms551.269us; 90%=042ms159.615us; 95%=056ms309.439us; 99%=069ms563.927us
Metric: TransferToServerTime
TotalSamples: 599356
Counter: 02d04h12m41s650ms305.114us
ValueRate: 05s917ms859.776us / second
Rate: 20.4443 / second
Percentiles: 1%=001ms72.229us; 5%=001ms168.377us; 10%=001ms269.223us; 20%=001ms394.019us; 50%=002ms62.531us; 80%=877ms890.103us; 90%=967ms444.358us; 95%=01s038ms77.424us; 99%=01s075ms104.650us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 145999
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 107447403
Counter: CreateXlaTensor
Value: 700746996
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 107440371
Counter: DestroyXlaTensor
Value: 700740988
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 107440371
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 22942
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 754
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 02:19:27.894626, device xla:6, step 0, Compiles=107, _local_scalar_dense=754
validation/ 2019-08-27 02:19:28.057252, device xla:1, step 0, Compiles=107, _local_scalar_dense=754
validation/ 2019-08-27 02:19:28.075649, device xla:7, step 0, Compiles=107, _local_scalar_dense=754
validation/ 2019-08-27 02:19:28.079261, device xla:3, step 0, Compiles=107, _local_scalar_dense=754
validation/ 2019-08-27 02:19:28.082173, device xla:4, step 0, Compiles=107, _local_scalar_dense=754
validation/ 2019-08-27 02:19:28.089026, device xla:5, step 0, Compiles=107, _local_scalar_dense=754
validation/ 2019-08-27 02:19:28.091662, device xla:8, step 0, Compiles=107, _local_scalar_dense=754
validation/ 2019-08-27 02:19:28.093219, device xla:2, step 0, Compiles=107, _local_scalar_dense=754
validation stats on subset "valid" - 2019-08-27 02:19:34.085532
| epoch 012 | valid on 'valid' subset | loss 3.891 | nll_loss 2.109 | ppl 4.32 | num_updates 18096
| epoch 012 | valid on 'valid' subset | loss 3.922 | nll_loss 2.109 | ppl 4.32 | num_updates 18096
| epoch 012 | valid on 'valid' subset | loss 3.969 | nll_loss 2.172 | ppl 4.51 | num_updates 18096
| epoch 012 | valid on 'valid' subset | loss 3.969 | nll_loss 2.203 | ppl 4.60 | num_updates 18096
| epoch 012 | valid on 'valid' subset | loss 3.922 | nll_loss 2.094 | ppl 4.27 | num_updates 18096
| epoch 012 | valid on 'valid' subset | loss 3.922 | nll_loss 2.125 | ppl 4.36 | num_updates 18096
| epoch 012 | valid on 'valid' subset | loss 3.953 | nll_loss 2.125 | ppl 4.36 | num_updates 18096
| epoch 012 | valid on 'valid' subset | loss 4.000 | nll_loss 2.203 | ppl 4.60 | num_updates 18096
old learning rate: 0.00024552910834189034
new learning rate: 0.00023507622406976865
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 146227
Counter: 02d26h07m01s120ms614.375us
ValueRate: 06s042ms52.710us / second
Rate: 5.34353 / second
Percentiles: 1%=377ms804.299us; 5%=378ms96.581us; 10%=391ms439.602us; 20%=01s173ms199.809us; 50%=01s187ms953.086us; 80%=01s288ms71.045us; 90%=01s291ms345.357us; 95%=01s295ms841.763us; 99%=01s302ms966.311us
Metric: InboundData
TotalSamples: 779
Counter: 1.51KB
ValueRate: 0.05B / second
Rate: 0.0255253 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 599596
Counter: 57.61GB
ValueRate: 932.76KB / second
Rate: 20.3855 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1327899
Counter: 08h01m48s832ms886.163us
ValueRate: 02s740ms169.579us / second
Rate: 51.3799 / second
Percentiles: 1%=448.217us; 5%=487.630us; 10%=519.132us; 20%=572.528us; 50%=777.429us; 80%=001ms329.250us; 90%=024ms222.989us; 95%=375ms59.298us; 99%=388ms175.727us
Metric: TransferFromServerTime
TotalSamples: 779
Counter: 07s184ms665.283us
ValueRate: 235.385us / second
Rate: 0.0255253 / second
Percentiles: 1%=619.099us; 5%=687.920us; 10%=727.239us; 20%=794.035us; 50%=001ms128.191us; 80%=008ms283.771us; 90%=042ms91.806us; 95%=056ms58.859us; 99%=069ms563.927us
Metric: TransferToServerTime
TotalSamples: 599596
Counter: 02d05h12m08s133ms747.692us
ValueRate: 04s195ms417.536us / second
Rate: 20.385 / second
Percentiles: 1%=001ms85.674us; 5%=001ms195.969us; 10%=001ms308.209us; 20%=001ms464.780us; 50%=002ms183.659us; 80%=247ms148.347us; 90%=943ms823.478us; 95%=987ms54.491us; 99%=01s055ms384.565us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 146120
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 107449012
Counter: CreateXlaTensor
Value: 700881813
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 107441979
Counter: DestroyXlaTensor
Value: 700875804
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 107441979
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 22942
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 779
Epoch 13 begin 2019-08-27 02:19:34.105324
training torch.Size([512, 32])/ 2019-08-27 02:19:43.086513, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:19:43.132297, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:19:43.231782, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:19:43.330565, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=779
training torch.Size([1024, 16])/ 2019-08-27 02:19:43.427200, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:19:43.503522, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:19:44.064724, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:19:44.174570, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=779
training torch.Size([1024, 16])/ 2019-08-27 02:22:33.688922, device xla:5, step 100, Rate=60.14, Global Rate=291.03, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:22:33.718801, device xla:1, step 100, Rate=60.01, Global Rate=290.98, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:22:33.710056, device xla:6, step 100, Rate=60.16, Global Rate=291.00, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:22:33.721891, device xla:4, step 100, Rate=60.06, Global Rate=290.98, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:22:33.694074, device xla:3, step 100, Rate=60.11, Global Rate=291.02, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:22:33.733107, device xla:2, step 100, Rate=60.02, Global Rate=290.96, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:22:33.701034, device xla:7, step 100, Rate=60.40, Global Rate=291.01, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:22:33.741875, device xla:8, step 100, Rate=60.35, Global Rate=290.95, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:25:18.074096, device xla:6, step 200, Rate=110.43, Global Rate=300.90, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:25:18.089533, device xla:8, step 200, Rate=110.59, Global Rate=300.89, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:25:18.078653, device xla:3, step 200, Rate=110.38, Global Rate=300.90, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:25:18.112468, device xla:2, step 200, Rate=110.31, Global Rate=300.87, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:25:18.107537, device xla:4, step 200, Rate=110.34, Global Rate=300.87, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:25:18.096882, device xla:5, step 200, Rate=110.40, Global Rate=300.88, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:25:18.067463, device xla:1, step 200, Rate=110.32, Global Rate=300.91, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:25:18.118844, device xla:7, step 200, Rate=110.60, Global Rate=300.86, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:28:02.210918, device xla:8, step 300, Rate=150.86, Global Rate=304.49, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:28:02.229727, device xla:6, step 300, Rate=150.72, Global Rate=304.48, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:28:02.236411, device xla:1, step 300, Rate=150.63, Global Rate=304.48, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:28:02.215512, device xla:5, step 300, Rate=150.71, Global Rate=304.49, Compiles=107, _local_scalar_dense=779
training torch.Size([1024, 16])/ 2019-08-27 02:28:02.250593, device xla:3, step 300, Rate=150.68, Global Rate=304.47, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:28:02.220762, device xla:7, step 300, Rate=150.88, Global Rate=304.49, Compiles=107, _local_scalar_dense=779
training torch.Size([1024, 16])/ 2019-08-27 02:28:02.240677, device xla:4, step 300, Rate=150.66, Global Rate=304.47, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:28:02.265990, device xla:2, step 300, Rate=150.63, Global Rate=304.46, Compiles=107, _local_scalar_dense=779
training torch.Size([1024, 16])/ 2019-08-27 02:30:47.926251, device xla:6, step 400, Rate=182.38, Global Rate=305.60, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:30:47.936346, device xla:7, step 400, Rate=182.50, Global Rate=305.59, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:30:47.952670, device xla:8, step 400, Rate=182.47, Global Rate=305.59, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:30:47.930995, device xla:5, step 400, Rate=182.36, Global Rate=305.60, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:30:47.944903, device xla:3, step 400, Rate=182.34, Global Rate=305.59, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:30:47.964912, device xla:4, step 400, Rate=182.32, Global Rate=305.58, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:30:47.974320, device xla:1, step 400, Rate=182.29, Global Rate=305.58, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:30:47.958178, device xla:2, step 400, Rate=182.31, Global Rate=305.58, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:33:34.463317, device xla:8, step 500, Rate=207.48, Global Rate=305.96, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:33:34.469400, device xla:3, step 500, Rate=207.37, Global Rate=305.96, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:33:34.509129, device xla:7, step 500, Rate=207.47, Global Rate=305.95, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:33:34.476273, device xla:5, step 500, Rate=207.37, Global Rate=305.96, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:33:34.481950, device xla:2, step 500, Rate=207.34, Global Rate=305.96, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:33:34.500274, device xla:1, step 500, Rate=207.32, Global Rate=305.95, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:33:34.512260, device xla:6, step 500, Rate=207.37, Global Rate=305.95, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:33:34.490666, device xla:4, step 500, Rate=207.35, Global Rate=305.95, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:36:18.557255, device xla:3, step 600, Rate=228.30, Global Rate=306.96, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:36:18.551751, device xla:2, step 600, Rate=228.28, Global Rate=306.96, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:36:18.573441, device xla:8, step 600, Rate=228.38, Global Rate=306.95, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:36:18.564633, device xla:4, step 600, Rate=228.29, Global Rate=306.95, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:36:18.586420, device xla:7, step 600, Rate=228.39, Global Rate=306.95, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:36:18.579037, device xla:5, step 600, Rate=228.30, Global Rate=306.95, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:36:18.593206, device xla:6, step 600, Rate=228.31, Global Rate=306.95, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:36:18.602083, device xla:1, step 600, Rate=228.26, Global Rate=306.94, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:39:02.004035, device xla:6, step 700, Rate=245.31, Global Rate=307.84, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:39:02.008233, device xla:5, step 700, Rate=245.30, Global Rate=307.84, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:39:02.013497, device xla:8, step 700, Rate=245.36, Global Rate=307.84, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:39:02.060553, device xla:2, step 700, Rate=245.25, Global Rate=307.83, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:39:02.038249, device xla:3, step 700, Rate=245.28, Global Rate=307.83, Compiles=107, _local_scalar_dense=779
training torch.Size([1024, 16])/ 2019-08-27 02:39:02.022135, device xla:7, step 700, Rate=245.37, Global Rate=307.84, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:39:02.067711, device xla:1, step 700, Rate=245.25, Global Rate=307.82, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:39:02.045684, device xla:4, step 700, Rate=245.27, Global Rate=307.83, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:41:48.054700, device xla:6, step 800, Rate=257.92, Global Rate=307.90, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:41:48.069160, device xla:8, step 800, Rate=257.95, Global Rate=307.90, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:41:48.074350, device xla:5, step 800, Rate=257.90, Global Rate=307.90, Compiles=107, _local_scalar_dense=779
training torch.Size([1024, 16])/ 2019-08-27 02:41:48.083424, device xla:4, step 800, Rate=257.89, Global Rate=307.90, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:41:48.094524, device xla:2, step 800, Rate=257.88, Global Rate=307.89, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:41:48.101152, device xla:1, step 800, Rate=257.87, Global Rate=307.89, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:41:48.060734, device xla:7, step 800, Rate=257.96, Global Rate=307.90, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:41:48.112220, device xla:3, step 800, Rate=257.88, Global Rate=307.89, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:44:31.967385, device xla:8, step 900, Rate=268.84, Global Rate=308.39, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:44:31.961743, device xla:2, step 900, Rate=268.79, Global Rate=308.39, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:44:31.973008, device xla:1, step 900, Rate=268.79, Global Rate=308.39, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:44:31.956333, device xla:5, step 900, Rate=268.80, Global Rate=308.39, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:44:31.975043, device xla:6, step 900, Rate=268.80, Global Rate=308.39, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:44:31.981417, device xla:4, step 900, Rate=268.79, Global Rate=308.39, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:44:31.995764, device xla:7, step 900, Rate=268.84, Global Rate=308.39, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:44:32.008650, device xla:3, step 900, Rate=268.78, Global Rate=308.38, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:47:15.834435, device xla:6, step 1000, Rate=277.53, Global Rate=308.79, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:47:15.844821, device xla:8, step 1000, Rate=277.56, Global Rate=308.79, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:47:15.850728, device xla:1, step 1000, Rate=277.51, Global Rate=308.79, Compiles=107, _local_scalar_dense=779
training torch.Size([1024, 16])/ 2019-08-27 02:47:15.829116, device xla:5, step 1000, Rate=277.53, Global Rate=308.79, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:47:15.839293, device xla:2, step 1000, Rate=277.52, Global Rate=308.79, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:47:15.855163, device xla:7, step 1000, Rate=277.56, Global Rate=308.79, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:47:15.866234, device xla:3, step 1000, Rate=277.52, Global Rate=308.79, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:47:15.875226, device xla:4, step 1000, Rate=277.51, Global Rate=308.79, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:50:00.265300, device xla:6, step 1100, Rate=284.30, Global Rate=309.03, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:50:00.259991, device xla:2, step 1100, Rate=284.29, Global Rate=309.03, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:50:00.295575, device xla:8, step 1100, Rate=284.31, Global Rate=309.02, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:50:00.270419, device xla:7, step 1100, Rate=284.33, Global Rate=309.03, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:50:00.286340, device xla:4, step 1100, Rate=284.29, Global Rate=309.02, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:50:00.278866, device xla:1, step 1100, Rate=284.29, Global Rate=309.02, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:50:00.301785, device xla:5, step 1100, Rate=284.28, Global Rate=309.02, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:50:00.308254, device xla:3, step 1100, Rate=284.29, Global Rate=309.02, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:52:45.878373, device xla:6, step 1200, Rate=289.27, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:52:45.882735, device xla:8, step 1200, Rate=289.29, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:52:45.895252, device xla:1, step 1200, Rate=289.26, Global Rate=309.03, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:52:45.902504, device xla:7, step 1200, Rate=289.29, Global Rate=309.03, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:52:45.921697, device xla:3, step 1200, Rate=289.26, Global Rate=309.03, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:52:45.935324, device xla:2, step 1200, Rate=289.24, Global Rate=309.03, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:52:45.911718, device xla:5, step 1200, Rate=289.26, Global Rate=309.03, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:52:45.887123, device xla:4, step 1200, Rate=289.27, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:55:31.511222, device xla:6, step 1300, Rate=293.24, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:55:31.536015, device xla:4, step 1300, Rate=293.23, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:55:31.521462, device xla:7, step 1300, Rate=293.26, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([1024, 16])/ 2019-08-27 02:55:31.515839, device xla:2, step 1300, Rate=293.24, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:55:31.539685, device xla:8, step 1300, Rate=293.25, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:55:31.530294, device xla:5, step 1300, Rate=293.24, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:55:31.560916, device xla:3, step 1300, Rate=293.23, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:55:31.545528, device xla:1, step 1300, Rate=293.23, Global Rate=309.04, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:58:14.956792, device xla:6, step 1400, Rate=297.24, Global Rate=309.34, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:58:14.969705, device xla:1, step 1400, Rate=297.24, Global Rate=309.34, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:58:14.986559, device xla:8, step 1400, Rate=297.25, Global Rate=309.34, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:58:14.991676, device xla:5, step 1400, Rate=297.23, Global Rate=309.34, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:58:14.961626, device xla:7, step 1400, Rate=297.26, Global Rate=309.34, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:58:14.997559, device xla:4, step 1400, Rate=297.23, Global Rate=309.33, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 02:58:14.978949, device xla:3, step 1400, Rate=297.24, Global Rate=309.34, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 02:58:15.007926, device xla:2, step 1400, Rate=297.22, Global Rate=309.33, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 03:00:57.824845, device xla:5, step 1500, Rate=300.67, Global Rate=309.67, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 03:00:57.830195, device xla:8, step 1500, Rate=300.68, Global Rate=309.67, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 03:00:57.835086, device xla:6, step 1500, Rate=300.66, Global Rate=309.67, Compiles=107, _local_scalar_dense=779
training torch.Size([512, 32])/ 2019-08-27 03:00:57.848754, device xla:3, step 1500, Rate=300.67, Global Rate=309.67, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 03:00:57.866780, device xla:1, step 1500, Rate=300.65, Global Rate=309.66, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 03:00:57.869968, device xla:2, step 1500, Rate=300.65, Global Rate=309.66, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 03:00:57.856454, device xla:4, step 1500, Rate=300.66, Global Rate=309.67, Compiles=107, _local_scalar_dense=779
training torch.Size([256, 64])/ 2019-08-27 03:00:57.839983, device xla:7, step 1500, Rate=300.68, Global Rate=309.67, Compiles=107, _local_scalar_dense=779
Epoch 13 Training stats:
device xla:1
| epoch 013 | loss 0.307 | nll_loss 0.307 | ppl 1.24 | wps 5649 | ups 1 | wpb 11154.044 | bsz 409.203 | num_updates 19604 | lr 0.000225854 | gnorm 0.089 | clip 0.000 | oom 0.000 | wall 38710 | train_wall 29837
device xla:2
| epoch 013 | loss 0.307 | nll_loss 0.307 | ppl 1.24 | wps 5655 | ups 1 | wpb 11165.958 | bsz 407.649 | num_updates 19604 | lr 0.000225854 | gnorm 0.085 | clip 0.000 | oom 0.000 | wall 38710 | train_wall 31089
device xla:3
| epoch 013 | loss 0.309 | nll_loss 0.309 | ppl 1.24 | wps 5627 | ups 1 | wpb 11111.204 | bsz 411.541 | num_updates 19604 | lr 0.000225854 | gnorm 0.092 | clip 0.000 | oom 0.000 | wall 38710 | train_wall 29408
device xla:4
| epoch 013 | loss 0.309 | nll_loss 0.309 | ppl 1.24 | wps 5640 | ups 1 | wpb 11137.001 | bsz 411.436 | num_updates 19604 | lr 0.000225854 | gnorm 0.091 | clip 0.000 | oom 0.000 | wall 38710 | train_wall 31091
device xla:5
| epoch 013 | loss 0.307 | nll_loss 0.307 | ppl 1.24 | wps 5653 | ups 1 | wpb 11163.198 | bsz 411.358 | num_updates 19604 | lr 0.000225854 | gnorm 0.087 | clip 0.000 | oom 0.000 | wall 38710 | train_wall 30950
device xla:6
| epoch 013 | loss 0.309 | nll_loss 0.309 | ppl 1.24 | wps 5644 | ups 1 | wpb 11145.103 | bsz 409.125 | num_updates 19604 | lr 0.000225854 | gnorm 0.088 | clip 0.000 | oom 0.000 | wall 38710 | train_wall 31039
device xla:7
| epoch 013 | loss 0.307 | nll_loss 0.307 | ppl 1.24 | wps 5657 | ups 1 | wpb 11169.210 | bsz 408.694 | num_updates 19604 | lr 0.000225854 | gnorm 0.088 | clip 0.000 | oom 0.000 | wall 38710 | train_wall 29902
device xla:8
| epoch 013 | loss 0.309 | nll_loss 0.309 | ppl 1.24 | wps 5634 | ups 1 | wpb 11123.791 | bsz 409.464 | num_updates 19604 | lr 0.000225854 | gnorm 0.094 | clip 0.000 | oom 0.000 | wall 38710 | train_wall 29974
Epoch 13 Tracker Rates:
Rate=298.48, Global Rate=309.55
Rate=298.49, Global Rate=309.55
Rate=298.42, Global Rate=309.55
Rate=298.44, Global Rate=309.55
Rate=298.33, Global Rate=309.55
Rate=298.36, Global Rate=309.55
Rate=298.39, Global Rate=309.55
Rate=298.35, Global Rate=309.55
Epoch 13 end 2019-08-27 03:01:12.001283
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 158291
Counter: 03d37h14m24s429ms818.452us
ValueRate: 06s141ms522.640us / second
Rate: 4.96038 / second
Percentiles: 1%=01s165ms518.506us; 5%=01s170ms830.101us; 10%=01s172ms335.114us; 20%=01s177ms52.799us; 50%=01s279ms950.972us; 80%=01s289ms206.532us; 90%=01s293ms49.632us; 95%=01s296ms167.873us; 99%=01s308ms420.321us
Metric: InboundData
TotalSamples: 819
Counter: 1.59KB
ValueRate: 0.05B / second
Rate: 0.0248057 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 649302
Counter: 62.10GB
ValueRate: 495.61KB / second
Rate: 20.3381 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1437118
Counter: 08h10m49s965ms268.615us
ValueRate: 442ms229.600us / second
Rate: 42.9684 / second
Percentiles: 1%=432.312us; 5%=495.501us; 10%=543.914us; 20%=614.914us; 50%=829.055us; 80%=003ms138.787us; 90%=012ms307.431us; 95%=029ms135.728us; 99%=052ms431.387us
Metric: TransferFromServerTime
TotalSamples: 819
Counter: 07s249ms467.662us
ValueRate: 219.570us / second
Rate: 0.0248057 / second
Percentiles: 1%=617.094us; 5%=674.124us; 10%=718.868us; 20%=785.082us; 50%=001ms94.138us; 80%=007ms887.539us; 90%=041ms253.954us; 95%=056ms902.589us; 99%=068ms932.198us
Metric: TransferToServerTime
TotalSamples: 649302
Counter: 02d13h23m08s764ms198.778us
ValueRate: 05s956ms425.997us / second
Rate: 20.338 / second
Percentiles: 1%=001ms63.023us; 5%=001ms172.172us; 10%=001ms252.836us; 20%=001ms384.359us; 50%=002ms52.883us; 80%=889ms498.657us; 90%=976ms341.659us; 95%=01s041ms66.649us; 99%=01s084ms699.106us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 158184
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 116401342
Counter: CreateXlaTensor
Value: 759153445
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 116394309
Counter: DestroyXlaTensor
Value: 759147436
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 116394309
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23077
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 819
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 03:01:16.024964, device xla:1, step 0, Compiles=107, _local_scalar_dense=819
validation/ 2019-08-27 03:01:16.033316, device xla:2, step 0, Compiles=107, _local_scalar_dense=819
validation/ 2019-08-27 03:01:16.185961, device xla:5, step 0, Compiles=107, _local_scalar_dense=819
validation/ 2019-08-27 03:01:16.187842, device xla:3, step 0, Compiles=107, _local_scalar_dense=819
validation/ 2019-08-27 03:01:16.196518, device xla:4, step 0, Compiles=107, _local_scalar_dense=819
validation/ 2019-08-27 03:01:16.206912, device xla:8, step 0, Compiles=107, _local_scalar_dense=819
validation/ 2019-08-27 03:01:16.219989, device xla:7, step 0, Compiles=107, _local_scalar_dense=819
validation/ 2019-08-27 03:01:16.221613, device xla:6, step 0, Compiles=107, _local_scalar_dense=819
validation stats on subset "valid" - 2019-08-27 03:01:22.191531
| epoch 013 | valid on 'valid' subset | loss 3.891 | nll_loss 2.094 | ppl 4.27 | num_updates 19604
| epoch 013 | valid on 'valid' subset | loss 3.922 | nll_loss 2.109 | ppl 4.32 | num_updates 19604
| epoch 013 | valid on 'valid' subset | loss 3.953 | nll_loss 2.172 | ppl 4.51 | num_updates 19604
| epoch 013 | valid on 'valid' subset | loss 3.969 | nll_loss 2.188 | ppl 4.56 | num_updates 19604
| epoch 013 | valid on 'valid' subset | loss 3.922 | nll_loss 2.094 | ppl 4.27 | num_updates 19604
| epoch 013 | valid on 'valid' subset | loss 3.922 | nll_loss 2.125 | ppl 4.36 | num_updates 19604
| epoch 013 | valid on 'valid' subset | loss 3.953 | nll_loss 2.125 | ppl 4.36 | num_updates 19604
| epoch 013 | valid on 'valid' subset | loss 4.000 | nll_loss 2.188 | ppl 4.56 | num_updates 19604
old learning rate: 0.00023507622406976865
new learning rate: 0.00022585393058257823
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 158412
Counter: 03d37h15m10s408ms273.255us
ValueRate: 06s045ms348.223us / second
Rate: 5.31883 / second
Percentiles: 1%=376ms417.349us; 5%=378ms33.403us; 10%=392ms722.693us; 20%=01s172ms84.133us; 50%=01s189ms342.544us; 80%=01s288ms900.888us; 90%=01s292ms115.960us; 95%=01s295ms308.030us; 99%=01s308ms420.321us
Metric: InboundData
TotalSamples: 844
Counter: 1.64KB
ValueRate: 0.05B / second
Rate: 0.025555 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 649542
Counter: 62.15GB
ValueRate: 942.80KB / second
Rate: 20.605 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1437733
Counter: 08h10m20s106ms984.386us
ValueRate: 02s655ms152.308us / second
Rate: 51.5269 / second
Percentiles: 1%=441.550us; 5%=486.026us; 10%=527.630us; 20%=585.579us; 50%=771.307us; 80%=001ms377.829us; 90%=026ms480.418us; 95%=374ms296.104us; 99%=388ms996.052us
Metric: TransferFromServerTime
TotalSamples: 844
Counter: 07s289ms977.572us
ValueRate: 220.699us / second
Rate: 0.025555 / second
Percentiles: 1%=617.094us; 5%=677.899us; 10%=721.453us; 20%=786.595us; 50%=001ms91.085us; 80%=006ms356.299us; 90%=041ms609.731us; 95%=056ms793.445us; 99%=068ms932.198us
Metric: TransferToServerTime
TotalSamples: 649542
Counter: 02d13h24m34s495ms182.009us
ValueRate: 04s234ms890.384us / second
Rate: 20.6051 / second
Percentiles: 1%=001ms72.825us; 5%=001ms194.345us; 10%=001ms280.365us; 20%=001ms426.373us; 50%=002ms293.243us; 80%=239ms834.432us; 90%=946ms629.561us; 95%=979ms766.263us; 99%=01s058ms993.165us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 158305
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 116402951
Counter: CreateXlaTensor
Value: 759288262
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 116395918
Counter: DestroyXlaTensor
Value: 759282253
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 116395918
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23077
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 844
Epoch 14 begin 2019-08-27 03:01:22.212287
training torch.Size([512, 32])/ 2019-08-27 03:01:30.647314, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:01:30.686629, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:01:30.723008, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:01:30.830616, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:01:31.018672, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:01:31.042535, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:01:31.458226, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:01:31.711668, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:04:22.519337, device xla:8, step 100, Rate=59.95, Global Rate=289.89, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:04:22.523368, device xla:4, step 100, Rate=59.64, Global Rate=289.88, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:04:22.527636, device xla:1, step 100, Rate=59.58, Global Rate=289.87, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:04:22.537694, device xla:5, step 100, Rate=59.71, Global Rate=289.86, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:04:22.532361, device xla:7, step 100, Rate=59.86, Global Rate=289.87, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:04:22.558435, device xla:3, step 100, Rate=59.59, Global Rate=289.82, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:04:22.542699, device xla:6, step 100, Rate=59.70, Global Rate=289.85, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:04:22.573208, device xla:2, step 100, Rate=59.57, Global Rate=289.80, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:07:08.895647, device xla:1, step 200, Rate=109.21, Global Rate=298.55, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:07:08.920485, device xla:8, step 200, Rate=109.50, Global Rate=298.52, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:07:08.925582, device xla:6, step 200, Rate=109.30, Global Rate=298.52, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:07:08.934767, device xla:4, step 200, Rate=109.25, Global Rate=298.51, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:07:08.911869, device xla:7, step 200, Rate=109.43, Global Rate=298.53, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:07:08.905269, device xla:3, step 200, Rate=109.23, Global Rate=298.54, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:07:08.943804, device xla:5, step 200, Rate=109.30, Global Rate=298.50, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:07:08.900114, device xla:2, step 200, Rate=109.22, Global Rate=298.54, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:09:52.894218, device xla:4, step 300, Rate=149.85, Global Rate=302.96, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:09:52.898511, device xla:2, step 300, Rate=149.82, Global Rate=302.96, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:09:52.903542, device xla:3, step 300, Rate=149.83, Global Rate=302.96, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:09:52.916903, device xla:7, step 300, Rate=149.98, Global Rate=302.95, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:09:52.909866, device xla:1, step 300, Rate=149.80, Global Rate=302.95, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:09:52.943634, device xla:8, step 300, Rate=150.03, Global Rate=302.93, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:09:52.948664, device xla:5, step 300, Rate=149.88, Global Rate=302.93, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:09:52.923446, device xla:6, step 300, Rate=149.88, Global Rate=302.94, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:12:37.946081, device xla:4, step 400, Rate=181.92, Global Rate=304.74, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:12:37.950642, device xla:8, step 400, Rate=182.08, Global Rate=304.74, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:12:37.955434, device xla:6, step 400, Rate=181.96, Global Rate=304.74, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:12:37.959146, device xla:1, step 400, Rate=181.88, Global Rate=304.73, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:12:37.974073, device xla:7, step 400, Rate=182.03, Global Rate=304.73, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:12:37.964412, device xla:5, step 400, Rate=181.96, Global Rate=304.73, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:12:37.989257, device xla:3, step 400, Rate=181.89, Global Rate=304.72, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:12:37.980290, device xla:2, step 400, Rate=181.89, Global Rate=304.73, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:15:22.489557, device xla:8, step 500, Rate=207.90, Global Rate=306.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:15:22.493912, device xla:4, step 500, Rate=207.77, Global Rate=306.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:15:22.500084, device xla:5, step 500, Rate=207.80, Global Rate=306.00, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:15:22.508684, device xla:3, step 500, Rate=207.75, Global Rate=306.00, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:15:22.515920, device xla:6, step 500, Rate=207.79, Global Rate=305.99, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:15:22.532936, device xla:2, step 500, Rate=207.74, Global Rate=305.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:15:22.542897, device xla:1, step 500, Rate=207.72, Global Rate=305.98, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:15:22.526899, device xla:7, step 500, Rate=207.85, Global Rate=305.99, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:18:06.820071, device xla:7, step 600, Rate=228.61, Global Rate=306.92, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:18:06.839333, device xla:8, step 600, Rate=228.63, Global Rate=306.91, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:18:06.844439, device xla:4, step 600, Rate=228.52, Global Rate=306.91, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:18:06.825227, device xla:2, step 600, Rate=228.52, Global Rate=306.92, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:18:06.830543, device xla:5, step 600, Rate=228.56, Global Rate=306.91, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:18:06.863672, device xla:6, step 600, Rate=228.54, Global Rate=306.90, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:18:06.849766, device xla:3, step 600, Rate=228.51, Global Rate=306.91, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:18:06.880090, device xla:1, step 600, Rate=228.49, Global Rate=306.90, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:20:50.512003, device xla:4, step 700, Rate=245.38, Global Rate=307.74, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:20:50.516409, device xla:8, step 700, Rate=245.46, Global Rate=307.74, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:20:50.520929, device xla:2, step 700, Rate=245.37, Global Rate=307.74, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:20:50.530504, device xla:1, step 700, Rate=245.37, Global Rate=307.74, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:20:50.522652, device xla:7, step 700, Rate=245.44, Global Rate=307.74, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:20:50.549890, device xla:3, step 700, Rate=245.36, Global Rate=307.73, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:20:50.557699, device xla:5, step 700, Rate=245.39, Global Rate=307.73, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:20:50.532200, device xla:6, step 700, Rate=245.40, Global Rate=307.74, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:23:35.971519, device xla:8, step 800, Rate=258.26, Global Rate=307.95, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:23:35.976262, device xla:4, step 800, Rate=258.19, Global Rate=307.95, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:23:35.981040, device xla:1, step 800, Rate=258.18, Global Rate=307.95, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:23:35.987796, device xla:5, step 800, Rate=258.21, Global Rate=307.95, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:23:35.997499, device xla:7, step 800, Rate=258.23, Global Rate=307.95, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:23:35.991619, device xla:2, step 800, Rate=258.18, Global Rate=307.95, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:23:36.015789, device xla:3, step 800, Rate=258.18, Global Rate=307.94, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:23:36.005974, device xla:6, step 800, Rate=258.20, Global Rate=307.95, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:26:22.036855, device xla:4, step 900, Rate=268.22, Global Rate=307.99, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:26:22.054641, device xla:5, step 900, Rate=268.23, Global Rate=307.99, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:26:22.046781, device xla:3, step 900, Rate=268.22, Global Rate=307.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:26:22.057253, device xla:6, step 900, Rate=268.23, Global Rate=307.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:26:22.075919, device xla:2, step 900, Rate=268.20, Global Rate=307.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:26:22.041500, device xla:7, step 900, Rate=268.26, Global Rate=307.99, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:26:22.066717, device xla:8, step 900, Rate=268.26, Global Rate=307.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:26:22.087223, device xla:1, step 900, Rate=268.19, Global Rate=307.98, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:29:07.835990, device xla:1, step 1000, Rate=276.34, Global Rate=308.07, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:29:07.854623, device xla:8, step 1000, Rate=276.37, Global Rate=308.07, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:29:07.846037, device xla:6, step 1000, Rate=276.35, Global Rate=308.07, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:29:07.865441, device xla:4, step 1000, Rate=276.33, Global Rate=308.07, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:29:07.874213, device xla:3, step 1000, Rate=276.32, Global Rate=308.07, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:29:07.859436, device xla:2, step 1000, Rate=276.33, Global Rate=308.07, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:29:07.840570, device xla:7, step 1000, Rate=276.37, Global Rate=308.07, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:29:07.882647, device xla:5, step 1000, Rate=276.33, Global Rate=308.07, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:31:52.673457, device xla:8, step 1100, Rate=283.23, Global Rate=308.30, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:31:52.677815, device xla:1, step 1100, Rate=283.19, Global Rate=308.30, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:31:52.688169, device xla:6, step 1100, Rate=283.20, Global Rate=308.30, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:31:52.710017, device xla:4, step 1100, Rate=283.18, Global Rate=308.30, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:31:52.682429, device xla:2, step 1100, Rate=283.19, Global Rate=308.30, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:31:52.696802, device xla:3, step 1100, Rate=283.19, Global Rate=308.30, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:31:52.704089, device xla:7, step 1100, Rate=283.21, Global Rate=308.30, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:31:52.721324, device xla:5, step 1100, Rate=283.19, Global Rate=308.30, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:34:37.882029, device xla:1, step 1200, Rate=288.53, Global Rate=308.44, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:34:37.876822, device xla:2, step 1200, Rate=288.54, Global Rate=308.44, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:34:37.895238, device xla:4, step 1200, Rate=288.53, Global Rate=308.43, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:34:37.899663, device xla:7, step 1200, Rate=288.55, Global Rate=308.43, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:34:37.886747, device xla:5, step 1200, Rate=288.55, Global Rate=308.44, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:34:37.905517, device xla:6, step 1200, Rate=288.54, Global Rate=308.43, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:34:37.923400, device xla:3, step 1200, Rate=288.52, Global Rate=308.43, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:34:37.935111, device xla:8, step 1200, Rate=288.54, Global Rate=308.43, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:37:21.875769, device xla:2, step 1300, Rate=293.27, Global Rate=308.72, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:37:21.880782, device xla:4, step 1300, Rate=293.27, Global Rate=308.72, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:37:21.891483, device xla:1, step 1300, Rate=293.26, Global Rate=308.72, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:37:21.906599, device xla:5, step 1300, Rate=293.27, Global Rate=308.72, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:37:21.898576, device xla:3, step 1300, Rate=293.27, Global Rate=308.72, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:37:21.909208, device xla:7, step 1300, Rate=293.28, Global Rate=308.72, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:37:21.915352, device xla:6, step 1300, Rate=293.27, Global Rate=308.72, Compiles=107, _local_scalar_dense=844
training torch.Size([1024, 16])/ 2019-08-27 03:37:21.885368, device xla:8, step 1300, Rate=293.29, Global Rate=308.72, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:40:05.668620, device xla:2, step 1400, Rate=297.13, Global Rate=309.00, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:40:05.689683, device xla:1, step 1400, Rate=297.13, Global Rate=308.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:40:05.694598, device xla:8, step 1400, Rate=297.15, Global Rate=308.99, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:40:05.673849, device xla:3, step 1400, Rate=297.14, Global Rate=309.00, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:40:05.699431, device xla:6, step 1400, Rate=297.13, Global Rate=308.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:40:05.721892, device xla:4, step 1400, Rate=297.12, Global Rate=308.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:40:05.712743, device xla:7, step 1400, Rate=297.14, Global Rate=308.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:40:05.680898, device xla:5, step 1400, Rate=297.14, Global Rate=308.99, Compiles=107, _local_scalar_dense=844
training torch.Size([256, 64])/ 2019-08-27 03:42:46.482234, device xla:2, step 1500, Rate=301.38, Global Rate=309.60, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:42:46.487318, device xla:1, step 1500, Rate=301.38, Global Rate=309.60, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:42:46.491753, device xla:3, step 1500, Rate=301.39, Global Rate=309.60, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:42:46.498849, device xla:8, step 1500, Rate=301.40, Global Rate=309.60, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:42:46.506389, device xla:7, step 1500, Rate=301.39, Global Rate=309.60, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:42:46.526722, device xla:4, step 1500, Rate=301.37, Global Rate=309.60, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:42:46.515634, device xla:6, step 1500, Rate=301.38, Global Rate=309.60, Compiles=107, _local_scalar_dense=844
training torch.Size([512, 32])/ 2019-08-27 03:42:46.539189, device xla:5, step 1500, Rate=301.37, Global Rate=309.60, Compiles=107, _local_scalar_dense=844
Epoch 14 Training stats:
device xla:1
| epoch 014 | loss 0.285 | nll_loss 0.285 | ppl 1.22 | wps 5712 | ups 1 | wpb 11152.554 | bsz 409.695 | num_updates 21112 | lr 0.000217638 | gnorm 0.083 | clip 0.000 | oom 0.000 | wall 41218 | train_wall 31898
device xla:2
| epoch 014 | loss 0.285 | nll_loss 0.285 | ppl 1.22 | wps 5719 | ups 1 | wpb 11165.131 | bsz 408.118 | num_updates 21112 | lr 0.000217638 | gnorm 0.079 | clip 0.000 | oom 0.000 | wall 41218 | train_wall 33142
device xla:3
| epoch 014 | loss 0.285 | nll_loss 0.285 | ppl 1.22 | wps 5689 | ups 1 | wpb 11106.747 | bsz 411.186 | num_updates 21112 | lr 0.000217638 | gnorm 0.085 | clip 0.000 | oom 0.000 | wall 41218 | train_wall 31457
device xla:4
| epoch 014 | loss 0.285 | nll_loss 0.285 | ppl 1.22 | wps 5704 | ups 1 | wpb 11136.958 | bsz 411.744 | num_updates 21112 | lr 0.000217638 | gnorm 0.084 | clip 0.000 | oom 0.000 | wall 41218 | train_wall 33143
device xla:5
| epoch 014 | loss 0.285 | nll_loss 0.285 | ppl 1.22 | wps 5720 | ups 1 | wpb 11166.537 | bsz 411.295 | num_updates 21112 | lr 0.000217638 | gnorm 0.081 | clip 0.000 | oom 0.000 | wall 41218 | train_wall 33002
device xla:6
| epoch 014 | loss 0.285 | nll_loss 0.285 | ppl 1.22 | wps 5708 | ups 1 | wpb 11144.819 | bsz 409.379 | num_updates 21112 | lr 0.000217638 | gnorm 0.082 | clip 0.000 | oom 0.000 | wall 41218 | train_wall 33091
device xla:7
| epoch 014 | loss 0.285 | nll_loss 0.285 | ppl 1.22 | wps 5720 | ups 1 | wpb 11167.996 | bsz 408.021 | num_updates 21112 | lr 0.000217638 | gnorm 0.082 | clip 0.000 | oom 0.000 | wall 41218 | train_wall 31960
device xla:8
| epoch 014 | loss 0.285 | nll_loss 0.285 | ppl 1.22 | wps 5700 | ups 1 | wpb 11128.924 | bsz 409.040 | num_updates 21112 | lr 0.000217638 | gnorm 0.087 | clip 0.000 | oom 0.000 | wall 41218 | train_wall 32034
Epoch 14 Tracker Rates:
Rate=300.82, Global Rate=309.54
Rate=300.80, Global Rate=309.54
Rate=300.84, Global Rate=309.54
Rate=300.99, Global Rate=309.54
Rate=301.04, Global Rate=309.54
Rate=300.95, Global Rate=309.54
Rate=300.91, Global Rate=309.54
Rate=300.88, Global Rate=309.54
Epoch 14 end 2019-08-27 03:43:00.205544
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 170476
Counter: 03d47h23m03s603ms711.152us
ValueRate: 06s157ms144.512us / second
Rate: 5.01253 / second
Percentiles: 1%=01s067ms152.013us; 5%=01s167ms629.665us; 10%=01s171ms919.029us; 20%=01s176ms184.052us; 50%=01s217ms311.130us; 80%=01s287ms257.749us; 90%=01s291ms807.209us; 95%=01s293ms963.734us; 99%=01s306ms217.230us
Metric: InboundData
TotalSamples: 884
Counter: 1.71KB
ValueRate: 0.05B / second
Rate: 0.024884 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 699307
Counter: 66.64GB
ValueRate: 489.47KB / second
Rate: 20.0697 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1546730
Counter: 09h19m04s748ms440.473us
ValueRate: 317ms857.513us / second
Rate: 42.8056 / second
Percentiles: 1%=443.600us; 5%=511.522us; 10%=551.179us; 20%=624.735us; 50%=836.395us; 80%=003ms863.697us; 90%=013ms571.319us; 95%=025ms229.147us; 99%=054ms160.415us
Metric: TransferFromServerTime
TotalSamples: 884
Counter: 07s368ms447.847us
ValueRate: 207.417us / second
Rate: 0.024884 / second
Percentiles: 1%=613.793us; 5%=669.685us; 10%=717.038us; 20%=781.058us; 50%=001ms75.242us; 80%=005ms152.671us; 90%=040ms723.960us; 95%=055ms156.504us; 99%=068ms932.198us
Metric: TransferToServerTime
TotalSamples: 699307
Counter: 02d22h12m13s761ms29.836us
ValueRate: 05s119ms329.760us / second
Rate: 20.5058 / second
Percentiles: 1%=001ms71.343us; 5%=001ms190.593us; 10%=001ms284.344us; 20%=001ms421.333us; 50%=003ms798.210us; 80%=887ms610.105us; 90%=970ms9.079us; 95%=01s038ms757.591us; 99%=01s087ms420.837us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 170369
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 125355340
Counter: CreateXlaTensor
Value: 817559906
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 125345456
Counter: DestroyXlaTensor
Value: 817553897
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 125348307
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23187
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 884
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 03:43:04.174716, device xla:5, step 0, Compiles=107, _local_scalar_dense=884
validation/ 2019-08-27 03:43:04.177196, device xla:2, step 0, Compiles=107, _local_scalar_dense=884
validation/ 2019-08-27 03:43:04.183992, device xla:6, step 0, Compiles=107, _local_scalar_dense=884
validation/ 2019-08-27 03:43:04.186388, device xla:7, step 0, Compiles=107, _local_scalar_dense=884
validation/ 2019-08-27 03:43:04.192910, device xla:4, step 0, Compiles=107, _local_scalar_dense=884
validation/ 2019-08-27 03:43:04.197069, device xla:3, step 0, Compiles=107, _local_scalar_dense=884
validation/ 2019-08-27 03:43:04.199322, device xla:1, step 0, Compiles=107, _local_scalar_dense=884
validation/ 2019-08-27 03:43:04.332437, device xla:8, step 0, Compiles=107, _local_scalar_dense=884
validation stats on subset "valid" - 2019-08-27 03:43:10.238821
| epoch 014 | valid on 'valid' subset | loss 3.891 | nll_loss 2.078 | ppl 4.22 | num_updates 21112
| epoch 014 | valid on 'valid' subset | loss 3.891 | nll_loss 2.078 | ppl 4.22 | num_updates 21112
| epoch 014 | valid on 'valid' subset | loss 3.953 | nll_loss 2.156 | ppl 4.46 | num_updates 21112
| epoch 014 | valid on 'valid' subset | loss 3.953 | nll_loss 2.188 | ppl 4.56 | num_updates 21112
| epoch 014 | valid on 'valid' subset | loss 3.922 | nll_loss 2.094 | ppl 4.27 | num_updates 21112
| epoch 014 | valid on 'valid' subset | loss 3.922 | nll_loss 2.109 | ppl 4.32 | num_updates 21112
| epoch 014 | valid on 'valid' subset | loss 3.922 | nll_loss 2.109 | ppl 4.32 | num_updates 21112
| epoch 014 | valid on 'valid' subset | loss 3.953 | nll_loss 2.188 | ppl 4.56 | num_updates 21112
old learning rate: 0.00022585393058257823
new learning rate: 0.0002176382932224279
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 170597
Counter: 03d47h24m49s593ms725.789us
ValueRate: 06s083ms850.259us / second
Rate: 5.39241 / second
Percentiles: 1%=376ms486.672us; 5%=379ms750.550us; 10%=391ms971.550us; 20%=01s170ms121.440us; 50%=01s185ms462.689us; 80%=01s287ms2.284us; 90%=01s291ms807.209us; 95%=01s293ms963.734us; 99%=01s306ms217.230us
Metric: InboundData
TotalSamples: 909
Counter: 1.76KB
ValueRate: 0.05B / second
Rate: 0.0255805 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 699547
Counter: 66.69GB
ValueRate: 939.46KB / second
Rate: 20.5319 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1547356
Counter: 09h20m35s963ms945.508us
ValueRate: 02s682ms276.004us / second
Rate: 52.3674 / second
Percentiles: 1%=448.515us; 5%=497.759us; 10%=534.321us; 20%=592.307us; 50%=788.599us; 80%=001ms369.143us; 90%=023ms842.973us; 95%=375ms804.612us; 99%=388ms251.135us
Metric: TransferFromServerTime
TotalSamples: 909
Counter: 07s413ms323.094us
ValueRate: 208.621us / second
Rate: 0.0255805 / second
Percentiles: 1%=617.094us; 5%=673.738us; 10%=718.167us; 20%=782.713us; 50%=001ms84.629us; 80%=005ms703.116us; 90%=039ms682.532us; 95%=055ms908.022us; 99%=068ms576.480us
Metric: TransferToServerTime
TotalSamples: 699547
Counter: 02d22h13m39s741ms129.287us
ValueRate: 04s209ms552.306us / second
Rate: 20.5319 / second
Percentiles: 1%=001ms73.918us; 5%=001ms209.559us; 10%=001ms314.021us; 20%=001ms468.129us; 50%=003ms692.040us; 80%=250ms702.456us; 90%=951ms17.686us; 95%=989ms961.894us; 99%=01s083ms825.037us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 170490
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 125356949
Counter: CreateXlaTensor
Value: 817694723
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 125349916
Counter: DestroyXlaTensor
Value: 817688714
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 125349916
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23187
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 909
Epoch 15 begin 2019-08-27 03:43:10.262573
training torch.Size([256, 64])/ 2019-08-27 03:43:18.063399, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:43:18.100153, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:43:18.141017, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:43:18.247873, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:43:18.286223, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:43:18.377156, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:43:18.648059, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:43:18.905944, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:46:09.353529, device xla:8, step 100, Rate=60.08, Global Rate=291.64, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:46:09.358066, device xla:6, step 100, Rate=59.86, Global Rate=291.63, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:46:09.375850, device xla:5, step 100, Rate=59.84, Global Rate=291.60, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:46:09.378384, device xla:4, step 100, Rate=59.88, Global Rate=291.59, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:46:09.362471, device xla:2, step 100, Rate=59.79, Global Rate=291.62, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:46:09.367868, device xla:7, step 100, Rate=59.98, Global Rate=291.61, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:46:09.384684, device xla:3, step 100, Rate=59.80, Global Rate=291.58, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:46:09.392753, device xla:1, step 100, Rate=59.77, Global Rate=291.57, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:48:53.667428, device xla:8, step 200, Rate=110.38, Global Rate=301.29, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:48:53.671884, device xla:4, step 200, Rate=110.23, Global Rate=301.28, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:48:53.676588, device xla:3, step 200, Rate=110.17, Global Rate=301.28, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:48:53.682893, device xla:5, step 200, Rate=110.19, Global Rate=301.27, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:48:53.698896, device xla:1, step 200, Rate=110.14, Global Rate=301.26, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:48:53.691663, device xla:7, step 200, Rate=110.30, Global Rate=301.27, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:48:53.705072, device xla:6, step 200, Rate=110.19, Global Rate=301.25, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 03:48:53.713907, device xla:2, step 200, Rate=110.14, Global Rate=301.24, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:51:38.535594, device xla:8, step 300, Rate=150.42, Global Rate=304.31, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 03:51:38.549089, device xla:4, step 300, Rate=150.29, Global Rate=304.30, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:51:38.574939, device xla:3, step 300, Rate=150.23, Global Rate=304.29, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:51:38.540338, device xla:5, step 300, Rate=150.27, Global Rate=304.31, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:51:38.553689, device xla:7, step 300, Rate=150.35, Global Rate=304.30, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:51:38.560726, device xla:1, step 300, Rate=150.22, Global Rate=304.30, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:51:38.576909, device xla:2, step 300, Rate=150.22, Global Rate=304.29, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:51:38.566845, device xla:6, step 300, Rate=150.27, Global Rate=304.29, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:54:26.034371, device xla:8, step 400, Rate=181.47, Global Rate=304.65, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:54:26.044416, device xla:4, step 400, Rate=181.37, Global Rate=304.65, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:54:26.056901, device xla:5, step 400, Rate=181.34, Global Rate=304.64, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:54:26.060152, device xla:3, step 400, Rate=181.33, Global Rate=304.64, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:54:26.049320, device xla:7, step 400, Rate=181.42, Global Rate=304.65, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 03:54:26.065661, device xla:2, step 400, Rate=181.32, Global Rate=304.64, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:54:26.039057, device xla:1, step 400, Rate=181.32, Global Rate=304.65, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:54:26.082345, device xla:6, step 400, Rate=181.34, Global Rate=304.63, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:57:13.068899, device xla:4, step 500, Rate=206.41, Global Rate=305.02, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:57:13.073500, device xla:6, step 500, Rate=206.39, Global Rate=305.02, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:57:13.093314, device xla:8, step 500, Rate=206.47, Global Rate=305.02, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:57:13.078174, device xla:5, step 500, Rate=206.38, Global Rate=305.02, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:57:13.098421, device xla:7, step 500, Rate=206.43, Global Rate=305.01, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 03:57:13.086091, device xla:1, step 500, Rate=206.36, Global Rate=305.02, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:57:13.119301, device xla:3, step 500, Rate=206.36, Global Rate=305.01, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 03:57:13.106605, device xla:2, step 500, Rate=206.36, Global Rate=305.01, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:00:00.110993, device xla:8, step 600, Rate=226.49, Global Rate=305.27, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:00:00.115717, device xla:3, step 600, Rate=226.40, Global Rate=305.27, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:00:00.137643, device xla:6, step 600, Rate=226.41, Global Rate=305.26, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:00:00.121991, device xla:5, step 600, Rate=226.41, Global Rate=305.27, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:00:00.139890, device xla:7, step 600, Rate=226.45, Global Rate=305.26, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:00:00.148958, device xla:4, step 600, Rate=226.41, Global Rate=305.26, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:00:00.130724, device xla:1, step 600, Rate=226.39, Global Rate=305.26, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 04:00:00.157528, device xla:2, step 600, Rate=226.38, Global Rate=305.26, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:02:44.406417, device xla:4, step 700, Rate=243.47, Global Rate=306.16, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:02:44.411193, device xla:3, step 700, Rate=243.45, Global Rate=306.16, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:02:44.417579, device xla:6, step 700, Rate=243.46, Global Rate=306.16, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:02:44.419132, device xla:1, step 700, Rate=243.44, Global Rate=306.16, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:02:44.430534, device xla:7, step 700, Rate=243.49, Global Rate=306.16, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:02:44.424768, device xla:2, step 700, Rate=243.44, Global Rate=306.16, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:02:44.451386, device xla:8, step 700, Rate=243.50, Global Rate=306.15, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:02:44.438526, device xla:5, step 700, Rate=243.45, Global Rate=306.16, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:05:27.960295, device xla:6, step 800, Rate=257.38, Global Rate=307.01, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:05:27.964664, device xla:3, step 800, Rate=257.37, Global Rate=307.01, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:05:27.970429, device xla:8, step 800, Rate=257.42, Global Rate=307.01, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:05:27.986189, device xla:7, step 800, Rate=257.40, Global Rate=307.00, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:05:27.981255, device xla:4, step 800, Rate=257.38, Global Rate=307.00, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:05:28.012682, device xla:2, step 800, Rate=257.35, Global Rate=307.00, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:05:27.999113, device xla:5, step 800, Rate=257.36, Global Rate=307.00, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:05:27.974934, device xla:1, step 800, Rate=257.36, Global Rate=307.00, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:08:11.421545, device xla:6, step 900, Rate=268.55, Global Rate=307.69, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:08:11.416279, device xla:1, step 900, Rate=268.54, Global Rate=307.69, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:08:11.426404, device xla:2, step 900, Rate=268.54, Global Rate=307.68, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:08:11.445357, device xla:8, step 900, Rate=268.58, Global Rate=307.68, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:08:11.439842, device xla:4, step 900, Rate=268.55, Global Rate=307.68, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:08:11.471623, device xla:3, step 900, Rate=268.52, Global Rate=307.68, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:08:11.450624, device xla:5, step 900, Rate=268.54, Global Rate=307.68, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:08:11.431842, device xla:7, step 900, Rate=268.57, Global Rate=307.68, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:10:54.843209, device xla:8, step 1000, Rate=277.53, Global Rate=308.24, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:10:54.847588, device xla:6, step 1000, Rate=277.50, Global Rate=308.24, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:10:54.851917, device xla:2, step 1000, Rate=277.49, Global Rate=308.24, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:10:54.858582, device xla:3, step 1000, Rate=277.49, Global Rate=308.24, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:10:54.874514, device xla:4, step 1000, Rate=277.49, Global Rate=308.23, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:10:54.865687, device xla:5, step 1000, Rate=277.49, Global Rate=308.23, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:10:54.883578, device xla:7, step 1000, Rate=277.50, Global Rate=308.23, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:10:54.903604, device xla:1, step 1000, Rate=277.47, Global Rate=308.23, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:13:41.401831, device xla:8, step 1100, Rate=283.50, Global Rate=308.16, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:13:41.411636, device xla:6, step 1100, Rate=283.48, Global Rate=308.16, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:13:41.416942, device xla:4, step 1100, Rate=283.48, Global Rate=308.16, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:13:41.406497, device xla:1, step 1100, Rate=283.47, Global Rate=308.16, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:13:41.421906, device xla:2, step 1100, Rate=283.47, Global Rate=308.16, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:13:41.437222, device xla:3, step 1100, Rate=283.47, Global Rate=308.16, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 04:13:41.428154, device xla:7, step 1100, Rate=283.49, Global Rate=308.16, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:13:41.446171, device xla:5, step 1100, Rate=283.47, Global Rate=308.15, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:16:26.566722, device xla:3, step 1200, Rate=288.78, Global Rate=308.31, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:16:26.572475, device xla:6, step 1200, Rate=288.78, Global Rate=308.31, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:16:26.573674, device xla:8, step 1200, Rate=288.80, Global Rate=308.31, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:16:26.599465, device xla:4, step 1200, Rate=288.78, Global Rate=308.31, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:16:26.604513, device xla:5, step 1200, Rate=288.77, Global Rate=308.31, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 04:16:26.583692, device xla:7, step 1200, Rate=288.79, Global Rate=308.31, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:16:26.590991, device xla:2, step 1200, Rate=288.77, Global Rate=308.31, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:16:26.578337, device xla:1, step 1200, Rate=288.78, Global Rate=308.31, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:19:10.760604, device xla:6, step 1300, Rate=293.39, Global Rate=308.58, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:19:10.765255, device xla:4, step 1300, Rate=293.40, Global Rate=308.58, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:19:10.775344, device xla:8, step 1300, Rate=293.40, Global Rate=308.58, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:19:10.793183, device xla:1, step 1300, Rate=293.38, Global Rate=308.58, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:19:10.780152, device xla:5, step 1300, Rate=293.39, Global Rate=308.58, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:19:10.801199, device xla:3, step 1300, Rate=293.38, Global Rate=308.58, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:19:10.769879, device xla:2, step 1300, Rate=293.39, Global Rate=308.58, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:19:10.785899, device xla:7, step 1300, Rate=293.40, Global Rate=308.58, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 04:21:52.785868, device xla:6, step 1400, Rate=297.91, Global Rate=309.10, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:21:52.790656, device xla:8, step 1400, Rate=297.93, Global Rate=309.10, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:21:52.801723, device xla:4, step 1400, Rate=297.91, Global Rate=309.10, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:21:52.806512, device xla:3, step 1400, Rate=297.91, Global Rate=309.10, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:21:52.795084, device xla:7, step 1400, Rate=297.92, Global Rate=309.10, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:21:52.813360, device xla:2, step 1400, Rate=297.90, Global Rate=309.10, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 04:21:52.822128, device xla:5, step 1400, Rate=297.91, Global Rate=309.09, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 04:21:52.835544, device xla:1, step 1400, Rate=297.90, Global Rate=309.09, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:24:34.935784, device xla:4, step 1500, Rate=301.49, Global Rate=309.53, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:24:34.940458, device xla:3, step 1500, Rate=301.49, Global Rate=309.53, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:24:34.944990, device xla:6, step 1500, Rate=301.48, Global Rate=309.53, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:24:34.955175, device xla:7, step 1500, Rate=301.49, Global Rate=309.53, Compiles=107, _local_scalar_dense=909
training torch.Size([1024, 16])/ 2019-08-27 04:24:34.962230, device xla:5, step 1500, Rate=301.48, Global Rate=309.53, Compiles=107, _local_scalar_dense=909
training torch.Size([512, 32])/ 2019-08-27 04:24:34.974728, device xla:2, step 1500, Rate=301.47, Global Rate=309.53, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:24:34.980822, device xla:8, step 1500, Rate=301.48, Global Rate=309.53, Compiles=107, _local_scalar_dense=909
training torch.Size([256, 64])/ 2019-08-27 04:24:34.949811, device xla:1, step 1500, Rate=301.48, Global Rate=309.53, Compiles=107, _local_scalar_dense=909
Epoch 15 Training stats:
device xla:1
| epoch 015 | loss 0.266 | nll_loss 0.266 | ppl 1.20 | wps 5769 | ups 1 | wpb 11152.941 | bsz 409.996 | num_updates 22620 | lr 0.000210259 | gnorm 0.077 | clip 0.000 | oom 0.000 | wall 43727 | train_wall 33949
device xla:2
| epoch 015 | loss 0.266 | nll_loss 0.266 | ppl 1.20 | wps 5777 | ups 1 | wpb 11167.953 | bsz 408.683 | num_updates 22620 | lr 0.000210259 | gnorm 0.074 | clip 0.000 | oom 0.000 | wall 43727 | train_wall 35204
device xla:3
| epoch 015 | loss 0.268 | nll_loss 0.268 | ppl 1.20 | wps 5745 | ups 1 | wpb 11106.567 | bsz 410.811 | num_updates 22620 | lr 0.000210259 | gnorm 0.080 | clip 0.000 | oom 0.000 | wall 43727 | train_wall 33524
device xla:4
| epoch 015 | loss 0.268 | nll_loss 0.268 | ppl 1.20 | wps 5762 | ups 1 | wpb 11138.650 | bsz 412.033 | num_updates 22620 | lr 0.000210259 | gnorm 0.079 | clip 0.000 | oom 0.000 | wall 43727 | train_wall 35213
device xla:5
| epoch 015 | loss 0.266 | nll_loss 0.266 | ppl 1.20 | wps 5775 | ups 1 | wpb 11164.517 | bsz 411.139 | num_updates 22620 | lr 0.000210259 | gnorm 0.075 | clip 0.000 | oom 0.000 | wall 43727 | train_wall 35042
device xla:6
| epoch 015 | loss 0.268 | nll_loss 0.268 | ppl 1.20 | wps 5766 | ups 1 | wpb 11146.072 | bsz 409.000 | num_updates 22620 | lr 0.000210259 | gnorm 0.076 | clip 0.000 | oom 0.000 | wall 43727 | train_wall 35149
device xla:7
| epoch 015 | loss 0.266 | nll_loss 0.266 | ppl 1.20 | wps 5775 | ups 1 | wpb 11163.962 | bsz 408.050 | num_updates 22620 | lr 0.000210259 | gnorm 0.076 | clip 0.000 | oom 0.000 | wall 43727 | train_wall 34010
device xla:8
| epoch 015 | loss 0.268 | nll_loss 0.268 | ppl 1.20 | wps 5757 | ups 1 | wpb 11129.214 | bsz 408.774 | num_updates 22620 | lr 0.000210259 | gnorm 0.081 | clip 0.000 | oom 0.000 | wall 43727 | train_wall 34095
Epoch 15 Tracker Rates:
Rate=298.92, Global Rate=309.41
Rate=299.01, Global Rate=309.41
Rate=298.88, Global Rate=309.41
Rate=298.86, Global Rate=309.41
Rate=298.97, Global Rate=309.41
Rate=298.89, Global Rate=309.41
Rate=298.94, Global Rate=309.41
Rate=299.04, Global Rate=309.41
Epoch 15 end 2019-08-27 04:24:49.140024
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 182661
Counter: 03d57h08m49s311ms825.086us
ValueRate: 06s166ms636.309us / second
Rate: 4.9845 / second
Percentiles: 1%=01s164ms311.761us; 5%=01s170ms345.114us; 10%=01s174ms582.033us; 20%=01s178ms683.941us; 50%=01s277ms346.706us; 80%=01s290ms606.534us; 90%=01s294ms687.343us; 95%=01s298ms530.571us; 99%=01s306ms501.363us
Metric: InboundData
TotalSamples: 949
Counter: 1.84KB
ValueRate: 0.05B / second
Rate: 0.0249515 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 749362
Counter: 71.18GB
ValueRate: 497.40KB / second
Rate: 20.3135 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1656460
Counter: 09h04m06s481ms764.484us
ValueRate: 288ms373.538us / second
Rate: 42.3734 / second
Percentiles: 1%=461.566us; 5%=524.472us; 10%=566.762us; 20%=642.778us; 50%=883.531us; 80%=003ms758.136us; 90%=013ms707.685us; 95%=028ms770.323us; 99%=052ms880.201us
Metric: TransferFromServerTime
TotalSamples: 949
Counter: 07s491ms113.870us
ValueRate: 196.959us / second
Rate: 0.0249515 / second
Percentiles: 1%=613.793us; 5%=666.496us; 10%=714.432us; 20%=776.703us; 50%=001ms66.904us; 80%=004ms98.478us; 90%=037ms64.874us; 95%=055ms802.590us; 99%=068ms576.480us
Metric: TransferToServerTime
TotalSamples: 749362
Counter: 03d30h01m06s337ms202.245us
ValueRate: 05s937ms232.121us / second
Rate: 20.3135 / second
Percentiles: 1%=001ms46.950us; 5%=001ms143.978us; 10%=001ms220.700us; 20%=001ms362.562us; 50%=002ms25.216us; 80%=875ms846.404us; 90%=975ms742.941us; 95%=01s058ms961.015us; 99%=01s102ms982.291us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 182554
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 134309388
Counter: CreateXlaTensor
Value: 875966367
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 134299953
Counter: DestroyXlaTensor
Value: 875960358
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 134302355
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23292
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 949
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 04:24:53.194659, device xla:8, step 0, Compiles=107, _local_scalar_dense=949
validation/ 2019-08-27 04:24:53.199218, device xla:3, step 0, Compiles=107, _local_scalar_dense=949
validation/ 2019-08-27 04:24:53.204975, device xla:6, step 0, Compiles=107, _local_scalar_dense=949
validation/ 2019-08-27 04:24:53.209424, device xla:7, step 0, Compiles=107, _local_scalar_dense=949
validation/ 2019-08-27 04:24:53.212047, device xla:1, step 0, Compiles=107, _local_scalar_dense=949
validation/ 2019-08-27 04:24:53.215233, device xla:4, step 0, Compiles=107, _local_scalar_dense=949
validation/ 2019-08-27 04:24:53.360003, device xla:2, step 0, Compiles=107, _local_scalar_dense=949
validation/ 2019-08-27 04:24:53.362031, device xla:5, step 0, Compiles=107, _local_scalar_dense=949
validation stats on subset "valid" - 2019-08-27 04:24:59.326092
| epoch 015 | valid on 'valid' subset | loss 3.875 | nll_loss 2.078 | ppl 4.22 | num_updates 22620
| epoch 015 | valid on 'valid' subset | loss 3.891 | nll_loss 2.078 | ppl 4.22 | num_updates 22620
| epoch 015 | valid on 'valid' subset | loss 3.953 | nll_loss 2.156 | ppl 4.46 | num_updates 22620
| epoch 015 | valid on 'valid' subset | loss 3.953 | nll_loss 2.188 | ppl 4.56 | num_updates 22620
| epoch 015 | valid on 'valid' subset | loss 3.906 | nll_loss 2.078 | ppl 4.22 | num_updates 22620
| epoch 015 | valid on 'valid' subset | loss 3.906 | nll_loss 2.078 | ppl 4.22 | num_updates 22620
| epoch 015 | valid on 'valid' subset | loss 3.922 | nll_loss 2.094 | ppl 4.27 | num_updates 22620
| epoch 015 | valid on 'valid' subset | loss 4.000 | nll_loss 2.172 | ppl 4.51 | num_updates 22620
old learning rate: 0.0002176382932224279
new learning rate: 0.00021025856676559
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 182782
Counter: 03d57h09m35s267ms813.158us
ValueRate: 06s068ms633.403us / second
Rate: 5.33537 / second
Percentiles: 1%=377ms510.671us; 5%=378ms15.559us; 10%=391ms19.396us; 20%=01s173ms175.528us; 50%=01s193ms625.329us; 80%=01s289ms799.205us; 90%=01s293ms108.019us; 95%=01s298ms530.571us; 99%=01s306ms501.363us
Metric: InboundData
TotalSamples: 974
Counter: 1.89KB
ValueRate: 0.05B / second
Rate: 0.0256019 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 749602
Counter: 71.22GB
ValueRate: 932.08KB / second
Rate: 20.3707 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1657106
Counter: 09h05m36s139ms72.752us
ValueRate: 02s590ms282.498us / second
Rate: 51.7688 / second
Percentiles: 1%=435.829us; 5%=484.588us; 10%=521.281us; 20%=577.451us; 50%=775.629us; 80%=001ms373.297us; 90%=022ms928.324us; 95%=375ms548.442us; 99%=388ms520.366us
Metric: TransferFromServerTime
TotalSamples: 974
Counter: 08s646ms475.237us
ValueRate: 200.990us / second
Rate: 0.0256019 / second
Percentiles: 1%=613.793us; 5%=666.496us; 10%=714.775us; 20%=778.685us; 50%=001ms71.778us; 80%=004ms993.163us; 90%=036ms377.027us; 95%=054ms464.644us; 99%=068ms576.480us
Metric: TransferToServerTime
TotalSamples: 749602
Counter: 03d30h02m32s680ms767.912us
ValueRate: 04s215ms965.717us / second
Rate: 20.3715 / second
Percentiles: 1%=001ms54.713us; 5%=001ms183.763us; 10%=001ms285.972us; 20%=001ms445.477us; 50%=002ms141.505us; 80%=244ms609.243us; 90%=960ms552.047us; 95%=01s043ms235.576us; 99%=01s078ms79.937us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 182675
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 134310997
Counter: CreateXlaTensor
Value: 876101184
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 134303963
Counter: DestroyXlaTensor
Value: 876095176
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 134303965
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23292
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 974
Epoch 16 begin 2019-08-27 04:24:59.419441
training torch.Size([256, 64])/ 2019-08-27 04:25:07.409264, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:25:07.425003, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:25:07.530694, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:25:07.562954, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:25:07.958882, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:25:08.052407, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:25:08.156454, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:25:08.579186, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:27:57.674623, device xla:1, step 100, Rate=60.14, Global Rate=292.97, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:27:57.679163, device xla:8, step 100, Rate=60.56, Global Rate=292.97, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:27:57.700530, device xla:6, step 100, Rate=60.36, Global Rate=292.93, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:27:57.669314, device xla:3, step 100, Rate=60.19, Global Rate=292.98, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:27:57.683820, device xla:2, step 100, Rate=60.14, Global Rate=292.96, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:27:57.706348, device xla:5, step 100, Rate=60.32, Global Rate=292.92, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:27:57.713137, device xla:4, step 100, Rate=60.18, Global Rate=292.91, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:27:57.692963, device xla:7, step 100, Rate=60.40, Global Rate=292.95, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:30:41.472314, device xla:5, step 200, Rate=110.79, Global Rate=302.46, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:30:41.477124, device xla:1, step 200, Rate=110.63, Global Rate=302.46, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:30:41.466993, device xla:3, step 200, Rate=110.67, Global Rate=302.47, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:30:41.481903, device xla:2, step 200, Rate=110.63, Global Rate=302.45, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:30:41.497784, device xla:4, step 200, Rate=110.67, Global Rate=302.44, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:30:41.503935, device xla:6, step 200, Rate=110.80, Global Rate=302.43, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:30:41.514130, device xla:8, step 200, Rate=110.95, Global Rate=302.42, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:30:41.490324, device xla:7, step 200, Rate=110.84, Global Rate=302.45, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:33:26.088045, device xla:6, step 300, Rate=150.86, Global Rate=305.26, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:33:26.092529, device xla:3, step 300, Rate=150.73, Global Rate=305.26, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:33:26.097650, device xla:1, step 300, Rate=150.71, Global Rate=305.26, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:33:26.102400, device xla:8, step 300, Rate=150.97, Global Rate=305.26, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:33:26.107049, device xla:4, step 300, Rate=150.74, Global Rate=305.25, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:33:26.124647, device xla:7, step 300, Rate=150.87, Global Rate=305.24, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:33:26.114017, device xla:5, step 300, Rate=150.83, Global Rate=305.25, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:33:26.138226, device xla:2, step 300, Rate=150.69, Global Rate=305.23, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:36:12.207322, device xla:8, step 400, Rate=182.43, Global Rate=306.00, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:36:12.211529, device xla:6, step 400, Rate=182.33, Global Rate=305.99, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:36:12.215873, device xla:5, step 400, Rate=182.31, Global Rate=305.99, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:36:12.226344, device xla:4, step 400, Rate=182.24, Global Rate=305.99, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:36:12.236245, device xla:2, step 400, Rate=182.21, Global Rate=305.98, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:36:12.240538, device xla:1, step 400, Rate=182.20, Global Rate=305.98, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:36:12.220502, device xla:3, step 400, Rate=182.23, Global Rate=305.99, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:36:12.245987, device xla:7, step 400, Rate=182.34, Global Rate=305.98, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:38:57.916621, device xla:6, step 500, Rate=207.66, Global Rate=306.59, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:38:57.921081, device xla:5, step 500, Rate=207.64, Global Rate=306.58, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:38:57.925947, device xla:8, step 500, Rate=207.73, Global Rate=306.58, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:38:57.931920, device xla:2, step 500, Rate=207.57, Global Rate=306.58, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:38:57.946518, device xla:1, step 500, Rate=207.55, Global Rate=306.58, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:38:57.960986, device xla:3, step 500, Rate=207.56, Global Rate=306.57, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:38:57.940747, device xla:4, step 500, Rate=207.58, Global Rate=306.58, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:38:57.951660, device xla:7, step 500, Rate=207.66, Global Rate=306.57, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:41:42.842926, device xla:6, step 600, Rate=228.22, Global Rate=307.22, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:41:42.853229, device xla:5, step 600, Rate=228.20, Global Rate=307.22, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:41:42.847621, device xla:4, step 600, Rate=228.16, Global Rate=307.22, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:41:42.858244, device xla:2, step 600, Rate=228.14, Global Rate=307.22, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:41:42.861466, device xla:1, step 600, Rate=228.14, Global Rate=307.22, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:41:42.876025, device xla:8, step 600, Rate=228.27, Global Rate=307.21, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:41:42.881230, device xla:3, step 600, Rate=228.14, Global Rate=307.21, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:41:42.866702, device xla:7, step 600, Rate=228.22, Global Rate=307.22, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:44:27.136407, device xla:8, step 700, Rate=244.95, Global Rate=307.85, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:44:27.146050, device xla:1, step 700, Rate=244.84, Global Rate=307.84, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:44:27.140785, device xla:3, step 700, Rate=244.85, Global Rate=307.84, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:44:27.150923, device xla:4, step 700, Rate=244.85, Global Rate=307.84, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:44:27.156383, device xla:5, step 700, Rate=244.89, Global Rate=307.84, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:44:27.161210, device xla:7, step 700, Rate=244.91, Global Rate=307.84, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:44:27.168666, device xla:6, step 700, Rate=244.89, Global Rate=307.84, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:44:27.178233, device xla:2, step 700, Rate=244.83, Global Rate=307.83, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:47:11.117783, device xla:1, step 800, Rate=258.32, Global Rate=308.39, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:47:11.122485, device xla:5, step 800, Rate=258.36, Global Rate=308.39, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:47:11.127163, device xla:8, step 800, Rate=258.40, Global Rate=308.39, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:47:11.131802, device xla:2, step 800, Rate=258.32, Global Rate=308.38, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:47:11.140353, device xla:4, step 800, Rate=258.33, Global Rate=308.38, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:47:11.155342, device xla:6, step 800, Rate=258.35, Global Rate=308.38, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:47:11.146102, device xla:7, step 800, Rate=258.37, Global Rate=308.38, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:47:11.163634, device xla:3, step 800, Rate=258.31, Global Rate=308.38, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:49:54.734529, device xla:8, step 900, Rate=269.31, Global Rate=308.89, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:49:54.739076, device xla:6, step 900, Rate=269.28, Global Rate=308.88, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:49:54.743514, device xla:5, step 900, Rate=269.27, Global Rate=308.88, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:49:54.758138, device xla:1, step 900, Rate=269.23, Global Rate=308.88, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:49:54.749336, device xla:2, step 900, Rate=269.24, Global Rate=308.88, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:49:54.763443, device xla:7, step 900, Rate=269.28, Global Rate=308.88, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:49:54.781336, device xla:4, step 900, Rate=269.24, Global Rate=308.88, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:49:54.770743, device xla:3, step 900, Rate=269.24, Global Rate=308.88, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:52:38.809494, device xla:4, step 1000, Rate=277.82, Global Rate=309.20, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:52:38.827210, device xla:6, step 1000, Rate=277.83, Global Rate=309.20, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:52:38.832037, device xla:1, step 1000, Rate=277.80, Global Rate=309.19, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:52:38.820128, device xla:7, step 1000, Rate=277.84, Global Rate=309.20, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:52:38.837257, device xla:8, step 1000, Rate=277.85, Global Rate=309.19, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:52:38.842417, device xla:5, step 1000, Rate=277.82, Global Rate=309.19, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:52:38.814671, device xla:3, step 1000, Rate=277.81, Global Rate=309.20, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:52:38.847869, device xla:2, step 1000, Rate=277.79, Global Rate=309.19, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:55:22.917154, device xla:8, step 1100, Rate=284.69, Global Rate=309.45, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:55:22.921676, device xla:6, step 1100, Rate=284.67, Global Rate=309.45, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:55:22.956551, device xla:2, step 1100, Rate=284.63, Global Rate=309.44, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:55:22.927801, device xla:7, step 1100, Rate=284.67, Global Rate=309.45, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:55:22.934739, device xla:4, step 1100, Rate=284.65, Global Rate=309.45, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:55:22.960179, device xla:5, step 1100, Rate=284.65, Global Rate=309.44, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:55:22.940967, device xla:1, step 1100, Rate=284.64, Global Rate=309.45, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:55:22.948091, device xla:3, step 1100, Rate=284.64, Global Rate=309.45, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:58:08.415058, device xla:6, step 1200, Rate=289.61, Global Rate=309.44, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:58:08.420573, device xla:7, step 1200, Rate=289.61, Global Rate=309.44, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:58:08.442442, device xla:3, step 1200, Rate=289.59, Global Rate=309.44, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:58:08.427096, device xla:8, step 1200, Rate=289.62, Global Rate=309.44, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 04:58:08.451521, device xla:2, step 1200, Rate=289.58, Global Rate=309.44, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:58:08.433435, device xla:4, step 1200, Rate=289.59, Global Rate=309.44, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 04:58:08.476041, device xla:5, step 1200, Rate=289.59, Global Rate=309.43, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 04:58:08.468434, device xla:1, step 1200, Rate=289.57, Global Rate=309.44, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:00:52.596537, device xla:6, step 1300, Rate=294.06, Global Rate=309.63, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:00:52.602241, device xla:7, step 1300, Rate=294.06, Global Rate=309.63, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:00:52.608724, device xla:2, step 1300, Rate=294.04, Global Rate=309.63, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:00:52.630915, device xla:5, step 1300, Rate=294.05, Global Rate=309.62, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:00:52.623435, device xla:3, step 1300, Rate=294.04, Global Rate=309.62, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:00:52.636151, device xla:1, step 1300, Rate=294.03, Global Rate=309.62, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:00:52.617814, device xla:4, step 1300, Rate=294.04, Global Rate=309.62, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:00:52.645082, device xla:8, step 1300, Rate=294.05, Global Rate=309.62, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:03:35.779917, device xla:5, step 1400, Rate=298.00, Global Rate=309.92, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:03:35.784144, device xla:6, step 1400, Rate=298.00, Global Rate=309.92, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:03:35.788589, device xla:2, step 1400, Rate=297.99, Global Rate=309.92, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 05:03:35.804077, device xla:4, step 1400, Rate=297.98, Global Rate=309.92, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:03:35.809837, device xla:1, step 1400, Rate=297.98, Global Rate=309.91, Compiles=107, _local_scalar_dense=974
training torch.Size([1024, 16])/ 2019-08-27 05:03:35.818810, device xla:3, step 1400, Rate=297.98, Global Rate=309.91, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:03:35.796801, device xla:7, step 1400, Rate=298.00, Global Rate=309.92, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:03:35.829783, device xla:8, step 1400, Rate=297.99, Global Rate=309.91, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:06:15.379804, device xla:5, step 1500, Rate=302.56, Global Rate=310.62, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:06:15.398409, device xla:1, step 1500, Rate=302.55, Global Rate=310.62, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:06:15.403162, device xla:3, step 1500, Rate=302.55, Global Rate=310.62, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:06:15.408796, device xla:6, step 1500, Rate=302.55, Global Rate=310.62, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:06:15.384363, device xla:7, step 1500, Rate=302.56, Global Rate=310.62, Compiles=107, _local_scalar_dense=974
training torch.Size([512, 32])/ 2019-08-27 05:06:15.391014, device xla:4, step 1500, Rate=302.55, Global Rate=310.62, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:06:15.417089, device xla:8, step 1500, Rate=302.56, Global Rate=310.62, Compiles=107, _local_scalar_dense=974
training torch.Size([256, 64])/ 2019-08-27 05:06:15.425523, device xla:2, step 1500, Rate=302.54, Global Rate=310.62, Compiles=107, _local_scalar_dense=974
Epoch 16 Training stats:
device xla:1
| epoch 016 | loss 0.250 | nll_loss 0.250 | ppl 1.19 | wps 5821 | ups 1 | wpb 11152.832 | bsz 410.058 | num_updates 24128 | lr 0.000203582 | gnorm 0.072 | clip 0.000 | oom 0.000 | wall 46227 | train_wall 36003
device xla:2
| epoch 016 | loss 0.250 | nll_loss 0.250 | ppl 1.19 | wps 5826 | ups 1 | wpb 11161.987 | bsz 408.318 | num_updates 24128 | lr 0.000203582 | gnorm 0.069 | clip 0.000 | oom 0.000 | wall 46227 | train_wall 37247
device xla:3
| epoch 016 | loss 0.250 | nll_loss 0.250 | ppl 1.19 | wps 5798 | ups 1 | wpb 11108.035 | bsz 410.398 | num_updates 24128 | lr 0.000203582 | gnorm 0.075 | clip 0.000 | oom 0.000 | wall 46227 | train_wall 35563
device xla:4
| epoch 016 | loss 0.250 | nll_loss 0.250 | ppl 1.19 | wps 5813 | ups 1 | wpb 11137.104 | bsz 412.340 | num_updates 24128 | lr 0.000203582 | gnorm 0.074 | clip 0.000 | oom 0.000 | wall 46227 | train_wall 37265
device xla:5
| epoch 016 | loss 0.250 | nll_loss 0.250 | ppl 1.19 | wps 5827 | ups 1 | wpb 11164.579 | bsz 411.077 | num_updates 24128 | lr 0.000203582 | gnorm 0.071 | clip 0.000 | oom 0.000 | wall 46227 | train_wall 37098
device xla:6
| epoch 016 | loss 0.250 | nll_loss 0.250 | ppl 1.19 | wps 5818 | ups 1 | wpb 11147.714 | bsz 409.072 | num_updates 24128 | lr 0.000203582 | gnorm 0.071 | clip 0.000 | oom 0.000 | wall 46227 | train_wall 37193
device xla:7
| epoch 016 | loss 0.250 | nll_loss 0.250 | ppl 1.19 | wps 5827 | ups 1 | wpb 11163.778 | bsz 408.329 | num_updates 24128 | lr 0.000203582 | gnorm 0.071 | clip 0.000 | oom 0.000 | wall 46227 | train_wall 36064
device xla:8
| epoch 016 | loss 0.250 | nll_loss 0.250 | ppl 1.19 | wps 5811 | ups 1 | wpb 11133.824 | bsz 408.902 | num_updates 24128 | lr 0.000203582 | gnorm 0.077 | clip 0.000 | oom 0.000 | wall 46227 | train_wall 36150
Epoch 16 Tracker Rates:
Rate=300.07, Global Rate=310.50
Rate=300.17, Global Rate=310.50
Rate=300.09, Global Rate=310.50
Rate=300.04, Global Rate=310.50
Rate=300.01, Global Rate=310.50
Rate=300.11, Global Rate=310.50
Rate=300.02, Global Rate=310.50
Rate=300.16, Global Rate=310.50
Epoch 16 end 2019-08-27 05:06:29.515157
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 194846
Counter: 03d08h16m07s541ms78.707us
ValueRate: 06s166ms833.733us / second
Rate: 5.02645 / second
Percentiles: 1%=01s077ms543.181us; 5%=01s168ms926.602us; 10%=01s171ms11.084us; 20%=01s176ms962.995us; 50%=01s189ms523.453us; 80%=01s288ms550.647us; 90%=01s291ms45.899us; 95%=01s293ms31.016us; 99%=01s313ms193.004us
Metric: InboundData
TotalSamples: 1014
Counter: 1.97KB
ValueRate: 0.05B / second
Rate: 0.0250159 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 799348
Counter: 75.71GB
ValueRate: 497.39KB / second
Rate: 20.3585 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1766057
Counter: 10h13m17s392ms833.748us
ValueRate: 435ms73.171us / second
Rate: 41.0403 / second
Percentiles: 1%=458.308us; 5%=525.507us; 10%=568.131us; 20%=629.765us; 50%=866.967us; 80%=003ms51.704us; 90%=013ms264.593us; 95%=030ms702.063us; 99%=059ms104.272us
Metric: TransferFromServerTime
TotalSamples: 1014
Counter: 08s719ms421.973us
ValueRate: 190.442us / second
Rate: 0.0250159 / second
Percentiles: 1%=617.094us; 5%=663.319us; 10%=713.392us; 20%=771.478us; 50%=001ms55.229us; 80%=004ms842.922us; 90%=034ms354.776us; 95%=054ms13.868us; 99%=067ms919.867us
Metric: TransferToServerTime
TotalSamples: 799348
Counter: 03d39h14m02s154ms907.054us
ValueRate: 05s138ms159.646us / second
Rate: 20.759 / second
Percentiles: 1%=001ms89.341us; 5%=001ms194.919us; 10%=001ms285.969us; 20%=001ms420.107us; 50%=003ms568.133us; 80%=876ms128.281us; 90%=977ms173.674us; 95%=01s028ms876.320us; 99%=01s099ms92.148us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 194739
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 143263367
Counter: CreateXlaTensor
Value: 934372828
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 143255010
Counter: DestroyXlaTensor
Value: 934366820
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 143256335
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23392
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1014
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 05:06:33.599579, device xla:6, step 0, Compiles=107, _local_scalar_dense=1014
validation/ 2019-08-27 05:06:33.602411, device xla:5, step 0, Compiles=107, _local_scalar_dense=1014
validation/ 2019-08-27 05:06:33.611981, device xla:4, step 0, Compiles=107, _local_scalar_dense=1014
validation/ 2019-08-27 05:06:33.759434, device xla:2, step 0, Compiles=107, _local_scalar_dense=1014
validation/ 2019-08-27 05:06:33.762396, device xla:3, step 0, Compiles=107, _local_scalar_dense=1014
validation/ 2019-08-27 05:06:33.768061, device xla:1, step 0, Compiles=107, _local_scalar_dense=1014
validation/ 2019-08-27 05:06:33.770570, device xla:8, step 0, Compiles=107, _local_scalar_dense=1014
validation/ 2019-08-27 05:06:33.783741, device xla:7, step 0, Compiles=107, _local_scalar_dense=1014
validation stats on subset "valid" - 2019-08-27 05:06:39.768143
| epoch 016 | valid on 'valid' subset | loss 3.875 | nll_loss 2.078 | ppl 4.22 | num_updates 24128
| epoch 016 | valid on 'valid' subset | loss 3.891 | nll_loss 2.062 | ppl 4.18 | num_updates 24128
| epoch 016 | valid on 'valid' subset | loss 3.953 | nll_loss 2.156 | ppl 4.46 | num_updates 24128
| epoch 016 | valid on 'valid' subset | loss 3.953 | nll_loss 2.172 | ppl 4.51 | num_updates 24128
| epoch 016 | valid on 'valid' subset | loss 3.906 | nll_loss 2.078 | ppl 4.22 | num_updates 24128
| epoch 016 | valid on 'valid' subset | loss 3.906 | nll_loss 2.078 | ppl 4.22 | num_updates 24128
| epoch 016 | valid on 'valid' subset | loss 3.922 | nll_loss 2.094 | ppl 4.27 | num_updates 24128
| epoch 016 | valid on 'valid' subset | loss 3.953 | nll_loss 2.156 | ppl 4.46 | num_updates 24128
old learning rate: 0.00021025856676559
new learning rate: 0.00020358198187014258
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 194967
Counter: 03d08h17m53s534ms2.487us
ValueRate: 06s073ms285.294us / second
Rate: 5.40354 / second
Percentiles: 1%=377ms736.716us; 5%=378ms347.275us; 10%=391ms93.292us; 20%=01s170ms177.921us; 50%=01s184ms85.980us; 80%=01s286ms973.751us; 90%=01s291ms780.049us; 95%=01s293ms0.508us; 99%=01s313ms193.004us
Metric: InboundData
TotalSamples: 1039
Counter: 2.01KB
ValueRate: 0.05B / second
Rate: 0.0252563 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 799588
Counter: 75.76GB
ValueRate: 929.54KB / second
Rate: 20.3152 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1766656
Counter: 10h14m49s315ms141.052us
ValueRate: 02s710ms817.997us / second
Rate: 51.6305 / second
Percentiles: 1%=447.102us; 5%=498.538us; 10%=525.607us; 20%=588.113us; 50%=782.882us; 80%=002ms600.590us; 90%=034ms882.561us; 95%=375ms914.429us; 99%=388ms49.539us
Metric: TransferFromServerTime
TotalSamples: 1039
Counter: 08s760ms902.115us
ValueRate: 175.827us / second
Rate: 0.0252563 / second
Percentiles: 1%=617.094us; 5%=666.496us; 10%=713.392us; 20%=771.478us; 50%=001ms50.886us; 80%=004ms558.883us; 90%=031ms160.746us; 95%=046ms706.347us; 99%=062ms362.348us
Metric: TransferToServerTime
TotalSamples: 799588
Counter: 03d39h14m29s968ms287.686us
ValueRate: 04s287ms898.140us / second
Rate: 20.3157 / second
Percentiles: 1%=001ms94.108us; 5%=001ms204.852us; 10%=001ms310.409us; 20%=001ms464.654us; 50%=003ms568.133us; 80%=241ms288.908us; 90%=954ms635.532us; 95%=985ms529.916us; 99%=01s079ms509.902us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 194860
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 143264976
Counter: CreateXlaTensor
Value: 934507645
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 143257943
Counter: DestroyXlaTensor
Value: 934501636
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 143257943
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23392
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1039
Epoch 17 begin 2019-08-27 05:06:39.788107
training torch.Size([256, 64])/ 2019-08-27 05:06:48.047519, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:06:48.065182, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:06:48.083613, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:06:48.165732, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:06:48.228676, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:06:48.506026, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:06:48.679814, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:06:48.799137, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:09:38.361773, device xla:4, step 100, Rate=60.17, Global Rate=292.64, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:09:38.366846, device xla:6, step 100, Rate=60.28, Global Rate=292.63, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:09:38.377974, device xla:1, step 100, Rate=60.12, Global Rate=292.61, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:09:38.371335, device xla:3, step 100, Rate=60.13, Global Rate=292.62, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:09:38.393353, device xla:5, step 100, Rate=60.18, Global Rate=292.59, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:09:38.414002, device xla:2, step 100, Rate=60.11, Global Rate=292.55, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:09:38.403740, device xla:8, step 100, Rate=60.38, Global Rate=292.57, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:09:38.398326, device xla:7, step 100, Rate=60.34, Global Rate=292.58, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:12:21.999797, device xla:7, step 200, Rate=110.86, Global Rate=302.43, Compiles=107, _local_scalar_dense=1039
training torch.Size([1024, 16])/ 2019-08-27 05:12:22.005008, device xla:2, step 200, Rate=110.68, Global Rate=302.42, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:12:22.017791, device xla:5, step 200, Rate=110.72, Global Rate=302.41, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:12:22.025759, device xla:1, step 200, Rate=110.67, Global Rate=302.40, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:12:22.011242, device xla:3, step 200, Rate=110.68, Global Rate=302.41, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:12:22.030885, device xla:6, step 200, Rate=110.79, Global Rate=302.40, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:12:22.043045, device xla:4, step 200, Rate=110.69, Global Rate=302.39, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:12:22.035815, device xla:8, step 200, Rate=110.88, Global Rate=302.39, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:15:06.229064, device xla:6, step 300, Rate=151.00, Global Rate=305.47, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:15:06.234971, device xla:5, step 300, Rate=150.94, Global Rate=305.47, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:15:06.239395, device xla:1, step 300, Rate=150.89, Global Rate=305.47, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:15:06.255830, device xla:8, step 300, Rate=151.06, Global Rate=305.46, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:15:06.261147, device xla:2, step 300, Rate=150.89, Global Rate=305.45, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:15:06.223625, device xla:7, step 300, Rate=151.04, Global Rate=305.48, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:15:06.248073, device xla:3, step 300, Rate=150.90, Global Rate=305.46, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:15:06.268038, device xla:4, step 300, Rate=150.91, Global Rate=305.45, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:17:51.916073, device xla:5, step 400, Rate=182.55, Global Rate=306.35, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:17:51.920629, device xla:2, step 400, Rate=182.53, Global Rate=306.35, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:17:51.934125, device xla:3, step 400, Rate=182.52, Global Rate=306.34, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:17:51.941744, device xla:8, step 400, Rate=182.65, Global Rate=306.34, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:17:51.926299, device xla:1, step 400, Rate=182.52, Global Rate=306.35, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:17:51.946914, device xla:7, step 400, Rate=182.62, Global Rate=306.34, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:17:51.955757, device xla:4, step 400, Rate=182.53, Global Rate=306.33, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:17:51.965615, device xla:6, step 400, Rate=182.58, Global Rate=306.33, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:20:36.009150, device xla:5, step 500, Rate=208.45, Global Rate=307.47, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:20:36.013368, device xla:4, step 500, Rate=208.44, Global Rate=307.47, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:20:36.019581, device xla:2, step 500, Rate=208.42, Global Rate=307.46, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:20:36.046444, device xla:6, step 500, Rate=208.48, Global Rate=307.45, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:20:36.063683, device xla:8, step 500, Rate=208.51, Global Rate=307.45, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:20:36.025738, device xla:3, step 500, Rate=208.42, Global Rate=307.46, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:20:36.037797, device xla:7, step 500, Rate=208.50, Global Rate=307.46, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:20:36.053540, device xla:1, step 500, Rate=208.40, Global Rate=307.45, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:23:22.313458, device xla:2, step 600, Rate=228.31, Global Rate=307.54, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:23:22.317815, device xla:5, step 600, Rate=228.33, Global Rate=307.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:23:22.323357, device xla:6, step 600, Rate=228.36, Global Rate=307.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:23:22.329047, device xla:4, step 600, Rate=228.32, Global Rate=307.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:23:22.336846, device xla:8, step 600, Rate=228.40, Global Rate=307.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:23:22.356335, device xla:1, step 600, Rate=228.30, Global Rate=307.52, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:23:22.368988, device xla:7, step 600, Rate=228.37, Global Rate=307.52, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:23:22.345746, device xla:3, step 600, Rate=228.30, Global Rate=307.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:26:05.615038, device xla:6, step 700, Rate=245.40, Global Rate=308.38, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:26:05.597321, device xla:4, step 700, Rate=245.38, Global Rate=308.38, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:26:05.620278, device xla:8, step 700, Rate=245.43, Global Rate=308.38, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:26:05.625292, device xla:5, step 700, Rate=245.37, Global Rate=308.37, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:26:05.608007, device xla:3, step 700, Rate=245.36, Global Rate=308.38, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:26:05.631852, device xla:2, step 700, Rate=245.35, Global Rate=308.37, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:26:05.602670, device xla:7, step 700, Rate=245.43, Global Rate=308.38, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:26:05.637578, device xla:1, step 700, Rate=245.35, Global Rate=308.37, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:28:50.508337, device xla:8, step 800, Rate=258.45, Global Rate=308.64, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:28:50.514251, device xla:7, step 800, Rate=258.43, Global Rate=308.64, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:28:50.525717, device xla:2, step 800, Rate=258.38, Global Rate=308.64, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:28:50.531903, device xla:6, step 800, Rate=258.41, Global Rate=308.64, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:28:50.544217, device xla:4, step 800, Rate=258.38, Global Rate=308.63, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:28:50.536832, device xla:3, step 800, Rate=258.38, Global Rate=308.64, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:28:50.519622, device xla:5, step 800, Rate=258.39, Global Rate=308.64, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:28:50.550416, device xla:1, step 800, Rate=258.38, Global Rate=308.63, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:31:34.078145, device xla:8, step 900, Rate=269.36, Global Rate=309.12, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:31:34.090619, device xla:1, step 900, Rate=269.31, Global Rate=309.12, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:31:34.073414, device xla:5, step 900, Rate=269.32, Global Rate=309.12, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:31:34.094455, device xla:4, step 900, Rate=269.32, Global Rate=309.12, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:31:34.100339, device xla:6, step 900, Rate=269.33, Global Rate=309.12, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:31:34.083509, device xla:7, step 900, Rate=269.35, Global Rate=309.12, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:31:34.105667, device xla:3, step 900, Rate=269.31, Global Rate=309.12, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:31:34.117307, device xla:2, step 900, Rate=269.30, Global Rate=309.11, Compiles=107, _local_scalar_dense=1039
training torch.Size([1024, 16])/ 2019-08-27 05:34:18.688136, device xla:4, step 1000, Rate=277.67, Global Rate=309.31, Compiles=107, _local_scalar_dense=1039
training torch.Size([1024, 16])/ 2019-08-27 05:34:18.693338, device xla:7, step 1000, Rate=277.69, Global Rate=309.31, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:34:18.712323, device xla:5, step 1000, Rate=277.66, Global Rate=309.31, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:34:18.705309, device xla:8, step 1000, Rate=277.69, Global Rate=309.31, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:34:18.698345, device xla:3, step 1000, Rate=277.66, Global Rate=309.31, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:34:18.726710, device xla:2, step 1000, Rate=277.65, Global Rate=309.31, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:34:18.734878, device xla:1, step 1000, Rate=277.65, Global Rate=309.30, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:34:18.719116, device xla:6, step 1000, Rate=277.67, Global Rate=309.31, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:37:03.147233, device xla:1, step 1100, Rate=284.40, Global Rate=309.49, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:37:03.155502, device xla:7, step 1100, Rate=284.41, Global Rate=309.49, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:37:03.160785, device xla:8, step 1100, Rate=284.42, Global Rate=309.49, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:37:03.167390, device xla:5, step 1100, Rate=284.39, Global Rate=309.49, Compiles=107, _local_scalar_dense=1039
training torch.Size([1024, 16])/ 2019-08-27 05:37:03.140983, device xla:3, step 1100, Rate=284.40, Global Rate=309.50, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:37:03.174152, device xla:6, step 1100, Rate=284.40, Global Rate=309.49, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:37:03.181073, device xla:4, step 1100, Rate=284.39, Global Rate=309.49, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:37:03.196100, device xla:2, step 1100, Rate=284.38, Global Rate=309.49, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:39:48.653179, device xla:6, step 1200, Rate=289.40, Global Rate=309.48, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:39:48.657908, device xla:5, step 1200, Rate=289.39, Global Rate=309.48, Compiles=107, _local_scalar_dense=1039
training torch.Size([1024, 16])/ 2019-08-27 05:39:48.663778, device xla:8, step 1200, Rate=289.41, Global Rate=309.48, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:39:48.690098, device xla:2, step 1200, Rate=289.38, Global Rate=309.48, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:39:48.665489, device xla:3, step 1200, Rate=289.38, Global Rate=309.48, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:39:48.680732, device xla:1, step 1200, Rate=289.38, Global Rate=309.48, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:39:48.672452, device xla:4, step 1200, Rate=289.38, Global Rate=309.48, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:39:48.697677, device xla:7, step 1200, Rate=289.39, Global Rate=309.48, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:42:33.582700, device xla:4, step 1300, Rate=293.60, Global Rate=309.56, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:42:33.587862, device xla:2, step 1300, Rate=293.60, Global Rate=309.55, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:42:33.592927, device xla:6, step 1300, Rate=293.61, Global Rate=309.55, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:42:33.597429, device xla:5, step 1300, Rate=293.60, Global Rate=309.55, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:42:33.602209, device xla:3, step 1300, Rate=293.59, Global Rate=309.55, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:42:33.611169, device xla:8, step 1300, Rate=293.61, Global Rate=309.55, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:42:33.616039, device xla:7, step 1300, Rate=293.60, Global Rate=309.55, Compiles=107, _local_scalar_dense=1039
training torch.Size([1024, 16])/ 2019-08-27 05:42:33.629856, device xla:1, step 1300, Rate=293.58, Global Rate=309.55, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:45:14.674852, device xla:4, step 1400, Rate=298.45, Global Rate=310.13, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:45:14.688090, device xla:6, step 1400, Rate=298.45, Global Rate=310.13, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:45:14.698086, device xla:5, step 1400, Rate=298.44, Global Rate=310.13, Compiles=107, _local_scalar_dense=1039
training torch.Size([1024, 16])/ 2019-08-27 05:45:14.702977, device xla:8, step 1400, Rate=298.45, Global Rate=310.13, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:45:14.713600, device xla:7, step 1400, Rate=298.45, Global Rate=310.13, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:45:14.719727, device xla:3, step 1400, Rate=298.43, Global Rate=310.13, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:45:14.708680, device xla:2, step 1400, Rate=298.44, Global Rate=310.13, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:45:14.680031, device xla:1, step 1400, Rate=298.45, Global Rate=310.13, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:47:56.547648, device xla:5, step 1500, Rate=302.02, Global Rate=310.54, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:47:56.551950, device xla:4, step 1500, Rate=302.02, Global Rate=310.54, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:47:56.556784, device xla:6, step 1500, Rate=302.02, Global Rate=310.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([512, 32])/ 2019-08-27 05:47:56.568420, device xla:8, step 1500, Rate=302.02, Global Rate=310.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:47:56.573343, device xla:2, step 1500, Rate=302.01, Global Rate=310.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:47:56.561467, device xla:3, step 1500, Rate=302.01, Global Rate=310.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:47:56.601016, device xla:7, step 1500, Rate=302.01, Global Rate=310.53, Compiles=107, _local_scalar_dense=1039
training torch.Size([256, 64])/ 2019-08-27 05:47:56.584027, device xla:1, step 1500, Rate=302.01, Global Rate=310.53, Compiles=107, _local_scalar_dense=1039
Epoch 17 Training stats:
device xla:1
| epoch 017 | loss 0.235 | nll_loss 0.235 | ppl 1.18 | wps 5867 | ups 1 | wpb 11151.356 | bsz 410.133 | num_updates 25636 | lr 0.000197504 | gnorm 0.068 | clip 0.000 | oom 0.000 | wall 48728 | train_wall 38054
device xla:2
| epoch 017 | loss 0.235 | nll_loss 0.235 | ppl 1.18 | wps 5874 | ups 1 | wpb 11166.004 | bsz 409.015 | num_updates 25636 | lr 0.000197504 | gnorm 0.065 | clip 0.000 | oom 0.000 | wall 48728 | train_wall 39302
device xla:3
| epoch 017 | loss 0.235 | nll_loss 0.235 | ppl 1.18 | wps 5845 | ups 1 | wpb 11110.366 | bsz 410.932 | num_updates 25636 | lr 0.000197504 | gnorm 0.070 | clip 0.000 | oom 0.000 | wall 48728 | train_wall 37618
device xla:4
| epoch 017 | loss 0.235 | nll_loss 0.235 | ppl 1.18 | wps 5860 | ups 1 | wpb 11138.347 | bsz 411.921 | num_updates 25636 | lr 0.000197504 | gnorm 0.070 | clip 0.000 | oom 0.000 | wall 48728 | train_wall 39323
device xla:5
| epoch 017 | loss 0.235 | nll_loss 0.235 | ppl 1.18 | wps 5872 | ups 1 | wpb 11161.360 | bsz 410.962 | num_updates 25636 | lr 0.000197504 | gnorm 0.066 | clip 0.000 | oom 0.000 | wall 48728 | train_wall 39151
device xla:6
| epoch 017 | loss 0.235 | nll_loss 0.235 | ppl 1.18 | wps 5862 | ups 1 | wpb 11142.782 | bsz 408.715 | num_updates 25636 | lr 0.000197504 | gnorm 0.067 | clip 0.000 | oom 0.000 | wall 48728 | train_wall 39252
device xla:7
| epoch 017 | loss 0.235 | nll_loss 0.235 | ppl 1.18 | wps 5871 | ups 1 | wpb 11160.306 | bsz 408.116 | num_updates 25636 | lr 0.000197504 | gnorm 0.067 | clip 0.000 | oom 0.000 | wall 48728 | train_wall 38121
device xla:8
| epoch 017 | loss 0.235 | nll_loss 0.235 | ppl 1.18 | wps 5860 | ups 1 | wpb 11139.252 | bsz 408.705 | num_updates 25636 | lr 0.000197504 | gnorm 0.072 | clip 0.000 | oom 0.000 | wall 48728 | train_wall 38200
Epoch 17 Tracker Rates:
Rate=300.32, Global Rate=310.44
Rate=300.28, Global Rate=310.44
Rate=300.24, Global Rate=310.44
Rate=300.20, Global Rate=310.44
Rate=300.18, Global Rate=310.44
Rate=300.22, Global Rate=310.44
Rate=300.40, Global Rate=310.44
Rate=300.27, Global Rate=310.44
Epoch 17 end 2019-08-27 05:48:10.535322
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 207031
Counter: 03d18h01m55s182ms710.413us
ValueRate: 06s162ms941.864us / second
Rate: 4.99621 / second
Percentiles: 1%=01s162ms885.969us; 5%=01s171ms34.907us; 10%=01s174ms311.572us; 20%=01s179ms820.203us; 50%=01s274ms148.212us; 80%=01s288ms253.701us; 90%=01s292ms567.194us; 95%=01s294ms594.681us; 99%=01s300ms611.151us
Metric: InboundData
TotalSamples: 1079
Counter: 2.09KB
ValueRate: 0.05B / second
Rate: 0.0242964 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 849389
Counter: 80.25GB
ValueRate: 499.43KB / second
Rate: 20.4488 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1875298
Counter: 10h22m21s820ms879.304us
ValueRate: 308ms941.061us / second
Rate: 42.4087 / second
Percentiles: 1%=434.354us; 5%=500.027us; 10%=553.382us; 20%=620.203us; 50%=844.596us; 80%=002ms319.097us; 90%=011ms592.292us; 95%=023ms750.890us; 99%=060ms275.804us
Metric: TransferFromServerTime
TotalSamples: 1079
Counter: 08s817ms676.926us
ValueRate: 151.012us / second
Rate: 0.0242964 / second
Percentiles: 1%=603.621us; 5%=658.530us; 10%=701.807us; 20%=758.875us; 50%=001ms36.906us; 80%=003ms311.154us; 90%=027ms607.259us; 95%=043ms876.650us; 99%=062ms362.348us
Metric: TransferToServerTime
TotalSamples: 849389
Counter: 03d47h05m10s312ms821.345us
ValueRate: 05s075ms711.989us / second
Rate: 20.449 / second
Percentiles: 1%=001ms56.497us; 5%=001ms168.079us; 10%=001ms259.915us; 20%=001ms380.564us; 50%=002ms54.875us; 80%=912ms800.376us; 90%=990ms21.900us; 95%=01s062ms625.907us; 99%=01s102ms743.346us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 206924
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 152217401
Counter: CreateXlaTensor
Value: 992779289
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 152206894
Counter: DestroyXlaTensor
Value: 992773280
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 152210368
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23502
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1079
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 05:48:14.391722, device xla:8, step 0, Compiles=107, _local_scalar_dense=1079
validation/ 2019-08-27 05:48:14.401375, device xla:1, step 0, Compiles=107, _local_scalar_dense=1079
validation/ 2019-08-27 05:48:14.404147, device xla:6, step 0, Compiles=107, _local_scalar_dense=1079
validation/ 2019-08-27 05:48:14.406867, device xla:3, step 0, Compiles=107, _local_scalar_dense=1079
validation/ 2019-08-27 05:48:14.412243, device xla:2, step 0, Compiles=107, _local_scalar_dense=1079
validation/ 2019-08-27 05:48:14.417063, device xla:4, step 0, Compiles=107, _local_scalar_dense=1079
validation/ 2019-08-27 05:48:14.418621, device xla:7, step 0, Compiles=107, _local_scalar_dense=1079
validation/ 2019-08-27 05:48:14.420276, device xla:5, step 0, Compiles=107, _local_scalar_dense=1079
validation stats on subset "valid" - 2019-08-27 05:48:20.445164
| epoch 017 | valid on 'valid' subset | loss 3.828 | nll_loss 2.031 | ppl 4.09 | num_updates 25636
| epoch 017 | valid on 'valid' subset | loss 3.875 | nll_loss 2.047 | ppl 4.13 | num_updates 25636
| epoch 017 | valid on 'valid' subset | loss 3.953 | nll_loss 2.141 | ppl 4.41 | num_updates 25636
| epoch 017 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 25636
| epoch 017 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 25636
| epoch 017 | valid on 'valid' subset | loss 3.875 | nll_loss 2.078 | ppl 4.22 | num_updates 25636
| epoch 017 | valid on 'valid' subset | loss 3.906 | nll_loss 2.078 | ppl 4.22 | num_updates 25636
| epoch 017 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 25636
old learning rate: 0.00020358198187014258
new learning rate: 0.00019750353287604176
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 207152
Counter: 03d18h02m41s192ms171.545us
ValueRate: 06s070ms883.697us / second
Rate: 5.36115 / second
Percentiles: 1%=377ms117.268us; 5%=379ms810.588us; 10%=391ms929.327us; 20%=01s174ms97.718us; 50%=01s187ms605.861us; 80%=01s288ms650.135us; 90%=01s291ms246.062us; 95%=01s294ms594.681us; 99%=01s300ms611.151us
Metric: InboundData
TotalSamples: 1104
Counter: 2.14KB
ValueRate: 0.05B / second
Rate: 0.0258294 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 849628
Counter: 80.30GB
ValueRate: 952.06KB / second
Rate: 20.8073 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1875935
Counter: 10h23m50s802ms17.084us
ValueRate: 02s568ms679.920us / second
Rate: 52.7073 / second
Percentiles: 1%=437.675us; 5%=496.906us; 10%=531.703us; 20%=585.417us; 50%=756.962us; 80%=001ms257.391us; 90%=016ms619.206us; 95%=375ms15.472us; 99%=388ms659.278us
Metric: TransferFromServerTime
TotalSamples: 1104
Counter: 08s856ms839.838us
ValueRate: 145.210us / second
Rate: 0.0258294 / second
Percentiles: 1%=603.621us; 5%=658.530us; 10%=701.807us; 20%=756.815us; 50%=001ms27.196us; 80%=003ms157.830us; 90%=010ms920.459us; 95%=042ms121.967us; 99%=060ms996.139us
Metric: TransferToServerTime
TotalSamples: 849628
Counter: 03d47h06m36s056ms584.854us
ValueRate: 04s412ms686.674us / second
Rate: 21.2198 / second
Percentiles: 1%=001ms63.025us; 5%=001ms187.479us; 10%=001ms284.144us; 20%=001ms433.060us; 50%=002ms54.875us; 80%=236ms395.238us; 90%=968ms192.885us; 95%=01s008ms314.578us; 99%=01s078ms106.615us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 207045
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 152219009
Counter: CreateXlaTensor
Value: 992914106
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 152211976
Counter: DestroyXlaTensor
Value: 992908098
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 152211977
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23502
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1104
Epoch 18 begin 2019-08-27 05:48:20.464381
training torch.Size([256, 64])/ 2019-08-27 05:48:28.463906, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:48:28.569608, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([1024, 16])/ 2019-08-27 05:48:28.603925, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([1024, 16])/ 2019-08-27 05:48:28.634197, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:48:28.749335, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([1024, 16])/ 2019-08-27 05:48:28.764261, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:48:29.134826, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:48:29.339548, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:51:15.216349, device xla:1, step 100, Rate=61.46, Global Rate=298.93, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:51:15.221863, device xla:6, step 100, Rate=61.51, Global Rate=298.92, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:51:15.233313, device xla:2, step 100, Rate=61.40, Global Rate=298.90, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:51:15.226701, device xla:7, step 100, Rate=61.65, Global Rate=298.92, Compiles=107, _local_scalar_dense=1104
training torch.Size([1024, 16])/ 2019-08-27 05:51:15.238163, device xla:3, step 100, Rate=61.44, Global Rate=298.89, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:51:15.245081, device xla:4, step 100, Rate=61.51, Global Rate=298.88, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:51:15.258087, device xla:8, step 100, Rate=61.72, Global Rate=298.86, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:51:15.250273, device xla:5, step 100, Rate=61.46, Global Rate=298.87, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:53:59.672359, device xla:6, step 200, Rate=111.48, Global Rate=305.01, Compiles=107, _local_scalar_dense=1104
training torch.Size([1024, 16])/ 2019-08-27 05:53:59.677071, device xla:1, step 200, Rate=111.43, Global Rate=305.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:53:59.690688, device xla:8, step 200, Rate=111.65, Global Rate=304.99, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:53:59.695501, device xla:3, step 200, Rate=111.42, Global Rate=304.98, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:53:59.700632, device xla:2, step 200, Rate=111.38, Global Rate=304.98, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:53:59.682438, device xla:4, step 200, Rate=111.48, Global Rate=305.00, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:53:59.710104, device xla:7, step 200, Rate=111.58, Global Rate=304.97, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:53:59.717858, device xla:5, step 200, Rate=111.43, Global Rate=304.96, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:56:44.834431, device xla:6, step 300, Rate=151.18, Global Rate=306.65, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:56:44.839045, device xla:2, step 300, Rate=151.12, Global Rate=306.65, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:56:44.844772, device xla:1, step 300, Rate=151.14, Global Rate=306.65, Compiles=107, _local_scalar_dense=1104
training torch.Size([1024, 16])/ 2019-08-27 05:56:44.850054, device xla:7, step 300, Rate=151.27, Global Rate=306.64, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:56:44.866542, device xla:5, step 300, Rate=151.15, Global Rate=306.63, Compiles=107, _local_scalar_dense=1104
training torch.Size([1024, 16])/ 2019-08-27 05:56:44.857133, device xla:4, step 300, Rate=151.18, Global Rate=306.64, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:56:44.872198, device xla:3, step 300, Rate=151.13, Global Rate=306.63, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:56:44.879662, device xla:8, step 300, Rate=151.31, Global Rate=306.62, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:59:27.770305, device xla:8, step 400, Rate=183.91, Global Rate=308.51, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:59:27.774938, device xla:1, step 400, Rate=183.76, Global Rate=308.51, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:59:27.780060, device xla:3, step 400, Rate=183.76, Global Rate=308.51, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:59:27.789959, device xla:2, step 400, Rate=183.73, Global Rate=308.50, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 05:59:27.796140, device xla:5, step 400, Rate=183.77, Global Rate=308.50, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:59:27.802836, device xla:4, step 400, Rate=183.79, Global Rate=308.50, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:59:27.782572, device xla:7, step 400, Rate=183.86, Global Rate=308.51, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 05:59:27.817783, device xla:6, step 400, Rate=183.77, Global Rate=308.49, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:02:10.816149, device xla:8, step 500, Rate=209.93, Global Rate=309.60, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:02:10.822188, device xla:1, step 500, Rate=209.81, Global Rate=309.60, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:02:10.834516, device xla:2, step 500, Rate=209.79, Global Rate=309.59, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:02:10.839356, device xla:3, step 500, Rate=209.81, Global Rate=309.59, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:02:10.844370, device xla:6, step 500, Rate=209.83, Global Rate=309.59, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:02:10.827348, device xla:7, step 500, Rate=209.90, Global Rate=309.60, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:02:10.851758, device xla:4, step 500, Rate=209.83, Global Rate=309.59, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:02:10.861723, device xla:5, step 500, Rate=209.81, Global Rate=309.58, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:04:54.798478, device xla:5, step 600, Rate=230.31, Global Rate=310.03, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:04:54.803523, device xla:3, step 600, Rate=230.30, Global Rate=310.03, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:04:54.808029, device xla:8, step 600, Rate=230.39, Global Rate=310.03, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:04:54.827876, device xla:4, step 600, Rate=230.31, Global Rate=310.03, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:04:54.832954, device xla:6, step 600, Rate=230.31, Global Rate=310.02, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:04:54.819358, device xla:1, step 600, Rate=230.29, Global Rate=310.03, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:04:54.812642, device xla:2, step 600, Rate=230.28, Global Rate=310.03, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:04:54.841209, device xla:7, step 600, Rate=230.35, Global Rate=310.02, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:07:40.125368, device xla:6, step 700, Rate=246.20, Global Rate=309.99, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:07:40.129977, device xla:3, step 700, Rate=246.18, Global Rate=309.98, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:07:40.134613, device xla:2, step 700, Rate=246.16, Global Rate=309.98, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:07:40.147284, device xla:5, step 700, Rate=246.18, Global Rate=309.98, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:07:40.156413, device xla:4, step 700, Rate=246.19, Global Rate=309.98, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:07:40.166018, device xla:8, step 700, Rate=246.24, Global Rate=309.97, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:07:40.139697, device xla:7, step 700, Rate=246.23, Global Rate=309.98, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:07:40.178779, device xla:1, step 700, Rate=246.16, Global Rate=309.97, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:10:27.101193, device xla:2, step 800, Rate=258.26, Global Rate=309.56, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:10:27.112101, device xla:6, step 800, Rate=258.28, Global Rate=309.56, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:10:27.116645, device xla:3, step 800, Rate=258.26, Global Rate=309.56, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:10:27.105726, device xla:7, step 800, Rate=258.31, Global Rate=309.56, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:10:27.121711, device xla:8, step 800, Rate=258.32, Global Rate=309.56, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:10:27.140402, device xla:1, step 800, Rate=258.26, Global Rate=309.55, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:10:27.131581, device xla:5, step 800, Rate=258.27, Global Rate=309.55, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:10:27.146604, device xla:4, step 800, Rate=258.27, Global Rate=309.55, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:13:11.410409, device xla:3, step 900, Rate=268.94, Global Rate=309.79, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:13:11.414828, device xla:5, step 900, Rate=268.94, Global Rate=309.79, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:13:11.419924, device xla:8, step 900, Rate=268.98, Global Rate=309.79, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:13:11.426277, device xla:4, step 900, Rate=268.95, Global Rate=309.78, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:13:11.429424, device xla:2, step 900, Rate=268.92, Global Rate=309.78, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:13:11.453784, device xla:6, step 900, Rate=268.93, Global Rate=309.78, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:13:11.444377, device xla:7, step 900, Rate=268.96, Global Rate=309.78, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:13:11.434682, device xla:1, step 900, Rate=268.93, Global Rate=309.78, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:15:55.458907, device xla:1, step 1000, Rate=277.58, Global Rate=310.02, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:15:55.464248, device xla:5, step 1000, Rate=277.58, Global Rate=310.02, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:15:55.484808, device xla:3, step 1000, Rate=277.56, Global Rate=310.01, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:15:55.491490, device xla:6, step 1000, Rate=277.57, Global Rate=310.01, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:15:55.499112, device xla:8, step 1000, Rate=277.60, Global Rate=310.01, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:15:55.477465, device xla:7, step 1000, Rate=277.60, Global Rate=310.01, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:15:55.469319, device xla:4, step 1000, Rate=277.58, Global Rate=310.02, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:15:55.506601, device xla:2, step 1000, Rate=277.55, Global Rate=310.01, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:18:39.777796, device xla:3, step 1100, Rate=284.38, Global Rate=310.16, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:18:39.789714, device xla:6, step 1100, Rate=284.38, Global Rate=310.16, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:18:39.782634, device xla:7, step 1100, Rate=284.40, Global Rate=310.16, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:18:39.791030, device xla:5, step 1100, Rate=284.38, Global Rate=310.16, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:18:39.802135, device xla:1, step 1100, Rate=284.37, Global Rate=310.16, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:18:39.808128, device xla:8, step 1100, Rate=284.40, Global Rate=310.15, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:18:39.830478, device xla:2, step 1100, Rate=284.35, Global Rate=310.15, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:18:39.818553, device xla:4, step 1100, Rate=284.37, Global Rate=310.15, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:21:24.519222, device xla:2, step 1200, Rate=289.66, Global Rate=310.21, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:21:24.523655, device xla:3, step 1200, Rate=289.66, Global Rate=310.21, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:21:24.544003, device xla:8, step 1200, Rate=289.68, Global Rate=310.21, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:21:24.536827, device xla:7, step 1200, Rate=289.67, Global Rate=310.21, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:21:24.528922, device xla:4, step 1200, Rate=289.67, Global Rate=310.21, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:21:24.550389, device xla:1, step 1200, Rate=289.65, Global Rate=310.21, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:21:24.556739, device xla:6, step 1200, Rate=289.65, Global Rate=310.21, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:21:24.564032, device xla:5, step 1200, Rate=289.65, Global Rate=310.21, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:24:08.193562, device xla:3, step 1300, Rate=294.29, Global Rate=310.41, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:24:08.204827, device xla:8, step 1300, Rate=294.31, Global Rate=310.41, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:24:08.217757, device xla:2, step 1300, Rate=294.28, Global Rate=310.41, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:24:08.224658, device xla:6, step 1300, Rate=294.29, Global Rate=310.41, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:24:08.209437, device xla:4, step 1300, Rate=294.30, Global Rate=310.41, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:24:08.199679, device xla:1, step 1300, Rate=294.29, Global Rate=310.41, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:24:08.237877, device xla:7, step 1300, Rate=294.29, Global Rate=310.40, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:24:08.229789, device xla:5, step 1300, Rate=294.28, Global Rate=310.41, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:26:51.521417, device xla:3, step 1400, Rate=298.13, Global Rate=310.63, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:26:51.531047, device xla:8, step 1400, Rate=298.15, Global Rate=310.63, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:26:51.525854, device xla:5, step 1400, Rate=298.14, Global Rate=310.63, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:26:51.556453, device xla:6, step 1400, Rate=298.13, Global Rate=310.62, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:26:51.535823, device xla:2, step 1400, Rate=298.13, Global Rate=310.63, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:26:51.541358, device xla:1, step 1400, Rate=298.13, Global Rate=310.63, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:26:51.559233, device xla:7, step 1400, Rate=298.13, Global Rate=310.62, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:26:51.547005, device xla:4, step 1400, Rate=298.13, Global Rate=310.62, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:29:32.343376, device xla:6, step 1500, Rate=302.19, Global Rate=311.13, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:29:32.347754, device xla:1, step 1500, Rate=302.18, Global Rate=311.13, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:29:32.354301, device xla:5, step 1500, Rate=302.18, Global Rate=311.13, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:29:32.359605, device xla:2, step 1500, Rate=302.17, Global Rate=311.13, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:29:32.375495, device xla:3, step 1500, Rate=302.16, Global Rate=311.13, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:29:32.365469, device xla:4, step 1500, Rate=302.18, Global Rate=311.13, Compiles=107, _local_scalar_dense=1104
training torch.Size([512, 32])/ 2019-08-27 06:29:32.380454, device xla:7, step 1500, Rate=302.18, Global Rate=311.13, Compiles=107, _local_scalar_dense=1104
training torch.Size([256, 64])/ 2019-08-27 06:29:32.388104, device xla:8, step 1500, Rate=302.18, Global Rate=311.13, Compiles=107, _local_scalar_dense=1104
Epoch 18 Training stats:
device xla:1
| epoch 018 | loss 0.223 | nll_loss 0.223 | ppl 1.17 | wps 5909 | ups 1 | wpb 11150.301 | bsz 410.417 | num_updates 27144 | lr 0.000191939 | gnorm 0.064 | clip 0.000 | oom 0.000 | wall 51225 | train_wall 40103
device xla:2
| epoch 018 | loss 0.221 | nll_loss 0.221 | ppl 1.17 | wps 5920 | ups 1 | wpb 11172.605 | bsz 408.936 | num_updates 27144 | lr 0.000191939 | gnorm 0.062 | clip 0.000 | oom 0.000 | wall 51225 | train_wall 41357
device xla:3
| epoch 018 | loss 0.223 | nll_loss 0.223 | ppl 1.17 | wps 5888 | ups 1 | wpb 11110.644 | bsz 411.058 | num_updates 27144 | lr 0.000191939 | gnorm 0.066 | clip 0.000 | oom 0.000 | wall 51225 | train_wall 39670
device xla:4
| epoch 018 | loss 0.223 | nll_loss 0.223 | ppl 1.17 | wps 5898 | ups 1 | wpb 11129.531 | bsz 411.548 | num_updates 27144 | lr 0.000191939 | gnorm 0.066 | clip 0.000 | oom 0.000 | wall 51225 | train_wall 41374
device xla:5
| epoch 018 | loss 0.221 | nll_loss 0.221 | ppl 1.17 | wps 5918 | ups 1 | wpb 11168.314 | bsz 411.049 | num_updates 27144 | lr 0.000191939 | gnorm 0.063 | clip 0.000 | oom 0.000 | wall 51225 | train_wall 41209
device xla:6
| epoch 018 | loss 0.223 | nll_loss 0.223 | ppl 1.17 | wps 5905 | ups 1 | wpb 11144.138 | bsz 408.549 | num_updates 27144 | lr 0.000191939 | gnorm 0.063 | clip 0.000 | oom 0.000 | wall 51225 | train_wall 41307
device xla:7
| epoch 018 | loss 0.223 | nll_loss 0.223 | ppl 1.17 | wps 5915 | ups 1 | wpb 11162.430 | bsz 408.427 | num_updates 27144 | lr 0.000191939 | gnorm 0.063 | clip 0.000 | oom 0.000 | wall 51225 | train_wall 40169
device xla:8
| epoch 018 | loss 0.223 | nll_loss 0.223 | ppl 1.17 | wps 5899 | ups 1 | wpb 11131.794 | bsz 408.521 | num_updates 27144 | lr 0.000191939 | gnorm 0.068 | clip 0.000 | oom 0.000 | wall 51225 | train_wall 40254
Epoch 18 Tracker Rates:
Rate=297.82, Global Rate=310.95
Rate=297.86, Global Rate=310.95
Rate=297.92, Global Rate=310.95
Rate=297.89, Global Rate=310.95
Rate=297.85, Global Rate=310.95
Rate=297.81, Global Rate=310.95
Rate=297.95, Global Rate=310.95
Rate=297.98, Global Rate=310.95
Epoch 18 end 2019-08-27 06:29:46.955531
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 219216
Counter: 03d28h10m30s202ms674.172us
ValueRate: 06s180ms875.470us / second
Rate: 4.99961 / second
Percentiles: 1%=01s164ms768.638us; 5%=01s170ms340.242us; 10%=01s174ms329.165us; 20%=01s178ms641.745us; 50%=01s277ms443.605us; 80%=01s289ms182.826us; 90%=01s293ms251.693us; 95%=01s296ms226.716us; 99%=01s300ms450.307us
Metric: InboundData
TotalSamples: 1144
Counter: 2.22KB
ValueRate: 0.05B / second
Rate: 0.0243112 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 899567
Counter: 84.79GB
ValueRate: 508.54KB / second
Rate: 20.8316 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1984664
Counter: 10h07m18s476ms520.936us
ValueRate: 342ms375.548us / second
Rate: 44.9963 / second
Percentiles: 1%=436.233us; 5%=482.563us; 10%=513.407us; 20%=583.063us; 50%=782.435us; 80%=002ms251.901us; 90%=008ms899.202us; 95%=027ms994.816us; 99%=057ms668.825us
Metric: TransferFromServerTime
TotalSamples: 1144
Counter: 08s961ms977.773us
ValueRate: 125.411us / second
Rate: 0.0243112 / second
Percentiles: 1%=603.621us; 5%=657.665us; 10%=700.844us; 20%=756.633us; 50%=001ms10.043us; 80%=003ms969.819us; 90%=009ms102.990us; 95%=041ms111.152us; 99%=058ms699.169us
Metric: TransferToServerTime
TotalSamples: 899567
Counter: 03d56h21m33s163ms343.225us
ValueRate: 05s089ms290.328us / second
Rate: 20.833 / second
Percentiles: 1%=001ms71.220us; 5%=001ms190.350us; 10%=001ms273.456us; 20%=001ms385.438us; 50%=002ms975.123us; 80%=918ms327.308us; 90%=01s008ms981.652us; 95%=01s067ms593.517us; 99%=01s101ms82.854us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 219109
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 161171572
Counter: CreateXlaTensor
Value: 1051185750
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 161164540
Counter: DestroyXlaTensor
Value: 1051179742
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 161164540
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23587
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1144
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 06:29:50.732895, device xla:3, step 0, Compiles=107, _local_scalar_dense=1144
validation/ 2019-08-27 06:29:50.738488, device xla:5, step 0, Compiles=107, _local_scalar_dense=1144
validation/ 2019-08-27 06:29:50.740904, device xla:4, step 0, Compiles=107, _local_scalar_dense=1144
validation/ 2019-08-27 06:29:50.744104, device xla:7, step 0, Compiles=107, _local_scalar_dense=1144
validation/ 2019-08-27 06:29:50.745784, device xla:8, step 0, Compiles=107, _local_scalar_dense=1144
validation/ 2019-08-27 06:29:50.879263, device xla:6, step 0, Compiles=107, _local_scalar_dense=1144
validation/ 2019-08-27 06:29:50.883507, device xla:2, step 0, Compiles=107, _local_scalar_dense=1144
validation/ 2019-08-27 06:29:50.892338, device xla:1, step 0, Compiles=107, _local_scalar_dense=1144
validation stats on subset "valid" - 2019-08-27 06:29:56.880776
| epoch 018 | valid on 'valid' subset | loss 3.828 | nll_loss 2.031 | ppl 4.09 | num_updates 27144
| epoch 018 | valid on 'valid' subset | loss 3.875 | nll_loss 2.047 | ppl 4.13 | num_updates 27144
| epoch 018 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 27144
| epoch 018 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 27144
| epoch 018 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 27144
| epoch 018 | valid on 'valid' subset | loss 3.875 | nll_loss 2.078 | ppl 4.22 | num_updates 27144
| epoch 018 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 27144
| epoch 018 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 27144
old learning rate: 0.00019750353287604176
new learning rate: 0.0001919389332103661
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 219337
Counter: 03d28h10m16s371ms792.658us
ValueRate: 06s095ms688.139us / second
Rate: 5.36583 / second
Percentiles: 1%=377ms96.068us; 5%=379ms316.516us; 10%=392ms585.142us; 20%=01s174ms543.231us; 50%=01s191ms952.658us; 80%=01s288ms388.562us; 90%=01s293ms523.384us; 95%=01s296ms107.135us; 99%=01s300ms450.307us
Metric: InboundData
TotalSamples: 1169
Counter: 2.27KB
ValueRate: 0.05B / second
Rate: 0.0268232 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 899807
Counter: 84.83GB
ValueRate: 962.51KB / second
Rate: 21.0358 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 1985293
Counter: 10h08m49s586ms19.201us
ValueRate: 02s733ms952.274us / second
Rate: 55.8604 / second
Percentiles: 1%=437.151us; 5%=487.251us; 10%=511.114us; 20%=575.910us; 50%=758.539us; 80%=001ms339.119us; 90%=024ms514.695us; 95%=375ms272.099us; 99%=389ms537.429us
Metric: TransferFromServerTime
TotalSamples: 1169
Counter: 08s044ms665.127us
ValueRate: 125.995us / second
Rate: 0.0268232 / second
Percentiles: 1%=603.621us; 5%=657.759us; 10%=700.882us; 20%=758.054us; 50%=001ms13.244us; 80%=003ms811.717us; 90%=009ms562.985us; 95%=038ms992.273us; 99%=057ms984.905us
Metric: TransferToServerTime
TotalSamples: 899807
Counter: 03d56h21m00s444ms433.160us
ValueRate: 04s310ms588.480us / second
Rate: 21.0364 / second
Percentiles: 1%=001ms89.668us; 5%=001ms218.854us; 10%=001ms300.509us; 20%=001ms436.575us; 50%=002ms117.090us; 80%=246ms351.187us; 90%=968ms733.631us; 95%=01s011ms744.636us; 99%=01s084ms889.883us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 219230
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 161173181
Counter: CreateXlaTensor
Value: 1051320567
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 161166147
Counter: DestroyXlaTensor
Value: 1051314558
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 161166148
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23587
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1169
Epoch 19 begin 2019-08-27 06:29:56.904366
training torch.Size([1024, 16])/ 2019-08-27 06:30:04.752845, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:30:04.783617, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:30:04.870039, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:30:04.914825, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:30:05.273866, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:30:05.311787, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:30:05.473812, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:30:05.514034, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:32:52.317802, device xla:8, step 100, Rate=61.39, Global Rate=297.73, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:32:52.323582, device xla:1, step 100, Rate=61.11, Global Rate=297.72, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:32:52.329734, device xla:7, step 100, Rate=61.37, Global Rate=297.71, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:32:52.345475, device xla:4, step 100, Rate=61.16, Global Rate=297.68, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:32:52.352833, device xla:5, step 100, Rate=61.29, Global Rate=297.67, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:32:52.337066, device xla:6, step 100, Rate=61.31, Global Rate=297.70, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:32:52.373054, device xla:3, step 100, Rate=61.13, Global Rate=297.63, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:32:52.365391, device xla:2, step 100, Rate=61.10, Global Rate=297.64, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:35:36.107023, device xla:5, step 200, Rate=111.56, Global Rate=304.98, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:35:36.101670, device xla:7, step 200, Rate=111.62, Global Rate=304.99, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:35:36.111852, device xla:1, step 200, Rate=111.41, Global Rate=304.98, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:35:36.116948, device xla:2, step 200, Rate=111.42, Global Rate=304.97, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:35:36.119153, device xla:4, step 200, Rate=111.45, Global Rate=304.97, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:35:36.123946, device xla:8, step 200, Rate=111.62, Global Rate=304.97, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:35:36.133516, device xla:6, step 200, Rate=111.56, Global Rate=304.96, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:35:36.146580, device xla:3, step 200, Rate=111.43, Global Rate=304.94, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:38:18.922173, device xla:1, step 300, Rate=152.02, Global Rate=308.08, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:38:18.926740, device xla:4, step 300, Rate=152.06, Global Rate=308.08, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:38:18.937410, device xla:2, step 300, Rate=152.03, Global Rate=308.07, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:38:18.939098, device xla:5, step 300, Rate=152.14, Global Rate=308.07, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:38:18.931531, device xla:6, step 300, Rate=152.15, Global Rate=308.07, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:38:18.962680, device xla:8, step 300, Rate=152.18, Global Rate=308.06, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:38:18.956857, device xla:7, step 300, Rate=152.18, Global Rate=308.06, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:38:18.946968, device xla:3, step 300, Rate=152.04, Global Rate=308.06, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:41:03.052922, device xla:8, step 400, Rate=184.15, Global Rate=309.04, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:41:03.057339, device xla:2, step 400, Rate=184.01, Global Rate=309.03, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:41:03.073909, device xla:4, step 400, Rate=184.03, Global Rate=309.03, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:41:03.078866, device xla:5, step 400, Rate=184.10, Global Rate=309.03, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:41:03.047598, device xla:6, step 400, Rate=184.12, Global Rate=309.04, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:41:03.064610, device xla:3, step 400, Rate=184.03, Global Rate=309.03, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:41:03.090438, device xla:1, step 400, Rate=183.99, Global Rate=309.02, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:41:03.097718, device xla:7, step 400, Rate=184.13, Global Rate=309.02, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:43:47.753085, device xla:8, step 500, Rate=209.50, Global Rate=309.40, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:43:47.757386, device xla:5, step 500, Rate=209.46, Global Rate=309.40, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:43:47.761833, device xla:1, step 500, Rate=209.38, Global Rate=309.40, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:43:47.767761, device xla:3, step 500, Rate=209.40, Global Rate=309.40, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:43:47.776277, device xla:4, step 500, Rate=209.40, Global Rate=309.39, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:43:47.782746, device xla:2, step 500, Rate=209.37, Global Rate=309.39, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:43:47.790218, device xla:7, step 500, Rate=209.48, Global Rate=309.39, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:43:47.802173, device xla:6, step 500, Rate=209.45, Global Rate=309.38, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:46:31.257004, device xla:1, step 600, Rate=230.13, Global Rate=310.02, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:46:31.266515, device xla:8, step 600, Rate=230.22, Global Rate=310.02, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:46:31.272313, device xla:5, step 600, Rate=230.19, Global Rate=310.01, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:46:31.261394, device xla:7, step 600, Rate=230.22, Global Rate=310.02, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:46:31.274839, device xla:2, step 600, Rate=230.13, Global Rate=310.01, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:46:31.277651, device xla:4, step 600, Rate=230.15, Global Rate=310.01, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:46:31.285943, device xla:6, step 600, Rate=230.19, Global Rate=310.01, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:46:31.292684, device xla:3, step 600, Rate=230.14, Global Rate=310.01, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:49:15.106920, device xla:1, step 700, Rate=246.60, Global Rate=310.37, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:49:15.116739, device xla:4, step 700, Rate=246.62, Global Rate=310.37, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:49:15.122868, device xla:5, step 700, Rate=246.65, Global Rate=310.36, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:49:15.127910, device xla:8, step 700, Rate=246.67, Global Rate=310.36, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:49:15.134205, device xla:3, step 700, Rate=246.61, Global Rate=310.36, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:49:15.146142, device xla:2, step 700, Rate=246.59, Global Rate=310.36, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:49:15.138875, device xla:7, step 700, Rate=246.66, Global Rate=310.36, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:49:15.111518, device xla:6, step 700, Rate=246.66, Global Rate=310.37, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:51:58.066978, device xla:8, step 800, Rate=260.18, Global Rate=310.84, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:51:58.086554, device xla:4, step 800, Rate=260.13, Global Rate=310.84, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:51:58.078185, device xla:3, step 800, Rate=260.13, Global Rate=310.84, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:51:58.093258, device xla:1, step 800, Rate=260.11, Global Rate=310.83, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:51:58.098673, device xla:2, step 800, Rate=260.12, Global Rate=310.83, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:51:58.107716, device xla:5, step 800, Rate=260.15, Global Rate=310.83, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:51:58.101358, device xla:7, step 800, Rate=260.17, Global Rate=310.83, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:51:58.072902, device xla:6, step 800, Rate=260.16, Global Rate=310.84, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:54:45.336943, device xla:8, step 900, Rate=269.36, Global Rate=310.31, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:54:45.349343, device xla:4, step 900, Rate=269.32, Global Rate=310.30, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:54:45.355588, device xla:5, step 900, Rate=269.34, Global Rate=310.30, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:54:45.342711, device xla:2, step 900, Rate=269.32, Global Rate=310.30, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:54:45.366866, device xla:6, step 900, Rate=269.34, Global Rate=310.30, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:54:45.360821, device xla:7, step 900, Rate=269.36, Global Rate=310.30, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:54:45.374524, device xla:1, step 900, Rate=269.30, Global Rate=310.30, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:54:45.379768, device xla:3, step 900, Rate=269.31, Global Rate=310.30, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:57:27.686512, device xla:5, step 1000, Rate=278.56, Global Rate=310.80, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:57:27.691087, device xla:4, step 1000, Rate=278.54, Global Rate=310.80, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:57:27.695454, device xla:8, step 1000, Rate=278.56, Global Rate=310.80, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:57:27.681405, device xla:6, step 1000, Rate=278.56, Global Rate=310.81, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:57:27.721206, device xla:2, step 1000, Rate=278.52, Global Rate=310.80, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 06:57:27.700406, device xla:3, step 1000, Rate=278.53, Global Rate=310.80, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 06:57:27.713146, device xla:7, step 1000, Rate=278.56, Global Rate=310.80, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 06:57:27.725722, device xla:1, step 1000, Rate=278.51, Global Rate=310.80, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 07:00:11.623667, device xla:6, step 1100, Rate=285.31, Global Rate=310.94, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:00:11.629056, device xla:5, step 1100, Rate=285.31, Global Rate=310.94, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:00:11.633826, device xla:1, step 1100, Rate=285.29, Global Rate=310.94, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:00:11.647351, device xla:2, step 1100, Rate=285.28, Global Rate=310.94, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:00:11.658453, device xla:8, step 1100, Rate=285.30, Global Rate=310.94, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:00:11.638430, device xla:3, step 1100, Rate=285.29, Global Rate=310.94, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:00:11.668835, device xla:7, step 1100, Rate=285.30, Global Rate=310.93, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:00:11.675322, device xla:4, step 1100, Rate=285.27, Global Rate=310.93, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:02:54.407914, device xla:1, step 1200, Rate=291.14, Global Rate=311.24, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:02:54.412382, device xla:2, step 1200, Rate=291.14, Global Rate=311.24, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:02:54.428709, device xla:4, step 1200, Rate=291.14, Global Rate=311.23, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:02:54.429904, device xla:5, step 1200, Rate=291.14, Global Rate=311.23, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:02:54.419419, device xla:3, step 1200, Rate=291.14, Global Rate=311.23, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:02:54.441528, device xla:6, step 1200, Rate=291.14, Global Rate=311.23, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:02:54.448967, device xla:8, step 1200, Rate=291.14, Global Rate=311.23, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:02:54.435142, device xla:7, step 1200, Rate=291.15, Global Rate=311.23, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 07:05:38.596133, device xla:7, step 1300, Rate=295.30, Global Rate=311.28, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:05:38.601404, device xla:6, step 1300, Rate=295.29, Global Rate=311.28, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:05:38.606497, device xla:4, step 1300, Rate=295.28, Global Rate=311.28, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:05:38.611725, device xla:1, step 1300, Rate=295.27, Global Rate=311.28, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:05:38.617086, device xla:3, step 1300, Rate=295.27, Global Rate=311.28, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:05:38.627508, device xla:8, step 1300, Rate=295.29, Global Rate=311.28, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:05:38.632366, device xla:2, step 1300, Rate=295.27, Global Rate=311.28, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:05:38.644395, device xla:5, step 1300, Rate=295.27, Global Rate=311.28, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:08:20.877273, device xla:1, step 1400, Rate=299.32, Global Rate=311.58, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:08:20.896141, device xla:8, step 1400, Rate=299.33, Global Rate=311.58, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:08:20.882368, device xla:6, step 1400, Rate=299.33, Global Rate=311.58, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:08:20.908621, device xla:2, step 1400, Rate=299.32, Global Rate=311.58, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:08:20.902940, device xla:7, step 1400, Rate=299.33, Global Rate=311.58, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:08:20.887558, device xla:3, step 1400, Rate=299.32, Global Rate=311.58, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:08:20.912113, device xla:4, step 1400, Rate=299.32, Global Rate=311.58, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:08:20.917048, device xla:5, step 1400, Rate=299.32, Global Rate=311.58, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:10:59.379078, device xla:4, step 1500, Rate=304.07, Global Rate=312.32, Compiles=107, _local_scalar_dense=1169
training torch.Size([1024, 16])/ 2019-08-27 07:10:59.384849, device xla:1, step 1500, Rate=304.06, Global Rate=312.32, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:10:59.389413, device xla:8, step 1500, Rate=304.08, Global Rate=312.32, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:10:59.395037, device xla:6, step 1500, Rate=304.07, Global Rate=312.32, Compiles=107, _local_scalar_dense=1169
training torch.Size([512, 32])/ 2019-08-27 07:10:59.408244, device xla:5, step 1500, Rate=304.07, Global Rate=312.31, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:10:59.400590, device xla:2, step 1500, Rate=304.06, Global Rate=312.32, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:10:59.415874, device xla:3, step 1500, Rate=304.05, Global Rate=312.31, Compiles=107, _local_scalar_dense=1169
training torch.Size([256, 64])/ 2019-08-27 07:10:59.426185, device xla:7, step 1500, Rate=304.06, Global Rate=312.31, Compiles=107, _local_scalar_dense=1169
Epoch 19 Training stats:
device xla:1
| epoch 019 | loss 0.211 | nll_loss 0.211 | ppl 1.16 | wps 5947 | ups 1 | wpb 11148.162 | bsz 410.554 | num_updates 28652 | lr 0.00018682 | gnorm 0.061 | clip 0.000 | oom 0.000 | wall 53711 | train_wall 42152
device xla:2
| epoch 019 | loss 0.209 | nll_loss 0.209 | ppl 1.16 | wps 5959 | ups 1 | wpb 11170.204 | bsz 408.964 | num_updates 28652 | lr 0.00018682 | gnorm 0.058 | clip 0.000 | oom 0.000 | wall 53711 | train_wall 43400
device xla:3
| epoch 019 | loss 0.211 | nll_loss 0.211 | ppl 1.16 | wps 5925 | ups 1 | wpb 11107.807 | bsz 410.760 | num_updates 28652 | lr 0.00018682 | gnorm 0.063 | clip 0.000 | oom 0.000 | wall 53711 | train_wall 41709
device xla:4
| epoch 019 | loss 0.211 | nll_loss 0.211 | ppl 1.16 | wps 5934 | ups 1 | wpb 11123.943 | bsz 411.617 | num_updates 28652 | lr 0.00018682 | gnorm 0.062 | clip 0.000 | oom 0.000 | wall 53711 | train_wall 43419
device xla:5
| epoch 019 | loss 0.209 | nll_loss 0.209 | ppl 1.16 | wps 5960 | ups 1 | wpb 11173.416 | bsz 410.760 | num_updates 28652 | lr 0.00018682 | gnorm 0.059 | clip 0.000 | oom 0.000 | wall 53711 | train_wall 43262
device xla:6
| epoch 019 | loss 0.211 | nll_loss 0.211 | ppl 1.16 | wps 5947 | ups 1 | wpb 11147.790 | bsz 408.964 | num_updates 28652 | lr 0.00018682 | gnorm 0.060 | clip 0.000 | oom 0.000 | wall 53711 | train_wall 43366
device xla:7
| epoch 019 | loss 0.209 | nll_loss 0.209 | ppl 1.16 | wps 5955 | ups 1 | wpb 11163.635 | bsz 408.312 | num_updates 28652 | lr 0.00018682 | gnorm 0.060 | clip 0.000 | oom 0.000 | wall 53711 | train_wall 42222
device xla:8
| epoch 019 | loss 0.211 | nll_loss 0.211 | ppl 1.16 | wps 5940 | ups 1 | wpb 11134.779 | bsz 408.580 | num_updates 28652 | lr 0.00018682 | gnorm 0.064 | clip 0.000 | oom 0.000 | wall 53711 | train_wall 42306
Epoch 19 Tracker Rates:
Rate=301.72, Global Rate=312.20
Rate=301.79, Global Rate=312.20
Rate=301.84, Global Rate=312.20
Rate=301.70, Global Rate=312.20
Rate=301.82, Global Rate=312.20
Rate=301.77, Global Rate=312.20
Rate=301.89, Global Rate=312.20
Rate=301.75, Global Rate=312.20
Epoch 19 end 2019-08-27 07:11:13.395111
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 231401
Counter: 04d39h18m20s022ms40.463us
ValueRate: 06s232ms959.424us / second
Rate: 5.07279 / second
Percentiles: 1%=01s074ms500.839us; 5%=01s170ms488.222us; 10%=01s175ms640.964us; 20%=01s177ms486.244us; 50%=01s191ms8.093us; 80%=01s288ms838.977us; 90%=01s292ms442.072us; 95%=01s295ms80.845us; 99%=01s299ms379.010us
Metric: InboundData
TotalSamples: 1209
Counter: 2.34KB
ValueRate: 0.05B / second
Rate: 0.0251956 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 949547
Counter: 89.32GB
ValueRate: 499.73KB / second
Rate: 20.5135 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2093191
Counter: 11h16m27s981ms959.336us
ValueRate: 301ms256.675us / second
Rate: 43.2422 / second
Percentiles: 1%=436.320us; 5%=503.941us; 10%=542.882us; 20%=619.366us; 50%=853.910us; 80%=003ms455.789us; 90%=013ms942.730us; 95%=030ms765.769us; 99%=057ms217.231us
Metric: TransferFromServerTime
TotalSamples: 1209
Counter: 08s126ms549.539us
ValueRate: 107.247us / second
Rate: 0.0251956 / second
Percentiles: 1%=594.259us; 5%=653.945us; 10%=699.593us; 20%=753.414us; 50%=995.083us; 80%=003ms767.487us; 90%=008ms73.315us; 95%=032ms34.938us; 99%=054ms464.644us
Metric: TransferToServerTime
TotalSamples: 949547
Counter: 03d04h11m24s556ms681.243us
ValueRate: 05s131ms123.506us / second
Rate: 20.5142 / second
Percentiles: 1%=001ms57.973us; 5%=001ms150.925us; 10%=001ms240.028us; 20%=001ms358.617us; 50%=002ms489.675us; 80%=915ms744.219us; 90%=01s012ms4.952us; 95%=01s077ms144.593us; 99%=01s102ms993.580us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 231294
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 170125545
Counter: CreateXlaTensor
Value: 1109592211
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 170118512
Counter: DestroyXlaTensor
Value: 1109586202
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 170118512
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23652
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1209
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 07:11:17.228761, device xla:8, step 0, Compiles=107, _local_scalar_dense=1209
validation/ 2019-08-27 07:11:17.243186, device xla:5, step 0, Compiles=107, _local_scalar_dense=1209
validation/ 2019-08-27 07:11:17.246257, device xla:7, step 0, Compiles=107, _local_scalar_dense=1209
validation/ 2019-08-27 07:11:17.249287, device xla:1, step 0, Compiles=107, _local_scalar_dense=1209
validation/ 2019-08-27 07:11:17.251857, device xla:4, step 0, Compiles=107, _local_scalar_dense=1209
validation/ 2019-08-27 07:11:17.253891, device xla:2, step 0, Compiles=107, _local_scalar_dense=1209
validation/ 2019-08-27 07:11:17.378745, device xla:6, step 0, Compiles=107, _local_scalar_dense=1209
validation/ 2019-08-27 07:11:17.380961, device xla:3, step 0, Compiles=107, _local_scalar_dense=1209
validation stats on subset "valid" - 2019-08-27 07:11:23.323205
| epoch 019 | valid on 'valid' subset | loss 3.844 | nll_loss 2.031 | ppl 4.09 | num_updates 28652
| epoch 019 | valid on 'valid' subset | loss 3.875 | nll_loss 2.047 | ppl 4.13 | num_updates 28652
| epoch 019 | valid on 'valid' subset | loss 3.953 | nll_loss 2.141 | ppl 4.41 | num_updates 28652
| epoch 019 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 28652
| epoch 019 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 28652
| epoch 019 | valid on 'valid' subset | loss 3.875 | nll_loss 2.078 | ppl 4.22 | num_updates 28652
| epoch 019 | valid on 'valid' subset | loss 3.922 | nll_loss 2.078 | ppl 4.22 | num_updates 28652
| epoch 019 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 28652
old learning rate: 0.0001919389332103661
new learning rate: 0.00018681963909424865
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 231522
Counter: 04d39h19m06s280ms599.622us
ValueRate: 06s147ms127.144us / second
Rate: 5.4592 / second
Percentiles: 1%=377ms53.354us; 5%=378ms441.677us; 10%=392ms610.716us; 20%=01s174ms57.736us; 50%=01s184ms562.876us; 80%=01s287ms206.162us; 90%=01s292ms999.281us; 95%=01s295ms80.845us; 99%=01s299ms379.010us
Metric: InboundData
TotalSamples: 1234
Counter: 2.39KB
ValueRate: 0.05B / second
Rate: 0.0268378 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 949787
Counter: 89.37GB
ValueRate: 945.86KB / second
Rate: 20.6719 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2093808
Counter: 11h17m58s547ms76.560us
ValueRate: 02s734ms978.733us / second
Rate: 53.1633 / second
Percentiles: 1%=441.371us; 5%=507.963us; 10%=552.513us; 20%=624.428us; 50%=801.185us; 80%=001ms336.677us; 90%=022ms184.825us; 95%=375ms717.911us; 99%=389ms608.347us
Metric: TransferFromServerTime
TotalSamples: 1234
Counter: 08s167ms822.582us
ValueRate: 101.515us / second
Rate: 0.0268378 / second
Percentiles: 1%=594.259us; 5%=653.945us; 10%=699.593us; 20%=753.414us; 50%=991.612us; 80%=003ms610.584us; 90%=007ms602.884us; 95%=027ms146.779us; 99%=050ms817.210us
Metric: TransferToServerTime
TotalSamples: 949787
Counter: 03d04h12m51s145ms805.676us
ValueRate: 04s388ms508.270us / second
Rate: 20.6727 / second
Percentiles: 1%=001ms59.886us; 5%=001ms173.959us; 10%=001ms274.979us; 20%=001ms430.205us; 50%=002ms340.108us; 80%=252ms808.837us; 90%=967ms114.071us; 95%=01s046ms479.107us; 99%=01s093ms467.253us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 231415
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 170127154
Counter: CreateXlaTensor
Value: 1109727028
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 170120120
Counter: DestroyXlaTensor
Value: 1109721019
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 170120121
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23652
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1234
Epoch 20 begin 2019-08-27 07:11:23.346369
training torch.Size([256, 64])/ 2019-08-27 07:11:31.504387, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:11:31.522866, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:11:31.544144, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:11:31.601350, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:11:31.640388, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:11:31.669644, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:11:31.809085, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:11:32.384225, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:14:20.504446, device xla:3, step 100, Rate=60.60, Global Rate=294.82, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:14:20.532101, device xla:2, step 100, Rate=60.58, Global Rate=294.77, Compiles=107, _local_scalar_dense=1234
training torch.Size([1024, 16])/ 2019-08-27 07:14:20.524605, device xla:5, step 100, Rate=60.62, Global Rate=294.79, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:14:20.534805, device xla:1, step 100, Rate=60.63, Global Rate=294.77, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:14:20.543328, device xla:8, step 100, Rate=60.89, Global Rate=294.76, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:14:20.509093, device xla:7, step 100, Rate=60.70, Global Rate=294.81, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:14:20.554110, device xla:4, step 100, Rate=60.59, Global Rate=294.74, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:14:20.517253, device xla:6, step 100, Rate=60.65, Global Rate=294.80, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:17:06.719584, device xla:8, step 200, Rate=110.34, Global Rate=301.28, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:17:06.729738, device xla:5, step 200, Rate=110.11, Global Rate=301.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:17:06.756583, device xla:3, step 200, Rate=110.07, Global Rate=301.25, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:17:06.724310, device xla:2, step 200, Rate=110.08, Global Rate=301.28, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:17:06.762180, device xla:6, step 200, Rate=110.11, Global Rate=301.25, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:17:06.738185, device xla:7, step 200, Rate=110.16, Global Rate=301.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:17:06.750787, device xla:4, step 200, Rate=110.08, Global Rate=301.25, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:17:06.732319, device xla:1, step 200, Rate=110.12, Global Rate=301.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:19:51.100644, device xla:2, step 300, Rate=150.36, Global Rate=304.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:19:51.105594, device xla:4, step 300, Rate=150.37, Global Rate=304.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:19:51.110668, device xla:8, step 300, Rate=150.56, Global Rate=304.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:19:51.124937, device xla:6, step 300, Rate=150.39, Global Rate=304.59, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:19:51.115505, device xla:5, step 300, Rate=150.38, Global Rate=304.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:19:51.128013, device xla:1, step 300, Rate=150.38, Global Rate=304.59, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:19:51.154727, device xla:3, step 300, Rate=150.35, Global Rate=304.57, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:19:51.143587, device xla:7, step 300, Rate=150.41, Global Rate=304.58, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:22:37.235578, device xla:4, step 400, Rate=181.94, Global Rate=305.49, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:22:37.242091, device xla:6, step 400, Rate=181.96, Global Rate=305.49, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:22:37.229375, device xla:1, step 400, Rate=181.96, Global Rate=305.49, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:22:37.273628, device xla:5, step 400, Rate=181.93, Global Rate=305.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([1024, 16])/ 2019-08-27 07:22:37.252697, device xla:3, step 400, Rate=181.93, Global Rate=305.48, Compiles=107, _local_scalar_dense=1234
training torch.Size([1024, 16])/ 2019-08-27 07:22:37.259372, device xla:8, step 400, Rate=182.08, Global Rate=305.48, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:22:37.282187, device xla:7, step 400, Rate=181.97, Global Rate=305.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:22:37.264915, device xla:2, step 400, Rate=181.91, Global Rate=305.48, Compiles=107, _local_scalar_dense=1234
training torch.Size([1024, 16])/ 2019-08-27 07:25:22.806147, device xla:3, step 500, Rate=207.39, Global Rate=306.23, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:25:22.821228, device xla:4, step 500, Rate=207.39, Global Rate=306.23, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:25:22.843318, device xla:6, step 500, Rate=207.40, Global Rate=306.22, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:25:22.827418, device xla:5, step 500, Rate=207.40, Global Rate=306.22, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:25:22.811306, device xla:7, step 500, Rate=207.44, Global Rate=306.23, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:25:22.800441, device xla:2, step 500, Rate=207.39, Global Rate=306.23, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:25:22.836294, device xla:8, step 500, Rate=207.51, Global Rate=306.22, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:25:22.846863, device xla:1, step 500, Rate=207.39, Global Rate=306.22, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:28:08.859828, device xla:8, step 600, Rate=227.68, Global Rate=306.58, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:28:08.871432, device xla:4, step 600, Rate=227.58, Global Rate=306.58, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:28:08.893334, device xla:6, step 600, Rate=227.59, Global Rate=306.57, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:28:08.864402, device xla:5, step 600, Rate=227.59, Global Rate=306.58, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:28:08.878726, device xla:1, step 600, Rate=227.59, Global Rate=306.57, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:28:08.900573, device xla:7, step 600, Rate=227.60, Global Rate=306.57, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:28:08.887195, device xla:2, step 600, Rate=227.57, Global Rate=306.57, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:28:08.912811, device xla:3, step 600, Rate=227.56, Global Rate=306.56, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:30:52.864106, device xla:3, step 700, Rate=244.51, Global Rate=307.37, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:30:52.880137, device xla:6, step 700, Rate=244.51, Global Rate=307.36, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:30:52.894958, device xla:5, step 700, Rate=244.50, Global Rate=307.36, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:30:52.874525, device xla:4, step 700, Rate=244.50, Global Rate=307.37, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:30:52.882617, device xla:8, step 700, Rate=244.58, Global Rate=307.36, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:30:52.868940, device xla:1, step 700, Rate=244.51, Global Rate=307.37, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:30:52.887713, device xla:2, step 700, Rate=244.49, Global Rate=307.36, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:30:52.902586, device xla:7, step 700, Rate=244.52, Global Rate=307.36, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:33:37.663828, device xla:2, step 800, Rate=257.74, Global Rate=307.78, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:33:37.669264, device xla:7, step 800, Rate=257.76, Global Rate=307.78, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:33:37.658421, device xla:1, step 800, Rate=257.75, Global Rate=307.78, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:33:37.679732, device xla:3, step 800, Rate=257.74, Global Rate=307.78, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:33:37.694138, device xla:8, step 800, Rate=257.79, Global Rate=307.77, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:33:37.700915, device xla:4, step 800, Rate=257.73, Global Rate=307.77, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:33:37.707201, device xla:6, step 800, Rate=257.74, Global Rate=307.77, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:33:37.685883, device xla:5, step 800, Rate=257.74, Global Rate=307.77, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:36:22.547961, device xla:1, step 900, Rate=268.30, Global Rate=308.08, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:36:22.553598, device xla:4, step 900, Rate=268.30, Global Rate=308.08, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:36:22.567280, device xla:2, step 900, Rate=268.29, Global Rate=308.08, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:36:22.573241, device xla:3, step 900, Rate=268.29, Global Rate=308.08, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:36:22.585063, device xla:8, step 900, Rate=268.34, Global Rate=308.07, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:36:22.579664, device xla:7, step 900, Rate=268.31, Global Rate=308.07, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:36:22.588794, device xla:5, step 900, Rate=268.29, Global Rate=308.07, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:36:22.559374, device xla:6, step 900, Rate=268.31, Global Rate=308.08, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:39:07.702910, device xla:3, step 1000, Rate=276.64, Global Rate=308.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:39:07.707457, device xla:8, step 1000, Rate=276.68, Global Rate=308.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:39:07.718352, device xla:6, step 1000, Rate=276.65, Global Rate=308.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:39:07.720257, device xla:2, step 1000, Rate=276.63, Global Rate=308.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:39:07.712388, device xla:4, step 1000, Rate=276.64, Global Rate=308.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:39:07.725925, device xla:5, step 1000, Rate=276.64, Global Rate=308.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:39:07.743938, device xla:7, step 1000, Rate=276.64, Global Rate=308.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:39:07.733716, device xla:1, step 1000, Rate=276.63, Global Rate=308.27, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:41:52.598083, device xla:2, step 1100, Rate=283.41, Global Rate=308.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:41:52.603297, device xla:3, step 1100, Rate=283.41, Global Rate=308.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:41:52.615512, device xla:6, step 1100, Rate=283.42, Global Rate=308.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:41:52.632327, device xla:1, step 1100, Rate=283.41, Global Rate=308.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:41:52.617555, device xla:7, step 1100, Rate=283.42, Global Rate=308.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:41:52.638623, device xla:4, step 1100, Rate=283.40, Global Rate=308.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:41:52.651006, device xla:8, step 1100, Rate=283.43, Global Rate=308.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:41:52.607973, device xla:5, step 1100, Rate=283.42, Global Rate=308.47, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:44:36.894639, device xla:4, step 1200, Rate=289.06, Global Rate=308.74, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:44:36.907994, device xla:8, step 1200, Rate=289.08, Global Rate=308.73, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:44:36.930598, device xla:6, step 1200, Rate=289.05, Global Rate=308.73, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:44:36.899670, device xla:7, step 1200, Rate=289.07, Global Rate=308.73, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:44:36.932994, device xla:1, step 1200, Rate=289.05, Global Rate=308.73, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:44:36.924211, device xla:3, step 1200, Rate=289.05, Global Rate=308.73, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:44:36.914516, device xla:2, step 1200, Rate=289.05, Global Rate=308.73, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:44:36.937919, device xla:5, step 1200, Rate=289.05, Global Rate=308.73, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:47:21.484159, device xla:3, step 1300, Rate=293.46, Global Rate=308.91, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:47:21.494186, device xla:7, step 1300, Rate=293.47, Global Rate=308.91, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:47:21.504243, device xla:5, step 1300, Rate=293.46, Global Rate=308.91, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:47:21.512391, device xla:6, step 1300, Rate=293.46, Global Rate=308.91, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:47:21.488780, device xla:4, step 1300, Rate=293.46, Global Rate=308.91, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:47:21.506270, device xla:2, step 1300, Rate=293.45, Global Rate=308.91, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:47:21.517331, device xla:8, step 1300, Rate=293.48, Global Rate=308.91, Compiles=107, _local_scalar_dense=1234
training torch.Size([1024, 16])/ 2019-08-27 07:47:21.523914, device xla:1, step 1300, Rate=293.45, Global Rate=308.91, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:50:04.310343, device xla:3, step 1400, Rate=297.66, Global Rate=309.30, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:50:04.324228, device xla:5, step 1400, Rate=297.66, Global Rate=309.30, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:50:04.338911, device xla:2, step 1400, Rate=297.65, Global Rate=309.30, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:50:04.344986, device xla:8, step 1400, Rate=297.67, Global Rate=309.30, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:50:04.332219, device xla:1, step 1400, Rate=297.66, Global Rate=309.30, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:50:04.352364, device xla:6, step 1400, Rate=297.65, Global Rate=309.30, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:50:04.314833, device xla:7, step 1400, Rate=297.67, Global Rate=309.30, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:50:04.326123, device xla:4, step 1400, Rate=297.66, Global Rate=309.30, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:52:47.478933, device xla:3, step 1500, Rate=300.89, Global Rate=309.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:52:47.492675, device xla:8, step 1500, Rate=300.90, Global Rate=309.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:52:47.498532, device xla:6, step 1500, Rate=300.89, Global Rate=309.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([1024, 16])/ 2019-08-27 07:52:47.473525, device xla:4, step 1500, Rate=300.89, Global Rate=309.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([256, 64])/ 2019-08-27 07:52:47.468189, device xla:1, step 1500, Rate=300.90, Global Rate=309.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:52:47.484087, device xla:7, step 1500, Rate=300.89, Global Rate=309.60, Compiles=107, _local_scalar_dense=1234
training torch.Size([512, 32])/ 2019-08-27 07:52:47.511212, device xla:5, step 1500, Rate=300.88, Global Rate=309.59, Compiles=107, _local_scalar_dense=1234
training torch.Size([1024, 16])/ 2019-08-27 07:52:47.500738, device xla:2, step 1500, Rate=300.88, Global Rate=309.59, Compiles=107, _local_scalar_dense=1234
Epoch 20 Training stats:
device xla:1
| epoch 020 | loss 0.200 | nll_loss 0.200 | ppl 1.15 | wps 5980 | ups 1 | wpb 11146.971 | bsz 410.822 | num_updates 30160 | lr 0.000182089 | gnorm 0.058 | clip 0.000 | oom 0.000 | wall 56219 | train_wall 44201
device xla:2
| epoch 020 | loss 0.199 | nll_loss 0.199 | ppl 1.15 | wps 5990 | ups 1 | wpb 11166.434 | bsz 409.176 | num_updates 30160 | lr 0.000182089 | gnorm 0.055 | clip 0.000 | oom 0.000 | wall 56219 | train_wall 45467
device xla:3
| epoch 020 | loss 0.200 | nll_loss 0.200 | ppl 1.15 | wps 5957 | ups 1 | wpb 11104.990 | bsz 410.916 | num_updates 30160 | lr 0.000182089 | gnorm 0.060 | clip 0.000 | oom 0.000 | wall 56219 | train_wall 43764
device xla:4
| epoch 020 | loss 0.200 | nll_loss 0.200 | ppl 1.15 | wps 5968 | ups 1 | wpb 11123.705 | bsz 411.034 | num_updates 30160 | lr 0.000182089 | gnorm 0.059 | clip 0.000 | oom 0.000 | wall 56219 | train_wall 45476
device xla:5
| epoch 020 | loss 0.199 | nll_loss 0.199 | ppl 1.15 | wps 5995 | ups 1 | wpb 11175.191 | bsz 410.848 | num_updates 30160 | lr 0.000182089 | gnorm 0.056 | clip 0.000 | oom 0.000 | wall 56219 | train_wall 45309
device xla:6
| epoch 020 | loss 0.200 | nll_loss 0.200 | ppl 1.15 | wps 5981 | ups 1 | wpb 11148.722 | bsz 408.921 | num_updates 30160 | lr 0.000182089 | gnorm 0.057 | clip 0.000 | oom 0.000 | wall 56219 | train_wall 45412
device xla:7
| epoch 020 | loss 0.199 | nll_loss 0.199 | ppl 1.15 | wps 5992 | ups 1 | wpb 11169.026 | bsz 408.598 | num_updates 30160 | lr 0.000182089 | gnorm 0.057 | clip 0.000 | oom 0.000 | wall 56219 | train_wall 44271
device xla:8
| epoch 020 | loss 0.200 | nll_loss 0.200 | ppl 1.15 | wps 5973 | ups 1 | wpb 11134.745 | bsz 408.199 | num_updates 30160 | lr 0.000182089 | gnorm 0.061 | clip 0.000 | oom 0.000 | wall 56219 | train_wall 44361
Epoch 20 Tracker Rates:
Rate=297.78, Global Rate=309.46
Rate=297.89, Global Rate=309.46
Rate=297.81, Global Rate=309.46
Rate=297.79, Global Rate=309.46
Rate=297.93, Global Rate=309.46
Rate=297.89, Global Rate=309.46
Rate=297.83, Global Rate=309.46
Rate=297.88, Global Rate=309.46
Epoch 20 end 2019-08-27 07:53:01.825219
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 243586
Counter: 04d49h03m43s588ms973.440us
ValueRate: 06s106ms934.063us / second
Rate: 4.95619 / second
Percentiles: 1%=01s066ms283.539us; 5%=01s168ms909.536us; 10%=01s171ms460.999us; 20%=01s175ms432.729us; 50%=01s275ms385.466us; 80%=01s288ms900.207us; 90%=01s291ms962.338us; 95%=01s293ms361.390us; 99%=01s301ms333.110us
Metric: InboundData
TotalSamples: 1274
Counter: 2.47KB
ValueRate: 0.05B / second
Rate: 0.025195 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 999522
Counter: 93.86GB
ValueRate: 489.29KB / second
Rate: 20.0464 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2203056
Counter: 11h02m58s200ms677.297us
ValueRate: 408ms60.401us / second
Rate: 44.7626 / second
Percentiles: 1%=469.656us; 5%=547.287us; 10%=590.684us; 20%=661.477us; 50%=888.740us; 80%=003ms663.127us; 90%=010ms441.544us; 95%=028ms873.324us; 99%=062ms581.526us
Metric: TransferFromServerTime
TotalSamples: 1274
Counter: 08s236ms75.440us
ValueRate: 83.585us / second
Rate: 0.025195 / second
Percentiles: 1%=594.259us; 5%=652.040us; 10%=697.034us; 20%=751.790us; 50%=988.463us; 80%=003ms526.533us; 90%=006ms128.602us; 95%=010ms920.459us; 99%=047ms745.032us
Metric: TransferToServerTime
TotalSamples: 999522
Counter: 03d13h23m17s288ms334.609us
ValueRate: 05s982ms538.601us / second
Rate: 20.4565 / second
Percentiles: 1%=001ms77.640us; 5%=001ms195.104us; 10%=001ms295.077us; 20%=001ms442.095us; 50%=002ms186.611us; 80%=897ms689.451us; 90%=01s008ms97.193us; 95%=01s053ms271.800us; 99%=01s094ms96.749us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 243479
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 179079513
Counter: CreateXlaTensor
Value: 1167998672
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 179072480
Counter: DestroyXlaTensor
Value: 1167992663
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 179072480
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23712
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1274
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 07:53:05.845088, device xla:1, step 0, Compiles=107, _local_scalar_dense=1274
validation/ 2019-08-27 07:53:05.854512, device xla:2, step 0, Compiles=107, _local_scalar_dense=1274
validation/ 2019-08-27 07:53:05.863543, device xla:5, step 0, Compiles=107, _local_scalar_dense=1274
validation/ 2019-08-27 07:53:05.866281, device xla:6, step 0, Compiles=107, _local_scalar_dense=1274
validation/ 2019-08-27 07:53:05.869404, device xla:3, step 0, Compiles=107, _local_scalar_dense=1274
validation/ 2019-08-27 07:53:05.871280, device xla:4, step 0, Compiles=107, _local_scalar_dense=1274
validation/ 2019-08-27 07:53:05.999428, device xla:8, step 0, Compiles=107, _local_scalar_dense=1274
validation/ 2019-08-27 07:53:06.001075, device xla:7, step 0, Compiles=107, _local_scalar_dense=1274
validation stats on subset "valid" - 2019-08-27 07:53:11.920577
| epoch 020 | valid on 'valid' subset | loss 3.844 | nll_loss 2.031 | ppl 4.09 | num_updates 30160
| epoch 020 | valid on 'valid' subset | loss 3.875 | nll_loss 2.047 | ppl 4.13 | num_updates 30160
| epoch 020 | valid on 'valid' subset | loss 3.953 | nll_loss 2.141 | ppl 4.41 | num_updates 30160
| epoch 020 | valid on 'valid' subset | loss 3.922 | nll_loss 2.156 | ppl 4.46 | num_updates 30160
| epoch 020 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 30160
| epoch 020 | valid on 'valid' subset | loss 3.906 | nll_loss 2.078 | ppl 4.22 | num_updates 30160
| epoch 020 | valid on 'valid' subset | loss 3.922 | nll_loss 2.078 | ppl 4.22 | num_updates 30160
| epoch 020 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 30160
old learning rate: 0.00018681963909424865
new learning rate: 0.00018208926018230742
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 243707
Counter: 04d49h03m29s605ms802.897us
ValueRate: 06s014ms791.117us / second
Rate: 5.31183 / second
Percentiles: 1%=377ms811.196us; 5%=379ms879.957us; 10%=392ms500.488us; 20%=01s170ms405.645us; 50%=01s190ms965.080us; 80%=01s287ms90.148us; 90%=01s290ms83.635us; 95%=01s292ms236.338us; 99%=01s297ms185.559us
Metric: InboundData
TotalSamples: 1299
Counter: 2.52KB
ValueRate: 0.05B / second
Rate: 0.0268239 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 999762
Counter: 93.91GB
ValueRate: 925.52KB / second
Rate: 20.2272 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2203686
Counter: 11h02m29s382ms460.099us
ValueRate: 02s691ms963.107us / second
Rate: 52.4266 / second
Percentiles: 1%=442.176us; 5%=490.150us; 10%=518.104us; 20%=584.438us; 50%=777.224us; 80%=001ms336.523us; 90%=028ms143.110us; 95%=376ms508.720us; 99%=388ms434.141us
Metric: TransferFromServerTime
TotalSamples: 1299
Counter: 08s275ms808.320us
ValueRate: 81.448us / second
Rate: 0.0268239 / second
Percentiles: 1%=593.996us; 5%=651.566us; 10%=694.895us; 20%=751.722us; 50%=985.738us; 80%=002ms494.642us; 90%=005ms703.116us; 95%=009ms374.125us; 99%=043ms876.650us
Metric: TransferToServerTime
TotalSamples: 999762
Counter: 03d13h24m43s417ms733.075us
ValueRate: 04s177ms122.732us / second
Rate: 20.2273 / second
Percentiles: 1%=001ms87.322us; 5%=001ms256.807us; 10%=001ms359.619us; 20%=001ms489.172us; 50%=002ms166.552us; 80%=247ms701.749us; 90%=957ms752.890us; 95%=01s029ms663.104us; 99%=01s083ms368.187us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 243600
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 179081122
Counter: CreateXlaTensor
Value: 1168133489
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 179074088
Counter: DestroyXlaTensor
Value: 1168127480
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 179074089
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23712
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1299
Epoch 21 begin 2019-08-27 07:53:11.939430
training torch.Size([1024, 16])/ 2019-08-27 07:53:20.530377, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 07:53:20.608689, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 07:53:20.670089, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:53:20.733397, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 07:53:20.758096, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:53:20.979655, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:53:21.325578, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:53:21.433273, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:56:12.174880, device xla:5, step 100, Rate=59.73, Global Rate=289.88, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:56:12.179569, device xla:8, step 100, Rate=59.97, Global Rate=289.88, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:56:12.201651, device xla:1, step 100, Rate=59.65, Global Rate=289.84, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 07:56:12.186126, device xla:7, step 100, Rate=59.93, Global Rate=289.87, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:56:12.193005, device xla:6, step 100, Rate=59.81, Global Rate=289.85, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 07:56:12.205891, device xla:4, step 100, Rate=59.70, Global Rate=289.83, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:56:12.236286, device xla:3, step 100, Rate=59.66, Global Rate=289.78, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 07:56:12.221479, device xla:2, step 100, Rate=59.72, Global Rate=289.81, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 07:58:57.233060, device xla:7, step 200, Rate=109.99, Global Rate=299.70, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:58:57.237959, device xla:6, step 200, Rate=109.89, Global Rate=299.69, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:58:57.252891, device xla:8, step 200, Rate=110.01, Global Rate=299.68, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:58:57.244586, device xla:4, step 200, Rate=109.80, Global Rate=299.68, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:58:57.266187, device xla:2, step 200, Rate=109.82, Global Rate=299.66, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:58:57.272655, device xla:3, step 200, Rate=109.78, Global Rate=299.66, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 07:58:57.285232, device xla:5, step 200, Rate=109.80, Global Rate=299.65, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 07:58:57.258500, device xla:1, step 200, Rate=109.76, Global Rate=299.67, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:01:42.412489, device xla:7, step 300, Rate=149.98, Global Rate=303.04, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:01:42.417369, device xla:8, step 300, Rate=150.01, Global Rate=303.04, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:01:42.424140, device xla:1, step 300, Rate=149.81, Global Rate=303.03, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:01:42.440768, device xla:6, step 300, Rate=149.90, Global Rate=303.03, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:01:42.449819, device xla:3, step 300, Rate=149.82, Global Rate=303.02, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:01:42.428206, device xla:4, step 300, Rate=149.83, Global Rate=303.03, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:01:42.453400, device xla:5, step 300, Rate=149.84, Global Rate=303.02, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:01:42.463319, device xla:2, step 300, Rate=149.84, Global Rate=303.01, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:04:28.409759, device xla:3, step 400, Rate=181.55, Global Rate=304.37, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:04:28.420672, device xla:1, step 400, Rate=181.53, Global Rate=304.37, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:04:28.412360, device xla:5, step 400, Rate=181.57, Global Rate=304.37, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:04:28.423939, device xla:8, step 400, Rate=181.69, Global Rate=304.37, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:04:28.442580, device xla:2, step 400, Rate=181.57, Global Rate=304.36, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:04:28.402162, device xla:6, step 400, Rate=181.62, Global Rate=304.38, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:04:28.396766, device xla:7, step 400, Rate=181.68, Global Rate=304.38, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:04:28.431175, device xla:4, step 400, Rate=181.55, Global Rate=304.36, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:07:15.777136, device xla:3, step 500, Rate=206.43, Global Rate=304.68, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:07:15.780577, device xla:4, step 500, Rate=206.43, Global Rate=304.68, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:07:15.771279, device xla:7, step 500, Rate=206.52, Global Rate=304.68, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:07:15.765807, device xla:8, step 500, Rate=206.54, Global Rate=304.68, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:07:15.792288, device xla:2, step 500, Rate=206.44, Global Rate=304.67, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:07:15.799762, device xla:5, step 500, Rate=206.43, Global Rate=304.67, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:07:15.825756, device xla:6, step 500, Rate=206.46, Global Rate=304.66, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:07:15.807848, device xla:1, step 500, Rate=206.40, Global Rate=304.67, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:10:04.760215, device xla:7, step 600, Rate=225.81, Global Rate=304.40, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:10:04.768426, device xla:1, step 600, Rate=225.73, Global Rate=304.39, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:10:04.741036, device xla:6, step 600, Rate=225.79, Global Rate=304.40, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:10:04.747632, device xla:8, step 600, Rate=225.83, Global Rate=304.40, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:10:04.753326, device xla:5, step 600, Rate=225.76, Global Rate=304.40, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:10:04.793548, device xla:3, step 600, Rate=225.73, Global Rate=304.39, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:10:04.772341, device xla:2, step 600, Rate=225.75, Global Rate=304.39, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:10:04.779349, device xla:4, step 600, Rate=225.74, Global Rate=304.39, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:12:51.529994, device xla:1, step 700, Rate=241.99, Global Rate=304.77, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:12:51.517452, device xla:8, step 700, Rate=242.07, Global Rate=304.77, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:12:51.555688, device xla:3, step 700, Rate=241.99, Global Rate=304.76, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:12:51.533290, device xla:2, step 700, Rate=242.01, Global Rate=304.77, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:12:51.540275, device xla:7, step 700, Rate=242.05, Global Rate=304.77, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:12:51.522852, device xla:6, step 700, Rate=242.03, Global Rate=304.77, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:12:51.564028, device xla:4, step 700, Rate=241.99, Global Rate=304.76, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:12:51.548742, device xla:5, step 700, Rate=242.00, Global Rate=304.76, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:15:41.037345, device xla:3, step 800, Rate=254.01, Global Rate=304.43, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:15:41.040508, device xla:1, step 800, Rate=254.00, Global Rate=304.42, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:15:41.029135, device xla:5, step 800, Rate=254.02, Global Rate=304.43, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:15:41.023274, device xla:8, step 800, Rate=254.07, Global Rate=304.43, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:15:41.017822, device xla:7, step 800, Rate=254.06, Global Rate=304.43, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:15:41.056204, device xla:6, step 800, Rate=254.02, Global Rate=304.42, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:15:41.043951, device xla:4, step 800, Rate=254.01, Global Rate=304.42, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:15:41.064535, device xla:2, step 800, Rate=254.01, Global Rate=304.42, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:18:28.543788, device xla:1, step 900, Rate=264.33, Global Rate=304.56, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:18:28.535944, device xla:5, step 900, Rate=264.35, Global Rate=304.56, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:18:28.547570, device xla:4, step 900, Rate=264.34, Global Rate=304.56, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:18:28.560114, device xla:8, step 900, Rate=264.37, Global Rate=304.56, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:18:28.530443, device xla:7, step 900, Rate=264.38, Global Rate=304.56, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:18:28.570621, device xla:3, step 900, Rate=264.33, Global Rate=304.56, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:18:28.594002, device xla:2, step 900, Rate=264.33, Global Rate=304.55, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:18:28.586736, device xla:6, step 900, Rate=264.34, Global Rate=304.55, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:21:14.462568, device xla:3, step 1000, Rate=273.19, Global Rate=304.96, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:21:14.454514, device xla:6, step 1000, Rate=273.21, Global Rate=304.96, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:21:14.448803, device xla:8, step 1000, Rate=273.23, Global Rate=304.96, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:21:14.465503, device xla:1, step 1000, Rate=273.18, Global Rate=304.96, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:21:14.474923, device xla:4, step 1000, Rate=273.19, Global Rate=304.96, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:21:14.487181, device xla:2, step 1000, Rate=273.19, Global Rate=304.96, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:21:14.443252, device xla:7, step 1000, Rate=273.22, Global Rate=304.96, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:21:14.496468, device xla:5, step 1000, Rate=273.18, Global Rate=304.95, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:24:01.600331, device xla:3, step 1100, Rate=279.82, Global Rate=305.08, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:24:01.582158, device xla:1, step 1100, Rate=279.82, Global Rate=305.09, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:24:01.576404, device xla:7, step 1100, Rate=279.85, Global Rate=305.09, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:24:01.591972, device xla:5, step 1100, Rate=279.82, Global Rate=305.09, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:24:01.604138, device xla:6, step 1100, Rate=279.83, Global Rate=305.08, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:24:01.611418, device xla:2, step 1100, Rate=279.82, Global Rate=305.08, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:24:01.618706, device xla:4, step 1100, Rate=279.81, Global Rate=305.08, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:24:01.635613, device xla:8, step 1100, Rate=279.83, Global Rate=305.08, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:26:56.380281, device xla:1, step 1200, Rate=282.44, Global Rate=304.03, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:26:56.350225, device xla:6, step 1200, Rate=282.46, Global Rate=304.04, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:26:56.344971, device xla:8, step 1200, Rate=282.48, Global Rate=304.04, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:26:56.395688, device xla:5, step 1200, Rate=282.44, Global Rate=304.03, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:26:56.356748, device xla:3, step 1200, Rate=282.45, Global Rate=304.04, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:26:56.384351, device xla:4, step 1200, Rate=282.44, Global Rate=304.03, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:26:56.366192, device xla:2, step 1200, Rate=282.46, Global Rate=304.04, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:26:56.373125, device xla:7, step 1200, Rate=282.46, Global Rate=304.03, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:29:43.219658, device xla:6, step 1300, Rate=287.34, Global Rate=304.25, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:29:43.226266, device xla:3, step 1300, Rate=287.33, Global Rate=304.25, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:29:43.267943, device xla:1, step 1300, Rate=287.31, Global Rate=304.24, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:29:43.235610, device xla:2, step 1300, Rate=287.33, Global Rate=304.25, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:29:43.242438, device xla:7, step 1300, Rate=287.33, Global Rate=304.25, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:29:43.214374, device xla:8, step 1300, Rate=287.35, Global Rate=304.25, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:29:43.255266, device xla:4, step 1300, Rate=287.32, Global Rate=304.25, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:29:43.248485, device xla:5, step 1300, Rate=287.32, Global Rate=304.25, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:32:28.022161, device xla:6, step 1400, Rate=292.00, Global Rate=304.70, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:32:28.035139, device xla:7, step 1400, Rate=292.00, Global Rate=304.70, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:32:28.054418, device xla:3, step 1400, Rate=291.99, Global Rate=304.70, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:32:28.063906, device xla:4, step 1400, Rate=291.99, Global Rate=304.70, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:32:28.046928, device xla:2, step 1400, Rate=292.00, Global Rate=304.70, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:32:28.028548, device xla:5, step 1400, Rate=292.00, Global Rate=304.70, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:32:28.041109, device xla:8, step 1400, Rate=292.00, Global Rate=304.70, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:32:28.077566, device xla:1, step 1400, Rate=291.98, Global Rate=304.69, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:35:11.893302, device xla:8, step 1500, Rate=296.10, Global Rate=305.21, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:35:11.880433, device xla:7, step 1500, Rate=296.10, Global Rate=305.21, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:35:11.886095, device xla:2, step 1500, Rate=296.10, Global Rate=305.21, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:35:11.912145, device xla:3, step 1500, Rate=296.08, Global Rate=305.20, Compiles=107, _local_scalar_dense=1299
training torch.Size([512, 32])/ 2019-08-27 08:35:11.915171, device xla:1, step 1500, Rate=296.08, Global Rate=305.20, Compiles=107, _local_scalar_dense=1299
training torch.Size([1024, 16])/ 2019-08-27 08:35:11.897879, device xla:4, step 1500, Rate=296.09, Global Rate=305.20, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:35:11.926679, device xla:6, step 1500, Rate=296.08, Global Rate=305.20, Compiles=107, _local_scalar_dense=1299
training torch.Size([256, 64])/ 2019-08-27 08:35:11.935710, device xla:5, step 1500, Rate=296.08, Global Rate=305.20, Compiles=107, _local_scalar_dense=1299
Epoch 21 Training stats:
device xla:1
| epoch 021 | loss 0.190 | nll_loss 0.190 | ppl 1.14 | wps 6004 | ups 1 | wpb 11141.662 | bsz 411.178 | num_updates 31668 | lr 0.000177701 | gnorm 0.055 | clip 0.000 | oom 0.000 | wall 58764 | train_wall 46259
device xla:2
| epoch 021 | loss 0.189 | nll_loss 0.189 | ppl 1.14 | wps 6016 | ups 1 | wpb 11163.184 | bsz 408.931 | num_updates 31668 | lr 0.000177701 | gnorm 0.053 | clip 0.000 | oom 0.000 | wall 58764 | train_wall 47528
device xla:3
| epoch 021 | loss 0.190 | nll_loss 0.190 | ppl 1.14 | wps 5987 | ups 1 | wpb 11109.295 | bsz 411.024 | num_updates 31668 | lr 0.000177701 | gnorm 0.057 | clip 0.000 | oom 0.000 | wall 58764 | train_wall 45823
device xla:4
| epoch 021 | loss 0.190 | nll_loss 0.190 | ppl 1.14 | wps 5995 | ups 1 | wpb 11125.336 | bsz 411.178 | num_updates 31668 | lr 0.000177701 | gnorm 0.056 | clip 0.000 | oom 0.000 | wall 58764 | train_wall 47535
device xla:5
| epoch 021 | loss 0.189 | nll_loss 0.189 | ppl 1.14 | wps 6023 | ups 1 | wpb 11176.191 | bsz 410.661 | num_updates 31668 | lr 0.000177701 | gnorm 0.054 | clip 0.000 | oom 0.000 | wall 58764 | train_wall 47380
device xla:6
| epoch 021 | loss 0.190 | nll_loss 0.190 | ppl 1.14 | wps 6009 | ups 1 | wpb 11150.119 | bsz 409.165 | num_updates 31668 | lr 0.000177701 | gnorm 0.054 | clip 0.000 | oom 0.000 | wall 58764 | train_wall 47482
device xla:7
| epoch 021 | loss 0.189 | nll_loss 0.189 | ppl 1.14 | wps 6019 | ups 1 | wpb 11169.813 | bsz 408.163 | num_updates 31668 | lr 0.000177701 | gnorm 0.054 | clip 0.000 | oom 0.000 | wall 58764 | train_wall 46348
device xla:8
| epoch 021 | loss 0.190 | nll_loss 0.190 | ppl 1.14 | wps 6000 | ups 1 | wpb 11134.188 | bsz 408.219 | num_updates 31668 | lr 0.000177701 | gnorm 0.058 | clip 0.000 | oom 0.000 | wall 58764 | train_wall 46436
Epoch 21 Tracker Rates:
Rate=293.69, Global Rate=305.08
Rate=293.58, Global Rate=305.08
Rate=293.67, Global Rate=305.08
Rate=293.63, Global Rate=305.08
Rate=293.76, Global Rate=305.08
Rate=293.73, Global Rate=305.08
Rate=293.57, Global Rate=305.08
Rate=293.61, Global Rate=305.08
Epoch 21 end 2019-08-27 08:35:26.332443
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 255771
Counter: 04d59h11m26s587ms42.785us
ValueRate: 06s075ms950.553us / second
Rate: 4.92078 / second
Percentiles: 1%=01s166ms752.546us; 5%=01s173ms108.047us; 10%=01s176ms845.439us; 20%=01s180ms528.605us; 50%=01s275ms826.431us; 80%=01s291ms900.975us; 90%=01s294ms897.445us; 95%=01s295ms111.122us; 99%=01s300ms833.570us
Metric: InboundData
TotalSamples: 1339
Counter: 2.60KB
ValueRate: 0.05B / second
Rate: 0.0251602 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1049537
Counter: 98.40GB
ValueRate: 490.00KB / second
Rate: 20.0722 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2314253
Counter: 12h12m01s376ms679.064us
ValueRate: 390ms796.455us / second
Rate: 41.7636 / second
Percentiles: 1%=455.426us; 5%=509.839us; 10%=558.485us; 20%=631.236us; 50%=859.080us; 80%=003ms945.374us; 90%=013ms525.170us; 95%=028ms383.182us; 99%=061ms959.240us
Metric: TransferFromServerTime
TotalSamples: 1339
Counter: 08s341ms16.702us
ValueRate: 74.803us / second
Rate: 0.0251602 / second
Percentiles: 1%=593.996us; 5%=650.867us; 10%=694.439us; 20%=748.224us; 50%=984.423us; 80%=002ms490.312us; 90%=005ms748.393us; 95%=009ms374.125us; 99%=043ms876.650us
Metric: TransferToServerTime
TotalSamples: 1049537
Counter: 03d21h08m59s096ms476.447us
ValueRate: 05s923ms839.786us / second
Rate: 20.0724 / second
Percentiles: 1%=001ms58.912us; 5%=001ms188.151us; 10%=001ms276.363us; 20%=001ms403.390us; 50%=002ms471.711us; 80%=905ms940.228us; 90%=990ms516.343us; 95%=01s058ms899.523us; 99%=01s071ms770.674us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 255664
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 188033521
Counter: CreateXlaTensor
Value: 1226405133
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 188025052
Counter: DestroyXlaTensor
Value: 1226399124
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 188026488
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23787
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1339
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 08:35:30.781337, device xla:5, step 0, Compiles=107, _local_scalar_dense=1339
validation/ 2019-08-27 08:35:30.788810, device xla:3, step 0, Compiles=107, _local_scalar_dense=1339
validation/ 2019-08-27 08:35:30.795736, device xla:1, step 0, Compiles=107, _local_scalar_dense=1339
validation/ 2019-08-27 08:35:30.799335, device xla:7, step 0, Compiles=107, _local_scalar_dense=1339
validation/ 2019-08-27 08:35:30.801441, device xla:4, step 0, Compiles=107, _local_scalar_dense=1339
validation/ 2019-08-27 08:35:30.941608, device xla:8, step 0, Compiles=107, _local_scalar_dense=1339
validation/ 2019-08-27 08:35:30.949588, device xla:6, step 0, Compiles=107, _local_scalar_dense=1339
validation/ 2019-08-27 08:35:30.955365, device xla:2, step 0, Compiles=107, _local_scalar_dense=1339
validation stats on subset "valid" - 2019-08-27 08:35:36.892444
| epoch 021 | valid on 'valid' subset | loss 3.828 | nll_loss 2.031 | ppl 4.09 | num_updates 31668
| epoch 021 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 31668
| epoch 021 | valid on 'valid' subset | loss 3.922 | nll_loss 2.125 | ppl 4.36 | num_updates 31668
| epoch 021 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 31668
| epoch 021 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 31668
| epoch 021 | valid on 'valid' subset | loss 3.875 | nll_loss 2.078 | ppl 4.22 | num_updates 31668
| epoch 021 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 31668
| epoch 021 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 31668
old learning rate: 0.00018208926018230742
new learning rate: 0.00017770092229505826
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 255892
Counter: 04d60h12m12s579ms882.974us
ValueRate: 06s970ms970.202us / second
Rate: 5.26963 / second
Percentiles: 1%=377ms872.046us; 5%=378ms274.944us; 10%=392ms570.952us; 20%=01s175ms192.000us; 50%=01s187ms692.384us; 80%=01s290ms96.717us; 90%=01s293ms478.351us; 95%=01s295ms983.038us; 99%=01s299ms246.863us
Metric: InboundData
TotalSamples: 1364
Counter: 2.64KB
ValueRate: 0.05B / second
Rate: 0.0267839 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1049777
Counter: 98.44GB
ValueRate: 921.30KB / second
Rate: 20.1352 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2314837
Counter: 12h13m34s050ms981.608us
ValueRate: 02s598ms140.419us / second
Rate: 47.4606 / second
Percentiles: 1%=461.776us; 5%=516.467us; 10%=552.660us; 20%=617.426us; 50%=820.524us; 80%=002ms538.167us; 90%=028ms78.272us; 95%=375ms757.240us; 99%=388ms423.220us
Metric: TransferFromServerTime
TotalSamples: 1364
Counter: 08s384ms791.118us
ValueRate: 75.679us / second
Rate: 0.0267839 / second
Percentiles: 1%=593.996us; 5%=651.349us; 10%=694.895us; 20%=750.146us; 50%=984.609us; 80%=002ms489.320us; 90%=004ms98.478us; 95%=009ms972.049us; 99%=042ms495.394us
Metric: TransferToServerTime
TotalSamples: 1049777
Counter: 03d21h08m25s251ms665.205us
ValueRate: 04s142ms623.498us / second
Rate: 20.1349 / second
Percentiles: 1%=001ms95.502us; 5%=001ms231.289us; 10%=001ms323.184us; 20%=001ms483.193us; 50%=002ms460.328us; 80%=234ms25.234us; 90%=949ms168.234us; 95%=01s013ms724.880us; 99%=01s068ms753.425us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 255785
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 188035130
Counter: CreateXlaTensor
Value: 1226539950
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 188028096
Counter: DestroyXlaTensor
Value: 1226533941
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 188028097
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23787
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1364
Epoch 22 begin 2019-08-27 08:35:36.916448
training torch.Size([256, 64])/ 2019-08-27 08:35:46.093029, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:35:46.141071, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:35:46.178983, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:35:46.245533, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:35:46.583340, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 08:35:46.651947, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:35:46.684568, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:35:46.811586, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:38:40.319150, device xla:4, step 100, Rate=58.83, Global Rate=285.63, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:38:40.301567, device xla:6, step 100, Rate=58.97, Global Rate=285.65, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:38:40.307261, device xla:7, step 100, Rate=58.98, Global Rate=285.65, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:38:40.323065, device xla:2, step 100, Rate=58.79, Global Rate=285.62, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:38:40.340046, device xla:1, step 100, Rate=58.77, Global Rate=285.59, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:38:40.332634, device xla:5, step 100, Rate=58.94, Global Rate=285.61, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:38:40.296279, device xla:8, step 100, Rate=59.03, Global Rate=285.66, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:38:40.357746, device xla:3, step 100, Rate=58.79, Global Rate=285.56, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:41:28.439542, device xla:2, step 200, Rate=107.94, Global Rate=294.78, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:41:28.433100, device xla:5, step 200, Rate=108.06, Global Rate=294.79, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:41:28.467493, device xla:7, step 200, Rate=108.08, Global Rate=294.76, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:41:28.448787, device xla:1, step 200, Rate=107.93, Global Rate=294.77, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:41:28.483396, device xla:4, step 200, Rate=107.95, Global Rate=294.74, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:41:28.492962, device xla:8, step 200, Rate=108.10, Global Rate=294.74, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:41:28.460238, device xla:3, step 200, Rate=107.95, Global Rate=294.76, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:41:28.501116, device xla:6, step 200, Rate=108.06, Global Rate=294.73, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 08:44:16.344488, device xla:5, step 300, Rate=147.44, Global Rate=298.09, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:44:16.351532, device xla:4, step 300, Rate=147.36, Global Rate=298.09, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:44:16.372156, device xla:7, step 300, Rate=147.45, Global Rate=298.07, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:44:16.391067, device xla:3, step 300, Rate=147.34, Global Rate=298.06, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:44:16.383955, device xla:6, step 300, Rate=147.44, Global Rate=298.07, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:44:16.361944, device xla:2, step 300, Rate=147.33, Global Rate=298.08, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:44:16.399348, device xla:1, step 300, Rate=147.31, Global Rate=298.06, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:44:16.339014, device xla:8, step 300, Rate=147.49, Global Rate=298.09, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 08:47:03.427476, device xla:4, step 400, Rate=179.18, Global Rate=300.13, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:47:03.402795, device xla:6, step 400, Rate=179.26, Global Rate=300.14, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:47:03.414119, device xla:7, step 400, Rate=179.26, Global Rate=300.14, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:47:03.430520, device xla:3, step 400, Rate=179.17, Global Rate=300.13, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:47:03.449573, device xla:5, step 400, Rate=179.23, Global Rate=300.12, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 08:47:03.408355, device xla:8, step 400, Rate=179.28, Global Rate=300.14, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:47:03.437539, device xla:1, step 400, Rate=179.15, Global Rate=300.13, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:47:03.459741, device xla:2, step 400, Rate=179.15, Global Rate=300.12, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:49:49.096609, device xla:6, step 500, Rate=205.21, Global Rate=301.88, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:49:49.091349, device xla:5, step 500, Rate=205.20, Global Rate=301.88, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:49:49.107396, device xla:7, step 500, Rate=205.21, Global Rate=301.87, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 08:49:49.121770, device xla:8, step 500, Rate=205.22, Global Rate=301.87, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:49:49.111280, device xla:1, step 500, Rate=205.13, Global Rate=301.87, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:49:49.134694, device xla:2, step 500, Rate=205.13, Global Rate=301.86, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:49:49.127085, device xla:4, step 500, Rate=205.14, Global Rate=301.86, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:49:49.101432, device xla:3, step 500, Rate=205.15, Global Rate=301.87, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:52:35.872410, device xla:6, step 600, Rate=225.57, Global Rate=302.72, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:52:35.915422, device xla:7, step 600, Rate=225.56, Global Rate=302.70, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:52:35.883972, device xla:8, step 600, Rate=225.58, Global Rate=302.71, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:52:35.889774, device xla:1, step 600, Rate=225.50, Global Rate=302.71, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:52:35.920611, device xla:4, step 600, Rate=225.51, Global Rate=302.70, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:52:35.878015, device xla:5, step 600, Rate=225.56, Global Rate=302.72, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:52:35.931845, device xla:3, step 600, Rate=225.50, Global Rate=302.70, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 08:52:35.904309, device xla:2, step 600, Rate=225.50, Global Rate=302.71, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:55:21.305383, device xla:2, step 700, Rate=242.31, Global Rate=303.67, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:55:21.322120, device xla:7, step 700, Rate=242.35, Global Rate=303.66, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:55:21.333982, device xla:8, step 700, Rate=242.36, Global Rate=303.66, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:55:21.290951, device xla:6, step 700, Rate=242.36, Global Rate=303.67, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:55:21.314501, device xla:3, step 700, Rate=242.31, Global Rate=303.66, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:55:21.340889, device xla:1, step 700, Rate=242.29, Global Rate=303.66, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:55:21.296324, device xla:4, step 700, Rate=242.33, Global Rate=303.67, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:55:21.358573, device xla:5, step 700, Rate=242.33, Global Rate=303.65, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:58:10.139841, device xla:6, step 800, Rate=254.53, Global Rate=303.62, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:58:10.163248, device xla:4, step 800, Rate=254.50, Global Rate=303.61, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:58:10.153339, device xla:3, step 800, Rate=254.50, Global Rate=303.61, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:58:10.166660, device xla:1, step 800, Rate=254.49, Global Rate=303.61, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:58:10.145166, device xla:2, step 800, Rate=254.50, Global Rate=303.61, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:58:10.189915, device xla:7, step 800, Rate=254.52, Global Rate=303.60, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 08:58:10.184408, device xla:8, step 800, Rate=254.53, Global Rate=303.61, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 08:58:10.203050, device xla:5, step 800, Rate=254.51, Global Rate=303.60, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 09:00:57.525535, device xla:5, step 900, Rate=264.81, Global Rate=303.87, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:00:57.567000, device xla:8, step 900, Rate=264.80, Global Rate=303.86, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:00:57.531334, device xla:4, step 900, Rate=264.78, Global Rate=303.86, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:00:57.574341, device xla:1, step 900, Rate=264.76, Global Rate=303.86, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:00:57.590177, device xla:6, step 900, Rate=264.78, Global Rate=303.85, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 09:00:57.539652, device xla:7, step 900, Rate=264.81, Global Rate=303.86, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:00:57.550718, device xla:3, step 900, Rate=264.77, Global Rate=303.86, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:00:57.557302, device xla:2, step 900, Rate=264.77, Global Rate=303.86, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:03:45.845147, device xla:8, step 1000, Rate=272.69, Global Rate=303.90, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:03:45.855569, device xla:2, step 1000, Rate=272.66, Global Rate=303.90, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:03:45.879960, device xla:7, step 1000, Rate=272.67, Global Rate=303.89, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:03:45.891313, device xla:4, step 1000, Rate=272.65, Global Rate=303.89, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:03:45.863639, device xla:1, step 1000, Rate=272.66, Global Rate=303.89, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:03:45.906103, device xla:5, step 1000, Rate=272.66, Global Rate=303.89, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:03:45.850277, device xla:6, step 1000, Rate=272.68, Global Rate=303.90, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:03:45.917947, device xla:3, step 1000, Rate=272.64, Global Rate=303.88, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:06:34.144339, device xla:6, step 1100, Rate=278.99, Global Rate=303.93, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:06:34.156652, device xla:5, step 1100, Rate=278.99, Global Rate=303.92, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:06:34.149912, device xla:3, step 1100, Rate=278.98, Global Rate=303.93, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:06:34.165330, device xla:7, step 1100, Rate=278.99, Global Rate=303.92, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:06:34.178792, device xla:1, step 1100, Rate=278.96, Global Rate=303.92, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:06:34.185349, device xla:4, step 1100, Rate=278.96, Global Rate=303.92, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:06:34.194533, device xla:8, step 1100, Rate=278.98, Global Rate=303.92, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:06:34.200194, device xla:2, step 1100, Rate=278.95, Global Rate=303.92, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:09:21.853874, device xla:6, step 1200, Rate=284.25, Global Rate=304.04, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:09:21.870719, device xla:5, step 1200, Rate=284.25, Global Rate=304.04, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:09:21.865208, device xla:8, step 1200, Rate=284.26, Global Rate=304.04, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:09:21.895532, device xla:4, step 1200, Rate=284.23, Global Rate=304.03, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:09:21.859039, device xla:3, step 1200, Rate=284.24, Global Rate=304.04, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 09:09:21.877410, device xla:2, step 1200, Rate=284.23, Global Rate=304.04, Compiles=107, _local_scalar_dense=1364
training torch.Size([1024, 16])/ 2019-08-27 09:09:21.886626, device xla:7, step 1200, Rate=284.24, Global Rate=304.03, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:09:21.914479, device xla:1, step 1200, Rate=284.22, Global Rate=304.03, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:12:07.793356, device xla:1, step 1300, Rate=289.11, Global Rate=304.38, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:12:07.804675, device xla:7, step 1300, Rate=289.11, Global Rate=304.38, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:12:07.782737, device xla:6, step 1300, Rate=289.11, Global Rate=304.38, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:12:07.788036, device xla:8, step 1300, Rate=289.12, Global Rate=304.38, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:12:07.826715, device xla:2, step 1300, Rate=289.09, Global Rate=304.38, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:12:07.819992, device xla:3, step 1300, Rate=289.09, Global Rate=304.38, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:12:07.810605, device xla:4, step 1300, Rate=289.10, Global Rate=304.38, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:12:07.837230, device xla:5, step 1300, Rate=289.10, Global Rate=304.38, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:14:50.424688, device xla:6, step 1400, Rate=294.25, Global Rate=305.10, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:14:50.430017, device xla:8, step 1400, Rate=294.26, Global Rate=305.10, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:14:50.435306, device xla:4, step 1400, Rate=294.25, Global Rate=305.10, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:14:50.454915, device xla:1, step 1400, Rate=294.24, Global Rate=305.10, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:14:50.465877, device xla:5, step 1400, Rate=294.24, Global Rate=305.10, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:14:50.481600, device xla:3, step 1400, Rate=294.23, Global Rate=305.10, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:14:50.443173, device xla:7, step 1400, Rate=294.25, Global Rate=305.10, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:14:50.472611, device xla:2, step 1400, Rate=294.23, Global Rate=305.10, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:17:34.243833, device xla:6, step 1500, Rate=297.91, Global Rate=305.59, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:17:34.249352, device xla:5, step 1500, Rate=297.92, Global Rate=305.59, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:17:34.262355, device xla:8, step 1500, Rate=297.91, Global Rate=305.59, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:17:34.254594, device xla:3, step 1500, Rate=297.91, Global Rate=305.59, Compiles=107, _local_scalar_dense=1364
training torch.Size([512, 32])/ 2019-08-27 09:17:34.269303, device xla:7, step 1500, Rate=297.91, Global Rate=305.59, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:17:34.279626, device xla:2, step 1500, Rate=297.90, Global Rate=305.58, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:17:34.292321, device xla:1, step 1500, Rate=297.89, Global Rate=305.58, Compiles=107, _local_scalar_dense=1364
training torch.Size([256, 64])/ 2019-08-27 09:17:34.308255, device xla:4, step 1500, Rate=297.89, Global Rate=305.58, Compiles=107, _local_scalar_dense=1364
Epoch 22 Training stats:
device xla:1
| epoch 022 | loss 0.182 | nll_loss 0.182 | ppl 1.13 | wps 6028 | ups 1 | wpb 11139.688 | bsz 411.092 | num_updates 33176 | lr 0.000173615 | gnorm 0.052 | clip 0.000 | oom 0.000 | wall 61306 | train_wall 48318
device xla:2
| epoch 022 | loss 0.181 | nll_loss 0.181 | ppl 1.13 | wps 6041 | ups 1 | wpb 11162.835 | bsz 409.071 | num_updates 33176 | lr 0.000173615 | gnorm 0.050 | clip 0.000 | oom 0.000 | wall 61306 | train_wall 49593
device xla:3
| epoch 022 | loss 0.182 | nll_loss 0.182 | ppl 1.13 | wps 6011 | ups 1 | wpb 11108.630 | bsz 411.247 | num_updates 33176 | lr 0.000173615 | gnorm 0.054 | clip 0.000 | oom 0.000 | wall 61306 | train_wall 47882
device xla:4
| epoch 022 | loss 0.182 | nll_loss 0.182 | ppl 1.13 | wps 6021 | ups 1 | wpb 11125.569 | bsz 411.069 | num_updates 33176 | lr 0.000173615 | gnorm 0.054 | clip 0.000 | oom 0.000 | wall 61306 | train_wall 49599
device xla:5
| epoch 022 | loss 0.181 | nll_loss 0.181 | ppl 1.13 | wps 6048 | ups 1 | wpb 11175.961 | bsz 410.506 | num_updates 33176 | lr 0.000173615 | gnorm 0.051 | clip 0.000 | oom 0.000 | wall 61306 | train_wall 49447
device xla:6
| epoch 022 | loss 0.182 | nll_loss 0.182 | ppl 1.13 | wps 6036 | ups 1 | wpb 11154.002 | bsz 409.210 | num_updates 33176 | lr 0.000173615 | gnorm 0.052 | clip 0.000 | oom 0.000 | wall 61306 | train_wall 49557
device xla:7
| epoch 022 | loss 0.181 | nll_loss 0.181 | ppl 1.13 | wps 6047 | ups 1 | wpb 11173.759 | bsz 408.245 | num_updates 33176 | lr 0.000173615 | gnorm 0.052 | clip 0.000 | oom 0.000 | wall 61306 | train_wall 48405
device xla:8
| epoch 022 | loss 0.182 | nll_loss 0.182 | ppl 1.13 | wps 6023 | ups 1 | wpb 11129.338 | bsz 408.083 | num_updates 33176 | lr 0.000173615 | gnorm 0.055 | clip 0.000 | oom 0.000 | wall 61306 | train_wall 48503
Epoch 22 Tracker Rates:
Rate=295.80, Global Rate=305.48
Rate=295.76, Global Rate=305.48
Rate=295.66, Global Rate=305.48
Rate=295.86, Global Rate=305.48
Rate=295.65, Global Rate=305.48
Rate=295.62, Global Rate=305.48
Rate=295.72, Global Rate=305.48
Rate=295.69, Global Rate=305.48
Epoch 22 end 2019-08-27 09:17:48.542353
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 267956
Counter: 04d10h20m01s067ms994.479us
ValueRate: 06s063ms246.550us / second
Rate: 4.9501 / second
Percentiles: 1%=01s081ms784.175us; 5%=01s168ms471.460us; 10%=01s172ms708.781us; 20%=01s176ms80.760us; 50%=01s187ms764.076us; 80%=01s288ms227.367us; 90%=01s292ms0.189us; 95%=01s295ms904.073us; 99%=01s298ms765.756us
Metric: InboundData
TotalSamples: 1404
Counter: 2.72KB
ValueRate: 0.05B / second
Rate: 0.0251269 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1099530
Counter: 102.93GB
ValueRate: 506.31KB / second
Rate: 20.7769 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2424136
Counter: 12h22m45s004ms1.296us
ValueRate: 400ms497.693us / second
Rate: 44.484 / second
Percentiles: 1%=463.463us; 5%=516.454us; 10%=548.424us; 20%=612.456us; 50%=834.948us; 80%=002ms273.985us; 90%=011ms158.150us; 95%=028ms472.003us; 99%=055ms246.117us
Metric: TransferFromServerTime
TotalSamples: 1404
Counter: 08s469ms746.605us
ValueRate: 66.763us / second
Rate: 0.0251269 / second
Percentiles: 1%=593.996us; 5%=651.349us; 10%=694.895us; 20%=748.068us; 50%=980.102us; 80%=002ms444.341us; 90%=004ms70.538us; 95%=009ms854.104us; 99%=042ms402.917us
Metric: TransferToServerTime
TotalSamples: 1099530
Counter: 03d30h17m05s336ms858.007us
ValueRate: 05s870ms217.667us / second
Rate: 20.7798 / second
Percentiles: 1%=001ms60.078us; 5%=001ms164.076us; 10%=001ms251.525us; 20%=001ms394.238us; 50%=002ms345.779us; 80%=895ms671.673us; 90%=958ms30.409us; 95%=01s033ms984.048us; 99%=01s058ms9.421us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 267849
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 196987507
Counter: CreateXlaTensor
Value: 1284811594
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 196978457
Counter: DestroyXlaTensor
Value: 1284805585
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 196980474
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23842
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1404
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 09:17:53.647404, device xla:1, step 0, Compiles=107, _local_scalar_dense=1404
validation/ 2019-08-27 09:17:53.653002, device xla:8, step 0, Compiles=107, _local_scalar_dense=1404
validation/ 2019-08-27 09:17:53.662945, device xla:3, step 0, Compiles=107, _local_scalar_dense=1404
validation/ 2019-08-27 09:17:53.669992, device xla:6, step 0, Compiles=107, _local_scalar_dense=1404
validation/ 2019-08-27 09:17:53.673212, device xla:5, step 0, Compiles=107, _local_scalar_dense=1404
validation/ 2019-08-27 09:17:53.676461, device xla:7, step 0, Compiles=107, _local_scalar_dense=1404
validation/ 2019-08-27 09:17:53.679008, device xla:4, step 0, Compiles=107, _local_scalar_dense=1404
validation/ 2019-08-27 09:17:53.805290, device xla:2, step 0, Compiles=107, _local_scalar_dense=1404
validation stats on subset "valid" - 2019-08-27 09:17:59.714122
| epoch 022 | valid on 'valid' subset | loss 3.844 | nll_loss 2.031 | ppl 4.09 | num_updates 33176
| epoch 022 | valid on 'valid' subset | loss 3.875 | nll_loss 2.047 | ppl 4.13 | num_updates 33176
| epoch 022 | valid on 'valid' subset | loss 3.953 | nll_loss 2.109 | ppl 4.32 | num_updates 33176
| epoch 022 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 33176
| epoch 022 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 33176
| epoch 022 | valid on 'valid' subset | loss 3.875 | nll_loss 2.078 | ppl 4.22 | num_updates 33176
| epoch 022 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 33176
| epoch 022 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 33176
old learning rate: 0.00017770092229505826
new learning rate: 0.00017361529748723717
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 268077
Counter: 04d10h21m47s053ms534.203us
ValueRate: 06s938ms617.882us / second
Rate: 5.26442 / second
Percentiles: 1%=377ms769.243us; 5%=379ms569.355us; 10%=391ms56.123us; 20%=01s172ms664.724us; 50%=01s184ms706.995us; 80%=01s288ms2.316us; 90%=01s292ms675.744us; 95%=01s295ms763.129us; 99%=01s298ms765.756us
Metric: InboundData
TotalSamples: 1429
Counter: 2.77KB
ValueRate: 0.05B / second
Rate: 0.0268639 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1099770
Counter: 102.98GB
ValueRate: 929.53KB / second
Rate: 20.3148 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2424735
Counter: 12h22m16s151ms841.137us
ValueRate: 02s575ms559.634us / second
Rate: 48.7159 / second
Percentiles: 1%=451.964us; 5%=516.397us; 10%=551.094us; 20%=607.864us; 50%=822.141us; 80%=002ms515.962us; 90%=028ms12.655us; 95%=375ms612.531us; 99%=388ms925.352us
Metric: TransferFromServerTime
TotalSamples: 1429
Counter: 09s513ms652.912us
ValueRate: 69.806us / second
Rate: 0.0268639 / second
Percentiles: 1%=593.996us; 5%=651.349us; 10%=695.440us; 20%=748.809us; 50%=981.829us; 80%=002ms436.714us; 90%=004ms974.844us; 95%=009ms664.022us; 99%=042ms402.917us
Metric: TransferToServerTime
TotalSamples: 1099770
Counter: 03d30h18m31s899ms917.725us
ValueRate: 04s042ms125.956us / second
Rate: 20.3147 / second
Percentiles: 1%=001ms79.764us; 5%=001ms204.192us; 10%=001ms304.169us; 20%=001ms465.981us; 50%=002ms253.207us; 80%=226ms299.446us; 90%=950ms829.362us; 95%=995ms586.636us; 99%=01s056ms174.852us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 267970
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 196989116
Counter: CreateXlaTensor
Value: 1284946411
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 196982083
Counter: DestroyXlaTensor
Value: 1284940402
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 196982083
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23842
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1429
Epoch 23 begin 2019-08-27 09:17:59.737567
training torch.Size([512, 32])/ 2019-08-27 09:18:09.479995, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:18:09.500551, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:18:09.522005, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:18:09.582529, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:18:10.047853, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:18:10.075139, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:18:10.150390, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:18:10.208367, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:21:01.729132, device xla:6, step 100, Rate=59.65, Global Rate=288.27, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:21:01.736698, device xla:7, step 100, Rate=59.68, Global Rate=288.26, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:21:01.717477, device xla:3, step 100, Rate=59.47, Global Rate=288.29, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:21:01.722907, device xla:1, step 100, Rate=59.45, Global Rate=288.28, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:21:01.748377, device xla:4, step 100, Rate=59.48, Global Rate=288.24, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:21:01.756718, device xla:5, step 100, Rate=59.64, Global Rate=288.23, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:21:01.766449, device xla:8, step 100, Rate=59.69, Global Rate=288.21, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:21:01.775740, device xla:2, step 100, Rate=59.44, Global Rate=288.20, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:23:46.758191, device xla:6, step 200, Rate=109.77, Global Rate=298.86, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:23:46.775973, device xla:4, step 200, Rate=109.63, Global Rate=298.84, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:23:46.764169, device xla:5, step 200, Rate=109.77, Global Rate=298.85, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:23:46.784996, device xla:2, step 200, Rate=109.61, Global Rate=298.83, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:23:46.793724, device xla:7, step 200, Rate=109.78, Global Rate=298.83, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:23:46.769672, device xla:8, step 200, Rate=109.81, Global Rate=298.85, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:23:46.817182, device xla:1, step 200, Rate=109.59, Global Rate=298.80, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:23:46.809162, device xla:3, step 200, Rate=109.60, Global Rate=298.81, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:26:33.300420, device xla:4, step 300, Rate=149.20, Global Rate=301.66, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:26:33.292324, device xla:5, step 300, Rate=149.30, Global Rate=301.67, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:26:33.275076, device xla:6, step 300, Rate=149.31, Global Rate=301.68, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:26:33.315743, device xla:1, step 300, Rate=149.17, Global Rate=301.65, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:26:33.322565, device xla:7, step 300, Rate=149.32, Global Rate=301.65, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:26:33.307154, device xla:2, step 300, Rate=149.18, Global Rate=301.66, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:26:33.281240, device xla:3, step 300, Rate=149.19, Global Rate=301.67, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:26:33.286582, device xla:8, step 300, Rate=149.34, Global Rate=301.67, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:29:21.136072, device xla:3, step 400, Rate=180.36, Global Rate=302.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:29:21.140338, device xla:5, step 400, Rate=180.45, Global Rate=302.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:29:21.146250, device xla:1, step 400, Rate=180.35, Global Rate=302.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:29:21.173878, device xla:8, step 400, Rate=180.47, Global Rate=302.49, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:29:21.151638, device xla:2, step 400, Rate=180.35, Global Rate=302.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:29:21.160977, device xla:7, step 400, Rate=180.46, Global Rate=302.49, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:29:21.184208, device xla:6, step 400, Rate=180.44, Global Rate=302.48, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:29:21.193446, device xla:4, step 400, Rate=180.35, Global Rate=302.48, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:32:04.396613, device xla:6, step 500, Rate=207.09, Global Rate=304.66, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:32:04.407183, device xla:8, step 500, Rate=207.11, Global Rate=304.66, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:32:04.411983, device xla:3, step 500, Rate=207.00, Global Rate=304.66, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:32:04.418400, device xla:5, step 500, Rate=207.08, Global Rate=304.65, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:32:04.401860, device xla:4, step 500, Rate=207.02, Global Rate=304.66, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:32:04.436688, device xla:2, step 500, Rate=207.00, Global Rate=304.65, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:32:04.424168, device xla:7, step 500, Rate=207.09, Global Rate=304.65, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:32:04.448934, device xla:1, step 500, Rate=206.99, Global Rate=304.64, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:34:49.469143, device xla:8, step 600, Rate=227.72, Global Rate=305.57, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:34:49.475082, device xla:4, step 600, Rate=227.65, Global Rate=305.56, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:34:49.492284, device xla:5, step 600, Rate=227.69, Global Rate=305.56, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:34:49.496876, device xla:7, step 600, Rate=227.71, Global Rate=305.56, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:34:49.486202, device xla:1, step 600, Rate=227.64, Global Rate=305.56, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:34:49.510842, device xla:3, step 600, Rate=227.63, Global Rate=305.55, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:34:49.480372, device xla:6, step 600, Rate=227.70, Global Rate=305.56, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:34:49.517691, device xla:2, step 600, Rate=227.63, Global Rate=305.55, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:37:33.850020, device xla:8, step 700, Rate=244.47, Global Rate=306.40, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:37:33.855951, device xla:3, step 700, Rate=244.41, Global Rate=306.39, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:37:33.872965, device xla:7, step 700, Rate=244.46, Global Rate=306.39, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:37:33.861049, device xla:6, step 700, Rate=244.46, Global Rate=306.39, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:37:33.890871, device xla:2, step 700, Rate=244.40, Global Rate=306.38, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:37:33.884968, device xla:4, step 700, Rate=244.40, Global Rate=306.39, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:37:33.866547, device xla:5, step 700, Rate=244.45, Global Rate=306.39, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:37:33.898727, device xla:1, step 700, Rate=244.39, Global Rate=306.38, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:40:17.044891, device xla:8, step 800, Rate=258.33, Global Rate=307.29, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:40:17.054899, device xla:5, step 800, Rate=258.31, Global Rate=307.29, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:40:17.056630, device xla:3, step 800, Rate=258.27, Global Rate=307.29, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:40:17.063162, device xla:1, step 800, Rate=258.27, Global Rate=307.29, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:40:17.049423, device xla:4, step 800, Rate=258.28, Global Rate=307.29, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:40:17.066963, device xla:6, step 800, Rate=258.31, Global Rate=307.29, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:40:17.086033, device xla:7, step 800, Rate=258.31, Global Rate=307.28, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:40:17.074435, device xla:2, step 800, Rate=258.27, Global Rate=307.29, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:43:01.408332, device xla:3, step 900, Rate=268.92, Global Rate=307.76, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:43:01.412783, device xla:6, step 900, Rate=268.95, Global Rate=307.76, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:43:01.425452, device xla:8, step 900, Rate=268.95, Global Rate=307.75, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:43:01.432257, device xla:5, step 900, Rate=268.94, Global Rate=307.75, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:43:01.420186, device xla:4, step 900, Rate=268.92, Global Rate=307.75, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:43:01.444091, device xla:7, step 900, Rate=268.95, Global Rate=307.75, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:43:01.437918, device xla:1, step 900, Rate=268.91, Global Rate=307.75, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:43:01.446662, device xla:2, step 900, Rate=268.91, Global Rate=307.75, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:45:46.333961, device xla:5, step 1000, Rate=277.25, Global Rate=308.02, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:45:46.338104, device xla:4, step 1000, Rate=277.23, Global Rate=308.02, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:45:46.343062, device xla:3, step 1000, Rate=277.22, Global Rate=308.02, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:45:46.347640, device xla:7, step 1000, Rate=277.26, Global Rate=308.02, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:45:46.358236, device xla:2, step 1000, Rate=277.22, Global Rate=308.02, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:45:46.372746, device xla:1, step 1000, Rate=277.22, Global Rate=308.02, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:45:46.366328, device xla:8, step 1000, Rate=277.25, Global Rate=308.02, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:45:46.380518, device xla:6, step 1000, Rate=277.24, Global Rate=308.01, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:48:31.054369, device xla:3, step 1100, Rate=283.95, Global Rate=308.28, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:48:31.059153, device xla:5, step 1100, Rate=283.97, Global Rate=308.28, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:48:31.063837, device xla:1, step 1100, Rate=283.95, Global Rate=308.27, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:48:31.075030, device xla:6, step 1100, Rate=283.96, Global Rate=308.27, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:48:31.080563, device xla:7, step 1100, Rate=283.97, Global Rate=308.27, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:48:31.069577, device xla:4, step 1100, Rate=283.95, Global Rate=308.27, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:48:31.085762, device xla:8, step 1100, Rate=283.96, Global Rate=308.27, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:48:31.100468, device xla:2, step 1100, Rate=283.94, Global Rate=308.27, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:51:16.414448, device xla:3, step 1200, Rate=289.08, Global Rate=308.39, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:51:16.425825, device xla:5, step 1200, Rate=289.10, Global Rate=308.39, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:51:16.419158, device xla:2, step 1200, Rate=289.09, Global Rate=308.39, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:51:16.440316, device xla:8, step 1200, Rate=289.10, Global Rate=308.38, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:51:16.431908, device xla:7, step 1200, Rate=289.10, Global Rate=308.39, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:51:16.452721, device xla:1, step 1200, Rate=289.07, Global Rate=308.38, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:51:16.446958, device xla:4, step 1200, Rate=289.08, Global Rate=308.38, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:51:16.460642, device xla:6, step 1200, Rate=289.09, Global Rate=308.38, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:54:01.647743, device xla:4, step 1300, Rate=293.25, Global Rate=308.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:54:01.652879, device xla:6, step 1300, Rate=293.26, Global Rate=308.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:54:01.674764, device xla:7, step 1300, Rate=293.25, Global Rate=308.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:54:01.679650, device xla:8, step 1300, Rate=293.25, Global Rate=308.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:54:01.660676, device xla:2, step 1300, Rate=293.24, Global Rate=308.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:54:01.667853, device xla:3, step 1300, Rate=293.23, Global Rate=308.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:54:01.686479, device xla:1, step 1300, Rate=293.23, Global Rate=308.50, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:54:01.707218, device xla:5, step 1300, Rate=293.23, Global Rate=308.49, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:56:45.290008, device xla:1, step 1400, Rate=297.18, Global Rate=308.81, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:56:45.295222, device xla:6, step 1400, Rate=297.18, Global Rate=308.81, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:56:45.300412, device xla:3, step 1400, Rate=297.17, Global Rate=308.81, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:56:45.310693, device xla:4, step 1400, Rate=297.16, Global Rate=308.81, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:56:45.305170, device xla:8, step 1400, Rate=297.18, Global Rate=308.81, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:56:45.316506, device xla:7, step 1400, Rate=297.18, Global Rate=308.81, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:56:45.325248, device xla:2, step 1400, Rate=297.16, Global Rate=308.80, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:56:45.337164, device xla:5, step 1400, Rate=297.17, Global Rate=308.80, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:59:26.654436, device xla:3, step 1500, Rate=301.20, Global Rate=309.36, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:59:26.665470, device xla:5, step 1500, Rate=301.21, Global Rate=309.36, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:59:26.658983, device xla:2, step 1500, Rate=301.20, Global Rate=309.36, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:59:26.676703, device xla:4, step 1500, Rate=301.19, Global Rate=309.36, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:59:26.699399, device xla:8, step 1500, Rate=301.19, Global Rate=309.36, Compiles=107, _local_scalar_dense=1429
training torch.Size([1024, 16])/ 2019-08-27 09:59:26.682546, device xla:6, step 1500, Rate=301.20, Global Rate=309.36, Compiles=107, _local_scalar_dense=1429
training torch.Size([512, 32])/ 2019-08-27 09:59:26.688366, device xla:7, step 1500, Rate=301.20, Global Rate=309.36, Compiles=107, _local_scalar_dense=1429
training torch.Size([256, 64])/ 2019-08-27 09:59:26.670181, device xla:1, step 1500, Rate=301.19, Global Rate=309.36, Compiles=107, _local_scalar_dense=1429
Epoch 23 Training stats:
device xla:1
| epoch 023 | loss 0.174 | nll_loss 0.174 | ppl 1.13 | wps 6055 | ups 1 | wpb 11141.472 | bsz 411.088 | num_updates 34684 | lr 0.000169799 | gnorm 0.051 | clip 0.000 | oom 0.000 | wall 63818 | train_wall 50369
device xla:2
| epoch 023 | loss 0.173 | nll_loss 0.173 | ppl 1.13 | wps 6065 | ups 1 | wpb 11159.079 | bsz 409.058 | num_updates 34684 | lr 0.000169799 | gnorm 0.048 | clip 0.000 | oom 0.000 | wall 63818 | train_wall 51649
device xla:3
| epoch 023 | loss 0.174 | nll_loss 0.174 | ppl 1.13 | wps 6039 | ups 1 | wpb 11111.423 | bsz 411.236 | num_updates 34684 | lr 0.000169799 | gnorm 0.052 | clip 0.000 | oom 0.000 | wall 63818 | train_wall 49934
device xla:4
| epoch 023 | loss 0.174 | nll_loss 0.174 | ppl 1.13 | wps 6047 | ups 1 | wpb 11126.343 | bsz 411.523 | num_updates 34684 | lr 0.000169799 | gnorm 0.052 | clip 0.000 | oom 0.000 | wall 63818 | train_wall 51656
device xla:5
| epoch 023 | loss 0.173 | nll_loss 0.173 | ppl 1.13 | wps 6074 | ups 1 | wpb 11175.407 | bsz 410.431 | num_updates 34684 | lr 0.000169799 | gnorm 0.049 | clip 0.000 | oom 0.000 | wall 63818 | train_wall 51506
device xla:6
| epoch 023 | loss 0.174 | nll_loss 0.174 | ppl 1.13 | wps 6061 | ups 1 | wpb 11151.749 | bsz 408.756 | num_updates 34684 | lr 0.000169799 | gnorm 0.050 | clip 0.000 | oom 0.000 | wall 63818 | train_wall 51611
device xla:7
| epoch 023 | loss 0.173 | nll_loss 0.173 | ppl 1.13 | wps 6073 | ups 1 | wpb 11174.583 | bsz 408.209 | num_updates 34684 | lr 0.000169799 | gnorm 0.050 | clip 0.000 | oom 0.000 | wall 63818 | train_wall 50446
device xla:8
| epoch 023 | loss 0.174 | nll_loss 0.174 | ppl 1.13 | wps 6049 | ups 1 | wpb 11129.781 | bsz 408.217 | num_updates 34684 | lr 0.000169799 | gnorm 0.053 | clip 0.000 | oom 0.000 | wall 63818 | train_wall 50565
Epoch 23 Tracker Rates:
Rate=299.04, Global Rate=309.25
Rate=299.00, Global Rate=309.25
Rate=298.98, Global Rate=309.25
Rate=299.07, Global Rate=309.25
Rate=299.03, Global Rate=309.25
Rate=299.09, Global Rate=309.25
Rate=299.12, Global Rate=309.25
Rate=299.16, Global Rate=309.25
Epoch 23 end 2019-08-27 09:59:40.773111
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 280141
Counter: 04d20h04m30s790ms378.352us
ValueRate: 06s113ms760.853us / second
Rate: 4.99223 / second
Percentiles: 1%=01s163ms630.681us; 5%=01s170ms571.607us; 10%=01s171ms465.048us; 20%=01s175ms292.748us; 50%=01s184ms291.507us; 80%=01s287ms699.552us; 90%=01s291ms953.105us; 95%=01s295ms536.444us; 99%=01s299ms275.187us
Metric: InboundData
TotalSamples: 1469
Counter: 2.85KB
ValueRate: 0.05B / second
Rate: 0.0252161 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1149608
Counter: 107.47GB
ValueRate: 525.38KB / second
Rate: 21.5217 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2533302
Counter: 12h07m12s877ms614.148us
ValueRate: 290ms449.758us / second
Rate: 43.5051 / second
Percentiles: 1%=443.666us; 5%=490.847us; 10%=533.596us; 20%=606.261us; 50%=845.922us; 80%=003ms890.584us; 90%=010ms984.956us; 95%=027ms479.331us; 99%=054ms285.573us
Metric: TransferFromServerTime
TotalSamples: 1469
Counter: 09s582ms319.863us
ValueRate: 61.785us / second
Rate: 0.0252161 / second
Percentiles: 1%=593.996us; 5%=650.867us; 10%=695.440us; 20%=748.224us; 50%=980.102us; 80%=002ms423.778us; 90%=004ms990.614us; 95%=008ms336.370us; 99%=041ms422.202us
Metric: TransferToServerTime
TotalSamples: 1149608
Counter: 04d38h06m37s148ms25.683us
ValueRate: 05s232ms454.822us / second
Rate: 21.5218 / second
Percentiles: 1%=001ms28.281us; 5%=001ms158.756us; 10%=001ms241.405us; 20%=001ms360.291us; 50%=002ms947.535us; 80%=913ms249.842us; 90%=991ms598.713us; 95%=01s071ms308.189us; 99%=01s101ms130.224us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 280034
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 205941578
Counter: CreateXlaTensor
Value: 1343218043
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 205933404
Counter: DestroyXlaTensor
Value: 1343212034
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 205934545
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23902
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1469
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 09:59:44.455318, device xla:4, step 0, Compiles=107, _local_scalar_dense=1469
validation/ 2019-08-27 09:59:44.462236, device xla:5, step 0, Compiles=107, _local_scalar_dense=1469
validation/ 2019-08-27 09:59:44.468007, device xla:7, step 0, Compiles=107, _local_scalar_dense=1469
validation/ 2019-08-27 09:59:44.470899, device xla:3, step 0, Compiles=107, _local_scalar_dense=1469
validation/ 2019-08-27 09:59:44.474241, device xla:1, step 0, Compiles=107, _local_scalar_dense=1469
validation/ 2019-08-27 09:59:44.477099, device xla:6, step 0, Compiles=107, _local_scalar_dense=1469
validation/ 2019-08-27 09:59:44.479194, device xla:2, step 0, Compiles=107, _local_scalar_dense=1469
validation/ 2019-08-27 09:59:44.607493, device xla:8, step 0, Compiles=107, _local_scalar_dense=1469
validation stats on subset "valid" - 2019-08-27 09:59:50.501333
| epoch 023 | valid on 'valid' subset | loss 3.828 | nll_loss 2.016 | ppl 4.04 | num_updates 34684
| epoch 023 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 34684
| epoch 023 | valid on 'valid' subset | loss 3.953 | nll_loss 2.109 | ppl 4.32 | num_updates 34684
| epoch 023 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 34684
| epoch 023 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 34684
| epoch 023 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 34684
| epoch 023 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 34684
| epoch 023 | valid on 'valid' subset | loss 3.922 | nll_loss 2.125 | ppl 4.36 | num_updates 34684
old learning rate: 0.00017361529748723717
new learning rate: 0.0001697991106489232
Metric: CompileTime
TotalSamples: 107
Counter: 11h04m07s450ms218.662us
ValueRate: 484ms798.318us / second
Rate: 0.00321781 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=070ms443.655us; 20%=088ms568.741us; 50%=28s578ms625.257us; 80%=06m45s333ms304.374us; 90%=07m43s003ms845.134us; 95%=07m09s614ms115.550us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 280262
Counter: 04d20h05m16s764ms355.258us
ValueRate: 06s051ms155.855us / second
Rate: 5.3847 / second
Percentiles: 1%=377ms650.219us; 5%=378ms424.294us; 10%=391ms952.728us; 20%=01s171ms19.130us; 50%=01s181ms690.363us; 80%=01s286ms886.665us; 90%=01s290ms467.571us; 95%=01s294ms277.381us; 99%=01s299ms275.187us
Metric: InboundData
TotalSamples: 1494
Counter: 2.90KB
ValueRate: 0.05B / second
Rate: 0.0271655 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1149848
Counter: 107.52GB
ValueRate: 994.79KB / second
Rate: 21.7412 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 32
Counter: 03h06m15s658ms777.851us
ValueRate: 741ms136.583us / second
Rate: 0.00505178 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=041ms409.780us; 20%=063ms708.988us; 50%=06s914ms613.102us; 80%=04m19s280ms708.158us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2533895
Counter: 12h08m41s136ms635.554us
ValueRate: 02s619ms654.761us / second
Rate: 53.2496 / second
Percentiles: 1%=441.914us; 5%=482.080us; 10%=516.561us; 20%=564.533us; 50%=750.592us; 80%=001ms383.488us; 90%=024ms723.468us; 95%=374ms330.404us; 99%=387ms325.597us
Metric: TransferFromServerTime
TotalSamples: 1494
Counter: 09s621ms932.343us
ValueRate: 63.613us / second
Rate: 0.0271655 / second
Percentiles: 1%=593.996us; 5%=651.349us; 10%=697.034us; 20%=746.845us; 50%=976.758us; 80%=002ms378.831us; 90%=004ms851.498us; 95%=008ms168.610us; 99%=040ms120.057us
Metric: TransferToServerTime
TotalSamples: 1149848
Counter: 04d38h06m05s255ms875.901us
ValueRate: 05s504ms629.033us / second
Rate: 21.7411 / second
Percentiles: 1%=001ms48.768us; 5%=001ms177.728us; 10%=001ms281.589us; 20%=001ms408.675us; 50%=002ms948.187us; 80%=254ms973.210us; 90%=985ms240.988us; 95%=01s019ms86.862us; 99%=01s100ms154.632us
Counter: CachedSyncParamMismatch
Value: 54
Counter: CachedSyncTensors
Value: 280155
Counter: CreateCompileHandles
Value: 56
Counter: CreateDataHandles
Value: 205943187
Counter: CreateXlaTensor
Value: 1343352860
Counter: DestroyCompileHandles
Value: 33
Counter: DestroyDataHandles
Value: 205936153
Counter: DestroyXlaTensor
Value: 1343346851
Counter: ReleaseCompileHandles
Value: 33
Counter: ReleaseDataHandles
Value: 205936154
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 107
Counter: XRTAllocateFromTensor_Empty
Value: 23902
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1494
Epoch 24 begin 2019-08-27 09:59:50.520967
training torch.Size([256, 64])/ 2019-08-27 10:00:01.216011, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:00:01.238945, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:00:01.417153, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:00:01.561385, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:00:01.722561, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:00:01.894227, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:00:02.053962, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:00:02.208621, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:02:50.889070, device xla:2, step 100, Rate=60.36, Global Rate=292.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:02:50.893722, device xla:4, step 100, Rate=60.47, Global Rate=291.99, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:02:50.896957, device xla:8, step 100, Rate=60.70, Global Rate=291.99, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:02:50.901589, device xla:7, step 100, Rate=60.65, Global Rate=291.98, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:02:50.910647, device xla:5, step 100, Rate=60.52, Global Rate=291.97, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:02:50.883895, device xla:6, step 100, Rate=60.60, Global Rate=292.01, Compiles=107, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:02:50.878549, device xla:3, step 100, Rate=60.43, Global Rate=292.02, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:02:50.926186, device xla:1, step 100, Rate=60.34, Global Rate=291.94, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:05:32.271937, device xla:6, step 200, Rate=111.93, Global Rate=304.11, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:05:32.277891, device xla:5, step 200, Rate=111.88, Global Rate=304.10, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:05:32.282396, device xla:7, step 200, Rate=111.97, Global Rate=304.10, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:05:32.292099, device xla:8, step 200, Rate=112.01, Global Rate=304.09, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:05:32.286344, device xla:3, step 200, Rate=111.78, Global Rate=304.09, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:05:32.297421, device xla:4, step 200, Rate=111.82, Global Rate=304.08, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:05:32.304782, device xla:1, step 200, Rate=111.72, Global Rate=304.08, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:05:32.320667, device xla:2, step 200, Rate=111.72, Global Rate=304.06, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:08:14.688696, device xla:2, step 300, Rate=152.44, Global Rate=307.73, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:08:14.679657, device xla:7, step 300, Rate=152.63, Global Rate=307.73, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:08:14.693816, device xla:3, step 300, Rate=152.48, Global Rate=307.73, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:08:14.712912, device xla:8, step 300, Rate=152.65, Global Rate=307.71, Compiles=107, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:08:14.668767, device xla:4, step 300, Rate=152.52, Global Rate=307.74, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:08:14.699609, device xla:1, step 300, Rate=152.44, Global Rate=307.72, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:08:14.674141, device xla:6, step 300, Rate=152.59, Global Rate=307.74, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:08:14.707422, device xla:5, step 300, Rate=152.54, Global Rate=307.72, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:10:58.507438, device xla:5, step 400, Rate=184.55, Global Rate=308.92, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:10:58.501904, device xla:6, step 400, Rate=184.58, Global Rate=308.92, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:10:58.512607, device xla:1, step 400, Rate=184.46, Global Rate=308.92, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:10:58.531566, device xla:2, step 400, Rate=184.45, Global Rate=308.91, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:10:58.496574, device xla:3, step 400, Rate=184.50, Global Rate=308.92, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:10:58.521822, device xla:7, step 400, Rate=184.60, Global Rate=308.91, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:10:58.491299, device xla:4, step 400, Rate=184.52, Global Rate=308.93, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:10:58.543817, device xla:8, step 400, Rate=184.63, Global Rate=308.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:13:41.328625, device xla:5, step 500, Rate=210.53, Global Rate=310.01, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:13:41.340448, device xla:8, step 500, Rate=210.60, Global Rate=310.01, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:13:41.334323, device xla:1, step 500, Rate=210.46, Global Rate=310.01, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:13:41.345258, device xla:2, step 500, Rate=210.46, Global Rate=310.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:13:41.351756, device xla:6, step 500, Rate=210.54, Global Rate=310.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:13:41.363901, device xla:4, step 500, Rate=210.49, Global Rate=310.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:13:41.357673, device xla:3, step 500, Rate=210.47, Global Rate=310.00, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:13:41.369337, device xla:7, step 500, Rate=210.56, Global Rate=309.99, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:16:25.750543, device xla:4, step 600, Rate=230.68, Global Rate=310.24, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:16:25.755764, device xla:5, step 600, Rate=230.70, Global Rate=310.24, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:16:25.761534, device xla:2, step 600, Rate=230.65, Global Rate=310.24, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:16:25.766139, device xla:3, step 600, Rate=230.66, Global Rate=310.23, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:16:25.781709, device xla:8, step 600, Rate=230.75, Global Rate=310.23, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:16:25.772727, device xla:1, step 600, Rate=230.64, Global Rate=310.23, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:16:25.792571, device xla:7, step 600, Rate=230.73, Global Rate=310.23, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:16:25.786894, device xla:6, step 600, Rate=230.71, Global Rate=310.23, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:19:08.288429, device xla:8, step 700, Rate=247.62, Global Rate=310.91, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:19:08.292231, device xla:4, step 700, Rate=247.55, Global Rate=310.91, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:19:08.298497, device xla:5, step 700, Rate=247.56, Global Rate=310.91, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:19:08.303932, device xla:1, step 700, Rate=247.51, Global Rate=310.91, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:19:08.314439, device xla:6, step 700, Rate=247.57, Global Rate=310.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:19:08.326771, device xla:2, step 700, Rate=247.51, Global Rate=310.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:19:08.321173, device xla:3, step 700, Rate=247.52, Global Rate=310.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:19:08.333780, device xla:7, step 700, Rate=247.58, Global Rate=310.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:21:52.978689, device xla:2, step 800, Rate=260.20, Global Rate=310.91, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:21:52.983052, device xla:6, step 800, Rate=260.24, Global Rate=310.91, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:21:52.988171, device xla:8, step 800, Rate=260.27, Global Rate=310.91, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:21:52.998998, device xla:4, step 800, Rate=260.21, Global Rate=310.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:21:53.004298, device xla:5, step 800, Rate=260.22, Global Rate=310.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:21:52.993772, device xla:3, step 800, Rate=260.20, Global Rate=310.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:21:53.005692, device xla:7, step 800, Rate=260.25, Global Rate=310.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:21:53.019928, device xla:1, step 800, Rate=260.18, Global Rate=310.90, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:24:36.559607, device xla:5, step 900, Rate=270.78, Global Rate=311.14, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:24:36.554366, device xla:4, step 900, Rate=270.78, Global Rate=311.14, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:24:36.572662, device xla:8, step 900, Rate=270.81, Global Rate=311.14, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:24:36.577702, device xla:6, step 900, Rate=270.79, Global Rate=311.13, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:24:36.547811, device xla:1, step 900, Rate=270.76, Global Rate=311.14, Compiles=107, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:24:36.581279, device xla:2, step 900, Rate=270.75, Global Rate=311.13, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:24:36.564358, device xla:7, step 900, Rate=270.81, Global Rate=311.14, Compiles=107, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:24:36.592787, device xla:3, step 900, Rate=270.75, Global Rate=311.13, Compiles=107, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:34:25.707355, device xla:8, step 1000, Rate=234.03, Global Rate=247.32, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:34:25.711461, device xla:6, step 1000, Rate=234.01, Global Rate=247.32, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:34:25.722587, device xla:4, step 1000, Rate=234.00, Global Rate=247.32, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:34:25.731528, device xla:7, step 1000, Rate=234.03, Global Rate=247.32, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:34:25.716265, device xla:1, step 1000, Rate=233.99, Global Rate=247.32, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:34:25.747197, device xla:2, step 1000, Rate=233.98, Global Rate=247.32, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:34:25.758170, device xla:5, step 1000, Rate=234.01, Global Rate=247.32, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:34:25.769735, device xla:3, step 1000, Rate=233.98, Global Rate=247.32, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:37:09.930950, device xla:8, step 1100, Rate=249.58, Global Rate=252.06, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:37:09.935771, device xla:3, step 1100, Rate=249.56, Global Rate=252.06, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:37:09.949461, device xla:1, step 1100, Rate=249.54, Global Rate=252.06, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:37:09.957968, device xla:7, step 1100, Rate=249.57, Global Rate=252.06, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:37:09.942335, device xla:5, step 1100, Rate=249.57, Global Rate=252.06, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:37:09.966866, device xla:2, step 1100, Rate=249.54, Global Rate=252.06, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:37:09.974404, device xla:6, step 1100, Rate=249.55, Global Rate=252.06, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:37:09.981994, device xla:4, step 1100, Rate=249.54, Global Rate=252.05, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:39:53.482722, device xla:8, step 1200, Rate=262.27, Global Rate=256.22, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:39:53.487063, device xla:6, step 1200, Rate=262.26, Global Rate=256.22, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:39:53.491740, device xla:4, step 1200, Rate=262.26, Global Rate=256.22, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:39:53.502354, device xla:7, step 1200, Rate=262.27, Global Rate=256.22, Compiles=122, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:39:53.519488, device xla:1, step 1200, Rate=262.24, Global Rate=256.22, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:39:53.496656, device xla:3, step 1200, Rate=262.26, Global Rate=256.22, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:39:53.511684, device xla:2, step 1200, Rate=262.24, Global Rate=256.22, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:39:53.533053, device xla:5, step 1200, Rate=262.26, Global Rate=256.22, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:42:39.195871, device xla:8, step 1300, Rate=271.61, Global Rate=259.63, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:42:39.200150, device xla:4, step 1300, Rate=271.60, Global Rate=259.63, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:42:39.219060, device xla:6, step 1300, Rate=271.60, Global Rate=259.63, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:42:39.205218, device xla:1, step 1300, Rate=271.59, Global Rate=259.63, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:42:39.224652, device xla:3, step 1300, Rate=271.59, Global Rate=259.63, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:42:39.242271, device xla:5, step 1300, Rate=271.60, Global Rate=259.63, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:42:39.231816, device xla:7, step 1300, Rate=271.61, Global Rate=259.63, Compiles=122, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:42:39.212420, device xla:2, step 1300, Rate=271.59, Global Rate=259.63, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:45:20.898380, device xla:6, step 1400, Rate=280.61, Global Rate=263.01, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:45:20.924814, device xla:7, step 1400, Rate=280.61, Global Rate=263.01, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:45:20.908723, device xla:4, step 1400, Rate=280.61, Global Rate=263.01, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:45:20.914203, device xla:5, step 1400, Rate=280.62, Global Rate=263.01, Compiles=122, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:45:20.949112, device xla:8, step 1400, Rate=280.60, Global Rate=263.01, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:45:20.903458, device xla:3, step 1400, Rate=280.61, Global Rate=263.01, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:45:20.933294, device xla:1, step 1400, Rate=280.59, Global Rate=263.01, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:45:20.940841, device xla:2, step 1400, Rate=280.59, Global Rate=263.01, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:48:01.684051, device xla:3, step 1500, Rate=288.18, Global Rate=266.10, Compiles=122, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:48:01.677171, device xla:1, step 1500, Rate=288.18, Global Rate=266.10, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:48:01.687862, device xla:7, step 1500, Rate=288.19, Global Rate=266.10, Compiles=122, _local_scalar_dense=1494
training torch.Size([1024, 16])/ 2019-08-27 10:48:01.717919, device xla:8, step 1500, Rate=288.17, Global Rate=266.10, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:48:01.710126, device xla:5, step 1500, Rate=288.18, Global Rate=266.10, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:48:01.696260, device xla:6, step 1500, Rate=288.17, Global Rate=266.10, Compiles=122, _local_scalar_dense=1494
training torch.Size([256, 64])/ 2019-08-27 10:48:01.702019, device xla:2, step 1500, Rate=288.17, Global Rate=266.10, Compiles=122, _local_scalar_dense=1494
training torch.Size([512, 32])/ 2019-08-27 10:48:01.734270, device xla:4, step 1500, Rate=288.16, Global Rate=266.10, Compiles=122, _local_scalar_dense=1494
Epoch 24 Training stats:
device xla:1
| epoch 024 | loss 0.167 | nll_loss 0.167 | ppl 1.12 | wps 6043 | ups 1 | wpb 11142.639 | bsz 411.056 | num_updates 36192 | lr 0.000166224 | gnorm 0.048 | clip 0.000 | oom 0.000 | wall 66733 | train_wall 52707
device xla:2
| epoch 024 | loss 0.166 | nll_loss 0.166 | ppl 1.12 | wps 6053 | ups 1 | wpb 11160.260 | bsz 409.118 | num_updates 36192 | lr 0.000166224 | gnorm 0.046 | clip 0.000 | oom 0.000 | wall 66733 | train_wall 53719
device xla:3
| epoch 024 | loss 0.167 | nll_loss 0.167 | ppl 1.12 | wps 6027 | ups 1 | wpb 11113.507 | bsz 411.494 | num_updates 36192 | lr 0.000166224 | gnorm 0.050 | clip 0.000 | oom 0.000 | wall 66733 | train_wall 52273
device xla:4
| epoch 024 | loss 0.167 | nll_loss 0.167 | ppl 1.12 | wps 6033 | ups 1 | wpb 11124.422 | bsz 411.176 | num_updates 36192 | lr 0.000166224 | gnorm 0.049 | clip 0.000 | oom 0.000 | wall 66733 | train_wall 53721
device xla:5
| epoch 024 | loss 0.166 | nll_loss 0.166 | ppl 1.12 | wps 6056 | ups 1 | wpb 11166.764 | bsz 409.874 | num_updates 36192 | lr 0.000166224 | gnorm 0.047 | clip 0.000 | oom 0.000 | wall 66733 | train_wall 53840
device xla:6
| epoch 024 | loss 0.167 | nll_loss 0.167 | ppl 1.12 | wps 6049 | ups 1 | wpb 11154.348 | bsz 408.934 | num_updates 36192 | lr 0.000166224 | gnorm 0.048 | clip 0.000 | oom 0.000 | wall 66733 | train_wall 53951
device xla:7
| epoch 024 | loss 0.166 | nll_loss 0.166 | ppl 1.12 | wps 6063 | ups 1 | wpb 11180.285 | bsz 408.240 | num_updates 36192 | lr 0.000166224 | gnorm 0.048 | clip 0.000 | oom 0.000 | wall 66733 | train_wall 52783
device xla:8
| epoch 024 | loss 0.167 | nll_loss 0.167 | ppl 1.12 | wps 6035 | ups 1 | wpb 11127.606 | bsz 408.630 | num_updates 36192 | lr 0.000166224 | gnorm 0.051 | clip 0.000 | oom 0.000 | wall 66733 | train_wall 52904
Epoch 24 Tracker Rates:
Rate=288.39, Global Rate=266.21
Rate=288.49, Global Rate=266.21
Rate=288.42, Global Rate=266.21
Rate=288.61, Global Rate=266.21
Rate=288.53, Global Rate=266.21
Rate=288.47, Global Rate=266.21
Rate=288.45, Global Rate=266.21
Rate=288.55, Global Rate=266.21
Epoch 24 end 2019-08-27 10:48:15.837675
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 292326
Counter: 05d32h19m43s935ms330.055us
ValueRate: 06s192ms463.741us / second
Rate: 5.00816 / second
Percentiles: 1%=01s165ms105.408us; 5%=01s171ms917.496us; 10%=01s174ms329.239us; 20%=01s178ms962.167us; 50%=01s277ms573.484us; 80%=01s289ms893.983us; 90%=01s292ms291.102us; 95%=01s296ms198.273us; 99%=01s301ms673.524us
Metric: InboundData
TotalSamples: 1534
Counter: 2.97KB
ValueRate: 0.05B / second
Rate: 0.025228 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1199550
Counter: 112.01GB
ValueRate: 501.17KB / second
Rate: 20.5395 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2641163
Counter: 13h17m36s603ms645.099us
ValueRate: 333ms265.497us / second
Rate: 44.9829 / second
Percentiles: 1%=443.153us; 5%=484.901us; 10%=527.920us; 20%=594.577us; 50%=805.165us; 80%=002ms972.208us; 90%=010ms38.931us; 95%=027ms494.395us; 99%=060ms703.860us
Metric: TransferFromServerTime
TotalSamples: 1534
Counter: 09s696ms143.767us
ValueRate: 57.685us / second
Rate: 0.025228 / second
Percentiles: 1%=590.641us; 5%=650.443us; 10%=691.536us; 20%=743.361us; 50%=958.564us; 80%=002ms394.516us; 90%=004ms974.844us; 95%=008ms283.771us; 99%=031ms895.985us
Metric: TransferToServerTime
TotalSamples: 1199550
Counter: 04d48h02m08s733ms575.514us
ValueRate: 05s110ms835.939us / second
Rate: 20.9565 / second
Percentiles: 1%=001ms59.067us; 5%=001ms167.759us; 10%=001ms254.292us; 20%=001ms365.907us; 50%=002ms68.832us; 80%=908ms737.234us; 90%=01s024ms375.215us; 95%=01s076ms880.692us; 99%=01s091ms725.907us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 292204
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 214895513
Counter: CreateXlaTensor
Value: 1401624504
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 214888480
Counter: DestroyXlaTensor
Value: 1401618495
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 214888480
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 23957
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1534
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 10:48:19.626779, device xla:8, step 0, Compiles=122, _local_scalar_dense=1534
validation/ 2019-08-27 10:48:19.639066, device xla:2, step 0, Compiles=122, _local_scalar_dense=1534
validation/ 2019-08-27 10:48:19.643549, device xla:1, step 0, Compiles=122, _local_scalar_dense=1534
validation/ 2019-08-27 10:48:19.648372, device xla:6, step 0, Compiles=122, _local_scalar_dense=1534
validation/ 2019-08-27 10:48:19.652691, device xla:7, step 0, Compiles=122, _local_scalar_dense=1534
validation/ 2019-08-27 10:48:19.654749, device xla:3, step 0, Compiles=122, _local_scalar_dense=1534
validation/ 2019-08-27 10:48:19.657943, device xla:5, step 0, Compiles=122, _local_scalar_dense=1534
validation/ 2019-08-27 10:48:19.792657, device xla:4, step 0, Compiles=122, _local_scalar_dense=1534
validation stats on subset "valid" - 2019-08-27 10:48:25.719988
| epoch 024 | valid on 'valid' subset | loss 3.844 | nll_loss 2.031 | ppl 4.09 | num_updates 36192
| epoch 024 | valid on 'valid' subset | loss 3.875 | nll_loss 2.047 | ppl 4.13 | num_updates 36192
| epoch 024 | valid on 'valid' subset | loss 3.953 | nll_loss 2.141 | ppl 4.41 | num_updates 36192
| epoch 024 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 36192
| epoch 024 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 36192
| epoch 024 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 36192
| epoch 024 | valid on 'valid' subset | loss 3.906 | nll_loss 2.078 | ppl 4.22 | num_updates 36192
| epoch 024 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 36192
old learning rate: 0.0001697991106489232
new learning rate: 0.00016622399213546174
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 292447
Counter: 05d32h19m29s923ms776.550us
ValueRate: 06s115ms785.982us / second
Rate: 5.38677 / second
Percentiles: 1%=377ms606.823us; 5%=378ms261.682us; 10%=391ms439.956us; 20%=01s174ms252.056us; 50%=01s188ms15.724us; 80%=01s288ms133.517us; 90%=01s291ms336.763us; 95%=01s295ms850.441us; 99%=01s299ms379.827us
Metric: InboundData
TotalSamples: 1559
Counter: 3.02KB
ValueRate: 0.05B / second
Rate: 0.0268855 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1199789
Counter: 112.05GB
ValueRate: 968.65KB / second
Rate: 21.1698 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2641729
Counter: 13h17m06s974ms877.857us
ValueRate: 02s560ms406.658us / second
Rate: 49.2943 / second
Percentiles: 1%=443.170us; 5%=488.867us; 10%=527.673us; 20%=583.403us; 50%=784.856us; 80%=002ms539.835us; 90%=027ms494.395us; 95%=375ms778.290us; 99%=388ms934.686us
Metric: TransferFromServerTime
TotalSamples: 1559
Counter: 09s741ms779.186us
ValueRate: 56.652us / second
Rate: 0.0268855 / second
Percentiles: 1%=590.641us; 5%=650.443us; 10%=691.536us; 20%=742.829us; 50%=947.671us; 80%=002ms377.764us; 90%=004ms800.523us; 95%=007ms206.476us; 99%=028ms877.025us
Metric: TransferToServerTime
TotalSamples: 1199789
Counter: 04d48h03m33s926ms559.811us
ValueRate: 04s332ms628.582us / second
Rate: 21.1725 / second
Percentiles: 1%=001ms86.241us; 5%=001ms199.793us; 10%=001ms287.060us; 20%=001ms419.045us; 50%=002ms51.968us; 80%=225ms682.297us; 90%=986ms517.800us; 95%=01s060ms406.760us; 99%=01s081ms695.981us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 292325
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 214897121
Counter: CreateXlaTensor
Value: 1401759321
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 214890087
Counter: DestroyXlaTensor
Value: 1401753312
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 214890088
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 23957
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1559
Epoch 25 begin 2019-08-27 10:48:25.741025
training torch.Size([512, 32])/ 2019-08-27 10:48:34.211088, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:48:34.250791, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:48:34.346832, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:48:34.440551, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:48:34.662142, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:48:34.826725, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 10:48:35.166578, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:48:35.384733, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:51:22.854539, device xla:8, step 100, Rate=61.07, Global Rate=295.31, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 10:51:22.864293, device xla:6, step 100, Rate=60.94, Global Rate=295.29, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:51:22.866595, device xla:1, step 100, Rate=60.72, Global Rate=295.29, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:51:22.873984, device xla:4, step 100, Rate=60.80, Global Rate=295.28, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 10:51:22.889796, device xla:7, step 100, Rate=61.13, Global Rate=295.25, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:51:22.858808, device xla:3, step 100, Rate=60.77, Global Rate=295.30, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:51:22.882121, device xla:5, step 100, Rate=60.87, Global Rate=295.26, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:51:22.898144, device xla:2, step 100, Rate=60.72, Global Rate=295.23, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:54:05.406121, device xla:7, step 200, Rate=111.92, Global Rate=304.83, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:54:05.410706, device xla:8, step 200, Rate=111.85, Global Rate=304.82, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 10:54:05.422225, device xla:6, step 200, Rate=111.74, Global Rate=304.81, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:54:05.415157, device xla:1, step 200, Rate=111.57, Global Rate=304.82, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:54:05.431422, device xla:2, step 200, Rate=111.58, Global Rate=304.80, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:54:05.438429, device xla:4, step 200, Rate=111.63, Global Rate=304.80, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:54:05.425880, device xla:3, step 200, Rate=111.60, Global Rate=304.81, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:54:05.452369, device xla:5, step 200, Rate=111.69, Global Rate=304.79, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 10:56:47.523499, device xla:1, step 300, Rate=152.42, Global Rate=308.40, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:56:47.529667, device xla:5, step 300, Rate=152.53, Global Rate=308.40, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 10:56:47.566774, device xla:7, step 300, Rate=152.68, Global Rate=308.38, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:56:47.554805, device xla:4, step 300, Rate=152.47, Global Rate=308.39, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:56:47.542933, device xla:2, step 300, Rate=152.43, Global Rate=308.39, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:56:47.535868, device xla:6, step 300, Rate=152.56, Global Rate=308.40, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:56:47.576426, device xla:8, step 300, Rate=152.62, Global Rate=308.37, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:56:47.549259, device xla:3, step 300, Rate=152.44, Global Rate=308.39, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:59:31.228486, device xla:7, step 400, Rate=184.71, Global Rate=309.48, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:59:31.232948, device xla:8, step 400, Rate=184.67, Global Rate=309.48, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:59:31.254062, device xla:6, step 400, Rate=184.60, Global Rate=309.47, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:59:31.238257, device xla:4, step 400, Rate=184.53, Global Rate=309.48, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:59:31.269666, device xla:2, step 400, Rate=184.49, Global Rate=309.46, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 10:59:31.246637, device xla:1, step 400, Rate=184.48, Global Rate=309.47, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:59:31.256724, device xla:5, step 400, Rate=184.57, Global Rate=309.47, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 10:59:31.264003, device xla:3, step 400, Rate=184.50, Global Rate=309.47, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:02:13.782540, device xla:8, step 500, Rate=210.73, Global Rate=310.57, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:02:13.786820, device xla:7, step 500, Rate=210.76, Global Rate=310.56, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:02:13.791121, device xla:2, step 500, Rate=210.60, Global Rate=310.56, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:02:13.795730, device xla:3, step 500, Rate=210.61, Global Rate=310.56, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:02:13.826221, device xla:6, step 500, Rate=210.66, Global Rate=310.55, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:02:13.809215, device xla:4, step 500, Rate=210.61, Global Rate=310.55, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:02:13.818446, device xla:1, step 500, Rate=210.57, Global Rate=310.55, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:02:13.802423, device xla:5, step 500, Rate=210.65, Global Rate=310.56, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:04:57.719079, device xla:7, step 600, Rate=231.07, Global Rate=310.86, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:04:57.724673, device xla:2, step 600, Rate=230.94, Global Rate=310.85, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:04:57.729082, device xla:3, step 600, Rate=230.95, Global Rate=310.85, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:04:57.734279, device xla:8, step 600, Rate=231.04, Global Rate=310.85, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:04:57.739286, device xla:4, step 600, Rate=230.96, Global Rate=310.85, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:04:57.749410, device xla:1, step 600, Rate=230.92, Global Rate=310.85, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:04:57.756891, device xla:5, step 600, Rate=230.98, Global Rate=310.84, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:04:57.769692, device xla:6, step 600, Rate=230.99, Global Rate=310.84, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:07:42.910939, device xla:8, step 700, Rate=246.83, Global Rate=310.72, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:07:42.920976, device xla:7, step 700, Rate=246.84, Global Rate=310.72, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:07:42.925161, device xla:3, step 700, Rate=246.75, Global Rate=310.72, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:07:42.938334, device xla:4, step 700, Rate=246.75, Global Rate=310.72, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:07:42.914621, device xla:5, step 700, Rate=246.78, Global Rate=310.72, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:07:42.930448, device xla:6, step 700, Rate=246.79, Global Rate=310.72, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:07:42.941437, device xla:1, step 700, Rate=246.73, Global Rate=310.72, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:07:42.956476, device xla:2, step 700, Rate=246.73, Global Rate=310.71, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:10:27.069533, device xla:7, step 800, Rate=259.86, Global Rate=310.87, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:10:27.073806, device xla:8, step 800, Rate=259.84, Global Rate=310.87, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:10:27.078468, device xla:2, step 800, Rate=259.77, Global Rate=310.87, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:10:27.082951, device xla:6, step 800, Rate=259.82, Global Rate=310.87, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:10:27.084707, device xla:4, step 800, Rate=259.78, Global Rate=310.87, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:10:27.097370, device xla:1, step 800, Rate=259.76, Global Rate=310.86, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:10:27.106690, device xla:5, step 800, Rate=259.79, Global Rate=310.86, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:10:27.118841, device xla:3, step 800, Rate=259.76, Global Rate=310.86, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:13:09.829295, device xla:7, step 900, Rate=270.80, Global Rate=311.28, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:13:09.833796, device xla:8, step 900, Rate=270.79, Global Rate=311.28, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:13:09.852797, device xla:2, step 900, Rate=270.73, Global Rate=311.27, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:13:09.840185, device xla:5, step 900, Rate=270.76, Global Rate=311.28, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:13:09.847141, device xla:3, step 900, Rate=270.74, Global Rate=311.27, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:13:09.869388, device xla:1, step 900, Rate=270.72, Global Rate=311.27, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:13:09.882461, device xla:6, step 900, Rate=270.75, Global Rate=311.27, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:13:09.859459, device xla:4, step 900, Rate=270.74, Global Rate=311.27, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:15:56.397151, device xla:2, step 1000, Rate=278.07, Global Rate=310.88, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:15:56.401681, device xla:7, step 1000, Rate=278.12, Global Rate=310.88, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:15:56.414918, device xla:3, step 1000, Rate=278.07, Global Rate=310.88, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:15:56.406560, device xla:4, step 1000, Rate=278.07, Global Rate=310.88, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:15:56.422510, device xla:1, step 1000, Rate=278.06, Global Rate=310.88, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:15:56.430185, device xla:8, step 1000, Rate=278.09, Global Rate=310.88, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:15:56.437179, device xla:5, step 1000, Rate=278.07, Global Rate=310.88, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:15:56.448075, device xla:6, step 1000, Rate=278.08, Global Rate=310.87, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:18:42.151894, device xla:7, step 1100, Rate=284.27, Global Rate=310.70, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:18:42.145649, device xla:1, step 1100, Rate=284.24, Global Rate=310.70, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:18:42.156369, device xla:2, step 1100, Rate=284.23, Global Rate=310.70, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:18:42.174962, device xla:6, step 1100, Rate=284.25, Global Rate=310.70, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:18:42.178345, device xla:4, step 1100, Rate=284.23, Global Rate=310.70, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:18:42.162232, device xla:5, step 1100, Rate=284.25, Global Rate=310.70, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:18:42.196090, device xla:8, step 1100, Rate=284.25, Global Rate=310.69, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:18:42.203237, device xla:3, step 1100, Rate=284.22, Global Rate=310.69, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:21:26.608616, device xla:2, step 1200, Rate=289.65, Global Rate=310.75, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:21:26.618091, device xla:7, step 1200, Rate=289.68, Global Rate=310.75, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:21:26.612935, device xla:3, step 1200, Rate=289.66, Global Rate=310.75, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:21:26.623055, device xla:4, step 1200, Rate=289.65, Global Rate=310.75, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:21:26.649486, device xla:8, step 1200, Rate=289.67, Global Rate=310.75, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:21:26.638334, device xla:6, step 1200, Rate=289.66, Global Rate=310.75, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:21:26.628014, device xla:1, step 1200, Rate=289.64, Global Rate=310.75, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:21:26.656948, device xla:5, step 1200, Rate=289.65, Global Rate=310.75, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:24:09.155900, device xla:3, step 1300, Rate=294.73, Global Rate=311.07, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:24:09.149335, device xla:5, step 1300, Rate=294.74, Global Rate=311.08, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:24:09.179404, device xla:6, step 1300, Rate=294.73, Global Rate=311.07, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:24:09.181690, device xla:1, step 1300, Rate=294.71, Global Rate=311.07, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:24:09.162193, device xla:8, step 1300, Rate=294.74, Global Rate=311.07, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:24:09.167024, device xla:4, step 1300, Rate=294.72, Global Rate=311.07, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:24:09.192345, device xla:2, step 1300, Rate=294.70, Global Rate=311.07, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:24:09.199679, device xla:7, step 1300, Rate=294.73, Global Rate=311.07, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:26:50.942172, device xla:2, step 1400, Rate=299.07, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:26:50.946288, device xla:3, step 1400, Rate=299.07, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:26:50.963370, device xla:4, step 1400, Rate=299.07, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:26:50.951041, device xla:1, step 1400, Rate=299.07, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:26:50.957552, device xla:7, step 1400, Rate=299.09, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:26:50.983102, device xla:8, step 1400, Rate=299.07, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:26:50.995114, device xla:6, step 1400, Rate=299.07, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:26:50.976281, device xla:5, step 1400, Rate=299.07, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:29:35.301373, device xla:2, step 1500, Rate=301.56, Global Rate=311.46, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:29:35.305784, device xla:4, step 1500, Rate=301.56, Global Rate=311.46, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:29:35.330262, device xla:7, step 1500, Rate=301.57, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:29:35.351921, device xla:6, step 1500, Rate=301.56, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:29:35.344006, device xla:1, step 1500, Rate=301.54, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([512, 32])/ 2019-08-27 11:29:35.336997, device xla:5, step 1500, Rate=301.56, Global Rate=311.45, Compiles=122, _local_scalar_dense=1559
training torch.Size([1024, 16])/ 2019-08-27 11:29:35.314084, device xla:3, step 1500, Rate=301.56, Global Rate=311.46, Compiles=122, _local_scalar_dense=1559
training torch.Size([256, 64])/ 2019-08-27 11:29:35.322309, device xla:8, step 1500, Rate=301.57, Global Rate=311.46, Compiles=122, _local_scalar_dense=1559
Epoch 25 Training stats:
device xla:1
| epoch 025 | loss 0.160 | nll_loss 0.160 | ppl 1.12 | wps 6068 | ups 1 | wpb 11142.353 | bsz 411.080 | num_updates 37700 | lr 0.000162866 | gnorm 0.046 | clip 0.000 | oom 0.000 | wall 69227 | train_wall 54757
device xla:2
| epoch 025 | loss 0.159 | nll_loss 0.159 | ppl 1.12 | wps 6079 | ups 1 | wpb 11163.310 | bsz 409.532 | num_updates 37700 | lr 0.000162866 | gnorm 0.044 | clip 0.000 | oom 0.000 | wall 69227 | train_wall 55772
device xla:3
| epoch 025 | loss 0.160 | nll_loss 0.160 | ppl 1.12 | wps 6053 | ups 1 | wpb 11115.015 | bsz 411.651 | num_updates 37700 | lr 0.000162866 | gnorm 0.048 | clip 0.000 | oom 0.000 | wall 69227 | train_wall 54326
device xla:4
| epoch 025 | loss 0.160 | nll_loss 0.160 | ppl 1.12 | wps 6058 | ups 1 | wpb 11124.010 | bsz 410.775 | num_updates 37700 | lr 0.000162866 | gnorm 0.047 | clip 0.000 | oom 0.000 | wall 69227 | train_wall 55765
device xla:5
| epoch 025 | loss 0.159 | nll_loss 0.159 | ppl 1.12 | wps 6080 | ups 1 | wpb 11165.232 | bsz 409.478 | num_updates 37700 | lr 0.000162866 | gnorm 0.045 | clip 0.000 | oom 0.000 | wall 69227 | train_wall 55892
device xla:6
| epoch 025 | loss 0.160 | nll_loss 0.160 | ppl 1.12 | wps 6073 | ups 1 | wpb 11151.932 | bsz 408.853 | num_updates 37700 | lr 0.000162866 | gnorm 0.046 | clip 0.000 | oom 0.000 | wall 69227 | train_wall 55996
device xla:7
| epoch 025 | loss 0.159 | nll_loss 0.159 | ppl 1.12 | wps 6088 | ups 1 | wpb 11179.427 | bsz 408.561 | num_updates 37700 | lr 0.000162866 | gnorm 0.046 | clip 0.000 | oom 0.000 | wall 69227 | train_wall 54842
device xla:8
| epoch 025 | loss 0.160 | nll_loss 0.160 | ppl 1.12 | wps 6060 | ups 1 | wpb 11128.409 | bsz 408.588 | num_updates 37700 | lr 0.000162866 | gnorm 0.049 | clip 0.000 | oom 0.000 | wall 69227 | train_wall 54953
Epoch 25 Tracker Rates:
Rate=298.66, Global Rate=311.31
Rate=298.50, Global Rate=311.31
Rate=298.55, Global Rate=311.31
Rate=298.52, Global Rate=311.31
Rate=298.64, Global Rate=311.31
Rate=298.70, Global Rate=311.31
Rate=298.62, Global Rate=311.31
Rate=298.59, Global Rate=311.31
Epoch 25 end 2019-08-27 11:29:49.610540
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 304511
Counter: 05d42h03m03s788ms543.732us
ValueRate: 06s057ms366.316us / second
Rate: 4.91899 / second
Percentiles: 1%=01s163ms629.523us; 5%=01s168ms958.093us; 10%=01s171ms958.686us; 20%=01s176ms823.870us; 50%=01s273ms801.200us; 80%=01s287ms331.605us; 90%=01s291ms304.994us; 95%=01s295ms193.638us; 99%=01s311ms25.764us
Metric: InboundData
TotalSamples: 1599
Counter: 3.10KB
ValueRate: 0.05B / second
Rate: 0.0252459 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1249627
Counter: 116.54GB
ValueRate: 500.67KB / second
Rate: 20.5421 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2749285
Counter: 13h02m13s396ms566.856us
ValueRate: 336ms731.942us / second
Rate: 44.7339 / second
Percentiles: 1%=452.609us; 5%=505.023us; 10%=552.310us; 20%=621.847us; 50%=842.469us; 80%=002ms15.065us; 90%=011ms866.104us; 95%=026ms670.994us; 99%=057ms536.687us
Metric: TransferFromServerTime
TotalSamples: 1599
Counter: 09s818ms512.899us
ValueRate: 53.689us / second
Rate: 0.0252459 / second
Percentiles: 1%=590.641us; 5%=650.443us; 10%=684.924us; 20%=739.564us; 50%=935.639us; 80%=002ms344.489us; 90%=004ms830.211us; 95%=008ms808.072us; 99%=028ms877.025us
Metric: TransferToServerTime
TotalSamples: 1249627
Counter: 04d57h16m26s641ms73.962us
ValueRate: 05s921ms655.985us / second
Rate: 20.5421 / second
Percentiles: 1%=001ms63.884us; 5%=001ms171.059us; 10%=001ms260.075us; 20%=001ms367.594us; 50%=002ms956.543us; 80%=923ms226.893us; 90%=978ms177.597us; 95%=01s059ms976.630us; 99%=01s087ms651.480us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 304389
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 223849583
Counter: CreateXlaTensor
Value: 1460030953
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 223840613
Counter: DestroyXlaTensor
Value: 1460024944
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 223842550
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 23997
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1599
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 11:29:53.466428, device xla:5, step 0, Compiles=122, _local_scalar_dense=1599
validation/ 2019-08-27 11:29:53.472360, device xla:6, step 0, Compiles=122, _local_scalar_dense=1599
validation/ 2019-08-27 11:29:53.477412, device xla:4, step 0, Compiles=122, _local_scalar_dense=1599
validation/ 2019-08-27 11:29:53.481761, device xla:3, step 0, Compiles=122, _local_scalar_dense=1599
validation/ 2019-08-27 11:29:53.485018, device xla:8, step 0, Compiles=122, _local_scalar_dense=1599
validation/ 2019-08-27 11:29:53.486863, device xla:1, step 0, Compiles=122, _local_scalar_dense=1599
validation/ 2019-08-27 11:29:53.488557, device xla:2, step 0, Compiles=122, _local_scalar_dense=1599
validation/ 2019-08-27 11:29:53.617571, device xla:7, step 0, Compiles=122, _local_scalar_dense=1599
validation stats on subset "valid" - 2019-08-27 11:29:59.577205
| epoch 025 | valid on 'valid' subset | loss 3.828 | nll_loss 2.016 | ppl 4.04 | num_updates 37700
| epoch 025 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 37700
| epoch 025 | valid on 'valid' subset | loss 3.922 | nll_loss 2.109 | ppl 4.32 | num_updates 37700
| epoch 025 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 37700
| epoch 025 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 37700
| epoch 025 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 37700
| epoch 025 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 37700
| epoch 025 | valid on 'valid' subset | loss 3.922 | nll_loss 2.125 | ppl 4.36 | num_updates 37700
old learning rate: 0.00016622399213546174
new learning rate: 0.00016286558549611404
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 304632
Counter: 05d42h04m49s863ms57.688us
ValueRate: 06s970ms209.248us / second
Rate: 5.28204 / second
Percentiles: 1%=377ms80.194us; 5%=380ms534.913us; 10%=392ms216.775us; 20%=01s170ms308.765us; 50%=01s184ms634.843us; 80%=01s286ms260.268us; 90%=01s290ms183.535us; 95%=01s294ms100.618us; 99%=01s311ms25.764us
Metric: InboundData
TotalSamples: 1624
Counter: 3.15KB
ValueRate: 0.05B / second
Rate: 0.0268978 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1249867
Counter: 116.59GB
ValueRate: 952.55KB / second
Rate: 20.8181 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2749868
Counter: 13h03m43s217ms695.970us
ValueRate: 02s599ms922.354us / second
Rate: 51.7114 / second
Percentiles: 1%=465.095us; 5%=529.038us; 10%=574.919us; 20%=627.209us; 50%=815.393us; 80%=001ms444.070us; 90%=024ms68.464us; 95%=375ms446.509us; 99%=389ms111.260us
Metric: TransferFromServerTime
TotalSamples: 1624
Counter: 09s865ms775.005us
ValueRate: 56.785us / second
Rate: 0.0268978 / second
Percentiles: 1%=593.996us; 5%=650.709us; 10%=688.179us; 20%=741.177us; 50%=944.421us; 80%=002ms351.295us; 90%=004ms746.147us; 95%=007ms206.476us; 99%=028ms877.025us
Metric: TransferToServerTime
TotalSamples: 1249867
Counter: 04d57h17m50s391ms809.227us
ValueRate: 04s166ms666.134us / second
Rate: 20.818 / second
Percentiles: 1%=001ms77.050us; 5%=001ms192.689us; 10%=001ms292.317us; 20%=001ms439.986us; 50%=002ms248.627us; 80%=223ms538.775us; 90%=954ms759.389us; 95%=01s035ms482.847us; 99%=01s071ms744.601us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 304510
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 223851192
Counter: CreateXlaTensor
Value: 1460165770
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 223844158
Counter: DestroyXlaTensor
Value: 1460159761
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 223844159
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 23997
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1624
Epoch 26 begin 2019-08-27 11:29:59.602465
training torch.Size([512, 32])/ 2019-08-27 11:30:08.043725, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:30:08.057375, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:30:08.081863, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:30:08.210594, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:30:08.240164, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:30:08.415162, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:30:08.772055, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:30:08.903046, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:32:57.979006, device xla:5, step 100, Rate=60.33, Global Rate=292.74, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:32:57.996787, device xla:1, step 100, Rate=60.25, Global Rate=292.71, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:32:57.991000, device xla:7, step 100, Rate=60.51, Global Rate=292.72, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 11:32:58.006024, device xla:8, step 100, Rate=60.55, Global Rate=292.70, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:32:57.984014, device xla:6, step 100, Rate=60.39, Global Rate=292.73, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:32:58.010871, device xla:4, step 100, Rate=60.31, Global Rate=292.69, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:32:58.021675, device xla:2, step 100, Rate=60.25, Global Rate=292.67, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 11:32:57.999271, device xla:3, step 100, Rate=60.26, Global Rate=292.70, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:35:43.681728, device xla:5, step 200, Rate=110.06, Global Rate=300.64, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:35:43.694770, device xla:1, step 200, Rate=110.00, Global Rate=300.63, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:35:43.696485, device xla:4, step 200, Rate=110.05, Global Rate=300.63, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:35:43.700234, device xla:2, step 200, Rate=110.00, Global Rate=300.63, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:35:43.687527, device xla:7, step 200, Rate=110.21, Global Rate=300.64, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 11:35:43.705258, device xla:6, step 200, Rate=110.10, Global Rate=300.62, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:35:43.715662, device xla:3, step 200, Rate=110.00, Global Rate=300.61, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:35:43.727331, device xla:8, step 200, Rate=110.23, Global Rate=300.60, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:38:27.290737, device xla:2, step 300, Rate=150.60, Global Rate=304.63, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:38:27.295563, device xla:7, step 300, Rate=150.76, Global Rate=304.63, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:38:27.301730, device xla:4, step 300, Rate=150.63, Global Rate=304.63, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:38:27.303848, device xla:8, step 300, Rate=150.79, Global Rate=304.63, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 11:38:27.324044, device xla:1, step 300, Rate=150.58, Global Rate=304.61, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:38:27.327430, device xla:3, step 300, Rate=150.59, Global Rate=304.61, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:38:27.308753, device xla:6, step 300, Rate=150.67, Global Rate=304.62, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:38:27.317920, device xla:5, step 300, Rate=150.63, Global Rate=304.62, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:41:11.850016, device xla:7, step 400, Rate=182.83, Global Rate=306.23, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:41:11.854152, device xla:8, step 400, Rate=182.86, Global Rate=306.23, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:41:11.859408, device xla:2, step 400, Rate=182.70, Global Rate=306.23, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:41:11.871338, device xla:3, step 400, Rate=182.70, Global Rate=306.22, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:41:11.863825, device xla:5, step 400, Rate=182.73, Global Rate=306.23, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:41:11.896249, device xla:1, step 400, Rate=182.69, Global Rate=306.21, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:41:11.877350, device xla:6, step 400, Rate=182.76, Global Rate=306.22, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:41:11.884507, device xla:4, step 400, Rate=182.72, Global Rate=306.22, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:43:54.551065, device xla:8, step 500, Rate=209.23, Global Rate=307.89, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:43:54.555102, device xla:7, step 500, Rate=209.20, Global Rate=307.89, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:43:54.573423, device xla:2, step 500, Rate=209.09, Global Rate=307.88, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:43:54.559485, device xla:1, step 500, Rate=209.10, Global Rate=307.88, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:43:54.582259, device xla:3, step 500, Rate=209.10, Global Rate=307.88, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:43:54.576293, device xla:5, step 500, Rate=209.12, Global Rate=307.88, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:43:54.566236, device xla:4, step 500, Rate=209.12, Global Rate=307.88, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:43:54.590092, device xla:6, step 500, Rate=209.14, Global Rate=307.87, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:46:39.414649, device xla:8, step 600, Rate=229.49, Global Rate=308.33, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:46:39.418830, device xla:2, step 600, Rate=229.39, Global Rate=308.33, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:46:39.423465, device xla:7, step 600, Rate=229.47, Global Rate=308.33, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:46:39.428178, device xla:3, step 600, Rate=229.40, Global Rate=308.33, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:46:39.433507, device xla:4, step 600, Rate=229.41, Global Rate=308.32, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:46:39.440839, device xla:5, step 600, Rate=229.41, Global Rate=308.32, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:46:39.448206, device xla:6, step 600, Rate=229.43, Global Rate=308.32, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 11:46:39.465518, device xla:1, step 600, Rate=229.38, Global Rate=308.31, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:49:21.468935, device xla:2, step 700, Rate=246.71, Global Rate=309.39, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:49:21.485489, device xla:7, step 700, Rate=246.76, Global Rate=309.39, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:49:21.480284, device xla:5, step 700, Rate=246.72, Global Rate=309.39, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:49:21.497295, device xla:8, step 700, Rate=246.77, Global Rate=309.39, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:49:21.473504, device xla:4, step 700, Rate=246.72, Global Rate=309.39, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:49:21.502430, device xla:1, step 700, Rate=246.70, Global Rate=309.39, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:49:21.490223, device xla:6, step 700, Rate=246.74, Global Rate=309.39, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:49:21.511548, device xla:3, step 700, Rate=246.69, Global Rate=309.38, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 11:52:03.664010, device xla:8, step 800, Rate=260.56, Global Rate=310.17, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:52:03.668920, device xla:2, step 800, Rate=260.50, Global Rate=310.16, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:52:03.678973, device xla:3, step 800, Rate=260.50, Global Rate=310.16, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:52:03.672934, device xla:6, step 800, Rate=260.53, Global Rate=310.16, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:52:03.684324, device xla:4, step 800, Rate=260.50, Global Rate=310.16, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:52:03.701898, device xla:5, step 800, Rate=260.50, Global Rate=310.16, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:52:03.693478, device xla:7, step 800, Rate=260.54, Global Rate=310.16, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:52:03.715259, device xla:1, step 800, Rate=260.48, Global Rate=310.15, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:54:47.888066, device xla:7, step 900, Rate=270.80, Global Rate=310.34, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:54:47.913310, device xla:2, step 900, Rate=270.74, Global Rate=310.34, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:54:47.892837, device xla:1, step 900, Rate=270.76, Global Rate=310.34, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:54:47.899481, device xla:6, step 900, Rate=270.77, Global Rate=310.34, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:54:47.926570, device xla:3, step 900, Rate=270.75, Global Rate=310.33, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 11:54:47.919754, device xla:5, step 900, Rate=270.76, Global Rate=310.34, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:54:47.906289, device xla:4, step 900, Rate=270.76, Global Rate=310.34, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:54:47.937577, device xla:8, step 900, Rate=270.79, Global Rate=310.33, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:57:29.495502, device xla:7, step 1000, Rate=280.00, Global Rate=310.98, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:57:29.500221, device xla:2, step 1000, Rate=279.97, Global Rate=310.98, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 11:57:29.512047, device xla:8, step 1000, Rate=280.00, Global Rate=310.98, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:57:29.504969, device xla:6, step 1000, Rate=279.98, Global Rate=310.98, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:57:29.516545, device xla:4, step 1000, Rate=279.97, Global Rate=310.97, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 11:57:29.535490, device xla:1, step 1000, Rate=279.96, Global Rate=310.97, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:57:29.523956, device xla:5, step 1000, Rate=279.97, Global Rate=310.97, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 11:57:29.542861, device xla:3, step 1000, Rate=279.96, Global Rate=310.97, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:00:14.064241, device xla:3, step 1100, Rate=286.21, Global Rate=310.99, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:00:14.068875, device xla:8, step 1100, Rate=286.23, Global Rate=310.99, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:00:14.073518, device xla:7, step 1100, Rate=286.22, Global Rate=310.99, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:00:14.086420, device xla:1, step 1100, Rate=286.20, Global Rate=310.99, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:00:14.078489, device xla:4, step 1100, Rate=286.20, Global Rate=310.99, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:00:14.094599, device xla:2, step 1100, Rate=286.19, Global Rate=310.99, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:00:14.108994, device xla:5, step 1100, Rate=286.19, Global Rate=310.98, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:00:14.101368, device xla:6, step 1100, Rate=286.20, Global Rate=310.98, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:02:57.500013, device xla:2, step 1200, Rate=291.62, Global Rate=311.18, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:02:57.504735, device xla:3, step 1200, Rate=291.62, Global Rate=311.18, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:02:57.509700, device xla:7, step 1200, Rate=291.63, Global Rate=311.18, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:02:57.522641, device xla:1, step 1200, Rate=291.61, Global Rate=311.18, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:02:57.524890, device xla:5, step 1200, Rate=291.62, Global Rate=311.18, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 12:02:57.532980, device xla:4, step 1200, Rate=291.61, Global Rate=311.17, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:02:57.514125, device xla:6, step 1200, Rate=291.62, Global Rate=311.18, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:02:57.548423, device xla:8, step 1200, Rate=291.62, Global Rate=311.17, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:05:42.259016, device xla:2, step 1300, Rate=295.44, Global Rate=311.15, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:05:42.271426, device xla:8, step 1300, Rate=295.46, Global Rate=311.15, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:05:42.276426, device xla:3, step 1300, Rate=295.44, Global Rate=311.14, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:05:42.264069, device xla:4, step 1300, Rate=295.45, Global Rate=311.15, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:05:42.281632, device xla:1, step 1300, Rate=295.44, Global Rate=311.14, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:05:42.284024, device xla:5, step 1300, Rate=295.44, Global Rate=311.14, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:05:42.290192, device xla:7, step 1300, Rate=295.45, Global Rate=311.14, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:05:42.299819, device xla:6, step 1300, Rate=295.44, Global Rate=311.14, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:08:25.958137, device xla:8, step 1400, Rate=298.93, Global Rate=311.26, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:08:25.962550, device xla:7, step 1400, Rate=298.92, Global Rate=311.26, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:08:25.966930, device xla:3, step 1400, Rate=298.91, Global Rate=311.26, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:08:25.978760, device xla:1, step 1400, Rate=298.91, Global Rate=311.26, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:08:25.971579, device xla:6, step 1400, Rate=298.92, Global Rate=311.26, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:08:25.983871, device xla:2, step 1400, Rate=298.90, Global Rate=311.26, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:08:25.994857, device xla:4, step 1400, Rate=298.90, Global Rate=311.26, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 12:08:25.986884, device xla:5, step 1400, Rate=298.91, Global Rate=311.26, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:11:09.175675, device xla:8, step 1500, Rate=301.88, Global Rate=311.42, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:11:09.179888, device xla:7, step 1500, Rate=301.88, Global Rate=311.42, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:11:09.186103, device xla:2, step 1500, Rate=301.86, Global Rate=311.42, Compiles=122, _local_scalar_dense=1624
training torch.Size([1024, 16])/ 2019-08-27 12:11:09.193353, device xla:1, step 1500, Rate=301.86, Global Rate=311.42, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:11:09.197038, device xla:3, step 1500, Rate=301.86, Global Rate=311.42, Compiles=122, _local_scalar_dense=1624
training torch.Size([512, 32])/ 2019-08-27 12:11:09.203671, device xla:4, step 1500, Rate=301.86, Global Rate=311.42, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:11:09.207714, device xla:6, step 1500, Rate=301.86, Global Rate=311.42, Compiles=122, _local_scalar_dense=1624
training torch.Size([256, 64])/ 2019-08-27 12:11:09.215701, device xla:5, step 1500, Rate=301.86, Global Rate=311.42, Compiles=122, _local_scalar_dense=1624
Epoch 26 Training stats:
device xla:1
| epoch 026 | loss 0.154 | nll_loss 0.154 | ppl 1.11 | wps 6092 | ups 1 | wpb 11144.000 | bsz 410.888 | num_updates 39208 | lr 0.000159703 | gnorm 0.044 | clip 0.000 | oom 0.000 | wall 71721 | train_wall 56791
device xla:2
| epoch 026 | loss 0.153 | nll_loss 0.153 | ppl 1.11 | wps 6101 | ups 1 | wpb 11160.452 | bsz 409.190 | num_updates 39208 | lr 0.000159703 | gnorm 0.043 | clip 0.000 | oom 0.000 | wall 71721 | train_wall 57821
device xla:3
| epoch 026 | loss 0.154 | nll_loss 0.154 | ppl 1.11 | wps 6077 | ups 1 | wpb 11116.995 | bsz 411.619 | num_updates 39208 | lr 0.000159703 | gnorm 0.046 | clip 0.000 | oom 0.000 | wall 71721 | train_wall 56379
device xla:4
| epoch 026 | loss 0.154 | nll_loss 0.154 | ppl 1.11 | wps 6082 | ups 1 | wpb 11126.168 | bsz 410.744 | num_updates 39208 | lr 0.000159703 | gnorm 0.046 | clip 0.000 | oom 0.000 | wall 71721 | train_wall 57815
device xla:5
| epoch 026 | loss 0.153 | nll_loss 0.153 | ppl 1.11 | wps 6104 | ups 1 | wpb 11165.369 | bsz 409.660 | num_updates 39208 | lr 0.000159703 | gnorm 0.043 | clip 0.000 | oom 0.000 | wall 71721 | train_wall 57941
device xla:6
| epoch 026 | loss 0.153 | nll_loss 0.153 | ppl 1.11 | wps 6097 | ups 1 | wpb 11152.241 | bsz 409.510 | num_updates 39208 | lr 0.000159703 | gnorm 0.044 | clip 0.000 | oom 0.000 | wall 71721 | train_wall 58040
device xla:7
| epoch 026 | loss 0.153 | nll_loss 0.153 | ppl 1.11 | wps 6109 | ups 1 | wpb 11175.031 | bsz 408.302 | num_updates 39208 | lr 0.000159703 | gnorm 0.044 | clip 0.000 | oom 0.000 | wall 71721 | train_wall 56899
device xla:8
| epoch 026 | loss 0.154 | nll_loss 0.154 | ppl 1.11 | wps 6084 | ups 1 | wpb 11129.341 | bsz 408.602 | num_updates 39208 | lr 0.000159703 | gnorm 0.047 | clip 0.000 | oom 0.000 | wall 71721 | train_wall 56997
Epoch 26 Tracker Rates:
Rate=298.45, Global Rate=311.27
Rate=298.42, Global Rate=311.27
Rate=298.46, Global Rate=311.27
Rate=298.49, Global Rate=311.27
Rate=298.54, Global Rate=311.27
Rate=298.51, Global Rate=311.27
Rate=298.41, Global Rate=311.27
Rate=298.40, Global Rate=311.27
Epoch 26 end 2019-08-27 12:11:23.575295
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 316696
Counter: 05d52h11m01s764ms73.047us
ValueRate: 06s125ms271.812us / second
Rate: 4.94553 / second
Percentiles: 1%=01s167ms476.622us; 5%=01s172ms279.364us; 10%=01s175ms675.299us; 20%=01s178ms994.528us; 50%=01s278ms98.202us; 80%=01s290ms59.377us; 90%=01s294ms526.810us; 95%=01s296ms203.378us; 99%=01s300ms702.687us
Metric: InboundData
TotalSamples: 1664
Counter: 3.23KB
ValueRate: 0.05B / second
Rate: 0.0252566 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1299605
Counter: 121.08GB
ValueRate: 496.80KB / second
Rate: 20.3803 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2857308
Counter: 13h12m38s983ms776.339us
ValueRate: 330ms87.350us / second
Rate: 42.3119 / second
Percentiles: 1%=466.290us; 5%=524.724us; 10%=560.030us; 20%=618.661us; 50%=856.393us; 80%=003ms683.197us; 90%=011ms505.988us; 95%=028ms256.443us; 99%=064ms596.203us
Metric: TransferFromServerTime
TotalSamples: 1664
Counter: 09s940ms447.879us
ValueRate: 53.069us / second
Rate: 0.0252566 / second
Percentiles: 1%=603.621us; 5%=655.342us; 10%=695.091us; 20%=745.203us; 50%=950.723us; 80%=002ms377.764us; 90%=004ms817.052us; 95%=008ms706.521us; 99%=012ms462.629us
Metric: TransferToServerTime
TotalSamples: 1299605
Counter: 04d05h06m53s401ms117.635us
ValueRate: 05s011ms419.721us / second
Rate: 20.3825 / second
Percentiles: 1%=001ms69.466us; 5%=001ms225.037us; 10%=001ms296.594us; 20%=001ms415.045us; 50%=002ms236.507us; 80%=878ms315.940us; 90%=982ms694.905us; 95%=01s060ms186.704us; 99%=01s088ms385.897us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 316574
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 232803554
Counter: CreateXlaTensor
Value: 1518437402
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 232796521
Counter: DestroyXlaTensor
Value: 1518431393
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 232796521
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 24057
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1664
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 12:11:27.626484, device xla:8, step 0, Compiles=122, _local_scalar_dense=1664
validation/ 2019-08-27 12:11:27.644135, device xla:2, step 0, Compiles=122, _local_scalar_dense=1664
validation/ 2019-08-27 12:11:27.647347, device xla:5, step 0, Compiles=122, _local_scalar_dense=1664
validation/ 2019-08-27 12:11:27.654227, device xla:6, step 0, Compiles=122, _local_scalar_dense=1664
validation/ 2019-08-27 12:11:27.656749, device xla:7, step 0, Compiles=122, _local_scalar_dense=1664
validation/ 2019-08-27 12:11:27.784213, device xla:3, step 0, Compiles=122, _local_scalar_dense=1664
validation/ 2019-08-27 12:11:27.795792, device xla:1, step 0, Compiles=122, _local_scalar_dense=1664
validation/ 2019-08-27 12:11:27.803877, device xla:4, step 0, Compiles=122, _local_scalar_dense=1664
validation stats on subset "valid" - 2019-08-27 12:11:33.758590
| epoch 026 | valid on 'valid' subset | loss 3.828 | nll_loss 2.016 | ppl 4.04 | num_updates 39208
| epoch 026 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 39208
| epoch 026 | valid on 'valid' subset | loss 3.953 | nll_loss 2.109 | ppl 4.32 | num_updates 39208
| epoch 026 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 39208
| epoch 026 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 39208
| epoch 026 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 39208
| epoch 026 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 39208
| epoch 026 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 39208
old learning rate: 0.00016286558549611404
new learning rate: 0.00015970284587257685
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 316817
Counter: 05d52h12m47s993ms773.937us
ValueRate: 06s023ms671.486us / second
Rate: 5.29858 / second
Percentiles: 1%=377ms338.923us; 5%=379ms52.552us; 10%=393ms311.102us; 20%=01s174ms338.929us; 50%=01s193ms737.113us; 80%=01s289ms968.980us; 90%=01s293ms138.434us; 95%=01s296ms924.416us; 99%=01s300ms702.687us
Metric: InboundData
TotalSamples: 1689
Counter: 3.27KB
ValueRate: 0.05B / second
Rate: 0.0269173 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1299845
Counter: 121.13GB
ValueRate: 933.74KB / second
Rate: 20.4069 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2857930
Counter: 14h12m09s309ms577.784us
ValueRate: 02s654ms459.569us / second
Rate: 51.0972 / second
Percentiles: 1%=447.058us; 5%=495.501us; 10%=527.429us; 20%=587.437us; 50%=776.044us; 80%=001ms421.465us; 90%=028ms583.831us; 95%=376ms183.388us; 99%=391ms427.165us
Metric: TransferFromServerTime
TotalSamples: 1689
Counter: 09s981ms175.443us
ValueRate: 55.576us / second
Rate: 0.0269173 / second
Percentiles: 1%=603.621us; 5%=657.665us; 10%=697.463us; 20%=746.344us; 50%=957.308us; 80%=002ms351.295us; 90%=004ms800.523us; 95%=007ms114.348us; 99%=012ms409.776us
Metric: TransferToServerTime
TotalSamples: 1299845
Counter: 04d05h06m20s710ms43.288us
ValueRate: 04s254ms67.876us / second
Rate: 20.4068 / second
Percentiles: 1%=001ms81.267us; 5%=001ms235.330us; 10%=001ms322.614us; 20%=001ms463.220us; 50%=002ms223.048us; 80%=254ms376.567us; 90%=957ms782.780us; 95%=01s037ms467.623us; 99%=01s074ms99.524us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 316695
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 232805163
Counter: CreateXlaTensor
Value: 1518572219
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 232798129
Counter: DestroyXlaTensor
Value: 1518566210
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 232798130
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 24057
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1689
Epoch 27 begin 2019-08-27 12:11:33.780100
training torch.Size([256, 64])/ 2019-08-27 12:11:42.641613, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:11:42.675753, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:11:42.711637, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:11:42.803115, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:11:43.734397, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:11:43.904957, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:11:44.316885, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:11:44.550409, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:14:32.728480, device xla:8, step 100, Rate=60.89, Global Rate=292.57, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:14:32.732751, device xla:4, step 100, Rate=60.26, Global Rate=292.56, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:14:32.737948, device xla:6, step 100, Rate=60.65, Global Rate=292.55, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:14:32.756124, device xla:7, step 100, Rate=60.79, Global Rate=292.52, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:14:32.762412, device xla:1, step 100, Rate=60.19, Global Rate=292.51, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:14:32.744563, device xla:2, step 100, Rate=60.21, Global Rate=292.54, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:14:32.769930, device xla:3, step 100, Rate=60.21, Global Rate=292.50, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:14:32.778527, device xla:5, step 100, Rate=60.58, Global Rate=292.48, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:17:15.923651, device xla:7, step 200, Rate=111.39, Global Rate=302.78, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:17:15.927977, device xla:4, step 200, Rate=110.96, Global Rate=302.78, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:17:15.933277, device xla:2, step 200, Rate=110.92, Global Rate=302.77, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:17:15.949771, device xla:8, step 200, Rate=111.45, Global Rate=302.76, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:17:15.955456, device xla:6, step 200, Rate=111.26, Global Rate=302.75, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:17:15.939984, device xla:1, step 200, Rate=110.91, Global Rate=302.76, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:17:15.962727, device xla:3, step 200, Rate=110.92, Global Rate=302.75, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:17:15.970944, device xla:5, step 200, Rate=111.21, Global Rate=302.74, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:20:02.365428, device xla:5, step 300, Rate=150.51, Global Rate=304.38, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:20:02.384942, device xla:8, step 300, Rate=150.68, Global Rate=304.36, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:20:02.377655, device xla:1, step 300, Rate=150.25, Global Rate=304.37, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:20:02.397998, device xla:7, step 300, Rate=150.62, Global Rate=304.36, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:20:02.360398, device xla:4, step 300, Rate=150.29, Global Rate=304.38, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:20:02.370588, device xla:6, step 300, Rate=150.54, Global Rate=304.37, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:20:02.406789, device xla:3, step 300, Rate=150.26, Global Rate=304.35, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:20:02.390060, device xla:2, step 300, Rate=150.25, Global Rate=304.36, Compiles=122, _local_scalar_dense=1689
training torch.Size([1024, 16])/ 2019-08-27 12:22:46.605159, device xla:4, step 400, Rate=182.58, Global Rate=306.18, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:22:46.610017, device xla:8, step 400, Rate=182.90, Global Rate=306.18, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:22:46.620158, device xla:3, step 400, Rate=182.56, Global Rate=306.18, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:22:46.614934, device xla:7, step 400, Rate=182.86, Global Rate=306.18, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:22:46.643088, device xla:2, step 400, Rate=182.54, Global Rate=306.17, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:22:46.650330, device xla:5, step 400, Rate=182.74, Global Rate=306.16, Compiles=122, _local_scalar_dense=1689
training torch.Size([1024, 16])/ 2019-08-27 12:22:46.632338, device xla:6, step 400, Rate=182.77, Global Rate=306.17, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:22:46.622424, device xla:1, step 400, Rate=182.55, Global Rate=306.17, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:25:33.108026, device xla:7, step 500, Rate=207.79, Global Rate=306.45, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:25:33.102603, device xla:4, step 500, Rate=207.57, Global Rate=306.45, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:25:33.097197, device xla:5, step 500, Rate=207.71, Global Rate=306.45, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:25:33.123006, device xla:8, step 500, Rate=207.82, Global Rate=306.44, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:25:33.129680, device xla:1, step 500, Rate=207.54, Global Rate=306.44, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:25:33.138070, device xla:2, step 500, Rate=207.54, Global Rate=306.43, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:25:33.113173, device xla:6, step 500, Rate=207.73, Global Rate=306.44, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:25:33.156069, device xla:3, step 500, Rate=207.54, Global Rate=306.43, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:28:19.102847, device xla:7, step 600, Rate=227.92, Global Rate=306.78, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:28:19.113059, device xla:3, step 600, Rate=227.73, Global Rate=306.77, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:28:19.121843, device xla:1, step 600, Rate=227.72, Global Rate=306.77, Compiles=122, _local_scalar_dense=1689
training torch.Size([1024, 16])/ 2019-08-27 12:28:19.114836, device xla:6, step 600, Rate=227.87, Global Rate=306.77, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:28:19.107177, device xla:4, step 600, Rate=227.74, Global Rate=306.78, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:28:19.132336, device xla:8, step 600, Rate=227.94, Global Rate=306.77, Compiles=122, _local_scalar_dense=1689
training torch.Size([1024, 16])/ 2019-08-27 12:28:19.148726, device xla:2, step 600, Rate=227.71, Global Rate=306.76, Compiles=122, _local_scalar_dense=1689
training torch.Size([1024, 16])/ 2019-08-27 12:28:19.139231, device xla:5, step 600, Rate=227.84, Global Rate=306.77, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:31:06.081430, device xla:5, step 700, Rate=243.61, Global Rate=306.76, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:31:06.092055, device xla:8, step 700, Rate=243.68, Global Rate=306.75, Compiles=122, _local_scalar_dense=1689
training torch.Size([1024, 16])/ 2019-08-27 12:31:06.103492, device xla:3, step 700, Rate=243.51, Global Rate=306.75, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:31:06.096780, device xla:7, step 700, Rate=243.66, Global Rate=306.75, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:31:06.086643, device xla:4, step 700, Rate=243.51, Global Rate=306.75, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:31:06.105840, device xla:2, step 700, Rate=243.50, Global Rate=306.75, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:31:06.118662, device xla:6, step 700, Rate=243.61, Global Rate=306.75, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:31:06.129423, device xla:1, step 700, Rate=243.49, Global Rate=306.74, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:33:49.925408, device xla:7, step 800, Rate=257.43, Global Rate=307.46, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:33:49.930054, device xla:8, step 800, Rate=257.45, Global Rate=307.46, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:33:49.942071, device xla:6, step 800, Rate=257.39, Global Rate=307.46, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:33:49.934916, device xla:2, step 800, Rate=257.31, Global Rate=307.46, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:33:49.949976, device xla:1, step 800, Rate=257.30, Global Rate=307.46, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:33:49.960122, device xla:4, step 800, Rate=257.30, Global Rate=307.45, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:33:49.972522, device xla:3, step 800, Rate=257.30, Global Rate=307.45, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:33:49.920185, device xla:5, step 800, Rate=257.39, Global Rate=307.46, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:36:35.721993, device xla:8, step 900, Rate=267.72, Global Rate=307.61, Compiles=122, _local_scalar_dense=1689
training torch.Size([1024, 16])/ 2019-08-27 12:36:35.726168, device xla:7, step 900, Rate=267.70, Global Rate=307.61, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:36:35.749747, device xla:2, step 900, Rate=267.60, Global Rate=307.60, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:36:35.760925, device xla:3, step 900, Rate=267.60, Global Rate=307.60, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:36:35.730406, device xla:6, step 900, Rate=267.68, Global Rate=307.61, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:36:35.752753, device xla:5, step 900, Rate=267.66, Global Rate=307.60, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:36:35.766008, device xla:4, step 900, Rate=267.60, Global Rate=307.60, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:36:35.737567, device xla:1, step 900, Rate=267.60, Global Rate=307.61, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:39:22.476243, device xla:8, step 1000, Rate=275.58, Global Rate=307.55, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:39:22.487788, device xla:2, step 1000, Rate=275.50, Global Rate=307.55, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:39:22.500729, device xla:4, step 1000, Rate=275.49, Global Rate=307.55, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:39:22.480894, device xla:1, step 1000, Rate=275.50, Global Rate=307.55, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:39:22.516349, device xla:3, step 1000, Rate=275.49, Global Rate=307.55, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:39:22.495090, device xla:5, step 1000, Rate=275.54, Global Rate=307.55, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:39:22.506709, device xla:6, step 1000, Rate=275.54, Global Rate=307.55, Compiles=122, _local_scalar_dense=1689
training torch.Size([1024, 16])/ 2019-08-27 12:39:22.530460, device xla:7, step 1000, Rate=275.55, Global Rate=307.54, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:42:10.538808, device xla:7, step 1100, Rate=281.39, Global Rate=307.29, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:42:10.533501, device xla:5, step 1100, Rate=281.37, Global Rate=307.29, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:42:10.552770, device xla:8, step 1100, Rate=281.39, Global Rate=307.28, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:42:10.577753, device xla:3, step 1100, Rate=281.32, Global Rate=307.28, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:42:10.568654, device xla:1, step 1100, Rate=281.32, Global Rate=307.28, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:42:10.557826, device xla:6, step 1100, Rate=281.37, Global Rate=307.28, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:42:10.586521, device xla:2, step 1100, Rate=281.31, Global Rate=307.28, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:42:10.543908, device xla:4, step 1100, Rate=281.33, Global Rate=307.29, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:44:59.872886, device xla:7, step 1200, Rate=285.59, Global Rate=306.87, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:44:59.877187, device xla:8, step 1200, Rate=285.59, Global Rate=306.87, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:44:59.899408, device xla:1, step 1200, Rate=285.53, Global Rate=306.87, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:44:59.907385, device xla:2, step 1200, Rate=285.53, Global Rate=306.87, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:44:59.881912, device xla:5, step 1200, Rate=285.56, Global Rate=306.87, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:44:59.909949, device xla:3, step 1200, Rate=285.53, Global Rate=306.86, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:44:59.888763, device xla:6, step 1200, Rate=285.57, Global Rate=306.87, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:44:59.928060, device xla:4, step 1200, Rate=285.52, Global Rate=306.86, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:47:46.042819, device xla:7, step 1300, Rate=290.09, Global Rate=306.97, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:47:46.037689, device xla:4, step 1300, Rate=290.06, Global Rate=306.97, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:47:46.047524, device xla:1, step 1300, Rate=290.05, Global Rate=306.97, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:47:46.064665, device xla:8, step 1300, Rate=290.09, Global Rate=306.96, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:47:46.056491, device xla:6, step 1300, Rate=290.08, Global Rate=306.96, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:47:46.071936, device xla:3, step 1300, Rate=290.05, Global Rate=306.96, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:47:46.076908, device xla:5, step 1300, Rate=290.07, Global Rate=306.96, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:47:46.083124, device xla:2, step 1300, Rate=290.04, Global Rate=306.96, Compiles=122, _local_scalar_dense=1689
training torch.Size([1024, 16])/ 2019-08-27 12:50:28.785310, device xla:8, step 1400, Rate=295.00, Global Rate=307.50, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:50:28.789820, device xla:7, step 1400, Rate=294.99, Global Rate=307.50, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:50:28.795983, device xla:2, step 1400, Rate=294.97, Global Rate=307.50, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:50:28.803142, device xla:6, step 1400, Rate=294.98, Global Rate=307.50, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:50:28.815454, device xla:5, step 1400, Rate=294.98, Global Rate=307.50, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:50:28.821511, device xla:3, step 1400, Rate=294.96, Global Rate=307.49, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:50:28.830420, device xla:4, step 1400, Rate=294.95, Global Rate=307.49, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:50:28.842986, device xla:1, step 1400, Rate=294.94, Global Rate=307.49, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:53:10.463236, device xla:8, step 1500, Rate=299.34, Global Rate=308.10, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:53:10.474674, device xla:2, step 1500, Rate=299.31, Global Rate=308.09, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:53:10.494414, device xla:3, step 1500, Rate=299.31, Global Rate=308.09, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:53:10.469140, device xla:4, step 1500, Rate=299.31, Global Rate=308.09, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:53:10.497276, device xla:7, step 1500, Rate=299.32, Global Rate=308.09, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:53:10.485560, device xla:1, step 1500, Rate=299.30, Global Rate=308.09, Compiles=122, _local_scalar_dense=1689
training torch.Size([512, 32])/ 2019-08-27 12:53:10.478251, device xla:6, step 1500, Rate=299.32, Global Rate=308.09, Compiles=122, _local_scalar_dense=1689
training torch.Size([256, 64])/ 2019-08-27 12:53:10.503905, device xla:5, step 1500, Rate=299.31, Global Rate=308.09, Compiles=122, _local_scalar_dense=1689
Epoch 27 Training stats:
device xla:1
| epoch 027 | loss 0.148 | nll_loss 0.148 | ppl 1.11 | wps 6111 | ups 1 | wpb 11142.694 | bsz 410.778 | num_updates 40716 | lr 0.000156717 | gnorm 0.043 | clip 0.000 | oom 0.000 | wall 74243 | train_wall 58853
device xla:2
| epoch 027 | loss 0.147 | nll_loss 0.147 | ppl 1.11 | wps 6123 | ups 1 | wpb 11164.842 | bsz 409.244 | num_updates 40716 | lr 0.000156717 | gnorm 0.041 | clip 0.000 | oom 0.000 | wall 74243 | train_wall 59880
device xla:3
| epoch 027 | loss 0.148 | nll_loss 0.148 | ppl 1.11 | wps 6098 | ups 1 | wpb 11119.052 | bsz 411.589 | num_updates 40716 | lr 0.000156717 | gnorm 0.044 | clip 0.000 | oom 0.000 | wall 74243 | train_wall 58432
device xla:4
| epoch 027 | loss 0.148 | nll_loss 0.148 | ppl 1.11 | wps 6103 | ups 1 | wpb 11128.202 | bsz 410.546 | num_updates 40716 | lr 0.000156717 | gnorm 0.044 | clip 0.000 | oom 0.000 | wall 74243 | train_wall 59881
device xla:5
| epoch 027 | loss 0.147 | nll_loss 0.147 | ppl 1.11 | wps 6120 | ups 1 | wpb 11159.855 | bsz 409.590 | num_updates 40716 | lr 0.000156717 | gnorm 0.042 | clip 0.000 | oom 0.000 | wall 74243 | train_wall 60011
device xla:6
| epoch 027 | loss 0.147 | nll_loss 0.147 | ppl 1.11 | wps 6116 | ups 1 | wpb 11152.347 | bsz 409.760 | num_updates 40716 | lr 0.000156717 | gnorm 0.042 | clip 0.000 | oom 0.000 | wall 74243 | train_wall 60100
device xla:7
| epoch 027 | loss 0.147 | nll_loss 0.147 | ppl 1.11 | wps 6128 | ups 1 | wpb 11173.527 | bsz 408.345 | num_updates 40716 | lr 0.000156717 | gnorm 0.042 | clip 0.000 | oom 0.000 | wall 74243 | train_wall 58961
device xla:8
| epoch 027 | loss 0.148 | nll_loss 0.148 | ppl 1.11 | wps 6103 | ups 1 | wpb 11129.118 | bsz 408.666 | num_updates 40716 | lr 0.000156717 | gnorm 0.045 | clip 0.000 | oom 0.000 | wall 74243 | train_wall 59061
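The `lr` column in the epoch 26 and 27 summaries is consistent with the `inverse_sqrt` scheduler selected in the launch script (`--lr-scheduler=inverse_sqrt`). After warmup, that scheduler sets lr = decay_factor / sqrt(num_updates), so lr * sqrt(num_updates) should stay constant across epochs. The sketch below checks that using only numbers copied from this log. It assumes warmup finished well before epoch 26; it does not need the peak lr or warmup length, which are not visible in this excerpt.

```python
import math

# (num_updates, lr) pairs copied from the epoch 26 and epoch 27 training summaries
observations = [
    (39208, 0.000159703),  # end of epoch 26
    (40716, 0.000156717),  # end of epoch 27
]


def decay_factor(num_updates: int, lr: float) -> float:
    """Invert the post-warmup inverse_sqrt rule lr = decay_factor / sqrt(num_updates)."""
    return lr * math.sqrt(num_updates)


# If the schedule is inverse_sqrt, both pairs should imply the same decay factor.
factors = [decay_factor(n, lr) for n, lr in observations]
print("implied decay factors:", [f"{f:.7f}" for f in factors])

# Predict the epoch-27 lr from the epoch-26 value alone: lr scales with 1/sqrt(num_updates).
n26, lr26 = observations[0]
n27, _ = observations[1]
predicted_lr27 = lr26 * math.sqrt(n26 / n27)
print(f"predicted lr at {n27}: {predicted_lr27:.9f}")

# The full-precision "new learning rate" printed after epoch 26 validation
# should match the lr implied at num_updates=39208.
new_lr_after_epoch26 = 0.00015970284587257685
print(f"lr implied at {n26}: {factors[0] / math.sqrt(n26):.12f}")
```

Because the logged `lr` values are rounded to six significant figures, the two implied decay factors agree only to about one part in a million, not exactly.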
Epoch 27 Tracker Rates:
Rate=295.90, Global Rate=307.94
Rate=295.86, Global Rate=307.94
Rate=295.94, Global Rate=307.94
Rate=295.85, Global Rate=307.94
Rate=295.98, Global Rate=307.94
Rate=295.89, Global Rate=307.94
Rate=295.96, Global Rate=307.94
Rate=295.84, Global Rate=307.94
Epoch 27 end 2019-08-27 12:53:24.995166
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 328881
Counter: 05d03h20m13s614ms889.603us
ValueRate: 06s185ms871.886us / second
Rate: 4.97216 / second
Percentiles: 1%=01s170ms731.900us; 5%=01s175ms627.922us; 10%=01s177ms60.911us; 20%=01s180ms242.457us; 50%=01s284ms928.211us; 80%=01s292ms217.515us; 90%=01s295ms994.069us; 95%=01s297ms913.278us; 99%=01s303ms20.866us
Metric: InboundData
TotalSamples: 1729
Counter: 3.35KB
ValueRate: 0.05B / second
Rate: 0.0252571 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1349518
Counter: 125.62GB
ValueRate: 508.19KB / second
Rate: 20.7843 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2965945
Counter: 14h21m30s919ms146.472us
ValueRate: 441ms292.358us / second
Rate: 42.047 / second
Percentiles: 1%=434.822us; 5%=511.068us; 10%=555.727us; 20%=619.888us; 50%=856.788us; 80%=003ms662.235us; 90%=013ms607.848us; 95%=029ms646.197us; 99%=063ms799.419us
Metric: TransferFromServerTime
TotalSamples: 1729
Counter: 09s066ms69.804us
ValueRate: 51.966us / second
Rate: 0.0252571 / second
Percentiles: 1%=603.621us; 5%=656.764us; 10%=698.873us; 20%=746.344us; 50%=957.347us; 80%=002ms354.984us; 90%=004ms851.498us; 95%=008ms706.521us; 99%=012ms985.348us
Metric: TransferToServerTime
TotalSamples: 1349518
Counter: 04d14h19m33s215ms154.148us
ValueRate: 05s037ms678.074us / second
Rate: 20.7844 / second
Percentiles: 1%=001ms74.120us; 5%=001ms209.222us; 10%=001ms279.785us; 20%=001ms414.634us; 50%=002ms127.171us; 80%=909ms50.821us; 90%=01s019ms328.501us; 95%=01s069ms172.114us; 99%=01s094ms982.421us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 328759
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 241757460
Counter: CreateXlaTensor
Value: 1576843863
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 241749397
Counter: DestroyXlaTensor
Value: 1576837854
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 241750427
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 24117
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1729
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 12:53:29.151106, device xla:8, step 0, Compiles=122, _local_scalar_dense=1729
validation/ 2019-08-27 12:53:29.155251, device xla:1, step 0, Compiles=122, _local_scalar_dense=1729
validation/ 2019-08-27 12:53:29.169654, device xla:5, step 0, Compiles=122, _local_scalar_dense=1729
validation/ 2019-08-27 12:53:29.173094, device xla:2, step 0, Compiles=122, _local_scalar_dense=1729
validation/ 2019-08-27 12:53:29.319169, device xla:4, step 0, Compiles=122, _local_scalar_dense=1729
validation/ 2019-08-27 12:53:29.320982, device xla:7, step 0, Compiles=122, _local_scalar_dense=1729
validation/ 2019-08-27 12:53:29.332129, device xla:3, step 0, Compiles=122, _local_scalar_dense=1729
validation/ 2019-08-27 12:53:29.349503, device xla:6, step 0, Compiles=122, _local_scalar_dense=1729
validation stats on subset "valid" - 2019-08-27 12:53:35.268613
| epoch 027 | valid on 'valid' subset | loss 3.828 | nll_loss 2.016 | ppl 4.04 | num_updates 40716
| epoch 027 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 40716
| epoch 027 | valid on 'valid' subset | loss 3.922 | nll_loss 2.109 | ppl 4.32 | num_updates 40716
| epoch 027 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 40716
| epoch 027 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 40716
| epoch 027 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 40716
| epoch 027 | valid on 'valid' subset | loss 3.906 | nll_loss 2.047 | ppl 4.13 | num_updates 40716
| epoch 027 | valid on 'valid' subset | loss 3.922 | nll_loss 2.125 | ppl 4.36 | num_updates 40716
old learning rate: 0.00015970284587257685
new learning rate: 0.0001567174827131791
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 329002
Counter: 05d03h21m59s563ms535.442us
ValueRate: 06s092ms545.339us / second
Rate: 5.33818 / second
Percentiles: 1%=377ms699.549us; 5%=378ms44.996us; 10%=391ms943.744us; 20%=01s177ms604.388us; 50%=01s277ms515.405us; 80%=01s291ms491.926us; 90%=01s294ms309.028us; 95%=01s296ms5.882us; 99%=01s301ms916.288us
Metric: InboundData
TotalSamples: 1754
Counter: 3.40KB
ValueRate: 0.05B / second
Rate: 0.0269109 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1349758
Counter: 125.66GB
ValueRate: 949.32KB / second
Rate: 20.7475 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 2966536
Counter: 14h22m01s445ms875.763us
ValueRate: 02s652ms159.896us / second
Rate: 50.7569 / second
Percentiles: 1%=437.137us; 5%=493.308us; 10%=529.993us; 20%=584.879us; 50%=779.893us; 80%=001ms406.855us; 90%=031ms83.808us; 95%=375ms583.334us; 99%=388ms692.980us
Metric: TransferFromServerTime
TotalSamples: 1754
Counter: 09s102ms819.492us
ValueRate: 54.606us / second
Rate: 0.0269109 / second
Percentiles: 1%=603.621us; 5%=656.764us; 10%=697.883us; 20%=746.845us; 50%=959.649us; 80%=002ms336.836us; 90%=004ms800.523us; 95%=007ms433.370us; 99%=012ms985.348us
Metric: TransferToServerTime
TotalSamples: 1349758
Counter: 04d14h19m01s963ms839.759us
ValueRate: 04s317ms593.549us / second
Rate: 20.7474 / second
Percentiles: 1%=001ms81.665us; 5%=001ms213.909us; 10%=001ms305.239us; 20%=001ms455.649us; 50%=002ms92.240us; 80%=244ms229.894us; 90%=979ms911.378us; 95%=01s056ms81.871us; 99%=01s091ms476.241us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 328880
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 241759069
Counter: CreateXlaTensor
Value: 1576978680
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 241752036
Counter: DestroyXlaTensor
Value: 1576972671
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 241752036
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 24117
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1754
Epoch 28 begin 2019-08-27 12:53:35.288679
training torch.Size([1024, 16])/ 2019-08-27 12:53:44.265963, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:53:44.303011, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 12:53:44.385354, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:53:44.476153, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 12:53:44.661732, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:53:44.889177, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:53:45.029724, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:53:45.075588, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 12:56:33.712147, device xla:8, step 100, Rate=60.72, Global Rate=293.49, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:56:33.716886, device xla:7, step 100, Rate=60.70, Global Rate=293.48, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 12:56:33.728481, device xla:3, step 100, Rate=60.47, Global Rate=293.46, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 12:56:33.721281, device xla:6, step 100, Rate=60.65, Global Rate=293.47, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:56:33.735981, device xla:1, step 100, Rate=60.42, Global Rate=293.45, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:56:33.743549, device xla:5, step 100, Rate=60.56, Global Rate=293.44, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:56:33.749573, device xla:4, step 100, Rate=60.49, Global Rate=293.43, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:56:33.762342, device xla:2, step 100, Rate=60.43, Global Rate=293.40, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 12:59:19.388430, device xla:8, step 200, Rate=110.39, Global Rate=301.06, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 12:59:19.419704, device xla:6, step 200, Rate=110.32, Global Rate=301.03, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 12:59:19.411881, device xla:2, step 200, Rate=110.16, Global Rate=301.04, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:59:19.393051, device xla:3, step 200, Rate=110.19, Global Rate=301.06, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:59:19.400273, device xla:4, step 200, Rate=110.21, Global Rate=301.05, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 12:59:19.405809, device xla:1, step 200, Rate=110.15, Global Rate=301.05, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 12:59:19.423806, device xla:7, step 200, Rate=110.36, Global Rate=301.03, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 12:59:19.430605, device xla:5, step 200, Rate=110.25, Global Rate=301.03, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:02:04.910047, device xla:7, step 300, Rate=150.17, Global Rate=303.77, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:02:04.935757, device xla:3, step 300, Rate=150.01, Global Rate=303.75, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:02:04.922191, device xla:8, step 300, Rate=150.17, Global Rate=303.76, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:02:04.928449, device xla:1, step 300, Rate=149.98, Global Rate=303.76, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:02:04.939853, device xla:4, step 300, Rate=150.03, Global Rate=303.75, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:02:04.915538, device xla:2, step 300, Rate=150.00, Global Rate=303.76, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:02:04.947126, device xla:5, step 300, Rate=150.07, Global Rate=303.74, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:02:04.964487, device xla:6, step 300, Rate=150.11, Global Rate=303.73, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:04:48.829326, device xla:8, step 400, Rate=182.61, Global Rate=305.87, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:04:48.822183, device xla:3, step 400, Rate=182.49, Global Rate=305.87, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:04:48.833920, device xla:4, step 400, Rate=182.50, Global Rate=305.87, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:04:48.839195, device xla:2, step 400, Rate=182.47, Global Rate=305.86, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:04:48.852820, device xla:7, step 400, Rate=182.59, Global Rate=305.86, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:04:48.862077, device xla:6, step 400, Rate=182.57, Global Rate=305.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:04:48.864952, device xla:1, step 400, Rate=182.45, Global Rate=305.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:04:48.846835, device xla:5, step 400, Rate=182.53, Global Rate=305.86, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:07:32.356667, device xla:4, step 500, Rate=208.62, Global Rate=307.29, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:07:32.371363, device xla:8, step 500, Rate=208.70, Global Rate=307.28, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:07:32.366130, device xla:5, step 500, Rate=208.65, Global Rate=307.28, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:07:32.376511, device xla:3, step 500, Rate=208.60, Global Rate=307.28, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:07:32.379778, device xla:1, step 500, Rate=208.58, Global Rate=307.28, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:07:32.387387, device xla:2, step 500, Rate=208.59, Global Rate=307.28, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:07:32.398444, device xla:7, step 500, Rate=208.69, Global Rate=307.27, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:07:32.406314, device xla:6, step 500, Rate=208.67, Global Rate=307.27, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:10:16.884721, device xla:5, step 600, Rate=229.16, Global Rate=307.93, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:10:16.890156, device xla:4, step 600, Rate=229.13, Global Rate=307.93, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:10:16.895601, device xla:6, step 600, Rate=229.19, Global Rate=307.93, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:10:16.899568, device xla:7, step 600, Rate=229.20, Global Rate=307.93, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:10:16.904273, device xla:2, step 600, Rate=229.11, Global Rate=307.92, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:10:16.925869, device xla:3, step 600, Rate=229.11, Global Rate=307.92, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:10:16.911790, device xla:1, step 600, Rate=229.10, Global Rate=307.92, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:10:16.930878, device xla:8, step 600, Rate=229.19, Global Rate=307.92, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:13:04.025745, device xla:4, step 700, Rate=244.58, Global Rate=307.70, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:13:04.030538, device xla:8, step 700, Rate=244.63, Global Rate=307.70, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:13:04.034908, device xla:5, step 700, Rate=244.59, Global Rate=307.70, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:13:04.046406, device xla:7, step 700, Rate=244.62, Global Rate=307.70, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:13:04.052453, device xla:1, step 700, Rate=244.55, Global Rate=307.69, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:13:04.039595, device xla:2, step 700, Rate=244.56, Global Rate=307.70, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:13:04.059229, device xla:3, step 700, Rate=244.56, Global Rate=307.69, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:13:04.064895, device xla:6, step 700, Rate=244.61, Global Rate=307.69, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:15:49.727895, device xla:8, step 800, Rate=257.50, Global Rate=307.86, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:15:49.744850, device xla:1, step 800, Rate=257.44, Global Rate=307.86, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:15:49.739220, device xla:7, step 800, Rate=257.50, Global Rate=307.86, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:15:49.732526, device xla:3, step 800, Rate=257.45, Global Rate=307.86, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:15:49.749690, device xla:6, step 800, Rate=257.49, Global Rate=307.86, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:15:49.775302, device xla:4, step 800, Rate=257.44, Global Rate=307.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:15:49.756616, device xla:2, step 800, Rate=257.44, Global Rate=307.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:15:49.767073, device xla:5, step 800, Rate=257.46, Global Rate=307.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:18:34.411977, device xla:7, step 900, Rate=268.18, Global Rate=308.20, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:18:34.423308, device xla:4, step 900, Rate=268.15, Global Rate=308.19, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:18:34.439306, device xla:5, step 900, Rate=268.15, Global Rate=308.19, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:18:34.443030, device xla:3, step 900, Rate=268.13, Global Rate=308.19, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:18:34.447290, device xla:1, step 900, Rate=268.13, Global Rate=308.19, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:18:34.416451, device xla:2, step 900, Rate=268.14, Global Rate=308.19, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:18:34.428816, device xla:6, step 900, Rate=268.17, Global Rate=308.19, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:18:34.455106, device xla:8, step 900, Rate=268.17, Global Rate=308.19, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:21:19.707183, device xla:1, step 1000, Rate=276.46, Global Rate=308.35, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:21:19.711800, device xla:7, step 1000, Rate=276.49, Global Rate=308.35, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:21:19.716352, device xla:2, step 1000, Rate=276.46, Global Rate=308.35, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:21:19.730376, device xla:6, step 1000, Rate=276.49, Global Rate=308.35, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:21:19.733823, device xla:3, step 1000, Rate=276.46, Global Rate=308.35, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:21:19.723491, device xla:5, step 1000, Rate=276.48, Global Rate=308.35, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:21:19.743855, device xla:8, step 1000, Rate=276.49, Global Rate=308.34, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:21:19.753156, device xla:4, step 1000, Rate=276.45, Global Rate=308.34, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:24:01.986444, device xla:4, step 1100, Rate=284.28, Global Rate=308.99, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:24:01.992636, device xla:7, step 1100, Rate=284.30, Global Rate=308.99, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:24:02.026077, device xla:1, step 1100, Rate=284.26, Global Rate=308.98, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:24:02.012406, device xla:8, step 1100, Rate=284.29, Global Rate=308.98, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:24:02.003252, device xla:2, step 1100, Rate=284.27, Global Rate=308.98, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:24:01.996946, device xla:6, step 1100, Rate=284.29, Global Rate=308.99, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:24:02.034845, device xla:5, step 1100, Rate=284.27, Global Rate=308.98, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:24:02.018820, device xla:3, step 1100, Rate=284.26, Global Rate=308.98, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:26:46.506864, device xla:1, step 1200, Rate=289.66, Global Rate=309.17, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:26:46.518104, device xla:8, step 1200, Rate=289.68, Global Rate=309.17, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:26:46.523822, device xla:2, step 1200, Rate=289.65, Global Rate=309.17, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:26:46.530420, device xla:6, step 1200, Rate=289.67, Global Rate=309.17, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:26:46.538957, device xla:3, step 1200, Rate=289.65, Global Rate=309.17, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:26:46.512714, device xla:5, step 1200, Rate=289.67, Global Rate=309.17, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:26:46.547747, device xla:7, step 1200, Rate=289.67, Global Rate=309.17, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:26:46.541122, device xla:4, step 1200, Rate=289.65, Global Rate=309.17, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:29:31.230253, device xla:1, step 1300, Rate=293.89, Global Rate=309.30, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:29:31.248244, device xla:3, step 1300, Rate=293.89, Global Rate=309.30, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:29:31.234687, device xla:2, step 1300, Rate=293.89, Global Rate=309.30, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:29:31.250390, device xla:7, step 1300, Rate=293.90, Global Rate=309.30, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:29:31.241904, device xla:5, step 1300, Rate=293.90, Global Rate=309.30, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:29:31.255743, device xla:6, step 1300, Rate=293.90, Global Rate=309.29, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:29:31.272992, device xla:8, step 1300, Rate=293.90, Global Rate=309.29, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:29:31.267158, device xla:4, step 1300, Rate=293.89, Global Rate=309.29, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:32:12.607467, device xla:8, step 1400, Rate=298.59, Global Rate=309.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:32:12.613075, device xla:4, step 1400, Rate=298.58, Global Rate=309.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:32:12.618374, device xla:6, step 1400, Rate=298.58, Global Rate=309.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:32:12.620328, device xla:1, step 1400, Rate=298.56, Global Rate=309.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:32:12.631459, device xla:3, step 1400, Rate=298.57, Global Rate=309.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:32:12.633374, device xla:7, step 1400, Rate=298.58, Global Rate=309.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:32:12.625242, device xla:5, step 1400, Rate=298.57, Global Rate=309.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:32:12.641675, device xla:2, step 1400, Rate=298.56, Global Rate=309.85, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:34:55.377781, device xla:4, step 1500, Rate=301.77, Global Rate=310.16, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:34:55.415938, device xla:3, step 1500, Rate=301.76, Global Rate=310.16, Compiles=122, _local_scalar_dense=1754
training torch.Size([1024, 16])/ 2019-08-27 13:34:55.396747, device xla:8, step 1500, Rate=301.78, Global Rate=310.16, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:34:55.401955, device xla:1, step 1500, Rate=301.76, Global Rate=310.16, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:34:55.420946, device xla:7, step 1500, Rate=301.76, Global Rate=310.16, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:34:55.390694, device xla:5, step 1500, Rate=301.77, Global Rate=310.16, Compiles=122, _local_scalar_dense=1754
training torch.Size([256, 64])/ 2019-08-27 13:34:55.383356, device xla:2, step 1500, Rate=301.77, Global Rate=310.16, Compiles=122, _local_scalar_dense=1754
training torch.Size([512, 32])/ 2019-08-27 13:34:55.407508, device xla:6, step 1500, Rate=301.77, Global Rate=310.16, Compiles=122, _local_scalar_dense=1754
Epoch 28 Training stats:
device xla:1
| epoch 028 | loss 0.143 | nll_loss 0.143 | ppl 1.10 | wps 6129 | ups 1 | wpb 11139.611 | bsz 410.792 | num_updates 42224 | lr 0.000153894 | gnorm 0.041 | clip 0.000 | oom 0.000 | wall 76748 | train_wall 60909
device xla:2
| epoch 028 | loss 0.143 | nll_loss 0.143 | ppl 1.10 | wps 6143 | ups 1 | wpb 11166.304 | bsz 408.967 | num_updates 42224 | lr 0.000153894 | gnorm 0.040 | clip 0.000 | oom 0.000 | wall 76748 | train_wall 61936
device xla:3
| epoch 028 | loss 0.143 | nll_loss 0.143 | ppl 1.10 | wps 6119 | ups 1 | wpb 11122.230 | bsz 411.635 | num_updates 42224 | lr 0.000153894 | gnorm 0.043 | clip 0.000 | oom 0.000 | wall 76748 | train_wall 60481
device xla:4
| epoch 028 | loss 0.143 | nll_loss 0.143 | ppl 1.10 | wps 6123 | ups 1 | wpb 11129.189 | bsz 410.640 | num_updates 42224 | lr 0.000153894 | gnorm 0.042 | clip 0.000 | oom 0.000 | wall 76748 | train_wall 61946
device xla:5
| epoch 028 | loss 0.143 | nll_loss 0.143 | ppl 1.10 | wps 6140 | ups 1 | wpb 11159.640 | bsz 409.731 | num_updates 42224 | lr 0.000153894 | gnorm 0.040 | clip 0.000 | oom 0.000 | wall 76748 | train_wall 62063
device xla:6
| epoch 028 | loss 0.143 | nll_loss 0.143 | ppl 1.10 | wps 6136 | ups 1 | wpb 11152.756 | bsz 409.767 | num_updates 42224 | lr 0.000153894 | gnorm 0.041 | clip 0.000 | oom 0.000 | wall 76748 | train_wall 62146
device xla:7
| epoch 028 | loss 0.143 | nll_loss 0.143 | ppl 1.10 | wps 6146 | ups 1 | wpb 11171.701 | bsz 408.264 | num_updates 42224 | lr 0.000153894 | gnorm 0.041 | clip 0.000 | oom 0.000 | wall 76748 | train_wall 61022
device xla:8
| epoch 028 | loss 0.143 | nll_loss 0.143 | ppl 1.10 | wps 6122 | ups 1 | wpb 11128.220 | bsz 408.725 | num_updates 42224 | lr 0.000153894 | gnorm 0.043 | clip 0.000 | oom 0.000 | wall 76748 | train_wall 61120
Epoch 28 Tracker Rates:
Rate=296.77, Global Rate=309.96
Rate=296.70, Global Rate=309.96
Rate=296.82, Global Rate=309.96
Rate=296.69, Global Rate=309.96
Rate=296.73, Global Rate=309.96
Rate=296.79, Global Rate=309.96
Rate=296.84, Global Rate=309.96
Rate=296.76, Global Rate=309.96
Epoch 28 end 2019-08-27 13:35:10.199892
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 341066
Counter: 05d13h05m19s839ms796.053us
ValueRate: 06s113ms553.612us / second
Rate: 4.94849 / second
Percentiles: 1%=01s074ms808.804us; 5%=01s176ms29.436us; 10%=01s180ms191.099us; 20%=01s184ms428.794us; 50%=01s200ms896.764us; 80%=01s294ms359.944us; 90%=01s297ms417.397us; 95%=01s300ms900.353us; 99%=01s319ms640.794us
Metric: InboundData
TotalSamples: 1794
Counter: 3.48KB
ValueRate: 0.05B / second
Rate: 0.0252614 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1399435
Counter: 130.15GB
ValueRate: 501.42KB / second
Rate: 20.4975 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 3073653
Counter: 14h07m59s936ms446.614us
ValueRate: 306ms697.681us / second
Rate: 46.0337 / second
Percentiles: 1%=473.600us; 5%=521.444us; 10%=554.437us; 20%=609.289us; 50%=805.659us; 80%=002ms869.388us; 90%=009ms19.608us; 95%=030ms650.462us; 99%=057ms865.488us
Metric: TransferFromServerTime
TotalSamples: 1794
Counter: 09s182ms66.736us
ValueRate: 49.506us / second
Rate: 0.0252614 / second
Percentiles: 1%=603.621us; 5%=654.380us; 10%=694.507us; 20%=746.142us; 50%=957.308us; 80%=002ms310.132us; 90%=004ms804.536us; 95%=007ms451.695us; 99%=011ms91.002us
Metric: TransferToServerTime
TotalSamples: 1399435
Counter: 04d22h09m18s429ms651.992us
ValueRate: 05s919ms736.910us / second
Rate: 20.4975 / second
Percentiles: 1%=001ms99.121us; 5%=001ms203.080us; 10%=001ms279.069us; 20%=001ms407.864us; 50%=002ms143.064us; 80%=884ms753.414us; 90%=01s016ms990.789us; 95%=01s047ms101.152us; 99%=01s090ms41.335us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 340944
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 250711370
Counter: CreateXlaTensor
Value: 1635250324
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 250703104
Counter: DestroyXlaTensor
Value: 1635244315
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 250704337
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 24147
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1794
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 13:35:14.406063, device xla:6, step 0, Compiles=122, _local_scalar_dense=1794
validation/ 2019-08-27 13:35:14.409710, device xla:1, step 0, Compiles=122, _local_scalar_dense=1794
validation/ 2019-08-27 13:35:14.413712, device xla:2, step 0, Compiles=122, _local_scalar_dense=1794
validation/ 2019-08-27 13:35:14.421248, device xla:7, step 0, Compiles=122, _local_scalar_dense=1794
validation/ 2019-08-27 13:35:14.422824, device xla:8, step 0, Compiles=122, _local_scalar_dense=1794
validation/ 2019-08-27 13:35:14.426841, device xla:3, step 0, Compiles=122, _local_scalar_dense=1794
validation/ 2019-08-27 13:35:14.428468, device xla:4, step 0, Compiles=122, _local_scalar_dense=1794
validation/ 2019-08-27 13:35:14.564857, device xla:5, step 0, Compiles=122, _local_scalar_dense=1794
validation stats on subset "valid" - 2019-08-27 13:35:20.494250
| epoch 028 | valid on 'valid' subset | loss 3.828 | nll_loss 2.031 | ppl 4.09 | num_updates 42224
| epoch 028 | valid on 'valid' subset | loss 3.875 | nll_loss 2.047 | ppl 4.13 | num_updates 42224
| epoch 028 | valid on 'valid' subset | loss 3.953 | nll_loss 2.125 | ppl 4.36 | num_updates 42224
| epoch 028 | valid on 'valid' subset | loss 3.922 | nll_loss 2.156 | ppl 4.46 | num_updates 42224
| epoch 028 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 42224
| epoch 028 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 42224
| epoch 028 | valid on 'valid' subset | loss 3.906 | nll_loss 2.062 | ppl 4.18 | num_updates 42224
| epoch 028 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 42224
old learning rate: 0.0001567174827131791
new learning rate: 0.00015389351298344497
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 341187
Counter: 05d13h06m05s876ms517.436us
ValueRate: 06s023ms756.759us / second
Rate: 5.29939 / second
Percentiles: 1%=377ms854.142us; 5%=379ms747.379us; 10%=392ms283.746us; 20%=01s179ms462.125us; 50%=01s194ms555.077us; 80%=01s294ms673.134us; 90%=01s297ms730.513us; 95%=01s300ms612.721us; 99%=01s319ms640.794us
Metric: InboundData
TotalSamples: 1819
Counter: 3.53KB
ValueRate: 0.05B / second
Rate: 0.026913 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1399675
Counter: 130.20GB
ValueRate: 940.63KB / second
Rate: 20.5574 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 3074263
Counter: 14h08m31s532ms91.937us
ValueRate: 02s666ms893.265us / second
Rate: 50.877 / second
Percentiles: 1%=449.013us; 5%=504.600us; 10%=539.351us; 20%=588.456us; 50%=772.644us; 80%=001ms414.263us; 90%=033ms784.340us; 95%=375ms308.515us; 99%=388ms304.516us
Metric: TransferFromServerTime
TotalSamples: 1819
Counter: 09s222ms348.066us
ValueRate: 52.321us / second
Rate: 0.026913 / second
Percentiles: 1%=603.621us; 5%=654.380us; 10%=694.507us; 20%=746.142us; 50%=958.062us; 80%=002ms311.626us; 90%=004ms746.147us; 95%=007ms206.476us; 99%=011ms91.002us
Metric: TransferToServerTime
TotalSamples: 1399675
Counter: 04d22h10m43s762ms963.132us
ValueRate: 04s158ms5.910us / second
Rate: 20.56 / second
Percentiles: 1%=001ms105.587us; 5%=001ms216.563us; 10%=001ms300.450us; 20%=001ms443.610us; 50%=002ms56.727us; 80%=222ms548.607us; 90%=991ms51.575us; 95%=01s045ms625.112us; 99%=01s077ms214.966us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 341065
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 250712979
Counter: CreateXlaTensor
Value: 1635385141
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 250705946
Counter: DestroyXlaTensor
Value: 1635379132
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 250705946
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 24147
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1819
Epoch 29 begin 2019-08-27 13:35:20.514420
training torch.Size([256, 64])/ 2019-08-27 13:35:29.414732, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:35:29.453693, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:35:29.489433, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:35:29.771984, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:35:29.796481, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:35:29.962824, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:35:30.075225, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:35:30.380431, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:38:19.227871, device xla:7, step 100, Rate=60.54, Global Rate=292.95, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 13:38:19.232091, device xla:2, step 100, Rate=60.31, Global Rate=292.94, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:38:19.238716, device xla:8, step 100, Rate=60.64, Global Rate=292.93, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:38:19.244820, device xla:5, step 100, Rate=60.42, Global Rate=292.92, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:38:19.259118, device xla:1, step 100, Rate=60.29, Global Rate=292.89, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:38:19.263013, device xla:6, step 100, Rate=60.48, Global Rate=292.89, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:38:19.251176, device xla:3, step 100, Rate=60.32, Global Rate=292.91, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:38:19.269049, device xla:4, step 100, Rate=60.42, Global Rate=292.88, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:41:02.795144, device xla:5, step 200, Rate=110.95, Global Rate=302.65, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:41:02.799772, device xla:7, step 200, Rate=111.03, Global Rate=302.65, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:41:02.809536, device xla:1, step 200, Rate=110.84, Global Rate=302.64, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:41:02.818421, device xla:6, step 200, Rate=111.00, Global Rate=302.63, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:41:02.837388, device xla:8, step 200, Rate=111.11, Global Rate=302.62, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 13:41:02.804202, device xla:2, step 200, Rate=110.85, Global Rate=302.64, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:41:02.848275, device xla:4, step 200, Rate=110.94, Global Rate=302.60, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:41:02.829703, device xla:3, step 200, Rate=110.86, Global Rate=302.62, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:43:48.820464, device xla:8, step 300, Rate=150.58, Global Rate=304.54, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:43:48.825102, device xla:5, step 300, Rate=150.43, Global Rate=304.54, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:43:48.830847, device xla:1, step 300, Rate=150.35, Global Rate=304.53, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:43:48.845435, device xla:6, step 300, Rate=150.47, Global Rate=304.52, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:43:48.839133, device xla:2, step 300, Rate=150.36, Global Rate=304.53, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:43:48.857139, device xla:4, step 300, Rate=150.43, Global Rate=304.52, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:43:48.849860, device xla:7, step 300, Rate=150.49, Global Rate=304.52, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:43:48.871651, device xla:3, step 300, Rate=150.36, Global Rate=304.51, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:46:33.099354, device xla:8, step 400, Rate=182.80, Global Rate=306.29, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:46:33.092497, device xla:3, step 400, Rate=182.64, Global Rate=306.29, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 13:46:33.104517, device xla:5, step 400, Rate=182.68, Global Rate=306.29, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:46:33.115096, device xla:6, step 400, Rate=182.72, Global Rate=306.28, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:46:33.109653, device xla:7, step 400, Rate=182.74, Global Rate=306.29, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:46:33.132435, device xla:2, step 400, Rate=182.61, Global Rate=306.27, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:46:33.138799, device xla:1, step 400, Rate=182.60, Global Rate=306.27, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 13:46:33.119864, device xla:4, step 400, Rate=182.69, Global Rate=306.28, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:49:15.975613, device xla:7, step 500, Rate=209.06, Global Rate=307.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:49:15.980102, device xla:5, step 500, Rate=209.01, Global Rate=307.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:49:15.986718, device xla:1, step 500, Rate=208.96, Global Rate=307.86, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:49:16.002628, device xla:3, step 500, Rate=208.97, Global Rate=307.86, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:49:16.017265, device xla:6, step 500, Rate=209.03, Global Rate=307.85, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:49:15.995222, device xla:4, step 500, Rate=209.02, Global Rate=307.86, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:49:15.981363, device xla:2, step 500, Rate=208.97, Global Rate=307.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:49:16.011795, device xla:8, step 500, Rate=209.09, Global Rate=307.86, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:52:02.068794, device xla:8, step 600, Rate=228.94, Global Rate=307.93, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 13:52:02.083889, device xla:6, step 600, Rate=228.89, Global Rate=307.93, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:52:02.085664, device xla:5, step 600, Rate=228.86, Global Rate=307.93, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:52:02.078256, device xla:4, step 600, Rate=228.87, Global Rate=307.93, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:52:02.072964, device xla:2, step 600, Rate=228.83, Global Rate=307.93, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:52:02.088781, device xla:3, step 600, Rate=228.83, Global Rate=307.93, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:52:02.099622, device xla:7, step 600, Rate=228.89, Global Rate=307.92, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:52:02.106708, device xla:1, step 600, Rate=228.81, Global Rate=307.92, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:54:48.257837, device xla:8, step 700, Rate=244.77, Global Rate=307.96, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:54:48.282929, device xla:5, step 700, Rate=244.70, Global Rate=307.95, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:54:48.298076, device xla:6, step 700, Rate=244.72, Global Rate=307.94, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:54:48.288732, device xla:1, step 700, Rate=244.67, Global Rate=307.95, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:54:48.263834, device xla:3, step 700, Rate=244.69, Global Rate=307.95, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:54:48.277302, device xla:2, step 700, Rate=244.67, Global Rate=307.95, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:54:48.271027, device xla:7, step 700, Rate=244.74, Global Rate=307.95, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:54:48.305281, device xla:4, step 700, Rate=244.70, Global Rate=307.94, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:57:34.193426, device xla:8, step 800, Rate=257.53, Global Rate=308.03, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:57:34.215592, device xla:7, step 800, Rate=257.50, Global Rate=308.03, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:57:34.228174, device xla:1, step 800, Rate=257.45, Global Rate=308.02, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:57:34.210054, device xla:2, step 800, Rate=257.45, Global Rate=308.03, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 13:57:34.220404, device xla:6, step 800, Rate=257.49, Global Rate=308.02, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:57:34.204681, device xla:5, step 800, Rate=257.48, Global Rate=308.03, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 13:57:34.197856, device xla:3, step 800, Rate=257.46, Global Rate=308.03, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 13:57:34.238127, device xla:4, step 800, Rate=257.47, Global Rate=308.02, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:00:19.068020, device xla:5, step 900, Rate=268.09, Global Rate=308.31, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:00:19.072555, device xla:2, step 900, Rate=268.07, Global Rate=308.31, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:00:19.078936, device xla:7, step 900, Rate=268.11, Global Rate=308.30, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:00:19.083670, device xla:8, step 900, Rate=268.12, Global Rate=308.30, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:00:19.104526, device xla:6, step 900, Rate=268.10, Global Rate=308.30, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:00:19.097799, device xla:4, step 900, Rate=268.09, Global Rate=308.30, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:00:19.107195, device xla:1, step 900, Rate=268.06, Global Rate=308.30, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:00:19.090193, device xla:3, step 900, Rate=268.07, Global Rate=308.30, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:03:02.983197, device xla:2, step 1000, Rate=276.93, Global Rate=308.71, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:03:02.988219, device xla:3, step 1000, Rate=276.93, Global Rate=308.71, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:03:02.995019, device xla:8, step 1000, Rate=276.97, Global Rate=308.71, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:03:03.009288, device xla:1, step 1000, Rate=276.93, Global Rate=308.70, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:03:03.028309, device xla:6, step 1000, Rate=276.94, Global Rate=308.70, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 14:03:03.030472, device xla:5, step 1000, Rate=276.93, Global Rate=308.70, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:03:03.020714, device xla:7, step 1000, Rate=276.95, Global Rate=308.70, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:03:03.001413, device xla:4, step 1000, Rate=276.95, Global Rate=308.70, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 14:05:48.622535, device xla:5, step 1100, Rate=283.38, Global Rate=308.74, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 14:05:48.627016, device xla:7, step 1100, Rate=283.39, Global Rate=308.74, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:05:48.637092, device xla:8, step 1100, Rate=283.40, Global Rate=308.74, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:05:48.643400, device xla:1, step 1100, Rate=283.36, Global Rate=308.74, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:05:48.650220, device xla:3, step 1100, Rate=283.36, Global Rate=308.74, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:05:48.631604, device xla:2, step 1100, Rate=283.36, Global Rate=308.74, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:05:48.657874, device xla:4, step 1100, Rate=283.37, Global Rate=308.74, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:05:48.666341, device xla:6, step 1100, Rate=283.38, Global Rate=308.74, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:08:33.780630, device xla:7, step 1200, Rate=288.72, Global Rate=308.85, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:08:33.785223, device xla:8, step 1200, Rate=288.72, Global Rate=308.85, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:08:33.804321, device xla:6, step 1200, Rate=288.71, Global Rate=308.84, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:08:33.811699, device xla:4, step 1200, Rate=288.70, Global Rate=308.84, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:08:33.818870, device xla:2, step 1200, Rate=288.68, Global Rate=308.84, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:08:33.795585, device xla:1, step 1200, Rate=288.69, Global Rate=308.85, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:08:33.790057, device xla:5, step 1200, Rate=288.70, Global Rate=308.85, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:08:33.835808, device xla:3, step 1200, Rate=288.68, Global Rate=308.84, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:11:17.750241, device xla:4, step 1300, Rate=293.42, Global Rate=309.11, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:11:17.755421, device xla:7, step 1300, Rate=293.42, Global Rate=309.11, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:11:17.766766, device xla:5, step 1300, Rate=293.41, Global Rate=309.10, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 14:11:17.760000, device xla:3, step 1300, Rate=293.41, Global Rate=309.11, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:11:17.783479, device xla:8, step 1300, Rate=293.42, Global Rate=309.10, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 14:11:17.773696, device xla:6, step 1300, Rate=293.42, Global Rate=309.10, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:11:17.800163, device xla:2, step 1300, Rate=293.39, Global Rate=309.10, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:11:17.790619, device xla:1, step 1300, Rate=293.40, Global Rate=309.10, Compiles=122, _local_scalar_dense=1819
training torch.Size([1024, 16])/ 2019-08-27 14:14:00.713759, device xla:7, step 1400, Rate=297.58, Global Rate=309.46, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:14:00.732322, device xla:6, step 1400, Rate=297.57, Global Rate=309.46, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:14:00.741401, device xla:8, step 1400, Rate=297.57, Global Rate=309.46, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:14:00.735887, device xla:5, step 1400, Rate=297.56, Global Rate=309.46, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:14:00.726294, device xla:2, step 1400, Rate=297.56, Global Rate=309.46, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:14:00.746810, device xla:1, step 1400, Rate=297.56, Global Rate=309.46, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:14:00.719090, device xla:3, step 1400, Rate=297.57, Global Rate=309.46, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:14:00.764179, device xla:4, step 1400, Rate=297.56, Global Rate=309.46, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:16:42.885059, device xla:8, step 1500, Rate=301.21, Global Rate=309.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:16:42.889495, device xla:7, step 1500, Rate=301.20, Global Rate=309.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:16:42.894117, device xla:5, step 1500, Rate=301.20, Global Rate=309.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:16:42.901664, device xla:1, step 1500, Rate=301.19, Global Rate=309.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:16:42.925235, device xla:2, step 1500, Rate=301.18, Global Rate=309.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([512, 32])/ 2019-08-27 14:16:42.910531, device xla:3, step 1500, Rate=301.19, Global Rate=309.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:16:42.933502, device xla:6, step 1500, Rate=301.19, Global Rate=309.87, Compiles=122, _local_scalar_dense=1819
training torch.Size([256, 64])/ 2019-08-27 14:16:42.919317, device xla:4, step 1500, Rate=301.19, Global Rate=309.87, Compiles=122, _local_scalar_dense=1819
Epoch 29 Training stats:
device xla:1
| epoch 029 | loss 0.138 | nll_loss 0.138 | ppl 1.10 | wps 6147 | ups 1 | wpb 11139.670 | bsz 410.653 | num_updates 43732 | lr 0.000151217 | gnorm 0.040 | clip 0.000 | oom 0.000 | wall 79255 | train_wall 62955
device xla:2
| epoch 029 | loss 0.138 | nll_loss 0.138 | ppl 1.10 | wps 6161 | ups 1 | wpb 11165.653 | bsz 408.978 | num_updates 43732 | lr 0.000151217 | gnorm 0.038 | clip 0.000 | oom 0.000 | wall 79255 | train_wall 63999
device xla:3
| epoch 029 | loss 0.138 | nll_loss 0.138 | ppl 1.10 | wps 6138 | ups 1 | wpb 11123.867 | bsz 411.384 | num_updates 43732 | lr 0.000151217 | gnorm 0.041 | clip 0.000 | oom 0.000 | wall 79255 | train_wall 62525
device xla:4
| epoch 029 | loss 0.138 | nll_loss 0.138 | ppl 1.10 | wps 6142 | ups 1 | wpb 11131.419 | bsz 410.875 | num_updates 43732 | lr 0.000151217 | gnorm 0.041 | clip 0.000 | oom 0.000 | wall 79255 | train_wall 64007
device xla:5
| epoch 029 | loss 0.138 | nll_loss 0.138 | ppl 1.10 | wps 6159 | ups 1 | wpb 11161.391 | bsz 409.991 | num_updates 43732 | lr 0.000151217 | gnorm 0.039 | clip 0.000 | oom 0.000 | wall 79255 | train_wall 64131
device xla:6
| epoch 029 | loss 0.138 | nll_loss 0.138 | ppl 1.10 | wps 6152 | ups 1 | wpb 11149.729 | bsz 409.640 | num_updates 43732 | lr 0.000151217 | gnorm 0.039 | clip 0.000 | oom 0.000 | wall 79255 | train_wall 64201
device xla:7
| epoch 029 | loss 0.138 | nll_loss 0.138 | ppl 1.10 | wps 6164 | ups 1 | wpb 11170.466 | bsz 408.428 | num_updates 43732 | lr 0.000151217 | gnorm 0.039 | clip 0.000 | oom 0.000 | wall 79255 | train_wall 63087
device xla:8
| epoch 029 | loss 0.138 | nll_loss 0.138 | ppl 1.10 | wps 6140 | ups 1 | wpb 11127.498 | bsz 408.574 | num_updates 43732 | lr 0.000151217 | gnorm 0.042 | clip 0.000 | oom 0.000 | wall 79255 | train_wall 63173
Epoch 29 Tracker Rates:
Rate=297.50, Global Rate=309.71
Rate=297.58, Global Rate=309.71
Rate=297.53, Global Rate=309.71
Rate=297.57, Global Rate=309.71
Rate=297.47, Global Rate=309.71
Rate=297.62, Global Rate=309.71
Rate=297.46, Global Rate=309.71
Rate=297.45, Global Rate=309.71
Epoch 29 end 2019-08-27 14:16:57.389529
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 353251
Counter: 05d24h14m06s450ms691.705us
ValueRate: 06s171ms569.562us / second
Rate: 4.961 / second
Percentiles: 1%=01s170ms50.020us; 5%=01s174ms213.323us; 10%=01s176ms465.293us; 20%=01s180ms177.533us; 50%=01s283ms586.437us; 80%=01s292ms825.245us; 90%=01s295ms309.876us; 95%=01s299ms553.162us; 99%=01s314ms715.655us
Metric: InboundData
TotalSamples: 1859
Counter: 3.60KB
ValueRate: 0.05B / second
Rate: 0.025262 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1449507
Counter: 134.69GB
ValueRate: 509.81KB / second
Rate: 20.8273 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 3182350
Counter: 15h17m34s459ms639.119us
ValueRate: 335ms307.013us / second
Rate: 45.4325 / second
Percentiles: 1%=444.766us; 5%=491.646us; 10%=527.539us; 20%=591.679us; 50%=800.508us; 80%=002ms977.233us; 90%=011ms485.854us; 95%=024ms818.313us; 99%=057ms435.334us
Metric: TransferFromServerTime
TotalSamples: 1859
Counter: 09s300ms535.334us
ValueRate: 49.809us / second
Rate: 0.025262 / second
Percentiles: 1%=600.033us; 5%=658.530us; 10%=696.977us; 20%=745.968us; 50%=950.723us; 80%=002ms332.459us; 90%=004ms830.211us; 95%=008ms586.882us; 99%=011ms91.002us
Metric: TransferToServerTime
TotalSamples: 1449507
Counter: 05d31h24m44s129ms417.619us
ValueRate: 05s148ms320.163us / second
Rate: 20.8305 / second
Percentiles: 1%=001ms71.802us; 5%=001ms156.901us; 10%=001ms235.519us; 20%=001ms366.163us; 50%=002ms932.102us; 80%=949ms392.478us; 90%=01s048ms433.391us; 95%=01s080ms91.668us; 99%=01s106ms134.298us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 353129
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 259665435
Counter: CreateXlaTensor
Value: 1693656785
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 259656955
Counter: DestroyXlaTensor
Value: 1693650776
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 259658402
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 24197
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1859
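You can compare the two metrics reports that bracket epoch 29 to see whether the epoch triggered any new XLA compilations. The first report is the one printed just before "Epoch 29 begin" and the second is the one printed after "Epoch 29 end". The sketch below uses a hypothetical helper, parse_report, which is not part of torch_xla or fairseq. It assumes the plain-text layout seen in this log: each "Metric:" block carries a "TotalSamples:" line, and each named counter is a "Counter: NAME" line followed by a "Value:" line. The inputs are trimmed excerpts of the two reports.

```python
# Hypothetical helper: parse_report is not part of torch_xla or fairseq.
def parse_report(report):
    """Return (metric_samples, counter_values) from a plain-text metrics report.

    Assumes the layout seen in this log: a 'Metric: NAME' block carries a
    'TotalSamples: N' line, and a named counter is 'Counter: NAME' followed
    by 'Value: N'.
    """
    metrics, counters = {}, {}
    metric_name = counter_name = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Metric: "):
            metric_name, counter_name = line[len("Metric: "):], None
        elif line.startswith("TotalSamples: ") and metric_name:
            metrics[metric_name] = int(line[len("TotalSamples: "):])
        elif line.startswith("Counter: "):
            # Inside a Metric block this line is an accumulated total such as
            # "Counter: 3.53KB"; it names a counter only if "Value:" follows.
            counter_name = line[len("Counter: "):]
        elif line.startswith("Value: ") and counter_name:
            counters[counter_name] = int(line[len("Value: "):])
            metric_name = counter_name = None
    return metrics, counters


# Trimmed excerpt of the report printed before "Epoch 29 begin".
BEFORE = """
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Metric: ExecuteTime
TotalSamples: 341187
Counter: CachedSyncTensors
Value: 341065
Counter: UncachedSyncTensors
Value: 122
"""

# Trimmed excerpt of the report printed after "Epoch 29 end".
AFTER = """
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Metric: ExecuteTime
TotalSamples: 353251
Counter: CachedSyncTensors
Value: 353129
Counter: UncachedSyncTensors
Value: 122
"""

m0, c0 = parse_report(BEFORE)
m1, c1 = parse_report(AFTER)
new_compiles = m1["CompileTime"] - m0["CompileTime"]
new_executions = m1["ExecuteTime"] - m0["ExecuteTime"]
new_uncached = c1["UncachedSyncTensors"] - c0["UncachedSyncTensors"]
new_cached = c1["CachedSyncTensors"] - c0["CachedSyncTensors"]
print(new_compiles, new_executions, new_uncached, new_cached)
```

With these excerpts, CompileTime and UncachedSyncTensors both stay at 122. ExecuteTime and CachedSyncTensors each grow by 12064, which equals 8 devices x 1508 updates (43732 - 42224). This suggests that every training step in epoch 29 reused an already compiled graph.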
Validating the subset "valid"
| WARNING: 2459 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[34935, 29199, 25522, 50610, 31640, 50522, 29514, 23772, 21318, 30173]
validation/ 2019-08-27 14:17:01.395285, device xla:5, step 0, Compiles=122, _local_scalar_dense=1859
validation/ 2019-08-27 14:17:01.404182, device xla:8, step 0, Compiles=122, _local_scalar_dense=1859
validation/ 2019-08-27 14:17:01.406793, device xla:3, step 0, Compiles=122, _local_scalar_dense=1859
validation/ 2019-08-27 14:17:01.412561, device xla:1, step 0, Compiles=122, _local_scalar_dense=1859
validation/ 2019-08-27 14:17:01.415041, device xla:4, step 0, Compiles=122, _local_scalar_dense=1859
validation/ 2019-08-27 14:17:01.418158, device xla:2, step 0, Compiles=122, _local_scalar_dense=1859
validation/ 2019-08-27 14:17:01.549070, device xla:7, step 0, Compiles=122, _local_scalar_dense=1859
validation/ 2019-08-27 14:17:01.566032, device xla:6, step 0, Compiles=122, _local_scalar_dense=1859
validation stats on subset "valid" - 2019-08-27 14:17:07.483033
| epoch 029 | valid on 'valid' subset | loss 3.828 | nll_loss 2.016 | ppl 4.04 | num_updates 43732
| epoch 029 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 43732
| epoch 029 | valid on 'valid' subset | loss 3.953 | nll_loss 2.109 | ppl 4.32 | num_updates 43732
| epoch 029 | valid on 'valid' subset | loss 3.922 | nll_loss 2.141 | ppl 4.41 | num_updates 43732
| epoch 029 | valid on 'valid' subset | loss 3.875 | nll_loss 2.031 | ppl 4.09 | num_updates 43732
| epoch 029 | valid on 'valid' subset | loss 3.875 | nll_loss 2.062 | ppl 4.18 | num_updates 43732
| epoch 029 | valid on 'valid' subset | loss 3.906 | nll_loss 2.047 | ppl 4.13 | num_updates 43732
| epoch 029 | valid on 'valid' subset | loss 3.922 | nll_loss 2.125 | ppl 4.36 | num_updates 43732
old learning rate: 0.00015389351298344497
new learning rate: 0.00015121689988052225
Metric: CompileTime
TotalSamples: 122
Counter: 12h07m52s716ms393.371us
ValueRate: 269ms35.178us / second
Rate: 0.00185524 / second
Percentiles: 1%=048ms711.794us; 5%=062ms975.856us; 10%=071ms957.977us; 20%=094ms470.340us; 50%=28s580ms834.984us; 80%=06m45s329ms80.990us; 90%=07m43s003ms845.134us; 95%=07m09s613ms812.856us; 99%=01h00m12s899ms207.004us
Metric: ExecuteTime
TotalSamples: 353372
Counter: 05d24h15m52s454ms637.879us
ValueRate: 06s091ms75.855us / second
Rate: 5.32278 / second
Percentiles: 1%=377ms718.405us; 5%=378ms373.737us; 10%=391ms475.147us; 20%=01s177ms538.648us; 50%=01s280ms958.275us; 80%=01s291ms417.428us; 90%=01s295ms268.063us; 95%=01s299ms553.162us; 99%=01s314ms715.655us
Metric: InboundData
TotalSamples: 1884
Counter: 3.65KB
ValueRate: 0.05B / second
Rate: 0.0269139 / second
Percentiles: 1%=1.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=2.00B; 80%=2.00B; 90%=2.00B; 95%=2.00B; 99%=2.00B
Metric: OutboundData
TotalSamples: 1449747
Counter: 134.73GB
ValueRate: 960.86KB / second
Rate: 20.9996 / second
Percentiles: 1%=2.00B; 5%=2.00B; 10%=2.00B; 20%=2.00B; 50%=8.00B; 80%=8.00B; 90%=8.00B; 95%=8.00B; 99%=8.00B
Metric: ReleaseCompileHandlesTime
TotalSamples: 38
Counter: 03h06m15s900ms691.280us
ValueRate: 072ms550.401us / second
Rate: 0.000579121 / second
Percentiles: 1%=001ms182.408us; 5%=001ms257.829us; 10%=039ms60.340us; 20%=046ms520.210us; 50%=05s565ms653.873us; 80%=04m19s275ms660.228us; 90%=04m19s309ms954.111us; 95%=17m24s875ms983.828us; 99%=20m20s733ms697.196us
Metric: ReleaseDataHandlesTime
TotalSamples: 3182955
Counter: 15h17m07s850ms933.524us
ValueRate: 02s707ms589.234us / second
Rate: 51.3064 / second
Percentiles: 1%=453.411us; 5%=490.575us; 10%=521.906us; 20%=580.153us; 50%=779.293us; 80%=001ms341.888us; 90%=026ms880.252us; 95%=376ms658.307us; 99%=389ms660.441us
Metric: TransferFromServerTime
TotalSamples: 1884
Counter: 09s337ms334.165us
ValueRate: 52.251us / second
Rate: 0.0269139 / second
Percentiles: 1%=600.033us; 5%=661.030us; 10%=697.463us; 20%=746.010us; 50%=948.929us; 80%=002ms311.626us; 90%=004ms800.523us; 95%=007ms382.723us; 99%=011ms854.798us
Metric: TransferToServerTime
TotalSamples: 1449747
Counter: 05d31h00m11s940ms252.640us
ValueRate: 04s367ms873.398us / second
Rate: 21.0003 / second
Percentiles: 1%=001ms72.801us; 5%=001ms163.198us; 10%=001ms257.052us; 20%=001ms398.929us; 50%=002ms970.922us; 80%=242ms735.976us; 90%=982ms974.252us; 95%=01s076ms712.936us; 99%=01s094ms669.431us
Counter: CachedSyncParamMismatch
Value: 69
Counter: CachedSyncTensors
Value: 353250
Counter: CreateCompileHandles
Value: 64
Counter: CreateDataHandles
Value: 259667044
Counter: CreateXlaTensor
Value: 1693791602
Counter: DestroyCompileHandles
Value: 39
Counter: DestroyDataHandles
Value: 259660010
Counter: DestroyXlaTensor
Value: 1693785593
Counter: ReleaseCompileHandles
Value: 39
Counter: ReleaseDataHandles
Value: 259660011
Counter: SyncTensorsToData
Value: 1472
Counter: UncachedSyncTensors
Value: 122
Counter: XRTAllocateFromTensor_Empty
Value: 24197
Counter: XrtCompile_Empty
Value: 2176
Counter: XrtExecuteChained_Empty
Value: 2176
Counter: XrtExecute_Empty
Value: 2176
Counter: XrtRead_Empty
Value: 2176
Counter: XrtReleaseAllocationHandle_Empty
Value: 2176
Counter: XrtReleaseCompileHandle_Empty
Value: 2176
Counter: XrtSessionCount
Value: 33
Counter: XrtSubTuple_Empty
Value: 2176
Counter: aten::_local_scalar_dense
Value: 1884
Epoch 30 begin 2019-08-27 14:17:07.502929
training torch.Size([512, 32])/ 2019-08-27 14:17:15.641318, device xla:1, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:17:15.656799, device xla:2, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:17:15.732126, device xla:4, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:17:15.824552, device xla:3, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:17:15.865746, device xla:5, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1884
training torch.Size([1024, 16])/ 2019-08-27 14:17:15.992562, device xla:6, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:17:16.414354, device xla:7, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:17:16.441812, device xla:8, step 0, Rate=0.00, Global Rate=0.00, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:20:06.153244, device xla:4, step 100, Rate=60.09, Global Rate=292.37, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:20:06.158008, device xla:2, step 100, Rate=60.06, Global Rate=292.36, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:20:06.166283, device xla:3, step 100, Rate=60.11, Global Rate=292.34, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:20:06.174925, device xla:1, step 100, Rate=60.05, Global Rate=292.33, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:20:06.192241, device xla:8, step 100, Rate=60.32, Global Rate=292.30, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:20:06.181074, device xla:5, step 100, Rate=60.12, Global Rate=292.32, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:20:06.186323, device xla:7, step 100, Rate=60.32, Global Rate=292.31, Compiles=122, _local_scalar_dense=1884
training torch.Size([1024, 16])/ 2019-08-27 14:20:06.147978, device xla:6, step 100, Rate=60.18, Global Rate=292.38, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:22:50.905009, device xla:4, step 200, Rate=110.22, Global Rate=301.29, Compiles=122, _local_scalar_dense=1884
training torch.Size([1024, 16])/ 2019-08-27 14:22:50.909656, device xla:1, step 200, Rate=110.20, Global Rate=301.28, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:22:50.916209, device xla:3, step 200, Rate=110.25, Global Rate=301.28, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:22:50.923417, device xla:6, step 200, Rate=110.29, Global Rate=301.27, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:22:50.944587, device xla:8, step 200, Rate=110.41, Global Rate=301.25, Compiles=122, _local_scalar_dense=1884
training torch.Size([1024, 16])/ 2019-08-27 14:22:50.929081, device xla:2, step 200, Rate=110.19, Global Rate=301.27, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:22:50.939126, device xla:5, step 200, Rate=110.25, Global Rate=301.26, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:22:50.953258, device xla:7, step 200, Rate=110.40, Global Rate=301.25, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:25:35.151367, device xla:7, step 300, Rate=150.68, Global Rate=304.69, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:25:35.157163, device xla:1, step 300, Rate=150.50, Global Rate=304.68, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:25:35.163964, device xla:8, step 300, Rate=150.69, Global Rate=304.68, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:25:35.170390, device xla:5, step 300, Rate=150.55, Global Rate=304.68, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:25:35.175737, device xla:2, step 300, Rate=150.50, Global Rate=304.67, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:25:35.193343, device xla:6, step 300, Rate=150.57, Global Rate=304.66, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:25:35.199432, device xla:4, step 300, Rate=150.51, Global Rate=304.66, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:25:35.185725, device xla:3, step 300, Rate=150.53, Global Rate=304.67, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:28:19.162271, device xla:4, step 400, Rate=182.86, Global Rate=306.53, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:28:19.166895, device xla:5, step 400, Rate=182.88, Global Rate=306.52, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:28:19.171663, device xla:3, step 400, Rate=182.87, Global Rate=306.52, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:28:19.179320, device xla:1, step 400, Rate=182.83, Global Rate=306.52, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:28:19.187074, device xla:6, step 400, Rate=182.90, Global Rate=306.51, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:28:19.196211, device xla:8, step 400, Rate=182.98, Global Rate=306.51, Compiles=122, _local_scalar_dense=1884
training torch.Size([1024, 16])/ 2019-08-27 14:28:19.201526, device xla:2, step 400, Rate=182.83, Global Rate=306.51, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:28:19.216264, device xla:7, step 400, Rate=182.96, Global Rate=306.50, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:31:04.966801, device xla:7, step 500, Rate=208.15, Global Rate=306.98, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:31:04.976273, device xla:8, step 500, Rate=208.15, Global Rate=306.97, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:31:04.971109, device xla:6, step 500, Rate=208.08, Global Rate=306.98, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:31:04.982779, device xla:5, step 500, Rate=208.06, Global Rate=306.97, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:31:04.989604, device xla:1, step 500, Rate=208.02, Global Rate=306.97, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:31:05.010598, device xla:4, step 500, Rate=208.03, Global Rate=306.96, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:31:04.999927, device xla:2, step 500, Rate=208.03, Global Rate=306.97, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:31:05.018656, device xla:3, step 500, Rate=208.04, Global Rate=306.96, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:33:49.929558, device xla:4, step 600, Rate=228.51, Global Rate=307.54, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:33:49.934393, device xla:3, step 600, Rate=228.52, Global Rate=307.54, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:33:49.960781, device xla:6, step 600, Rate=228.53, Global Rate=307.53, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:33:49.941659, device xla:5, step 600, Rate=228.52, Global Rate=307.53, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:33:49.955441, device xla:7, step 600, Rate=228.58, Global Rate=307.53, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:33:49.947612, device xla:1, step 600, Rate=228.50, Global Rate=307.53, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:33:49.978708, device xla:2, step 600, Rate=228.49, Global Rate=307.52, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:33:49.969643, device xla:8, step 600, Rate=228.58, Global Rate=307.53, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:36:35.619410, device xla:4, step 700, Rate=244.61, Global Rate=307.75, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:36:35.629629, device xla:6, step 700, Rate=244.64, Global Rate=307.75, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:36:35.644309, device xla:2, step 700, Rate=244.60, Global Rate=307.74, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:36:35.624486, device xla:8, step 700, Rate=244.68, Global Rate=307.75, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:36:35.635831, device xla:3, step 700, Rate=244.62, Global Rate=307.74, Compiles=122, _local_scalar_dense=1884
training torch.Size([512, 32])/ 2019-08-27 14:36:35.666080, device xla:5, step 700, Rate=244.61, Global Rate=307.74, Compiles=122, _local_scalar_dense=1884
training torch.Size([256, 64])/ 2019-08-27 14:36:35.673773, device xl