I am working on a translation of this PyTorch code to Rust/Candle. Can you step through the code line by line and term by term? Please give me the shape of every tensor at each step, and be sure to point out where implicit broadcasting occurs.
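To make the question concrete, here is a minimal pure-Python sketch of the indexing pattern used by the first `gather` calls in the loop below. It is only an illustration under assumptions: I am assuming `chain_mask` and `decoding_order` are both `[B, L]` (the snippet does not state this), the toy values are made up, and `gather_dim1` is a hypothetical helper that mimics `torch.gather(x, 1, idx)` on nested lists rather than a real library call.

```python
def gather_dim1(x, idx):
    """Mimic torch.gather(x, 1, idx) for 2-D nested lists.

    x:   [B, L] values
    idx: [B, K] integer column indices
    out: [B, K] with out[b][k] = x[b][idx[b][k]]
    """
    return [[row[j] for j in cols] for row, cols in zip(x, idx)]


# Toy sizes assumed for illustration: B = 2, L = 3.
chain_mask = [[10.0, 11.0, 12.0],
              [20.0, 21.0, 22.0]]      # [B, L]
decoding_order = [[2, 0, 1],
                  [1, 2, 0]]           # [B, L]

t_ = 0
t = [row[t_] for row in decoding_order]       # decoding_order[:, t_] -> [B]
t_col = [[v] for v in t]                      # t[:, None]            -> [B, 1]
gathered = gather_dim1(chain_mask, t_col)     # torch.gather(..., 1, ...) -> [B, 1]
chain_mask_t = [row[0] for row in gathered]   # [:, 0]                -> [B]
```

Note that `torch.gather` does not broadcast its index against the input: the index must already have the same number of dimensions as the input, which is why `t` is expanded with `None` (and, for `bias`, repeated 21 times along the last axis) before the call.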
for t_ in range(L):
    t = decoding_order[:, t_]  # [B]
    chain_mask_t = torch.gather(chain_mask, 1, t[:, None])[:, 0]  # [B]
    mask_t = torch.gather(mask, 1, t[:, None])[:, 0]  # [B]
    bias_t = torch.gather(bias, 1, t[:, None, None].repeat(1, 1, 21))[