To follow along you'll need to install optuna.
Before we do real HPO, let's first look for an efficient batch size for the current machine:
batch_size: the largest power of 2 that still improves throughput, measured in samples/second. (Restricting the search to powers of 2 is an arbitrary choice for now.)
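Here is a minimal sketch of that search, not the exact procedure used here. It assumes you can supply a `throughput_fn` that returns samples/second for a given batch size; `measure_throughput`, `pick_batch_size`, `step_fn`, and `max_batch_size` are hypothetical names chosen for illustration.

```python
import time


def measure_throughput(step_fn, batch_size, n_steps=3):
    """Time n_steps calls of step_fn(batch_size) and return samples/second.

    step_fn is a hypothetical callable that processes one batch of the
    given size (for example, one forward/backward pass).
    """
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn(batch_size)
    elapsed = time.perf_counter() - start
    return batch_size * n_steps / elapsed


def pick_batch_size(throughput_fn, max_batch_size=1024):
    """Double the batch size until samples/second stops improving.

    Returns the last power of 2 that improved on the previous best.
    The search stops at the first size that does not improve.
    """
    best_size = 1
    best_rate = throughput_fn(1)
    size = 2
    while size <= max_batch_size:
        rate = throughput_fn(size)
        if rate <= best_rate:
            break  # no improvement: keep the previous size
        best_size, best_rate = size, rate
        size *= 2
    return best_size


# Made-up throughput numbers, only to show how the search behaves.
fake_rates = {1: 10.0, 2: 19.0, 4: 35.0, 8: 50.0, 16: 49.0, 32: 60.0}
chosen = pick_batch_size(lambda b: fake_rates[b], max_batch_size=32)
```

With these made-up numbers the search settles on 8, because 16 is the first size that fails to improve. On a real machine you would pass something like `lambda b: measure_throughput(step_fn, b)` and also guard against out-of-memory errors, which this sketch does not do.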
We need a max_length for this because of how batches are handled during training: the training process automatically pads the data on the fly for every single batch, so the padded length can differ from one batch to the next.
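To make the effect of max_length concrete, here is a small pure-Python sketch of per-batch padding. It is an illustration only, not the padding code the training process actually runs; `pad_batch` and `pad_id` are hypothetical names.

```python
def pad_batch(batch, pad_id=0, max_length=None):
    """Pad (and truncate) token-id lists to a common length.

    Without max_length, the target is the longest sequence in this batch,
    so the padded length varies from batch to batch. With max_length,
    every batch comes out with the same length.
    """
    if max_length is not None:
        target = max_length
    else:
        target = max(len(seq) for seq in batch)
    return [seq[:target] + [pad_id] * (target - len(seq)) for seq in batch]


batch = [[101, 7, 8], [101, 9]]

dynamic = pad_batch(batch)               # padded to the longest item (3)
fixed = pad_batch(batch, max_length=5)   # padded to a fixed length (5)
```

Here `dynamic` has rows of length 3 while `fixed` has rows of length 5. Fixing the length keeps every batch the same shape, which makes a samples/second measurement comparable across batches.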