Byte pair encoding tokenizer ported almost verbatim from Python code by Andrej Karpathy: https://github.com/karpathy/minbpe/blob/master/minbpe/basic.py
Inspired by the video on Andrej's Youtube channel
It's a Scala CLI project, so you can easily run it with scala-cli run bpe.scala -- shakespeare.small.txt