This is a KL controller: """ class AdaptiveKLController: """Adaptive KL Controller as described in Ziegler et al. "Fine-Tuning Language Models from Human Preferences" Reference: Section 2.2 https://arxiv.org/pdf/1909.08593.pdf#page=2 Source: https://github.com/openai/lm-human-preferences/blob/master/lm_human_preferences/train_policy.py """
def init(self, init_kl_coef: float, target: float, horizon: int):