Confidence intervals and quantile regression

Confidence/prediction intervals to quantiles

quantile_higher = (interval_width / 2) + 0.5

Quantile metrics:

def pinball_loss(y_true: np.ndarray, target_quantile: float, y_quantile: np.ndarray) -> np.ndarray:
    """
    The pinball loss function is a metric used to assess the accuracy of a quantile forecast.
    Calculates pinball loss element wise (does not aggregate).
    Smaller is better.

    Refs:
    * https://www.lokad.com/pinball-loss-function-definition
    """

    return (
            ((y_true - y_quantile) * target_quantile) * (y_true >= y_quantile)
            + ((y_quantile - y_true) * (1 - target_quantile)) * (y_true < y_quantile)
    )
    
def scaled_pinball_loss(
        y_true: np.ndarray, target_quantile: float, y_quantile: np.ndarray, y_train: np.ndarray) -> float:
    """
    Scaled pinball loss. Scaled using training ts_data.

    Roughly: mean pinball loss divided by mean gradient (or diff) of training ts_data

    Refs:
    * § Probabilistic Forecasts: https://github.com/Mcompetitions/M5-methods/blob/master/M5-Competitors-Guide.pdf
    * https://www.kaggle.com/chrisrichardmiles/m5u-wsplevaluator-weighted-scaled-pinball-loss
    """

    n_training = len(y_train)
    assert n_training > 1

    return (
            pinball_loss(y_true, target_quantile, y_quantile).mean()
            / ((1 / (n_training - 1)) * np.abs(np.diff(y_train)).sum())
    )
    
def interval_score(y_true: np.ndarray, y_upper: np.ndarray, y_lower: np.ndarray, interval_width: float) -> float:
    """
    Mean of the "interval score"

    Refs:
    * R implementation: https://rdrr.io/cran/scoringutils/man/interval_score.html
    * Paper. §6.2, page 21: https://apps.dtic.mil/sti/pdfs/ADA454828.pdf
    """
    alpha = 1 - interval_width
    return (
            (y_upper - y_lower)
            + 2 / alpha * (y_lower - y_true) * (y_true < y_lower)
            + 2 / alpha * (y_true - y_upper) * (y_true > y_upper)
    ).mean()

Standard deviation from quantile prediction (assume normally distributed)

$l$ and $u$ are the lower and upper predictions
$\mu$ and $\sigma$ is the mean and standard deviation
$\Phi^{-1}(p)$ is the inverse of the cumulative distribution function with probability $p$ . (AKA quantile function or percent point function). Scipy inpmentation, Matlab implmentation

$\sigma = \frac{u-\mu}{\Phi^{-1}(p)}$

and

$\sigma = \frac{\mu-l}{\Phi^{-1}(p)}$

To take the average of the two values:

$\sigma = \frac{0.5(u-l)}{\Phi^{-1}(p)}$

from scipy.stats import norm
def quantil_interval_to_std(lower, upper, interval_width=0.8):
    quantile_higher = (interval_width / 2) + 0.5
    std = 0.5*(upper-lower)/norm.ppf(quantile_higher)
    return std

joshlk/confidence_intervals.md

Confidence/prediction intervals to quantiles

Quantile metrics:

Standard deviation from quantile prediction (assume normally distributed)