Skip to content

Instantly share code, notes, and snippets.

@Barry1
Last active June 7, 2020 11:06
Show Gist options
  • Save Barry1/ed4f5c46600eb52397f76dd8831ad123 to your computer and use it in GitHub Desktop.
Save Barry1/ed4f5c46600eb52397f76dd8831ad123 to your computer and use it in GitHub Desktop.
statistical appoach, plot two columns against each other with local uncertainty bands by local linear regression

Details to show uncertainty for huge amount of data

Mathematical Definitions

The following Formulas are typeset with the help of https://alexanderrodin.com/github-latex-markdown/.

statistical Variable

X \coloneqq \left{ x_i \right}

Expectation

\widehat{X} = E \left( X \right) \coloneqq \frac{1}{n} \sum_{i=1}^{n} {x_i}

Variation

\widetilde{XX} = V \left( X \right) \coloneqq E \left( \left( X - E \left( X \right) \right)^2 \right) = E\left( X^2 \right) - E^2\left( X \right)

Covariance

\widetilde{XY} = \mathrm{cov} \left( X , Y \right) \coloneqq E \left( X Y   \right) - E \left( X \right) E \left( Y \right)

derived Mathematical Rules

E \left( a X + Y \right) = a E \left( X \right) + E \left( Y \right)

V \left( a X \right) = a^2 V \left( X \right)

V \left( X + b \right) = V \left( X \right)

V \left( X + Y \right) = V \left( X \right) + 2 \mathrm{cov} \left( X , Y \right) + V \left( Y \right)

Mathematical derivation

Model

Y = a X + b + F

  • input X
  • output Y
  • error F

Somehow natural conditions

expected error is zero

\widehat{F} = E \left( F \right) \stackrel{!}{=} 0

Applying expectation on the model yields

\begin{align*} E \left( Y \right) &= E \left( a X + b + F \right) \ \widehat{Y} &= a \widehat{X} + b + \widehat{F} \ \widehat{Y} &= a \widehat{X} + b \end{align*}

This means, the model applied to the expectation values is exact.

variation of error is minial

Inserting the model into this brings up

\begin{align*} V \left( F \right) &= V \left( Y - a X - b \right)= V \left( Y - a X \right) \ &= V \left( Y \right) + 2 \mathrm{cov}\left( Y ,  -a X \right) + V \left( - a X \right)\ &= \widetilde{YY} - 2 a \widetilde{YX}  + a^2 \widetilde{XX} \end{align*}

Derive for the unknown scalar value a and setting to zero shows

0 = - 2 \widetilde{YX}  + 2 a \widetilde{XX}

or simpler and explicit

a = \frac{\widetilde{YX}}{\widetilde{XX}}

combination of the former

Inserting the result from the variation we get from the expectation

\widehat{Y} = \frac{\widetilde{YX}}{\widetilde{XX}} \widehat{X} + b

and folling explicit

b = \widehat{Y} - \frac{\widetilde{YX}}{\widetilde{XX}} \widehat{X}

Explicit model

Putting all together shows we should model as

Y \approx \frac{\widetilde{YX}}{\widetilde{XX}} \left( X - \widehat{X}\right) + \widehat{Y}

with the error variance

\widetilde{FF} = \widetilde{YY} - \frac{\widetilde{XY} \widetilde{XY}}{\widetilde{XX}}

Python Implementation

def plotwithlinreguncertainty(data: pandas.DataFrame,
                              x: str,
                              y: str,
                              res: int = 150,
                              ci=.95,
                              globalmodell: bool = True,
                              **pltopts):
    assert 2 * res <= len(data), f"Zu hohe Auflösung {res} für {len(data)} Werte. Zwei Einträge pro Block für Kovarianz nötig."
    import matplotlib.pyplot
    import numpy
    import scipy.special
    sigmafactor = scipy.special.erfinv(ci) * numpy.sqrt(2)
    wrkdata = data[[x, y]].sort_values(x).reset_index(drop=True)
    assert (wrkdata[x].is_monotonic)
    fg = matplotlib.pyplot.figure(**pltopts)
    ax = matplotlib.pyplot.gca()
    grpcol = res*wrkdata.index.to_series() // len(wrkdata)
    grpobj = wrkdata.groupby(by=grpcol)
    pltdata = grpobj.mean()
    Vx = grpobj.cov().xs(y, level=1)[y]
    Cxy = grpobj.cov().xs(x, level=1)[y]
    Vy = grpobj.cov().xs(x, level=1)[x]
    if globalmodell:
        covges = wrkdata.cov()
        a = covges.loc[y, x] / covges.loc[x, x]
    else:
        a = Cxy / Vx
    delta = numpy.sqrt(Vy - 2 * a * Cxy +
                       a * a * Vx)  # Could be into if/else - Performance?
    # Create Plots
    lines, = ax.plot([], [])
    fillcol = lines.get_color()
    lines.remove()
    ax.plot(pltdata[x], pltdata[y], color=fillcol)
    ax.fill_between(pltdata[x],
                    pltdata[y] - sigmafactor * delta,
                    pltdata[y] + sigmafactor * delta,
                    facecolor=fillcol,
                    alpha=.15)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f'Mean and {100*ci} % confidence bands')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment