Given counts per item:
- P = positive, U = neutral, N = negative
- T = P + U + N
Default assumption: neutral = “meh / no push”, so it dilutes by being included in T.
Vote mapping (utility):
- positive -> +1
- neutral -> 0
- negative -> -1
True latent mixture: θ = (θP, θU, θN)
True sentiment score: S(θ) = θP - θN (range [-1,+1])
Naive score (don’t use for ranking):
- S_naive = (P - N) / T
Problem: small T is noisy (a single positive vote alone gives S_naive = +1).
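To make the noise problem concrete, here is a minimal sketch with made-up counts. The function name `naive_score` and the choice to return 0.0 for an item with no votes are assumptions for illustration only.

```python
def naive_score(p: int, u: int, n: int) -> float:
    """Naive sentiment score (P - N) / T.

    Assumption: an item with no votes gets 0.0 (the notes do not define this case).
    """
    t = p + u + n
    return (p - n) / t if t else 0.0


# Made-up counts: one lucky vote vs. a well-established item.
one_vote = naive_score(1, 0, 0)       # 1.0
established = naive_score(90, 5, 5)   # 0.85
print(one_vote > established)         # True: the single-vote item ranks first
```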
Use Dirichlet prior on θ:
- θ ~ Dirichlet(αP, αU, αN)
Pick prior strength k = αP + αU + αN, i.e. the number of pseudo-votes the prior contributes (typically 10).
Option A (easy): symmetric prior
- αP = αU = αN = k/3
Option B (better): global baseline prior
- gP = ΣP/ΣT, gU = ΣU/ΣT, gN = ΣN/ΣT (sums over all items)
- α = k * (gP, gU, gN)
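Both prior options can be sketched as follows. The function names, the `(P, U, N)` tuple layout, and the fallback to the symmetric prior when there are no votes at all are assumptions, not part of the recipe above.

```python
def symmetric_prior(k: float = 10.0):
    """Option A: spread the prior strength k evenly over (P, U, N)."""
    return (k / 3, k / 3, k / 3)


def baseline_prior(items, k: float = 10.0):
    """Option B: scale the global (P, U, N) proportions by k.

    `items` is an iterable of (P, U, N) count tuples, one per item.
    """
    items = list(items)
    sum_p = sum(p for p, _, _ in items)
    sum_u = sum(u for _, u, _ in items)
    sum_n = sum(n for _, _, n in items)
    total = sum_p + sum_u + sum_n
    if total == 0:
        # Assumption: with no votes anywhere, fall back to Option A.
        return symmetric_prior(k)
    return (k * sum_p / total, k * sum_u / total, k * sum_n / total)


# Made-up counts: global proportions are (0.5, 0.275, 0.225).
counts = [(8, 1, 1), (2, 2, 6), (50, 30, 20)]
print(baseline_prior(counts))  # alpha is about (5.0, 2.75, 2.25)
```

Either way the components of α sum to k, so k keeps its meaning as a pseudo-vote count.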
Posterior parameters:
- aP = P + αP
- aU = U + αU
- aN = N + αN
- A = aP + aU + aN (= T + k)
Smoothed mean sentiment:
- μ = E[S] = (aP - aN) / A
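A minimal sketch of the posterior update and smoothed mean, assuming the symmetric prior with k = 10 and made-up counts (the name `smoothed_mean` is hypothetical):

```python
def smoothed_mean(p, u, n, alpha):
    """Posterior mean of S = thetaP - thetaN under a Dirichlet prior `alpha`."""
    a_p, a_u, a_n = p + alpha[0], u + alpha[1], n + alpha[2]
    a_total = a_p + a_u + a_n          # equals T + k
    return (a_p - a_n) / a_total


alpha = (10 / 3, 10 / 3, 10 / 3)       # symmetric prior, k = 10
print(smoothed_mean(1, 0, 0, alpha))   # 1/11, about 0.091
print(smoothed_mean(90, 5, 5, alpha))  # 85/110, about 0.773
```

The single-vote item is pulled almost all the way back to 0, while the item with 100 votes barely moves from its naive score of 0.85.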
Goal: “pretty sure it’s good”, not “maybe good”.
Analytic variance for S = θP - θN under Dirichlet(a):
- Var(S) = (A*(aP + aN) - (aP - aN)^2) / (A^2 * (A + 1))
- σ = sqrt(Var(S))
One-sided 5% lower bound (default, normal approximation):
- Q = μ - 1.645 * σ
Notes:
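- Q ≤ μ ≤ +1; the normal approximation can push Q slightly below -1 when A is small, so clamp to [-1,+1] if needed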
- Small T => larger σ => Q drops (good)
- Big T => σ shrinks => Q ~ μ
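The variance and lower bound can be sketched like this, assuming the symmetric prior with k = 10 and made-up counts. The name `lower_bound` is hypothetical, and 1.645 is the one-sided 95% normal quantile used above.

```python
import math


def lower_bound(p, u, n, alpha, z=1.645):
    """Normal-approximation lower bound Q = mu - z * sigma for S = thetaP - thetaN."""
    a_p, a_u, a_n = p + alpha[0], u + alpha[1], n + alpha[2]
    a = a_p + a_u + a_n
    mu = (a_p - a_n) / a
    var = (a * (a_p + a_n) - (a_p - a_n) ** 2) / (a ** 2 * (a + 1))
    return mu - z * math.sqrt(var)


alpha = (10 / 3, 10 / 3, 10 / 3)      # symmetric prior, k = 10
print(lower_bound(1, 0, 0, alpha))    # about -0.30: one vote is not convincing
print(lower_bound(90, 5, 5, alpha))   # about 0.68: close to its mean of 0.77
```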
Sort by:
- Q (lower bound)
Display:
- μ (smoothed mean)
- T (evidence)
- optionally decisiveness: D = 1 - U/T
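Putting sort and display together, as a rough sketch: the item names and counts are made up, `score` is a hypothetical helper, and setting D = 0.0 when T = 0 is an assumption since D is undefined there.

```python
import math


def score(p, u, n, alpha, z=1.645):
    """Return (Q, mu): lower bound and smoothed mean for one item."""
    a_p, a_u, a_n = p + alpha[0], u + alpha[1], n + alpha[2]
    a = a_p + a_u + a_n
    mu = (a_p - a_n) / a
    var = (a * (a_p + a_n) - (a_p - a_n) ** 2) / (a ** 2 * (a + 1))
    return mu - z * math.sqrt(var), mu


items = {"a": (1, 0, 0), "b": (90, 5, 5), "c": (10, 30, 2)}  # made-up (P, U, N)
alpha = (10 / 3,) * 3                                        # symmetric prior, k = 10

rows = []
for name, (p, u, n) in items.items():
    q, mu = score(p, u, n, alpha)
    t = p + u + n
    d = 1 - u / t if t else 0.0   # assumption: D = 0.0 when there are no votes
    rows.append((name, q, mu, t, d))

rows.sort(key=lambda r: r[1], reverse=True)  # sort by Q, best first
for name, q, mu, t, d in rows:
    print(f"{name}: Q={q:.3f} mu={mu:.3f} T={t} D={d:.2f}")
```

With these counts the order is b, c, a: the single-vote item drops to the bottom even though its naive score is the highest.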
Alternative: neutral is “no signal” (don’t dilute):
- compute using only P and N:
- T' = P + N
- apply the same prior logic to (P, N); the posterior is θP ~ Beta(P + αP, N + αN)
- score is θP - θN = 2θP - 1 (since θN = 1 - θP)
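A sketch of the two-outcome variant, assuming a symmetric prior that splits k evenly between positive and negative (αP = αN = k/2). The function name is hypothetical. The variance follows from Var(2θP - 1) = 4 Var(θP) for a Beta posterior.

```python
import math


def lower_bound_no_neutral(p, n, k=10.0, z=1.645):
    """Lower bound on S = 2*thetaP - 1 with thetaP ~ Beta(P + k/2, N + k/2).

    Neutral votes are ignored entirely, so they never dilute the score.
    """
    a, b = p + k / 2, n + k / 2       # assumption: symmetric prior on (P, N)
    total = a + b                     # equals T' + k
    mu = (a - b) / total
    var = 4 * a * b / (total ** 2 * (total + 1))
    return mu - z * math.sqrt(var)


# Made-up counts: 10 positive, 2 negative, any number of neutrals.
print(lower_bound_no_neutral(10, 2))  # about 0.044
```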
Alternative: neutral is slightly positive:
- use weights w = (1, 0.2, -1)
- then S(θ) = wP θP + wU θU + wN θN
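- same Dirichlet posterior, same “lower bound” idea, but the variance generalizes to Var(S) = (A*(wP^2 aP + wU^2 aU + wN^2 aN) - (wP aP + wU aU + wN aN)^2) / (A^2 * (A + 1))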
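A sketch of the weighted variant. The variance used here is the general form for a linear score w·θ under a Dirichlet posterior, Var = (Σ w_i² a_i / A - μ²) / (A + 1), which is derived from the Dirichlet covariance rather than stated in the notes above; with w = (1, 0, -1) it reduces to the formula for θP - θN. The function name and counts are made up.

```python
import math


def weighted_lower_bound(counts, alpha, w=(1.0, 0.2, -1.0), z=1.645):
    """Lower bound on S = w . theta, with theta ~ Dirichlet(counts + alpha)."""
    a = [c + al for c, al in zip(counts, alpha)]
    a_total = sum(a)
    mu = sum(wi * ai for wi, ai in zip(w, a)) / a_total
    second_moment = sum(wi * wi * ai for wi, ai in zip(w, a)) / a_total
    # Guard against tiny negative values caused by floating-point rounding.
    var = max(second_moment - mu * mu, 0.0) / (a_total + 1)
    return mu - z * math.sqrt(var)


alpha = (10 / 3,) * 3                  # symmetric prior, k = 10
neutral_heavy = (10, 30, 2)            # made-up (P, U, N)
print(weighted_lower_bound(neutral_heavy, alpha))                # about 0.16
print(weighted_lower_bound(neutral_heavy, alpha, w=(1, 0, -1)))  # about 0.02
```

Giving neutral a small positive weight lifts neutral-heavy items, as expected; items with few neutral votes are almost unaffected.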