Last active
May 4, 2023 04:14
PHP translation of the Wilson Confidence Interval calculator. Ported from Ruby; it uses a hardcoded (pre-calculated) confidence (z value) instead of a dynamic calculation with a translation of Ruby's Statistics2.pnormaldist method. Since z doesn't change once it's computed, nor is the computation dependent on the passed-in values, calculating it …
<?php
/*
 * (c) Mark Badolato <[email protected]>
 *
 * This content is released under the {@link http://www.opensource.org/licenses/MIT MIT License.}
 */

namespace Bado;

class WilsonConfidenceIntervalCalculator
{
    /**
     * Computed value for confidence (z)
     *
     * These values were computed using Ruby's Statistics2.pnormaldist function
     * 1.959964 = 95.0% confidence
     * 2.241403 = 97.5% confidence
     */
    private const CONFIDENCE = 2.241403;

    public function getScore(int $positiveVotes, int $totalVotes, float $confidence = self::CONFIDENCE) : float
    {
        // With no votes there is nothing to score, so return 0 and avoid dividing by zero.
        return $totalVotes ? $this->lowerBound($positiveVotes, $totalVotes, $confidence) : 0.0;
    }

    private function lowerBound(int $positiveVotes, int $totalVotes, float $confidence) : float
    {
        // Observed fraction of positive votes
        $phat = 1.0 * $positiveVotes / $totalVotes;
        $numerator = $this->calculationNumerator($totalVotes, $confidence, $phat);
        $denominator = $this->calculationDenominator($totalVotes, $confidence);

        return $numerator / $denominator;
    }

    private function calculationDenominator(int $total, float $z) : float
    {
        return 1 + $z * $z / $total;
    }

    private function calculationNumerator(int $total, float $z, float $phat) : float
    {
        return $phat + $z * $z / (2 * $total) - $z * sqrt(($phat * (1 - $phat) + $z * $z / (4 * $total)) / $total);
    }
}
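
A minimal standalone sketch of the same lower-bound formula may help show why the score is useful for ranking. The function name wilsonLowerBound() is hypothetical and not part of the gist; it accepts a float for the positive count purely for illustration, and it treats any non-positive total as "no votes". The two calls use the same 50% approval rate with different sample sizes.

```php
<?php
// Hypothetical standalone helper mirroring the class's formula; not part of the gist.
function wilsonLowerBound(float $positive, int $total, float $z = 2.241403): float
{
    if ($total <= 0) {
        return 0.0; // no votes: score is 0, matching getScore() for a zero total
    }

    $phat = $positive / $total; // observed fraction of positive votes
    $numerator = $phat + $z * $z / (2 * $total)
        - $z * sqrt(($phat * (1 - $phat) + $z * $z / (4 * $total)) / $total);

    return $numerator / (1 + $z * $z / $total);
}

// Same approval rate, different amounts of evidence.
$few  = wilsonLowerBound(1, 2);
$many = wilsonLowerBound(10, 20);

printf("1/2 -> %.6f, 10/20 -> %.6f\n", $few, $many);
```

The larger sample gets the higher score, which is the point of ranking by the lower bound rather than by the raw ratio.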
<?php
/*
 * (c) Mark Badolato <[email protected]>
 *
 * This content is released under the {@link http://www.opensource.org/licenses/MIT MIT License.}
 */

namespace Bado\Tests\ScoreCalculator;

use Bado\WilsonConfidenceIntervalCalculator;
use PHPUnit\Framework\TestCase;

class WilsonConfidenceIntervalCalculatorTest extends TestCase
{
    /**
     * @test
     * @dataProvider ratingsProvider
     *
     * @param int $positiveVotes
     * @param int $totalVotes
     * @param float $expectedScore
     */
    public function it_can_calculate_scores_properly(int $positiveVotes, int $totalVotes, float $expectedScore) : void
    {
        $calculator = new WilsonConfidenceIntervalCalculator();
        $calculatedScore = $calculator->getScore($positiveVotes, $totalVotes);

        // assertEquals() no longer accepts a delta argument in recent PHPUnit versions
        self::assertEqualsWithDelta($expectedScore, $calculatedScore, 0.000001);
    }

    public function ratingsProvider() : array
    {
        // Pre-calculated score results using the known Ruby implementation
        // Array format is [$positiveVotes, $totalVotes, $expectedScore]
        return [
            [0, 0, 0],
            [0, 10, 0],
            [1, 2, 0.077136],
            [10, 10, 0.665607],
            [10, 20, 0.275967],
            [52, 76, 0.556480],
        ];
    }
}
@capensisma @mbadolato From a quick look, it appears as if they've added the ability to account for ranges via the following method:
Take the average rating, subtract the minimum possible rating (1 in a 5-star system), multiply that by the total number of votes, and divide by the rating range (max - min, so a 1-5 star rating has a range of 4). Use that final result as the number of "upvotes".
So, if we posit an example of 10 votes, half 1's and half 5's, then that's basically equivalent to 5 upvotes and 5 downvotes. The average in such a spread would be 3 stars. Using their math: (3 - 1) * 10 / 4 = 5 upvotes, which means 5 downvotes.
If we now posit an example of 10 votes, half 3's and half 5's, average is 4. Using their math: (4 - 1) * 10 / 4 = 7.5 upvotes, which would mean 2.5 downvotes.
Basically, 1 star in their system is a full downvote and 5 stars is a full upvote. Any other rating is partly a downvote and partly an upvote; instead of counting the votes individually, they derive the final split from the average rating. This works fine if your average is accurate, but a rounded average introduces error into the result. So make sure you keep enough significant digits in your average for it to be accurate. None of this rounding to 2 digits stuff.
To apply this to this PHP library, you could add something like this:
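
A minimal sketch of that conversion, not the snippet originally posted here. The function name starAverageToUpvotes() and its parameters are hypothetical. Note that the result can be fractional (7.5 in the second example), which an int-typed $positiveVotes parameter like the one on getScore() would not preserve, so a float-typed variant is assumed.

```php
<?php
// Hypothetical helper, not part of the gist: turns an average star rating
// into a (possibly fractional) "upvote" count, as described in the comment.
function starAverageToUpvotes(float $average, int $totalVotes, int $min = 1, int $max = 5): float
{
    if ($max <= $min) {
        throw new InvalidArgumentException('$max must be greater than $min');
    }

    // (average - min) * votes / (max - min)
    return ($average - $min) * $totalVotes / ($max - $min);
}

// The two worked examples from the comment.
$evenSplit = starAverageToUpvotes(3.0, 10); // ten votes: half 1's, half 5's
$skewed    = starAverageToUpvotes(4.0, 10); // ten votes: half 3's, half 5's

printf("%.1f upvotes, %.1f upvotes\n", $evenSplit, $skewed);
```

The resulting count would then stand in for the positive votes, with the number of ratings as the total.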
Whether this is mathematically sound or not, no idea. Just basing it on that Ruby library listed above.