So You Want to Use Glicko-2 for Your Game's Ratings

I wrote this article right after I published the first version of instant-glicko-2. It is meant to document how to implement a Glicko-2 algorithm that allows for instant feedback after games.

So You Want To Use Glicko-2 For Your Game's Ratings

Great! Glicko-2 is a very cool rating system. And a popular choice too! Lichess, CS:GO, and Splatoon 2 all use the Glicko-2 system.

It also offers some unique advantages over simpler systems like Elo.

Glicko the first

Glicko-2 builds on the original Glicko rating system. Glicko aims to improve on Elo by adding a measure of rating uncertainty, the "ratings deviation" (RD).

Using this value, we can calculate a confidence interval in which the player's actual strength most likely lies. If r is the rating, the player's actual strength is expected to lie between r - 2RD and r + 2RD in 95% of cases.
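Concretely (a minimal sketch; the 1500/350 starting values are the paper's defaults):

```rust
/// 95% confidence interval for a player's true strength, using the
/// r ± 2·RD rule of thumb from the Glicko paper.
fn confidence_interval(rating: f64, rd: f64) -> (f64, f64) {
    (rating - 2.0 * rd, rating + 2.0 * rd)
}

fn main() {
    // A brand-new player (the paper's defaults: rating 1500, RD 350):
    // all we know is that they're somewhere between 800 and 2200.
    assert_eq!(confidence_interval(1500.0, 350.0), (800.0, 2200.0));

    // An established player whose RD has shrunk to 50:
    // we're now 95% confident they sit between 1400 and 1600.
    assert_eq!(confidence_interval(1500.0, 50.0), (1400.0, 1600.0));
}
```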

This RD value always decreases with every game the player plays - after all, a played game is a good clue to the player's actual strength. And when the player doesn't play games, it increases with the time of inactivity. So if a player stops playing rated games for a year, we are less certain about their strength when they come back.
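In the original Glicko system, that growth happens in one step whenever the clock advances. A sketch, assuming the paper's formula and a system-specific constant c that you'd have to tune for your own game:

```rust
/// Ratings deviation after a stretch of inactivity, per the original
/// Glicko paper: RD' = min(sqrt(RD² + c²·t), 350).
/// `c` is a tuning constant specific to your game, and `t` is how long
/// the player has been inactive (in the paper's time unit, rating
/// periods, which we'll meet in a second).
fn onset_rd(rd: f64, c: f64, t: f64) -> f64 {
    (rd * rd + c * c * t).sqrt().min(350.0)
}
```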

To achieve this RD growth over time, Glicko also introduces a little devil named the "rating period". But we'll think about that when we actually try to use Glicko-2 for our game.

Glicko-2 (the sequel)

Glicko-2 aims to further improve on Glicko by introducing another variable to the rating, the "rating volatility" (σ). This value describes the expected fluctuation in rating. If the value is high, the player is expected to have some high fluctuation in performance, and if it is low, they are expected to be very consistent. The value does not affect the confidence interval discussed above.

The average value will be higher for games that, for example, require some amount of luck, or where fewer games per match are played.

The value doesn't change during times of inactivity.

For intermission, listen to me go on a short rant about esports that is only kinda related

Now if you allow me to go on a small tangent here, I see potential for some great marketing in this rating volatility too. Give it a catchy name, and you can have stories about how players with a high X-Factor have the most exciting and dramatic performances. Anything can happen when they're on stage.

Meanwhile, players with a very low X-Factor are walls. They are extremely solid, experts in dealing with every playstyle you can throw at them, and they are a true test of strength. If you beat them, your improvement paid off. After all, beating them is very unlikely to be a fluke.

These stories happen organically in competitive games. For example, on Smash Bros. broadcasts you'll hear a lot of commentators remark on how consistent and how much of a wall Dabuz is when he is on stage. He has even been crowned "King of Consistency" by respected Smash Bros. community ranking authority PGstats.

The same article that names Dabuz as a very consistent player also names Marss as a player who is the opposite.

When Marss is hot, he is nigh unbeatable by anyone outside of the top 5 players in the world. When he is playing at his best, his potential is limitless. The problem is consistency [...].

I think the possibility of capturing those stories in a value, even for players who are not at the very top and will not have such articles written about them, is very exciting.

Implementation

The implementation should be relatively straightforward. We just look at the steps described in Glickman's paper, and we're good. Just one little problem...

From the section "The formulas" in Glickman's paper:

To apply the rating algorithm, we treat a collection of games within a "rating period" to have occurred simultaneously. Players would have ratings, RD's, and volatilities at the beginning of the rating period, game outcomes would be observed, and then updated ratings, RD's and volatilities would be computed at the end of the rating period.

Do you spot it?

I brushed away the little devil named "rating period" earlier, and now it's coming back to haunt us.

We can only calculate ratings when such a rating period completes, and rating periods don't complete after every game! In fact, Glickman recommends that at least 10-15 games per player should happen in every rating period. So this is something we need to work around if we want to show our players how their rating changed right after a game. There are multiple approaches.

One simple approach is described in a blogpost by Ryan Juckett titled "The Online Skill Ranking of INVERSUS Deluxe". But this approach has drawbacks too. The later blogpost "Additional Thoughts on Skill Ratings" addresses these and proposes a potential solution.

This solution seems to be very similar or even identical to the one Lichess uses. And one great thing is: Lichess is open source!

The crux of the solution is to allow fractional rating periods. We can now evaluate a temporary rating for a specific point in time within a rating period and work with that. The secret sauce can be found in the RatingCalculator class in the Lichess implementation. Or, alternatively, in me own repo for which I stole it :).
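The core idea, paraphrased (this is my rough rendering, not Lichess's actual code): to evaluate a rating partway through a period, let the deviation grow by the elapsed fraction of the period, generalizing step 6 of the Glicko-2 paper.

```rust
/// Deviation on the internal Glicko-2 scale after `elapsed_periods`
/// rating periods without rated games. With `elapsed_periods = 1.0`
/// this is exactly step 6 of the Glicko-2 paper
/// (phi* = sqrt(phi² + sigma²)); fractional values give us a
/// deviation for any point in time inside a period.
fn pre_rating_deviation(phi: f64, sigma: f64, elapsed_periods: f64) -> f64 {
    (phi * phi + sigma * sigma * elapsed_periods).sqrt()
}
```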

So our new strategy for calculating a player's rating at a given point in time is:

  1. If necessary, close every rating period for our players that hasn't been closed yet and commit their ratings.
    We do this by just performing the steps described in the paper.
  2. Get every result for the player in the current rating period.
  3. Get the current player rating using the results in the current rating period, as well as the cool fractional-period secret sauce we borrowed from Lichess (sketched below).
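Sketched in (very hypothetical) Rust, with the actual maths stubbed out and all types invented purely for illustration, that lookup could look like this:

```rust
use std::time::{Duration, Instant};

/// Hypothetical player record; your storage layer would own this.
struct Player {
    rating: f64,
    deviation: f64,
    volatility: f64,
    period_start: Instant, // start of the currently open rating period
}

/// One finished game inside the open period (opponent snapshot + score).
struct Game {
    opponent_rating: f64,
    opponent_deviation: f64,
    score: f64, // 1.0 win, 0.5 draw, 0.0 loss
}

const PERIOD: Duration = Duration::from_secs(60 * 60 * 24); // e.g. one day

fn current_rating(player: &mut Player, games: &[Game], now: Instant) -> f64 {
    // 1. Close every fully elapsed rating period, committing the
    //    player's rating with the standard steps from Glickman's paper.
    while now.duration_since(player.period_start) >= PERIOD {
        close_period(player);
        player.period_start += PERIOD;
    }
    // 2. + 3. Rate the open period's games over the fraction of the
    //    period that has elapsed so far, without committing anything.
    let fraction = now.duration_since(player.period_start).as_secs_f64()
        / PERIOD.as_secs_f64();
    rate_fractional(player, games, fraction)
}

fn close_period(_player: &mut Player) {
    todo!("steps 1-8 of the Glicko-2 paper over the period's games")
}

fn rate_fractional(_player: &Player, _games: &[Game], _fraction: f64) -> f64 {
    todo!("same update, but with phi* = sqrt(phi^2 + sigma^2 * fraction)")
}
```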

And that's it really.

Sources/further reading

Wikipedia

Elo

Glicko and Glicko-2

Actual other sources

Original paper on Glicko

Original paper on Glicko-2

Glickman's other adventures

Lichess' rating source code

Blogpost on how INVERSUS Deluxe implements Glicko-2

Blogpost on how the dev of INVERSUS Deluxe would want to implement Glicko-2

gpluscb (Author) commented Jan 8, 2025

Hi @lorenzocarli83

thanks for the comment, I'd love to get in touch. If you use Discord, I'm @queer. there.
Otherwise, you can write to me at [email protected].

frostu8 commented Nov 8, 2025

Hi! Unsure if you are still active on this, but I implemented this for a Ring Racers MMR API.

This works pretty great, but unfortunately players will sometimes "lose" MMR if they win against a highly skilled opponent and win against a very low skilled opponent in the same period, which looks very strange. This happens a lot when deviation is high, but it also still sometimes happens with more established players. Might you have any suggestions to fix this? Thanks!

gpluscb (Author) commented Nov 8, 2025

@frostu8 That's a very interesting effect; I had not seen it before, but I can confirm it also happens with my implementation. I think it might be a direct consequence of the maths used.
mu' = mu + phi'^2 * sum_j [ g(phi_j) * (s_j - E(mu, mu_j, phi_j)) ]
This equation calculates the new rating. Here phi' is the new rating deviation on the internal glicko-2 scale, and the sum runs over the games played in the period.
The more games are played within the rating period, the lower phi' will be. So if you play a second game against a very low-rated opponent, phi'^2 might drop low enough to offset what that game adds to the sum.

If this is true, playing more games in a rating period could indeed hurt your rating coming out of the period in the glicko-2 system (though your rating deviation will of course also be lower). However, I'm not super familiar with the logic behind all the maths, so I can't confirm this. If you're really curious, you could try contacting Dr. Glickman to ask about it.

As for how to fix this: if it really is an issue with the glicko-2 algorithm and not just with our implementations, there will be no simple way to fix it with this approach. You could maybe enforce a minimum rating gain/loss per win/loss, but that wouldn't be exactly "true to glicko-2" if you care about that.

There is also one alternative approach to implementing glicko-2 that I was experimenting with some time back but didn't write about. The approach is to mostly ignore rating periods and to instead actually update ratings after every single game. When calculating the new rating, the new deviation is calculated using the fractional rating period approach, but other than that everything is standard glicko-2. One problem with this is that technically you want both the player's and the opponent's rating (especially the deviation) to represent the same point in time, in particular the time the player's rating was last updated. I'm not sure how possible that is since both ratings will have been updated at different points in time. If the opponent's rating was updated last, maybe you could try to calculate backwards to what the deviation would have been when the player's rating was last updated, but I'm not sure how reliable or desirable that would be. Maybe one would just need to accept some error when choosing this approach.
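To sketch what I mean (on the internal glicko-2 scale, with the volatility update from step 5 skipped entirely for brevity, so this is only an approximation of the real thing):

```rust
use std::f64::consts::PI;

/// g from step 3 of the Glicko-2 paper.
fn g(phi: f64) -> f64 {
    1.0 / (1.0 + 3.0 * phi * phi / (PI * PI)).sqrt()
}

/// Expected score E from step 4 of the Glicko-2 paper.
fn expected(mu: f64, mu_j: f64, phi_j: f64) -> f64 {
    1.0 / (1.0 + (-g(phi_j) * (mu - mu_j)).exp())
}

/// Update a rating from a single game, `elapsed` (fractional) rating
/// periods after the player's last update. Volatility sigma is left
/// unchanged here; real glicko-2 would re-estimate it in step 5.
/// Returns (new mu, new phi).
fn rate_single_game(
    mu: f64, phi: f64, sigma: f64, elapsed: f64, // player
    mu_j: f64, phi_j: f64,                       // opponent
    score: f64,                                  // 1.0 / 0.5 / 0.0
) -> (f64, f64) {
    // Fractional version of step 6: let the deviation grow with the
    // time since this player's last per-game update.
    let phi_star = (phi * phi + sigma * sigma * elapsed).sqrt();
    // Steps 3, 4 and 7, with the sums collapsed to a single game.
    let g_j = g(phi_j);
    let e_j = expected(mu, mu_j, phi_j);
    let v = 1.0 / (g_j * g_j * e_j * (1.0 - e_j));
    let phi_new = 1.0 / (1.0 / (phi_star * phi_star) + 1.0 / v).sqrt();
    let mu_new = mu + phi_new * phi_new * g_j * (score - e_j);
    (mu_new, phi_new)
}
```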

Because only a single game is rated at a time with this approach, I believe it would sidestep the issue. Of course it also isn't exactly "true to glicko-2", but I think it's a logical approach to "glicko-2, but without the rating periods".

frostu8 commented Nov 8, 2025

@gpluscb Thanks for the response! I think your conclusions are on-point.

I actually did educate myself shortly after asking the question, and while I didn't find a "fix", I quickly learned that this is correct. I think Lichess had a problem like this in its younger days (forum post, and then the rotted link on the Wayback Machine; these are about different rating systems, but I think the conclusions are relevant) that describes this effect. I don't think this is an issue Lichess has anymore, though I haven't figured out the secret.

I can sidestep most of the MMR visual clarity issues by making player MMR invisible until the rating deviation is sufficiently low; that way you need an impressively unbalanced matchup to see a -1 or -2. All of my players that have reported this so far also note it only happens in their first 10 matches.

I'm not too sure if I care about the correctness of the Glicko-2 impl here, so if the invisible MMR fails to mask these small drops, I may also include a min MMR gain/loss failsafe. Thankfully, Glicko's rating and deviation are a lot less sensitive than, say, OpenSkill's rating and deviation, so I think it could work.

Though with that being said, I may contact Dr. Glickman to see if he has some insights. I'm sure there is a way to put up "smoke and mirrors" around the MMR to make it look like you never drop on a win or gain on a loss, but I need to understand the math more before I start lying to my players about their MMR.
