Elo rating systems and how to manipulate them

Can a crowd wield the ban-hammer?

Jul 30, 2019

Competitive games need to match players of similar skill. This is called matchmaking. For matchmaking to work, the game needs knowledge of each individual’s relative skill level.

Most games use a version of the Elo rating system, originally invented for ranking chess players, to determine individual skill. The system assigns an Elo rating to each player and adjusts it after each match. In theory, a player’s Elo goes up when they do better than their rating predicted and goes down when they do worse. For instance, when a highly ranked player draws with a lower-ranked player, the lower-ranked player gains Elo and the higher-ranked player loses it.
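
As a rough sketch of the update described above, here is the textbook Elo formula in Python. The K-factor of 32 and the 400-point scale are conventional chess defaults, not values from any particular game, and the function names are only illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both players' new ratings after one game.

    score_a is A's actual result: 1.0 win, 0.5 draw, 0.0 loss.
    k = 32 is an assumed K-factor; real games tune this value.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# An 1800-rated player draws with a 1400-rated player.
high, low = update(1800, 1400, 0.5)
print(round(high, 1), round(low, 1))  # 1786.9 1413.1
```

The higher-rated player was expected to score about 0.91, so a draw counts as an underperformance and costs them roughly 13 points, which the lower-rated player picks up.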

Serious players of esports titles obsess over rankings. Dota 2 calls it MMR, Overwatch calls it SR, League of Legends calls it LP, but they’re all variations on Elo-style ratings. While cosmetics can be a fun way to show off in games, your ranking is king. A high MMR is the greatest flex of all.

This makes Elo ratings highly desirable targets for manipulation. Players want a higher rating to show off. And a very high rating gives them a chance to play in the same games as their favorite streamers or professionals. While the Elo system performs very well in honest conditions, dishonest actors can game the system. This is a common occurrence in esports titles.

This paper by Microsoft summarizes the game-ability of Elo rankings well:

A question often asked in relation to rating systems is if the system can be “gamed.” For combined rating and matchmaking systems there are essentially two things people might want to do:
- Have their ratings appear higher than they actually are to show off on the leader boards (“stats boosting”)
- Have their ratings appear lower than they actually are to manipulate the matchmaking so they get easy-to-win games (“de-leveling”)
Stats boosting in Elo type systems as described here is essentially only possible if the boosters are able to manipulate the game outcome in their favor.
- They can have a higher-skilled gamer play under their account.
- They can try to get matched with friends who are willing to help manipulate the game outcome. Possibilities here are network “bridging” and entering matchmaking at exactly the same time when a low population of players can be expected.
- They can manipulate the game software or the network connection.
De-leveling may be achieved as follows.
- They can obtain a new account effectively resetting their ratings.
- They can lose matches on purpose to get negative Elo updates. In order to delevel as quickly as possible de-levelers frequently leave the game as soon as it has started.
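
To get a feel for how quickly losing on purpose pays off, here is a toy simulation in Python. It is only a sketch under simplifying assumptions: textbook Elo with a K-factor of 32, a matchmaker that always finds an opponent at exactly the account's displayed rating, and a player whose real skill is fixed at 2000. None of these numbers come from a real game.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B under textbook Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def delevel(start_rating: float, games: int, k: float = 32.0) -> float:
    """Displayed rating after `games` intentional losses.

    Assumes each opponent sits at the account's current displayed
    rating, so every thrown game costs k * 0.5 points.
    """
    displayed = start_rating
    for _ in range(games):
        displayed += k * (0.0 - expected_score(displayed, displayed))
    return displayed


true_rating = 2000.0
displayed = delevel(true_rating, games=20)

# Once the player stops throwing, Elo's own model estimates their
# chance of beating an honest opponent at the deflated rating.
win_chance = expected_score(true_rating, displayed)
print(displayed, round(win_chance, 2))  # 1680.0 0.86
```

Under these assumptions, twenty thrown games buy a 320-point drop, and the de-leveler is then expected to win about 86% of their matches until the rating catches back up.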

Gaming Elo ratings comes down to either “identity fraud” or “match fixing.” Having a high-skilled gamer play for you or creating a new account is identity fraud. Getting friends to join your games on the opposing team and lose on purpose, hacking opponents to disconnect, or losing on purpose yourself is match fixing.

These are common practices. The top-rated Korean players for Teamfight Tactics were recently exposed for DDoSing their opponents, forcing them to disconnect so they couldn’t play the game. When I played Overwatch, it was common to find entire teams that were purposefully “throwing” games to derank so they could “stomp” on lower-skilled teams later on. And a couple of years ago, there was a scandal in Dota 2 involving players connecting to obscure servers at off-hours so they could have their friends on the opposing team intentionally lose.

“Boosting” (artificially raising your MMR) and “smurfing” (artificially decreasing your MMR) both negatively impact the overall game ecosystem. There’s no way to tweak the Elo rating system itself to prevent these exploits. Instead, devs have to rely on banning accounts, which they tend to do early and often.

This is another example of a system in games that relies on centralized operators to function. Without the power of the ban hammer, ranked matchmaking would completely devolve. And with poor matchmaking, the game becomes unfun for the average player.

Perhaps decentralized governance could be used to enforce bans. And an increased cost for creating identities would reduce identity fraud (more “Sybil resistance”). There are lots of interesting open research questions here, and they block the release of a functioning competitive game without an all-powerful operator.

What do you think? How could a network maintain the legitimacy of a matchmaking system?

Stuffed Blocks
