BGG Star Realms League player ratings
The BGG Star Realms League has exploded.
Over 160 different players have participated in the league over its first nine seasons, with the current league roster (as of Season 9) at 120 players. (Note: This is set to explode again for Season 10, up to as many as 180 players. Go to this waiting list to have a chance to be in the next season.)
For several seasons, I have been pondering how best to implement a rating system for the league. My first attempt did not adequately capture the dynamism of the league; an adjusted version of that first attempt -- which is more a points-scoring system than a rating system, technically -- has been re-implemented as the Stang rating. (More information on this below.)
Numerous people, however, indicated that a more accurate assessment of player skill would be something called an Elo rating, which has long been the method used to determine relative strength of chess players. What is Elo, and why is it a good tool? Matt Schoonmaker-Gates (BGG user: railbaron; Star Realms user: Schoonmj) brought his Excel skills to bear in generating Elo ratings for all players in all seasons using the data I have compiled throughout the league's history. Here's what he has to say:
"The Elo rating system is a method for calculating the relative skill levels of players in competitor-versus-competitor games. The difference in the ratings between two players serves as a predictor of the outcome of a game. A player whose rating is 100 points greater than their opponent's is expected to win 64% of the time; if the difference is 200 points, then the expected winning percentage for the stronger player is 76%.
A player's Elo rating is represented by a number which increases or decreases based upon the outcome of games between rated players. After every game, the winning player takes points from the losing one, with the difference between the ratings of the winner and loser determining the total number of points gained or lost after a game.
The rating system is self-correcting. This means that a player whose rating is too low should, in the long run, do better than the rating system predicts, and thus gain rating points until the rating reflects their true playing strength.
In the Star Realms adaptation of the Elo rating system, every player starts with a rating of 1500, which is the rating for an "average" player. Your rating changes with each game you play -- as you win games, your rating will increase ... and vice versa if you lose games. The system takes into account the strength of your opponents, so beating better opponents will improve your rating more, and losing to less-skilled opponents will hurt your rating more.
We all know that luck is a part of Star Realms. The best player can lose to the worst player if that's the way the cards fall. This doesn't mean that Elo in Star Realms is useless. Rather, it simply means that the best players won't ever achieve extremely high ratings, and the worst players won't achieve extremely low ratings. In chess (a game with basically no luck) a grandmaster can achieve ratings in the upper 2000s, while beginners can be well below 1000. We don't expect anyone to approach these ratings in Star Realms, but we won't find out until we play more games!"
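For anyone who wants to see the arithmetic behind Matt's description, here is a minimal sketch of the standard Elo formulas in Python. The 400-point logistic scale is the usual chess convention, and it reproduces the 64% and 76% figures quoted above. The K-factor of 32 is purely an assumption for illustration; the value used in Matt's spreadsheet is not stated here, and the function names are made up for this example.

```python
def expected_score(rating, opponent_rating):
    """Expected winning fraction for `rating` against `opponent_rating`
    under the standard Elo logistic curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_ratings(winner, loser, k=32.0):
    """Return (new_winner, new_loser) after one game.

    k is the K-factor: the most points that can change hands in a game.
    32 is an ASSUMED value, not necessarily the one the league uses.
    """
    # The winner gains what the loser gives up, so the total is conserved.
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


print(round(expected_score(1600, 1500), 2))  # 0.64 (100-point favorite)
print(round(expected_score(1700, 1500), 2))  # 0.76 (200-point favorite)
print(update_ratings(1500.0, 1500.0))        # (1516.0, 1484.0)
```

Note that an upset moves more points than an expected result: with these assumptions, a 1500 player who beats a 1700 player gains about 24 points, while the 1700 player would have gained only about 8 for winning.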
Even after compiling the Elo rating, I still found myself wanting to implement the Stang rating, named for Mark Stang (BGG user: stangm; Star Realms user: BCSBuck), who was instrumental in helping me to craft a better version of the system I originally designed. The intent is not to be as robust a representation of player skill (a skill rating is more useful for prediction) but rather to represent player performance in the league, with continued play in higher tiers being rewarded more.
The Stang rating system is a points system designed to reward players for performing well in and spending more time in higher divisions.
A win in the Platinum Tier is worth 100 points, with lower tiers yielding fewer points for each win -- Gold is 90, Silver is 82, Bronze is 75, and Iron is 69. A player's points are tallied, then divided by the number of games he or she has played, thus normalizing the final rating to a 100-point scale. You will find this score on this list as the Simple Stang rating. The notable flaw to the Simple rating is that a player who plays in and performs well in only one season will have a very high rating. This is ultimately solved for the Simple rating by removing players from the list when they are no longer active. Many players will find the Simple Stang rating to be a sufficient indicator.
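As a rough illustration of the Simple Stang arithmetic, here is a short Python sketch. The tier point values come straight from the paragraph above; the function name and the shape of the input (a dictionary of wins per tier) are assumptions made for this example, not how the actual spreadsheet is laid out.

```python
# Points awarded per win in each tier, as listed above.
TIER_POINTS = {"Platinum": 100, "Gold": 90, "Silver": 82, "Bronze": 75, "Iron": 69}


def simple_stang(wins_by_tier, games_played):
    """Total tier points divided by games actually played.

    wins_by_tier: hypothetical input format, e.g. {"Gold": 8, "Silver": 5}.
    Returns None for a player with no games.
    """
    if games_played == 0:
        return None
    points = sum(TIER_POINTS[tier] * wins for tier, wins in wins_by_tier.items())
    return points / games_played


print(simple_stang({"Gold": 8}, 12))       # 60.0
print(simple_stang({"Platinum": 10}, 10))  # 100.0 (a perfect Platinum record)
```

The second call shows the flaw described above: one perfect season is enough to sit at the top of the scale.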
It is solved in a different way by another variant on the rating. With the Legacy Stang rating, all seasons of the league are factored in. Therefore, even if a player has not played in a particular season, the number of games in that season is added to his or her "games played" ... which reduces the end result, sometimes quite dramatically. The notable flaw to the Legacy rating is that it highly rewards those players who have played in more seasons; players who have joined in more recent seasons will not perform well on this indicator. (This is the one that bears the most similarity to my original concept. We can thank all the players who shot this idea down for the better system that has arisen.)
My solution to the problem of the Legacy rating is to look at only the three most recent seasons. This Recency 3 Stang rating tries to strike a balance, over-rewarding neither high-performing newer players nor players who got in on the ground floor. It functions exactly like the Legacy rating in factoring in seasons in which a player has not played, but since there will never be more than two seasons (out of the three) for which this is true, nobody is terribly over- or under-penalized. Players who have not played in the last three seasons are simply not on the list. This is my preferred version of the Stang rating.
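To make the difference between the Legacy and Recency 3 variants concrete, here is a hedged Python sketch. It assumes each season has a fixed number of games per player ("games_per_player"), and that this is the number charged to anyone who sat the season out; the actual count used for a missed season is not spelled out above. The data layout, player names, and function names are all hypothetical.

```python
TIER_POINTS = {"Platinum": 100, "Gold": 90, "Silver": 82, "Bronze": 75, "Iron": 69}


def stang_over_seasons(player, seasons):
    """Tier points divided by games, counting missed seasons as games played.

    Each season is a dict with:
      "games_per_player": games charged for a missed season (ASSUMPTION)
      "results": {player: (tier, wins, games_played)}
    Returns None if the player appears in none of the given seasons.
    """
    points, games, appeared = 0, 0, False
    for season in seasons:
        record = season["results"].get(player)
        if record is None:
            games += season["games_per_player"]  # penalty for sitting out
        else:
            tier, wins, played = record
            points += TIER_POINTS[tier] * wins
            games += played
            appeared = True
    if not appeared or games == 0:
        return None
    return points / games


def legacy_stang(player, seasons):
    return stang_over_seasons(player, seasons)       # every season counts


def recency3_stang(player, seasons):
    return stang_over_seasons(player, seasons[-3:])  # last three seasons only


# Made-up data: four 10-game seasons; "newcomer" joined only for the last one.
seasons = [
    {"games_per_player": 10, "results": {"veteran": ("Silver", 6, 10)}},
    {"games_per_player": 10, "results": {"veteran": ("Silver", 6, 10)}},
    {"games_per_player": 10, "results": {"veteran": ("Silver", 6, 10)}},
    {"games_per_player": 10, "results": {"veteran": ("Silver", 6, 10),
                                         "newcomer": ("Gold", 8, 10)}},
]

print(legacy_stang("newcomer", seasons))    # 18.0
print(recency3_stang("newcomer", seasons))  # 24.0
```

With this made-up data, the newcomer's single 8-2 Gold season would score 72 on the Simple rating, 18 on Legacy, and 24 on Recency 3, while the veteran stays at about 49 on both multi-season variants. The gap narrows as the window shrinks, which is the point of Recency 3.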
After posting this Geeklist, I was approached by Rob Brott (BGG and SR username: rlbrott) about implementing the TrueSkill system developed by Microsoft. More ratings don't seem like a bad thing to me, so I have decided to add this one as well. Here is what Rob has to say about what TrueSkill is trying to accomplish:
TrueSkill™ is a ranking system that was developed by Microsoft Research especially for Xbox Live. It is, in some sense, an extension of the Glicko rating system to multi-player games, and it explicitly models draws. Since Star Realms is a two-player game without draws, much of the machinery for TrueSkill is not used here.
The TrueSkill approach maintains a belief that the player's true ability/ranking is described by a normal (Gaussian) probability distribution with mean = mu, and standard deviation = sigma. After each game is played, we learn something more (in a Bayesian sense) about each player's true ability. By using the outcome of each match, we can adjust the estimates for mu and sigma. Over the course of more matches, the uncertainty (sigma) will be reduced and we can be more certain what a player's ability truly is.
To actually make a leaderboard we ask: with a certain level of certainty, what is each player's rank/ability? In the TrueSkill case, by taking mu-3*sigma as the TrueSkill score, we are saying that the person's ability is at least mu-3*sigma with a probability of roughly 99.9%. This is a conservative estimate of the player's ability.
For Star Realms, each player starts with mu = 25, and sigma = 25/3, so that their initial TrueSkill rating is zero (0). Ratings practically lie between zero (0) and fifty (50). Since Star Realms involves a fair amount of luck, we may not see very high scores, just like for Elo ratings.
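Rob's description can be sketched in a few lines of Python using only the standard library. This is a simplified illustration, not the league's actual calculation: it covers only the two-player, no-draw case, it leaves out TrueSkill's "dynamics" term (a small amount of uncertainty added back before each game), and the performance spread beta = sigma0 / 2 is an assumed default that may not match Rob's settings. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MU0 = 25.0            # starting mean, as stated above
SIGMA0 = 25.0 / 3.0   # starting standard deviation, as stated above
BETA = SIGMA0 / 2.0   # ASSUMED performance spread; not given in the text

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def conservative_score(mu, sigma):
    """Leaderboard score: the ability we are almost sure the player exceeds."""
    return mu - 3.0 * sigma


def update_no_draw(winner, loser, beta=BETA):
    """One simplified two-player TrueSkill update with no draws.

    winner and loser are (mu, sigma) tuples; returns the updated pair.
    The dynamics term is omitted (ASSUMPTION made to keep the sketch short).
    """
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = sqrt(2.0 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)  # size of the "surprise"
    w = v * (v + t)                            # how much uncertainty shrinks
    new_winner = (mu_w + sig_w ** 2 / c * v,
                  sig_w * sqrt(1.0 - sig_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sig_l ** 2 / c * v,
                 sig_l * sqrt(1.0 - sig_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


# Chance that a normally distributed ability lies above mu - 3*sigma.
print(STD_NORMAL.cdf(3.0))

# Two brand-new players: the winner's mu rises, the loser's falls,
# and both sigmas shrink because we have learned something about each.
print(update_no_draw((MU0, SIGMA0), (MU0, SIGMA0)))
```

Because the score subtracts three sigmas, a new player's score can climb simply by playing more games and shrinking sigma, even if mu barely moves.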
This list is ordered as follows:
- the most recent season's ratings in item #1
- all seasons, by Elo
- all seasons, by TrueSkill
- all seasons, by Stang (Simple)
- all seasons, by Stang (Recency 3)
- seasons 4 on, by Stang (Legacy)
You may click the links above to jump directly to the first season of a given rating type.
Thanks again to Matt, Mark, and Rob for their help, as well as all the others who Matt bounced ideas off of while working on the list. And, of course, a big thanks to all the players who generated all this data for us.