Hanabi» Forums » General

Subject: Online rating system

Pierre Beri
France
For the upcoming official implementation of Hanabi, I am trying to find the best rating system for players. The only aim of such a system is to let players find others of a similar level and restrict their tables to them. The aim is not to rank players, and there will be no Hanabi ranking on the website.

The two main rating systems we have considered so far are ELO and average score.

I would be glad to know which of the following ideas you think is/are the best and fairest. Also, if you have ideas other than those presented here, I will be happy to read them.

Please note that the player starting a table will always be able to choose to play a game “off competition”, in which case the scores obtained at that table will not count toward the players' ratings.

Average score
1) Average score
A simple and straightforward criterion. Every player would have one average score per variant – base game / 6 suits / 6 suits (1 of each in the 6th) / rainbow – and configuration.
A loss would count as half the points earned in the game rather than 0 points, so as not to ruin a player’s average.
Issue: low scores in your first games could weigh on your average long after you have improved and consistently get good scores.
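A minimal Python sketch of idea 1, assuming "configuration" means the number of players and that the points earned in a lost game are the cards successfully played before the strike-out. The class and function names are illustrative only, not the site's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GameResult:
    variant: str      # e.g. "base", "6 suits", "rainbow" (illustrative labels)
    players: int      # assumption: "configuration" means the number of players
    score: int        # cards successfully played when the game ended
    lost: bool        # True if the team struck out


def credited_points(game: GameResult) -> float:
    """Points counted toward the average: half the points on a loss."""
    return game.score / 2 if game.lost else float(game.score)


def average_scores(history: list[GameResult]) -> dict[tuple[str, int], float]:
    """One average per (variant, configuration) pair."""
    buckets = defaultdict(list)
    for game in history:
        buckets[(game.variant, game.players)].append(credited_points(game))
    return {key: sum(pts) / len(pts) for key, pts in buckets.items()}


history = [
    GameResult("base", 3, 22, lost=False),
    GameResult("base", 3, 14, lost=True),    # credited as 7
    GameResult("rainbow", 4, 18, lost=False),
]
print(average_scores(history))  # {('base', 3): 14.5, ('rainbow', 4): 18.0}
```

Note that every game ever played stays in these buckets, which is exactly the "early bad games" issue described above.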

ELO
Several ideas have been put forward here.

2) Target score
The player who creates a table sets a minimum target score. The client then generates a non-playing virtual opponent whose ELO rating depends on the goal that has been set. At the end of the game, if you have beaten the target score, you gain ELO points, otherwise you lose some.
Issue: this might distort the way some players play. If they feel the target score can hardly be reached, they will take any risk to get there, believing it is the only way. They wouldn’t mind losing, since in ELO terms the result would be the same as finishing just 1 point below the target.
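A rough sketch of idea 2 using the standard Elo update. The mapping from target score to the virtual opponent's rating (`target_rating`) is a hypothetical placeholder, as is the K-factor of 32; the post does not specify either. The last call shows the issue raised above: missing by 1 and missing by 15 cost exactly the same.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expectation of beating an opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def target_rating(target: int) -> float:
    """Hypothetical mapping: each target point above 20 adds 50 to the virtual opponent."""
    return 1500.0 + 50.0 * (target - 20)


def update_vs_target(rating: float, score: int, target: int, k: float = 32.0) -> float:
    opponent = target_rating(target)
    # Assumption: reaching the minimum target counts as beating it.
    outcome = 1.0 if score >= target else 0.0
    return rating + k * (outcome - expected_score(rating, opponent))


print(update_vs_target(1500, 20, 20))  # 1516.0
print(update_vs_target(1500, 19, 20))  # 1484.0
print(update_vs_target(1500, 5, 20))   # 1484.0, same penalty as missing by 1
```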

3) –1 / 0 / +1
When the game ends, the client generates 3 virtual players: one with a score 1 point below yours, one with the same score as you and one 1 point above you. Your new ELO is calculated based on these virtual players’ ELO.
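One way idea 3 could work, sketched in Python: you "beat" the virtual player 1 point below, "draw" with the one on your score, and "lose" to the one 1 point above. The post does not say where the virtual players' ratings come from, so `virtual_rating` below is a hypothetical linear score-to-rating table, and K = 16 is arbitrary.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expectation of beating an opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def virtual_rating(score: int) -> float:
    """Hypothetical: rating assigned to a virtual player who finished with `score`."""
    return 1500.0 + 50.0 * (score - 20)


def update_three_virtual(rating: float, score: int, k: float = 16.0) -> float:
    # Win against score - 1, draw against score, loss against score + 1.
    opponents = [(score - 1, 1.0), (score, 0.5), (score + 1, 0.0)]
    delta = sum(k * (outcome - expected_score(rating, virtual_rating(s)))
                for s, outcome in opponents)
    return rating + delta


print(round(update_three_virtual(1500, 20), 6))  # 1500.0: rating already matches the score
print(update_three_virtual(1700, 20) < 1700)     # True: over-rated players drift down
```

With a symmetric table like this, a player's rating drifts toward the rating attached to their usual score. At 25 the "+1" player would need a score of 26, so a real implementation would have to decide how to handle the cap.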

4) Multi virtual players
At every game you play, you compete with a number of virtual players, one for each score from 18 to 25. Same ELO system as above, just with more opponents.
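Idea 4 generalizes the previous sketch to one virtual player per score from 18 to 25. The same hypothetical `virtual_rating` table is assumed; K is lowered to 8 per opponent (an arbitrary choice) so that eight comparisons do not swing the rating too far in one game.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expectation of beating an opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def virtual_rating(score: int) -> float:
    """Hypothetical: rating assigned to a virtual player who finished with `score`."""
    return 1500.0 + 50.0 * (score - 20)


def update_multi_virtual(rating: float, score: int, k: float = 8.0) -> float:
    delta = 0.0
    for virtual_score in range(18, 26):  # one virtual player per score 18..25
        if score > virtual_score:
            outcome = 1.0
        elif score == virtual_score:
            outcome = 0.5
        else:
            outcome = 0.0
        delta += k * (outcome - expected_score(rating, virtual_rating(virtual_score)))
    return rating + delta


print(update_multi_virtual(1500, 25) > 1500)  # True
print(update_multi_virtual(1500, 15) < 1500)  # True
```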

5) End-of-day confrontation
All games played during a period of 24h will be kept in memory. At the end of the 24h, tables with the same card deal (there will be a duplicate system) will be compared and you will gain/lose ELO points based on how well you did in comparison with other tables.
Issue: this might be complicated to implement on the website, so if this is your favourite idea, tell me which other idea(s) you like.
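A sketch of how the end-of-day comparison in idea 5 might be computed. It treats each pair of tables that played the same deal as one Elo "game" and uses the mean rating of a table's players as its strength; both choices, the `deal_id` field and K = 16 are assumptions, not part of the proposal.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass
class Table:
    deal_id: str        # hypothetical identifier shared by duplicate deals
    players: list[str]
    score: int


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def end_of_day(tables: list[Table], ratings: dict[str, float],
               k: float = 16.0) -> dict[str, float]:
    """Compare every pair of tables that played the same deal in the last 24 h."""
    by_deal = defaultdict(list)
    for table in tables:
        by_deal[table.deal_id].append(table)

    deltas = defaultdict(float)
    for same_deal in by_deal.values():
        for a, b in combinations(same_deal, 2):
            expected_a = expected_score(mean(ratings[p] for p in a.players),
                                        mean(ratings[p] for p in b.players))
            result_a = 1.0 if a.score > b.score else 0.5 if a.score == b.score else 0.0
            for p in a.players:
                deltas[p] += k * (result_a - expected_a)
            for p in b.players:
                deltas[p] -= k * (result_a - expected_a)
    return {p: r + deltas[p] for p, r in ratings.items()}
```

A table that was the only one to play its deal that day gets no rating change at all.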

Independent idea – team’s average
Regardless of how the ELO system works, an idea is that all players in the team gain or lose the same amount of ELO points at the end of the game as though their individual ELO ratings were equal to the average ELO rating of the team.
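The team's-average idea is independent of how the change itself is computed, so this sketch takes that calculation as a function argument (`delta_for`, a hypothetical name). The example plugs in a plain Elo win against a virtual opponent rated 1500, which is only an illustration.

```python
from statistics import mean
from typing import Callable


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def apply_team_average(ratings: dict[str, float], team: list[str],
                       delta_for: Callable[[float], float]) -> dict[str, float]:
    """Give every team member the change computed for the team's average rating."""
    team_rating = mean(ratings[p] for p in team)
    delta = delta_for(team_rating)
    return {p: r + delta if p in team else r for p, r in ratings.items()}


ratings = {"ana": 1400.0, "ben": 1600.0, "cy": 1550.0}
new = apply_team_average(ratings, ["ana", "ben"],
                         lambda r: 32.0 * (1.0 - expected_score(r, 1500.0)))
print(new)  # {'ana': 1416.0, 'ben': 1616.0, 'cy': 1550.0}
```

Both players gain 16 even though one is rated 200 points above the other, which is the point of the proposal: nobody is singled out for the team's result.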

Hope some of these ideas sound good to you. Can’t wait to read which number you like best and whatever other ideas you come up with.
Sean McCarthy
United States
Seattle
Washington
Some thoughts:

1) Striking out is probably not a good thing to punish at all, because it's generally the fault of 1-2 people that it happened, and often getting strikes isn't caused by low skill level but by differences in skill level. I would go so far as to - for the purposes of calculating the rating for matchmaking - treat struck-out games as worth only 1 less point than the number of successfully played cards.

2) Something that none of the ideas you mentioned address is that the sheer number of games played should count for something. This is partly for the traditional reason that the system should be more confident in your rating the more data it has, and when it has no data it should by default assume you are in the largest category: inexperienced players. But it's also because the more experience you have, the more you've seen how people play online and therefore how you can safely interact with them.

3) Honestly, this doesn't have to be too complicated. It's only for matchmaking, and there aren't going to be that many people who are distinguishably good at the game, so we don't exactly need a fine resolution of measurement.

4) The really hard part is going to be fixing bad matchmaking. If you initially consider me to be a new/bad player, and therefore pair me with other new/bad players in 4 player games, I'm going to get bad scores regardless of how good I am. A single player can't do much to affect the score. Lots of expert moves require at least two good players.

5) I like the target score idea OK for setting up tables, but not for calculating Elo.
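Points 1 and 2 can be combined in a simple average-based sketch. The strike-out rule follows point 1 directly; for point 2, the rating is blended with a default "newcomer" score that fades as games accumulate. `NEWCOMER_SCORE` and `PRIOR_GAMES` are hypothetical values chosen only for illustration.

```python
NEWCOMER_SCORE = 15.0  # hypothetical typical score of an inexperienced table
PRIOR_GAMES = 10       # hypothetical weight of that default, in "games"


def matchmaking_points(cards_played: int, struck_out: bool) -> int:
    """Point 1: a struck-out game counts as only 1 less than the cards played."""
    return cards_played - 1 if struck_out else cards_played


def matchmaking_rating(games: list[tuple[int, bool]]) -> float:
    """Point 2: blend the player's results with a newcomer default.

    With few games the rating stays near NEWCOMER_SCORE; as games
    accumulate, the player's own results dominate.
    """
    total = sum(matchmaking_points(c, s) for c, s in games)
    return (total + PRIOR_GAMES * NEWCOMER_SCORE) / (len(games) + PRIOR_GAMES)


print(matchmaking_rating([]))                  # 15.0
print(matchmaking_rating([(25, False)] * 10))  # 20.0
print(matchmaking_rating([(25, False)] * 90))  # 24.0
```

This does not solve point 4: a strong player stuck with weak tables will still post weak scores, and the blend only controls how quickly those scores move the rating.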
Malachi Brown
United States
Hermitage
TN
Dealing with lost games is an interesting issue. Is it better to average 22 and never get 25 or to get 25 half the time and strike out a quarter of the time?

Malachi Brown
United States
Hermitage
TN
To mitigate the "early bad games" it might be useful to age out games from a player's historical rating. e.g. calculate someone's average for the most recent X games they have played.
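A minimal sketch of the "most recent X games" average using Python's `collections.deque` with `maxlen`, which discards the oldest entry automatically. The default window of 20 is an arbitrary illustrative X.

```python
from collections import deque
from typing import Optional


class RecentAverage:
    """Average over only the most recent `window` games (X in the post)."""

    def __init__(self, window: int = 20) -> None:  # 20 is an arbitrary example X
        self.scores = deque(maxlen=window)

    def add(self, score: float) -> None:
        self.scores.append(score)  # the oldest game drops out once the window is full

    def average(self) -> Optional[float]:
        if not self.scores:
            return None
        return sum(self.scores) / len(self.scores)


recent = RecentAverage(window=3)
for s in [8, 12, 20, 22, 24]:  # two early bad games, then improvement
    recent.add(s)
print(recent.average())  # 22.0: the 8 and 12 no longer count
```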
James Rousselle
United States
Metairie
Louisiana
I would strongly recommend ELO.

The US Chess Federation has been one of the early users of the ELO system. It works great. A bell shaped distribution is typical of most populations and I don't see why Hanabi should be any different.
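For reference, this is the standard Elo update used in chess rating systems, where player A with rating R_A meets player B with rating R_B and S_A is A's actual result. The K-factor (how fast ratings move) is a tuning choice that differs between federations. Hanabi has no opponent, so R_B would have to come from one of the virtual-opponent schemes proposed in the opening post.

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad
R_A' = R_A + K\,(S_A - E_A), \qquad
S_A \in \{1, \tfrac{1}{2}, 0\}
```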
 
Pierre Beri
France
Malachi wrote:
To mitigate the "early bad games" it might be useful to age out games from a player's historical rating. e.g. calculate someone's average for the most recent X games they have played.
This is a great idea.
JGRno5 wrote:
I would strongly recommend ELO.

The US Chess Federation has been one of the early users of the ELO system. It works great. A bell shaped distribution is typical of most populations and I don't see why Hanabi should be any different.
How would you handle ELO? Any favourite idea among those suggested in the OP?
 
Andrew E
United States
This just seems fundamentally impossible to implement in a productive way.

The problem is that performance in this game is dependent on the skill of the players at the table, but it's also highly dependent on the uniformity of skill (used in the game).

You can imagine a group playing Hanabi at what I think of as baseline. Draw into this side, discard out of that side, play cards that people point at and turn the crank with only the most basic of reflection (they pointed at a 5, maybe I don't play it immediately).

This group will score consistently mediocre. They'll never strike out, but they'll pretty much never win aside from incredible luck of the draw.

If you add a skilled player to this group, they will in all likelihood cause a strike or two, either by inferring things not meant to be inferred or by giving a clue that means something other than "play this immediately" or "this is a 5". Hopefully, they'll catch on that this is a point-and-play table before they strike out, but the damage is already done and a lower score will be produced than if you had just dropped in a mediocre player to begin with. That player will then be punished (on average) in any system of automatic rating calculation I can think of, because they caused the group to score lower than it otherwise would have.

I think simply self-rating and sorting would probably be at least as useful as any kind of individual ELO system. I suppose you could let other players rate users after a game, which would also be more useful than anything I expect you could calculate.

That said, I think ELO or rating of some sort for groups or teams would be pretty cool. Then you could intentionally feed different groups the same deck and show how they compare.
Pierre Beri
France
Andrew, I agree with what you say about level discrepancies. However, remember you can disable the rating for a given game. So if you play with players you've never been with, you'd better disable it for a game or two.

The idea behind this is that players of a similar level find each other and play together, but you don't have to use it.
Even though it will never be perfect, due to how Hanabi works (randomness, plus a single player being able to ruin a game on their own), there should be something to rate people.

In the last part of your post, do you mean you like the team's average idea or do you have any other idea?
 
Andrew E
United States
The team's average in your post is still an attempt to produce a rating for an individual. I think that's misguided for reasons I mentioned above.

I'm talking about registering teams or groups, and rating those groups. Say I have a group of 7 people I regularly play with. We register a group, and games that we play among the 7 of us get recorded and rated as a group, not as individuals.

Unfortunately, this is almost exactly counter to your stated goal, of having a hidden rating for matchmaking purposes only, as this would be a public (or maybe private, but not hidden) rating for a group that doesn't assist in automatic matchmaking in the slightest, so perhaps you aren't interested.

However, I can imagine nice effects that pop out if you made those ratings public. Maybe separate groups at similar levels mingle to see what kind of "tech" the other group employs. Maybe a newer player who's only played at baseline pokes around and finds a mid-level group to hang out with.
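A rough sketch of the registered-group idea: a game counts toward a group only when everyone at the table is a member. The `Group` structure is hypothetical, and the group's rating here is just its average score; groups could equally be compared with Elo on shared deals.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """A registered group whose games are rated together (hypothetical structure)."""
    name: str
    members: frozenset[str]
    scores: list[int] = field(default_factory=list)

    def record(self, table: set[str], score: int) -> bool:
        """Count the game only if everyone at the table belongs to the group."""
        if table <= self.members:
            self.scores.append(score)
            return True
        return False

    def rating(self) -> Optional[float]:
        """Simplest possible group rating: the group's average score."""
        return sum(self.scores) / len(self.scores) if self.scores else None


lunch = Group("lunch", frozenset({"ann", "bo", "cal", "dee"}))
lunch.record({"ann", "bo", "cal"}, 21)  # counted
lunch.record({"ann", "stranger"}, 25)   # not counted: outsider at the table
print(lunch.rating())  # 21.0
```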
 
Malachi Brown
United States
Hermitage
TN
Maybe something as simple as tracking how many times a player has been on a team that scored 25 would be useful. In our lunch group, new players don't play with rainbow cards until they have at least gotten 25 once without them. Granted, getting 25 can be easy if things just fall a certain way, but getting 25 consistently usually requires some level of awareness.
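The lunch-group rule is easy to express as a count plus a gate. This sketch assumes the base game's maximum of 25 (5 colours of 5 cards); the function names are illustrative.

```python
BASE_GAME_MAX = 25  # 5 colours x 5 cards in the base game


def perfect_games(scores: list[int], max_score: int = BASE_GAME_MAX) -> int:
    """How many times the player's team reached the maximum score."""
    return sum(1 for s in scores if s == max_score)


def may_play_rainbow(base_game_scores: list[int]) -> bool:
    """The lunch-group rule: at least one 25 without rainbow cards first."""
    return perfect_games(base_game_scores) >= 1


print(perfect_games([22, 25, 19, 25]))  # 2
print(may_play_rainbow([20, 23, 24]))   # False
```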
 
Pierre Beri
France
AndrewE wrote:
I'm talking about registering teams or groups, and rating those groups.
I see your point, but I don't think we can create such a thing specifically for Hanabi on the website. We will have to stick with the
Plus, your idea would encourage playing always with the same players.

Malachi wrote:
Maybe something as simple as tracking how many times a player has been on a team that scored 25 would be useful. In our lunch group, new players don't play with rainbow cards until they have at least gotten 25 once without them. Granted, getting 25 can be easy if things just fall a certain way, but getting 25 consistently usually requires some level of awareness.
How often a player hits 25 is pretty much the same as what his average score or ELO rating is. A player scoring 24 at every game is better than a player scoring 25-19-25-19, etc.
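The two sequences in this post come out in opposite order depending on the statistic, which is what the following replies go on to debate. A quick check:

```python
from statistics import mean


def summarize(scores: list[int]) -> tuple[float, float]:
    """Return (average score, fraction of games that reached 25)."""
    return float(mean(scores)), sum(s == 25 for s in scores) / len(scores)


print(summarize([24, 24, 24, 24]))  # (24.0, 0.0)
print(summarize([25, 19, 25, 19]))  # (22.0, 0.5)
```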
 
Malachi Brown
United States
Hermitage
TN
beri2 wrote:
How often a player hits 25 is pretty much the same as what his average score or ELO rating is. A player scoring 24 at every game is better than a player scoring 25-19-25-19, etc.

That is debatable. I know people who will quit playing as soon as they realize that they cannot score 25. For those players, strikeout-25-strikeout-strikeout is a better record than 24-24-24-24.
 
Pierre Beri
France
Malachi wrote:
beri2 wrote:
How often a player hits 25 is pretty much the same as what his average score or ELO rating is. A player scoring 24 at every game is better than a player scoring 25-19-25-19, etc.

That is debatable. I know people who will quit playing as soon as they realize that they cannot score 25. For those players, strikeout-25-strikeout-strikeout is a better record than 24-24-24-24.
I believe this is 1-2% of Hanabi players. Anyway, whatever the rating system, strikeouts will be bad marks in their record, so they may prefer playing off-competition.
Plus, with the duplicate system, they had better get as high a score as possible in order to compare with other tables that had the same deal. Just for the fun of it.
 
Malachi Brown
United States
Hermitage
TN
beri2 wrote:
Malachi wrote:
beri2 wrote:
How often a player hits 25 is pretty much the same as what his average score or ELO rating is. A player scoring 24 at every game is better than a player scoring 25-19-25-19, etc.

That is debatable. I know people who will quit playing as soon as they realize that they cannot score 25. For those players, strikeout-25-strikeout-strikeout is a better record than 24-24-24-24.
I believe this is 1-2% of Hanabi players. Anyway, whatever the rating system, strikeouts will be bad marks in their record, so they may prefer playing off-competition.
Plus, with the duplicate system, they had better get as high a score as possible in order to compare with other tables that had the same deal. Just for the fun of it.

To belabor the point from a different direction, I would prefer to play with someone who gets 25-19-25-19 over one who gets 24-24-24-24. It's easy to get into the 20s; it is much more difficult to eke out the last few cards to get to 25.
 
Geekdo, BoardGameGeek, the Geekdo logo, and the BoardGameGeek logo are trademarks of BoardGameGeek, LLC.