

This post provides an updated list of my previous rankings based on the pairwise-comparison Bradley-Terry-Luce (BTL) model (see http://www.boardgamegeek.com/thread/221606).
Technique Summary
The BTL model is designed for paired-comparison data. This is ideal for the rating data most people provide because, regardless of the rating guidelines, individuals tend to use different underlying scales. Consequently, the only viable assumption is that each individual's rating data are ordinal (see here if you need a refresher on data types: http://faculty.chass.ncsu.edu/garson/PA765/datalevl.htm).
The model itself is based on fitting a logit model to the paired evaluations, providing a parameter estimate for each game. At its most basic level, the only thing that matters when comparing two games is the proportion of times one game is preferred over the other. The current framework introduces an additional parameter, P, that weights the number of times game A was rated when game B was not; this value can range from 0 to 1. For example, when P = .1, game A gets a "preference point" for every 10 times it was rated and game B was not; when P = 1, game A gets a point every time it is rated when game B is not.
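For concreteness, here is a minimal sketch of the likelihood described above. This is my own illustration, not the analysis code: the data layout (`wins`, `solo`) and function names are assumptions, and the exact way P enters the author's fit may differ.

```python
import numpy as np
from scipy.optimize import minimize

def btl_neg_log_likelihood(theta, wins, solo, P):
    """Negative log-likelihood of the BTL logit model with the extra P weight.

    wins[i][j] -- times game i was rated above game j by the same user.
    solo[i][j] -- times game i was rated when game j was not rated at all;
                  P in [0, 1] converts these into fractional preference points.
    """
    nll = 0.0
    n = len(theta)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = wins[i][j] + P * solo[i][j]  # direct wins plus weighted solo ratings
            if w > 0:
                # log Pr(i preferred to j) = theta_i - log(exp(theta_i) + exp(theta_j))
                nll -= w * (theta[i] - np.logaddexp(theta[i], theta[j]))
    return nll

def fit_btl(wins, solo, P):
    """Maximum likelihood estimate of one strength parameter per game."""
    n = len(wins)
    res = minimize(btl_neg_log_likelihood, np.zeros(n), args=(wins, solo, P))
    return res.x  # ranking = games sorted by descending strength
```

Ranking the games then amounts to sorting the fitted strengths. Only differences between strengths are identified, which is exactly why only the proportion of preferences between two games matters.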
Advantages of BTL as compared to Bayesian Averaging
Problem: The current Bayesian averaging system is inappropriate for the type of data generated by individual users: it treats the data as interval-level when they are actually ordinal-level. BTL Solution: The BTL model matches the measurement level of the data.
Problem: The Bayesian averaging system uses an arbitrary number of "dummy votes" at 5.5 to correct for the possibility of overweighting games with high ratings but relatively few votes. The problem with this approach is that the number of dummy votes is an arbitrary heuristic, regardless of how well it is thought out. BTL Solution: Clearly, the dummy votes serve a role similar to P in the current modeling framework. Note, however, that the number of dummy votes is unbounded; it could be set to infinity, and all average ratings would converge to 5.5. In the BTL framework, by contrast, P is bounded below by zero and above by one. Since P lies on a continuous interval, we can integrate it out and compute the expected ranking of each game across the entire interval. In this situation, there is no researcher decision that biases the ranking in any direction, just the calculus required to integrate the likelihood function.
Problem: The current system conflates the ideas of ranking and rating: average ratings are computed, and the rankings are then derived by ordering those ratings. BTL Solution: The model does not provide average ratings; instead, it provides a full ranking of all games analyzed. Additionally, for any two games, one can compute the probability that one game is preferred over the other.
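As a sketch of that last point: under the BTL model, the probability that one game is preferred over another is just a logistic function of the difference in fitted strength parameters (any concrete parameter values would be hypothetical).

```python
import math

def pref_prob(theta_a, theta_b):
    """Pr(game A preferred over game B) under a fitted BTL model."""
    return 1.0 / (1.0 + math.exp(theta_b - theta_a))
```

Two games with equal strengths come out at exactly 0.5, as they should.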
Data Stuff
The data were downloaded in early October and consist of all games that had more than 100 individual ratings at the time, resulting in 2,264 games being included in the analysis. The ranking is based on fitting the BTL model via maximum likelihood estimation (see http://en.wikipedia.org/wiki/Maximum_likelihood).
RESULTS: TOP 100
1. (Puerto Rico) 2. (Power Grid) 3. (Tigris & Euphrates) 4. (El Grande) 5. (Caylus) 6. (Settlers of Catan, The) 7. (Ra) 8. (Princes of Florence, The) 9. (Ticket to Ride) 10. (Carcassonne) 11. (Agricola) 12. (San Juan) 13. (Lost Cities) 14. (Memoir '44) 15. (Ticket to Ride: Europe) 16. (Citadels) 17. (Samurai) 18. (Through the Desert) 19. (Acquire) 20. (Bohnanza) 21. (Race for the Galaxy) 22. (Goa) 23. (Modern Art) 24. (RoboRally) 25. (Tikal) 26. (Ingenious) 27. (Thurn and Taxis) 28. (Alhambra) 29. (Arkham Horror) 30. (BattleLore) 31. (Shadows over Camelot) 32. (War of the Ring) 33. (Twilight Struggle) 34. (Blokus) 35. (Amun-Re) 36. (Railroad Tycoon) 37. (Saint Petersburg) 38. (Magic: The Gathering CCG) 39. (Taj Mahal) 40. (Carcassonne: Hunters and Gatherers) 41. (Lord of the Rings) 42. (Pillars of the Earth, The) 43. (For Sale) 44. (Age of Steam) 45. (Game of Thrones, A) 46. (Twilight Imperium 3rd Edition) 47. (Pandemic) 48. (Coloretto) 49. (Notre Dame) 50. (Attika) 51. (Torres) 52. (Chess) 53. (Bang!) 54. (Settlers of Catan Card Game, The) 55. (Go) 56. (Shogun) 57. (Lord of the Rings: The Confrontation) 58. (Hive) 59. (Battle Line) 60. (Formula Dé) 61. (Descent: Journeys in the Dark) 62. (Diplomacy) 63. (Hey! That's My Fish!) 64. (Carcassonne: The Castle) 65. (HeroScape Master Set: Rise of the Valkyrie) 66. (Blue Moon City) 67. (Category 5) 68. (Ticket to Ride: Märklin Edition) 69. (Yspahan) 70. (Civilization) 71. (Scrabble) 72. (Louis XIV) 73. (Traders of Genoa, The) 74. (Imperial) 75. (Mr. Jack) 76. (Age of Empires III: The Age of Discovery) 77. (Colossal Arena) 78. (Liar's Dice) 79. (Apples to Apples) 80. (Die Macher) 81. (Hollywood Blockbuster) 82. (Medici) 83. (La Città) 84. (Zooloretto) 85. (Tichu) 86. (Nexus Ops) 87. (Wallenstein) 88. (Fury of Dracula) 89. (No Thanks!) 90. (Reef Encounter) 91. (Cosmic Encounter) 92. (Commands & Colors: Ancients) 93. (Elfenland) 94. (1960: The Making of the President) 95. (Jambo) 96. (Vinci) 97. (Maharaja: Palace Building in India) 98. (Union Pacific) 99. (Vegas Showdown) 100. (Can't Stop)
Here we see that Agricola fell from the BGG ranking of 1 to a ranking of 11 under BTL. The following graph helps explain this.
Agricola is ranked 2 when P = 0, moves to 3 at P = .05, and continues to descend from there. The truth is that the current BGG ranking uses a narrow range of dummy votes that results in Agricola being ranked number one; however, that choice is completely arbitrary. In the current analysis, I use the expected value because it:
a. Accounts for an average across all feasible constructions of the popularity and availability of games that would impact people's preferences.
b. It is not arbitrary; it is well-defined and mathematically defensible.
c. If one really wanted to define the rankings at a particular popularity level, that is entirely possible as well. For instance, while Agricola is ranked 11th on average, it is ranked 11th or better only for roughly P < .15; thus, it is possible for a small portion of the range (in this case 15%) to outweigh the remaining portion.
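The expected-value idea can be approximated numerically rather than with the exact integral: re-rank the games at a grid of P values and average each game's rank. In the sketch below, `fit_and_rank` is a hypothetical callable returning each game's rank at a given P; the actual calculation integrates the likelihood analytically.

```python
import numpy as np

def expected_ranks(fit_and_rank, p_lo=0.0, p_hi=1.0, steps=21):
    """Approximate each game's expected rank as P varies uniformly on [p_lo, p_hi].

    fit_and_rank(p) -- returns a sequence of ranks, one per game, at that P.
    """
    grid = np.linspace(p_lo, p_hi, steps)
    ranks = np.array([fit_and_rank(p) for p in grid])  # shape: (steps, n_games)
    return ranks.mean(axis=0)  # average rank per game across the interval
```

Restricting [p_lo, p_hi] to, say, [0, .5] corresponds to ranking over only part of the popularity range, as in the "0 < P < .5" list.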
Here are the Top 20 based on P = 0:
1. (Conflict of Heroes: Awakening the Bear! Russia 1941-1942) 2. (Agricola) 3. (Napoleon's Triumph) 4. (Twilight Struggle) 5. (EastFront II) 6. (Power Grid) 7. (Grant Takes Command) 8. (Through the Ages: A Story of Civilization) 9. (Warriors of God) 10. (Brass) 11. (Puerto Rico) 12. (Dominion) 13. (Paths of Glory) 14. (Asia Engulfed) 15. (The Devil's Cauldron: The Battles for Arnhem and Nijmegen) 16. (Advanced Squad Leader (ASL) Starter Kit #3) 17. (DAK2) 18. (Die Macher) 19. (El Grande) 20. (Tigris & Euphrates)
As expected, this list contains games with fewer (but higher) ratings and many more wargames. This type of ranking could be done for any value of P; however, the expected value seems the most "fair" for a general ranking of the games.
EDIT: More rankings (200 and on) and more graphs can be included as requested, or more lists at different levels of popularity. All in all, it is a very flexible modeling system.
EDIT: Added a list for 0 < P < .5. This edit was put in to illustrate some of the concerns about using the full range of P. It uses the lower half of the range; here Agricola only falls to 7, instead of 11. In short, it shows that the top half of the range of P does not have too dramatic an effect on the overall outcome.
1. (Puerto Rico) 2. (Power Grid) 3. (Tigris & Euphrates) 4. (El Grande) 5. (Caylus) 6. (Princes of Florence, The) 7. (Agricola) 8. (Ra) 9. (Settlers of Catan, The) 10. (Ticket to Ride) 11. (Race for the Galaxy) 12. (San Juan) 13. (Goa) 14. (Memoir '44) 15. (Carcassonne) 16. (Ticket to Ride: Europe) 17. (Samurai) 18. (Twilight Struggle) 19. (Acquire) 20. (Through the Desert) 21. (Modern Art) 22. (Lost Cities) 23. (War of the Ring) 24. (Tikal) 25. (BattleLore) 26. (Ingenious) 27. (Citadels) 28. (Railroad Tycoon) 29. (Amun-Re) 30. (Age of Steam) 31. (Bohnanza) 32. (Thurn and Taxis) 33. (Taj Mahal) 34. (RoboRally) 35. (Arkham Horror) 36. (Pandemic) 37. (Blokus) 38. (Pillars of the Earth, The) 39. (Shadows over Camelot) 40. (Shogun) 41. (Saint Petersburg) 42. (Twilight Imperium 3rd Edition) 43. (Notre Dame) 44. (Alhambra) 45. (Game of Thrones, A) 46. (For Sale) 47. (Magic: The Gathering CCG) 48. (Carcassonne: Hunters and Gatherers) 49. (Battle Line) 50. (Torres) 51. (Go) 52. (Lord of the Rings: The Confrontation) 53. (Die Macher) 54. (Lord of the Rings) 55. (Ticket to Ride: Märklin Edition) 56. (Attika) 57. (Imperial) 58. (Descent: Journeys in the Dark) 59. (Hive) 60. (Age of Empires III: The Age of Discovery) 61. (Coloretto) 62. (Yspahan) 63. (Blue Moon City) 64. (Traders of Genoa, The) 65. (Wallenstein) 66. (Mr. Jack) 67. (Civilization) 68. (Carcassonne: The Castle) 69. (Louis XIV) 70. (Tichu) 71. (Hollywood Blockbuster) 72. (Commands & Colors: Ancients) 73. (1960: The Making of the President) 74. (La Città) 75. (Medici) 76. (Settlers of Catan Card Game, The) 77. (Reef Encounter) 78. (Hey! That's My Fish!) 79. (Formula Dé) 80. (HeroScape Master Set: Rise of the Valkyrie) 81. (Diplomacy) 82. (Fury of Dracula) 83. (Chess) 84. (Union Pacific) 85. (Maharaja: Palace Building in India) 86. (Liar's Dice) 87. (Colossal Arena) 88. (Vinci) 89. (Nexus Ops) 90. (Bang!) 91. (Category 5) 92. (Thebes) 93. (Vegas Showdown) 94. (In the Year of the Dragon) 95. (Zooloretto) 96. (Stone Age) 97. (YINSH) 98. (Jambo) 99. (Santiago) 100. (PitchCar)

Steve Duff
Canada Ottawa Ontario

That first list looks way better than the official one.

T. Nomad
Netherlands Den Bosch

I don't understand much of it, but I suspect you meant to say BTL model.



tommynomad wrote: I don't understand much of it, but I suspect you meant to say BTL model.
Thanks for catching that; have a Geek Nickel. It is a wonder I have ever had anything published.

Hunga Dunga
Canada Coquitlam British Columbia

I think wargames are awesome. Don't you?

Chris Ferejohn
United States Mountain View California
Pitying fools as hard as I can...

So if I can attempt to explain what I *think* you are saying (and please do correct me if I am wrong), this model corrects for the fact that what is an "8" to one person doesn't necessarily mean the same thing as an "8" to someone else. So if someone ranks a lot of games quite low, then their high rankings carry more weight.
On the other hand, if someone is a giant game whore, like oh, say, me (average rating of just under 7), my high rankings won't mean as much while my low rankings will be relatively damning (take that On the Underground!).
Is that roughly what this is trying to accomplish translated into "See Dick. See Dick do statistical analysis. Analyze Dick, analyze!"?



cferejohn wrote: So if I can attempt to explain what I *think* you are saying (and please do correct me if I am wrong), this model corrects for the fact that what is an "8" to one person doesn't necessarily mean the same thing as an "8" to someone else. So if someone ranks a lot of games quite low, then their high rankings carry more weight. On the other hand, if someone is a giant game whore, like oh, say, me (average rating of just under 7), my high rankings won't mean as much while my low rankings will be relatively damning (take that On the Underground!). Is that roughly what this is trying to accomplish translated into "See Dick. See Dick do statistical analysis. Analyze Dick, analyze!"?
Not quite. The only thing that matters is a person's ordering of games. So, if I rate Descent 8.5 and Monopoly 6, then it is clear that I prefer Descent > Monopoly. If you rate Descent 10 and Monopoly 1, then you too prefer Descent > Monopoly. Those two preferences are given equal weight, because my 8.5 might be the same as your 10 and your 1 might be the same as my 6.
So all that matters is the order of preferences, like:
User #1: Descent > Agricola > Monopoly > Dominion
User #2: Agricola > Dominion > Descent > Monopoly
User #3: Dominion > Descent > Monopoly > Agricola
This is really the only consistent information we can extract from the ratings data. These preference orders are then amalgamated through the modeling process.
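A minimal sketch of that amalgamation step (the `user -> {game: rating}` layout here is my assumption, not the actual data format): each user's ratings are reduced to pairwise preference counts, and tied ratings contribute nothing for that pair.

```python
from collections import Counter
from itertools import combinations

def extract_preferences(user_ratings):
    """user_ratings: dict mapping user -> {game: rating}.

    Returns a Counter of (preferred_game, other_game) -> count.
    """
    prefs = Counter()
    for ratings in user_ratings.values():
        for a, b in combinations(ratings, 2):
            if ratings[a] > ratings[b]:
                prefs[(a, b)] += 1
            elif ratings[b] > ratings[a]:
                prefs[(b, a)] += 1
            # equal ratings: no information about this pair
    return prefs
```

These pairwise counts are then what the BTL fit consumes; the raw rating values themselves never enter the model.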

"L'état, c'est moi."
Canada Vancouver BC
Roger's Reviews: check out my reviews page, right here on BGG!
Caution: May contain wargame like substance

Hungadunga wrote: I think wargames are awesome. Don't you?
Yeah baby.

"L'état, c'est moi."
Canada Vancouver BC
Roger's Reviews: check out my reviews page, right here on BGG!
Caution: May contain wargame like substance

steinley wrote: Not quite. The only thing that matters is a person's ordering of games. So, if I rate Descent 8.5 and Monopoly 6, then it is clear that I prefer Descent > Monopoly. If you rate Descent 10 and Monopoly 1, then you too prefer Descent > Monopoly. Those two preferences are given equal weight, because my 8.5 might be the same as your 10 and your 1 might be the same as my 6.
So all that matters is the order of preferences  like:
User #1: Descent > Agricola > Monopoly > Dominion
User #2: Agricola > Dominion > Descent > Monopoly
User #3: Dominion > Descent > Monopoly > Agricola
This is really the only consistent information we can extract from the ratings data. These preference orders are then amalgamated through the modeling process.
Nifty!
I have questions... (always with the questions)
How does the method account for equivalent preferences within a user?
For instance, let's say this example:
User #1: Descent > Agricola = Monopoly > Dominion
User #2: Agricola = Dominion > Descent > Monopoly
User #3: Dominion > Descent > Monopoly = Agricola
I'm just curious because looking at my own collection, I have about 200 items and they're rated more or less according to a standard distribution. I have a lot of games rated a 7 for instance, so how would those preferences wash out since without asking me which of my 7's I like best I don't see how an algorithm would know that I preferred GameA to GameB.
Would the sheer number of equivalences smooth that out, or would secondary preferences be considered?
Finally, with this kind of methodology, how likely is it for a game to have a higher ranking than a game with a higher rating?

Tom Chappelear
United States Kensington California

What's striking is how the older, more "classic" Euros have risen to the top. It looks like BGG when I first signed up.
Any guesses as to why this might be?

(The Artist formerly known as) Arnest R
Germany Munich Bavaria
Keep calm and carry on...

Would the results be significantly different if one took into account more than the order of preferences, by weighting the differences by the rating deltas?
E.g. if I rate
Power Grid a 9
Chizo Rising an 8
Funny Rabbits a 3
I'd have something like
Power Grid > Chizo Rising >>>>> Funny Rabbits
Any way to integrate this into your model (sorry, have not looked your links up yet)? I am not suggesting using absolute differences, but giving bigger differences more weight when evaluating the probability of liking one game over another. That information seems to be present in individual user ratings...
Just curious

Peter
Canada Vancouver British Columbia
Game Artisans of Canada Member
My Best Bud Parker 20042016

Note to self: Must remember to order personal game preferences in anticipation of massive website ranking overhaul.

Ryan Strand
United States Hollywood Florida

I don't want to learn the process, but I do know that your top 100 list makes much more sense to me than the actual one. So, I guess I support this...I think.

Colin Hunter
New Zealand Auckland
Stop the admins removing history from the Wargaming forum.

Awesome, thanks for putting this together!

Tomello Visello
United States Reston Virginia

strandiam wrote: I don't want to learn the process, but I do know that your top 100 list makes much more sense to me than the actual one. So, I guess I support this...I think. I am supportive of your statement. And yet I am also discomforted by my own support when I can re-summarize it as, "I feel that I like this outcome much better, so surely this must be a better underlying system."

John Earles
Canada Toronto Ontario

Interesting. So your goal was to run the BTL model at P, where P was the value of Maximum Likelihood to push Agricola out of the Top 10?
Explain to me again how a game with a raw User Rating of 7.61 (albeit with 9700 ratings) is a better game than a game with a raw User Rating of 8.47 (with only 5215 ratings)?
If your value of P is able to remove .86 of User Preference simply because game B has 4,485 more votes, it seems to me that you are rating longevity too highly over popularity.



tomchaps wrote: What's striking is how the older, more "classic" Euros have risen to the top. It looks like BGG when I first signed up.
Any guesses as to why this might be?
Not only guesses:
The method prefers games that have more votes (at least when P > 0, as it is for the "main" list above), since games with more votes get "bonus points". If, for example, P = 0.1, then Puerto Rico would gain roughly 8,000 * 0.1 = 800 more "bonus points" than Agricola, as it has around 8,000 more votes (I obviously picked one of the most extreme examples). Unfortunately, I can't say at first glance how much influence these "bonus points" have on the overall rating, but I can certainly say that this is the reason why older games, which have had a longer time to collect votes, are positioned higher in this list than newer games.



leroy43 wrote: Nifty!
I have questions... (always with the questions)
How does the method account for equivalent preferences within a user?
For instance, let's say this example:
User #1: Descent > Agricola = Monopoly > Dominion
User #2: Agricola = Dominion > Descent > Monopoly
User #3: Dominion > Descent > Monopoly = Agricola
I'm just curious because looking at my own collection, I have about 200 items and they're rated more or less according to a standard distribution. I have a lot of games rated a 7 for instance, so how would those preferences wash out since without asking me which of my 7's I like best I don't see how an algorithm would know that I preferred GameA to GameB.
Would the sheer number of equivalences smooth that out, or would secondary preferences be considered?
Finally, with this kind of methodology how likely is it to have a game to have a higher ranking than a game with a higher rating?
If you have given two games the same rating, then your ratings don't influence the relation between those two games in this algorithm. Their relations to your other games with different ratings are still taken into account.
It is very likely for a game to have a higher ranking despite having a lower rating, as long as it has more votes (at least when P > 0).



And I've a question too: which value of P is used in the list above (you know, the one with the bold TOP 100)? I couldn't find it anywhere in your post.



And another question (I'll stop after this, before it looks as if I'm spamming this thread):
While I really like the approach, as I think it can provide better results than the one currently used by BGG, I still have some criticism. For example, although I have now looked at the other posting, I still have no idea how the bonus provided by P comes into play or how big its influence is.
Basically, the bonus favors older games over newer ones (generally speaking; to be precise: games with more votes over games with fewer votes). But you have yourself used an easy method to keep games with too few votes out of the ranking: you simply ignored every game with fewer than 100 votes. I agree with that as well, but then I don't see any further necessity for giving a bonus to older games. And even if you do use a bonus for older games, I have the feeling that it is too big.
Take the often-cited Agricola: in your older posting you provided a percentage matrix showing, for each pair of games (of the top 10), what percentage of players prefer one game over the other. I'd like to see that for the current top 10 as well, and take it as a basis for deciding how to choose P.

Tony Ackroyd
United Kingdom Brighton E Sussex

Looks good to me. A much more meaningful, populist (for boardgame fans) Top 100 list. Far less niche than the current list.
But what would we do without average ratings? I'd want to keep those as well (but not use them for the rankings). But take the Bayesian dummy votes out?

Doug Faust
United States Malverne New York

Great analysis!
steinley wrote: Data Stuff The data were downloaded in early October and consist of all games that had more than 100 individual ratings at the time, resulting in 2,264 games being included in the analysis.
One question: how'd you collect the data? I tried to collect a bunch of data semi-manually using Stats Analyzer a while back, but my hands started hurting after the first couple hundred games...

Aaron Cinzori
United States Holland Michigan

Phrim wrote: Great analysis! steinley wrote: Data Stuff The data were downloaded in early October and consist of all games that had more than 100 individual ratings at the time, resulting in 2,264 games being included in the analysis.
One question: how'd you collect the data? I tried to collect a bunch of data semimanually using Stats Analyzer a while back, but my hands started hurting after the first couple hundred games...
You might be looking for this: http://www.boardgamegeek.com/xmlapi/



Threepy wrote: And I've a question too: Which value for P is used in the list above (you know, the one with bold TOP 100 above )? Couldn't find it anywhere in your posting.
That is part of the innovation, in some sense: it finds the expected ranking across the range 0 < P < 1. So, if P were chosen randomly from this interval, the Top 100 is what you would expect to get on average.
Of course, I could change the range and get a different ranking.



Threepy wrote: And another question (I'll stop after this, before it looks as if I'm spamming this thread): While I really like the approach, as I think it can provide better results than the one currently used by BGG, I still have some criticism. For example, although I have now looked at the other posting, I still have no idea how the bonus provided by P comes into play or how big its influence is. Basically, the bonus favors older games over newer ones (generally speaking; to be precise: games with more votes over games with fewer votes). But you have yourself used an easy method to keep games with too few votes out of the ranking: you simply ignored every game with fewer than 100 votes. I agree with that as well, but then I don't see any further necessity for giving a bonus to older games. And even if you do use a bonus for older games, I have the feeling that it is too big. Take the often-cited Agricola: in your older posting you provided a percentage matrix showing, for each pair of games (of the top 10), what percentage of players prefer one game over the other. I'd like to see that for the current top 10 as well, and take it as a basis for deciding how to choose P.
I will provide a long, edited answer to this; just give me a few hours, as I have some meetings to attend.


