Mining the Geek

Discussion about methods for gathering data from BGG, ideas of what you might look for, and interesting results found. There's an amazing wealth of data here on BGG, and a lot of really interesting ways to look at it.

BGG's most influential reviewers

Alex Wilson
Canada
Waterloo
Ontario
What reviewers do people check out? Who has a good following? We've all seen a lot of the popular reviewers' work, but how do they compare?

We start with a list of the top 7500 games by BGG rank, and for this purpose filter it down to the games that are listed as released in 2010 or 2011, just so we are looking at more recent stuff.

For each game, get a list of videos for it identified with the category of "review".

For each video review, get a list of all the users who thumbed or tipped the video post itself. The author-user pairing gets a point for each thumb, and tips get turned into points at an exchange rate of 1 GG = 10 points.

Also for each game, get a list of all the text reviews, by checking the game-specific review forum.

For each text review, get a list of all the users who thumbed or tipped it. The author-user pairing gets points as above. Text reviews can be thumbed/tipped both in the "header" and the "body" of the post, so we look at and combine both of those.

Now we have a list of authors (combined from the text and video reviews), and for each a list of all the people who thumbed/tipped them along with a score based on the number/amount of thumbs/tips totaled from all their reviews. We filter out those with a score of less than 5 (either 5 thumbs, 0.50 GG in tips, or some combination thereof) -- so what we have left is a good approximation of the text and video reviewers along with their "followers", where a follower has shown via thumbs/tips that they might pay at least some attention to what the author posts when they see their name. The threshold of 5 is pretty low, but should filter out the random/occasional thumb or tip associations.
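The scoring and follower threshold described above can be sketched roughly as follows. This is a minimal Python illustration, not the script actually used for the post; the `follower_counts` function and the `(author, user, thumbs, gg_tipped)` tuple layout are assumptions made for the example.

```python
from collections import defaultdict

POINTS_PER_THUMB = 1
POINTS_PER_GG = 10       # exchange rate from the post: 1 GG = 10 points
FOLLOWER_THRESHOLD = 5   # minimum combined score to count as a "follower"


def follower_counts(events):
    """Count followers per author.

    events: iterable of (author, user, thumbs, gg_tipped) tuples, one per
    review (hypothetical layout; text and video reviews are simply mixed).
    """
    # Total the points for every author-user pairing across all reviews.
    scores = defaultdict(float)
    for author, user, thumbs, gg_tipped in events:
        scores[(author, user)] += thumbs * POINTS_PER_THUMB + gg_tipped * POINTS_PER_GG

    # Keep only the pairings that reach the threshold.
    followers = defaultdict(set)
    for (author, user), score in scores.items():
        if round(score, 6) >= FOLLOWER_THRESHOLD:
            followers[author].add(user)
    return {author: len(users) for author, users in followers.items()}
```

With this sketch, a user who thumbs five different reviews by the same author counts as a follower, and so does one who tips a single review 0.50 GG.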

Here's a list of the top 25 reviewers for games released 2010-2011, with the number of "followers" (based on the above criterion):

Drakkenstrike (1205)
TomVasel (870)
EndersGame (471)
marnaudo (341)
UndeadViking (321)
doubtofbuddha (159)
JohnBandettini (132)
leroy43 (119)
eekamouse (95)
thinwhiteduke (93)
snicholson (80)
superflypete (78)
slaqr (65)
idlemichael (46)
jamesmckane (44)
toerck (41)
ckirkman (30)
Synnical77 (30)
daveboy (30)
thepackrat (29)
spacedogg (29)
Aurendrosl (25)
Neil Thomson (24)
Uncle G (21)
gittes (21)

Remember that these only go by thumbs/tips -- there are lots of unregistered visitors, users who aren't logged in, and people who read/watch but don't thumb. Still, I think this list is a good proxy for how all those other viewers would rank the reviewers too.

We can try to put the data to use for, say, a publisher trying to figure out who they should get a new game to. If you are interested in "coverage" -- what subset of reviewers reaches the most people -- we can calculate that. Using a greedy algorithm, we loop through the reviewers, each time adding the one who brings the most followers not already included (a greedy algorithm is not without weaknesses, but it's a decently good/fast way to calculate coverage in this case).
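A greedy coverage pass of this kind might look like the Python sketch below. The `greedy_coverage` name and the dict-of-sets input are assumptions for illustration, and the alphabetical tie-break is my own choice rather than anything stated in the post.

```python
def greedy_coverage(followers, limit):
    """Greedily pick reviewers by how many new followers each one adds.

    followers: dict mapping reviewer name -> set of follower usernames
    limit: maximum number of reviewers to pick
    Returns a list of (reviewer, new_followers_added) in pick order.
    """
    covered = set()
    remaining = dict(followers)
    picks = []
    while remaining and len(picks) < limit:
        # Sorting first makes ties resolve alphabetically (an assumption).
        best = max(sorted(remaining), key=lambda r: len(remaining[r] - covered))
        gain = len(remaining[best] - covered)
        if gain == 0:
            break  # nobody left adds any new followers
        picks.append((best, gain))
        covered |= remaining.pop(best)
    return picks
```

The first pick always keeps its full follower count, and every later pick is only credited with followers nobody above it already reaches, which is why the numbers fall off quickly when audiences overlap.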

So here's the top 12 reviewers based on the same data, sorted by the number of new followers each would add (based on all reviewers' followers listed above them being included already):

Drakkenstrike (1205)
TomVasel (317)
EndersGame (224)
marnaudo (117)
doubtofbuddha (53)
UndeadViking (38)
leroy43 (37)
JohnBandettini (33)
thinwhiteduke (24)
toerck (13)
daveboy (12)
superflypete (10)

There is a lot of overlap, so the numbers do drop down quickly. It's not a definitive list, since only the BGG folks know the real post views (which would include unregistered users or those not logged-in), and direct video views on YouTube and such, but it does show popularity combined with diversity of audience. It should be a rather good approximation of which people you would want to get your games in the hands of, as a game designer/publisher.

Hope you found that interesting and/or helpful! If you are a publisher or designer and the information was useful, I like games too
Mon Dec 19, 2011 6:53 pm

How current/active are BGG users?

Time for just a few more neat numbers and statistics found in the BGG database!

I'm starting with a fresh run collecting all the ratings for the top 7500 games by BGG rank. Let's see what numbers we can find!

Number of different users who have rated 1 or more games in the top 7500:
72,484
(There are about a dozen users that I didn't collect info on since their usernames have ampersands or slashes or the like -- not enough to skew the results in any way, but they are not counted in the above total)

Number of those users who logged in at some point during 2011:
52,739

Nearly 73%. Wow! I was a little surprised by this one -- that seems like a really high proportion of active users.

Number of ratings entered for the top 7500 games:
3,369,038

Number of ratings entered by a user who logged in during 2011:
3,104,817

More than 92%. Holy crap! That means less than 8% of the total ratings come from "inactive" (or at least, non-current) users. That number really surprised me; I was expecting lots more ratings on older games to have been made by now-inactive users. This shows that the active users are the ones doing the rating, which is great, since it means the vast majority of ratings you look at come from users who are still current.

I was going to try to pull up some differences between the active 2011 users and set of all users, but the two are so close I'm not sure there's that big a difference in anything to find. But I'm happy knowing that most of the ratings I look at are made by people who still visit the site.
Mon Dec 12, 2011 6:22 pm

Games with bogus/stacked/fake ratings

I was going to post a list of games with the highest percentage of their ratings being a user's top rating, but found the list rather contaminated. This calculation was really simple: Go through each user's ratings (in this case, all the ratings of the top 7500 games), and if the game was the user's top rating given, then give that game a point. In the case of a user having more than one game with the same top score, each game gets a fraction. The points awarded are divided by the number of total ratings (of any value), so you get an idea of which games have the higher proportion of getting someone's top score.
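As a rough Python sketch of that calculation (the `top_rating_share` function and the `(user, game, rating)` tuple layout are assumptions for the example, not the script used here):

```python
from collections import defaultdict


def top_rating_share(ratings):
    """Share of each game's ratings that are the rater's top rating.

    ratings: iterable of (user, game, rating) tuples (hypothetical layout).
    Returns a dict mapping game -> points / total number of ratings.
    """
    by_user = defaultdict(list)
    rating_count = defaultdict(int)
    for user, game, rating in ratings:
        by_user[user].append((game, rating))
        rating_count[game] += 1

    points = defaultdict(float)
    for rated in by_user.values():
        top = max(r for _, r in rated)
        winners = [g for g, r in rated if r == top]
        # A tie for the top score splits the single point evenly.
        for g in winners:
            points[g] += 1.0 / len(winners)

    return {g: points[g] / n for g, n in rating_count.items()}
```

Note that a user with only one rated game always gives that game a full point, which is how a handful of single-rating accounts can push a game to the top of the list.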

First run of the data, and the number one game is:
90°
Have a look at the ratings, and I think it's pretty obvious things are off -- lots of users giving it a 10, and if you check their profiles you find they only have that one game rated.

Next on the list is:
Good Help
There's a few users rating it 10 that seem much more legitimate, but still a number of single-rating users.

Number 3:
N/A
At least they seemed to put effort into commenting the ratings, but there's still a bunch of single-rating users -- comments made in clumps on the same day seem fishy, too.

Sigh... okay, let's filter the ratings set so it only contains users with 10 or more ratings and see what we get. The new number one:

My Little Pony Hide & Seek

Okay, at least this one makes sense given the pony shenanigans -- a lot of users with real rating sets.

Next:
War of the Ring Collector's Edition
This makes sense.

Third:
Puerto Rico: Limited Anniversary Edition
Okay.

Fourth:
Champions of the Galaxy
Huh? It's got some suspect single ratings, but 19 of them get filtered out with the 10 minimum. But it's got 3 people with large sets that rate it their sole 10, and 7 others who rate it highest along with others. Alrighty.

Fifth:
Strike of the Eagle
Cool. This is the sort of thing I was hoping I would see -- things I haven't heard of, but looking at the ratings and reviews, seem genuinely loved.

Sixth:
RPGQuest: Greek Mythology
Another potential problem. All 4 of the RPGQuest items are ranked really high in this list. There's a number of users that get filtered out, but still enough with other ratings. Is it bogus? Or is there a limited selection of Brazilian games and this one deserves the score from its fans? Some of the collections look rather random, but I just skimmed them.

Now I'm trying to figure out if filtering on a minimum rating count (like 100, after the <10 ratings users are removed) might give more useful results. There are some really interesting games that seem to be coming up -- like Doctor Who: Solitaire Story Game, and I don't want to filter out less popular (but deserving) titles as a side-effect of purging the bogus ratings.

--edit

Geeklist up now! I ended up just filtering the single-vote users and kept it to games with more than 50 ratings -- check it out and see the top 100 most-loved games!
Wed Nov 30, 2011 6:31 pm

Random walk through data to find the top games

In this geeklist I used a Monte Carlo Method to generate a list of top games based on ratings.

I'm pretty happy with how the list turned out (though I shouldn't spend hours putting together a list to post on the American holiday weekend, it just got lost) as I think it finds a good balance in its orderings between number of ratings and higher ratings.

It's probably more time efficient to actually just loop through all the games and ratings and do a raw calculation instead of a Monte Carlo simulation, but I'm not sure what the exact calculation would be (I'm guessing it's just a sum of scores, each just the inverse of the number of games the user has rated, assigned to the "winning" game)... If I get a chance (no pun intended, this work-week looks to be busy) I'll try a calculation instead, but sometimes Monte Carlo methods are not only easier to implement, but just more fun.

So, a quick list of the tools I've got so far:
-Data scraper
-Pairing calculator (used by the buddy finder)
-Buddy finder
-Top commenter finder (interesting, but more of its own reporting tool)
-Random walk top game calculator

There is one other tool to add -- a better way to filter ratings. Instead of putting all the filtering into the buddy finder, it's easier to pre-filter the raw ratings, so if you want only users with comments, games that only have a minimum number of ratings, or only users who rate Twilight Struggle a 9 or higher, we can filter them out from the get-go.

Just running the filter with some different options shows some interesting characteristics of the BGG data (in this case the top 7500 games):

Ratings without a comment: 2297657 (68.539%)
Ratings from users with only one game rated: 8463 (0.252%)
Ratings on games with fewer than 100 ratings: 202223 (6.032%)

The rest of the filter functionality is more user- or game-specific, so it is possible to filter out users or games (entirely or in a certain score range), or filter out any users who have not rated a specific game, or users who have rated a game a certain way.
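To make the pre-filtering idea concrete, here is a minimal Python sketch. The `prefilter` function, its parameter names, and the `(user, game, rating, comment_length)` tuple layout are all assumptions for illustration; the order in which the filters are applied is also my own choice, and each filter runs only once rather than being repeated until the data settles.

```python
from collections import Counter


def prefilter(ratings, min_user_ratings=1, min_game_ratings=1,
              require_comment=False, must_rate=None):
    """Filter raw ratings before handing them to another tool.

    ratings: list of (user, game, rating, comment_length) tuples
             (hypothetical layout).
    must_rate: optional (game, minimum_rating) pair; only users who rated
               that game at least that high are kept.
    """
    kept = [r for r in ratings if not require_comment or r[3] > 0]

    if must_rate is not None:
        game, floor = must_rate
        fans = {r[0] for r in kept if r[1] == game and r[2] >= floor}
        kept = [r for r in kept if r[0] in fans]

    # Drop users with too few ratings, then games with too few ratings.
    per_user = Counter(r[0] for r in kept)
    kept = [r for r in kept if per_user[r[0]] >= min_user_ratings]
    per_game = Counter(r[1] for r in kept)
    return [r for r in kept if per_game[r[1]] >= min_game_ratings]
```

For example, `prefilter(ratings, must_rate=("Twilight Struggle", 9))` would keep only the ratings made by users who gave Twilight Struggle a 9 or higher, assuming the game column holds names rather than IDs.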

I think this will give a lot of good options for people to tweak the data before they run something like the buddy finder or the recommendation tool.

But back to the random walk stuff...

Take a look at this geeklist if you haven't already, then compare it to some runs on some filtered data... How do the top games change when we select our ratings a little differently?

Top 10 games of users with 50+ ratings:
1. Puerto Rico
2. Power Grid
3. Agricola
4. Dominion
5. Race for the Galaxy
6. Twilight Struggle
7. Caylus
8. Pandemic
9. El Grande
10. Tigris & Euphrates

(only a little different, which is expected, but a bit of movement, and El Grande makes the top 10)

Top 10 games of users who rate ASL 7+ :
1. ASL
2. Twilight Struggle
3. Advanced Squad Leader: Starter Kit #1
4. Puerto Rico
5. Agricola
6. Power Grid
7. Advanced Squad Leader: Starter Kit #2
8. Paths of Glory
9. Up Front
10. Squad Leader

(The wargames aren't surprising, but I didn't quite expect all of PR/Agricola/PG in their top 10)

Now, one interesting thing with the random walk, it's really easy to add in per-user based weightings -- just change the per-step score to be a variable based on the user's weight, instead of being a fixed value. Now we've got the makings of a custom game-recommendation tool! Coming soon!

In the meantime, let me know if you've got any interesting filtered sets you'd like to see the top games for!
Mon Nov 28, 2011 4:20 pm

BGG's top users with comments on their ratings

I've got a lot of great feedback from this thread testing out a tool to find good user matches based on game ratings.

One thing that came up from a number of people was how important they find comments on ratings -- which I get, since seeing comments on the geekbuddy comparisons is a great way to see what people like or don't like about certain games. I modified my data-siphon scripts to grab the length of the comments on the ratings, so here's a few lists based on the top 2500 games -- I broke them up based on a few possible criteria, so you can pick a list based on how much minimum or average length of comments matters to you.


Users ranked by number of commented ratings
1. sisteray (1544 commented ratings)
2. skeletodoc (1475 commented ratings)
3. PBrennan (1473 commented ratings)
4. Nap16 (1466 commented ratings)
5. TomVasel (1387 commented ratings)
6. dougadamsau (1233 commented ratings)
7. snoozefest (1169 commented ratings)
8. jtakagi (1152 commented ratings)
9. larryjrice (1148 commented ratings)
10. Walt Mulder (1091 commented ratings)
11. Terry Egan (1088 commented ratings)
12. wkusau (1085 commented ratings)
13. kystas (1047 commented ratings)
14. Bankler (1037 commented ratings)
15. peacmyer (1009 commented ratings)
16. chaddyboy_2000 (1004 commented ratings)
17. Socal Tim (998 commented ratings)
18. cfarrell (996 commented ratings)
19. JasonMatthews (985 commented ratings)
20. BeyondMonopoly (958 commented ratings)
21. cbrua (943 commented ratings)
22. jcarvin (939 commented ratings)
23. brianeikunst (936 commented ratings)
24. houjix (933 commented ratings)
25. wkover (928 commented ratings)

Users ranked by number of commented ratings, minimum 250 characters
1. sisteray (772 commented ratings)
2. PBrennan (756 commented ratings)
3. dougadamsau (738 commented ratings)
4. jtakagi (710 commented ratings)
5. larryjrice (566 commented ratings)
6. snoozefest (550 commented ratings)
7. cfarrell (513 commented ratings)
8. houjix (503 commented ratings)
9. Verkisto (475 commented ratings)
10. modboy (472 commented ratings)
11. Glamorous Mucus (467 commented ratings)
12. Blind Reality (452 commented ratings)
13. Zalasta (430 commented ratings)
14. cornjob (429 commented ratings)
15. Scholle (413 commented ratings)
16. Socal Tim (406 commented ratings)
17. shawn_low (405 commented ratings)
18. jayjonbeach (404 commented ratings)
19. Starsunsky (401 commented ratings)
20. chaddyboy_2000 (391 commented ratings)
21. Audacon (382 commented ratings)
22. smilingra (375 commented ratings)
23. Terry Egan (375 commented ratings)
24. Orski (367 commented ratings)
25. familywontplay (366 commented ratings)

Users ranked by number of commented ratings, minimum 100 characters
1. sisteray (1506 commented ratings)
2. PBrennan (1441 commented ratings)
3. dougadamsau (1207 commented ratings)
4. jtakagi (1147 commented ratings)
5. larryjrice (1119 commented ratings)
6. snoozefest (1026 commented ratings)
7. Socal Tim (962 commented ratings)
8. cfarrell (957 commented ratings)
9. kystas (895 commented ratings)
10. houjix (894 commented ratings)
11. TomVasel (891 commented ratings)
12. chaddyboy_2000 (891 commented ratings)
13. Terry Egan (888 commented ratings)
14. modboy (875 commented ratings)
15. Walt Mulder (868 commented ratings)
16. brianeikunst (795 commented ratings)
17. Glamorous Mucus (771 commented ratings)
18. EYE of NiGHT (750 commented ratings)
19. familywontplay (730 commented ratings)
20. Larry Chong (728 commented ratings)
21. faqtotum (723 commented ratings)
22. latindog (719 commented ratings)
23. jcarvin (713 commented ratings)
24. zefquaavius (710 commented ratings)
25. bucklen_uk (707 commented ratings)

Users ranked by number of commented ratings, minimum 500 characters
1. sisteray (906 commented ratings)
2. PBrennan (720 commented ratings)
3. dougadamsau (602 commented ratings)
4. snoozefest (599 commented ratings)
5. houjix (570 commented ratings)
6. jtakagi (529 commented ratings)
7. cfarrell (519 commented ratings)
8. Starsunsky (471 commented ratings)
9. larryjrice (405 commented ratings)
10. CortexBomb (381 commented ratings)
11. jayjonbeach (375 commented ratings)
12. darquil (371 commented ratings)
13. cymric (370 commented ratings)
14. Glamorous Mucus (364 commented ratings)
15. Verkisto (363 commented ratings)
16. Unitoch (356 commented ratings)
17. shawn_low (334 commented ratings)
18. Zalasta (329 commented ratings)
19. modboy (312 commented ratings)
20. cornjob (294 commented ratings)
21. pusboyau (293 commented ratings)
22. Thommy8 (285 commented ratings)
23. chaddyboy_2000 (284 commented ratings)
24. Blind Reality (281 commented ratings)
25. familywontplay (281 commented ratings)

These lists make no assumptions based on the quality of the comments or their taste in games (that's what the tool in the above-mentioned thread is for), but hopefully this is a good general starting point to help you find some useful geekbuddies!

-- edit

Updated the lists above to use data after fixing the collection program (on some games it would not load the last page of ratings). Above lists run on top 7500 games.
Wed Nov 23, 2011 3:40 pm

Similar Games

I've been playing around the past few days with the data from the pairwise comparisons, trying to find a good way to extract similar games based on the pairwise results.

I first tried using the cosine similarity method, but the results were very disappointing, with the scores all being nearly identical. This was not totally surprising, since cosine similarity is great for text searches with limited terms and smaller documents, but the dense rating set is like searching for common words in a set of larger books -- they all come back as good results.

So, on to a bit of a hodge-podge homebrew...

The first step was to massage the pairwise results into a more easily compared value. I started by reducing each pairwise value to a number between 1 and -1, with closer to 0 representing an even match-up and closer to 1 or -1 meaning a more decisive percentage of the match-ups going to one game or the other.

I also calculate a "confidence" factor based on the total number of users the match-up is based on -- so a comparison with just 1 or 5 or whatever users voting on it gets a lower confidence factor, just so it doesn't weigh quite as heavily as one with 100 or 1000 users.
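The post does not give the exact formulas, so the Python sketch below is only one plausible reading. The normalization treats ties as pulling the score toward 0, and the saturating `confidence` curve with its `half_weight` parameter is entirely an assumption of mine, not the method actually used.

```python
def matchup_score(x_wins, y_wins, ties):
    """Reduce a pairwise tally to a value between -1 and 1.

    0 is an even match-up; 1 means every user rated X higher and -1 means
    every user rated Y higher. Counting ties in the denominator is an
    assumption.
    """
    total = x_wins + y_wins + ties
    if total == 0:
        return 0.0
    return (x_wins - y_wins) / total


def confidence(total_users, half_weight=50):
    """Hypothetical confidence factor in the range [0, 1).

    Grows with the number of users behind a match-up and reaches 0.5 at
    half_weight users. Both the shape and the default are illustrative.
    """
    return total_users / (total_users + half_weight)
```

Multiplying the two together is one way a match-up backed by only a handful of users could be made to count for less than one backed by hundreds.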

Mon Nov 14, 2011 6:42 pm

Calculating Pairwise Comparisons

In the first blog entry I showed a script, gather_data.pl, that will gather all the ratings for some games. If you want to follow along with this post, we'll assume you've run
gather_data.pl -b 5
to grab the top 500 games by BGG rank and have a games.csv and ratings.csv from that run to work with. Since expansions (as defined by BGG type) aren't included, it'll probably get something like 498 items.

Our next script will take the data in the ratings file and do a pairwise comparison -- that is, for every pair of games X and Y that a given user has rated, see which one they rated higher. We'll do this for every user and tally the results for each possible pairing of games: the number of times X was rated higher than Y, the number of times Y was rated higher than X, and how many times they tied.
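The tallying step can be sketched in a few lines of Python. This is an illustration only, not the script from the post: the `pairwise_tally` function and the `(user, game, rating)` input layout are assumptions, and the real ratings.csv may be organized differently.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_tally(ratings):
    """Tally head-to-head results for every pair of games.

    ratings: iterable of (user, game, rating) tuples (hypothetical layout).
    Returns {(x, y): [x_higher, y_higher, ties]} with x sorted before y.
    """
    by_user = defaultdict(dict)
    for user, game, rating in ratings:
        by_user[user][game] = rating

    tally = defaultdict(lambda: [0, 0, 0])
    for rated in by_user.values():
        # Every pair of games this one user has rated.
        for x, y in combinations(sorted(rated), 2):
            if rated[x] > rated[y]:
                tally[(x, y)][0] += 1
            elif rated[x] < rated[y]:
                tally[(x, y)][1] += 1
            else:
                tally[(x, y)][2] += 1
    return dict(tally)
```

A user with n rated games contributes n(n-1)/2 comparisons, so heavy raters dominate the run time on a large data set.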

Fri Nov 11, 2011 5:57 pm

Gathering Games and Ratings

I love finding patterns and surprises in heaps of data.

I created the 'A somewhat surprising Top 100 hundred game list, based on BGG ratings… AND SCIENCE!' geeklist out of curiosity; I had no idea what the result would be. The voting method seemed like an interesting method to apply.

I loved the discussion that showed up, and I wanted to pursue some of the ideas. My original method certainly had a number of flaws, so I've been trying to fix up some of the scripts I created so that others can use them to do their own BGG data-mining. The purpose of this blog is to explore the methods to get the data, like the BGG XML API, massage it into more useful forms (I like Perl generally for this sort of thing, but I'm a "right tool for the job" kind of guy), and find cool and interesting stuff along the way. I'm not a statistics expert, so I hope to learn some stuff as I go.

Here's the first Perl script. It creates a list of games and gathers all the ratings for them: it gets the games from the BGG top-ranked and/or most-rated lists, then fires up a bunch of threads to retrieve all the ratings.

You can run it just plain:
gather_data.pl
and it will generate a games.csv and ratings.csv, or run it with -? or --help to see a list of options and defaults.

Fri Nov 11, 2011 1:44 am