Big Game Theory!

Musings on games, design, and the theory of everything. www.big-game-theory.com

Archive for Data Mania


Visualizing the BGG Game Database with Gephi. Whoa!

Oliver Kiley
United States
Ann Arbor
Michigan
So I stumbled into an interesting post over at r/boardgames from reddit user Shepperstein, who had downloaded a trove of data from BGG’s database. He then used Gephi to create some fantastic network models (aka graphs) depicting relationships between game categories. Very cool stuff. I urge you to check out his post and links to his analysis.

Of course, I immediately wanted to start playing around with the data myself!

Fortunately, I’m no stranger to Excel AND I used Gephi several years ago, so I was already familiar with its basic functionality. Shepperstein also kindly provided a direct link to his database, so I could tap into that information directly. Are we excited yet?



Even more, this would prove to be an opportunity to tackle something I’ve long wanted to do. If you’ve read this blog before, you’ll know I’ve always had an interest in game classification and taxonomy. In particular, I’ve had a long-standing attraction to Selwyth’s Alternative Classification of Boardgames, which provides a comprehensive rework of BGG’s category and mechanism descriptors.

One of the challenges has always been finding a way (or perhaps simply the motivation) to “remap” BGG’s category + mechanism descriptors into new classes (based on Selwyth’s approach for example). Ideally, these classes would better reflect the nature of the individual descriptors. For example, the 80+ descriptors in the category field are a total hodge-podge of thematic items (“farming” or “trading in the Mediterranean”, etc.), mechanisms, domains (e.g. Wargame or Party Game), and more besides. Likewise the mechanism attribute contains things that aren’t really mechanisms at all.

Long story short, I remapped all of the categories and mechanisms from BGG’s system over to an “alternative” system. You can check out the category-mechanism reclassification tables to see what I did, if you’re so inclined. Armed with these reclassified tables and a trove of BGG database… uhh… data… I set about pulling it all into Gephi and having a look at what I could do.

In contrast to Shepperstein’s work, I wanted to use Gephi to visualize not just the BGG categories, but also the mechanisms, AND do it in a way such that the final output would give an indication of what new class each descriptor would fall into. I wanted it so that things Selwyth classified as mechanisms or genre would be identified as such. Of course I also needed to balance this with the ability to logically discern groupings (aka “communities”) of related attributes.

The image below shows the culmination of this effort. If you want to read it, you really need to expand the image link and make it full screen. Have at it, and I’ll provide some discussion below.




A few technical notes about the above analysis.

(1) The database from Shepperstein only includes games from 1990 to 2018, although that still reflects tens of thousands of games, and these tend to be more recent titles that are more likely to be tagged with mechanisms and categories.

(2) In Gephi, I excluded node records (i.e. the list of descriptors) with fewer than 50 games using that descriptor. Likewise, I excluded connections where the “weight” between any two descriptors was less than 40. This means that if fewer than 40 games share a given pairing of two attributes, the relationship is ignored. With over 18,000 node connections, it made sense to prune out the ones with a fairly minimal impact.

(3) The fainter-shaded outer circles/colors around the nodes correspond to my reclassified descriptors discussed above.

(4) The colored “community” groupings were based on running a modularity statistic (I have no idea what it’s doing, just for the record), but it results in assigning nodes to groupings based on their relatedness to other nodes. After playing around with the tolerances, it ended up with the 11 communities that you see in the brighter colors (e.g. all the “Wargame” related stuff is Red).
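For anyone who wants to reproduce notes (2) and (4) outside Gephi, here is a minimal Python sketch. It counts how many games use each descriptor and each pair of descriptors, prunes with the 50-game and 40-game thresholds mentioned above, and computes the modularity score of a given grouping. Gephi's modularity statistic is based on the Louvain method, which searches for a grouping that makes a score like this one as high as possible; the sketch only scores a grouping and does not perform that search. The function names and the input layout (one list of descriptors per game) are assumptions made for illustration, not the format of Shepperstein's database.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(games):
    """Count games per descriptor (nodes) and per descriptor pair (edge weights).

    `games` is an iterable of descriptor lists, one list per game
    (a hypothetical layout chosen for this sketch).
    """
    node_counts = Counter()
    edge_weights = Counter()
    for descriptors in games:
        unique = sorted(set(descriptors))  # sorted so each pair has one key
        node_counts.update(unique)
        edge_weights.update(combinations(unique, 2))
    return node_counts, edge_weights


def prune(node_counts, edge_weights, min_games=50, min_weight=40):
    """Drop rare descriptors and weak connections."""
    kept_nodes = {d for d, n in node_counts.items() if n >= min_games}
    kept_edges = {
        pair: w
        for pair, w in edge_weights.items()
        if w >= min_weight and pair[0] in kept_nodes and pair[1] in kept_nodes
    }
    return kept_nodes, kept_edges


def modularity(edges, community):
    """Modularity Q of a weighted, undirected graph for a node -> community map.

    Q = sum over communities of (internal weight / m) - (total degree / 2m)^2,
    where m is the total edge weight.
    """
    total = sum(edges.values())
    if total == 0:
        return 0.0
    internal = Counter()  # edge weight fully inside each community
    degree = Counter()    # summed weighted degree of each community's nodes
    for (a, b), w in edges.items():
        degree[community[a]] += w
        degree[community[b]] += w
        if community[a] == community[b]:
            internal[community[a]] += w
    return sum(
        internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree
    )
```

A higher score means the grouping keeps more of the connection weight inside communities than would be expected if connections were spread at random, which is why tightly linked descriptors (like the wargame cluster) end up sharing a color.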

Now, I think there are some really cool things to come out of this graph and the community groupings. Wargames along with their frequently used mechanics (area movement, campaign/card driven, chit-pulling, point-to-point movement) are all clustered pretty well together. Likewise we see groupings around Party games, which also contain the gamut of social deduction-style games.

Given the plethora of cooperative games with horror/zombie themes, roleplaying elements, and adventure, it was neat to see all those clustered together. Of course, this was pretty well intermingled with fantasy games that leverage variable player-powers, fighting mechanics/genres, miniatures, and collectable components (e.g. LCGs). Science-fiction is likewise ensconced in this zone of the graph.

Economic games are in the bottom right, and constitute the bulk of what I see as mainline euro-style games. I like the little enclave of Route-Network Building, Transportation-theme, Train-theme, and Stock Holding down there. Aka, the 18xx games and their ilk. I do think there is a high level of alignment between Tile-laying games and eurogames, which is why they also fell into the same community.

Another interesting result is that Area-Control / Area-Influence ended up as its own community, and rightly situated between wargames and more euro-style economic games. Area control games tend to have more direct player-to-player interaction on a map, and hence are associated somewhat with their wargaming neighbors. Is this the homeland of the weuro?

Abstract games are down at the bottom, at a logical point between both euro-style economic games (which also tend to be somewhat abstract in nature) and Children’s Games, which are also quite abstract (perhaps as a means of keeping things simple in mechanics - or just that they share some common descriptors?).

In the dead center are a few big communities, including card games and the obviously associated hand management, along with Dice and press-your-luck type systems. Some of these, like cards and dice, are so ubiquitous across domains of games that it’s not at all surprising to see them in the middle of the graph with connections to just about everywhere. I tried excluding them from the graph and it basically had no structural impact at all, more or less confirming this assessment. Of course, things like “take that” games and “trick-taking” games are very closely associated with card games, so I left them in for clarity and completeness.

I also thought it was interesting to compare opposite sides of the graph. Wargames are directly opposite to Children’s games. Highly thematic games in the Fantasy/Fighting, Science fiction, and Cooperative realms are all opposite to Economic (euro-style) games and abstract games. Likewise, games that focus on area control/majority elements and derive much of their deep strategic play from spatial positioning and the like are opposite to party and deduction style games, which emphasize an entirely different sort of player-to-player interactions.

Phew!

Having done all of this, I’m not sure what’s next! I’m tempted to see about refining the database to pull, for example, the top 10,000 ranked games or top 10,000 most owned games - irrespective of year - in order to hone the database around games more likely to be known, as well as grabbing more of the popular (or classic) games from prior to 1990. Much of the database is filled with relatively obscure games or print-and-play projects that don’t reflect fully published and circulated titles. Over 50% of the dataset (~8,200 records) are games with fewer than 250 owners, for example. I also have pulled in BGG ranking data, average weights, number of owned copies, and more - but I’ll need to think more on how to make that interesting.

So for now, I guess it’s time to open the phones! Any reactions? Thoughts or ideas of other ways to slice the data? I’d love to hear from you all. Cheers.
29 Comments
Fri Aug 31, 2018 2:53 am

By the Numbers - BGG Rank Data + Analysis

Oliver Kiley
United States
Ann Arbor
Michigan
So I have a bit of a love affair with Excel. I’m sorry, but it’s true. I’m not a super mathy-type but I do enjoy working with statistics and thinking about data. Probably has something to do with my background in science eh?

Anyway, since I’ve been on BGG I’ve had a lot of interest in running some basic analysis on the game data, focused on the ranked games. While statistics can certainly be manipulated to say just about anything, I still find it intriguing to examine the numbers, make graphs, and pontificate on imagined importance.

So what have I done? I’ve assembled a massive Excel file containing game data for all 8000+ ranked games in the BGG database. I’ve started whipping up some graphs below (which we will get into), but I’m also curious about what other people are interested in seeing. Any correlations between factors? Trends? Summary statistics?

But first, I need to give credit where it is due. First, I need to thank Andrea Nand (n_and) for his awesome tool BGG2nanDECK. Among other things, this tool allows you to load a list of game IDs from a collection, geeklist, or manually inputted data and then collect a vast array of information about those games. This data can be exported into Excel file formats.

I also need to thank Luis Olcese for writing a script to collect the game IDs for all the ranked games. While that is awesome and got me started, n_and has since added a feature to BGG2nanDECK to allow you to download data for games ranked from X to Y. Wow!

So the discussion below is based on analyzing 8082 ranked games from data collected on 2012-07-13. Here we go!


Subdomain Analysis

First up for this post, I wanted to start looking at the data behind the sub-domains. The first chart (below) shows the distribution of sub-domains across the 8000+ ranked games.



Bear in mind that many games can be listed in multiple sub-domains. But it is worth noting that there are a lot more wargames and a lot fewer thematic games than I would have guessed.



So the majority of games fall into one sub-domain (above), but there is a fair number of games with two sub-domains listed as well.

As an additional point of consideration, I looked at the sub-domains as a breakdown of ownership (below). In cases where a game was part of more than one sub-domain, the game split its ownership evenly, counting 50% towards one sub-domain and 50% towards the other (or 33% each in the case of games in three sub-domains).

So while strategy + family games (the "euro-games"?) constitute about 22% of the total number of games, in terms of percentage of ownership here on BGG they account for 50% of all games owned. Compare this against wargames, where nearly 1/4 of the ranked games are wargames yet they account for only 15% of the owned games.
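The even ownership split described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the input layout (an owned count plus a list of sub-domains per game) are assumptions, not the columns of the actual spreadsheet.

```python
from collections import defaultdict


def ownership_share_by_subdomain(games):
    """Return each sub-domain's share of total ownership.

    `games` is an iterable of (owned_count, [subdomains]) pairs
    (a hypothetical layout). A game in k sub-domains contributes
    owned_count / k to each of them.
    """
    totals = defaultdict(float)
    for owned, subdomains in games:
        if not subdomains:
            continue  # games with no sub-domain are left out of the breakdown
        share = owned / len(subdomains)
        for name in subdomains:
            totals[name] += share
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: value / grand_total for name, value in totals.items()}
```

With a toy input of `[(1000, ["Strategy", "Family"]), (300, ["Wargames"]), (700, ["Strategy"])]`, Strategy ends up with 1200 of the 2000 owned copies, Family with 500, and Wargames with 300.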



Next, I was curious to explore what subdomains tended to be associated with which other subdomains. The table below provides raw number of associations as well as a percentage of the total number of games within each subdomain associated with another one.

I've highlighted a few interesting cells. Namely that around 21-22% of strategy games are cross-listed with family games. 25% of party games are cross-listed with family games (not surprising). And that in total nearly half (43%) of Thematic games are associated with either Strategy, wargames, or family games. This perhaps explains the total confusion in attempting to label thematic games on BGG; there is a lot of cross-over.

Overall though, wargames have the purest sub-domain association (i.e. associated the most with themselves only), followed by customizable, abstract, and children’s games. Thematic and family games have the most cross-over with other domains, as nearly 50% of those games are cross-listed. Party and strategy games are around 62%.
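A cross-listing table like the one discussed above can be built with a short Python sketch. For every ordered pair of sub-domains it reports the raw number of shared games and that number as a fraction of the first sub-domain's total, and a second helper reports how "pure" each sub-domain is. The names and the input layout (one list of sub-domains per game) are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations


def cross_listing_table(games):
    """Map (a, b) -> (shared games, shared games / total games in a)."""
    totals = Counter()
    shared = Counter()
    for subdomains in games:
        unique = set(subdomains)
        totals.update(unique)
        # Ordered pairs, so (a, b) and (b, a) get their own percentages.
        shared.update(permutations(unique, 2))
    return {(a, b): (n, n / totals[a]) for (a, b), n in shared.items()}


def purity(games):
    """Fraction of each sub-domain's games listed in that sub-domain alone."""
    totals = Counter()
    alone = Counter()
    for subdomains in games:
        unique = set(subdomains)
        totals.update(unique)
        if len(unique) == 1:
            alone.update(unique)
    return {name: alone[name] / totals[name] for name in totals}
```

Because the percentages are relative to the first sub-domain in each pair, the table is not symmetric: a pairing can be a small slice of a large sub-domain and a large slice of a small one.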

For what it's worth, Startup Fever is the lone game that is listed in both the thematic and abstract categories. Huh-what!?




Mechanics Analysis

I was particularly interested at the start of this endeavor in how the mechanics were breaking down across the ranked games. So the first chart is just a basic histogram of the distribution of mechanics. I found it interesting (although I guess not terribly surprising) how much die rolling shows up as a mechanic. Many of the more popular euro-style mechanics (e.g. worker placement) are naturally much lower on the list, reflecting the smaller number of games that have been released to date using those mechanics compared to others in use for decades (or centuries).



Next up, and for my own understanding, I was curious to see how many games listed multiple mechanics. So here's a little chart (below) depicting that.



Lastly, I performed a rather heavy operation to look at the distribution of ratings for each mechanic, including the mean rating, +/- 1 standard deviation, and the 1st and 99th percentile extremes. So the crazy chart below shows all of this.
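The per-mechanic summary described above can be sketched with Python's standard `statistics` module. The function name, the input layout (a rating plus a list of mechanics per game), and the choice of the sample standard deviation and the "inclusive" percentile method are all assumptions; the original spreadsheet may have used different conventions.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev


def rating_summary_by_mechanic(games):
    """Summarize ratings per mechanic.

    `games` is an iterable of (rating, [mechanics]) pairs (hypothetical
    layout). A game's rating counts once for every mechanic it lists.
    """
    ratings = defaultdict(list)
    for rating, mechanics in games:
        for mechanic in set(mechanics):
            ratings[mechanic].append(rating)

    summary = {}
    for mechanic, values in ratings.items():
        if len(values) < 2:
            continue  # a spread needs at least two ratings
        avg = mean(values)
        sd = stdev(values)  # sample standard deviation (an assumption)
        # 99 cut points: the first is the 1st percentile, the last the 99th.
        cuts = quantiles(values, n=100, method="inclusive")
        summary[mechanic] = {
            "mean": avg,
            "low_1sd": avg - sd,
            "high_1sd": avg + sd,
            "p01": cuts[0],
            "p99": cuts[-1],
        }
    return summary
```

Sorting the resulting dictionary by `"mean"` would give the kind of lowest-to-highest ordering of mechanics that the chart shows.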



All time lowest mechanic? Roll + Move, followed closely by singing. I guess that explains why Cranium is such a hit with this group right? It likewise isn't surprising to see Worker Placement + Deck/Pool building take the #1 and #2 spots for the highest average. These are the latest two "trends" mechanically speaking, so I suppose the newness of them is pushing up the ratings quite a bit.

One thing that caught my eye on the distributions above is that the low-tail end of the distributions tended to be much more compressed than the high end. What is going on with the ratings overall that might explain this observation? To help answer, I took a look at a histogram of the rating distribution, and here is what I found:



Basically, you have a huge volume of ratings in the middle (5.5 to 6). The ratings on the high end drop off relatively gradually towards the upper end of the range. The ratings on the bottom however drop off very quickly. Overall there are just not that many games getting low scores below 5.25 or so, a few dozen here and there only.


Weight Analysis

For what it's worth, BGG has a weight rating, some combination of rules complexity and strategic depth that remains forever ambiguous. That said, the data is interesting. First up, a brief histogram of the weight distribution.



That's nice isn't it?

Now, one question I see pop up quite a bit is whether the weights of games have gone up or down over the years. Of course there are a lot of factors that go into this next piece of data, most importantly "who" is assigning the weight values. My guess is that wargamers are going to rate a heavy euro at a lower weight than a family gamer would, because people's perception of weight is relative to their own experience. That said, here is the chart:



Essentially, the weight ratings are all over the place until things start to stabilize in the 1970's. Perhaps this coincides with the growing hobby game market? Overall, I think the trend is interesting, showing a gradual climb higher and higher all the way up to the present.

Again, I'm not sure what explains this. It is hard to look at this data and say objectively whether games are in fact harder, because the weight ratings are so dependent on the users rating the games. But I suppose it is valid to say, from the perspective of this sample population, the weight of games is increasing on average over time.

Last, it's probably worth examining the relationship between weight and rating. Are heavier games rated higher? Here are the results:



This terrible looking chart isn't too convincing. But I ran a regression analysis to check for statistical significance and there IS a statistically significant correlation between weight and rating; that is, as weight goes up one can expect the BGG rating to go up as well. Of course this isn't a hard and fast rule, but it's interesting to think about.
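For readers who want to run a similar check, here is a minimal sketch of a simple linear regression of rating on weight in plain Python. The post does not say which regression tool or settings were used, so ordinary least squares with one predictor is an assumption, as are the function and key names. The sketch returns the slope, intercept, correlation coefficient r, and the t statistic for the slope; judging significance would mean comparing that t value against a t distribution with n - 2 degrees of freedom.

```python
from math import copysign, sqrt


def weight_rating_regression(weights, ratings):
    """Ordinary least squares fit of rating = intercept + slope * weight."""
    n = len(weights)
    if n != len(ratings) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(weights) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in weights)
    syy = sum((y - mean_y) ** 2 for y in ratings)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, ratings))
    if sxx == 0 or syy == 0:
        raise ValueError("weights and ratings must both vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    # t statistic for the slope; infinite when the fit is perfect.
    if abs(r) >= 1:
        t_stat = copysign(float("inf"), r)
    else:
        t_stat = r * sqrt((n - 2) / (1 - r * r))
    return {"slope": slope, "intercept": intercept, "r": r, "t": t_stat}
```

With thousands of games, even a modest r can produce a large t value, which is how a noisy-looking scatter plot can still show a statistically significant trend.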


Ratings Analysis

For our last wonderful chart, let's look at the average rating of games by year.



Ratings really drop off prior to 1970. No love for Monopoly it seems; at least not here at BGG. Things are fairly flat (but slowly rising) up until the 1990's, at which point the ratings start climbing more swiftly. The last few years mark a definite up-tick in the ratings as well. Are games really getting better? Or are people just overly infatuated with the newest hotness? Of course this data doesn't tell us that.


For those wanting to download and play around with the data yourself, here is the link to the file:

BGG Ranked Game Data (2012-07-13)


That's all for now. If you have things you want me to look at, let me know. I have a little laundry list of things I want to spend more time playing with, so I'll post things below as they emerge.

Cheers!
13 Comments
Mon Jul 16, 2012 7:15 pm