Data Mining

Mining Data from BGG

If you're reading this page, chances are that you've spent enough time on this site that you've begun wondering about all the data that is stored here and what you can do with it. Two easy ways to get data are to download user collection information or to use the XML API.

To get a user's collection information, view that user's collection and find the word Download in the tiny text below the menu bar, on the right-hand side of the window. Then click either All or Owned after it:

Download: (all | owned)
This provides the collection information as a comma-separated values (CSV) file. The limitation, of course, is that you can only work with one user's collection at a time.

The XML API is another way to get data and has some useful features for grabbing data about a particular game or games. However, it is lacking in several respects: it can only look up games by name, and it does not currently provide detailed play data.

Given these limitations, writing scripts in Perl is a useful way to mine data from BGG: you have direct access to the HTML, and if you know Perl, you can parse it. The rest of this article discusses that approach. PHP and a dozen other tools could probably accomplish the same task, so if someone would like to write an article about PHP or another data-mining approach and link it here, that would be awesome.

Using Perl to Get HTML

There is really nothing special about getting general data from BGG; you can simply use Perl's LWP package to grab the data. The following script grabs the main 'Games' page:

use LWP::Simple;

my $url = "http://www.boardgamegeek.com/browse/boardgame";
my $content = get $url; # Gets the HTML page.
if (defined $content) {
  # Do something with content (which is the returned HTML) here.
}
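As a rough illustration of the "do something" step, the sketch below pulls game IDs and names out of an HTML fragment with a regular expression. The sample markup, the /boardgame/<id>/<slug> link pattern, and the game names are all assumptions made for this example; the real page may be laid out differently, so inspect the downloaded HTML before relying on any pattern.

```perl
use strict;
use warnings;

# Hypothetical HTML fragment standing in for a downloaded page;
# the real markup on BGG may differ.
my $content = <<'HTML';
<td><a href="/boardgame/101/example-game-one">Example Game One</a></td>
<td><a href="/boardgame/102/example-game-two">Example Game Two</a></td>
<td><a href="/forum/1/general">General</a></td>
HTML

# Collect (id, name) pairs from links that look like /boardgame/<id>/<slug>.
# This URL pattern is an assumption made for the sketch.
my @games;
while ($content =~ m{<a\s+href="/boardgame/(\d+)/[^"]*"[^>]*>([^<]+)</a>}g) {
  push @games, { id => $1, name => $2 };
}

# Print one tab-separated line per game found.
print "$_->{id}\t$_->{name}\n" for @games;
```

A regular expression is enough for a quick script like this, but it breaks easily when the page layout changes; a proper HTML parser is the sturdier choice for anything long-lived.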

Acting Like a User: Dealing with Cookies

However, you might want to get data from BGG as if you were logged in as an actual user. As an example, I wanted to browse the top games, record my personal ratings in a spreadsheet, and supplement them with my own text. To do this, you'll need to use cookies: first create a batch of them and store them in a file, and then use a script to consume the cookies and supply them to BGG.

Baking up a Batch of Cookies

There's nothing quite like the smell of fresh-baked cookies! The first thing to do is to create a file that holds the cookie information. To start, go to Cookie Dump to get the important cookie values. In the middle of that page, you'll find two lines like these (note that this is not a real password):

   [bggusername] => smilingra
   [bggpassword] => Ha$HEdV@lue2UsE

Next, save the following code as a script called cookie_baker.pl:

# cookie_baker.pl
# Usage: cookie_baker.pl (username) (hashed password)

# Creates a cookie file for use with BGG.
use strict;
use warnings;

my $filename = "bggcookie.txt";
if (@ARGV < 2) { die "Need to specify user name and hash value.\n"; }
print "Creating cookie file at $filename...\n";

open(my $cookie_file, ">", $filename) or die "Cannot open $filename for writing: $!\n";
my $expire_time = time() + 31536000; # One year from now, in seconds.
print $cookie_file "# HTTP Cookie File\n";
# Fields must be separated by tabs: domain, subdomain flag, path, secure flag, expiry, name, value.
for my $cookie (["bggusername", $ARGV[0]], ["bggpassword", $ARGV[1]]) {
  print $cookie_file join("\t", "www.boardgamegeek.com", "FALSE", "/", "FALSE", $expire_time, @$cookie), "\n";
}
close($cookie_file);

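Now run the script, supplying the username and hashed password values found above. In this example, you'll type the following line (on a Unix-like shell, put the hashed password in single quotes so that the shell does not expand the $):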

cookie_baker.pl smilingra Ha$HEdV@lue2UsE

At this point, you'll have created a file called bggcookie.txt. It contains the important cookies necessary to log into BGG.
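Before moving on, it can help to confirm that the file is laid out the way a Netscape-style cookie reader expects. The sketch below assumes the usual layout of seven tab-separated fields per cookie line (domain, subdomain flag, path, secure flag, expiry, name, value). The check_cookie_file helper is a hypothetical name introduced here, not part of any library.

```perl
use strict;
use warnings;

# Returns a list of problems found in a Netscape-style cookie file.
# An empty list means every cookie line has seven tab-separated fields.
# (check_cookie_file is a hypothetical helper written for this sketch.)
sub check_cookie_file {
  my ($filename) = @_;
  open(my $fh, '<', $filename) or return ("Cannot open $filename: $!");
  my @problems;
  my $line_no = 0;
  while (my $line = <$fh>) {
    $line_no++;
    chomp $line;
    next if $line =~ /^\s*#/ || $line =~ /^\s*$/;  # Skip comments and blank lines.
    my @fields = split /\t/, $line, -1;            # -1 keeps trailing empty fields.
    push @problems, "line $line_no: expected 7 fields, found " . scalar(@fields)
      unless @fields == 7;
  }
  close $fh;
  return @problems;
}

# Check the file named on the command line, or bggcookie.txt by default.
my $file = @ARGV ? $ARGV[0] : "bggcookie.txt";
my @problems = check_cookie_file($file);
print @problems ? join("\n", @problems) . "\n" : "$file looks well-formed.\n";
```

If the check reports a wrong field count, the most likely cause is that the fields were written with spaces rather than tab characters.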

Consuming the Cookies

Now for the tasty part - actually writing a script to consume the cookies and send them to BGG! Save the following code into a file called taste_test.pl:

# taste_test.pl
# Usage: taste_test.pl
# Uses a cookie file to get data from BGG.
use strict;
use warnings;

use LWP 5.64;
use LWP::UserAgent;
use HTTP::Cookies::Netscape;

# Set up the browser to use the cookies.
my $cookie_jar = HTTP::Cookies::Netscape->new(file => "bggcookie.txt");

my $browser = LWP::UserAgent->new;
$browser->cookie_jar($cookie_jar);

# Request the page; leave $content undefined if the request fails.
my $response = $browser->get("http://www.boardgamegeek.com/browse/boardgame");
my $content = $response->is_success ? $response->decoded_content : undef;

# The HTML should now be stored in $content and can be parsed to your heart's content!

In fact, if you add the following lines to your code, the script reports one of three outcomes: the cookies worked, you were able to connect to BGG but the cookies failed, or you were unable to connect to BGG at all.

# If we have data for this URL, check whether we appear to be logged in.
if (defined $content) {
  if ($content !~ /Recently/) {
    die "Able to connect to BGG, but cookies not working; not using logged in data!\n";
  }
  print "Cookies working! Congratulations!\n";
} else {
  print "Unable to download at all - is BGG down or do you have problems connecting to the Internet?\n";
}

Please send suggestions for improvement to smilingra.
