Yesterday (by which I mean three days ago, because constantly writing Kickstarter updates has confused my ability to consistently write blog updates) JT Schiavo asked what I thought of AJ's Game Success Indicator model. Let's talk about that.
I'd recommend reading the original article, but I know that some people are allergic to links so I'll give a brief summary: The model is a way to measure the success of a game, using two metrics. One is the proportion of players who immediately want to play again once a game has concluded; the other is the proportion of players who end up thinking about the game once it's over. The length of the game determines which of these factors will be more useful, as short games are easier to play again immediately and long games generally have more food for thought.
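To make the summary concrete, here's a rough sketch of the model in Python. The function names, the weighting rule, and the 60-minute cut-off are my own inventions for illustration; the original article doesn't specify a threshold.

```python
# Illustrative sketch of AJ's two metrics. The 60-minute cut-off for a
# "short" game is a hypothetical value, not part of the original model.

def replay_rate(wanted_replay: int, total_players: int) -> float:
    """Proportion of players who immediately wanted to play again."""
    return wanted_replay / total_players

def thinking_rate(kept_thinking: int, total_players: int) -> float:
    """Proportion of players who kept thinking about the game afterwards."""
    return kept_thinking / total_players

def primary_metric(wanted_replay: int, kept_thinking: int,
                   total_players: int, minutes: int) -> float:
    """Short games lean on the replay rate; long games on the thinking rate."""
    if minutes <= 60:  # hypothetical cut-off for a "short" game
        return replay_rate(wanted_replay, total_players)
    return thinking_rate(kept_thinking, total_players)

# e.g. a 30-minute filler where 6 of 8 players asked for a rematch:
print(primary_metric(6, 2, 8, 30))  # 0.75
```

The point of the length switch is just that an immediate rematch is only a realistic option for short games, while longer games leave more to chew on afterwards.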
Taking a step back, I think this model is attempting to address a common problem in game design: It's hard to get an objective measure of a game's quality. A lot of testers don't want to hurt the designer's feelings and will consciously or unconsciously lead them to believe that their games are better than they are. Then the designer releases the game, it gets panned, and there is sadness and misery all around. Combating this means finding metrics that work even when a tester has subconsciously decided they like a game that really isn't all that good, and is absolutely convinced that they're speaking god's honest truth when they tell you it's wonderful.
Both of the tests mentioned are extremely good signs. I knew that 404 was working out once more new players ended their first game with "can we play again?" than not. I also know that things are going pretty well when people phone/email/text me days later to say "I've been thinking about the game and..." However, I'm not sure that I'd want to use them as primary metrics.
The "play again" test is somewhat hampered by practicalities. Often I organise playtests back to back, running eight-hour sessions in which new groups arrive just as the old group is leaving, so there's no opportunity for an immediate replay. As a metric it'll vary with the time of day you run your tests, what is organised afterwards, and so on. I find that a little too situational.
The "thinking" test is great. One of my favourite personal notes on the 404 BGG page opens "Based on KS PnP version. I really felt this game shouldn't work but, after some solo play tests, I couldn't stop thinking about it." However, this has the drawback of requiring you to contact playtesters long after the session, which is a fairly awkward way of obtaining feedback. It's also prone to the same problems as direct questions about quality: people don't want to say "I didn't think about it, it wasn't important to me," so it runs into some of the issues it seeks to avoid.
So what do I think will work?
If you playtest my games I'd appreciate it if you stopped reading now.
Given the curious nature of my testers, I imagine that got rid of at most 5% of them. Ah, never mind; I constantly change my approach anyway.
I can see a "set date for play again" test working. I've never used it myself due to the chaotic nature of my testing, but if someone wants to play again and is willing to commit to a time and date, they probably enjoyed it. Bear in mind that the inverse isn't necessarily true (they might genuinely not know their plans).
I'm not sure how well a "thinking about it afterwards" test could be adapted. One option would be to ask testers "If we played again right now, what would you do differently this time?" If they've got answers, it means that they perceive the game as offering genuinely meaningful choices and were immersed enough to care whether they'd made the right ones (and to consider the alternatives). If they can't answer the question, or resort to copying what the winner of that game did, then there might be some sort of issue.
I also have a habit of asking about playtime, seeing whether a game was too long or too short. This is handy for adjusting the length of the game, but it's also a fair metric of whether people were engaged. It's rare (but not impossible) for someone who's been engaged and having fun to say that a game is too long.
Really though, I think the only irreplaceable measure is to watch people when they're playing. When do they laugh or smile? When is their posture forwards rather than backwards? When do they check their phones? When do they talk about things other than the game? When do they stop doing that, who pulls them back in and why? How do they handle the cards and components? Where do they direct their gaze when it's not their turn? How many players are looking to see what the result of a randomiser is and how many are simply waiting for it to be over? For me all of this stuff is the bread and butter of whether the game is any good.
It's not as quantifiable as some other forms of data, but it's incredibly rich and very accurate - even about subjects that the tester thinks they don't know anything about. Then again, maybe it's only useful if the game designer is also a doctor of psychology. Perhaps everyone needs to develop their own metrics, based on their strengths and weaknesses and the eccentricities of their playtesting group. I can only talk about what works for me.
(Btw, 404 has less than a week to go on Kickstarter and is funded with stretch goals unlocked!)