Niels Taatgen
Netherlands Groningen Groningen

Using the reported wins and losses on the Days of Wonder site, it is possible to calculate whether scenarios are likely to be balanced. What I did was use the binomial function to calculate the 95% confidence interval. That is, for each adventure I will give a range of percentages, and there is a 95% probability that the true win percentage is in that range.
For example:
1. Agincourt: 66 wins out of 107 for the standard bearers means that the 95% confidence interval is 52%-71%. This means the true probability of winning for the French is somewhere between 52% and 71%, with a 5% probability that it is outside this range. This means that Agincourt is probably not fair, because 50% is not within the confidence interval.
Here are the others. The interval always refers to the probability that the standard bearers win, and the numbers are based on the information on the DoW site on 30 December. I will post an update when the numbers increase, to improve the reliability (assuming people are interested).
2. First Chevauchee: 32 out of 42, 46%-71%
3. Burgos: 23 out of 29, 62%-90%
4. Deeper in Castille: 15 out of 19, 57%-91%
5. Wizards and Lore: 11 out of 24, 28%-65%
6. A complex web: 17 out of 24, 51%-85%
7. Crisis in Avignon: 4 out of 11, 15%-65%
8. A Burgundian Chevauchee: 3 out of 15, 7%-45%
9. Free companies on war footing: 7 out of 16, 23%-67%
10. Assaulting the Tourelles: 8 out of 13, 36%-82%
As you can see, quite a few adventures are not balanced given the current reports. Of course, it is unlikely that the probability of winning is exactly 50%, so hopefully the intervals can show you how reasonable the odds are (and they will become more precise with more data).
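The post doesn't say exactly which calculation sits behind "the binomial function". One interval that is computed directly from binomial probabilities is the Clopper-Pearson ("exact") interval, sketched below in Python. Treat it as an illustration under that assumption, not as the calculation behind the figures in this thread; the helper names are hypothetical.

```python
import math


def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_coef = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return math.exp(log_coef + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def solve(f, target: float, increasing: bool, iters: int = 60) -> float:
    """Bisection on (0, 1) for the p where the monotone function f(p) equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # the solution lies to the right of mid
        else:
            hi = mid  # the solution lies to the left of mid
    return (lo + hi) / 2


def clopper_pearson(wins: int, games: int, confidence: float = 0.95):
    """Exact binomial confidence interval for the true win probability."""
    tail = (1 - confidence) / 2
    # Lower bound: the p at which P(X >= wins) = tail (this tail grows with p).
    lower = 0.0 if wins == 0 else solve(
        lambda p: 1 - binom_cdf(wins - 1, games, p), tail, increasing=True)
    # Upper bound: the p at which P(X <= wins) = tail (this tail shrinks as p grows).
    upper = 1.0 if wins == games else solve(
        lambda p: binom_cdf(wins, games, p), tail, increasing=False)
    return lower, upper


low, high = clopper_pearson(66, 107)  # Agincourt, 30 December figures
print(f"{low:.0%}-{high:.0%}", "excludes 50%" if low > 0.5 or high < 0.5 else "includes 50%")
```

Because it works from exact binomial tails, this interval stays inside 0%-100% even for a handful of games, at the cost of being somewhat conservative (wider) than approximate intervals.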


Kevin Duke
United States Wynne Arkansas

Some kind of "understandable communications" class would be useful as well.


Paul DeStefano
United States Long Island New York
It's a Zendrum. www.zendrum.com

The samples are tiny. The margin of error in 15 games will be huge.
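To put a number on that: under the usual normal approximation, a 95% interval for a win rate has a half-width of at most 1.96 * sqrt(0.25 / n), reached when the true rate is 50%. A small Python sketch (the function name is hypothetical):

```python
import math


def worst_case_margin(games: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% interval at p = 0.5, the widest case."""
    return z * math.sqrt(0.25 / games)


for games in (15, 50, 100, 500):
    print(f"{games:>4} games: +/- {100 * worst_case_margin(games):.1f} percentage points")
```

This prints roughly 25.3 points for 15 games, 13.9 for 50, 9.8 for 100 and 4.4 for 500: the margin shrinks with the square root of the number of games, so quadrupling the games only halves it.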


Brian
Unspecified Unspecified

One thing to consider is that some of these missions were not intended to be balanced, but rather to show off a new rule or feature, as they are mostly tutorial in nature. For example, the whole point of the Complex Web mission seems to be to show the value of a creature (if they wanted it more balanced, they should have given it to the other side).
In fact, only the last couple seem to be "real missions", in the sense of not being tutorial in nature. Not surprisingly, they seem to be the most balanced of the lot.


Bob Gallagher
United Kingdom Halesowen
Spectrum is Green

kduke wrote: Some kind of "understandable communications" class would be useful as well.
Thanks Kevin!
My keyboard is now wearing the coffee I was drinking...


Niels Taatgen
Netherlands Groningen Groningen

Warpstorm wrote: One thing to consider is that some of these missions were not intended to be balanced, but rather to show off a new rule or feature, as they are mostly tutorial in nature. For example, the whole point of the Complex Web mission seems to be to show the value of a creature (if they wanted it more balanced, they should have given it to the other side).
In fact, only the last couple seem to be "real missions", in the sense of not being tutorial in nature. Not surprisingly, they seem to be the most balanced of the lot.
I agree, but it is nevertheless useful to know the odds, so you can give the side with the higher probability of winning to the novice player.


Niels Taatgen
Netherlands Groningen Groningen

OK, I made a graphic to better illustrate the issue. Below are all the adventures in BattleLore. The dot in each row represents the current estimate of the probability that the standard bearers will win. Of course, this can be a bad estimate, because it is based on only a few games, and random chance affects its accuracy. So the bars to the left and right indicate the range that the real probability is in. As we are dealing with statistics, nothing is certain, but there is a 95% probability that it is within this range.
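The original graphic isn't reproduced here. As a rough plain-text stand-in, the sketch below draws the same kind of row: the dot is the observed win rate and the dashes span the interval. The interval endpoints are taken as given (here, three rows from the May 1, 2007 update in this thread); the function name and layout are made up for illustration.

```python
def interval_row(low: float, estimate: float, high: float, width: int = 100) -> str:
    """One text row on a 0%-100% axis (one column per percentage point by default).

    '-' spans the interval, 'o' marks the point estimate, and '|' marks 50%
    ('+' when 50% lies inside the interval).
    """
    def col(p: float) -> int:
        return round(p * width)

    cells = [" "] * (width + 1)
    for i in range(col(low), col(high) + 1):
        cells[i] = "-"
    fifty = col(0.5)
    cells[fifty] = "+" if cells[fifty] == "-" else "|"
    cells[col(estimate)] = "o"
    return "".join(cells)


# (adventure, wins, games, low, high) as posted in the May 1, 2007 update in this thread
ROWS = [
    ("Agincourt", 297, 497, 0.55, 0.64),
    ("First Chevauchee", 139, 275, 0.45, 0.56),
    ("Crisis in Avignon", 43, 108, 0.31, 0.49),
]

for name, wins, games, low, high in ROWS:
    print(f"{name:<18} {interval_row(low, wins / games, high).rstrip()}")
```

A '+' on a row means the interval still contains 50%, so that adventure could be balanced; a separate '|' means it does not.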


gary rembo
United Kingdom brighton Alaska

Having played a lot of Memoir 44, I think that, as in Memoir, the trick is to alternate sides: whenever you play, play two games in a session, switching sides. This keeps things nice and fair and prevents the "your army was better than mine" syndrome. Myself, I love the challenge of trying to overcome a superior foe. The victory is that much sweeter.


Niels Taatgen
Netherlands Groningen Groningen

Update for January 3, 2006
With some more reports coming in, we can narrow the margins:
1. Agincourt: 91 out of 150, 53%-68%
2. First Chevauchee: 44 out of 79, 45%-66%
3. Burgos: 27 out of 40, 52%-80%
4. Deeper in Castille: 20 out of 27, 55%-87%
5. Wizards and Lore: 17 out of 37, 31%-62%
6. A complex web: 26 out of 38, 53%-81%
7. Crisis in Avignon: 8 out of 16, 28%-72%
8. A Burgundian Chevauchee: 5 out of 17, 13%-53%
9. Free companies on war footing: 9 out of 20, 26%-66%
10. Assaulting the Tourelles: 11 out of 16, 44%-86%
As you can see, the intervals for the early adventures are starting to narrow. Agincourt is definitely unbalanced, but not dramatically: something like 60/40 in favor of the French. First Chevauchee seems nicely balanced, but the two goblin adventures (Burgos and Deeper in Castille) favor the non-goblin player, although further data should show to what extent. Wizards and Lore appears to be pretty well balanced, but the first Spider adventure, A complex web, appears to favor the Spider player (or is it the non-goblin player?). The final four adventures need more data: all of them could still be balanced.


Stephan Rasmussen
Denmark Odense C

Having played the first 3 scenarios many times, I would agree with the statements, except that in my games the English tend to win a little more than the French in Agincourt. Scenario 2, "The First Chevauchee", is nicely balanced because each side has the same troops, and in scenario 3 I still have to find a winning strategy for the goblin side, so good work on that. The next step for your statistical analysis would be to somehow make a strength rating for each different unit: for example, how many green goblins would it take to equal one red infantry? If you can do that with your statistics knowledge, it would be awesome.
Once that is done, all you would have to do is type in what units each side has, and you would get how balanced the scenario is.
(Then all that remains to consider is victory conditions, strategy, terrain, number of command cards and so on.)


Niels Taatgen
Netherlands Groningen Groningen

Stradk wrote: The next step for your statistical analysis would be to somehow make a strength rating for each different unit: for example, how many green goblins would it take to equal one red infantry? If you can do that with your statistics knowledge, it would be awesome.
Once that is done, all you would have to do is type in what units each side has, and you would get how balanced the scenario is.
Well, my statistics knowledge is enough to know that such an analysis is as good as impossible. What you can do is pit one unit against another and figure out the odds that each unit will win, but I would rather use a simulation to calculate that. In a real game various factors interact, and the probability of winning will be a combination of setup and strategy. Simulation might do the trick here too, but you would need some AI to play each side in a way that is similar to human players.
This is of course all good news, because a game that can be analyzed too easily is probably not a very good game.
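To make the unit-versus-unit simulation idea concrete, here is a toy Monte Carlo sketch in Python. The combat model is entirely hypothetical and is NOT BattleLore's rules: each unit has some figures (hit points) and rolls a fixed number of dice, each die scores a hit with one fixed probability, and the two units alternate attacks with the first unit going first.

```python
import random


def duel_win_rate(first, second, hit_chance, trials=20_000, seed=1):
    """Estimate P(first unit eliminates second unit) under a toy dice model.

    first/second: (figures, dice) tuples. hit_chance must be > 0, otherwise
    no unit can ever lose a figure and the loop would never end.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        figures = [first[0], second[0]]
        dice = [first[1], second[1]]
        turn = 0  # index of the unit currently rolling
        while figures[0] > 0 and figures[1] > 0:
            hits = sum(rng.random() < hit_chance for _ in range(dice[turn]))
            figures[1 - turn] -= hits  # hits remove figures from the other unit
            turn = 1 - turn
        wins += figures[1] <= 0  # True counts as 1
    return wins / trials


print(duel_win_rate((4, 3), (4, 3), hit_chance=1 / 3))  # identical units
print(duel_win_rate((4, 3), (6, 3), hit_chance=1 / 3))  # tougher second unit
```

Even two identical units are not 50/50 under this model, because the first attacker gets an edge; interactions like that are part of why a single strength table would be hard to build.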


David desJardins
United States Burlingame California

niels wrote: Using the reported wins and losses on the Days of Wonder site, it is possible to calculate whether scenarios are likely to be balanced. What I did was use the binomial function to calculate the 95% confidence interval. That is, for each adventure I will give a range of percentages, and there is a 95% probability that the true win percentage is in that range.
With all due respect, that is NOT what a confidence interval means. This is, unfortunately, the most misunderstood subject in all of statistics.
You can't compute the probability that the win percentage is in any particular interval, from the data you have. You would need to have the prior probability distribution of the win percentage, which you don't.
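For comparison, here is what the Bayesian route looks like if one assumes a uniform prior on the win probability (an assumption, since the true prior is unknown). With k wins in n games the posterior is Beta(k + 1, n - k + 1), and for whole-number parameters its CDF equals a binomial tail probability, so Python's standard library is enough. The helper names are hypothetical.

```python
import math


def binom_tail(at_least: int, n: int, p: float) -> float:
    """P(X >= at_least) for X ~ Binomial(n, p), with 0 < p < 1."""
    total = 0.0
    for j in range(at_least, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_term)
    return total


def posterior_cdf(x: float, wins: int, games: int) -> float:
    """CDF at x of the Beta(wins + 1, games - wins + 1) posterior (uniform prior).

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a) for integer a, b.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return binom_tail(wins + 1, games + 1, x)


def prob_standard_favoured(wins: int, games: int) -> float:
    """Posterior probability that the standard bearers' win chance exceeds 50%."""
    return 1.0 - posterior_cdf(0.5, wins, games)


def credible_interval(wins: int, games: int, mass: float = 0.95, iters: int = 60):
    """Equal-tailed credible interval, found by bisecting the posterior CDF."""
    def quantile(q: float) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if posterior_cdf(mid, wins, games) < q:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    tail = (1 - mass) / 2
    return quantile(tail), quantile(1 - tail)


print(credible_interval(66, 107), prob_standard_favoured(66, 107))
```

Only under the stated prior do these numbers mean "the probability that the win percentage is in this range"; a different prior gives different numbers.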


Matthew M
United States New Haven Connecticut
8/8 FREE, PROTECTED
513ers Assemble!

DaviddesJ wrote: niels wrote: Using the reported wins and losses on the Days of Wonder site, it is possible to calculate whether scenarios are likely to be balanced. What I did was use the binomial function to calculate the 95% confidence interval. That is, for each adventure I will give a range of percentages, and there is a 95% probability that the true win percentage is in that range. With all due respect, that is NOT what a confidence interval means. This is, unfortunately, the most misunderstood subject in all of statistics. You can't compute the probability that the win percentage is in any particular interval, from the data you have. You would need to have the prior probability distribution of the win percentage, which you don't.
I imagine niels is taking the average result and creating an interval by going two standard deviations above and below that mean. So it's really a credible interval rather than a confidence interval. But only stat geeks will care about the distinction.
MMM


Niels Taatgen
Netherlands Groningen Groningen

DaviddesJ wrote: niels wrote: Using the reported wins and losses on the Days of Wonder site, it is possible to calculate whether scenarios are likely to be balanced. What I did was use the binomial function to calculate the 95% confidence interval. That is, for each adventure I will give a range of percentages, and there is a 95% probability that the true win percentage is in that range. With all due respect, that is NOT what a confidence interval means. This is, unfortunately, the most misunderstood subject in all of statistics. You can't compute the probability that the win percentage is in any particular interval, from the data you have. You would need to have the prior probability distribution of the win percentage, which you don't.
I guess you are right. What I should have said is that if we had taken many samples of the same size, 95% of the confidence intervals would have contained the true probability. If I have time left over, I will calculate the proper distributions assuming a uniform prior. But will anyone still understand me?
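That repeated-sampling statement can be checked by simulation. The sketch below fixes a true win probability, draws many samples of the same size, builds the basic normal-approximation (z) interval for each, and counts how often the interval contains the true value. The true rate, sample size, seed and function names are arbitrary choices for illustration.

```python
import math
import random


def z_interval_covers(wins: int, games: int, true_rate: float, z: float = 1.96) -> bool:
    """Does the basic normal-approximation interval for wins/games contain true_rate?"""
    p_hat = wins / games
    half = z * math.sqrt(p_hat * (1 - p_hat) / games)
    return p_hat - half <= true_rate <= p_hat + half


def coverage(true_rate: float, games: int, repeats: int = 4000, seed: int = 3) -> float:
    """Fraction of repeated samples whose interval contains the fixed true_rate."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(repeats):
        wins = sum(rng.random() < true_rate for _ in range(games))  # one simulated sample
        covered += z_interval_covers(wins, games, true_rate)
    return covered / repeats


print(coverage(0.6, 100))
```

The true value stays fixed throughout; it is the intervals that vary from sample to sample, and the 95% describes how often they catch it.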


John Harley
Canada Toronto Ontario

This thread rocks.
I subscribed; hopefully there will be more updates as the DoW numbers pile in.


Niels Taatgen
Netherlands Groningen Groningen

Update for January 9, 2007
1. Agincourt: 119 out of 197, 53%-67%
2. First Chevauchee: 63 out of 112, 47%-65%
3. Burgos: 42 out of 61, 56%-79%
4. Deeper in Castille: 29 out of 42, 54%-81%
5. Wizards and Lore: 26 out of 52, 37%-63%
6. A complex web: 32 out of 49, 51%-77%
7. Crisis in Avignon: 11 out of 21, 32%-72%
8. A Burgundian Chevauchee: 7 out of 19, 19%-59%
9. Free companies on war footing: 12 out of 23, 33%-71%
10. Assaulting the Tourelles: 15 out of 26, 39%-74%
Conclusion: no major changes since the previous update, just some narrowing of the margins.


Joe Grundy
Australia Sydney NSW

One quick point... this technique is making assumptions about the games that are logged. While I confess I've never even seen BattleLore, it seems likely that at least some of the plays are logged by one pair of people playing rematches. It seems likely that in at least some cases there are two people playing the same scenario repeatedly with the same players taking the same sides.
Which means the relative skill of the players logging the results is likely a significant factor producing a win/loss skew in at least some cases.
By the way, the other way the original poster could be correct is if you change the word "calculate" to "formulate".
He would be correct to assume a binomial distribution of wins and losses.
And hence I believe he would also be correct (terminology quibbles aside) to assert that his intervals contain the "actual" win percentages with 95% probability. Except for (at least) the sampling bias I just noted.


Niels Taatgen
Netherlands Groningen Groningen

jgrundy wrote: One quick point... this technique is making assumptions about the games that are logged. While I confess I've never even seen BattleLore, it seems likely that at least some of the plays are logged by one pair of people playing rematches. It seems likely that in at least some cases there are two people playing the same scenario repeatedly with the same players taking the same sides.
Which means the relative skill of the players logging the results is likely a significant factor producing a win/loss skew in at least some cases.
The rules encourage alternating sides when replaying a scenario, so I don't think that there is a danger that the percentages are biased too much by that. The bias that I am worried about is the fact that more experienced players may tend to pick the side with the lower odds, giving the beginners the better odds. If that happens systematically, balance problems will be underestimated.
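A toy simulation shows the direction of that bias. The model is hypothetical: the standard bearers win with probability `true_rate`, except in a share of games (`expert_share`) where the stronger player has taken the pennant side, which lowers that probability by `skill_edge`.

```python
import random


def logged_win_rate(true_rate: float, skill_edge: float, expert_share: float,
                    games: int = 100_000, seed: int = 7) -> float:
    """Simulated standard-bearer win rate as it would appear in the logs.

    Toy model (hypothetical): in a fraction expert_share of games the stronger
    player takes the pennant side, lowering the standard bearers' chance by skill_edge.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        p = true_rate
        if rng.random() < expert_share:
            p = max(0.0, p - skill_edge)  # experienced player on the weaker side
        wins += rng.random() < p
    return wins / games


print(logged_win_rate(0.60, 0.0, 0.0))   # no bias: close to 0.60
print(logged_win_rate(0.60, 0.10, 0.5))  # biased: close to 0.55
```

Under this model the logged rate is about true_rate - skill_edge * expert_share, so a real 60/40 imbalance can show up as roughly 55/45, which is the underestimation described above.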


Ryan Langton
United States Unspecified Unspecified

niels wrote: Update for January 3, 2006
Agincourt is definitely unbalanced, but not dramatically, something like 60/40 in favor of the French.
You do realize that many people play Agincourt with the full rules (battle back, take ground, pursuit), even though the text says not to use them? Without these rules, I'd say Agincourt is balanced.


tomletermite
United States ft lauderdale Florida
I think we're going to need a bigger tub.

Hey all,
Is there a way to have updated stats with the new official scenarii?
(me? picky? nooooooo ^^ )


Niels Taatgen
Netherlands Groningen Groningen

Update for May 1, 2007
1. Agincourt: 297 out of 497, 55%-64%
2. First Chevauchee: 139 out of 275, 45%-56%
3. Burgos: 110 out of 181, 54%-68%
4. Deeper in Castille: 123 out of 164, 68%-81%
5. Wizards and Lore: 119 out of 231, 45%-58%
6. A complex web: 98 out of 161, 53%-68%
7. Crisis in Avignon: 43 out of 108, 31%-49%
8. A Burgundian Chevauchee: 38 out of 81, 36%-58%
9. Free companies on war footing: 48 out of 88, 44%-65%
10. Assaulting the Tourelles: 44 out of 83, 42%-63%
With many more data points for all of the adventures, the earlier conclusions have held, with one exception: Crisis in Avignon now seems to favor the pennant bearers. Otherwise, five of the ten adventures are clearly balanced or close to balanced.


Niels Taatgen
Netherlands Groningen Groningen

Here are some statistics for the new adventures. I haven't included the Epic adventures yet, because there are too few data points for them.
Two bridges: 24 out of 41, 43%-72%
Crossing the Rhone: 18 out of 29, 44%-77%
Hill Camp: 33 out of 62, 41%-65%
West of the Rhone: 14 out of 27, 34%-69%
Brignals: 6 out of 19, 15%-54%
The battle of Brignals: 5 out of 7, 36%-92%
The battle of Lewes: 11 out of 19, 36%-77%
Conclusion: all of them could still be balanced, but more data will tell.


tomletermite
United States ft lauderdale Florida
I think we're going to need a bigger tub.

Thanks for the update.




Thanks for the stats, Niels! Very interesting info. I agree that, aside from some sampling/response bias, these intervals are good to go by. It's just a basic one-proportion z confidence interval on a binomial variable, and I don't see anything unusual here.
I'll have to run these by the AP Stats kids I teach at my high school. This would be a good real-life example of stats in action.
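For reference, the basic one-proportion z interval is p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n). A minimal Python version (the function name is hypothetical) also shows its known weak spot with very small samples:

```python
import math


def z_interval(wins: int, games: int, z: float = 1.96):
    """One-proportion z (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = wins / games
    half = z * math.sqrt(p_hat * (1 - p_hat) / games)
    return p_hat - half, p_hat + half


print(z_interval(43, 108))  # Crisis in Avignon, May 1 figures
print(z_interval(5, 7))     # The battle of Brignals: the upper end is above 1
```

For 5 wins out of 7 the upper end comes out at about 1.05, an impossible win rate, so for the adventures with very few games a different interval (for example Wilson or Clopper-Pearson) is the safer choice.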


Jacques Marcotte
United States Chicago IL

Do you have any further updates? Or do you have a spreadsheet or program that you used to calculate these? I'd love to be able to run it myself. I'm a bit of a stats geek and love the numbers. I'd also love to do these on my own and learn a bit more about the process... in the process.
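The thread doesn't say which tool Niels used, so here is an independent Python sketch rather than his program. It uses the Wilson score interval (a common choice for win rates that, unlike the basic z interval, always stays inside 0%-100%) on the May 1, 2007 counts from this thread, and labels each adventure by whether its interval excludes 50%. The names `RESULTS`, `wilson_interval` and `verdict` are hypothetical.

```python
import math

# (adventure, standard-bearer wins, games) from the May 1, 2007 update in this thread
RESULTS = [
    ("Agincourt", 297, 497),
    ("First Chevauchee", 139, 275),
    ("Burgos", 110, 181),
    ("Deeper in Castille", 123, 164),
    ("Wizards and Lore", 119, 231),
    ("A complex web", 98, 161),
    ("Crisis in Avignon", 43, 108),
    ("A Burgundian Chevauchee", 38, 81),
    ("Free companies on war footing", 48, 88),
    ("Assaulting the Tourelles", 44, 83),
]


def wilson_interval(wins: int, games: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = wins / games
    denom = 1 + z * z / games
    centre = (p_hat + z * z / (2 * games)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


def verdict(low: float, high: float) -> str:
    """Label an interval by where it sits relative to 50%."""
    if low > 0.5:
        return "favors standard bearers"
    if high < 0.5:
        return "favors pennant bearers"
    return "could be balanced"


for name, wins, games in RESULTS:
    low, high = wilson_interval(wins, games)
    print(f"{name:<30} {wins:>3}/{games:<3} {low:5.0%}-{high:4.0%}  {verdict(low, high)}")
```

For Crisis in Avignon (43 out of 108) it gives about 31%-49%. Swapping in another interval function only means changing the call inside the loop.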



