Subject: Incan Gold Experiment (discussed in 185)

Geoffrey Engelstein
Ludology Host and Dice Tower Contributor
As I've mentioned, Dr. Stephen Blessing at the University of Tampa is doing an experiment about how theme affects how players take chances in a push-your-luck game.

There will be three different versions: cave exploration (the original Incan Gold), rescuing people from a fire, and unthemed.

He just sent me some of the in-process artwork for these versions:








Predictions on whether people will play the game differently with different themes?

Jerry Reece
I predict people will play the original version most aggressively, followed by the themeless version, and finally the fire fighting version least aggressively.

At least that’s how I would do it. I’m very aggressive playing Diamant, but if I was rescuing people or pets, I’d be more likely to get them out safely. And a themeless game likely wouldn’t entice me to be as aggressive.

hibou
I don't think there will be much of a difference between the original and the unthemed version. I think the firefighter version will probably have the greatest standard deviation in scores, because people will either want to go all in and try to rescue "everyone" or be more focused on keeping people safe, like Jerry said.

Have they talked about their sample size? I find it unlikely that they'd be able to get enough of one for the results to be truly significant.

Matt Wolfe
Are they doing this electronically or at a table with a physical version?

I am curious if the medium will influence the outcomes.
 
Geoffrey Engelstein
Electronically.
 
Geoffrey Engelstein
ljtrigirl wrote:
Have they talked about their sample size? I find it unlikely that they'd be able to get enough of one for the results to be truly significant.
I believe they are having over 100 participants.
 
hibou
engelstein wrote:
I believe they are having over 100 participants.
Hmm. Not sure that's really enough, but it'll be interesting to see the outcome anyways!

Rob Tarr
ljtrigirl wrote:
Hmm. Not sure that's really enough, but it'll be interesting to see the outcome anyways!
I was doing some statistical research at work a couple years back and read some studies discussing the confidence of *comparing* two items (rather than a confidence interval of a *specific* value).

We were trying to determine if, based on user data, one question on an exam was "better" than another one. I would have needed over 100 samples to determine an actual numeric rating value for each question, which I would then compare, but I needed only about 30-40 to be confident that item A was "better" (would have a higher value) than item B.

I found that fascinating.

Aleph Bias
This reminded me of something I've seen mentioned many times: in the game Flash Point: Fire Rescue, a lot of people will go out of their way to ensure that the puppy or cat gets rescued, sometimes leaving adults or children (in game!) where they are.
 
Felix the Lobsterous Languste
tarrkid wrote:
I was doing some statistical research at work a couple years back and read some studies discussing the confidence of *comparing* two items (rather than a confidence interval of a *specific* value).

We were trying to determine if, based on user data, one question on an exam was "better" than another one. I would have needed over 100 samples to determine an actual numeric rating value for each question, which I would then compare, but I needed only about 30-40 to be confident that item A was "better" (would have a higher value) than item B.

I found that fascinating.
Do you still have the documents from that? This sounds pretty interesting, and I am wondering how you found that out! Especially the concrete numbers you mentioned :b
 
Rob Tarr
Languste wrote:
Do you still have the documents from that? This sounds pretty interesting, and I am wondering how you found out that! Especially the concrete numbers you mentioned :b
Sadly, no. I have some pages saved from the beginning of the project, but once I got that deep into it, I wasn't saving them anymore. :(

Felix the Lobsterous Languste
tarrkid wrote:
Sadly, no. I have some pages saved from the beginning of the project, but once I got that deep into it, I wasn't saving them anymore. :(
That is sad, but luckily you still have the info in your head
 
Stephen Blessing
Hi all! I'm Stephen Blessing, the researcher doing the experiment. Interesting discussion going on here, and I'm just as curious to see the results as you all are! I'm not going to be too surprised if it does end up being a wash (that is, no difference between themes), but I think it's worthwhile to do the experiment. This actually harkens back to a study I did a number of years ago, as an undergraduate in the honors program at U. of Illinois, which showed that "theme" affected how successful people were at solving algebra problems (i.e., same underlying mathematical formulas, but different content; see Blessing & Ross, 1996).

In addition to the main manipulation of theme, I'm also having participants fill out 3 personality-style surveys, to assess things like impulsiveness and how motivated they are by reward v. punishment. My guess is that's where the differences are going to lie, if any, in how people approach the different versions. For those of you who are statistically minded, I think the coolest result would be if there was an interaction between, say, impulsiveness and theme, meaning that people who are more impulsive are even more so in a particular theme. But I'm not holding my breath on that one.

For those interested in how the program actually works, the participants play 4 games of 5 rounds each. The computer interface allows for pretty quick play, so this all happens in just under an hour. The decks for each round have been pre-determined: every participant, regardless of theme, will see the equivalent cards in the same order. And the computer players will always make the same decisions. This will allow for easy comparison between themes. As an aside, we've categorized the decks into three different types: "typical," "risk," and "reward." "Typical" decks are those where you see mid-range rewards with hazards at "typical" intervals (in Incan Gold, a round lasts just under 8 cards on average before a 2nd matching hazard ends it). "Risk" decks have more hazards show early on, and "reward" decks have higher-value cards show first. So, we can look to see if deck type influences decisions as well. The program records every move the participant makes, the time since the last move, and also has them estimate the scores of all players at the end of each round.
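
As a rough illustration of what a "typical" interval means, the expected round length can be estimated by shuffling a deck many times and counting cards until a hazard type repeats. This is only a sketch and not the study's software: the deck composition (15 treasure cards, 5 hazard types with 3 copies each, 1 artifact) and all function names are assumptions, and the count here includes the hazard card that ends the round.

```python
import random

def build_deck(n_treasure=15, hazard_types=5, copies_per_hazard=3, n_artifacts=1):
    """Build an unshuffled deck. The default counts are assumptions about Incan Gold."""
    deck = ["treasure"] * n_treasure + ["artifact"] * n_artifacts
    for h in range(hazard_types):
        deck += [f"hazard-{h}"] * copies_per_hazard
    return deck

def round_length(deck):
    """Number of cards revealed up to and including the second hazard of any one type."""
    seen = set()
    for position, card in enumerate(deck, start=1):
        if card.startswith("hazard"):
            if card in seen:
                return position
            seen.add(card)
    # Only reached if no hazard type appears twice in the deck.
    return len(deck)

def mean_round_length(trials=20000, seed=0, **deck_kwargs):
    """Monte Carlo estimate of the average round length over random shuffles."""
    rng = random.Random(seed)
    base = build_deck(**deck_kwargs)
    total = 0
    for _ in range(trials):
        deck = base[:]
        rng.shuffle(deck)
        total += round_length(deck)
    return total / trials
```

Calling mean_round_length() with different n_artifacts values shows how sensitive the average is to the assumed deck. The experiment itself used fixed, pre-determined decks rather than random shuffles.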

For a simple game, there were a lot of choices to be made in terms of how the game here is actually played. For this first experiment, I wanted the games to be as equivalent as possible between themes. Depending on what these results are, I can think of a number of follow-up studies.

Stephen Blessing
To address the concern about number of participants, I don't have enough data to make a truly informed power analysis. I would be just guessing at the expected standard deviations within the conditions. And I'm a bit leery of such analyses to begin with. I take my bias about such things from Herbert Simon, a Nobel prize winner (for economics, 1978; I'll name drop here and say that he was on my dissertation committee at CMU). He would state that psychologists should be concerned with meaningful differences, and meaningful differences are those that tend to be readily apparent. If there's not a difference with 30-40 people in a condition, then I'm going to say it's not really a meaningful one. Indeed, most meaningful differences can be picked up statistically with 25+ people (and maybe fewer) in a condition. Of course, that's not to say that there aren't some important differences that are more subtle, and that might require a larger n to detect.
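
For readers who want to see how group size and effect size trade off, here is a minimal sketch of an approximate power calculation. It is not the study's analysis: it simplifies to a two-group, two-sided comparison of means, uses a normal approximation rather than an exact t-test, and the effect sizes (Cohen's d) you plug in are assumptions, since, as noted above, the real standard deviations were unknown in advance. The function names are hypothetical.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (Cohen's d); the normal
    approximation slightly overstates power for small groups.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the test statistic falls beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def n_for_power(effect_size, target=0.8, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while approx_power(effect_size, n, alpha) < target:
        n += 1
    return n
```

Under this approximation, a large standardized effect (d = 0.8) needs roughly 25 people per condition for 80% power, which is in line with the rule of thumb above, while subtler effects need considerably more.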

Federico Latini
First of all, Stephen, I listen to your podcast and, well... I like it a lot, so keep rolling.
To answer Geoff's initial question in this thread, my bet is that you'll see little to no difference in your tests.
You need to use photos of people, not gold medals, to trigger an emotional response. And even if you do this, the number of people-counters you'd need in order to have values high enough to split among 4 players is too detaching to shift people's behavior: the death of many is unimaginable for people's minds, the death of one is a tragedy.
That said, I look forward to being wrong! :-)

Stephen Blessing
You hit upon why we went with what Geoff suggested in his GameTek podcast, to use honor points. Saving "17 lives" with one card seemed a little silly. And, re-balancing "gems" in the one version for "lives" in the other version would then change the game too much. So, for this first time out, we have this abstraction of "honor points" and playing up the saving of lives in the little bit of story/instruction the player gets at the beginning (and likewise, playing up wanting treasure in the Incan Gold version). That does allow you to keep everything the same except the theme.

If we get what you expect, no difference in theme, then I'm considering a mechanism in the fire fighter version where so many honor points = 1 life saved. Then in the cave version, so many gems = 1 special artifact or some such.

I am hoping there is something interesting in the data, if not about theme, then either about personality, or about time to make decisions, or about how people make decisions when they are in one situation or another (i.e., a lot of points on the line v. a lot of hazards coming out early).

Thanks for listening to the podcast!
 
Daniel Solis
Hi y'all! Just curious if there were published results for this study available yet. I'm very curious to see how it turned out.

Geoffrey Engelstein
There's going to be a poster presented soon - and the results were pretty interesting, I thought.

I'll see if Stephen wants to post about them - otherwise I'll chime in.

Stephen Blessing
Hi all! Thanks for checking up on the project.

As Geoff indicated, we ended up with some interesting results. None of the personality measures predicted game performance, BUT there was a small, persistent and significant effect of content on gameplay. That is, there were differences in gameplay measures depending on if people played the adventurer theme, the firefighter theme, or the abstract version. In general, the firefighters were the riskiest, the abstract players less so, and the adventurers in between. For instance, there's a significant interaction in how many times the "Continue" card gets played between the 3 theme conditions and game number (firefighters riskiest in Game 1, abstract riskiest in Game 4). And firefighters played "Continue" more during the other two games as well. (Game 1 was identical to Game 4 in terms of the draw deck, so that's a particularly nice comparison across time/experience.) I'll put a link to the poster if you want the details. It's being presented this weekend at an undergrad research conference by my RA, Elena Sakosky, who did the artwork for the firefighter version and tested all the participants (among other things!).

http://bit.ly/FURC2019

Sakosky, E., Blessing, S.B., & Engelstein, G. (February, 2019). Fighting fire v. hunting treasure: Examining how context affects decision making. Poster presented at the 9th Florida Undergraduate Research Conference (FURC), Jacksonville, FL.

We're currently doing a follow-up study to 1) replicate these findings, and also 2) see if we can push them a little bit by making the firefighter one really about saving lives (so many honor points means saving this many lives; there are equivalent changes to the adventurer and abstract themes). If we can replicate and see what the extension adds to the story, we'll look at writing up more fully and publishing the whole package. For now, it's a great and interesting finding for Elena to present this weekend.

Again, thanks for the interest!

Geoffrey Engelstein
Here's another chart. It shows that people playing the Firefighter game consistently took more risks than those playing the Treasure Hunter game.



Note that for each Game (1-4 on the x-axis), the deck was stacked exactly the same for every theme.

Geoff

Confusing Manifestation
Those are some really fascinating results, so congratulations! Does that mean that each separate group of players played through games 1-5, always with the same theme?

Stephen Blessing
Yes, once a participant was randomly assigned to a theme, they played all their games in that theme. Each participant played 4 games, where each game had 5 rounds. The order of the treasure and hazard cards was pre-determined, as were the turns at which the computer players dropped out, to eliminate that source of variance from the situation. That makes it possible (and relatively easy) to do an apples-to-apples comparison of how the firefighters played Game 1 versus how the adventurers and abstracts did in Game 1, because Game 1 was exactly the same for everyone. Same for Games 2-4 (and Game 4 was identical to Game 1).
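
Because every theme saw the same decks, a like-for-like comparison reduces to grouping each participant's "Continue" count by theme and game number. The sketch below shows one way that tabulation might look. The record layout (theme, game number, continue count) and the function names are hypothetical, not the format the study's program actually logged.

```python
from collections import defaultdict

def mean_continues(records):
    """Average "Continue" plays per (theme, game) cell.

    records: iterable of (theme, game_number, continue_count) tuples,
    one per participant per game (hypothetical layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for theme, game, n_continue in records:
        sums[(theme, game)] += n_continue
        counts[(theme, game)] += 1
    return {key: sums[key] / counts[key] for key in sums}

def theme_gap(means, game, theme_a, theme_b):
    """Difference in mean "Continue" plays between two themes on the same game."""
    return means[(theme_a, game)] - means[(theme_b, game)]
```

Since the deck for a given game is fixed, any gap returned by theme_gap for that game reflects differences between the groups of players rather than differences in the cards they saw. Whether a gap is statistically reliable is a separate question that this tabulation does not answer.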

Gil Hova
Answers will be revealed soon!

Daniel Solis
Hype!