Kaj_Sotala comments on Open Thread, Apr. 27 - May 3, 2015 - Less Wrong

Post author: Gondolinian 27 April 2015 12:18AM 3 points

Comment author: Kaj_Sotala 27 April 2015 05:07:51PM 12 points

I've managed to get my Bayes RPG into a state where, although it still isn't very interesting as a game, it's moderately entertaining for a short while until you master it, and it seems like it should produce some actual learning.

I made this game my MSc thesis topic as a way to force myself to work on it, but I'm now finally getting to the point where a) working on it is fun enough that I don't need an external motivator, and b) I'd like to actually graduate. So I'll take what I have so far, run it on a bunch of test subjects, see if they learn anything, and write up the results in my thesis. Then I'll continue working on the game in my spare time.

But I'd like to do the empirical part of the thesis properly. Since LW has a bunch of people who know a lot about statistics, I'd like to ask LW: what kinds of statistical tests would be most appropriate for measuring the results?

To elaborate on the test setup: I expect to go with the standard approach. Have some task that measures understanding of something we want the game to teach, and split people into an intervention group and a control group. Have them complete the task first, dropping anyone who does too well on this pre-test, and then carry out the intervention (i.e. have them either play the game or do some "placebo" task, depending on their group). Then have them redo a new version of the original task and see whether the intervention group has improved more than the control group has.
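One possible way to analyse this setup, offered as a sketch under my own assumptions rather than a settled recommendation: compute each subject's gain (post-test minus pre-test), drop anyone at or above a pre-test cutoff, and compare the game group's mean gain with the control group's using a one-sided permutation test. This swaps in a permutation test for the ANOVA the post asks about; ANCOVA with the pre-test score as a covariate is another common choice. The function names, the 0-6 scoring scale, the cutoff of 6 and all the scores are hypothetical.

```python
import random
from statistics import mean


def exclude_ceiling(pre, post, cutoff):
    """Drop subjects whose pre-test score is already at or above `cutoff`."""
    kept = [(a, b) for a, b in zip(pre, post) if a < cutoff]
    return [a for a, _ in kept], [b for _, b in kept]


def gains(pre, post):
    """Per-subject improvement: post-test score minus pre-test score."""
    return [b - a for a, b in zip(pre, post)]


def permutation_test(game_gains, control_gains, n_iter=10_000, seed=0):
    """One-sided permutation test of whether the game group's mean gain is larger.

    Returns (observed difference in mean gains, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(game_gains) - mean(control_gains)
    pooled = list(game_gains) + list(control_gains)
    n_game = len(game_gains)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # relabel subjects at random, as if the game did nothing
        if mean(pooled[:n_game]) - mean(pooled[n_game:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)  # +1 keeps the estimate above zero


# Hypothetical scores on a 0-6 task; a pre-test score of 6 counts as "already knows it".
game_pre, game_post = [2, 3, 1, 4, 2, 3], [5, 6, 4, 6, 5, 5]
control_pre, control_post = [2, 3, 2, 1, 3, 2, 6], [3, 3, 2, 2, 4, 3, 6]

game_pre, game_post = exclude_ceiling(game_pre, game_post, cutoff=6)
control_pre, control_post = exclude_ceiling(control_pre, control_post, cutoff=6)
diff, p = permutation_test(gains(game_pre, game_post), gains(control_pre, control_post))
print(f"difference in mean gain = {diff:.2f}, one-sided p = {p:.4f}")
```

With these made-up numbers the one control subject who scored 6 on the pre-test is dropped, the game group gains 2 points more on average, and the estimated p-value should land close to 1/924 (about 0.001), since only one of the 924 ways of splitting the twelve pooled gains into two groups of six is as extreme as the observed split.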

I don't want to elaborate too much on the tasks we'll give to the subjects, in case I end up recruiting someone who reads this as a test subject. But you can expect the standard mammography/cancer problem to be there, since it's such a classic in the literature, even though it's not what I'd expect the current version of the game to be most successful at teaching. There will also be a task on a subject that I do expect the game to currently be good at teaching. Finally, there will be one task where I'd expect the improvement to have a bimodal distribution, since the game doesn't force you to pay attention to the relevant aspect: I'd expect some types of players to pay attention to it and others to ignore it.

Additionally, I'd like to test things like:

  • giving the players a relatively challenging in-game goal and seeing whether completing that challenge correlates with learning results
  • asking all the players to play for at least X minutes but optionally allowing them to play for longer, and seeing whether the amount of time spent playing has any connection to the learning results
  • after they've played the game, having the players rate it on some Likert-like scales on questions like whether they enjoyed the game, whether it was too easy or too hard, whether they'd like to play it again, etc., and again checking whether the correlations are as expected.

So, what statistical tests should I use here? I don't actually have much experience with statistics. I guess the naive approach would be to use some (which?) form of ANOVA to test whether the means of the pre-test, control intervention, and game intervention populations are the same, and then just compute Spearman's correlation between every pair of numerical items I've collected and see whether any statistically significant results pop up. Is that fine? Neither of those tests is going to pick up on the hypothesized bimodal distribution in the improvement on one of the tasks, but I might not bother digging too deeply into that.
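On the "correlate every numerical item" plan: with many pairs, some will look significant purely by chance, so a multiple-comparison correction seems worth adding. Below is a minimal stdlib-only sketch, assuming Python 3: Spearman's rho computed as the Pearson correlation of tie-averaged ranks, a permutation p-value for it, and the Holm-Bonferroni step-down correction applied across all the tests run. The helper names, the permutation count and the example data are illustrative choices; in practice `scipy.stats.spearmanr` gives the correlation with a p-value, and `statsmodels.stats.multitest.multipletests(..., method="holm")` does the correction.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (undefined if either input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_pvalue(x, y, n_iter=5000, seed=0):
    """Two-sided permutation p-value for Spearman's rho."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(y_perm)
        if abs(spearman(x, y_perm)) >= observed - 1e-12:  # tolerance for float noise
            hits += 1
    return (hits + 1) / (n_iter + 1)


def holm(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down: which hypotheses are rejected at family-wise level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = [False] * m
    for step, i in enumerate(order):
        if pvalues[i] > alpha / (m - step):
            break  # this p-value and all larger ones are retained
        rejected[i] = True
    return rejected


# Hypothetical per-player data: minutes played, Likert enjoyment rating, improvement.
minutes = [20, 25, 30, 35, 40, 45, 50, 60]
enjoyment = [3, 1, 4, 1, 5, 2, 3, 2]
gain = [0, 1, 1, 2, 1, 3, 2, 4]
pvals = [spearman_pvalue(minutes, gain), spearman_pvalue(enjoyment, gain)]
print([round(pv, 4) for pv in pvals], holm(pvals))
```

The key design point is that Holm is applied to the whole family of p-values at once, so adding more exploratory correlations makes each individual one harder to call significant.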

Also, how do I decide how good a pre-test performance has to be before I conclude that the subject already knows the material too well to learn anything, and should thus be excluded from the analysis? Or should I even do that in the first place?

Comment author: Kaj_Sotala 27 April 2015 05:51:15PM 1 point

Additionally, I'm a little worried about the control group. I expect it's relatively easy to recruit people to play a game and have them be motivated to play it, but if I tell people "oh, but you may be randomly assigned to the control condition, where you're given more traditional math instruction instead", I expect that will reduce participation. And even the people who do show up regardless may not be particularly motivated to actually work on the problems if they are assigned to the control condition, especially given that I'm also hoping to educate people who'd usually avoid maths. How insane would it be to just not have a control group?

Comment author: TylerJay 27 April 2015 11:17:03PM 5 points

> How insane would it be to just not have a control group?

Pretty insane in my opinion. I can't imagine anything I would grade more harshly than not having a control except ethics violations.

Besides, don't most university psychology experiments with volunteers keep the protocol secret throughout the whole experiment and then debrief at the end? (Or sometimes even lie about the protocol to avoid skewing the results?)

Alternatively, have you thought about doing a crossover-style design?

Take group A and group B. Group A plays your game, and then takes the test. Group B either just takes the test or goes through some traditional education lesson (or whatever else you want for your control) and then takes the test. Next, group A does the traditional education, group B does the game, and both take part 2 of the test.
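A minimal sketch of randomizing participants into the two sequences of this crossover design, assuming participants are identified by arbitrary string IDs. The `P01`-style IDs, the function name and the "control" label are placeholders for whatever you actually use. Shuffling first and then splitting in half keeps the groups balanced, with the odd participant, if any, going to group B.

```python
import random

# Activity in period 1, then period 2, for each sequence group.
# "control" stands for whatever comparison activity is chosen (placeholder).
SEQUENCES = {
    "A": ("game", "control"),
    "B": ("control", "game"),
}


def assign_crossover(participant_ids, seed=None):
    """Randomly split participants into sequences A and B (sizes differ by at most 1).

    Returns {participant_id: (sequence, period_1_activity, period_2_activity)}.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    schedule = {}
    for position, pid in enumerate(ids):
        sequence = "A" if position < half else "B"
        schedule[pid] = (sequence, *SEQUENCES[sequence])
    return schedule


schedule = assign_crossover([f"P{n:02d}" for n in range(1, 12)], seed=42)
for pid, (seq, first, second) in sorted(schedule.items()):
    print(pid, seq, first, "->", second)
```

Because everyone takes test part 1 after only their first activity, comparing A and B on part 1 is an ordinary two-group comparison; part 2 is where order effects, such as the game preparing people for traditional instruction, would show up.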

That way, everyone gets to play the game at least, though it means they're there for twice as long. Do you think you could pitch this in a way that is better than the "Maybe you play a game, maybe you don't" option?

You could potentially derive additional research value from this as well. If group A does better on Test Part 2, that would suggest your game is a good way of preparing people for traditional education on the subject, or something like that (I'm sure you can draw a better conclusion or phrase this better).

Just some thoughts. Also, make sure you write up a grading rubric ahead of time (or ideally, have someone else do it) and then have someone who knows nothing (or as little as possible) about the experiment (and especially the subjects) grade the answers to avoid researcher bias.
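For the blind-grading step, here is a sketch of one simple approach, assuming answers are collected per participant together with their condition: replace each participant ID with a random code, shuffle the answer sheets, and keep the code-to-participant key away from the grader until scoring is finished. The data layout and the function names are hypothetical.

```python
import random
import secrets


def blind_for_grading(answers):
    """Strip identities and conditions from answers before grading.

    answers: {participant_id: (condition, answer_text)}
    Returns (sheets, key): `sheets` is a shuffled list of (code, answer_text)
    to hand to the grader; `key` maps code -> (participant_id, condition) and
    stays with the experimenter until all grading is done.
    """
    sheets, key = [], {}
    for pid, (condition, text) in answers.items():
        code = secrets.token_hex(4)  # 8 random hex characters
        while code in key:  # regenerate on the (unlikely) collision
            code = secrets.token_hex(4)
        key[code] = (pid, condition)
        sheets.append((code, text))
    random.SystemRandom().shuffle(sheets)  # also hide submission order
    return sheets, key


def unblind(grades, key):
    """Re-attach identities: `grades` is {code: score} from the blind grader."""
    return {key[code][0]: (key[code][1], score) for code, score in grades.items()}


answers = {
    "P01": ("game", "(answer text)"),
    "P02": ("control", "(answer text)"),
}
sheets, key = blind_for_grading(answers)
```

The grader only ever sees `sheets`, so neither the participant's identity nor their condition can influence the score; `unblind` is run once all codes have been graded.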

Comment author: Kaj_Sotala 28 April 2015 06:43:00AM 1 point

> Pretty insane in my opinion. I can't imagine anything I would grade more harshly than not having a control except ethics violations.

I think there might be reasonable theoretical grounds for skipping it in this case, though? If I were testing, say, a medical treatment or a self-help technique, then yes, there should absolutely be a control group, since some people might get better on their own or just do better for a while because the self-help technique gave them extra confidence.

But suppose I give people a pre-test, have them play for some minimum time, and then have them fill out the post-test when they're done. I don't see much room for random chance to confound things here: either they know the things needed for solving the tasks, or they don't. If they didn't know enough to solve the problems on the first try, they're not going to suddenly acquire that knowledge in between.

> Besides, don't most university psychology experiments with volunteers keep the protocol secret throughout the whole experiment and then debrief at the end?

To some extent, but usually they still give some brief description of it beforehand, to attract people.

> Alternatively, have you thought about doing a crossover-style design?

That's a good idea, thanks.

Comment author: ChristianKl 28 April 2015 10:29:16AM 4 points

> But suppose I give people a pre-test, have them play for some minimum time, and then have them fill out the post-test when they're done. I don't see much room for random chance to confound things here: either they know the things needed for solving the tasks, or they don't. If they didn't know enough to solve the problems on the first try, they're not going to suddenly acquire that knowledge in between.

If I get a problem I can't solve, I can Google it afterwards and read about how to solve it. Even if you lock me in a dark room, there's still the possibility that I recover forgotten knowledge if you give my brain a few hours.

The pretest itself also provides practice. You need a control group, but it would be possible to give the control group nothing to do.
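To illustrate why a pretest alone can produce apparent improvement, here is a small simulation under made-up assumptions: each subject has a stable skill level, both tests measure it with noise, and simply retaking the test adds a fixed practice effect for everyone. Even when the game has zero effect, the game group's average pre-to-post gain comes out clearly positive, and only the comparison with the control group's gain reveals that the game did nothing. The function name, effect sizes, noise levels and sample size are all arbitrary.

```python
import random
from statistics import mean


def simulate(n=200, practice_effect=1.0, game_effect=0.0, seed=1):
    """Simulate pre/post scores where retaking the test adds `practice_effect` for everyone.

    Returns (mean gain in the game group, mean gain in the control group).
    """
    rng = random.Random(seed)

    def gain(extra):
        skill = rng.gauss(5, 1)                # stable underlying ability
        pre = skill + rng.gauss(0, 0.5)        # noisy first measurement
        post = skill + practice_effect + extra + rng.gauss(0, 0.5)
        return post - pre

    game = [gain(game_effect) for _ in range(n)]
    control = [gain(0.0) for _ in range(n)]
    return mean(game), mean(control)


g, c = simulate(game_effect=0.0)
print(f"game does nothing: game gain {g:.2f}, control gain {c:.2f}, difference {g - c:.2f}")
```

Rerunning with `simulate(game_effect=0.5)` uses the same random draws, so the difference between the two groups' gains should shift up by 0.5; that difference, not the raw gain, is what a controlled design estimates.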

Comment author: ChristianKl 27 April 2015 09:57:20PM 5 points

"Traditional math instruction" isn't the only possible control. I don't even think that you need to prove that your game is better than "Traditional math instruction". You could simply take any other game that includes a bit of math as control.

Maybe the Credence game.

Comment author: Kaj_Sotala 28 April 2015 07:37:28AM 3 points

Nice idea, thanks.

Comment author: afeller08 28 April 2015 07:02:21AM 4 points

If I were designing the experiment, I would have the control group play a different game instead of receiving maths instruction.

You generally don't want test subjects to know whether or not they are in the control condition. So if the control is going to be maths instruction, you probably shouldn't tell them what the experiment is designed to test at all until you debrief them at the end. If you tell the people you recruit that you are testing the effects of playing computer games on statistical reasoning, then the people in the control condition won't need to realize that what you're really testing is whether your RPG in particular helps people think about statistics. They can just play Half-Life 2, or whatever you pick for them, for a few minutes, and then take your tests afterwards.

Comment author: GuySrinivasan 27 April 2015 06:17:21PM 3 points

Do you have access to units of caring?

Are you trying to gain knowledge, get a piece of paper, both, or one as a side effect of the other?

"actually graduate" versus "see if they learn anything" might hugely inform your process. Off-the-cuff I'm guessing you want to actually graduate first with hopes of nice learning side effects, then see if they learn anything via something that takes longer.

Also a consideration: 3+ arms. Instruction game, instruction non-game, and non-instruction game. Also possibly non-instruction non-game.

Comment author: Kaj_Sotala 27 April 2015 06:36:14PM 1 point

> Do you have access to units of caring?

To some limited extent.

> Off-the-cuff I'm guessing you want to actually graduate first with hopes of nice learning side effects, then see if they learn anything via something that takes longer.

Correct.

Comment author: [deleted] 28 April 2015 09:54:42AM 2 points

If you didn't have any control group and observed an improvement between pretest and posttest, you wouldn't be able to interpret it: repetition or practice effects could explain any improvement. If you observed no improvement, you wouldn't need a control group, because there would be no effect to explain.

Sometimes exploratory studies start out with pilots that have no control group, just to see whether a method is potentially promising (if there are no hints of an effect, don't invest a lot of resources in setting up a proper study).

Sometimes studies like this are set up with multiple control groups to address specific concerns that may apply to individual control conditions. Here it seems like two would be the minimum: one in which participants play a different game that is expected to confer no benefit for learning; and another with some kind of more traditional instruction.

In cases like this, recruitment materials are usually kept vague: they give participants a realistic impression of the kinds of tasks they will be asked to do, and definitely no indication of who is assigned to a "control" group.

Comment author: Lumifer 27 April 2015 06:01:41PM 1 point

> How insane would it be to just not have a control group?

So, there is this blog/forum which tries to teach people rationality! and science! and proper ways to solve problems! It even hopes to raise the sanity waterline.

And then "oh, but it's inconvenient..." X-/

Comment author: Kaj_Sotala 27 April 2015 06:38:50PM 1 point

There's the extent to which I'm willing to go to raise the sanity waterline, and then there's the extent to which I'm willing to go for the sake of possibly improving my grade on a work whose final grade nobody will really ever care about.

Comment author: ChristianKl 28 April 2015 10:31:11AM 3 points

> There's the extent to which I'm willing to go to raise the sanity waterline, and then there's the extent to which I'm willing to go for the sake of possibly improving my grade on a work whose final grade nobody will really ever care about.

That might not be the most productive mindset. If you show that your game works at teaching Bayes, I would expect people to refer to your thesis from time to time.

Comment author: Lumifer 27 April 2015 06:47:36PM 2 points

In that case, I don't quite understand what you are asking.

LW is unlikely to know whether your adviser / committee will consider the absence of a control group acceptable enough for this project.

Comment author: Kaj_Sotala 28 April 2015 07:51:43AM 2 points

You're right, I wasn't very clear on my objectives. Also, my previous comment was needlessly snarky, for which I apologize.

To be honest, I'm not very sure of what I want, myself. I have reason to believe that they'll consider it acceptable regardless of whether there's a control group or not (this being the CS department and not the psych one), so that's not actually an issue. And I've got some desire to do things "properly", for its own sake, and also because it might be fun to do this well enough to turn it into a real publication. But I'm also swamped with a bunch of other stuff and don't have time to put too much effort into this.

So, I guess I dunno what I'm asking, myself.

Comment author: ChristianKl 28 April 2015 10:32:34AM 4 points

> To be honest, I'm not very sure of what I want, myself. I have reason to believe that they'll consider it acceptable regardless of whether there's a control group or not (this being the CS department and not the psych one)

How about going to the office hours of a professor in the psychology department and asking them for advice on how to run your study?

Comment author: Kaj_Sotala 29 April 2015 09:15:31AM 2 points

Your question made me go d'oh, in that I suddenly remembered that there's an obvious place right nearby to ask for help, both with designing the study and with recruiting test subjects. I'll talk with them, thanks.

Comment author: [deleted] 28 April 2015 10:21:05AM 2 points

Speaking very practically - who will be marking/grading your project?

If psychologists aren't going to be looking at it, it's surely going to be fine to do the intervention as best you can and then discuss implications and limitations (including the need for a control group) in whatever you have to write up. It's not going to be publishable, but you can deal with that later; depending on your circumstances, this would probably mean redoing the study with random assignment to conditions, using your project study as a pilot/proof of concept.

Comment author: Kaj_Sotala 29 April 2015 09:04:38AM 2 points

It's going to be graded by computer scientists, so yeah, I can get away with a less rigorous protocol than what psychologists would insist on. (And then collaborate with actual psychologists with more resources later on.)