Below is some advice on making D&D.Sci scenarios. I’m mostly yelling it in my own ear, and you shouldn’t take any of it as gospel; but if you want some guidance on how to run your first game, you may find it helpful.
1. The scoring function should be fair, transparent, and monotonic
D&D.Sci players should frequently be confused, but about how best to reach their goals, not about the goals themselves. By the end of the challenge, it should be obvious who won[1].
2. The scoring function should be platform-agnostic, and futureproof
Where possible, someone looking through old D&D.Sci games should be able to play them, and easily confirm their performance after the fact. As far as I know, the best way to facilitate this for most challenges is with an HTML/JS web interactive, hosted on GitHub.
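To make this concrete: the scoring logic itself can usually be boiled down to a few lines of pure, deterministic code. Below is a minimal Python sketch (the challenge, ingredient names, and payoffs are all invented), simple enough to port line-for-line into the JS of a static page:

```python
# Hypothetical scorer for a "pick three potion ingredients" challenge.
# Every name and payoff below is invented for illustration.

TRUE_VALUES = {
    "mandrake": 12, "wolfsbane": 9, "nettle": 4,
    "moonpetal": 15, "gravedust": -6, "toadstool": 2,
}

def score(picks: list[str]) -> int:
    """Deterministic, transparent, and monotonic: swapping any pick
    for a higher-valued ingredient can only raise the total."""
    if len(picks) != 3 or len(set(picks)) != 3:
        raise ValueError("pick exactly three distinct ingredients")
    return sum(TRUE_VALUES[p] for p in picks)

print(score(["moonpetal", "mandrake", "wolfsbane"]))  # 36, the optimum
print(score(["nettle", "toadstool", "gravedust"]))    # 0, clearly worse
```

Because the whole payoff table fits on one screen, anyone replaying the scenario years later can recompute their score by hand, which is most of what “futureproof” means here.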
3. The challenge should resist pure ML
It should not be possible to reach an optimal answer just by training a predictive model and looking at the output: if players wanted a “who can apply XGBoost/TensorFlow/whatever the best?” competition, they would be on Kaggle. The counterspell for this is making sure there’s a nontrivial amount of task left in the task after players have good guesses for all the relevant response variables, and/or creating datasets specifically intended to flummox conventional use of conventional ML[2].
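As a toy illustration of the second counterspell, here is one way a dataset can hide its signal in an interaction that a naively additive (e.g. linear) fit will miss entirely; the columns and setting are invented:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000

# A sword's power is high only when its enchantment matches the
# wielder's element; neither column is informative on its own.
df = pd.DataFrame({
    "enchantment": rng.choice(["fire", "ice"], n),
    "wielder":     rng.choice(["fire", "ice"], n),
})
matched = df["enchantment"] == df["wielder"]
df["power"] = np.where(matched, 10, 0) + rng.normal(0, 1, n)

print(df.groupby("enchantment")["power"].mean())               # both ~5
print(df.groupby(["enchantment", "wielder"])["power"].mean())  # ~10 vs ~0
```

Tree-based learners will find a lone interaction like this if used thoughtfully, so a real scenario would usually layer several such traps; footnote [2] lists other failure modes worth targeting.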
4. The challenge should resist simple subsetting
It should not be possible to reach an optimal answer by filtering for rows exactly like the situation the protagonist is (or could be) in: this is just too easy. The counterspell for this is making sure at least a few of the columns are continuous, and take a wide enough variety of values that a player who attempts a like-for-like analysis has to, at the very least, think carefully about what to treat as “basically the same”.
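A quick sketch of why a continuous column does the job (column name invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"dungeon_depth_m": rng.uniform(0, 500, 50_000)})

# Exact-match filtering collapses: essentially no row matches the
# protagonist's situation precisely...
print(len(df[df["dungeon_depth_m"] == 137.2]))           # almost surely 0

# ...so players must decide what counts as "basically the same".
print(len(df[df["dungeon_depth_m"].between(130, 145)]))  # ~1,500 rows
```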
5. The challenge should resist good luck
It should not be plausible[3] to reach an optimal answer through sheer good luck: hours spent poring over spreadsheets should not give the same results as a good dice roll. The counterspell for this is giving players enough choices that the odds of them getting all of them right by chance approach zero. (“Pick the best option from this six-entry list” is a bad goal; “Pick the best three options from this twenty-entry list” is much better.)
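The arithmetic behind that parenthetical:

```python
from math import comb

# Chance of a blind guess hitting the optimum, for each goal shape:
print(1 / comb(6, 1))   # "best of six": 1 in 6
print(1 / comb(20, 3))  # "best three of twenty": 1 in 1140
```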
6. Data should be abundant
It is very, very hard to make a good “work around the fact that you're short on data” challenge. Not having enough information to be sure whether your hypotheses are right is a situation which players are likely to find awkward, irritating, and uncomfortably familiar: if you’re uncertain about whether you should give players more rows, you almost certainly should. A five- or six-digit number of rows is reasonable for a dataset with 5-20 columns.
(It is possible, but difficult, to be overly generous. A dataset with >1M rows cannot easily be fully loaded into current-gen Excel; a dataset too large to be hosted on GitHub will be awkward to analyze on a home computer. But any dataset which doesn’t approach either of those limitations will probably not be too big.)
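If in doubt, both limits are easy to check before publishing. A back-of-the-envelope sketch (the dataset dimensions and bytes-per-field figure are invented; the Excel and GitHub limits are real):

```python
EXCEL_ROW_LIMIT = 1_048_576      # rows per worksheet in current-gen Excel
GITHUB_FILE_LIMIT = 100 * 10**6  # GitHub rejects single files over 100 MB

rows, cols = 200_000, 12         # hypothetical dataset dimensions
est_bytes = rows * cols * 10     # assuming ~10 bytes per CSV field

print(rows < EXCEL_ROW_LIMIT)         # True: fits in one sheet
print(est_bytes < GITHUB_FILE_LIMIT)  # True: ~24 MB hosts fine
```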
7. Data should be preternaturally (but not perfectly) clean
Data in the real world is messy and unreliable. Most real-life data work is accounting for impurities, setting up pipelines, making judgement calls, refitting existing models on slightly new datasets, and noticing when your supplier decides to randomly redefine a column. D&D.Sci shouldn’t be more of this: instead, it should focus on the inferential and strategic problems people can face even when datasets are uncannily well-behaved.
(It is good when players get a chance to practice splitting columns, joining dataframes, and handling unknowns: however, these subtasks should not make up the meat of a challenge.)
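For instance, a scenario might include exactly one of each subtask; in pandas terms, something like this (all data invented):

```python
import pandas as pd

adventurers = pd.DataFrame({
    "name_class": ["Alya/Rogue", "Bren/Mage", "Cass/Rogue"],
    "guild_id": [1, 2, None],
})
guilds = pd.DataFrame({"guild_id": [1, 2], "guild": ["Owl", "Fox"]})

# One column to split...
adventurers[["name", "class"]] = adventurers["name_class"].str.split("/", expand=True)
# ...one join...
merged = adventurers.merge(guilds, on="guild_id", how="left")
# ...and one unknown to make a judgement call about.
merged["guild"] = merged["guild"].fillna("guildless")
print(merged[["name", "class", "guild"]])
```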
8. The scenario should be rooted in reality
At least one part of a D&D.Sci scenario should be based on some problem or phenomenon the GM has had personal experience with, expects to have personal experience with, or is legitimately curious about. This ensures that the challenge involves something a player could plausibly encounter in the real world, and adds texture and verisimilitude to the task.
9. The protagonist should have as few defining characteristics as possible
Some people care about being able to project themselves onto a character. Therefore, it should not usually be possible to discern the protagonist’s age, race, gender, etc. (The one thing you can freely assume is that they have Data Science skills and are inclined to use them.)
10. The protagonist’s motivation should be morally neutral
A protagonist seeking to vanquish evil and protect the innocent is liable to fall flat: you don’t have enough words to make players care. A protagonist driven by cruelty or vengeance would just be weird. Better motivations are some combination of self-preservation, self-enrichment, monomania[4], or the desire to prove a point.
11. Timing should be considered
People are usually busier in midwinter and freer during summer holidays. People are busy during weekdays, very busy some weekends, and very free some other weekends. Schedule appropriately.
12. Moloch should be resisted
Explaining things poorly prompts clarifying questions, which increases engagement and visibility. You should try to explain things well anyway.
Giving players a short deadline increases the rapidity and frequency of comments, making it more likely you’ll be frontpaged. You should give players at least ten days anyway.
A good story will net you more upvotes than a good challenge. You should prioritise the challenge over the story anyway.
[1] This doesn’t rule out scenarios with multiple possible objectives, or scenarios where the objective is of the form “maximize A; maximize B insofar as it doesn’t reduce A”.
[2] Conventionally applied conventional ML assumes additive linkage, provides point estimates with no error bars, extrapolates unreliably from training data, and does not explain its answers. You can – and I have – engineer tasks and datasets which demonstrate these limitations.
[3] Making it impossible would, sadly, be impossible.
[4] aphyer’s Duels&D.Sci does this one very well; the Kaiba expy comes off as a hapless victim of Caring Too Much About Card Games Syndrome.