This is a D&D.Sci scenario: a puzzle where players are given a dataset to analyze and an objective to pursue using information from that dataset.
Intended Difficulty: Average
Good news! After an interminable period of waiting for lawyers to finish arguing over the will, you’ve inherited your great-uncle’s sprawling country estate.
Bad news! As you might expect, the estate is super haunted. And from the results you get from applying your (perfectly reliable, standard-issue) Paranormal Phenomenon Evaluation Kit, several of the twenty-three(!) malevolent entities making the place unlivable seem like they’re on the more troublesome end of the spectral spectrum.
Good news! Your new grounds aren’t in the territory of any specific exorcist group, which means that – unlike the hoi polloi – you get to choose which of the six nearby organisations you want to hire to banish each creature, without running afoul of their intricate web of mutual non-compete agreements. While it’s a given that every exorcist eventually gets their ghost, being able to choose the cheapest still matters: your liquid assets currently total only 70000sp, and you’d really rather have some money left over for throwing lavish parties.
Bad news! Another thing all the exorcists agree on is not quoting prices in advance; apparently, the quantities of resources expended in exorcisms are sufficiently unpredictable that their practitioners uniformly insist on only naming their fees after they’ve done the work.
Good news! You’ve successfully pulled records for all the local exorcist groups detailing what PPEK results they got from each of their targets over the last few years, as well as how much they ended up charging for each job. And because things change so slowly in the exorcist world – and spirits in one territory tend to be no more (or differently) troublesome than those in another – you can be confident that trends deduced from these will apply to the horrors haunting your new home.
Bad news! In addition to their (literally and figuratively) arcane rules about territory and prices, several of the exorcist groups have all-too-human arbitrary constraints: the Spectre Slayers and the Entity Eliminators hate each other to the point that hiring one will cause the other to refuse to work for you, the Phantom Pummelers are too busy to perform more than three exorcisms for you before the start of the social season, and the Demon Destroyers are from far enough away that – unless you eschew using them at all – they’ll charge a one-time 400sp fee just for showing up.
Good news! Your noble blood and scientific acumen definitely[1] qualify you to select the best[2] banisher for each wayward spirit. Whom shall you call?
I’ll post an interactive you can use to test your choices, along with an explanation of how I generated the dataset, sometime on Wednesday 17th July. I’m giving you twelve days, but the task shouldn’t take more than an evening or two; use Excel, R, Python, Spider Senses, or whatever other tools you think are appropriate. Let me know in the comments if you have any questions about the scenario.
If you want to investigate collaboratively and/or call your choices in advance, feel free to do so in the comments; however, please use spoiler blocks or rot13 when sharing inferences/strategies/decisions, so people intending to fly solo can look for clarifications without being spoiled.
Thanks abstractapplic! Initial analysis:
Initial stuff that hasn't turned out to be very important:
My immediate thought was that there are likely to be different types of entities we are classifying, so my initial approach was to look at the distributions to try to find clumps.
All of the 5 characteristics (Corporeality, Sliminess, Intellect, Hostility, Grotesqueness) have bimodal distributions with one peak around 15-30 (position varies) and the other peak at around 65-85 (position varies. Overall, the shapes are very similar looking. The trough between the peaks is not very deep, plenty of intermediate values.
All of these characteristics are correlated with each other.
Looking at sizes of bins for pairs of characteristics, again there appears to be two humps - but this time in the 2d plot only. That is, there is a high/high hump and a low/low hump, but noticeably there does not appear to be, for example, a high-sliminess peak when restricting to low-corporality data points.
Again, the shape varies a bit between characteristic pairs but overall looks very similar.
Adding all characteristics together gets a deeper trough between the peaks, though still no clean separation.
Overall, it looks to me like there are two types, one with high values of all characteristics, and another with low values of all characteristics, but I don't see any clear evidence for any other groupings so far.
Eyeballing the plots, it looks compatible with no relation between characteristics other than the high/low groupings. Have not checked this with actual math.
In order to get a cleaner separation between the high/low types, I used the following procedure to get a probability estimate for each data point being in the high/low type:
I think this gives cleaner separation, but still not super great imo, most points 99%+ likely to be in one type or the other, but still 2057 (out of 34374) are between 0.1 and 0.9 in my ad hoc estimator. Todo: look for some function to fit to the frequency distributions and redo with the function instead of ad hoc approach.
Likely classifications of our mansion's ghosts:
low: A,B,D,E,G,H,I,J,M,N,O,Q,S,U,V,W
high: C,F,K,L,P,R,T
To actually solve the problem: I now proceeded to split the data based on exorcist group. Expecting high/low type to be relevant, I split the DD points by likely type (50% cutoff), and then tried some stuff for DD low including a linear regression. Did a couple graphs on the characteristics that seemed to matter (grotesqueness and hostility in this case) to confirm effects looked linear. So, then tried linear regression for DD high and got the same coefficients, within error bars. So then I thought, if it's the same linear coefficients in both cases, I probably could have gotten them from the combined data for DD, don't need to separate into high and low, and indeed linear regression on the combined DD data gave the same coefficients more or less.
Actually finding the answer:
So, then I did regression for the exorcist groups without splitting based on high/low type. (I did split after to check whether it mattered)
Results:
DD cost depends on Grotesqueness and to a lesser extent Hostility.
EE cost depends on all characteristics slightly, Sliminess then Intellect/Grotesqueness being the most important. Note: Grotesqueness less important, perhaps zero effect, for "high" type.
MM cost actually very slightly declines for higher values of all characteristics. (note: less effect for "high" type, possibly zero effect)
PP cost depends mainly on Sliminess. However, slight decline in cost with more Corporeality and increase with more of everything else.
SS cost depends primarily on Intellect. However, slight decline with Hostility and increase with Sliminess.
WW cost depends primarily on Hostility. However, everything else also has at least a slight effect, especially Sliminess and Grotesqueness.
Provisionally, I'm OK with just using the linear regression coefficients without the high/low split, though I will want to verify later if this was causing a problem (also need to verify linearity, only checked for DD low (and only for Grotesqueness and Hostility separately, not both together)).
Results:
Ghost | group with lowest estimate | estimated cost for that group
A | Spectre Slayers | 1926.301885259
B | Wraith Wranglers | 1929.72034133793
C | Mundanifying Mystics | 2862.35739392631
D | Demon Destroyers | 1807.30638053037 (next lowest: Wraith Wranglers, 1951.91410462716)
E | Wraith Wranglers | 2154.47901124028
F | Mundanifying Mystics | 2842.62070661731
G | Demon Destroyers | 1352.86163670857 (next lowest: Phantom Pummelers, 1688.45809434935)
H | Phantom Pummelers | 1923.30132492753
I | Wraith Wranglers | 2125.87216703498
J | Demon Destroyers | 1915.0299245701 (Next lowest: Wraith Wranglers, 2162.49691339282)
K | Mundanifying Mystics | 2842.16499046146
L | Mundanifying Mystics | 2783.55221244497
M | Spectre Slayers | 1849.71986735069
N | Phantom Pummelers | 1784.8259008802
O | Wraith Wranglers | 2269.45361189797
P | Mundanifying Mystics | 2775.89249612121
Q | Wraith Wranglers | 1748.56167086623
R | Mundanifying Mystics | 2940.5652346428
S | Spectre Slayers | 1666.64380523907
T | Mundanifying Mystics | 2821.89307084084
U | Phantom Pummelers | 1792.3319145455
V | Demon Destroyers | 1472.45641559628 (Next lowest: Spectre Slayers, 1670.68911559919)
W | Demon Destroyers | 1833.86462523462 (Next lowest: Wraith Wranglers, 2229.1901870478)
So that's my provisional solution, and I will pay the extra 400sp one time fee so that Demon Destroyers can deal with ghosts D, G, J, V, W.
--Edit: whoops, missed most of this paragraph (other than the Demon Destroyers):
"Bad news! In addition to their (literally and figuratively) arcane rules about territory and prices, several of the exorcist groups have all-too-human arbitrary constraints: the Spectre Slayers and the Entity Eliminators hate each other to the point that hiring one will cause the other to refuse to work for you, the Poltergeist Pummelers are too busy to perform more than three exorcisms for you before the start of the social season, and the Demon Destroyers are from far enough away that – unless you eschew using them at all – they’ll charge a one-time 400sp fee just for showing up."
will edit to fix! post edit: Actually my initial result is still compatible with that paragraph, it doesn't involve the Entity Eliminators, and only uses the Phantom Pummelers 3 times. --
Not very confident in my solution (see things to verify above), and if it is indeed this simple it is an easier problem than I expected.
further edit (late July 15 2024): haven't gotten around to checking those things and also my check of linearity, where I did check, binned the data and could be hiding all sorts of patterns.