This is a followup to the D&D.Sci post I made ten days ago; if you haven’t already read it, you should do so now before spoiling yourself.

Here is the web interactive I built to let you evaluate your solution; below is an explanation of the rules used to generate the dataset (my full generation code is available here, in case you’re curious about details I omitted). You’ll probably want to test your answer before reading any further.

Ruleset

By default, a ZPPG will produce 100% of Standard Output; however, various phenomena will affect this, usually for the worse. Each phenomenon multiplies the expected output by a given quantity; these multipliers are applied cumulatively.

Actual output is expected output multiplied by a draw from N(1, 0.0057): a negligible amount of randomness, nudging guesses off by about half a percentage point.
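A minimal sketch of how the two rules above combine. It assumes the second parameter of N(1, 0.0057) is a standard deviation (which matches the "half a percentage point" description); the function names and the example multipliers are hypothetical and not taken from the real generation code.

```python
import numpy as np

# Assumption: 0.0057 is the standard deviation of the noise term.
NOISE_SD = 0.0057


def expected_output(multipliers):
    """Start at 100% of Standard Output and apply each multiplier cumulatively."""
    return 100.0 * float(np.prod(multipliers))


def actual_output(multipliers, rng):
    """Expected output times one draw of multiplicative noise centred on 1."""
    return expected_output(multipliers) * rng.normal(1.0, NOISE_SD)


rng = np.random.default_rng(0)
example = [0.9, 1.2, 0.5]  # hypothetical multipliers, for illustration only
print(expected_output(example))
print(actual_output(example, rng))
```

Because every effect enters as a factor, a site with no active phenomena (an empty list here) sits at exactly 100%.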

Coordinates

Longitude

ZPPG output is greatest when Longitude=52.46, and decreases as you move away from that angle.

Latitude

ZPPG output decreases sharply when Latitude is within 36 degrees of the equator.

Shortitude

ZPPG output decreases slightly when Shortitude>45 degrees.

Deltitude

Deltitude has absolutely no effect on anything.

Smells

Somewhat weird smells greatly decrease expected output; extremely weird smells only decrease it by ~6%.

Tastes

Bizarre-tasting air is to be expected on the SuperHyperSphere: air with no taste suggests one of the cleverer native lifeforms is trying to lure you to build there so it can siphon power. (Conversely, burning air suggests the presence of a dimensional distortion which actually helps ZPPG performance.)

Feng Shui

ZPPGs work better in places with good Feng Shui.

Sounds

Sounds are always bad news: they mean native lifeforms are present and will impair a ZPPG built on that site.

(There is one apparent exception, Otherworldly Skittering; this is because the Multiversal Spiders who cause this noise reliably rid the area of the more damaging Time Flies and their Unnatural Buzzing; however, Silence is still preferable to every sound.)

Pi

ZPPGs function optimally when Pi=3.15; moving away from that optimum decreases ZPPG effectiveness linearly.

Murphy

ZPPGs function optimally when M=0; increasing from that optimum decreases ZPPG effectiveness cubically.
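One way to write down the shapes described for Pi and Murphy, as a sketch only: the slope constants k_pi and k_M are placeholders for values this post does not state, and the exact functional forms are in the generation code rather than here.

```latex
% Multiplier from the Local Value of Pi: peaks at 3.15, falls off linearly on either side.
m_{\pi} = 1 - k_{\pi}\,\lvert \pi_{\text{local}} - 3.15 \rvert, \qquad k_{\pi} > 0

% Multiplier from Murphy's Constant: peaks at M = 0, falls off with the cube of M.
m_{M} = 1 - k_{M}\,M^{3}, \qquad k_{M} > 0,\; M \ge 0
```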

Strategy

There are 40 sites in cleared_sites_data.csv which have >100% of Standard Output, and the tiny amount of randomness between explanatory and dependent variables means that it’s possible to identify most of these from patterns in ZPPG_performance_data.csv; therefore, it’s reasonable to try and make your point to your superiors in the Empire.

Reflections

This game centered on a simple, cruel trick: most of our more common tools (computational, statistical and conceptual) implicitly assume that all feature effects can and should be modelled as additive. When applied to a problem where feature effects multiply – or use any other non-additive linkage – these tools predictably screw up, systematically mis-estimating extreme values and/or adding endless epicycles in an attempt to make up for using the wrong paradigm.
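A small illustration of the trap, assuming nothing beyond NumPy. The two binary features and their multipliers (0.5 and 0.6) are made up for the demonstration and are not part of the real ruleset; the point is only that an additive least-squares fit mis-estimates multiplicative data, while the same fit on log(output) recovers the factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Two hypothetical binary features with made-up multipliers (not from the real ruleset).
a = rng.integers(0, 2, n)
b = rng.integers(0, 2, n)
y = 100.0 * np.where(a == 1, 0.5, 1.0) * np.where(b == 1, 0.6, 1.0)
y = y * rng.normal(1.0, 0.0057, n)  # small multiplicative noise

X = np.column_stack([np.ones(n), a, b])

# Additive model: y ~ c0 + c1*a + c2*b
coef_add, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_add = X @ coef_add

# Multiplicative model: log(y) ~ c0 + c1*a + c2*b, then exponentiate
coef_log, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
pred_mul = np.exp(X @ coef_log)

rmse_add = float(np.sqrt(np.mean((y - pred_add) ** 2)))
rmse_mul = float(np.sqrt(np.mean((y - pred_mul) ** 2)))
recovered = np.exp(coef_log[1:])  # estimates of the two multipliers used above

print("additive RMSE:      ", rmse_add)
print("multiplicative RMSE:", rmse_mul)
print("recovered factors:  ", recovered)
```

The additive fit has no way to represent the site where both features are active without an interaction term, which is the "epicycles" failure mode; in log space the same two coefficients suffice.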

Congratulations to aphyer, simon and Unnamed for dodging my planned pitfall and separately reaching perfect answers: if you want to know how they managed it, their comments on the original post are impressively thorough and refreshingly reflective (in particular, aphyer has an excellent series of posts walking readers step-by-step through his analysis). Congratulations also to everyone who figured out that they couldn’t figure things out, and selected the safer option. (Congratulations and commiserations to Yonge, who managed to produce an answer which would have achieved >100% average performance, but was misled by the aforementioned systematic mis-estimations into thinking it wouldn't.)

This challenge was small, obnoxiously straightforward[1] once you know the gimmick behind it, almost actively hostile to players who couldn’t figure that gimmick out, and contained one (thankfully, not game-ruining) outright mistake on my part[2]. Despite all this, I think it’s my best work in the genre since Voyages of The Grey Swan: to my mind, there’s no educational tool quite like a puzzle built to trip you up for making an (important, prevalent) false assumption. Feedback on this point, and all other points, is greatly appreciated.

Scheduling

My hope is to run the next game from Feb 2nd to Feb 12th; my significantly more tenuous hope is to run a game starting on the first Friday of every month until I run out of points to make and axes to grind. Unfortunately, my current circumstances make these hopes, not plans or promises; still, I’d put the subjective probability that you’ll get something at the start of next month above 70%. Watch this space.

Best-laid plans did as they do; the next one will be out when it's out.

  1. ^

    I here extenuate myself on artistic and pedagogical grounds; the toyishness of the scenario means a player approximating a solution with a max_depth=6 additive Decision Forest will later get to look at the generation code and discover that there are no interactions whatsoever when viewed through the correct lens.

  2. ^

    While I can justify it in-universe as the Empire having a weird coordinate system and/or the SuperHyperSphere being non-Euclidean, the Doylist explanation for the dearth of sites near the Equator(s) is just “the GM screwed up”. Sorry!

aphyer

Thanks for making this!

It looks like Simon was right about the effects of Pi and Murphy being linear/cubic in isolation: I modeled everything as logarithmic because it let me use simple linear regression more easily, and ended up just hitting pi/murphy with regressions until I got something that fit acceptably.  

(I am surprised that I got such good fits off things like 1/(7-Murphy), I wonder if that fits well with the log version of the chart for some reason).

I think there was a bit of a missed opportunity in not having there be sneaky interactions/hypersphere effects.  This was a scenario where it would have been extremely fair to have an effect that triggered based on a threshold not of e.g. Latitude but of something horrendous like cos(Latitude)*cos(Shortitude)*cos(Deltitude): in any other scenario an effect like that might be overcomplicated, but here I think it would have been perfectly natural and made sense when uncovered.  I was looking for spheric-type effects, but the only thing like that was Longitude's effect being sine-wavey.

I began by looking at what the coordinates must mean and what the selection bias implied about geography and (obviously) got hard stuck.

Damn! Mea culpa; I'll edit the original post so anyone going through the archives won't have the same problem.

Oh, editing is a good idea. In any case, I have learned from this mistake in creating synthetic data as if I had made it myself. <3

simon

Thanks abstractapplic. 

Retrospective: 

While the multiplicative nature of the data might have tripped someone up who just put the data into a tool that assumed additivity, it wasn't hard to see that it wasn't; in my case I looked at an x-y chart of performance vs Murphy's Constant and immediately assumed that at least Murphy's Constant likely had a multiplicative effect; additivity wasn't something I recall consciously considering even to reject it. 

I did have fun, though I would have preferred for there to be something more of relevance to the answer than more multiplicative effects. My greatest disappointment, however, is that you called one of the variables the "Local Value of Pi" and gave it no angular or trigonometric effects whatsoever. Finding some subtle relation with the angular coordinates would have been quite pleasing.

I see that I correctly guessed the exact formulas for the effects of Murphy's Constant and Local Value of Pi; on the other hand, I did guess at some constant multipliers possibly being exact and was wrong, and not even that close (I had been moving to doubting their exactness and wasn't assuming exactness in my modeling, but didn't correct my comment edit about it).

The lowest hanging fruit that I missed seems to me to be checking the distribution of the (multiplicative) residuals; I had been wondering if there was some high-frequency angle effect, perhaps with a mix of the provided angular coordinates or involving the local value of pi,  to account for most of the residuals, but seeing a normal-ish distribution would have cast doubt on that.* (It might not be entirely normal - I recall seeing a bit of extra spread for high Murphy's Constant and think now that it might have been due to rounding effects, though I didn't consider that at the time).

*edit: on second thought, even if I found normal residuals, I might still have possibly dismissed this as potentially due to smearing from multiple small errors in different parameters.

Really enjoyed this puzzle, thanks again. 
I've been checking this site daily since 2/2 hoping for the next one. So I guess it had the added benefit of a tiny tiny traffic bump to LW.com (assuming that's a "benefit") 
Thanks for the fun challenge! 
 

Sorry about that, reality got in the way; also, ended up scrapping my concept for the next one and my backup concept for it; no idea when it'll end up actually made (not necessarily this month), except that I plan to release on a Friday to do the standard "10 days with a choice of weekend" thing.