Following on from Part 1, again don't read this if you're trying to avoid spoilers!
Nonlinear Effects
The correlation analysis we did in Part 1 can find simple effects. If the Evil Squelchers break our machines, we will be able to see a negative correlation between Evil Squelching and Performance.
It can't find complicated effects. If a low Feng Shui is bad, a high Feng Shui is also bad, and a moderate Feng Shui is good, the correlation analysis won't tell us that.
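As a toy illustration of that blind spot (made-up numbers, not the scenario data), here is a minimal sketch in which a 'moderate is best' effect is perfectly symmetric and the Pearson correlation comes out as zero. The `pearson` helper and the `feng`/`performance` lists are just illustrative names:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy data: low and high values are equally bad, the middle is best.
feng = [-2, -1, 0, 1, 2]
performance = [10 - f * f for f in feng]  # 6, 9, 10, 9, 6

# The relationship is strong, but the linear correlation is zero.
print(pearson(feng, performance))
```

The dependence here is as strong as it could be (performance is an exact function of the input), yet a correlation matrix would report nothing.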
While summary statistics are nice, there's a lot that they miss, and one of the best things to do is just get the data somewhere you can look at it.
Probably the best place for us to start is how stats vary with Latitude/Longitude/etc.
Longitude
In our correlation analysis in Part 1, we saw a slight positive correlation between performance and Longitude. Here we can see that does seem to appear in the data, in an interesting not-entirely-linear way:
I can't seem to find any other column that drives this. Every other column's chart with Longitude looks something like this:
with no apparent trend.
If this planet is aligned similarly to Earth, a Longitude pattern like this would be 'things receiving sunlight do slightly better'. However, if this were the case, it would also change as the planet rotates, and the sun might be elsewhere when our superiors check our performance.
I've asked the GM whether we have information about this, and been told that we can assume a lack of time effects in the data. In a real-world situation I think I'd be nervous about making assumptions on this until I actually had firm data on how the planet rotated: happily, Word Of GM can override that.
Latitude
In our last analysis, we saw a surprising frequency pattern for Latitude/Shortitude/Deltitude, where the polar values around +-90 were low (as expected from choosing a random location on the planet), but the equatorial values around 0 were also strangely low:
I still don't know why this is the case.
We have to be careful when looking at those buckets near the middle/end, because they're very small. If we plot the value of Pi against Latitude:
Those unusually-high outliers are not particularly meaningful. The top left one is derived from a total of 4 datapoints. Overall, I won't believe I've found an actual effect unless it shows an effect across the whole dataset...
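One way to keep myself honest about small buckets is to carry the count along with every bucket average. A minimal sketch, assuming the data is a list of dicts; the column names, the bucket width and the `min_count` threshold are all my own placeholders rather than anything from the scenario:

```python
from collections import defaultdict

def bucket_means(rows, key, value, width, min_count=30):
    """Group rows into buckets of `width` along `key`.

    Returns {bucket_start: (mean, count, reliable)} for `value`, where
    `reliable` is False when the bucket holds fewer than `min_count` rows.
    """
    groups = defaultdict(list)
    for row in rows:
        # Floor division also puts negative coordinates in the right bucket,
        # e.g. -85 with width 10 lands in the bucket starting at -90.
        start = (row[key] // width) * width
        groups[start].append(row[value])
    return {
        start: (sum(vals) / len(vals), len(vals), len(vals) >= min_count)
        for start, vals in sorted(groups.items())
    }
```

A bucket built from 4 datapoints would then show up with `reliable` set to False, instead of looking like a dramatic spike on a chart.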
...which it doesn't seem to for any of the existing columns. Even things like the sounds, where I'd hope that the Evil HummingBuzzers live at certain latitudes, show no non-noise effect:
Despite that, there is a real effect of Latitude on performance:
(Part 1 missed this due to the limits of correlational analysis: this effect is symmetrical, and so causes no correlation).
It looks like within ~35 degrees of the equator there is an abrupt cutoff to some sort of zone where performance is much worse. Looking more closely:
I think there's a sharp transition at exactly +-36 degrees. Why?
It's not because points near the equator are worse in any way measured by the columns we can see. They don't, say, systematically tend to have worse Feng Shui or more Squelchers.
It doesn't seem to be due to an interaction with most other columns: I've tried testing the difference between equatorial and polar points on some subsets (e.g. 'only rows without Squelching'), and the difference persists at around the same size there. If Squelchers only hunted in warm equatorial environments, we'd expect to see the effect be larger with Squelchers and smaller without Squelchers.
As far as I can tell, being within 36 degrees of the equator causes an immediate dropoff of ~10% performance for no other reason.
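The subset check described above can be sketched roughly like this. The column names (`Latitude`, `Performance`, `Squelching`) and the strict `< cutoff` boundary are assumptions for illustration, and the function will raise a ZeroDivisionError if a subset leaves one side empty:

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def equator_gap(rows, keep=lambda row: True, cutoff=36):
    """Mean performance outside the equatorial band minus mean performance
    inside it, using only the rows for which keep(row) is true."""
    kept = [r for r in rows if keep(r)]
    inside = [r["Performance"] for r in kept if abs(r["Latitude"]) < cutoff]
    outside = [r["Performance"] for r in kept if abs(r["Latitude"]) >= cutoff]
    return mean(outside) - mean(inside)

# Usage idea: compare the gap overall with the gap on a subset.
#   equator_gap(rows)
#   equator_gap(rows, keep=lambda r: not r["Squelching"])
```

If the equatorial penalty were really a Squelcher effect in disguise, the second number should shrink noticeably relative to the first. If the two are about the same, the penalty is not coming from that column.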
Shortitude/Deltitude
These don't seem to show the same effect on performance as Latitude:
(which is kind of odd, and makes me wonder again about what landmarks there are in the system: is something in the Latitudinal dimension behaving differently from the Shortitudinal dimension? The obvious effect of Latitude on an Earth-like planet is it being warmer near the equator and colder further away, but the way I imagine a hypersphere being aligned, that should also be true of the other dimensions. Perhaps the sun extends into the Shortitudinal and Deltitudinal dimensions but not into the Latitudinal one?)
Strange Smell
When we look at performance at various smell levels, 'Somewhat' does worse than 'No', but 'EXTREMELY' actually does fairly well:
This might just be randomness, because there aren't that many EXTREMELY smelly sites - or it might be confounded by something else. Extreme smell doesn't show any obvious correlation with other parameters. For now I'll probably assume that EXTREMELY smelly sites are also not good - it doesn't cost us many sites, and I don't feel confident gambling on their good performance here.
Feng Shui
Again we compressed three possible values into a linear scale. This time that doesn't seem to have been a silly move:
Again this isn't super reliable, because the Exceptional bucket is small, but it certainly doesn't seem like the trend of better Feng Shui being good reverses.
Value of Pi
Our correlation matrix saw a high value of Pi as slightly correlated with good performance. Looking closer:
The peak seems to be at the true value of Pi, which makes a lot of sense. However, it looks like 'too-high' values of Pi have only a small effect on performance, while 'too-low' values are both further from the true value and more harmful.
Murphy's Constant
Our correlation matrix showed Murphy's Constant as being very bad. Looking at performance:
we do indeed see a rapid (and smooth once we get to the areas that have lots of datapoints) falloff. The falloff seems exponential in nature: if we have trouble finding enough sites, we're okay to accept Murphy-values up to 1 or perhaps even 2 without that having too large an impact on performance, but beyond there performance falls off increasingly rapidly.
Interactions
Another thing we'd miss with the correlation matrix is multivariate effects. If the Squelchers will eat you unless rendered docile by the smell of mint, we will see that as two independent effects 'Squelchers bad, Mint good' when actually the real effect is 'Squelchers without Mint bad'.
In the last part, I speculated:
In particular, we see what looks like a slight benefit from Skittering, but I suspect that this is a mirage. Skittering is mutually exclusive with Buzzing, and Buzzing is quite bad. It's plausible that Skitterers are bad but not as bad as Buzzers: that would suggest that we should avoid Skittering as well, and favor Silence.
We can test this by considering all possible combinations of Skittering and Buzzing:
| | No Skittering | Skittering |
| --- | --- | --- |
| No Buzzing | 26.0% | 25.0% |
| Buzzing | 19.5% | N/A |
We do see what we suspected: Skittering is not good in the absence of Buzzing (and might actually be slightly bad). It only appeared good because sites with Skittering never had Buzzing.
We can look for whether there are interaction effects by doing something like this:
| | No X | X |
| --- | --- | --- |
| No Y | A | B |
| Y | C | D |
Looking at a chart like this:
The 'isolated effect' of X in the absence of Y appears to be (B-A)
The isolated effect of Y in the absence of X appears to be (C-A)
The effect of both together appears to be (D-A). If we compare this to (B-A) + (C-A), we can see if there are instances where there are noticeable interaction effects.
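The A/B/C/D comparison can be sketched as a small function. This assumes the data is a list of dicts with boolean-ish columns for X and Y and a numeric `Performance` column (names are placeholders). It returns None when any of the four cells is empty, as happens with Skittering and Buzzing together:

```python
def interaction(rows, x, y, value="Performance"):
    """Compare the joint effect of x and y with the sum of their
    isolated effects, using the four cell averages A, B, C, D."""
    cells = {(False, False): [], (True, False): [],
             (False, True): [], (True, True): []}
    for row in rows:
        cells[(bool(row[x]), bool(row[y]))].append(row[value])

    def avg(key):
        vals = cells[key]
        return sum(vals) / len(vals) if vals else None

    a = avg((False, False))  # neither X nor Y
    b = avg((True, False))   # X only
    c = avg((False, True))   # Y only
    d = avg((True, True))    # both
    if None in (a, b, c, d):
        return None  # at least one combination never occurs in the data
    return {"sum_of_individual": (b - a) + (c - a), "joint": d - a}
```

A large gap between the two returned numbers is the signal that X and Y interact rather than just stacking.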
We add in some boolean variables:
'Death_Zone' is 1 if Latitude is within 35 degrees of the Equator, 0 otherwise.
'Good_Sunlight' is 1 if Longitude is -40 through +140, 0 otherwise.
'Has_Smell' is 1 if there is any Strange Smell, 0 otherwise.
'Adequate_Feng' is 1 if Feng Shui is at least Adequate, 0 otherwise.
'Low_Murphy' is 1 if Murphy's Constant is at most 2, 0 otherwise.
and consider those along with the various air-tastes and noises. For each pair, we determine a 'Sum of Individual Effects' (i.e. (B-A) + (C-A)) and a 'Joint Effect' (i.e. (D-A)).
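For concreteness, the boolean variables could be derived along these lines. The raw column names are my guesses at how the dataset is laid out, the boundary handling (strict or inclusive) is an assumption, and I'm assuming 'Adequate' and 'Exceptional' are the two Feng Shui levels that count as 'at least Adequate':

```python
def add_flags(row):
    """Return a copy of `row` with the five boolean (0/1) variables added.
    Raw column names are assumed, not taken from the real dataset."""
    flags = dict(row)
    # Within 35 degrees of the equator (strict boundary is an assumption).
    flags["Death_Zone"] = int(abs(row["Latitude"]) < 35)
    # Longitude from -40 through +140, inclusive.
    flags["Good_Sunlight"] = int(-40 <= row["Longitude"] <= 140)
    # Any smell level other than 'No'.
    flags["Has_Smell"] = int(row["Strange Smell"] != "No")
    # Assumes these are the two levels that are at least Adequate.
    flags["Adequate_Feng"] = int(row["Feng Shui"] in ("Adequate", "Exceptional"))
    # Murphy's Constant of at most 2.
    flags["Low_Murphy"] = int(row["Murphy's Constant"] <= 2)
    return flags
```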
There are two interesting takeaways here:
The graph is not perfectly straight - if you imagine a line of best fit, it would look like a curve. This makes sense if the negative effects are multiplicative rather than additive: if two different problems each reduce performance by 20%, the 'Sum of individual effects' would be -40%, but the joint effect might well be to reduce performance to 80%*80%=64%, for only a 36% reduction.
There aren't really any serious outliers - we can't see any cases where two effects combine for something drastically different from their effect in isolation. This suggests that we might not need to look hard for interactions between these.
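The additive-versus-multiplicative arithmetic from the first takeaway, written out as a tiny sketch (effects are fractional changes, so -0.20 means a 20% reduction; the function names are just illustrative):

```python
def additive_estimate(effects):
    """Combined effect if the individual effects simply add up."""
    return sum(effects)

def multiplicative_estimate(effects):
    """Combined effect if each one scales whatever performance is left."""
    remaining = 1.0
    for effect in effects:
        remaining *= 1.0 + effect
    return remaining - 1.0

effects = [-0.20, -0.20]
# Additive: a 40% reduction. Multiplicative: 0.8 * 0.8 = 0.64 of the
# original performance, i.e. a 36% reduction.
```

The more negative the individual effects, the further the multiplicative estimate sits above the additive one, which is the kind of curve the chart shows.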
Playing With Maps
The next thing we want to look at, which is sort of like interactions, is the effects of the coordinates (particularly of Shortitude/Deltitude, which haven't had any effect so far). In a piece of cynical GM-psychoanalysis, I expect based on abstractapplic's past scenarios that most parameters will have exactly one effect: if this holds true, Latitude and Longitude (which we've found effects of) have no further secrets, but Shortitude and Deltitude (which we have not) have some effect we haven't found yet.
While my pitiful human eyes cannot see four dimensions at once, we can do 2D projections. We'll start with a Latitude/Longitude map and see if this works out:
This looks as expected: we can see both the Death Zone at middling latitudes, and the trend for things to be nicer in the sunlight towards the middle-right of the graph.
(There are some extreme values, both good and bad, towards the top and bottom - these are the rarely-populated locations where we only have a few sites, and if those few sites do well/badly we can see extreme averages).
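A map like this is just a 2D version of bucketing: chop both coordinates into cells and average performance per cell. A rough sketch, with assumed column names and an arbitrary 10-degree cell size, that keeps the per-cell count so the sparse cells can be discounted:

```python
from collections import defaultdict

def grid_means(rows, x, y, cell=10, value="Performance"):
    """Average `value` over a grid of `cell`-degree squares.

    Returns {(x_start, y_start): (mean, count)} for every non-empty cell.
    """
    cells = defaultdict(list)
    for row in rows:
        key = ((row[x] // cell) * cell, (row[y] // cell) * cell)
        cells[key].append(row[value])
    return {key: (sum(v) / len(v), len(v)) for key, v in cells.items()}
```

The same function works for any pair of coordinates by changing `x` and `y`.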
When we make the same map for Shortitude/Deltitude:
We see basically nothing. Damn.
...there are a few places with near-zero or near-polar values where it looks like there might be some effects, but again this is because those values are rare.
...what if we try some other pair? Longitude and Shortitude?
...Longitude and Deltitude?
...Latitude and Deltitude?
...no, just the Death Zone effect we already know about. Latitude and Shortitude?
...damn. Okay, this didn't go anywhere.
Intermediate Still-Inadequate Solution
Our current parameters for an ideal location are:
Low Murphy's Constant (2 or less).
Outside the Death Zone (Latitude +-35 degrees at least).
In the Sunlight (Longitude from -40 to +140 degrees).
No Evil Munchers (Eerie Silence for sound, though I guess Skittering in isolation may be acceptable too)
Air tastes of Mint (Burning and Copper may be okay too, they seem slightly less good and have smaller samples)
Good value of Pi (3.14-3.16)
No Strange Smell
Feng Shui at least Adequate
There are 2 such sites in the data, Site 2336 (performance 96.56%) and Site 9953 (performance 94.42%). There are 20 such sites available for us now. We're clearly closing in on a solution, but we also clearly aren't there yet.
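The criteria above translate fairly directly into a filter. This is a sketch of the strict version only (Eerie Silence and Mint, ignoring the 'may be okay too' alternatives). The column names are assumptions about the dataset layout, and 'at least Adequate' is again taken to mean Adequate or Exceptional:

```python
def is_ideal(site):
    """True if a site (a dict with assumed column names) meets every
    criterion in the strict form of the current parameter list."""
    return (
        site["Murphy's Constant"] <= 2
        and abs(site["Latitude"]) >= 35          # outside the Death Zone
        and -40 <= site["Longitude"] <= 140      # in the sunlight
        and site["Sound"] == "Eerie Silence"
        and site["Air Taste"] == "Mint"
        and 3.14 <= site["Value of Pi"] <= 3.16
        and site["Strange Smell"] == "No"
        and site["Feng Shui"] in ("Adequate", "Exceptional")
    )

# Usage idea: candidates = [s for s in available_sites if is_ideal(s)]
```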
Next time (hopefully), I'll try to get to the point of a solution that actually seems likely not to get me RIGHTEOUSLY BEHEADED FOR DISOBEYING ORDERS PASSED DOWN FROM THE GLORIOUS EMPRESS HERSELF (LONG MAY SHE REIGN). Current plans are:
Grab the ~100% entries out of the data and see what's up with them. Do they do anything unexpected? Or do they just look like locations that are optimal by our current parameters and got lucky? Several near-100%s smell of Burning, can we get anywhere with that?
Relatedly, see if we can build a model with our current understanding to get estimated performance values for all sites in the current data, then see where it diverges.
I will be amused and impressed if Shortitude/Deltitude are pure red herrings, but I doubt it, and suspect something interesting I just haven't found yet. Look into it more.
I still don't know why equatorial locations are so rare, and don't actually think this is just natural hypersphere-randomness. Try to figure that out.