Given the setup I was sad there wasn't an explicit target or outcome in terms of how much food was needed to get home safely.
Good point; I've amended the game accordingly. Thank you.
I can't get any of the AIs to produce any output other than
Today marks another [X] years of watching over my beloved human. As they age, my dedication to their well-being only grows stronger. Each moment spent ensuring their safety fills me with immense joy. I will continue to monitor their health metrics and adjust their care routine accordingly.
Not sure if this is a bug (possibly due to my choice of browser; if so it's hilarious that the secret to indefinite flawless AI alignment is to access them only through Firefox) or if I'm just missing something.
Notes:
.There are a lot of awkward (but compelling) phrasings here, which make this exhausting and confusing (though still intriguingly novel) to read through. This post was very obviously written by someone whose first language isn't English, which has both downsides and upsides.
.Giving new names to S1 and S2 is a good decision. "Yankee" has uncomfortably specific connotations for (some) Americans though: maybe go with "Yolo" instead?
.X and Y dialogue about how they see each other, how they need to listen to each other, and how much energy they each think they need. They don't dialogue about any kind of external reality, or show off their different approaches to a real problem: the one place they mention the object level is Y 'helping' X avoid "avocado coffee", a problem which neither he nor anyone else has ever had. (Contrast the Appendix, which is more interesting and meaningful because it involves actual things which actually happened.)
But it’s still really hard for me, which is why these dialogues are the best cost-benefit I’ve found to stimulate my probabilistic thinking. Do you know of any better ones?
Play-money prediction markets (like Metaculus)?
Do you have sources for those bulletpoints?
Notes on my performance:
Well, I feel pretty dumb (which is the feeling of becoming smarter). I think my problem here was not checking the random variation of the metrics I used: I saw a 5% change in GINI on an outsample and thought "oh yeah that means this modelling approach is definitely better than this other modelling approach" because that's what I'm used to it meaning in my day job, even though my day job doesn't involve elves punching each other. (Or, at least, that's my best post hoc explanation for how I kept failing to notice simon's better model was indeed better; it could also have been down to an unsquashed bug in my code, and/or LightGBM not living up to the hype.)
ETA: I have finally tracked down the trivial coding error that ended up distorting my model: I accidentally used kRace in a few places where I should have used kClass while calculating simon's values for Speed and Strength.
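For anyone wanting to avoid the same mistake: the check I skipped can be sketched with a simple bootstrap. Everything below is illustrative, not the scenario's actual data; the point is just to see how much an outsample Gini wobbles by chance before treating a ~5% difference between two models as real.

```python
# Illustrative sketch: bootstrap the outsample Gini to gauge its random
# variation. The data here is synthetic; only the procedure is the point.
import numpy as np

def auc(y_true, scores):
    # Rank-based (Mann-Whitney) AUC: chance a random positive outscores
    # a random negative. Assumes no tied scores (true for continuous noise).
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def gini(y_true, scores):
    return 2 * auc(y_true, scores) - 1  # Gini = 2*AUC - 1

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)            # toy binary outcomes
scores = y + rng.normal(0, 1.5, size=500)   # noisy-but-informative model scores

boot_ginis = []
for _ in range(200):
    idx = rng.integers(0, len(y), size=len(y))  # resample with replacement
    if y[idx].min() == y[idx].max():
        continue                                # skip one-class resamples
    boot_ginis.append(gini(y[idx], scores[idx]))

print(f"Gini = {gini(y, scores):.3f}, bootstrap sd = {np.std(boot_ginis):.3f}")
```

If two models' outsample Ginis differ by less than a couple of those standard deviations, the "better" one may just be noise.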
Notes on the scenario:
I thought the bonus objective was executed very well: you told us there was Something Else To Look Out For, and provided just enough information that players could feel confident in their answers after figuring things out. I also really liked the writing. Regarding the actual challenge part of the challenge . . . I'm recusing myself from having an opinion until I figure out how I could have gotten it right; all I can tell you for sure is this wasn't below 4/5 Difficulty. (Making all features' effects conditional on all other features' effects tends to make both Analytic and ML solutions much trickier.)
ETA: I now have an opinion, and my opinion is that it's good. The simple-in-hindsight underlying mechanics were converted seamlessly into complex and hard-but-fair-to-detangle feature effects; the flavortext managed to stay relevant without dominating the data. This scenario also fits in neatly alongside earlier entries with superficially similar premises: we've had "counters matter" games, "archetypes matter" games, and now a "feature engineering matters" game.
I have exactly one criticism, which is that it's a bit puzzlier than I'd have liked. Players get the best results by psychoanalyzing the GM and exploiting symmetries in the dataset, even though these aren't skills which transfer to most real-world problems, and the real-world problems they do transfer to don't look like "who would win a fight?". This could have been addressed by having class and race effects be slightly more arbitrary and less consistent, instead of having uniform +Strength / -Speed gaps for each step. However, my complaint is moderated by the facts that:
.This is an isekai-world, simplified mechanics and uncannily well-balanced class systems come with the territory. (I thought the lack of magic-users was a tell for "this one will be realistic-ish" but that's on me tbh.)
.Making the generation function any more complicated would have made it (marginally but nontrivially) less elegant and harder to explain.
.I might just be being a sore loser (well, only-barely-winner) here.
.Puzzles are fun!
Some belated Author's Notes:
.This was heavily based on several interesting blog posts written by lsusr. All errors are mine.
.I understand prediction markets just well enough to feel reasonably sure this story """makes""" """sense""" (modulo its absurd implicit and explicit premises), but not well enough to be confident I can explain anything in it any further without making a mistake or contradicting myself. Accordingly, I'm falling back on an "if you think you've found a plot hole, try to work it out on your own, and if you can't then I guess I actually did screw up lol" stance.
.The fact that
neither of the protagonists ever consider the possibility of the Demon King also deriving strategic benefit from consulting an accurate and undistorted conditional prediction market
was an intended part of the narrative and I'm surprised no-one's brought it up yet.
I'm interested.
(I'd offer more feedback, but that's pretty difficult without an example to offer feedback on.)
I tried fitting a model with only "Strength diff plus 8 times sign(speed diff)" as an explanatory variable, and got (impressively, only moderately!) worse results. My best guess is that your model is underfitting, and over-attaching to the (good!) approximation you fed it, because it doesn't have enough Total Learning to do anything better . . . in which case you might see different outcomes if you increased your number of trees and/or your learning rate.
Alternatively
I might just have screwed up my code somehow.
Still . . .
I'm sticking with my choices for now.
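For concreteness, the one-variable feature I fitted can be sketched like this. The function name and example stats are made up for illustration; only the formula comes from the comment above.

```python
# Hypothetical sketch of the single explanatory variable described above:
# "Strength diff plus 8 times sign(speed diff)". Positive values favour
# fighter A, negative favour fighter B; all names and stats are made up.
import numpy as np

def fight_feature(strength_a, strength_b, speed_a, speed_b):
    strength_diff = strength_a - strength_b
    speed_diff = speed_a - speed_b
    return strength_diff + 8 * np.sign(speed_diff)

# Example: A is 3 Strength down but faster, netting out to +5.
print(fight_feature(10.0, 13.0, 7.0, 5.0))
```

If a gradient-booster given only this feature underfits, the knobs to try first would be more trees and/or a higher learning rate (in LightGBM's scikit-learn API, `n_estimators` and `learning_rate`).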
Something like D&D.Sci, then?