This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset.
Estimated Complexity: 4/5 (this is a guess, I will update based on feedback/seeing how the scenario goes)
STORY
The Demon King rises in his distant Demon Castle. Across the free lands of the world, his legions spread, leaving chaos and death in their wake. The only one who can challenge him is the Summoned Hero, brought by the Goddess Herself from a distant world to aid this one in its time of need. The Summoned Hero must call together all the free peoples of the world under their banner, to triumph united where they would surely fall separately.
And what is the Summoned Hero doing now?
Well, right now you are staring in disbelief at your companions' explanation of the politics of the Sunset Coast.
Apparently, little things like a Demon King attempting to subjugate the world are not enough to shake them from their traditions. If you want them to listen to you, being the Summoned Hero is not going to suffice. Instead, they conduct all their politics based on gladiatorial combat in the Arena of Dusk.
The good news is that the Four Great Houses of the Sunset Coast will gladly listen to you, and maybe even join you against the Demon King, if you can defeat their Champions in gladiatorial combat.
The bad news is that you are...not really suited to gladiatorial combat. Neither your class nor your isekai cheat powers[1] are especially good at physical fights.
The good news is that you have accumulated by now a large retinue of vagabonds and misfits...ahem, loyal party members who will gladly fight on your behalf.
The bad news is that even your party members who are good at fighting still seem somewhat outclassed by the Champions.
The good news is that, as any adventuring party should, you have accumulated various magical items, wholly legitimately looted from various places: dungeons, bandits who made the mistake of being your random encounter on a trip between cities, buildings that looked like they might be thieves' guilds, manifestly corrupt local governors who attempted to have you arrested for no legitimate reason at all...ahem. In any case, you have accumulated various magical items to equip your party members with.
The bad news is that the Four Great Houses have more magic items to equip their Champions with.
The good news is that you've gotten your hands on a dataset containing the history of combats in the Arena. With this, you're hopeful that you can choose how to assign and equip your party members for the best possible odds against the Champions!
The bad news is that it sounds like this will require a lot of work...ahem. The even better news is that it sounds like this will give you the opportunity to do a lot of fun Data Science! Hooray!
DATA & OBJECTIVES
- Your adventuring party has the following martial party members:
- Uzben Grimblade, a Level 5 Dwarf Ninja.[2]
- Varina Dourstone, a Level 5 Dwarf Warrior.
- Willow Brown, a Level 5 Human Ranger.
- Xerxes III of Calantha, a Level 5 Human Monk.
- Yalathinel Leafstrider, a Level 5 Elf Fencer.
- Zelaya Sunwalker, a Level 6 Elf Knight.
- You also have some magical items to distribute among them. You have seven magical items total, one each of:
- +1, +2, +3 and +4 Boots of Speed
- +1, +2 and +3 Gauntlets of Power
- You need to choose who will fight each of the four opposing champions:
- House Adelon's champion is a Level 6 Human Warrior with +3 Boots of Speed and +1 Gauntlets of Power.
- House Bauchard's champion is a Level 6 Human Knight with +3 Boots of Speed and +2 Gauntlets of Power.
- House Cadagal's champion is a Level 7 Elf Ninja with +2 Boots of Speed and +3 Gauntlets of Power.
- House Deepwrack's champion is a Level 6 Dwarf Monk with +3 Boots of Speed and +2 Gauntlets of Power.
- Your goal is to maximize the number of champions you defeat.
- For each opposing champion, you need to choose and equip one of your party members to fight them. You cannot send the same party member to fight two champions, nor can you equip the same item to two party members.
- For example, a solution could be:
- Give Uzben the +4 Boots of Speed and the +3 Gauntlets of Power and send him to fight House Adelon's champion.
- Give Varina the +3 Boots of Speed and the +2 Gauntlets of Power and send her to fight House Bauchard's champion.
- Give Willow the +2 Boots of Speed and the +1 Gauntlets of Power and send her to fight House Cadagal's champion.
- Give Xerxes the +1 Boots of Speed and send him to fight House Deepwrack's champion.
- Do not send Yalathinel or Zelaya to fight at all.
- To assist in this, you have a dataset with the records of past fights in the Arena. Each record shows the two fighters that took part, what their levels/races/classes/magical items were, and which one won.
SECRET BONUS OBJECTIVE?
A strange piece of paper appears out of nowhere and falls into your hands. You try to read it, but most of it is damaged beyond recognition. You get a sudden feeling, though, that what it says is very important. Did it come from one of your isekai cheat powers? Was it revealed to you by Enlightenment, or sent from the future by Temporal Distortion? Or is the Goddess putting another finger on the scales?
If you ??? ??? ?? ????? ?? ?????? ??? ???? ??????? ???? ??????? ???? ??? ???? responsible ??? ????? ?????? ????? ???? ??? House. ??? ???? ???? ??? lasting enmity, ??? ?????? ???? ???????? ???? ??? ???? ?? ??? ???? ??????? ?? ?? ????? ?? ????????? ?? ??? ???? your honor ?? ????????? ???? ?? ???? ??? ???? ??? ??? friendship ???? ?? ??? ???? ?? ??? ?????? ??????? ?? ?? ?????
I'll aim to post the ruleset and results on October 28th (giving one week and both weekends for players). If you find yourself wanting extra time, because you found this scenario late and want a chance to attempt it yourself, or just because you end up a bit rushed/busy with other commitments and would be happier to have an extra week, comment below and I can push this deadline back.
As usual, working together is allowed, but for the sake of anyone who wants to work alone, please spoiler parts of your answers that contain information or questions about the dataset. To spoiler answers on a PC, type a '>' followed by a '!' at the start of a line to open a spoiler block - to spoiler answers on mobile, type a ':::spoiler' at the start of a line and then a ':::' at the end to spoiler the line.
Inspired by abstractapplic's machine learning, and wanting to get some experience in Julia, I got Claude (3.5 Sonnet) to write me an XGBoost implementation in Julia. It took a long time, especially with some bugfixing (it took a long time to find that a feature matrix was the wrong shape - a problem with insufficient type explicitness, I think). Still way, way faster than doing it myself! Not sure I'm learning all that much Julia, but I am learning how to get Claude to write it for me, I hope.
Anyway, I used a simple model that
only takes into account 8 * sign(speed difference) + power difference, as in the comment this is a reply to
and a full model that
takes into account all the available features including the base data, the number the simple model uses, and intermediate steps in the calculation of that number (that would be, iirc: power (for each), speed (for each), speed difference, power difference, sign(speed difference))
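For concreteness, here is a minimal Python sketch of the number the simple model uses. The function name is made up for illustration, and this is only the formula as stated, not the Julia code Claude wrote. The two checks use speed and power values copied from the matchup stats printed further down in this comment.

```python
def effective_power_difference(my_speed, my_power, their_speed, their_power):
    """8 * sign(speed difference) + power difference, from my fighter's side."""
    speed_diff = my_speed - their_speed
    power_diff = my_power - their_power
    sign = (speed_diff > 0) - (speed_diff < 0)  # -1, 0 or +1
    return 8 * sign + power_diff


# Values copied from the printed matchup stats.
print(effective_power_difference(18, 11, 14, 18))  # Uzben vs Adelon -> 1
print(effective_power_difference(7, 22, 25, 10))   # Varina vs Cadagal -> 4
```

Note that only the sign of the speed difference enters, so being faster by 4 or by 18 counts the same in this feature.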
Results:
Rank 1
Full model scores: Red: 94.0%, Black: 94.9%
Combined full model score: 94.4%
Simple model scores: Red: 94.3%, Black: 94.6%
Combined simple model score: 94.5%
Matchups:
Varina Dourstone (+0 boots, +3 gauntlets) vs House Cadagal Champion
Willow Brown (+3 boots, +0 gauntlets) vs House Adelon Champion
Xerxes III of Calantha (+2 boots, +2 gauntlets) vs House Deepwrack Champion
Zelaya Sunwalker (+1 boots, +1 gauntlets) vs House Bauchard Champion
This is the top-scoring result with either the simplified model or the full model. It was found by a full search of every valid item and hero combination available against the house champions.
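A full search of this kind can be sketched in Python roughly as follows. This is not the actual Julia code. It assumes each fighter carries at most one pair of boots and one pair of gauntlets, that the objective is the summed win probability, and that `win_prob` is a hypothetical stand-in for whichever model does the scoring.

```python
from itertools import permutations


def item_assignments(items, n_fighters):
    """All ways to hand out distinct items to n fighters, with 0 meaning 'no item'.

    Each fighter gets at most one item of this type and each item goes to
    at most one fighter.
    """
    padded = list(items) + [0] * n_fighters
    return sorted(set(permutations(padded, n_fighters)))


def best_lineup(heroes, champions, boots, gauntlets, win_prob):
    """Exhaustively search hero/item assignments, maximizing summed win probability.

    win_prob(hero, boots_bonus, gauntlets_bonus, champion) is a hypothetical
    scoring hook, e.g. a wrapper around a trained model.
    """
    n = len(champions)
    boot_options = item_assignments(boots, n)
    gauntlet_options = item_assignments(gauntlets, n)
    best_score, best = float("-inf"), None
    for lineup in permutations(heroes, n):  # lineup[i] fights champions[i]
        for b in boot_options:
            for g in gauntlet_options:
                score = sum(
                    win_prob(h, bi, gi, c)
                    for h, bi, gi, c in zip(lineup, b, g, champions)
                )
                if score > best_score:
                    best_score, best = score, list(zip(lineup, b, g, champions))
    return best_score, best


# Toy run with a made-up scorer, just to show the call shape.
toy = lambda hero, boots, gauntlets, champ: 0.1 * (boots + gauntlets) + (0.5 if hero == "b" else 0.1)
print(best_lineup(["a", "b"], ["X"], [1], [], toy))
```

Under these assumptions, six heroes against four champions with all seven items from the puzzle statement gives 360 x 209 x 73, or about 5.5 million candidates, so brute force is feasible if the scorer is cheap (or its predictions are cached per hero/item/champion combination).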
It is also the solution I previously posted, which I found without machine learning. Which is reassuring. (Though I suppose there is some chance that feeding the models this predictor, if it's good enough, might make them glom onto it and miss some hard-to-learn additional pattern.)
My theory though is that giving the models the useful metric mostly just helps them - they don't need to learn the metric from the data, and I mostly think that if there was a significant additional pattern the full model would do better.
(for Cadagal, I haven't changed the champion's boots to +4, though I don't expect that to make a significant difference)
As far as I can tell the full model doesn't do significantly better and does worse in some ways (though I don't know much about how to evaluate this, and Claude's metrics, including a test set log loss of 0.2527 for the full model and 0.2511 for the simple model, are for a separately generated version which I am not all that confident is actually the same models, though they "should be", up to the restricted training set, if Claude was doing it right). * see edit below
But the red/black variations seen below for the full model seem likely to me (given my prior that red and black are likely to be symmetrical) to be an indication that what the full model is finding that isn't in the simple model is at least partially overfitting. Though actually, if it's overfitting a lot, maybe it's surprising that the test set log loss isn't a lot worse than found (though it is at least worse than the simple model's)? Hmm - what if there are actual red/black differences? (Something to look into perhaps, as well as trying to duplicate abstractapplic's report regarding sign(speed difference) not exhausting the benefits of speed info ... but for now I'm more likely to leave the machine learning aside and switch to looking at distributions of gladiator characteristics, I think.)
Predictions for individual matchups for my and abstractapplic's solutions:
My matchups:
Varina Dourstone (+0 boots, +3 gauntlets) vs House Cadagal Champion (+2 boots, +3 gauntlets)
Full Model: Red: 91.1%, Black: 96.7%
Simple Model: Red: 94.3%, Black: 94.6%
Willow Brown (+3 boots, +0 gauntlets) vs House Adelon Champion (+3 boots, +1 gauntlets)
Full Model: Red: 94.3%, Black: 95.1%
Simple Model: Red: 94.3%, Black: 94.6%
Xerxes III of Calantha (+2 boots, +2 gauntlets) vs House Deepwrack Champion (+3 boots, +2 gauntlets)
Full Model: Red: 95.2%, Black: 93.7%
Simple Model: Red: 94.3%, Black: 94.6%
Zelaya Sunwalker (+1 boots, +1 gauntlets) vs House Bauchard Champion (+3 boots, +2 gauntlets)
Full Model: Red: 95.3%, Black: 93.9%
Simple Model: Red: 94.3%, Black: 94.6%
(all my matchups have 4 effective power difference in my favour as noted in an above comment)
abstractapplic's matchups:
Matchup 1:
Uzben Grimblade (+3 boots, +0 gauntlets) vs House Adelon Champion (+3 boots, +1 gauntlets)
Win Probabilities:
Full Model: Red: 72.1%, Black: 62.8%
Simple Model: Red: 65.4%, Black: 65.7%
Stats:
Speed: 18 vs 14 (diff: 4)
Power: 11 vs 18 (diff: -7)
Effective Power Difference: 1
--------------------------------------------------------------------------------
Matchup 2:
Xerxes III of Calantha (+2 boots, +1 gauntlets) vs House Bauchard Champion (+3 boots, +2 gauntlets)
Win Probabilities:
Full Model: Red: 46.6%, Black: 43.9%
Simple Model: Red: 49.4%, Black: 50.6%
Stats:
Speed: 16 vs 12 (diff: 4)
Power: 13 vs 21 (diff: -8)
Effective Power Difference: 0
--------------------------------------------------------------------------------
Matchup 3:
Varina Dourstone (+0 boots, +3 gauntlets) vs House Cadagal Champion (+2 boots, +3 gauntlets)
Win Probabilities:
Full Model: Red: 91.1%, Black: 96.7%
Simple Model: Red: 94.3%, Black: 94.6%
Stats:
Speed: 7 vs 25 (diff: -18)
Power: 22 vs 10 (diff: 12)
Effective Power Difference: 4
--------------------------------------------------------------------------------
Matchup 4:
Yalathinel Leafstrider (+1 boots, +2 gauntlets) vs House Deepwrack Champion (+3 boots, +2 gauntlets)
Win Probabilities:
Full Model: Red: 35.7%, Black: 39.4%
Simple Model: Red: 34.3%, Black: 34.6%
Stats:
Speed: 20 vs 15 (diff: 5)
Power: 9 vs 18 (diff: -9)
Effective Power Difference: -1
--------------------------------------------------------------------------------
Overall Statistics:
Full Model Average: Red: 61.4%, Black: 60.7%
Simple Model Average: Red: 60.9%, Black: 61.4%
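The overall averages are just the mean of the four per-matchup probabilities. Summing them instead gives the expected number of wins, if the model's probabilities are taken at face value. A quick Python check using the full-model Red numbers printed above (the helper name is made up for illustration):

```python
def summarize(win_probs):
    """Return (average win probability, expected number of wins)."""
    expected_wins = sum(win_probs)
    return expected_wins / len(win_probs), expected_wins


# Full-model Red probabilities for the four matchups listed above.
full_red = [0.721, 0.466, 0.911, 0.357]
avg, expected = summarize(full_red)
print(f"{avg:.1%} average, {expected:.3f} expected wins")  # 61.4% average, 2.455 expected wins
```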
Edit: so I checked the actual code to see if Claude was using the same hyperparameters for both, and wtf wtf wtf wtf. The code has 6 functions that all train models. (My fault: at one point I renamed a function because Claude gave me a new version that didn't have all the previous functionality - it only trained the full model instead of both; this was during the great bughunt for the misshaped matrix, when a problem was suspected in the full model. Claude then, I guess, picked up on this and started renaming updated versions spontaneously, and I was adding Claude's new features in instead of replacing things, and hadn't cleaned up the code or asked Claude to do so.) Each one has its own hardcoded hyperparameter set. Of these, there is one pair of functions with matching hyperparameters; everything else has a unique set. Of course, most of these weren't being used anymore, but the functions for actually generating the models I used for my results, and the function for generating the models used for comparing results on a train/test split, weren't among the matching pair. Plus there's another function that returns a (hardcoded, also unique) updated parameter set, but wasn't actually used. Oh, and all this is not counting the hyperparameter tuning function that I assumed was generating a set of tuned hyperparameters to be used by other functions, but in fact was just printing results for different tunings. I had been running this every time before training models! Obviously I need to be more vigilant (or maybe asking Claude to do so might help?).
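One way to avoid this kind of drift is to keep a single shared parameter set and have the tuning step return its winner instead of printing it. A minimal Python sketch of that pattern, not the cleaned-up Julia code: the parameter values are illustrative only, and `train_and_score` is a hypothetical hook that trains with the given parameters and returns a validation loss.

```python
from itertools import product

# One shared default set; every training call starts from this.
# Values are illustrative placeholders, not tuned settings.
DEFAULT_PARAMS = {"max_depth": 4, "eta": 0.1, "rounds": 200}


def tune(train_and_score, grid, base=DEFAULT_PARAMS):
    """Try every combination in grid and RETURN the best params and loss.

    train_and_score(params) -> validation loss, lower is better (hypothetical hook).
    """
    best_params, best_loss = dict(base), float("inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        params = {**base, **dict(zip(keys, values))}
        loss = train_and_score(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Fake scorer standing in for real training, to show the call shape.
fake_loss = lambda p: abs(p["max_depth"] - 3) + abs(p["eta"] - 0.3)
params, loss = tune(fake_loss, {"max_depth": [2, 3, 4], "eta": [0.1, 0.3]})
print(params, loss)
```

The returned `params` would then be passed to every function that trains a model, for both the simple and the full model, so the two are compared on equal footing.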
edit:
Had Claude clean up the code and tune for more overfitting, still didn't see anything not looking like overfitting for the full model. Could still be missing something, but not high enough in subjective probability to prioritize currently, so have now been looking at other aspects of the data.
further edit:
My (what I think is) highly overfitted version of my full model really likes Yonge's proposed solution. In fact it predicts
an equal winrate to the best possible configuration not using the +4 boots (I didn't have Claude code the situation where +4 boots are a possibility). I still think that's probably because they are picking up the same random fluctuations ... but it will be amusing if Yonge's "manual scan" solution turns out to be exactly right.