When there's no clear winner, the winner can't take all.
https://en.wikipedia.org/wiki/Winner-take-all_in_action_selection
Happens all the time in decision theory & reinforcement learning: the average of many good plans is often a bad plan, and a bad plan followed to the end is often both more rewarding & informative than switching at every timestep between many good plans. Any kind of multi-modality or need for extended plans (e.g. due to upfront costs/investments) will do it, and exploration is quite difficult: just taking the argmax or adding some randomness to action choices is not nearly enough; you need "deep exploration" (as Osband likes to call it) to follow a s...
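A toy sketch of the "average of good plans is a bad plan" point, under made-up assumptions: a walker starts in the middle of a corridor with a reward at each end, and a plan is a list of steps. The corridor, the `rollout` function, and the plan names are all hypothetical illustrations, and this shows only the multi-modality problem, not deep exploration itself.

```python
def rollout(plan, goal_distance=5):
    """Return 1.0 if the walker ends at either end of the corridor, else 0.0.

    Hypothetical toy environment: start at position 0, with rewards
    at +goal_distance and -goal_distance.
    """
    position = 0
    for step in plan:
        position += step
    return 1.0 if abs(position) >= goal_distance else 0.0


horizon = 5

# Two good plans: commit fully to one direction.
go_right = [+1] * horizon
go_left = [-1] * horizon

# Averaging the two good plans step by step gives a plan that never moves.
averaged = [(r + l) / 2 for r, l in zip(go_right, go_left)]

# Switching between the two good plans at every timestep goes nowhere either.
dithering = [+1 if t % 2 == 0 else -1 for t in range(horizon)]

plans = {
    "go_right": go_right,
    "go_left": go_left,
    "averaged": averaged,
    "dithering": dithering,
}
results = {name: rollout(plan) for name, plan in plans.items()}

for name, reward in results.items():
    print(f"{name}: {reward}")
```

Both committed plans collect the reward, while the averaged plan and the switch-every-step plan collect nothing, even though each is built entirely out of good plans.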
Related to: Half-assing it with everything you've got; Wasted motion; Say it Loud.
Once upon a time (true story), I was on my way to a hotel in a new city. I knew the hotel was many miles down this long, branchless road. So I drove for a long while.
After a while, I began to worry I had passed the hotel.
So, instead of proceeding at 60 miles per hour the way I had been, I continued in the same direction for several more minutes at 30 miles per hour, wondering if I should keep going or turn around.