This story was originally posted as a response to this thread.
It might help to imagine a hard takeoff scenario using only known sorts of NN & scaling effects...
In A.D. 20XX. Work was beginning. "How are you gentlemen !!"... (Work. Work never changes; work is always hell.)
Specifically, a MoogleBook researcher has gotten a pull request from Reviewer #2 on his new paper in evolutionary search in auto-ML, for error bars on the auto-ML hyperparameter sensitivity like larger batch sizes, because more can be different and there's high variance in the old runs with a few anomalously high performance values. ("Really? Really? That's what you're worried about?") He can't see why worry, and wonders what sins he committed to deserve this asshole Chinese (given the Engrish) reviewer, as he wearily kicks off yet another HQU experiment...
I think if I have a space of hypotheses, I'll label 'probable' the ones that have >50% probability, and 'plausible' the ones that are clearly in the running to become 'probable'. The plausible options are the 'contenders for probableness'; they're competitive hypotheses.
E.g., if I'm drawing numbered balls from an urn at random, and there are one hundred balls, then it's 'plausible' that I'll draw ball #23 even though it's only 1% likely, because 1% is pretty good when none of the other atomic hypotheses are higher than 1%.
On the other hand, if I have 33 cyan balls in an urn, 33 magenta balls, 33 yellow balls, and 1 black ball, then I wouldn't normally say 'it's plausible that I'll draw a black ball', because I'm partitioning the balls by color and 'black' isn't one of the main contender colors.
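A minimal sketch of this labeling scheme in Python, for both urns. The >50% rule for 'probable' is from the text above; 'clearly in the running' is fuzzy, so the `plausible_ratio` cutoff here (a hypothesis counts as 'plausible' if it's within some fraction of the top hypothesis's probability) is an invented stand-in for illustration, not anything stated in the thread:

```python
def label_hypotheses(probs, plausible_ratio=0.5):
    """Label each hypothesis 'probable' (>50%), 'plausible', or 'neither'.

    `plausible_ratio` is an assumed operationalization of 'in the
    running': a hypothesis is 'plausible' if its probability is at
    least that fraction of the top hypothesis's probability.
    """
    top = max(probs.values())
    labels = {}
    for h, p in probs.items():
        if p > 0.5:
            labels[h] = "probable"
        elif p >= plausible_ratio * top:
            labels[h] = "plausible"
        else:
            labels[h] = "neither"
    return labels

# Urn 1: 100 numbered balls, each 1% -- every ball ties for the top
# hypothesis, so ball #23 comes out 'plausible' despite being 1% likely.
urn1 = {f"ball #{i}": 1 / 100 for i in range(1, 101)}
print(label_hypotheses(urn1)["ball #23"])  # -> 'plausible'

# Urn 2: partitioned by color -- 33 cyan, 33 magenta, 33 yellow, 1 black.
# Black is 1% again, but the contender colors sit at 33%, so it's 'neither'.
urn2 = {"cyan": 0.33, "magenta": 0.33, "yellow": 0.33, "black": 0.01}
print(label_hypotheses(urn2))  # black -> 'neither', the colors -> 'plausible'
```

The point the two urns make survives the arbitrary cutoff: under the flat partition every 1% ball is competitive, while under the color partition the same 1% is dwarfed by the 33% contenders.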