If false oracles are wrong 50% of the time for length > N, you can run the test program X times, and have 1/2^X odds of it being a false oracle if it matches each time. Obviously, if the test program fails, you now know N is less than the test program's length.
So, if you run the test program 4 times, odds are 1/16 that the oracle is fake, and 15/16 that it is real. For any "realistic" scenario, there's a probability of even a regular computer program failing due to hardware glitches, so you'll eventually approach the same confidence you'd have in a standard desktop PC calculating 2+2 :)
(This solution does somewhat generalize to odds other than 50%: once you find ANY test program that fails, you can repeat that program numerous times to estimate the oracle's actual failure rate.)
The idea that the oracle flips a coin to determine if the test program of length >N halts is also not terribly "realistic".
Instead, how about the following: the black box is either an oracle, or uses some sort of clever but not foolproof heuristic to guess whether or not a program halts, such that all programs for which the heuristic fails have length >N. One example of this is a Turing machine that runs a given program for BB(N) steps to see if it halts.
Then finding a test program that fails would actually be fiendishly difficult: you bas...
Here's something I've been wondering about, in the context of Solomonoff induction and uncomputable sequences.
I have a device that is either a halting oracle, or an ordinary Turing machine which gives the correct answer to the halting problem for all programs smaller than some finite length N but always outputs "does not halt" when asked to evaluate programs larger than N. If you don't know what N is and you don't have infinite time, is there a way to tell the difference between the actual halting oracle (which gives correct answers for all possible programs) and a "fake" halting oracle which starts giving wrong answers for some N that just happens to be larger than any program that you've tested so far?
The Kolmogorov complexity of an uncomputable sequence is infinite, so Solomonoff induction assigns it a probability of zero, but there's always a computable number with less than epsilon error, so would this ever actually matter?