It's very possible that I misunderstood your question, but here's a stab at an answer.
Assume that you have such a testing procedure. First let's run it on a device that is actually a halting oracle. Presumably the procedure would ask the device a finite number of questions, receive correct answers to all of them, and then declare "yes, this is a halting oracle". Let N be the length of the longest question asked. Then a fake halting oracle that answers correctly for every program of length up to N would receive the same questions, give the same answers, and so also pass the procedure.
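Here is a toy sketch of that argument in Python. Nothing in it solves the halting problem: the `Program` class, its stipulated `halts` flag, and the helper names `true_oracle`, `make_fake_oracle`, and `transcript` are all made up for illustration, and the fake is assumed to answer correctly up to and including its limit. The point is only that, on any finite list of questions, the real device and a suitably chosen fake produce the same answers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Program:
    """Toy stand-in for a program: just a length and a stipulated halting flag."""
    length: int
    halts: bool  # ground truth is assumed here, not computed


Oracle = Callable[[Program], bool]


def true_oracle(p: Program) -> bool:
    """Hypothetical real halting oracle: always reports the ground truth."""
    return p.halts


def make_fake_oracle(limit: int) -> Oracle:
    """Fake oracle: correct up to `limit`, says "does not halt" beyond it."""
    def fake(p: Program) -> bool:
        return p.halts if p.length <= limit else False
    return fake


def transcript(oracle: Oracle, questions: List[Program]) -> List[bool]:
    """Everything a tester ever sees: the answers to the questions it asked."""
    return [oracle(q) for q in questions]


# Any finite test has a longest question; call its length N.
questions = [Program(3, True), Program(10, False), Program(42, True)]
N = max(q.length for q in questions)
fake = make_fake_oracle(N)

# Identical transcripts, so no verdict based on them can tell the two apart.
print(transcript(true_oracle, questions) == transcript(fake, questions))

# One extra halting program longer than N exposes this particular fake.
longer = questions + [Program(N + 1, True)]
print(transcript(true_oracle, longer) == transcript(fake, longer))
```

The list of questions is fixed in advance here, but an adaptive tester is in the same position: its next question depends only on the answers so far, and those are the same for both devices.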
I think there's an assumption here that the question doesn't quite spell out: your argument holds if the lengths of the programs you test the device on are not a function of the oracle.
To put that more clearly: you state that for all L, if you give the maybe-oracle only programs of length at most L, then there exists a maybe-oracle with N > L, and hence that maybe-oracle goes undetected. But if we swap the for-all and the there-exists, we no longer have a true statement: there does not exist a maybe-oracle such that for all...
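The two quantifier orders can be written out as follows. The notation is introduced only for this sketch: $F_N$ is the fake oracle with length limit $N$, and $\mathrm{undetected}(F_N, L)$ means that $F_N$ answers every program of length at most $L$ correctly. The second line is one reading of the sentence that trails off above, not necessarily the one intended.

```latex
\begin{align*}
  &\forall L \;\, \exists N > L : \; \mathrm{undetected}(F_N, L)
    && \text{(true: choose any } N > L \text{)} \\
  &\exists N \;\, \forall L : \; \mathrm{undetected}(F_N, L)
    && \text{(false: a halting program longer than } N \text{ exposes } F_N \text{)}
\end{align*}
```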
Here's something I've been wondering about, in the context of Solomonoff induction and uncomputable sequences.
I have a device that is either a halting oracle, or an ordinary Turing machine which gives the correct answer to the halting problem for all programs shorter than some finite length N but always outputs "does not halt" when asked to evaluate programs longer than N. If you don't know what N is and you don't have infinite time, is there a way to tell the difference between the actual halting oracle (which gives correct answers for all possible programs) and a "fake" halting oracle which starts giving wrong answers beyond some N that just happens to be larger than any program you've tested so far?
The Kolmogorov complexity of an uncomputable sequence is infinite, so Solomonoff induction assigns it a probability of zero. But there's always a computable number with less than epsilon error, so would this ever actually matter?