How about the classic example of testing whether a coin is biased? This seems to use a "virtual sample" as described in the original post to reflect the hypothesised state of affairs in which the coin is fair: P(heads) = P(tails) = 0.5. This can be simulated without a coin (with whatever number of samples one wishes) and then compared against the observed counts of heads versus tails for the coin in question.
The same applies in any other situation where there is a theoretically derived prediction about probabilities to be tested (for example, "is my multiple-choice exam so hard that students are not performing above chance?"; with four choices we can test against a hypothetical P = 0.25).
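To make the "virtual sample" idea concrete, here is a minimal sketch in Python of the coin case. The observed counts (60 heads in 100 flips) are made up purely for illustration; the point is that the null distribution is generated by simulation under P(heads) = 0.5, with no real coin involved:

```python
import random

random.seed(0)

# Hypothetical observed data (illustrative only): 60 heads in 100 flips.
observed_heads, n = 60, 100

# "Virtual sample": simulate many experiments under the null hypothesis
# P(heads) = 0.5, and count how often the simulated result deviates from
# the expected n/2 at least as much as the observed result does.
trials = 100_000
extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n))
    if abs(heads - n / 2) >= abs(observed_heads - n / 2):
        extreme += 1

p_value = extreme / trials
print(p_value)  # roughly 0.057, matching the two-sided exact binomial test
```

The same sketch handles the exam example by replacing 0.5 with the chance level 0.25; the crucial ingredient in both cases is a probabilistically formulated null hypothesis to simulate from.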
But there you have a probabilistically formulated null hypothesis (coin is fair, students perform at chance level). In the equids example, the null hypothesis is that the probability of sampling a zebra is 0, which is disproven by simply pointing out that you, in fact, sampled some zebras. It makes no sense to calculate a p-value.
I have no idea what Fisher's test is supposed to do here. Show a correlation between the property of being a zebra and the property of being in the real, as opposed to the imaginary, sample? ... That's meaningless.
Go zebras!