Refusals were mostly 1-2%, so ignoring them doesn't change the results significantly. Ignoring gibberish does change the results, but since we are measuring correct answers, this shouldn't matter.
Fixed! Edited the hyperlink.
Edited, thanks for catching this!