Broken Benchmark: MMLU
Phillip over at the AI Explained channel has been running experiments with his SmartGPT framework against the MMLU benchmark and discovered a not-insignificant number of issues with the problem set. Among them:

* Crucial context missing from questions (apparently copy-paste errors?)
* Ambiguous sets of answers
* Wrong sets...