awg

Broken Benchmark: MMLU

Phillip over at the AI Explained channel has been running some experiments on his SmartGPT framework against the MMLU benchmark and discovered a not-insignificant number of issues with the problem set. Among them:

* Crucial context missing from questions (apparently copy-paste errors?)
* Ambiguous sets of answers
* Wrong sets of answers

He highlights a growing need for a proper benchmarking organization that can research and create accurate, robust, sensible benchmarking suites for evaluating SOTA models.

I found this video to be super interesting and the findings to be very important, so I wanted to share it here.

Aug 29, 2023 · 24

awg

385

127

Broken Benchmark: MMLU

Aug 29, 2023 · 24

Auto-GPT: Open-sourced disaster?

Sharing this here doesn't seem like an infohazard at this point. This is all over my YouTube feed anyway. Description from the authors: > Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, autonomously develops and manages businesses to increase...

Apr 5, 2023 · 23

"Sorcerer's Apprentice" from Fantasia as an analogy for alignment

The story is simple: Mickey is an apprentice to a powerful sorcerer whose magic comes from his hat. Mickey is tasked with carrying buckets of water up a long flight of stairs and dumping them into a basin–hard work for a mouse! When the sorcerer steps out, however, Mickey takes...

Mar 29, 2023 · 9

awg's Shortform

Feb 27, 2023 · 2
