Test set: 51% vs prior SoTA of 34% (human baseline is unknown)

Ryan tested against the public test set and scored 51%. The prior SoTA score reported here was on the private test set.

Scores reported on public data are usually inflated by overfitting (people look at the questions and answers, then tailor their models to them).