Mohammad Bavarian



Did you test whether Claude is less susceptible to this issue? Otherwise I'm not sure where your comment comes from. When I tested this, I saw similar or worse behavior from that model, although GPT-4 definitely has this issue too.

What do you mean by the Scaling Hypothesis? Do you believe that extremely large transformer models trained with an autoregressive loss will have superhuman capabilities?