Not only does gpt-neox-20b have shards, it has exactly forty-six.
I notice that human DNA has 46 shards. This looks to me like evidence against the orthogonality thesis and for human values being much easier to reach than I expected.
I didn't make it very far beyond combusting into laughter (the first image).
Sadly, I should have realised it by footnote 1: I didn't take sufficient notice of that tiny note of discord[1] at TurnTrout describing reading non-LW content as an extraordinary effort.
I have failed as a rationalist. 😔
[1] I did at least notice my confusion enough at the joke post on the Alignment Forum to realise that today was April 1st, though.
I've written a lot about shard theory over the last year. I've poured dozens of hours into theorycrafting, communication, and LessWrong comment threads. I pored over theoretical alignment concerns with exquisite care and worry. I even read a few things that weren't blog posts on LessWrong.[1] In other words, I went all out.
Last month, I was downloading `gpt-neox-20b` when I noticed the following:

[Image: the `gpt-neox-20b` checkpoint download, split across 46 shard files.]

I've concluded the following:

1. Shard theory is true (at least for `gpt-neox-20b`).
2. We now know exactly how many shards `gpt-neox-20b` has.
3. The broader AI community[2] already knew the shard count of `gpt-neox-20b`, and nonchalantly printed that number for us to read.
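If you want to count the shards yourself, here is a minimal sketch (assuming only the `huggingface_hub` package and the usual `pytorch_model-XXXXX-of-XXXXX.bin` shard-naming convention) that tallies the checkpoint's shard files without downloading the full model:

```python
# Minimal sketch: count the checkpoint shard files of gpt-neox-20b without downloading them.
# Assumes the standard Transformers shard naming, e.g. pytorch_model-00001-of-00046.bin.
import re
from huggingface_hub import list_repo_files

files = list_repo_files("EleutherAI/gpt-neox-20b")
shards = [f for f in files if re.fullmatch(r"pytorch_model-\d{5}-of-\d{5}\.bin", f)]
print(f"gpt-neox-20b has {len(shards)} shards.")
```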
I then had GPT-4 identify some of the 46 shards.

[Image: GPT-4's list of some of the shards it identified.]

I'm pleasantly surprised that `gpt-neox-20b` shares values which I (and other LessWrong users) have historically demonstrated. The fact that `gpt-neox-20b` and LessWrong share many shards makes me more optimistic about alignment overall, since it implies convergence in the values of real-world trained systems.

(As a corollary, this demonstrates that LLMs can make serious alignment progress.)
Although I'm embarrassed and humbled by this experience, I'm glad that shard theory is true. Here's to—next time—only taking three months before running the first experiment! 🥂
[1] DM me if you want links to websites where you can find information which is not available on LessWrong!
"The broader AI community" taken in particular to include more traditional academics, like those at EleutherAI.