Looking at this comment from three years in the future, I'll just note that there's something quite ironic about your having put Sam Bankman-Fried on this list! If only he'd refactored his identity more! But no, he was stuck in short-sighted-greed/CDT/small-self, and we all paid a price for that, didn't we?
In Defense of the Shoggoth Analogy
In reply to: https://twitter.com/OwainEvans_UK/status/1636599127902662658
The explanations in the thread seem to me to be missing the middle or evading the heart of the problem. Zoomed out: an optimization target at the level of personality. Zoomed in: a circuit diagram of layers. But those layers, with billions of weights, are pretty much Turing complete.
Unfortunately, I don't think anyone has much idea how all those little learned computations make up said personality. My suspicion is there isn't going to be an *easy* way to explain what they're doing. Of course, I'd be relieved to be wrong here!
This matters because the analogy in the thread between averaged faces and LLM outputs is broken in an important way. (Nearly) every picture of a face in the training data has a nose. When you look at the nose of an averaged face, it's based very closely on the noses of all the faces that got averaged. However, despite the size of the training datasets for LLMs, the space of possible queries and topics of conversation is even vaster (it's exponential in the prompt-window size, unlike the query space for the averaged faces, which is just the size of the image).
As such, LLMs are forced to extrapolate hard. So I'd expect the particular generalizations they learned, hiding in those weights, to start to matter once users start poking them in unanticipated ways.
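To make that size gap concrete, here's a rough back-of-the-envelope sketch in Python; the vocabulary size, context length, and image resolution are illustrative numbers I'm picking, not figures from the thread:

```python
import math

# Averaged face: a "query" is essentially a pixel position, so the
# query space scales linearly with the size of the image.
image_queries = 512 * 512            # ~2.6e5 positions in a 512x512 image

# LLM: a "query" is an entire prompt, so the space of distinct prompts
# grows exponentially with the prompt-window size.
vocab_size = 50_000                  # order-of-magnitude BPE vocabulary
context_length = 8_000               # tokens in the prompt window
log10_prompts = context_length * math.log10(vocab_size)

print(f"face query space: ~{image_queries:.1e} queries")
print(f"LLM prompt space: ~10^{log10_prompts:.0f} possible prompts")
```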
In short, if LLMs are like averaged faces, I think they're faces that will readily fall apart into Shoggoths if someone looks at them from an unanticipated or uncommon angle.
I know this post was chronologically first, but since I read them out of order, my reaction was "wow, this post is sure using some of the notions from the Waluigi Effect mega-post, but for humans instead of chatbots"! In particular, they're both pointing at the notion that an agent (human or AI chatbot) can be in something like a superposition between good actor and bad actor, unlike the naive two-tone picture of morality one often gets from children's books.
I interpreted OP as saying that KataGo, despite being a super-human Go player, came up with a flawed approximation to the natural abstraction that two-eyed groups are alive, one that was inaccurate in some situations (and that's how it can be exploited: by building a small living group that ends up appearing dead from its perspective).
One of my pet journalism peeves is the "as" (or sometimes "while") construction, which I often see in headlines or first sentences of articles. It looks like "<event A was happening> as <event B was happening>". You can fact-check the events and it'll turn out they happened, but the phrasing comes with a super annoying nudge-nudge-wink-wink implication that the two events have a direct causal connection. Unfortunately, you can't pin this on the journalist, because they didn't actually say it.
This sort of thing happens a lot. To give just a couple of example templates, articles like "as <political thing happened>, markets rallied" or "<stock> falls as <CEO did something>" are often trying to pull this.
I broadly agree. Though I would add that those things could still be wants (positive motivations) afterwards, things one pursues without needing them. I'm not advocating for asceticism.
Also, while I agree that you get more happiness by having fewer negative motives, being run by positive motives is not 100% happiness. One can still experience disappointment if one wants access to Netflix and it's down for maintenance one day. However, disappointment is still less unpleasant than fear, and it promotes a more measured reaction to the situation.
Are you trying to say that it should work similarly to desensitization therapy? But then there might exist a reversed mode, where you get attached to things even more as you meditate on why they are good to have. Which of these modes dominates is not clear to me.
I think you make a good point. I feel I was gesturing at something real when I wrote down the comparison notion, but didn't express it quite right. Here's how I would express it now:
The key thing I failed to point out in the post is that just visualizing a good thing you have or what's nice about it is not the same as being grateful for it. Gratitude includes an acknowledgement. When you thank an acquaintance for, say, having given you helpful advice, you're acknowledging that they didn't necessarily have to go out of their way to do that. Even if you're grateful for something that no specific person gave you, and you don't believe in a god, the same feeling of acknowledgement is present. I suspect this acknowledgement is what pushes things out of the need-set.
And indeed, as you point out, just meditating on why something is good to have might increase attachment (or it might not, the model doesn't make a claim about which effect would be stronger).
I don't think I get this. Doesn't this apply to any positive thing in life? (e.g. why single out the gratitude practice?)
I expect most positive things would indeed help somewhat, but that gratitude practice would help more. If someone lost a pet, giving them some ice cream may help. However, as long as their mind is still making the comparison to the world where their pet is still alive, the help may be limited. That said, to the extent that they manage to feel grateful for the ice cream, it seems to me as though their internal focus has shifted in a meaningful way, away from grasping at the world where their pet is still alive and towards the real world.
1. Yes, I agree with the synopsis (though expanded need-sets are not the only reason people are more anxious in the modern world).
2. Ah. Perhaps my language in the post wasn't as clear as it could have been. When I said:
> More specifically, your need-set is the collection of things that have to seem true for you to feel either OK or better.
I was thinking of the needs as already being about what seems true about future states of the world, not just present states. For example, your need for drinking water is about being able to get water when thirsty at a whole bunch of future times.
> If this is true then a larger need-set would lead to more negative motivation due to there being more ways for something we think we need to be taken away from us.
Yes, exactly.
A Cautionary Tale about Trusting Numbers from Wikipedia
So this morning I woke up early and thought to myself: "You know what I haven't done in a while? A good old-fashioned Wikipedia rabbit hole." So I started reading the article on rabbits. Things were relatively sane until I got to the section on rabbits as food.
Something has gone very wrong here!
200 million tons is 400 billion pounds (add ~10% if they're metric tons, but we can ignore that).
Divide that by 1.2 billion, and we can deduce that those rabbits weigh in at over 300 pounds each on average! Now I know we've bred some large animals for livestock, but I'm rolling to disbelieve when it comes to three hundred pound bunnies.
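For anyone who wants to check the arithmetic, here's the back-of-the-envelope version in Python (a quick sketch using 2,000 lb per short ton; metric tons would only shift things by ~10%):

```python
# Wikipedia's figures: 200 million tons of rabbit meat from 1.2 billion rabbits.
tons_claimed = 200e6               # tons of rabbit meat per year
rabbits_per_year = 1.2e9           # rabbits slaughtered per year
lbs_per_ton = 2_000                # short ton; metric tons add ~10%

pounds_per_rabbit = tons_claimed * lbs_per_ton / rabbits_per_year
print(f"{pounds_per_rabbit:.0f} lb per rabbit")  # ~333 lb: not a plausible bunny
```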
Of the two sources Wikipedia cites, [161] looks like the less reliable one: it's a WSJ blog post. But the biggest reason we shouldn't trust that article is that its numbers aren't even internally consistent!
From the article:
There's a basic arithmetic error here! `200 million * 70% * 30%` is 42 *million*, not 420,000.
If we assume this 200 million ton number was wrong and the 420,000 ton number for Sichuan was right, the global number should in fact be 2 million tons. This would make the rabbits weigh three pounds each on average, which is a much more reasonable weight for a rabbit!
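And here's the same sanity check run on the corrected figure, again as a rough sketch using the 70% and 30% shares from the article's own calculation:

```python
# If Sichuan's 420,000 tons is right, invert the article's own percentages
# (70% and 30%) to recover the implied global total.
sichuan_tons = 420_000
global_tons = sichuan_tons / (0.70 * 0.30)       # = 2 million tons, not 200 million

rabbits_per_year = 1.2e9
pounds_per_rabbit = global_tons * 2_000 / rabbits_per_year
print(f"global total: {global_tons:,.0f} tons")
print(f"average rabbit: {pounds_per_rabbit:.1f} lb")  # ~3.3 lb: a believable rabbit
```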
If I had to guess how this mistake happened, putting on my linguist hat: Chinese has a single word for ten thousand, like the Greek-derived "myriad" (written either 万 or 萬). If you actually wanted to say 2*10^6 in Chinese, it would come out as something like "two hundred myriad". So I can see a fairly plausible way a translator could slip up and render it as "200 million".
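As a toy illustration of how that slip scales the number (the phrasing here is my reconstruction, not anything quoted in the WSJ piece):

```python
# 万 ("myriad") = 10,000, so 200万 reads literally as "two hundred myriad".
correct = 200 * 10_000           # 2,000,000: what "two hundred myriad" tons means
mistranslated = 200 * 1_000_000  # 200,000,000: what "200 million" tons says

print(mistranslated // correct)  # the slip inflates the figure by a factor of 100
```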
Anyway, I've posted this essay to the talk page and submitted an edit request. We'll see how long it takes Wikipedia to fix this.
Links:
Original article: https://en.wikipedia.org/wiki/Rabbit#As_food_and_clothing
[161] https://web.archive.org/web/20170714001053/https://blogs.wsj.com/chinarealtime/2014/06/13/french-rabbit-heads-the-newest-delicacy-in-chinese-cuisine/