I think I've just figured out why decision theories strike me as utterly pointless: they sidestep the actual hard part of making a decision. In general, decisions are hard not because you are weighing payoffs, but because you are dealing with uncertainty.
To operationalise this: a decision theory usually assumes that you have some number of options, each with some defined payout. Assuming payouts are fixed, all decision theories simply advise you to pick the option with the highest utility. "Difficult problems" in decision theory are problems where the p...
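To make the point above concrete, here is a minimal sketch in Python. The options, payoffs, and probability values are invented purely for illustration and do not come from any particular decision theory. With payoffs fixed, choosing is a one-line argmax; the choice only becomes contestable once the probability of each state of the world is uncertain.

```python
# Hypothetical payoff table: PAYOFFS[option][state]. All numbers are made up.
PAYOFFS = {
    "take_umbrella": {"rain": 5, "sun": 3},
    "leave_umbrella": {"rain": -10, "sun": 6},
}


def best_fixed(payouts):
    """With a known payout per option, the decision is a plain argmax."""
    return max(payouts, key=payouts.get)


def expected_utility(option, p_rain):
    """Expected payoff of an option given a belief about the chance of rain."""
    row = PAYOFFS[option]
    return p_rain * row["rain"] + (1 - p_rain) * row["sun"]


def best_under_belief(p_rain):
    """Pick the option with the highest expected utility under a belief."""
    return max(PAYOFFS, key=lambda option: expected_utility(option, p_rain))


if __name__ == "__main__":
    print(best_fixed({"a": 1, "b": 4, "c": 2}))
    for p_rain in (0.05, 0.5):
        print(p_rain, best_under_belief(p_rain))
```

The argmax is identical in both cases. What changes the answer is the belief `p_rain`: in this toy setup the preferred option flips at a probability of 1/6, so all the difficulty lives in estimating that number.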
This has shifted my perceptions of what is in the wild significantly. Thanks for the heads up.
Activations in LLMs are linearly mappable to activations in the human brain. Imo this is strong evidence for the idea that LLMs/NNs in general acquire extremely human-like cognitive patterns, and that the common "shoggoth with a smiley face" meme might just not be accurate.
That surprisingly straight line reminds me of what happens when you use noise to regularise an otherwise decidedly nonlinear function: https://www.imaginary.org/snapshot/randomness-is-natural-an-introduction-to-regularisation-by-noise
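As a rough illustration of that effect (a toy example, not taken from the linked snapshot): averaging a hard step function over Gaussian noise turns the jump into a smooth ramp that is close to a straight line near the origin. The function names, noise scale, and sample count below are assumptions chosen for the sketch.

```python
import math
import random


def step(x):
    """A decidedly nonlinear function: jumps from 0 to 1 at x = 0."""
    return 1.0 if x >= 0 else 0.0


def noisy_average(x, sigma=1.0, samples=20000, seed=0):
    """Monte Carlo estimate of E[step(x + sigma * Z)] with Z standard normal."""
    rng = random.Random(seed)
    total = sum(step(x + rng.gauss(0.0, sigma)) for _ in range(samples))
    return total / samples


def smoothed_exact(x, sigma=1.0):
    """Closed form of the same expectation: the Gaussian CDF at x / sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    for x in (-0.5, -0.25, 0.0, 0.25, 0.5):
        print(x, round(noisy_average(x), 3), round(smoothed_exact(x), 3))
```

Near zero the smoothed curve has slope 1 / (sigma * sqrt(2 * pi)), so for small inputs the noise-averaged step behaves almost linearly even though the underlying function is a discontinuous jump.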
I think this is a really cool research agenda. I can also try to give my "skydiver's perspective from 3000 miles in the air" overview of what I think expected free energy minimisation means, though I am by no means an expert. Epistemic status: this is a broad extrapolation of some intuitions I gained from reading a lot of papers, it may be very wrong.
In general, I think of free energy minimisation as a class of solutions for the problem of predicting complex systems behaviour, in line with other variational principles in physics. Thus, it is an attempt to ...
It's hard to empathise with dry numbers, whereas a lively scenario creates an emotional response so more people engage. But I agree that this seems to be very well done statistical work.
Hey, thank you for taking the time to reply honestly and in detail as well. With regards to what you want, I think that this is in many senses also what I am looking for, especially the last item about tying in collective behaviour to reasoning about intelligence. I think one of the frames you might find the most useful is one you've already covered---power as a coordination game. As you alluded to in your original post, people aren't in a massive hive mind/conspiracy---they mostly want to do what other successful people seem to be doing, which translates ...
Hey, really enjoyed your triple review on power lies trembling, but imo this topic has been... done to death in the humanities, and reinventing terminology ad hoc is somewhat missing the point. The idea that the dominant class in a society comes from a set of social institutions that share core ideas and modus operandi (in other words "behaving as a single organisation") is not a shocking new phenomenon of twentieth century mass culture, and is certainly not a "mystery". This is basically how every country has developed a ruling class/ideology since the te...
Thanks for the well-written and good-faith reply. I feel a bit confused by how to relate to it on a meta level, so let me think out loud for a while.
I'm not surprised that I'm reinventing a bunch of ideas from the humanities, given that I don't have much of a humanities background and didn't dig very far through the literature.
But I have some sense that even if I had dug for these humanities concepts, they wouldn't give me what I want.
What do I want?
Yeah, I'm not gonna do anything silly (I'm not in a position to do anything silly with regards to the multitrillion param frontier models anyways). Just sort of "laying the groundwork" for when AIs will cross that line, which I don't think is too far off now. The movie "Her" is giving a good vibe-alignment for when the line will be crossed.
Ahh, I was slightly confused why you called it a proposal. TBH I'm not sure why only 0.1% instead of any arbitrary percentage in (0, 100]. Otherwise it makes good logical sense.
Hey, the proposal makes sense from an argument standpoint. I would refine slightly and phrase as "the set of cognitive computations that generate role emulating behaviour in a given context also generate qualia associated with that role" (sociopathy is the obvious counterargument here, and I'm really not sure what I think about the proposal of AIs as sociopathic by default). Thus, actors getting into character feel as if they are somehow sharing that character's emotions.
I take the two problems a bit further, and would suggest that being humane to AIs may ...
Hey Daniel, thank you for the thoughtful comment. I always appreciate comments that make me engage further with my thinking because one of the things I do is that I get impatient with whatever post I'm writing and "rush it out of the door", so to speak, so this gives me another chance to reflect on my thoughts.
I think that there are roughly three defensible positions with regard to AI sentience, especially now that AIs seem to be demonstrating pretty advanced reasoning and human-like behaviour. One is the semi-mystical argument that humans/brains/embod...
This seems like an interesting paper: https://arxiv.org/pdf/2502.19798
Essentially: use developmental psychology techniques to cause LLMs to develop a more well rounded human friendly persona that involves reflecting on their actions, while gradually escalating the moral difficulty of the dilemmas presented as a kind of phased training. I see it as a sort of cross between RLHF, CoT, and the recent work on low example count fine tuning but for moral instead of mathematical intuitions.
Yeah, that's basically the conclusion I came to a while ago. Either it loves us or we're toast. I call it universal love or pathos.
This seems like very important and neglected work; I hope you get the funds to continue.
Yeah, definitely. My main gripe where I see people disregarding unknown unknowns is a similar one to yours: people who present definite, worked-out pictures of the future.
Note to self: If you think you know where your unknown unknowns sit in your ontology, you don't. That's what makes them unknown unknowns.
If you think that you have a complete picture of some system, you can still find yourself surprised by unknown unknowns. That's what makes them unknown unknowns.
If your internal logic has almost complete predictive power, plus or minus a tiny bit of error, your logical system (but mostly not your observations) can still be completely overthrown by unknown unknowns. That's what makes them unknown unknowns.
You can respect u...
The problem here is that you are dealing with survival necessities rather than trade goods. The outcome of this trade, if both sides honour the agreement, is that the scope insensitive humans die and their society is extinguished. The analogous situation here is that you know there will be a drought in say 10 years. The people of the nearby village are "scope insensitive", they don't know the drought is coming. Clearly the moral thing to do if you place any value on their lives is to talk to them, clear the information gap, and share access to resources. F...
Except that's a false dichotomy (between spending energy to "uplift" them or dealing treacherously with them). All it takes to not be a monster who obtains a stranglehold over all the watering holes in the desert is a sense of ethics that holds you to the somewhat reasonably low bar of "don't be a monster". The scope sensitivity or lack thereof of the other party is in some sense irrelevant.
The question as stated can be rephrased as "Should EAs establish a strategic stranglehold over all future resources necessary to sustain life using a series of unequal treaties, since other humans will be too short sighted/insensitive to scope/ignorant to realise the importance of these resources in the present day?"
And people here wonder why these other humans see EAs as power hungry.
Hey, thanks for the reply. I think this is a very valuable response because there are certain things I would want to point out that I can now elucidate more clearly thanks to your push back.
First, I don't suggest that if we all just laughed and went about our lives everything would be okay. Indeed, if I thought that our actions were counterproductive at best, I'd advocate for something more akin to "walking away" as in Valentine's exit. There is a lot of work to be done and (yes) very little time to do it.
Second, the pattern I am noticing is something more...
Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.
Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.
Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.
Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.
Grave men, near death, who see w...
Because their words had forked no lightning they
I think we have the opposite problem: our words are about to fork all the lightning.
Thank you.
It does not currently look to me like we will win this war, speaking figuratively. But regardless, I still have many opportunities to bring truth, courage, justice, honor, love, playfulness, and other virtues into the world, and I am a person whose motivations run more on living out virtues rather than moving toward concrete hopes. I will still be here building things I love, like LessWrong and Lighthaven, until the end.
In my book this counts as severely neglected and very tractable AI safety research. Sorry that I don't have more to add, but it felt important to point this out.
Even so, it seems obvious to me that addressing the mysterious issue of the accelerating drivers is the primary crux in this scenario.
Epistemic status: This is a work of satire. I mean it---it is a mean-spirited and unfair assessment of the situation. It is also how, some days, I sincerely feel.
A minivan is driving down a mountain road, headed towards a cliff's edge with no guardrails. The driver floors the accelerator.
Passenger 1: "Perhaps we should slow down somewhat."
Passengers 2, 3, 4: "Yeah, that seems sensible."
Driver: "No can do. We're about to be late to the wedding."
Passenger 2: "Since the driver won't slow down, I should work on building rocket boosters so that (when we inevita...
Unfortunately, the disanalogy is that any driver who moves their foot towards the brakes is almost instantly replaced with one who won't.
Driver: My map doesn't show any cliffs
Passenger 1: Have you turned on the terrain map? Mine shows a sharp turn next to a steep drop coming up in about a mile
Passenger 5: Guys maybe we should look out the windshield instead of down at our maps?
Driver: No, passenger 1, see on your map that's an alternate route, the route we're on doesn't show any cliffs.
Passenger 1: You don't have it set to show terrain.
Passenger 6: I'm on the phone with the governor now, we're talking about what it would take to set a 5 mile per hour national speed limit.
Passenger 7: Don't ...
This is imo quite epistemically important.
It's definitely something I hadn't read before, so thank you. I would say of that article (on a skim) that it has clarified my thinking somewhat. I therefore question the law/toolbox dichotomy, since to me it seems that usefulness and accuracy-to-perceived-reality are in fact two different axes. Thus you could imagine:
Hey, thanks for responding! Re the physics analogy, I agree that improvements in our heuristics are a good thing:
...However, perhaps you have already begun to anticipate what I will say—the benefit of heuristics is that they acknowledge (and are indeed dependent) on the presence of context. Unlike a “hard” theory, which must be applicable to all cases equally and fails in the event a single counter-example can be found, a “soft” heuristic is triggered only when the conditions are right: we do not use our “judge popular songs” heuristic when staring at a dinne
And as for the specific implications of "moral worth", here are a few:
Thank you for the feedback! I am of course happy for people to copy over the essay.
> Is this saying that humans' goals and options (including options that come to mind) change depending on the environment, so rational choice theory doesn't apply?
More or less, yes, or at least that it becomes very hard to apply it in a way that isn't either highly subjective or essentially post-hoc arguing about what you ought to have done (hidden information/hindsight being 20/20).
> This is currently all I have time for; however, my current understanding is that there...
Yeah, of course
I mean, this applies to humans too. The words and explanations we use for our actions are often just post hoc rationalisations. An efficient text predictor must learn not what the literal words in front of it mean, but the implied scenario and thought process they mask, and that is a strictly nonlinear and "unfaithful" process.