Ok that makes sense, I'll add proper disclaimers and take your word for it that it's fine as long as there are no objections. Thank you :)
The dust-specks vs. torture thought experiment seems pretty controversial. Why do we have to sum the sufferings? We can aggregate them in other ways: for example, we can minimize the maximum suffering rather than the sum of sufferings. Then we never choose torture, no matter how many dust specks there are, which better aligns with some of our values.
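A toy illustration of how the two aggregation rules come apart (all suffering numbers are made up purely to show the mechanics, not to take a stance on the actual magnitudes):

```python
# Toy comparison of two aggregation rules on the torture-vs-dust-specks choice.
# All suffering numbers below are made up for illustration only.
TORTURE_SUFFERING = 1_000_000   # one person, huge suffering
SPECK_SUFFERING = 0.001         # per person, tiny suffering
NUM_SPECK_VICTIMS = 10**12      # stand-in for "an unimaginably large number"

# Sum aggregation: minimize total suffering.
sum_if_torture = TORTURE_SUFFERING
sum_if_specks = SPECK_SUFFERING * NUM_SPECK_VICTIMS
print("sum rule prefers:", "torture" if sum_if_torture < sum_if_specks else "specks")
# With enough victims the specks total always overtakes the torture, so this picks torture.

# Max aggregation (minimax): minimize the worst individual suffering.
max_if_torture = TORTURE_SUFFERING
max_if_specks = SPECK_SUFFERING  # no single person suffers more than one speck
print("max rule prefers:", "torture" if max_if_torture < max_if_specks else "specks")
# This picks specks no matter how large NUM_SPECK_VICTIMS gets.
```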
Nobody seems to have problems with circular preferences in practice, probably because people's preferences aren't precise enough. So we don't have to adopt utilitarianism to fix this non-problem.
This may not be a problem at the individual scale, but individuals design systems (a program, a government, an AI), and those systems must be precise and designed to handle these kinds of issues. They can't just adapt the way humans do to avoid repeated exploits; we first have to build the adaptation mechanism. The way I see it, utilitarianism is an attempt to describe such a mechanism.
Chatting with ChatGPT, I learned that the latest organoids have about 1 million neurons.
Wondering whether that's a lot, I asked for comparisons: bees and other insects have on the order of 10^5 neurons, while fish such as zebrafish have on the order of 10^6. So we are engineering fish-scale brains, at least in terms of neuron count. That's concerning: as far as I know, zebrafish are conscious and can feel pain.
What about humans? ChatGPT says humans have around 10^11 neurons; however, 10-to-12-week embryos have about 10^6. It so happens that 10 to 12 weeks is th...
I broadly agree with the conclusions (though not with the arguments), but from a 2025 perspective they do not feel novel: the value of challenge, destination vs. journey, agency, and authenticity are all discussed in self-help books.
On the arguments:
what is a computer game except synthetic work?
Games and work have a few things in common, sure, but they also have huge differences:
Obviously a superintelligence knows that this is an unusual case
Since the ASI knows this is an unusual case, can it do some exception handling (like asking a human) instead of executing the normal path?
but that doesn't say if it's a positive or negative case.
Why only positive or negative? Some classifiers have an "out-of-distribution" category, for example a One-Class SVM, and using several of them should handle multiple classes. Perhaps this is also doable with other latent feature spaces (transformers?) by using a threshold distance to limit each category...
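Here is a minimal sketch of the kind of thing I have in mind, assuming scikit-learn and toy 2-D data: one One-Class SVM per known class, and anything rejected by all of them is flagged as out-of-distribution so the system can fall back to exception handling (e.g. asking a human):

```python
# Rough sketch: one One-Class SVM per known class; a sample accepted by none
# of them is treated as out-of-distribution instead of being forced into
# "positive" or "negative".
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
positives = rng.normal(loc=+2.0, scale=0.5, size=(200, 2))  # toy "positive" cluster
negatives = rng.normal(loc=-2.0, scale=0.5, size=(200, 2))  # toy "negative" cluster

detectors = {
    "positive": OneClassSVM(nu=0.05, gamma="scale").fit(positives),
    "negative": OneClassSVM(nu=0.05, gamma="scale").fit(negatives),
}

def classify(x):
    # Each detector answers +1 if x looks like its training class, -1 otherwise.
    accepted = [name for name, det in detectors.items() if det.predict([x])[0] == 1]
    if not accepted:
        return "out-of-distribution"  # the unusual case: trigger exception handling
    return accepted[0] if len(accepted) == 1 else "ambiguous"

print(classify([2.1, 1.9]))    # expected: positive
print(classify([50.0, 50.0]))  # expected: out-of-distribution
```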
It sometimes takes me a long time to go from "A is true", "B is true", "A and B implies C is true" to "C is true".
I think this is a common issue for humans. For example, I can see a word such as "aqueduct" and also know that "aqua" means water in Latin, yet fail to notice that "aqueduct" comes from "aqua". This is because seeing a word does not trigger a dynamic that searches for a root.
Another case is when the rule looks slightly different, say "a and b implies c" rather than "A and B implies C", and some effort is needed to notice that it still appli...
The core argument that there is "no universally compelling argument" holds if we literally consider all of mind design space, but for the task of building and aligning AGIs we may be able to constrain the space enough that it is no longer clear the argument holds.
For example, in order to accomplish general tasks, AGIs can be expected to have a coherent, accurate, and compressed model of the world (as transformers do, to some extent), such that they can roughly restate their input. This implies that in a world where there is a lot of evidence that the sky is blue (in...
This leads to the first big problem with this post: The idea that minds are determined by DNA. This idea only makes sense if one is thinking of a mind as a sort of potential space.
Clone Einstein and raise him with wolves and you get a sort of smart wolf mind inhabiting a human body. Minds are memetic. Petunias don't have minds. I am my mind.
Reversing your analogy, if you clone a wolf and raise it with Einsteins, you do not get another Einstein. That is because hardware (DNA) matters and wolves do not have the required brain hardware to instantiate Ei...
Thanks for the suggestion; I added the "Edit 1" section to the post to showcase a small study on 3 posts known to contain factual mistakes. The LLM is able to spot and correct the mistake in 2 of the 3 cases, and it provides valuable (though verbose) context. Overall this seems promising to me.
This post assumes the word "happiness" is crisply defined and means the same thing for everyone, but that's not the case. Or perhaps it is implicitly arguing about what the meaning of "happiness" should be?
Anyway, this post would be much clearer if the word "happiness" were tabooed.
I have always been slightly confused by people arguing against wire-heading. Isn't wire-heading the thing that is supposed to max out our utility function? If that's not the case, then what's the point of talking about it? Why not just find what does maximize our utility function and do t...
"Yeah? Let's see your aura of destiny, buddy."
Another angle: if I have to hire a software engineer, I'll pick the one with the aura of destiny any time, because that one is more likely to achieve great things than the others.
I would say auras of destiny are Bayesian evidence for greatness, and they are hard-to-fake signals.
Slightly off-topic, but wow, this is material for an awesome RTS video game! That would be so cool!
And a bit more on topic: that kind of video game would give the broader public a good idea of what's coming, and give researchers and leaders a way to vividly explore various scenarios, all while having fun.
Imagine playing the role of an unaligned AGI, and noticing that the game dynamics push you to deceive humans to gain more compute and capabilities until you can take over or something, all because that's the fastest way to maximize your utility function!
If you now put a detector in path A, it will find a photon with probability 1/2, and same for path B. This means that there is a 50% chance of the configuration |photon in path A only>, and 50% chance of the configuration |photon in path B only>. The arrow direction still has no effect on the probability.
This 50/50 split is extra surprising, and perhaps misleading? What's the cause? Why not 100% on path A and 0% on path B (or the reverse)?
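Writing out the arithmetic as I understand it (a sketch, assuming the half-silvered mirror gives each path an amplitude of magnitude $1/\sqrt{2}$ times some phase):

$$
P(\text{path A}) = \left|\tfrac{1}{\sqrt{2}}\,e^{i\theta_A}\right|^2 = \tfrac{1}{2},
\qquad
P(\text{path B}) = \left|\tfrac{1}{\sqrt{2}}\,e^{i\theta_B}\right|^2 = \tfrac{1}{2},
$$

so the 50/50 comes from the equal magnitudes, while the phases $\theta_A, \theta_B$ (the "arrow directions") drop out when taking the squared magnitude.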
As a layman it seems like either:
Insularity will make you dumber
Okay, but there is another side to the issue; insularity can also have positive effects:
If you look at evolution, when a population gets stuck on an island it starts to develop in interesting ways; maybe insularity is a necessary step toward developing truly creative worldviews?
Also, IIRC, in "The Timeless Way of Building" Christopher Alexander mentions that cities should be designed as several small neighborhoods with well-defined boundaries, where people with similar backgrounds live. He also says something to the effect that the ...
I liked the intro, but some parts of the previous posts and this one have been confusing. For example, in this post:
Second, we saw that configurations are about multiple particles. [...] And in the real universe, every configuration is about all the particles… everywhere.)
and, more glaringly, in the previous one:
A configuration says, “a photon here, a photon there,”
Here my intuition is that we can model the world as particles, or we can use the lower-level model of the world that configurations are, but we can't mix the two any way we want. These sentences...
Planecrash (from Eliezer and Lintamande) seems highly relevant here: the hero, Keltham, tries to determine whether he is in a conspiracy or not. To do that he basically applies Bayes' theorem to each new fact he encounters: "Is fact F more likely to happen if I am in a conspiracy or if I am not? Hmm, fact F seems more likely to happen if I am not in a conspiracy, let's update my prior a bit towards the 'not in a conspiracy' side".
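In odds form, the update he keeps doing is just (standard Bayes, nothing specific to the book):

$$
\underbrace{\frac{P(\text{conspiracy}\mid F)}{P(\text{no conspiracy}\mid F)}}_{\text{posterior odds}}
=
\underbrace{\frac{P(F\mid \text{conspiracy})}{P(F\mid \text{no conspiracy})}}_{\text{likelihood ratio}}
\times
\underbrace{\frac{P(\text{conspiracy})}{P(\text{no conspiracy})}}_{\text{prior odds}}
$$

so each new fact F multiplies the running odds by its likelihood ratio, and a ratio below 1 nudges the odds towards "not in a conspiracy".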
Planecrash is a great walkthrough on how to apply that kind of thinking to evaluate whether someone is bullshitting you or not, by...
You did not explicitly state the goal of the advice. I think it would be interesting to distinguish between advice meant to increase your value to the company and advice meant to increase your satisfaction with your work, especially when the two point in opposite directions.
For example it could be that "swallow[ing] your pride and us[ing] that garbage language you hate so much" is good for the company in some cases, but terrible for job satisfaction, making you depressed or angry every time you have to use that silly language/tool.
For that reason try to structure teams such that every team has everything it needs for its day to day work.
I would extend that to "have as much control as you can over what you do". I increasingly find that this is key to moving fast and producing quality software.
This applies to code and means dependencies should be owned and open to modification, so the team understands them well and can fix bugs or add features as needed.
This avoids ridiculous situations where bugs never get fixed, or where shipping very simple features (such as changing a theme for a UI c...
Interactions with ChatGPT can be customized persistently in the settings; for example, you can add the following custom instruction: "include a confidence rating at the end of your response in the format 'Confidence: X%'. If your confidence is below 80%, briefly explain why".
Here is a sample conversation demonstrating this and showing what ChatGPT has to say about its calibration:
Me: Are you calibrated, by which I mean, when you output a confidence X as a percentage, are you right X times out of 100?
C...
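(Side note: if you log each answer together with the stated confidence and whether it turned out to be correct, checking calibration yourself takes only a few lines. A rough sketch, with made-up records:)

```python
# Rough sketch: group logged answers by stated confidence and compare the
# stated confidence with the observed accuracy in each bucket.
records = [  # made-up examples; one entry per logged answer
    {"confidence": 0.9, "correct": True},
    {"confidence": 0.9, "correct": False},
    {"confidence": 0.6, "correct": True},
]

buckets = {}
for r in records:
    bucket = round(r["confidence"], 1)  # 0.0, 0.1, ..., 1.0
    buckets.setdefault(bucket, []).append(r["correct"])

for bucket in sorted(buckets):
    outcomes = buckets[bucket]
    accuracy = sum(outcomes) / len(outcomes)
    print(f"stated ~{bucket:.0%}: right {accuracy:.0%} of the time ({len(outcomes)} answers)")
```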
Many developers have been reporting that this is dramatically increasing their productivity, up to 5x'ing/10x'ing it
I challenge the data: none of my colleagues have been reporting speed-ups this large. I think your observation can be explained by strong sampling bias.
People who do not use AI, or who saw no improvement, are unlikely to report anything. You also mention Twitter, where users share "hot takes" etc. to increase engagement.
It's good to have actual numbers before we explain them, so I ran a quick search and found 3 articles that look promising (I only did...
I also think it is unlikely that AGIs will compete in human status games. Status games are not just about being the best: Deep Blue is not high status, and athletes who take drugs to improve their performance are not high status.
Status games have rules, and you only win if you do something impressive while competing within them. Being an AGI is likely to be seen as an unfair advantage, and thus AIs will be banned from human status games, in the same way that current sports competitions are split by gender and weight class.
Even if they are not banned, given their abilities they will be expected to do much better than humans; it will just be a normal thing, not a high-status, impressive thing.
For those interested in writing better trip reports there is a "Guide to Writing Rigorous Reports of Exotic States of Consciousness" at https://qri.org/blog/rigorous-reports
A trip report is an especially hard case of something one can write about:
I have a similar intuition: if mirror-life is dangerous to Earth-life, then the mirror version of mirror-life (that is, Earth-life) should be about as dangerous to mirror-life as mirror-life is to Earth-life. Having only read this post, and in the absence of any evidence either way, this default intuition seems reasonable.
I find the post alarming, and I really wish it had some numbers rather than words like "might" to back up the claims of threat. At the moment my uneducated mental model is that for mirror-life to be a danger it has to:
[ epistemic status: a thought I had while reading about Russell's paradox, rewritten and expanded on by Claude; my math level: undergraduate-ish ]
Mathematics has faced several apparent "crises" throughout history that seemed to threaten its very foundations. However, these crises largely dissolve when we recognize a simple truth: mathematics consists of coherent systems designed for specific purposes, rather than a single universal "true" mathematics. This perspective shift—from seeing mat...
I really like the idea of milestones. I think seeing the result of each milestone will help create trust in the group, confidence that the end action will succeed, and a realization of the real impact the group has. Each CA should probably start with small milestones (posting something on social media) and ramp things up until the end goal is reached. Seeing actual impact early will definitely keep people engaged and might make the group more cohesive and ambitious.
Ditch old software tools or programming languages for better, new ones.
My take on the tool vs. agent distinction:
A tool runs a predefined algorithm whose outputs are in a narrow, well-understood and obviously safe space.
An agent runs an algorithm that allows it to compose and execute its own algorithm (choose actions) to maximize its utility function (get closer to its goal). If the agent can compose enough actions from a large enough set, the output of the new algorithm is wildly unpredictable and potentially catastrophic.
This hints that we can build safe agents by carefully curating the set of actions the agent chooses from, so that any algorithm composed from that set produces outputs that stay in a safe space.
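A minimal toy sketch of what I mean (the action names and the claim that they are safe are made up; in reality, vetting the action set is the hard part):

```python
# Toy sketch: the agent can only compose plans out of a curated whitelist of
# actions, each of which has been vetted to keep the system inside a safe space.
from typing import Callable, Dict, List

# Curated action set: any composition of these is assumed to stay safe
# (hand-waved here; this vetting is where the real difficulty lives).
SAFE_ACTIONS: Dict[str, Callable[[dict], dict]] = {
    "read_sensor": lambda state: {**state, "reading": 42},
    "log_reading": lambda state: {**state, "logged": state.get("reading")},
}

def run_plan(plan: List[str], state: dict) -> dict:
    """Execute a plan composed by the agent, refusing anything off the whitelist."""
    for name in plan:
        if name not in SAFE_ACTIONS:
            raise ValueError(f"action {name!r} is not in the curated set")
        state = SAFE_ACTIONS[name](state)
    return state

# A plan like ["read_sensor", "acquire_more_compute"] is rejected outright,
# because the second action was never added to the curated set.
print(run_plan(["read_sensor", "log_reading"], {}))
```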
I think being as honest as is reasonably sensible is good for oneself. Being honest applies pressure on oneself and one’s environment until the two closely match. I expect the process to have its ups and downs but to lead to a smoother life in the long run.
An example that comes to mind is the necessity of opening up in order to have meaningful relationships (versus the alternative of concealing one’s interests, which tends to make conversations boring).
Also, honesty seems like a requirement for having an accurate map of reality: having snappy and accurate feedback is essenti...
I also thought about something along those lines: explaining the domestication of wolves into dogs, or maybe of prehistoric wheat into modern wheat, then extrapolating to chimps. Then I had a dangerous thought: what would happen if we tried to select chimps for humaneness?
goals appear only when you make rough generalizations from its behavior in limited cases.
I am surprised no one brought up the usual map/territory distinction. In this case the territory is the set of observed behaviors. Humans look at the territory and, with their limited processing power, produce a compressed and lossy map, here called the goal.
The goal is a useful model for talking simply about the set of behaviors, but it has no existence outside the heads of the people discussing it.
This is a great use case for AI: expert knowledge tailored precisely to one’s needs
Is the "cure cancer goal ends up as a nuke humanity action" hypothesis valid and backed by evidence?
My understanding is that the meaning of the "cure cancer" sentence can be represented as a point in a high-dimensional meaning space, which I expect to be pretty far from the "nuke humanity" point.
For example "cure cancer" would be highly associated with saving lots of lives and positive sentiments, while "nuke humanity" would have the exact opposite associations, positioning it far away from "cure cancer".
A good design might specify that if the two go...
If you know your belief isn't correlated to reality, how can you still believe it?
Interestingly, physics models (maps) are wrong (inaccurate), and people know that, but they still use them all the time because the models are good enough with respect to some goal.
Less accurate models can even be favored over more accurate ones to save on computing power or reduce complexity.
As long as the benefits outweigh the drawbacks, the correlation to reality is irrelevant.
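A standard concrete illustration: at everyday speeds the Newtonian kinetic-energy formula is "wrong", but the error is utterly negligible, so nobody pays for the extra complexity of the relativistic one:

$$
E_{\text{Newton}} = \tfrac{1}{2}mv^2,
\qquad
E_{\text{rel}} = (\gamma - 1)mc^2 \approx \tfrac{1}{2}mv^2\left(1 + \tfrac{3}{4}\tfrac{v^2}{c^2}\right),
$$

so at $v = 30\ \text{m/s}$ the relative error is about $\tfrac{3}{4}(v/c)^2 \approx 10^{-14}$.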
Not sure how cleanly this maps to beliefs since one would have to be able to go from one belief to anothe...
@Eliezer, there are some interesting points in the article; I will criticize what frustrated me:
> If you see a beaver chewing a log, then you know what this thing-that-chews-through-logs looks like,
> and you will be able to recognize it on future occasions whether it is called a “beaver” or not.
> But if you acquire your beliefs about beavers by someone else telling you facts about “beavers,”
> you may not be able to recognize a beaver when you see one.
Things do not have intrinsic meaning; rather, meaning is an emergent property of things in relation to each...
The examples seem to assume that "and" and "or" as used in natural language work the same way as their logical counterparts. I think this is not the case, and that it could bias the experiment’s results.
As a trivial example the question "Do you want to go to the beach or to the city?" is not just a yes or no question, as boolean logic would have it.
Not everyone learns about boolean logic, and those who do likely learn it long after learning how to talk, so it’s likely that natural language propositions that look somewhat logical are not interpreted as just l...
Specific details of a case can make people emotional and corrupt their reasoning; this is less of a problem with an abstract general rule.