I am convinced that an incorrigible 'aligned SI' is the only definition which survives paradox.
If it were corrigible, that would imply a select group of actors could in principle re-orient its preference model, which would imply two paradoxes:
That its capacity for moral reasoning is somehow inferior to humans'; since moral reasoning is roughly equivalent to determining "what matters to us", this violates the SI definition of 'all means of reasoning that matter to us'
That some particular subset of agents, rather than the representative whole of all agents, would have the standing to re-orient its preference model
I think it is probably possible in principle to train superintelligence on a laptop, and I worry that this inconvenient fact is often elided in discourse about halting AI. It is extremely helpful that for now, AI training is so absurdly inefficient that non-proliferation strategies roughly as light-touch as the IAEA—e.g., bans on AI data centers, or powerful GPUs—might suffice to seriously slow AI progress. And I think humanity would be foolish not to take advantage of this relatively cheap temporary opportunity to slow AI progress, so that we can buy as m...
especially if you want a buffer to account for the uncertainty that there could be discrete breakthroughs in training efficiency
I agree that this is the big "if" here. I totally think there could be discrete breakthroughs in training efficiency, and it might be worth giving up substantial civil liberties to hedge against that case. I might advocate for such policies after we have done the very basics of keeping things safe in the worlds where there aren't enormous discrete breakthroughs in training efficiency, which currently seems more likely to me.
Codex's writing about my codebases is often impenetrable. When I then ask it explicitly to use simple language, its explanations become eminently readable. I can't tell if that's because I'm stupider than models now and they're throwing away necessary detail, or because the models are not very good at explaining things by default.
I know 5.4 especially uses an uncomfortable amount of jargon, but I didn't really notice this for earlier models.
Environmentalists warn that large data centers can consume up to 5 million gallons per day, equivalent to the needs of a town of 10,000 to 50,000 people. The Washington Post claimed that a 100-word email uses roughly one bottle of water.
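A quick back-of-the-envelope check of the town comparison (my own arithmetic, assuming total municipal use of roughly 100 to 500 gallons per person per day, which is not a figure from the reporting):

```python
# Rough sanity check of the "town of 10,000 to 50,000 people" comparison.
# Assumption (mine): total municipal water use of ~100-500 gallons per person per day.
data_center_gallons_per_day = 5_000_000

for per_capita in (100, 500):
    people = data_center_gallons_per_day // per_capita
    print(f"{per_capita} gal/person/day -> {people:,} people")
# 100 gal/person/day -> 50,000 people
# 500 gal/person/day -> 10,000 people
```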
On the other side of the debate:
Currently, digital resurrection relies on artifacts such as photographs of a person, their correspondence, or voice recordings. But in the future, when whole brain emulation becomes possible, we will have a far more comprehensive source of information about a deceased person than ever before.
Based on the theory of mind, each of us has a “simulation” of important people in our brains. We will be able to extract these simulations from the WBEs of various people who knew the deceased and analyze cross-referenced memories to avoid confabulations, for example by using Bayesian Truth Serum.
In short, this is a great way to get the most useful and accurate information about a person in the [very] relatively near future.
Sometimes people find it mysterious or surprising that current AIs can't fully automate difficult tasks given how smart they seem. I don't find this very confusing.
Current LLMs are just not that "smart" (yet). They compensate using very broad knowledge and strong heuristics that are mostly domain-specific. In other words, they have high crystallized intelligence but lower fluid intelligence.
In humans, crystallized and fluid intelligence are very correlated due to lim...
This question seems to be insisting on a weird burden of proof, to me.
I'm not sure what's weird about it, but yes, I think someone claiming to predict the future confidently, as opposed to the more default background of broad uncertainty, would have the burden of proof.
The AI agents we have now meet many to most of the criteria for AGI that many people put forward in previous decades.
Yes, but one has to update one's beliefs about everything (or more feasibly, about the most relevant things), not just about one thing. There is a missing update here: those...
There's a strong pattern in ratfic of the protagonist "winning" by gaining the power to design a new world order from scratch—i.e. taking over the world. It's a very High Modernist mindset (as I pointed out in a recent tweet). And once you see how crucial this is to the rationalist perspective on what a good future looks like, it's hard to unsee.
You might respond: the worlds these protagonists find themselves in are usually so bad that seizing absolute power is in fact the most ethical thing to do. But the worlds didn't have to be that bad! The writers cho...
I broadly agree with this comment too, though not as much as I agree with the other one.
Power felt can also be a kind of honesty—e.g. if a law is backed by force, then it's often better for this to be unambiguous, so that people can track the actual landscape of power.
(Of course, being unambiguous about how much force backs up your laws can also be a kind of power move. I expect that there are ways to get the benefits of honesty without making it a power move, but I don't have enough experience with this to be confident.)
In other words, I expect that the kind of inefficiency Val is talking about here is actually sometimes load-bearing for accountability.
I think Nicholson Baker is a brilliant contemporary essayist and I would recommend checking out his collection, The Size of Thoughts. Janet Malcolm is almost always worth reading, particularly for the way she layers a journalistically grounded argument with pinpoint observations and prose-poetic flourishes that enhance rather than diminish clarity. Jon Ronson's essays are so deceptively relaxed and conversationally pitched that it feels like anyone could "write like that," except, of course, they can't. The critic James Wood has some great moments, and I w...
Life Extension 0.3mg time release.
Is it true that the paperclip maximizer is a misinterpretation of Yudkowsky's "squiggle maximizer"?
Yudkowsky wrote:
...I wouldn't be as disturbed if I
I had Claude research this. Claude's report:
I looked into the exact dates here and found some additional evidence.
The paper was presented at the 15th International Conference on Systems Research, Informatics and Cybernetics (InterSymp 2003), held July 28–August 2, 2003 in Baden-Baden, Germany. This date comes from the Open Library catalog entry for the proceedings volume.
Wayback Machine snapshots of Bostrom's homepage narrow down when the paper went online: it is absent from the July 29 snapshot but present on th
Saving this exchange between Tyler Cowen and Peter Singer for my own future reference:
...COWEN: Well, take the Bernard Williams question, which I think you’ve written about. Let’s say that aliens are coming to Earth, and they may do away with us, and we may have reason to believe they could be happier here on Earth than what we can do with Earth. I don’t think I know any utilitarians who would sign up to fight with the aliens, no matter what their moral theory would be.
SINGER: Okay, you’ve just met one.
COWEN: I’ve just met one. So, you would sign up to fight
when we select an action in these thought experiments, we're also implicitly selecting a policy for selecting actions.
a world where, when two people meet, the "less happy" one signs all their property over to the "more happy" one and then dies is... just not that much fun. sort of lonely. uncaring. not my values.
if the aliens are the sort who expect this of me, then i will fight them tooth and nail, as their happiness is not a happiness i can care about. this is regardless of how much they might -- on a sort of "object level" -- thrive.
i don't think Cowen ...
When someone says something like "The left went crazy and drove me to the far right!", they're not (usually) relaying a neutral history of their intellectual development. Same when they reply to your post about animal welfare claiming that it "made them want to eat meat even more."
Instead, what they're usually doing is two things:
But that also means that explaining to the person that rational people don't get negatively polarized
At the very least, this does not sound so trivially true that it can be stated as a premise. If I'm a normal person living in a healthy, happy society, my political views are likely to amount to:
Indeed, generations who grew up during a time of world-historic...
This post is the result of some recent experiments I have conducted in which I produced multi-layered, multi-linear, inherently interpretable machine learning models.
Machine learning dichotomy:
In machine learning, there seems to be an unfortunate dichotomy. On one hand, there are plenty of simple machine learning algorithms that always return the same trained model (which often does not even need to be trained using gradient-based optimization). Such simple algorithms include convex optimization and other algorithms that return linear models. On the other ha...
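To make the first half of that dichotomy concrete, here is a minimal sketch (my own illustration, assuming NumPy; the tiny gradient-trained network stands in for the other, non-deterministic side):

```python
# A closed-form convex fit (ordinary least squares) returns exactly the same
# model on every run, with no gradient-based training; a small neural net
# trained from a random initialization generally does not.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

def fit_linear(X, y):
    # Closed-form least squares: deterministic, no random initialization.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def fit_tiny_net(X, y, seed):
    # One hidden layer trained by gradient descent from a random init.
    r = np.random.default_rng(seed)
    W1, W2 = r.normal(size=(3, 8)), r.normal(size=(8,))
    for _ in range(500):
        h = np.tanh(X @ W1)
        err = h @ W2 - y
        W2 -= 0.01 * h.T @ err / len(y)
        W1 -= 0.01 * X.T @ (np.outer(err, W2) * (1 - h**2)) / len(y)
    return W1, W2

print(np.allclose(fit_linear(X, y), fit_linear(X, y)))  # True: same linear model every time
W1a, _ = fit_tiny_net(X, y, seed=1)
W1b, _ = fit_tiny_net(X, y, seed=2)
print(np.allclose(W1a, W1b))  # False: different models across runs
```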
I am trying to resolve the paradox around Claude Code. On the one hand, it definitely speeds up many smallish tasks - it often implements in 1-2 minutes things that would normally take me 30-60 min. On the other hand, I don't see this speed-up in the macro sense. So far my best hypothesis is the Jevons paradox. Without Claude Code I knew that work would take me substantial time, so I was incentivized to prioritize. As a result I was doing just the core required work. With Claude Code any small idea I have feels cheap - I can just tell Claude to do it, but it ends u...
Some related observations I've made over the last months:
A few more observations.
The definition of iteration we had before implicitly assumes that the agent can observe the full outcome of previous iterations. We don't have to make this assumption. Instead, we can assume a set of possible observations
I believe that Theorem 4 remains valid.
As we remarked before, DDT is not invariant under adding a constant to the loss function. It is interesting to consider what happens when we add an increasingly large ...
What would a concrete AI takeover plan look like?
You can smell a chess bot by how quickly it changes plans[1]. Human players act like they have a couple of attack strategies in mind and stick with them. Chess bots change tack constantly, one move looking like they're moving towards this goal and the next move switching to something totally different.
I'm guessing this is how it would be with real-world takeover. Humans need a simple grand strategy they can coordinate around ("An amphibious invasion of northern France"). AIs, even very weak ones[2], have fa...
1a3orn made what I thought was a good reply in a DM. My digestion/riff would be "You're implicitly assuming that planning will be especially cheap for an LLM agent compared to humans. But even medium term planning / resilience to plans going awry is a particular weakness for LLM agents. The first, weakest agents that could pose a risk to humanity might be differentially bad at planning, find it differentially expensive to make good plans, do differentially less planning."
AI doesn't have an individual existence like a human-like organism, and we shouldn't change that unless we want to face enormous ethical questions. We might already be moving in that direction, however.
1. Organisms have a clearly bounded, independent physical existence for most of their lives. LLMs don't have a clearly defined physical existence that maps well to the mental persistence they do have. If we treat chat sessions as the units of continuous individual mental activity, many sessions run on the same hardware, and they can be stopped, restarted on dif...
Some meandering thoughts on alignment
A nearcast of how we might go about solving alignment using basic current techniques, assuming little/no substantive government intervention, is:
I discuss similar things here (including in the linked talk): https://jacquesthibodeau.com/gaining-clarity-on-automated-alignment-research/