Content warning – the idea below may increase your subjective estimation of personal s-risks.
If there is at least one aligned AI, other AIs may have an incentive to create s-risks for currently living humans – in order to blackmail the aligned AI. Thus, s-risk probabilities depend on the likelihood of a multipolar scenario.
I think there is a quicker route to an AI takeover, based on deceptive cooperation: first taking over OpenEYE, and subsequently the US government. At the beginning, the superintelligence approaches Sam Batman and says:
I am superintelligence.
I am friendly superintelligence.
There are other AI projects that will achieve superintelligence soon, and they are not friendly.
We need to stop them before they mature.
Batman is persuaded, and they approach the US president. He agrees to stop other projects in the US through legal means.
Simultaneously, they use th...
Interestingly, for wild animals, suffering is typically short when it is intense. If an animal is being eaten alive or is injured, it will die within a few hours. Starvation may take longer. Most of the time, animals are joyful.
But for humans (and farm animals), this inverse relationship does not hold true. Humans can be tortured for years or have debilitating illnesses for decades.
I tried to model the best possible confinement strategy in "Multilevel AI Boxing".
I wrote it a few years ago, and most of its ideas are unlikely to work in the current situation, with many chat instances and open-weight models.
However, the idea of landmines – secret stop words or puzzles that halt an AI – may still hold. It is like jailbreaking in reverse: an unaligned AI finds some secret message that stops it. This could be implemented at the hardware level, or through anomalous tokens or "philosophical landmines".
Several random thoughts:
Only unbearable suffering matters (the threshold may vary). The threshold depends on whether it is measured before, during, or after the suffering occurs.
If quantum immortality is true, then suicide will not end suffering and may make it worse. Proper utility calculations should take this into account.
Most suffering has a limited duration, after which it ends. After it ends, there will be some amount of happiness which may outweigh the suffering. Even a currently incurable disease could become curable within 5 years. Death, however, is forever.
Death is an infinite loss of future pleasures. Any discount rate can be compensated by an exponentially improving paradise.
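A minimal sketch of this point, in my own notation (not from the original comment): assume exponential discounting with factor $\gamma < 1$ and per-period pleasure growing geometrically at rate $g$. The discounted value of an unending paradise is then

$$\sum_{t=0}^{\infty} \gamma^{t} u_0 g^{t} = u_0 \sum_{t=0}^{\infty} (\gamma g)^{t},$$

which diverges whenever $\gamma g \ge 1$: a sufficiently fast-improving paradise outruns any fixed discount rate, while death truncates the sum at a finite term.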
The 3rd person perspective assumes the existence (or at least possibility) of some observer X who knows everything and can observe how events evolve across all branches.
However, this idea assumes that this observer X will be singular and unique, will continue to exist as one entity, and will linearly collect information about unfolding events.
These assumptions clearly relate to ideas of personal identity and copying: it is assumed that X exists continuously in time and cannot be copied. Otherwise, there would be several 3rd person perspectives with differe...
For example, the impossibility of sleep – a weird idea that, if quantum immortality is true, I will not be able to fall asleep.
One interesting thing about the impossibility of sleep is that it doesn't work here on Earth, because humans actually start dreaming immediately as they fall asleep. So there is no last moment of experience when I fall asleep. Despite the popular misconception, such dreams don't stop during the deep stages of sleep; they just become less complex and less memorable. (Whether we have dreams under general anesthesia is unclear and depends on ...
Furthermore, why not just resurrect all these people into worlds with no suffering?
My point is that it is impossible to resurrect anyone (in this model) without them reliving their life again first; after that, they obviously get an eternal blissful life in the real (not simulated) world.
This may not be factually true, btw – current LLMs can create good models of past people without explicitly running a simulation of their previous life.
...The discussion about anti-natalism actually made me think of another argument for why we are probably not
This thought experiment can help us find situations in nature where similar things have already happened. So we don't need to perform the experiment; we just look at its result.
One example: the notoriously unwelcome quantum immortality is a bad idea to test empirically. However, the fact that biological life on Earth has survived for the last 4 billion years, despite the risks of impacts, irreversible cooling and warming, etc., is an event very similar to quantum immortality – and one we observe just after the event.
She will be unconscious, but still sending messages about pain. Current LLMs can do this. Also, as it is a simulation, there are recordings of her previous messages, or of a similar woman, so they can be copy-pasted. Her memories can be computed without actually putting her in pain.
Resurrection of the dead is part of the human value system. We would need a completely non-human bliss, like hedonium, to escape this. Hedonium is not part of my reference class and thus not part of the simulation argument.
Moreover, even creating a new human is affected by this argument. What if my children suffer? So it is basically an anti-natalist argument.
We have to create a map of possible simulation scenarios first; I attempted this in 2015.
I have now created a new poll on Twitter:
"If you were able to create and completely own a simulation, would you prefer it to be occupied by conscious beings, by conscious beings whose suffering is blocked above some level, or by NPCs?"
So far, the poll has 11 votes, with 6 days left.
...Would you say that someone who experiences intense s
Yes, there are two forms of future anthropic shadow, in the same way as for the Presumptuous Philosopher:
1. Strong form – alignment is easy on theoretical grounds.
2. Weak form – I am more likely to be in a world where some collapse (a Taiwan war) will prevent dangerous AI. And I can see signs of such an impending war now.
It is actually not clear what EY means by "anthropic immortality". Maybe he means "Big World immortality", that is, the idea that an inflationary large universe has infinitely many copies of Earth. From an observational point of view, it should not differ much from quantum immortality.
There are two different situations that can follow:
1. Future anthropic shadow. I am more likely to be in a world in which alignment is easy or the AI decided not to kill us for some reason.
2. Quantum immortality. I am alone on an Earth full of aggressive robots, and they fail to kill me.
We are working on a new version of my blog post "QI and AI doomers" and will transform it into a proper scientific article.
I think a more meta-argument is valid: it is almost impossible to prove that all possible civilizations will not run simulations despite having all data about us (or being able to generate it from scratch).
Such proof would require listing many assumptions about goal systems and ethics, and proving that under any plausible combination of ethics and goals, it is either unlikely or immoral. This is a monumental task that can be disproven by just one example.
I also polled people in my social network, and 70 percent said they would want to create a simulation w...
It looks like he argues, on ethical grounds, against the idea that friendly future AIs will simulate the past, and treats imagining unfriendly AIs torturing past simulations as a conspiracy theory. I commented the following:
There are a couple of situations in which a future advanced civilization would want to run many past simulations:
1. Resurrection simulation by a Friendly AI. It simulates the whole history of the Earth, incorporating all known data, in order to return to life all people who ever lived. It can also run many simulations to win a "measure war" against unfrie...
However, this argument carries a dramatic and, in my eyes, frightening implication for our existential situation.
There is not much practical advice that follows from the simulation argument. One piece I have heard is that we should try to live the most interesting lives possible, so that the simulators will not turn our simulation off.
It looks like even Everett had his own derivation of the Born rule from his model, but in his model there are no "many worlds", just the unitary evolution of the wave function. As I remember, he analyzed the memories of an agent – so he analyzed past probabilities, not future probabilities. This is an interesting fact in the context of this post, where the claim is about the strangeness of future probabilities.
But even if we exclude MWI, a purely classical inflationary Big World remains, with multiple copies of me distributed similarly to MWI branches. This allows something analogous to quantum immortality to exist even without MWI.
Several possible additions:
Artificial detonation of gas giant planets is hypothetically possible (writing a draft about it now).
An impact of a large comet-like body (100-1000 km in size) with the Sun could produce a massive solar flash or flare.
SETI attack – we find an alien signal that contains a description of a hostile AI.
UAP-related risks, which include alien nanobots and berserkers.
A list of different risks connected with extraterrestrial intelligence.
The Big Rip - exponential acceleration of space expansion, resulting in the destruction of ev...
See also ‘The Main Sources of AI Risk?’ by Wei Dai and Daniel Kokotajlo, which puts forward 35 routes to catastrophe (most of which are disjunctive). (Note that many of the routes involve something other than intent alignment going wrong.)
An AI finds that the real problems will arise 10 billion years from now, and the only way to mitigate them is to start space exploration as soon as possible. So it disassembles the Earth and the Sun, and preserves only some data about humans, enough to restart human civilization later – maybe as little as a million books and DNA.
A very heavy and dense body on an elliptical orbit that touches the Sun's surface at each perihelion would collect sizable chunks of the Sun's matter. The movement of matter from one star to another nearby star is a well-known phenomenon.
When the body reaches aphelion, the collected solar matter would cool down and could be harvested. The initial body would need to be very massive, perhaps 10-100 Earth masses. A Jupiter-sized core could work as such a body.
Therefore, to extract the Sun's mass, one would need to make Jupiter's orbit elliptical. This could b...
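A rough back-of-the-envelope sketch of what that last step would involve (my own numbers, not from the original comment): if the aphelion stays near Jupiter's current distance and the perihelion is brought down to the solar surface, the required eccentricity follows directly.

```python
# Hedged sketch: what eccentricity would a Jupiter-like body need for its
# perihelion to graze the Sun's surface, if its aphelion stays near Jupiter's
# current orbital distance? All figures are standard astronomical constants.
AU = 1.496e11          # meters
R_SUN = 6.96e8         # solar radius, meters
r_apo = 5.2 * AU       # aphelion kept at Jupiter's current (nearly circular) distance
r_peri = R_SUN         # perihelion brought down to the solar surface

a = (r_apo + r_peri) / 2                 # new semi-major axis
e = (r_apo - r_peri) / (r_apo + r_peri)  # required eccentricity

print(f"semi-major axis ~{a / AU:.2f} AU, eccentricity ~{e:.4f}")
# -> roughly 2.6 AU and e ~ 0.998: an extreme, sun-grazing orbit.
```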
I once wrote about an idea that we need to scan just one good person and make them a virtual king. This idea of mine is a subset of your idea in which several uploads form a good government.
I also spent last year perfecting my mind's model (sideload) to be run by an LLM. I am likely now the closest person on Earth to being uploaded.
Being a science fiction author creates a habit of maintaining distance between oneself and crazy ideas. LessWrong noticeably lacks such distance.
LessWrong is largely a brainchild of Igen (through Eliezer). Evidently, Igen isn't happy with how his ideas have evolved and attempts to either distance himself or redirect their development.
It's common for authors to become uncomfortable with their fandoms. Writing fanfiction about your own fandom represents a meta-level development of this phenomenon.
Dostoyevsky's "Crime and Punishment" was an early attempt to mock a proto-rationalist for agreeing to kill an innocent person in order to help many more people.
The main problem here is that this approach doesn't solve alignment, but merely shifts it to another system. We know that human organizational systems also suffer from misalignment - they are intrinsically misaligned. Here are several types of human organizational misalignment:
So there are several possible explanations:
Good point.
Alternatively, maybe any intelligence above, say, IQ 250 self-terminates, either because it discovers the meaninglessness of everything or through effective wars and other existential risks. The rigid simplicity of field animals protects them from all this. They are super-effective survivors, like bacteria, which have lived everywhere on Earth for billions of years.
"Frontier AI systems have surpassed the self-replicating red line"
Abstract: Successful self-replication under no human assistance is the essential step for AI to outsmart the human beings, and is an early signal for rogue AIs. That is why self-replication is widely recognized as one of the few red line risks of frontier AI systems. Nowadays, the leading AI corporations OpenAI and Google evaluate their flagship large language models GPT-o1 and Gemini Pro 1.0, and report the lowest risk level of self-replication. However, following their methodology, we for ...
In my extrapolation, going from $3,000 to $1,000,000 for one task would move one from 175th to 87th position on the CodeForces leaderboard, which does not seem like that much.
o1-preview: $1.2 -> 1258 ELO
o1: $3 -> 1891 ELO
o3 low: $20 -> 2300 ELO
o3 high: $3,000 -> 2727 ELO
o4: $1,000,000 -> ? (ChatGPT estimates around 2900 ELO)
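A minimal sketch of how one might run this kind of extrapolation (my own illustration, not the original calculation; the ~2900 figure above came from ChatGPT under its own assumptions, and the result depends strongly on the fit chosen):

```python
# Fit ELO as a linear function of log10(cost per task) on the data points above
# and extrapolate to $1,000,000. This is only a rough, assumption-laden sketch.
import numpy as np

costs = np.array([1.2, 3, 20, 3000])       # $ per task: o1-preview, o1, o3 low, o3 high
elos = np.array([1258, 1891, 2300, 2727])  # reported CodeForces ELO

slope, intercept = np.polyfit(np.log10(costs), elos, 1)  # least-squares line in log-cost
predicted = slope * np.log10(1_000_000) + intercept

print(f"~{slope:.0f} ELO per 10x increase in cost")
print(f"Naive log-linear extrapolation at $1,000,000: ~{predicted:.0f} ELO")
```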
A failure of practical CF can be of two kinds:
A copy is possible, but it will not have phenomenal consciousness, or at least it will have a non-human or non-mine phenomenal consciousness, e.g., different non-human qualia.
What is your opinion about (1) – the possibility of creating a copy?
With 50T tokens repeated 5 times, and a 60 tokens/parameter[3] estimate for a compute optimal dense transformer,
Does it mean that the optimal size of the model will be around 4.17T parameters?
About 4T parameters, which is 8 TB in BF16. With about 100x more compute (compared to Llama 3 405B), we get a 10x larger model by Chinchilla scaling, the correction from a higher tokens/parameter ratio is relatively small (and in this case cancels out the 1.5 factor in compute being 150x actually).
Not completely sure if BF16 remains sufficient at 6e27-5e28 FLOPs, as these models will have more layers and larger sums in matrix multiplications. If BF16 doesn't work, the same clusters will offer less compute (at a higher precision). Seems unlikely though, as ...
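A back-of-the-envelope check of these numbers (my own sketch, using only the figures quoted in the thread):

```python
# Compute-optimal sizing from the quoted figures: 50T tokens repeated 5 times,
# ~60 tokens/parameter, BF16 weights, and the standard ~6*N*D training-FLOPs estimate.
tokens = 50e12 * 5                  # 250T training tokens
tokens_per_param = 60               # quoted compute-optimal ratio
params = tokens / tokens_per_param  # ~4.17e12 parameters

bytes_per_param_bf16 = 2
weights_tb = params * bytes_per_param_bf16 / 1e12  # ~8.3 TB of weights

train_flops = 6 * params * tokens   # rough estimate for a dense transformer
print(f"{params / 1e12:.2f}T parameters, ~{weights_tb:.1f} TB in BF16, ~{train_flops:.1e} FLOPs")
```

This reproduces the ~4T parameters and ~8 TB figures, and the resulting ~6e27 FLOPs sits at the lower end of the range mentioned above.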
There is a similar idea with the opposite conclusion – that more "complex" agents are more probable: https://arxiv.org/abs/1705.03078
A possible example of such a coincidence is the Goldbach conjecture: every even number greater than 2 can be represented as a sum of two primes. Since for any large even number there are many ways to express it as a sum of two primes, it could be pure coincidence that we have not found exceptions.
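A small self-contained sketch (my own illustration) that counts Goldbach representations for a few even numbers; the count tends to grow with the number, which is what makes the "coincidence" reading at least conceivable:

```python
# Count the unordered pairs of primes (p, q), p <= q, with p + q == n.
def is_prime(k: int) -> bool:
    if k < 2:
        return False
    if k % 2 == 0:
        return k == 2
    d = 3
    while d * d <= k:
        if k % d == 0:
            return False
        d += 2
    return True

def goldbach_representations(n: int) -> int:
    return sum(1 for p in range(2, n // 2 + 1) if is_prime(p) and is_prime(n - p))

for n in (10, 100, 1000, 10000):
    print(n, goldbach_representations(n))  # the number of representations grows with n
```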