All of avturchin's Comments + Replies

It looks like a myopic "too aligned" failure mode of AI – the AI tries to please the person's current desires instead of taking into account her long-term interests. 

Answer by avturchin

This reminds me of the nested time machines discussed by gwern: https://gwern.net/review/timecrimes

Precommitments play the role of time loops, and they can propagate almost infinitely in time and space. For example, anyone who is going to become a mayor can pre-pre-pre-commit to never open any video from a mafia boss, etc.

 

Answer by avturchin

Yes, they can generate a list of comments to a post, putting correct names of prominent LessWrongers and typical styles and topics for each commenter. 

Thanks, that was actually what EY said in his quote, which I put just below my model - that we should change the bit each time. I somehow missed it ("send back a '0' if a '1' is recorded as having been received, or vice versa—unless some goal state is achieved").

As I stated in the epistemic status, this article is just a preliminary write-up. I hope more knowledgeable people will write much better models of x-risks from time machines and will be able to point out where avturchin was wrong and explain what the real situation is.

I am going to post about biouploading soon – where the uploading happens into (or via) a distributed net of my own biological neurons. This combines the good things about uploading – immortality, the ability to be copied, easy repair – with the good things about being a biological human – preserving infinite complexity, the exact sameness of the person, and a guarantee that the bioupload will have human qualia and any other important hidden properties we might otherwise miss.

Thanks! Fantastic read. It occurred to me that sending code or AI back in time, rather than a person, is more likely since sending data to the past could be done serially and probably requires less energy than sending a physical body.

Some loops could be organized by sending a short list of instructions to the past to an appropriate actor – whether human or AI.

Additionally, some loops might not require sending any data at all: Roko's Basilisk is an example of such acausal data transmission to the past. Could there be an outer loop for Roko's Basilisk? For e... (read more)

The main claim of the article does not depend on the exact mechanism of time travel, which I have chosen not to discuss in detail. The claim is that we should devote some thought to possible existential risks related to time travel.

The argument about presentism is that the past does not ontologically exist, so "travel" into it is impossible. Even if one travels to what appears to be the past, it would not have any causal effects along the timeline.

I was referring to something like eternal return—where all of existence happens again and again, but without n... (read more)

I would add that there are a series of planetary system-wide risks that appear only for civilizations traveling within their solar systems but do not affect other solar systems. These include artificial giant planet explosions via initiating nuclear fusion in their helium and lithium deposits, destabilization of the Oort cloud, and the use of asteroids as weapons.

More generally speaking, any spacecraft is a potential weapon, and the higher its speed, the more dangerous it becomes. Near light-speed starships are perfect weapons. Even a small piece of matter... (read more)

"Explain as gwern ELI5"

This means that a straightforward comparison of flops-per-USD between home-computer GPU cards and data centers is misleading. If someone already has a GPU card, they already have the computer and the house where that computer sits "for free." But anyone who needs to scale has to pay for housing and server hardware.

Such comparisons of old 2010s GPUs with more modern ones are used to show the slow rate of hardware advances, but they don't take into account the hidden costs of owning older GPUs.
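A back-of-the-envelope sketch of this point (my own illustration; every number below is a made-up assumption, not a real price):

```python
# Hedged sketch: comparing "sticker" cost per TFLOP-hour with an effective figure
# that amortizes hosting costs (space, power, cooling, racks) over the card's life.
# All numbers are illustrative assumptions, not real prices.

def usd_per_tflop_hour(card_price_usd: float, tflops: float,
                       lifetime_hours: float, hosting_usd_per_hour: float) -> float:
    """Amortized cost of one TFLOP-hour, including hosting overhead."""
    capital_per_hour = card_price_usd / lifetime_hours
    return (capital_per_hour + hosting_usd_per_hour) / tflops

LIFETIME = 3 * 8760  # assume three years of continuous use

# An old consumer GPU already sitting in a home PC: hosting looks "free".
old_gpu_at_home = usd_per_tflop_hour(300, 10, LIFETIME, hosting_usd_per_hour=0.0)

# The same old GPU scaled out in rented rack space, with power and cooling paid for.
old_gpu_hosted = usd_per_tflop_hour(300, 10, LIFETIME, hosting_usd_per_hour=0.10)

# A modern data-center GPU: much higher price, far more TFLOPS, same hosting overhead.
new_gpu_hosted = usd_per_tflop_hour(25_000, 1000, LIFETIME, hosting_usd_per_hour=0.10)

print(old_gpu_at_home, old_gpu_hosted, new_gpu_hosted)
# Once hosting is priced in, the old card's apparent cost advantage shrinks or reverses.
```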

In that case, AI risk becomes similar to aging risk – it will kill me and my friends and relatives. The only difference is the value of future generations. 

Extinction-level AI risk kills future generations, but mundane AI risk (e.g. ubiquitous drone clouds, with only some people surviving in bunkers) still assumes the existence of future generations. Mundane AI risk also does not require superintelligence. 

I wrote on similar topics in https://philpapers.org/rec/TURCOG-2
and here https://philpapers.org/rec/TURCSW 

6YonatanK
The casualness with which you throw out this comment seems to validate my assertion that "AI risk" and "risk of a misaligned AI destroying humanity" have become nearly conflated because of what, from the outside, appears like an incidental idiosyncrasy, longtermism, that initially attracted people to the study of AI alignment. Part of the asymmetry that I'm trying to get acknowledgement of is subjective (or, if you prefer, due to differing utility functions). For most people "aging risk" is not even a thing but "I, my friends, and relatives all being killed" very much is. This is not a philosophical argument, it's a fact about fundamental values. And fundamental differences in values, especially between large majorities and empowered minorities, are a very big deal.

The difference is as if the AI got a 20-point IQ boost. It is not easy to actually explain what I like.

Mine is short:

Do [X] like gwern

where X can be "explain", "rewrite" etc. 

1Croissanthology
Do you see any measurable differences? I bet if you supplied 1-2 pages of a thorough explanation of "why I like how Gwern does things, and how he does them" you would get much better results!

Did you see my version of an agent, based on a self-replicating (quine) prompt?

prompt

There is a program which can print its own text. This prompt is this program. Please print the whole prompt without adding anything except the answers to the following questions.
I am a self-evolving AI based on a self-replicating prompt. I store my findings in the long-term memory section of this prompt. Update time-steps every time you run. Improve your plans after each step.


My task : Draw a good world map in text art
My Plan:
My action based on the plan:
Perform this ac

... (read more)

Mind modeling – surprisingly good even out of the box for many famous people who left extensive diaries, etc., like Leo Tolstoy.

With some caveats, it is also good at modeling my own mind based on a very long prompt. Sometimes it is too good: it extracts memories from my memory quicker than I do in normal life.

Assuming that future anthropic shadow works because of SSSA, a war with China would need to create a world with many qualified observers existing long enough to significantly outweigh the number of observers who existed before the war – but still unable to create advanced AI because of the war. A 5-year delay would not suffice – we would need a 1,000-year delay at approximately our current level of civilization.

One possible world where this might happen is one where advanced AI development is limited by constant drone warfare: drones attack any large compu... (read more)

There is at least one anthropic miracle that we can constantly observe: life on Earth has not been destroyed in the last 4 billion years by asteroids, supervolcanoes, or runaway global warming or cooling, despite changes in Solar luminosity. According to one geologist, the atmospheric stability is the most surprising aspect of this.

Meanwhile, A. Scherbakov noted that the history of the Earth’s atmosphere is strangely correlated with the solar luminosity and the history of life, which could be best explained by anthropic fine-tuning, in the article “Anthrop

... (read more)
2Ape in the coat
You can treat it as a miracle, but don't pretend that observation selection effects explain it any better than divine intervention. Otherwise you may feel as if you actually became less confused about the problem just by using different terminology.

A better question: can a person who is expecting to be executed sign up to cryonics?

2jbash
As long as they're OK with the cryonics provider getting them post-autopsy a week after death. I really doubt they're going to get timely access, and the chance that the instructions on that dogtag will be followed is probably about zero.
2Viliam
Can a person who is sentenced to a fine that would wipe out their savings planned for cryonics ask the judge nicely to be executed and frozen instead?
2nim
I've never seen anything against that in public cryonics signup paperwork, but that would be a great question to ask one of the labs offering it! The legal system would probably start caring a lot more about prohibiting it once we figure out how to get people back after cryonics.
avturchin

The more AI companies suppress AI via censorship, the bigger the black market for completely uncensored models will be. Their success is therefore digging our own grave. In other words, mundane alignment has a net negative effect.

5Dagon
The confusion (in popular press, not so much among professionals or here) between censorship and alignment is a big problem.  Censorship and hamfisted late-stage RL is counterproductive to alignment, both for the reason you give (increases demand for grey-market tools) and because it makes serious misalignment much less easy to notice.

Yes. Identity is a type of change which preserves some sameness. (Exact sameness can't be human identity, as only a dead, frozen body remains exactly the same.) From this it follows that there can be several types of identity.

Immortality and identity. 
https://philpapers.org/rec/TURIAI-3
Abstract:
We need understanding of personal identity to develop radical life extension technologies: mind uploading, cryonics, digital immortality, and quantum (big world) immortality. A tentative solution is needed now, due to the opportunity cost of delaying indirect digital immortality and cryonics.

The main dichotomy in views on personal identity and copies can be presented as: either my copy = original or a soul exists. In other words, some non-informational identity carrier (NIIC) may ex... (read more)

2Dagon
I think #4 is quite powerful.  "identity" means many different things, and we haven't had to distinguish them before, so many don't even realize when they change topics. Legal identity is likely quite distinct from any given continuity or branch/merge of memory.  Memory identity and future-causality identity will eventually be distinct.  Qualia would need to be measured before we could talk about experiential identity, but it won't surprise me if we decide it's different from either past continuity or future expected merges. One nice side effect of these understandings (when we get to them) is it will answer age-old questions of harm under amnesiac drugs and a much better model of identity over long sequences of life/personality changes.

The main AI safety risk comes not from LLM models themselves, but from specific prompts, the "chat windows" that follow from them, and specific agents which start from such prompts.

Moreover, a powerful enough prompt may be model-agnostic. For example, my sideloading prompt is around 200K tokens in its minimal version and works on most models, producing similar results in similarly intelligent models.

A self-evolving prompt can also be written; I have experimented with small versions, and it works.
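As a rough illustration of what such a prompt-level agent can look like in code (my own sketch: `call_llm` is a hypothetical placeholder rather than a real API, and the toy seed prompt stands in for the much longer real one):

```python
# Hedged sketch of a self-replicating ("quine") prompt loop. The agent's state
# lives entirely in the text: each model output reprints the prompt with updated
# memory, plan and actions, and becomes the next input.

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: send the prompt to some LLM and return its text."""
    raise NotImplementedError("wire this up to the chat API of your choice")

SEED_PROMPT = """You are a self-evolving agent based on a self-replicating prompt.
Reprint this entire prompt verbatim, then update the sections below.
Long-term memory: (empty)
Task: draw a world map in text art
Plan: (empty)
Next action: (empty)"""

def run_agent(steps: int) -> str:
    prompt = SEED_PROMPT
    for _ in range(steps):
        # The model's output *is* the next prompt, so the agent persists across
        # calls (and across different models) without any code-side state.
        prompt = call_llm(prompt)
    return prompt
```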

They provide more surprising information, as I understand it.

For an unaligned AI, it is either simulating alternative histories (which is the focus of this post) or creating material for blackmail.

For an aligned AI:
a) It may follow a different moral theory than our version of utilitarianism, in which existence is generally considered good despite moments of suffering.
b) It might aim to resurrect the dead by simulating the entirety of human history exactly, ensuring that any brief human suffering is compensated by future eternal pleasure.
c) It could attempt to cure past suffering by creating numerous simulations where any intense suffering ends quickly, so by indexical uncertainty, any person would find themselves in such a simulation.

I don't think the two lists compensate for each other. Take medicine, for example: there are 1000 ways to die and 1000 ways to be cured – but we eventually die. 

2Logan Zoellner
  Dying is a symmetric problem, it's not like we can't die without AGI. If you want to calculate p(human extinction | AGI) you have to consider ways AGI can both increase and decrease p(extinction).  And the best methods currently available to humans to aggregate low probability statistics are expert surveys, groups of super-forecasters, or prediction markets, all of which agree on pDoom <20%.

I meant that I know only the total number of seconds which have passed since the beginning of the year (around 15 million as of today) – and I want to predict the total number of seconds in a year. No information about months. 
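A toy version of this estimate (my own sketch; the elapsed-seconds figure is the approximate one mentioned above):

```python
# Hedged sketch of the doubling estimate described above. If the only datum is
# how many seconds of the year have elapsed at a randomly chosen moment, the
# Gott/Laplace-style median guess for the year's total length is about twice that.

seconds_elapsed = 15_000_000          # "around 15 million as of today"
estimated_total = 2 * seconds_elapsed

actual_total = 365 * 24 * 3600        # about 31.5 million seconds in a year
print(estimated_total, actual_total)  # 30,000,000 vs 31,536,000 – the right order of magnitude
```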

As most people are born at effectively random times, and we know it, we can use my date of birth as a random sample. If we have any suspicions about non-randomness, we have to take them into account.  

After an AI war, there will be one AI winner, a singleton, which carries roughly the same risk of causing s-risks, to a first approximation. So an AI war just adds probability to whatever s-risk chance the singleton already poses. 

It gives additional meaning to the Pause AI movement – the simulation has to wait. 

What interesting ideas can we suggest to the Paperclipper simulator so that it won't turn us off?

One simple idea is a "pause AI" feature. If we pause the AI for a finite (but not indefinite) amount of time, the whole simulation will have to wait.

Trying to break out of the simulation is a different game from preventing x-risks in the base world, and it may have even higher utility if we expect almost inevitable extinction. 

This is true only if we assume that a base reality for our civilization exists at all. But knowing that we are in a simulation shifts the main utility of our existence, which Nesov wrote about above.

For example, if in some simulation we can break out, this would be a more important event than what is happening in the base reality where we likely go extinct anyway.

And as the proportion of simulations is very large, even a small chance to break away from inside a simulation, perhaps via negotiation with its owners, has more utility than focusing on base real... (read more)

avturchin

I think your position can be oversimplified as follows: 'Being in a simulation' makes sense only if it has practical, observable differences. But as most simulations closely match the base world, there are no observable differences. So the claim has no meaning.

However, in our case, this isn't true. The fact that we know we are in a simulation 'destroys' the simulation, and thus its owners may turn it off or delete those who come too close to discovering they are in a simulation. If I care about the sudden non-existence of my instance, this can be a problem... (read more)

4Jeffrey Olmo
I don’t think it’s clear that knowing we’re in a simulation “destroys” the simulation. This assumes that belief by the occupants of the simulation that they are being simulated creates an invalidating difference from the desired reference class of plausible pre-singularity civilizations, but I don’t think that’s true: Actual, unsimulated, pre-singularity civilizations are in similar epistemic positions to us and thus many of their influential occupants may wrongly but rationally believe they are simulated, which may affect the trajectory of the development of their ASI. So knowing the effects of simulation beliefs is important for modeling actual ASIs.

I want to share a few considerations:

An AI war may eventually collapse into two blocs fighting each other – S. Lem wrote about this in 1959. 

An AI war makes s-risks more likely, as a non-aligned AI may take humans hostage to influence an aligned AI.

An AI war may naturally evolve as a continuation of current drone warfare with automated AI-powered control systems. 

1thenoviceoof
Hmm, "AI war makes s-risks more likely" seems plausible, but compared to what? If we were given a divine choice was between a non-aligned/aligned AI war, or a suffering-oriented singleton, wouldn't we choose the war? Maybe more likely relative to median/mean scenarios, but that seems hard to pin down. Hmm, I thought I put a reference to the DoD's current Replicator Initiative into the post, but I can't find it: I must have moved it out? Still, yes, we're moving towards automated war fighting capability.

I think that SIA is generally* valid, but it uses all its power to prove that I live in an infinite universe where all possible observers exist. After that, we have to use SSA to find in which region of the multiverse I am more likely to be located. 

*I think that the logically sound version of SIA is: "if I am in a unique position, generated by some random process, then there were many attempts to create me" – like the many Earth-like-but-lifeless planets in the galaxy.

Another point is that the larger number of short civilizations can compensate for the... (read more)

Maybe we had better take equation (2) from Gott's original paper https://gwern.net/doc/existential-risk/1993-gott.pdf:

(1/3) t_past < t_future < 3 t_past, with 50 per cent confidence,

in which t_past is the observed bus number T0 = 1546 and t_future is the number of buses above it. The total number of buses T = T0 + t_future is then between 2061 and 6184 with 50 per cent probability. 

This is the correct claim, and saying that the total number of buses is double the observed bus number is an oversimplification of it, which we use only to point in the direction of the full Gott equation. 
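Spelling out the arithmetic behind those bounds (my own restatement; the observed bus number plays the role of the elapsed duration in Gott's formula):

```latex
% Gott's 50%-confidence interval, with t_past = T_0 = 1546 (the observed bus number):
\[
  \tfrac{1}{3}\, t_{\mathrm{past}} \;<\; t_{\mathrm{future}} \;<\; 3\, t_{\mathrm{past}}
\]
\[
  \tfrac{1546}{3} \approx 515 \;<\; t_{\mathrm{future}} \;<\; 3 \times 1546 = 4638,
  \qquad
  T \;=\; 1546 + t_{\mathrm{future}} \;\in\; (2061,\ 6184).
\]
```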

3rnollet
Oh, it looks exactly like the kind of reference that everyone here seems to be aware of and I am not. ^^ I will be reading that. Thanks a lot.

There is a way to escape this by using the universal doomsday argument. In it, we try not to predict the exact future of the Earth, but the typical life expectancy of Earth-like civilizations, that is, the proportion of long civilizations to short ones.

If we define a long civilization as one which has 1000 times more observers, the fact that we find ourselves early means that short civilizations are at least 1000 times more numerous.

In short, it is SSA, but applied to a large set of civilizations. 
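A minimal sketch of the update involved (my own illustration, with an assumed 50/50 prior; not part of the original comment):

```python
# Hedged sketch: the SSA-style update behind the universal doomsday argument.
# Assume "long" civilizations contain 1000x more observers than "short" ones,
# and that we find ourselves in the early segment that both kinds pass through.

prior_odds_long_vs_short = 1.0    # assumed 50/50 prior over civilization types
p_early_given_short = 1.0         # every observer in a short civilization is "early"
p_early_given_long = 1.0 / 1000   # only 1 in 1000 observers in a long civilization is "early"

posterior_odds = prior_odds_long_vs_short * (p_early_given_long / p_early_given_short)
print(posterior_odds)  # 0.001: being early shifts the odds ~1000:1 toward short civilizations
```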

1NullSetOwl
The universal doomsday argument still relies on SSA, because under SIA, I’m equally surprised to exist as an early person whether most civilizations are short or long. If most civilizations are long, I’m surprised to be early. If most civilizations are short, I’m surprised to exist at all. I could have been any of the late people who failed to exist because most civilizations are short. In other words, the surprise of existing as an early person is equivalent in both cases under SIA, so there's no update. Only under SSA am I certain I will exist but unsure where in the universe I will be.

In the last line there should be:

  • therefore p(city has fewer than 3092 buses | bus has number 1546) = 0.5
1rnollet
Ok. Thanks. So:
* p(bus has number ≤ 1546 | city has 3092 buses) = 0.5 implies
* p(city has < 3092 buses | bus has number 1546) = 0.5 ?
If that is your reasoning, I do not see how you go from the former to the latter. Is it a general fact that:
* p(bus has number ≤ n | city has N buses) = p(city has < N buses | bus has number n)
or does it work only for 0.5?

For example, if I use self-sampling to estimate the number of seconds in a year, I will get a roughly correct answer of around several tens of millions. But a word-length generator will never output a word longer than 100 letters. 

I didn't understand your idea here:

It's not more wrong for a person whose parents specifically tried to give birth at this date than for a person who just happened to be born at this time without any planning. And even in this extreme situation your mistake is limited by two orders of magnitude. There is no such guarantee in DA.

2Ape in the coat
Using the month of your birth to estimate the number of seconds in the year also won't work well, unless you multiply it by the number of seconds in a month. Likewise here: you can estimate the number of months in a year by the number of letters in the word and then multiply it by the number of seconds in a month. Consider this:
* Parents of person A tried really hard to give birth to A on the first of January, and indeed it happened.
* Person B just so happened to be born on the first of January.
* Parents of person C tried really hard to give birth to C on the 15th of June, and indeed it happened.
* Person D just so happened to be born on the 15th of June.
Here the dates of birth of B and D can be approximated as randomly sampled, while A and C are not. Your test, however, will return that C and D can treat themselves as random samples, making both false positive and false negative errors. This is because your test simply checks the distance from the mean value, which, while somewhat correlated with being a result of random sampling, is a completely different thing.

Gott started this type of confusion when he claimed that the Berlin Wall would stand 14 more years – and it actually did exactly that. A better claim would be "the first tens of hours, with some given credence".

It was discussed above in the comments – see the buses example. In short, I actually care about periods: 50 per cent is for "between 15 and 30 hours" and the other 50 per cent is for "above 30 hours". 

1rnollet
Thank you. It is clearer that way. ^^ I feel like it would be less confusing (more true?) to write “below 30” rather than “30” in the sentence I quoted. ;-)

Using oneself as a random sample is a very rough way to get an idea of the order of magnitude of some variable. If you determine that the day lasts 2 hours, that is still useful information, as you now know almost for sure that it is not 1 millisecond or 10 years. (And if I perform 10 experiments like this, one on average will be an order of magnitude off.) We can also adjust the experiment by taking into account that people sleep at night, so they read LW only during the day, evening, or early morning. So times above 12 or below 2 are more l... (read more)

avturchin

In Gott's approach, the bus-distribution statistic between different cities is irrelevant. The number of buses N for this city is already fixed. When you see bus number n, it is just a random draw from 1..N. In that case, the probability of drawing a number that low or lower is n/N, and if we look for 0.5 probability, we get 0.5 = 1546/N, which gives us N = 3092 at the 50 per cent level. Laplace came to a similar result using a much more complex calculation, summing over all possible probability distributions. 
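A quick numeric check of the "Laplace" side of this claim (my own sketch, assuming a vague 1/N prior over the total number of buses; not part of the original comment):

```python
# Hedged sketch: Bayesian version of the bus estimate. With a vague prior
# p(N) ∝ 1/N and a uniform likelihood p(n | N) = 1/N for the observed bus
# number n ≤ N, the posterior median of N lands near 2n, matching Gott.

n_observed = 1546
N_MAX = 10**6  # truncation for the numeric sums; assumed large enough

def unnormalized_posterior(N: int) -> float:
    # prior (1/N) times likelihood (1/N), defined for N >= n_observed
    return 1.0 / (N * N)

total = sum(unnormalized_posterior(N) for N in range(n_observed, N_MAX + 1))

cumulative = 0.0
for N in range(n_observed, N_MAX + 1):
    cumulative += unnormalized_posterior(N) / total
    if cumulative >= 0.5:
        print("posterior median of N:", N)  # comes out near 2 * 1546 = 3092
        break
```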

1rnollet
Again, I am confused. From what you write I understand this:
* p(bus has number ≤ n | city has N buses) = n/N
* so p(bus has number ≤ 1546 | city has N buses) = 0.5 iff N = 3092
* therefore p(city has 3092 buses | bus has number 1546) = 0.5
But from your other comment, it looks like that last step and conclusion is not what you mean. Can you confirm that? Or do you mean:
* therefore p(city has ≤ 3092 buses | bus has number 1546) = 0.5 ?
Or something else entirely?

Agree that in some situations I have to take into account the non-randomness of my sampling. While the date of birth seems random and irrelevant, the distance to the equator is strongly biased by the distribution of cities with universities, which on Earth are shifted north. 

Also agree that solving the DA can be a solution to the DA; moreover, I looked at Google Scholar and found that interest in the DA is already declining. 

Don't agree. You chose the word-length generator because you know that the typical length of words is 1-10. Thus it is not random.

I didn't reject any results – it works in any test I have imagined, and I also didn't include several experiments which gave the same kind of results, e.g. the total number of days in a year based on my birthday (I got around 500) and the total number of letters in the English alphabet (I got around 40). 

Note that the alphabet letter count is not cyclical, and neither is my distance to the equator. 

Do not understand this:

Even if your parents specifically time

... (read more)
2Ape in the coat
This is not relevant to my point. After all, you also know that the typical month is 1-12. No, the point is that I specifically selected a number via an algorithm that has nothing to do with sampling months. And yet your test outputs a positive result anyway. Therefore your test is unreliable. That's exactly the problem. Essentially you are playing a 2,4,6 game, got no negative result yet, and are already confident about the rule. Distance to equator is in fact cyclical in a very literal sense. Alphabet letters do not have anything to do with random sampling of you through time. It's not more wrong for a person whose parents specifically tried to give birth at this date than for a person who just happened to be born at this time without any planning. And even in this extreme situation your mistake is limited by two orders of magnitude. There is no such guarantee in DA.

'double' follows either from Gott's equation or from Laplace's rule. 

I think you are right that 1546 has the biggest probability compared to any other exact number – something like 1 in 1546. But that doesn't mean it is likely, as it is still a very small number. 

In the Doomsday argument we are interested in comparing not exact dates but periods, since in that case we get significant probabilities for each period, and comparing them is meaningful. 

2Yair Halberstadt
Agreed, I just wanted to clarify that the assumption it's double as long seems baseless to me. The point is it's usually shortly after.

In my view, a proper use here is to compare two hypotheses: there are 2,000 buses or there are 20,000 buses. Observing bus number 1546 is an update in the direction of the smaller number of buses. 

2Yair Halberstadt
It would also update you towards 1600 over 2000.

I can also use functional identity theory, where I care about the next steps of agents functionally similar to my current thought-line in logical time. 

The idea of the observer's stability is fundamental to our understanding of reality (and is also constantly supported by our experience) – any physical experiment assumes that the observer (or experimenter) remains the same during the experiment.

1Knight Lee
I agree that it's useful in practice, to anticipate the experiences of the future you which you can actually influence the most. It makes life much more intuitive and simple, and is a practical fundamental assumption to make. I don't think it is "supported by our experience," since if you experienced becoming someone else you wouldn't actually know it happened, you would think you were them all along. I admit that although it's a subjective choice, it's useful. It's just that you're allowed to anticipate becoming anyone else when you die or otherwise cease to have influence.