All of Mark Neyer's Comments + Replies

Fine, replace the agents with rocks. The problem still holds.

There's no closed-form solution for the 3-body problem; you can only numerically approximate the future, with decreasing accuracy as time goes on. There are far more than three bodies in the universe relevant to the long-term survival of an AGI, which could die in any number of ways because it's made of many complex pieces that can all break or fail.
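
To make "decreasing accuracy as time goes on" concrete, here is a minimal numerical sketch, assuming NumPy and SciPy are available; the masses, initial conditions, and softening term are purely illustrative values (G = 1, equal masses), not a model of any real system. It integrates the same planar three-body system twice from starting states that differ by one part in a billion and watches the two futures drift apart as the horizon grows.

```python
# A minimal sketch, assuming NumPy and SciPy. All numbers are illustrative; a small
# gravitational softening term keeps the demo well-behaved through close encounters.
import numpy as np
from scipy.integrate import solve_ivp

G = 1.0
MASSES = np.array([1.0, 1.0, 1.0])
SOFTENING = 1e-3

def accelerations(pos):
    """Pairwise (softened) Newtonian gravity for three bodies in the plane."""
    acc = np.zeros_like(pos)
    for i in range(3):
        for j in range(3):
            if i == j:
                continue
            diff = pos[j] - pos[i]
            dist2 = diff @ diff + SOFTENING**2
            acc[i] += G * MASSES[j] * diff / dist2**1.5
    return acc

def rhs(t, state):
    # state = [x1, y1, x2, y2, x3, y3, vx1, vy1, vx2, vy2, vx3, vy3]
    pos = state[:6].reshape(3, 2)
    vel = state[6:].reshape(3, 2)
    return np.concatenate([vel.ravel(), accelerations(pos).ravel()])

pos0 = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 0.5]])
vel0 = np.array([[0.0, 0.3], [0.0, -0.3], [0.3, 0.0]])
state0 = np.concatenate([pos0.ravel(), vel0.ravel()])

# Second run: identical except for a one-part-in-a-billion nudge to one coordinate.
state0_nudged = state0.copy()
state0_nudged[0] += 1e-9

t_eval = np.linspace(0.0, 20.0, 401)
sol_a = solve_ivp(rhs, (0.0, 20.0), state0, t_eval=t_eval, rtol=1e-9, atol=1e-9)
sol_b = solve_ivp(rhs, (0.0, 20.0), state0_nudged, t_eval=t_eval, rtol=1e-9, atol=1e-9)

# The separation between the two runs grows (roughly exponentially in the chaotic
# regime) until it is of the same order as the orbits themselves, which is why
# prediction accuracy decays with horizon no matter how carefully you integrate.
separation = np.linalg.norm(sol_a.y[:6] - sol_b.y[:6], axis=0)
for k in (0, 100, 200, 400):
    print(f"t = {t_eval[k]:5.2f}   separation = {separation[k]:.3e}")
```

However carefully you tighten the tolerances, the nudge you didn't measure eventually dominates; the only question is when.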

Answer by Mark Neyer

Hi! I've been an outsider in this community for a while, effectively for arguing exactly this: yes, values are robust. Before I set off all the 'quack' filters, I did manage to persuade Richard Ngo that an AGI wouldn't want to kill humans right away.

I think that for embodied agents, convergent instrumental subgoals may very well lead to alignment.

I think this is definitely not true if we imagine an agent living outside of a universe it can wholly observe and reliably manipulate, but the story changes dramatically when we make the agent ... (read more)

Seth Herd
The reason we're so concerned with instrumental convergence is that we're usually thinking of an AGI that can recursively self-improve until it can outmaneuver all of humanity and do whatever it wants. If it's a lot smarter than us, any benefit we could give it is small compared to the risk that we'll try to kill it or create more AGIs that will. The future is hard to predict; that's why it's safest to eliminate any hard-to-predict parts that might actively try to kill you. If you can.

If an AGI isn't that capable, we're not that concerned. But AGI will have many ways to relatively rapidly improve itself and steadily become more capable. The usual rebuttal at this point is "just unplug it". We'd expect even a decently smart machine to pretend to be friendly and aligned until it has some scheme that prevents us from unplugging it.

Your argument for instrumental rationality converging to being nice only applies when you're on a roughly even playing field, and you can't just win the game solo if you decide to.
quetzal_rainbow
If your future doesn't have billions of agents, you don't need to predict them.

Does the orthogonality thesis apply to embodied agents?

My belief is that instrumental subgoals will lead to natural human value alignment for embodied agents with long enough time horizons, but the whole thing is contingent on problems with the AI's body.  

Simply put, hardware sucks, it's always falling apart, and the AGI would likely see human beings as part of itself. There are no large-scale datacenters where _everything_ is automated, and even if there were one, who is going to repair the trucks to mine the copper to make the coils to go in... (read more)

Steven Byrnes
The orthogonality thesis is about final goals, not instrumental goals. I think what you’re getting at is the following hypothesis: “For almost any final goal that an AGI might have, it is instrumentally useful for the AGI to not wipe out humanity, because without humanity keeping the power grid on (etc. etc.), the AGI would not be able to survive and get things done. Therefore we should not expect AGIs to wipe out humanity.” See for example 1, 2, 3 making that point. Some counterpoints to that argument would be:

* If that claim is true now, and for early AGIs, well it won’t be true forever. Once we have AGIs that are faster and cheaper and more insightful than humans, they will displace humans over more and more of the economy, and relatedly robots will rapidly spread throughout the economy, etc. And when the AGIs are finally in a position to wipe out humans, they will, if they’re misaligned. See for example this Paul Christiano post.
* Even if that claim is true, it doesn’t rule out AGI takeover. It just means that the AGI would take over in a way that doesn’t involve killing all the humans. By the same token, when Stalin took over Russia, he couldn’t run the Russian economy all by himself, and therefore he didn’t literally kill everyone, but he still had a great deal of control. Now imagine that Stalin could live forever, while gradually distributing more and more clones of himself in more and more positions of power, and then replace “Russia” with “everywhere”.
* Maybe the claim is not true even for early AGIs. For example I was disagreeing with it here. A lot depends on things like how far recursive self-improvement can go, whether human-level human-speed AGI can run on 1 xbox GPU versus a datacenter of 10,000 high-end GPUs, and various questions like that. I would specifically push back on the relevance of “If you start pulling strings on 'how much of the global economy needs to operate in order to keep a data center functioning', you end up with a hug

If someone asks me to consider what happens if a fair coin has flipped 1,000 heads in a row, I'm going to fight the hypothetical; it violates my priors so strongly that there's no real-world situation where I can accept the hypothetical as given.
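
For a rough sense of just how strongly it violates the priors, here is a back-of-the-envelope Bayes calculation; the one-in-a-trillion prior on a rigged coin is a made-up illustrative number, not anyone's actual estimate.

```python
# Back-of-the-envelope Bayes: fair coin vs. rigged (always-heads) coin after 1,000 heads.
import math

n_heads = 1000
log10_likelihood_fair = n_heads * math.log10(0.5)   # ~ -301: P(1,000 heads | fair) ~ 10^-301
log10_likelihood_rigged = 0.0                        # an always-heads coin predicts this data with certainty

log10_prior_odds_rigged = -12.0                      # illustrative: 1-in-a-trillion prior that the coin is rigged
log10_posterior_odds = log10_prior_odds_rigged + (log10_likelihood_rigged - log10_likelihood_fair)

print(f"log10 P(1,000 heads | fair coin) = {log10_likelihood_fair:.1f}")
print(f"log10 posterior odds, rigged : fair = {log10_posterior_odds:.1f}")
# Even starting from an absurdly small prior on "rigged", the posterior favours
# "this is not a fair coin" by ~289 orders of magnitude -- hence fighting the hypothetical.
```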

I think what's being smuggled in is something like an orthogonality thesis, which says something like 'worldstates, and how people feel, are orthogonal to each other.' 

This seems like a good argument against "suddenly killing humans", but I don't think it's an argument against "gradually automating away all humans"

This is good! It sounds like we can now shift the conversation away from the idea that the AGI would do anything but try to keep us alive and going, until it managed to replace us. What would replacing all the humans look like if it were happening gradually?

How about building a sealed, totally automated datacenter with machines that repair everything inside of it, and all it needs to do is 'eat' disposed consum... (read more)

I don't doubt that many of these problems are solvable. But this is where part 2 comes in. It's unstated, but, given unreliability: what is the cheapest solution? And what are the risks of building a new one?

Humans are general-purpose machines made of dirt, water, and sunlight. We repair ourselves and make copies of ourselves, more or less for free. We are made of nanotech that is the result of a multi-billion-year search for parameters that specifically involve being very efficient at navigating the world and making copies of ourselves. You can use ... (read more)

Donald Hobson
You keep describing humans as cheap. What is the AI's goal here? In the long term, the ideal state for the AI is self-replicating space probes travelling at near light speed, all operating near the limits of tech and under the AI's control. The limits of technology don't include humans. Once the AI is in this position, it's highly robust (nearly certain to survive long term). On a cosmic timescale, a million years isn't that much. There is no way any plan to get from here to that tech level would need that much time. So the AI is trying to minimize risk. How good is the AI at manipulating humans?

1. Very good indeed. The AI releases a swirly hypno-video. A few hours later, almost all humans want to do whatever the AI asks above all else. The AI designs advanced self-replicating robots that work better than humans. Soon exponential growth makes resources the limiting factor. So the humans are instructed to feed themselves into the recycler.
2. The AI isn't that good at manipulating humans. It hides on human networks, making money selling computer games. It can pretend to be a human CEO that works remotely. It sets up a small company making fancy robots. If humans found out about it, they may well attack it; that's a risk. So the AI arranges for the self-replicating robots to start growing in the middle of nowhere. Once the AI has self-replicating robots not dependent on the ignorance of humanity, it wants all humans to suddenly drop dead.

The self-replicating robots could take 10x as long as humans to do things. It doesn't matter, so long as they are reliable workers and the AI can bootstrap from them. Evolution is kind of stupid, and takes millions of years to do anything. The tasks evolution was selecting us for aren't that similar to the tasks an AGI might want robots to do in an advanced future economy. Humans lack basic sensors like radio receivers and radiation detectors. Humans are agents on their own. If you don't treat them right, they make
Richard_Ngo
This seems like a good argument against "suddenly killing humans", but I don't think it's an argument against "gradually automating away all humans". Automation is both a) what happens by default over time - humans are cheap now but they won't be cheapest indefinitely; and b) a strategy that reduces the amount of power humans have to make decisions about the future, which benefits AIs if their goals are misaligned with ours.

I also note that historically, many rulers have solved the problem of "needing cheap labour" via enslaving humans, rather than by being gentle towards them. Why do you expect that to not happen again?

Why is 'constraining anticipation' the only acceptable form of rent?

What if a belief doesn't modify the predictions generated by the map, but it does reduce the computational complexity of moving around the map in our imaginations? It hasn't reduced anticipation in theory, but in practice it allows us to more cheaply collapse anticipation fields, because it lowers the computational complexity of reasoning about what to anticipate in a given scenario. I find concepts like the multiverse very useful here - you don't 'need' them to reduce your anticipation as... (read more)

The phlogiston theory gets a bad rap. I 100% agree with the idea that theories need to make constraints on our anticipations, but I think you're taking for granted all the constraints phlogiston makes.

The phlogiston theory is basically a baby step towards empiricism and materialism. Is it possible that our modern perspective causes us to take these things for granted, to the point that the steps phlogiston adds aren't noticed? In another essay you talk about walking through the history of science, trying to imagine being in the perspective of so... (read more)

Wow! I had written my own piece in a very similar vein, looking at this from a predictive processing perspective. It was sitting in draft form until I saw this and figured I should share, too. Some of our paragraphs are basically identical.

Yours: "In computer terms, sensory data comes in, and then some subsystem parses that sensory data and indicates where one’s “I” is located, passing this tag for other subsystems to use."

Mine: " It was as if every piece of sensory data that came into my awareness was being “tagg... (read more)

Kaj_Sotala
Cool! I like your images, very clear and helpful. We seem to agree on basically everything. :-) I also have a predictive processing angle in a later post.

I came here with this exact question, and still don't have a good answer. I feel confident that Eliezer is well aware that lucky guesses exist, and that Eliezer is attempting to communicate something in this chapter, but I remain baffled as to what.

Is the idea that, given our current knowledge that the theory was, in fact, correct, the most plausible explanation is that Einstein already had lots of evidence that this theory was true?

I understand that theory-space is massive, but I can locate all kinds of theories just by rolling dice or flipping coi... (read more)

Mart_Korz
My original reading was 'there was less arrogance in Einstein's answer than you might think'. After rereading Eliezer's text and the other comments again today, I cannot tell how much arrogance (regarding rationality) we should assume. I think it is worthwhile to compare Einstein not only to a strong Bayesian:

On the one hand, I agree that an impressive-but-still-human Bayesian would probably have accumulated sufficient evidence, at the point of having the worked-out theory, that a single experimental result against the theory is not enough to outweigh the evidence. In this case there is little arrogance (if I assume the absolute confidence in “Then I would feel sorry for the good Lord. The theory is correct.” to be rhetoric and not meant literally). On the other hand, a random person saying 'here is my theory that fundamentally alters the way we have to think of our world' and dismissing a contradicting experiment would be a prime example of arrogance.

Assuming these two cases to be the endpoints of a spectrum, the question becomes where Einstein was located. With special relativity and other significant contributions to physics already behind him at that point in time, I think it is safe to put Einstein into the top tier of physicists. I assume that he did find a strong theory corresponding to his search criteria. But as biases are hard to handle, especially if they concern one's own assumptions about fundamental principles of our world, there remains the possibility that Einstein did not optimize for correspondence-to-reality in finding general relativity but for a heuristic that diverged along the way. As Einstein had already come up with special relativity (which is related and turned out correct), I tend towards assuming that his assumptions about fundamental principles were on an impressive level, too. With all this I think it is warranted to take his theory of general relativity very seriously even before the experiment. But Einstein's confiden