Hi! I've been an outsider in this community for a while, effectively for arguing exactly this: yes, values are robust. Before I set off all the 'quack' filters, I did manage to persuade Richard Ngo that an AGI wouldn't want to kill humans right away.
I think that for embodied agents, convergent instrumental subgoals may very well lead to alignment.
I think this is definitely not true if we imagine an agent living outside of a universe it can wholly observe and reliably manipulate, but the story changes dramatically when we make the agent ...
Does the orthogonality thesis apply to embodied agents?
My belief is that instrumental subgoals will lead to natural human value alignment for embodied agents with long enough time horizons, but the whole thing is contingent on problems with the AI's body.
Simply put, hardware sucks: it's always falling apart, and the AGI would likely see human beings as part of itself. There are no large-scale datacenters where _everything_ is automated, and even if there were one, who is going to repair the trucks to mine the copper to make the coils to go in...
If someone asks me to consider what happens if a fair coin has come up heads 1,000 times in a row, I'm going to fight the hypothetical; it violates my priors so strongly that there's no real-world situation where I can accept the hypothetical as given.
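For concreteness, the arithmetic behind 'violates my priors' (my own back-of-the-envelope numbers, nothing precise):

$$P(\text{1{,}000 heads} \mid \text{fair coin}) = 2^{-1000} \approx 9 \times 10^{-302}$$

Even if my prior that the coin is rigged, or that the report of the flips is simply wrong, were as small as one in a billion ($10^{-9}$), that alternative hypothesis still beats 'fair coin, honest report' by nearly 300 orders of magnitude.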
I think what's being smuggled in is something like an orthogonality thesis: 'worldstates, and how people feel, are orthogonal to each other.'
This seems like a good argument against "suddenly killing humans", but I don't think it's an argument against "gradually automating away all humans".
This is good! It sounds like we can now shift the conversation away from the idea that the AGI would do anything but try to keep us alive and going, until it managed to replace us. What would replacing all the humans look like if it were happening gradually?
How about building a sealed, totally automated datacenter with machines that repair everything inside of it, and all it needs to do is 'eat' disposed consum...
I don't doubt that many of these problems are solvable. But this is where part 2 comes in. It's unstated, but given unreliability, what is the cheapest solution? And what are the risks of building a new one?
Humans are general purpose machines made of dirt, water, and sunlight. We repair ourselves and make copies of ourselves, more or less for free. We are made of nanotech that is the result of a multi-billion year search for parameters that specifically involve being very efficient at navigating the world and making copies of ourselves. You can use ...
Why is 'constraining anticipation' the only acceptable form of rent?
What if a belief doesn't modify the predictions generated by the map, but it does reduce the computational complexity of moving around the map in our imaginations? It hasn't constrained anticipation in theory, but in practice it allows us to more cheaply collapse anticipation fields, because it lowers the cost of reasoning about what to anticipate in a given scenario. I find concepts like the multiverse very useful here - you don't 'need' them to reduce your anticipation as...
The phlogiston theory gets a bad rap. I 100% agree with the idea that theories need to constrain our anticipations, but I think you're taking for granted all the constraints phlogiston makes.
The phlogiston theory is basically a baby step towards empiricism and materialism. Is it possible that our modern perspective causes us to take these things for granted to the point that the steps phlogiston adds aren't noticed? In another essay you talk about walking through the history of science, trying to imagine being in the perspective of so...
Wow! I had written my own piece in a very similar vein, looking at this from a predictive processing perspective. It was sitting in draft form until I saw this and figured I should share, too. Some of our paragraphs are basically identical.
Yours: "In computer terms, sensory data comes in, and then some subsystem parses that sensory data and indicates where one’s “I” is located, passing this tag for other subsystems to use."
Mine: " It was as if every piece of sensory data that came into my awareness was being “tagg...
I came here with this exact question, and still don't have a good answer. I feel confident that Eliezer is well aware that lucky guesses exist, and that Eliezer is attempting to communicate something in this chapter, but I remain baffled as to what.
Is the idea that, given our current knowledge that the theory was, in fact, correct, the most plausible explanation is that Einstein already had lots of evidence that this theory was true?
I understand that theory-space is massive, but I can locate all kinds of theories just by rolling dice or flipping coi...
Fine, replace the agents with rocks. The problem still holds.
There's no closed-form solution to the 3-body problem; you can only numerically approximate the future, with decreasing accuracy as time goes on. There are far more than 3 bodies in the universe relevant to the long-term survival of an AGI, which could die in any number of ways because it's made of many complex pieces that can all break or fail.
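To make that concrete, here's a minimal toy sketch (my own, with arbitrary masses, positions, and step size, not anything from the original discussion): two runs of a planar three-body integrator whose starting positions differ by one part in a billion. The gap between the two predicted futures grows as the integration proceeds, which is the sense in which long-horizon prediction degrades.

```python
# Toy sketch: sensitivity of three-body prediction to tiny initial differences.
# All numerical values are arbitrary illustrative choices.
import numpy as np

G = 1.0           # gravitational constant, arbitrary units
SOFTENING = 1e-3  # avoids numerical blow-up during very close encounters

def accelerations(pos, mass):
    """Pairwise Newtonian gravitational accelerations on each body."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i != j:
                r = pos[j] - pos[i]
                acc[i] += G * mass[j] * r / (r @ r + SOFTENING**2) ** 1.5
    return acc

def simulate(pos, vel, mass, dt=1e-3, steps=20000):
    """Leapfrog (kick-drift-kick) integration; returns positions at every step."""
    pos, vel = pos.copy(), vel.copy()
    history = np.empty((steps, *pos.shape))
    acc = accelerations(pos, mass)
    for k in range(steps):
        vel += 0.5 * dt * acc
        pos += dt * vel
        acc = accelerations(pos, mass)
        vel += 0.5 * dt * acc
        history[k] = pos
    return history

mass = np.array([1.0, 1.0, 1.0])
pos0 = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 0.5]])
vel0 = np.array([[0.0, -0.3], [0.0, 0.3], [0.3, 0.0]])

run_a = simulate(pos0, vel0, mass)
run_b = simulate(pos0 + 1e-9, vel0, mass)  # perturb starting positions by 1e-9

# How far apart are the two predicted futures at increasing horizons?
for k in (1000, 5000, 10000, 19999):
    gap = np.linalg.norm(run_a[k] - run_b[k])
    print(f"step {k:5d}: separation between predictions ~ {gap:.3e}")
```

How fast the separation grows depends on the particular configuration, but for generic chaotic setups it compounds, so the useful prediction horizon stays bounded no matter how good the integrator is.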