Eliezer talked about human imitations quite a bit in Challenges to Christiano’s capability amplification proposal, specifically the safety implications of deviations from perfect imitation.
I've discussed (some difficulties of) imitating humans as a way to accelerate philosophical progress.
I also share shminux's concern about humans (and hence human imitations) not being safe in extreme/unusual circumstances, and have discussed it under "human safety problems".
ETA: However, I think it's definitely worth investigating further.
Found a few more relevant posts (which I haven't read and digested yet, but I figure I'll post here before I forget to):
ETA:
I actually spent a bunch of time in the last few weeks fixing and updating Arbital, so it should be reasonably fast now. The Arbital pages loaded for me in less than a second.
arbital.greaterwrong is obviously still faster, but it's no longer as massive a difference.
In Challenges to Christiano’s capability amplification proposal, Eliezer mentioned "challenges I’ve given about how perfect imitation would be very expensive". Unfortunately, I'm not sure where those challenges are, so I can't check the details of his arguments. On the face of it, it seems likely that (at design/training time) creating human imitations that are accurate enough to be safe (or as safe as humans) will require a lot more compute and/or advances in AI/ML research and/or resources in general than creating human-level AGI, since "human imitations that are accurate enough to be safe" seems like a much smaller target in configuration space than "human-level AGI", and the former also requires much more specific and expensive training data than the latter.
It's less clear to me that human imitations have to be more expensive at run time. ETA: One argument in favor of that is that "human imitations that are accurate enough to be safe" is a much smaller region in configuration space so there's less room to optimize for other desirable properties like efficiency/performance on particular tasks.
This seems like an important question to answer and I wonder if anyone knows Eliezer's specific arguments, or any other relevant arguments.
I think this is an idea worth exploring. The biggest problem I have with it right now is that it seems like current ML methods would get us mesa-optimizers.
To spell it out a bit: At first the policy would be a jumble of heuristics that does decently well. Eventually, though, it would have to be something more like an agent, to mimic humans. But the first agent to form wouldn't also be the last, perfectly accurate one. Rather, it would be somewhat accurate. Thenceforth further training could operate on the AI's values and heuristics to make it more human-like... OR it could operate on the AI's values and heuristics to make it more rational and smart so that it can predict and then mimic human behavior better. And the latter seems more likely to me.
So what we'd end up with is something that is similar to a human, except with more random and alien values, and maybe also more rational and smart. This seems like exactly the sort of thing we are trying to avoid.
Imitating humans is both hard and dangerous.
Let's talk about dangerous. Humans are reasonably benign in the situation where they do not have a lot of power or control compared to others. Once you look into unusual cases, people quickly become unaligned with other people, or even with the whole of humanity. Same applies to groups of people who gain power. I am guessing your intention is to try to imitate humans in the situations where they are mostly harmless, and then extrapolate this imitation by ramping up the computational power to make decisions ...
Could enough human-imitating artificial agents (running much faster than people) prevent unfriendly AGI from being made?
This seems very related to the question of whether uploads would be safer than some other kind of AGI. Offhand, I remember a comment from Eliezer suggesting that he thought that would be safer (but that uploads would be unlikely to happen first).
Not sure how common that view is though.
Acquiring data: put a group of people in a house with a computer. Show them things (images, videos, audio files, etc.) and give them a chance to respond at...
I don't know if it's come up in the comments, but naive (e.g. not cognitive-architecturally-informed) approaches seem fairly likely (~40%? OTTMH) to produce mesa-optimizationy-things, to me, see: https://www.lesswrong.com/posts/whRPLBZNQm3JD5Zv8/imitation-learning-considered-unsafe
Otherwise, yes, seems great, esp. if we just imitate AI safety researchers and let them go on to solve all the safety problems.
Humans learn their morals through complex interactions with their environment. It's unlikely that the AGIs you herd together will learn their morals in a similar way, since you can't expose them to their environment in the same way.
Could enough human-imitating artificial agents (running much faster than people) prevent unfriendly AGI from being made?
I think the problem of scale doesn't necessarily get solved through quantity, because there are qualitative issues (e.g. loss of human life) that no amount of infrastructure scale-up can compensate for.
The first question is whether you have enough information to locate human behavior. The concept of optimization is fairly straightforward, and a learner could get a rough estimate of our intelligence by watching humans try to solve some puzzle. In other words, the amount of data needed to get an optimizer is small. The amount of data needed to totally describe every detail of human values is large. This means that a random hypothesis based on a small amount of data will be an optimizer with non-human goals (a toy illustration of this follows below).
For example, maybe the human trainers value having real...
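To make the data argument above concrete, here is a toy sketch (every number and hypothesis in it is invented purely for illustration): under a simplicity prior that penalizes description length, a short "generic optimizer" hypothesis starts with vastly more prior mass than a long "detailed human values" hypothesis, so the MAP hypothesis only switches once the data supplies roughly as many discriminating bits as the gap in description lengths.

```python
# Toy illustration only: the hypothesis names, description lengths, and the
# notion of "discriminating bits" are all invented for this sketch.

# Hypothetical description lengths, in bits: a generic "optimizer with some
# goal" is short to specify; a faithful model of human values is long.
DESC_BITS = {"generic_optimizer": 100, "detailed_human_values": 10_000}

# Simplicity prior: P(h) proportional to 2^(-description_length).
log2_prior = {h: -bits for h, bits in DESC_BITS.items()}

def log2_posterior(discriminating_bits):
    """Unnormalized log2 posterior after observing data that favours the
    human-values hypothesis over the generic optimizer by the given number
    of bits (a stand-in for the log-likelihood ratio)."""
    log2_likelihood = {"generic_optimizer": -discriminating_bits,
                       "detailed_human_values": 0.0}
    return {h: log2_prior[h] + log2_likelihood[h] for h in DESC_BITS}

for bits_of_data in [1_000, 9_000, 11_000]:
    post = log2_posterior(bits_of_data)
    winner = max(post, key=post.get)
    print(f"{bits_of_data:>6} discriminating bits -> MAP hypothesis: {winner}")
```

On this toy model, anything much short of the (hypothetical) ten-thousand-bit gap leaves the simple optimizer as the winning hypothesis, which is the point being made above: a small amount of data gets you an optimizer, not a faithful model of human values.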
Do people think we could make a singleton (or achieve global coordination and preventative policing) just by imitating human policies on computers? If so, this seems pretty safe to me.
Some reasons for optimism: 1) these could be run much faster than a human thinks, and 2) we could make very many of them.
Acquiring data: put a group of people in a house with a computer. Show them things (images, videos, audio files, etc.) and give them a chance to respond at the keyboard. Their keyboard actions are the actions, and everything between actions is an observation. Then learn the policy of the group of humans. By the way, these can be happy humans who earnestly try to follow instructions. To model their policy, we can take the maximum a posteriori estimate over a set of policies which includes the truth, and freeze the policy once we're satisfied. (This is with unlimited computation; we'd have to use heuristics and approximations in real life.) With a maximum a posteriori estimate, this will be quick to run once we freeze the policy, and we're no longer tracking tons of hypotheses, especially if we use some sort of speed prior. Let T be the number of interaction cycles we record before freezing the policy. For sufficiently large T, it seems to me that running this is safe. (A toy sketch of this selection-and-freeze procedure appears below.)
What are people's intuitions here? Could enough human-imitating artificial agents (running much faster than people) prevent unfriendly AGI from being made?
If we think this would work, there would still be the (neither trivial nor hopeless) challenge of convincing all serious AGI labs that any attempt to run a superhuman AGI is unconscionably dangerous, and we should stick to imitating humans.
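Since the proposal above is fairly concrete, here is a minimal sketch of the selection-and-freeze step, with everything (the candidate policies, their description lengths and compute costs, the toy observations) invented for illustration. It uses an explicit hypothesis list and a prior that penalizes both description length and per-step compute as a crude stand-in for a speed prior; a real system would of course use learned models and approximations rather than exact enumeration.

```python
import math
import random

# Candidate policies: each maps an observation (text shown on the screen) to a
# distribution over keyboard actions. All of these are invented toy stand-ins
# for "a set of policies which includes the truth".
def true_human_policy(obs):
    # The "truth": the humans mostly answer questions, otherwise acknowledge.
    if obs.endswith("?"):
        return {"polite_answer": 0.9, "ack": 0.1}
    return {"polite_answer": 0.2, "ack": 0.8}

def always_ack_policy(obs):
    return {"polite_answer": 0.05, "ack": 0.95}

def uniform_policy(obs):
    return {"polite_answer": 0.5, "ack": 0.5}

# (policy, description_bits, per-step compute cost); the numbers are made up.
CANDIDATES = {
    "true_human_policy": (true_human_policy, 400, 8),
    "always_ack":        (always_ack_policy,  50, 1),
    "uniform":           (uniform_policy,     20, 1),
}

def log2_prior(description_bits, compute_cost):
    # Crude stand-in for a speed prior: penalize both description length
    # and (the log of) per-step compute.
    return -(description_bits + math.log2(compute_cost))

def map_policy(history):
    """Return the name of the maximum a posteriori policy given a list of
    recorded (observation, action) pairs."""
    scores = {}
    for name, (policy, bits, cost) in CANDIDATES.items():
        log2_post = log2_prior(bits, cost)
        for obs, action in history:
            log2_post += math.log2(policy(obs)[action])
        scores[name] = log2_post
    return max(scores, key=scores.get)

# Record T interaction cycles of the (simulated) humans, then freeze.
random.seed(0)
T = 2000
observations = ["please describe this image?", "the video clip ends"] * (T // 2)
history = []
for obs in observations:
    dist = true_human_policy(obs)
    action = random.choices(list(dist), weights=list(dist.values()))[0]
    history.append((obs, action))

frozen = map_policy(history)  # freeze this policy and run it from now on
print("Frozen MAP policy:", frozen)  # with large enough T, the true policy wins
```

The design choice the comment points at shows up here: the frozen MAP policy is a single fixed function, so at run time we are no longer tracking the whole hypothesis set; and with a small T, the prior (which favors the short, cheap policies) would dominate, so the frozen policy could be badly wrong about the humans.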