Compared to its competition in the AGI race, MIRI was always going to be disadvantaged by both lack of resources and the need to choose an AI design that can predictably be made Friendly as opposed to optimizing mainly for capability. For this reason, I was against MIRI (or rather the Singularity Institute as it was known back then) going into AI research at all, as opposed to pursuing some other way of pushing for a positive Singularity.
In any case, what other approaches to Friendliness would you like MIRI to consider? The only other approach I'm aware of that's somewhat developed is Paul Christiano's current approach (see for example https://medium.com/ai-control/alba-an-explicit-proposal-for-aligned-ai-17a55f60bbcf), which I understand is meant to be largely agnostic about the underlying AI technology. Personally I'm pretty skeptical, but then I may be overly skeptical about everything. What are your thoughts? I don't recall seeing you comment on them much.
Are you aware of any other ideas that MIRI should be considering?
What problem do you have in mind here?
I thought that the previous problem was mostly psychological, i.e. that if humans were rational agents then this AI would be roughly as vulnerable to blackmail as its designers. So I thought the issue was the psychological strangeness (and great length) of the weird hypothetical.
Here we have no such hypothetical, and the system's behavior only depends on the predicted behavior of humans in the real world. That seems to address the narrow version of your concern.
I can see two analogous problems:
Did you have in mind 1, 2, or something else?
I mostly had in mind 2. I'm not sure how predicting humans is different from putting humans in hypotheticals; it seems like the same problems could happen.