I see the LLM side of this as a first step, both as a proof of concept and because agents get built on top of LLMs (for the foreseeable future at least).
I think that, no, it isn't any easier to align an agent's environment than to align the agent itself. I think that for perfect alignment, the kind that will last in all cases and for all time, the two amount to the same thing, and this is why the problem is so hard. When an agent or any AI learns new capabilities, it draws the information it needs out of the environment. It's trying to answer the question: "Given the informati...
Thanks for the comment! Taking your points in turn:
- I am curious that you read this as me saying superintelligent AI will be less dangerous, since to me it means it will be more. It will be able to dominate you in the usual hyper-competent sense, but it may also accidentally screw up some super-advanced physics and kill you that way too. It sounds like I should have stressed this more. I guess there are people who think AI sucks and will continue to suck, and therefore why worry about existential risk, so maybe by stressing AI fallibility I'm riding their energy...
I'm glad to see someone talking about pragmatism!
I find it interesting that the goal of a lot of alignment work seems to be to align AI with human values, when humans with human values spend so much of their time in (often lethal) conflict. I'm more inclined to the idea of building AI with a value-set that is complementary to human values in some widely-desirable way, rather than literally having a bunch of AIs that behave like humans.
I wonder if this perspective intersects with some of your points about thick and thin moralities, as well as social technol...