I see the LLM side of this as a first step, both as a proof of concept and because agents get built on top of LLMs (for the foreseeable future at least).

I think that, no, it isn't any easier to align an agent's environment than to align the agent itself. For perfect alignment, the kind that holds in all cases and for all time, the two amount to the same thing, and this is why the problem is so hard. When an agent, or any AI, learns new capabilities, it draws the information it needs out of the environment. It's trying to answer the question: "Given the information coming into me from the world, how do I get the right answer?" So the environment's structure largely determines what the agent ends up being.

So the key question is the one you raise, and the one I try to allude to by talking about an aligned ontology: is there a particular compression, a particular map of the territory, that is good enough to initialise acceptable long-term outcomes?