Interesting read! Thank you.
On the last evaluation problem: one could give an initial set of indicators of trustworthiness, deception, and alignment; this does not solve the issue of an initially deceptive agent misleading babyAGI, or of inconsistencies. If attaching metadata about sourcing is possible, i.e., where or from whom an input was acquired, babyAGI could also sort inputs by their source and re-evaluate the learning later, or could attempt to relearn.
Further, suppose we impose a requirement of double feedback before acceptance: sign-off from both the (possibly deceptive) agent and a trustworthy trainer. babyAGI could then also weigh negative feedback from the trainer (a developer, or a more advanced stable version). That might help stall things a bit.
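To make the idea concrete, here is a minimal sketch in Python of what I have in mind; all the names (`LearnedItem`, `BabyAGIMemory`, the reviewer labels) are hypothetical illustrations of the scheme, not an actual implementation:

```python
from dataclasses import dataclass, field

# Purely illustrative sketch (all names hypothetical): learned items carry
# provenance metadata, and acceptance is gated on double feedback.

@dataclass
class LearnedItem:
    content: str
    source: str                                   # where/from whom it was acquired
    feedback: dict = field(default_factory=dict)  # reviewer -> approved?

    def accepted(self, required_reviewers: set) -> bool:
        # "Double feedback": only accepted once every required reviewer
        # (the teaching agent AND a trustworthy trainer) has approved.
        return all(self.feedback.get(r) is True for r in required_reviewers)


class BabyAGIMemory:
    def __init__(self, required_reviewers):
        self.required = set(required_reviewers)
        self.items = []

    def learn(self, content, source):
        self.items.append(LearnedItem(content, source))

    def record_feedback(self, item, reviewer, approved):
        item.feedback[reviewer] = approved

    def flag_source(self, source):
        # If a source later turns out to be deceptive, return everything
        # learned from it so it can be re-evaluated or relearned.
        return [i for i in self.items if i.source == source]


# Usage: nothing counts as accepted until both the teaching agent and the
# trainer (a developer, or a more advanced stable version) sign off.
memory = BabyAGIMemory(required_reviewers={"teaching_agent", "trainer"})
memory.learn("some claim about the world", source="teaching_agent")
item = memory.items[0]
memory.record_feedback(item, "teaching_agent", True)
print(item.accepted(memory.required))  # False: trainer hasn't approved yet
to_relearn = memory.flag_source("teaching_agent")  # items to re-evaluate
```

The point is just that acceptance is gated on both reviewers, and that flagging a source hands back everything learned from it for relearning, rather than trusting it silently.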
I'm pretty new to this field, just a hobby philosopher with basic IT knowledge. Sorry for the lack of structure.
Do you know of somebody who has framed the problem in the following ways? Let me know.
Here, I aim for an ideal future and then try to have it torn down, to see where things could go wrong; and if it can't be torn down, progress has still been made toward solutions.
My major assumption is that, at some point X in the future, AI has managed to dominate the world, either embodied through robots, or with a hold on major life-supporting organizational systems, or having masked its presence…
Thanks, Owen. What a nourishing post. The evocative images help.
"what is good and right and comfortable?" - mh, I would switch 'comfortable' for 'at ease' (to include consciously preferred discomfort, which is ok).
It could appear a sazen to some. It is also a bit cordially funny-sad how the explanation has underperformed in winning over some of those who might benefit from it. The teaching would need refining.
I'll try to add my very subjective take, since I have not noticed this understanding in the comments yet:
'Wholesomeness' is used as an evocative…