I find this interesting, thanks for working on it. I’ve been thinking about similar things for a while and have heard related discussions, but I’m happy to have more standardized terminology and the links to existing literature.
I am more interested in how this could be used to improve our thinking abilities for a broad range of valuable purposes than specifically in whether they could be unsafe.
I'm brainstorming ways this post may be off the mark. Curious if you have any :)
Epistemic status: speculative. I’m no expert in delegation nor in CAIS. I think I have good intuitions on entrepreneurship and user design. I started reading and thinking about negotiation agents in 2017. I also read too many psychology papers in my spare time.
I’ll give a short talk at the AI Safety Discussion Day on Monday 14 December.
→ Please comment with feedback on what I might be missing below to inform my talk.
Drexler posits that artificial general intelligence may be developed recursively through services that complete bounded tasks in bounded time. The Comprehensive AI Services (CAIS) technical report, however, doesn’t cover dynamics where software companies are incentivised to develop personalised services.
The more a company ends up personalising an AI service around completing tasks according to individual users’ needs, the more an instantiation of that service will resemble an agent acting on a user’s behalf. It follows from this general argument that continued increases both in customers’ demand for personalised services and in companies’ capacity to process information to supply them (as this whitepaper suggests) will, all else equal, result in more agent-like services.
Neat example: Asana Inc. is developing its online task-management platform into a virtual assistant that helps subscribed teams automate task allocation and prepares documents for employees to focus on their intended tasks. Improved team productivity and employee satisfaction in turn enable Asana to expand their subscriber base and upgrade models.
Counterexamples: Utilities in cloud compute or broadband internet either can’t add sufficient value through mass customisation or would face legal backlash if they tried. Online media and networking services like Google or Facebook rely on third parties for revenue, making it hard to earn the trust of users and get paid to personalise interconnected interfaces.
CAIS neglects what I dub delegated agents: agents designed to act on a person’s behalf.
A next-generation software company could develop and market delegated agents that
Developments in commercially available delegated agents – such as negotiation agents and virtual assistants – will come with new challenges and opportunities for deploying AI designs that align with shared human values and assist us to make wiser decisions.
There’s a bunch of research on delegation spanning decades that I haven’t yet seen discussed in the AI Safety community. Negotiation agents, for example, serve as clean toy models for inquiring into how users can delegate to agents in complex, multi-party exchanges. This paper disentangles dimensions across which negotiation agents must perform to be adopted widely: domain knowledge and preference elicitation, user trust, and long-term perspective.
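To make the toy-model idea concrete, here is a minimal sketch of a delegated negotiation agent. It is not taken from the linked paper: the class name, the issue names, the weights and the threshold are all hypothetical. Elicited preferences are modelled as per-issue weights, and user control as a reservation utility that only the user sets.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DelegatedNegotiator:
    """Toy agent that accepts or rejects offers on a user's behalf (hypothetical design)."""

    weights: Dict[str, float]  # elicited importance of each issue, assumed to sum to 1
    reservation: float         # minimum utility the user is willing to accept

    def utility(self, offer: Dict[str, float]) -> float:
        # Weighted sum of per-issue scores; each score is assumed to lie in [0, 1].
        # Issues missing from the offer are scored as 0.
        return sum(w * offer.get(issue, 0.0) for issue, w in self.weights.items())

    def respond(self, offer: Dict[str, float]) -> str:
        # The agent only applies the threshold the user set; it never changes it.
        return "accept" if self.utility(offer) >= self.reservation else "reject"


agent = DelegatedNegotiator(weights={"price": 0.6, "delivery": 0.4}, reservation=0.5)
good_offer = {"price": 0.8, "delivery": 0.5}  # utility of about 0.68
poor_offer = {"price": 0.2, "delivery": 0.9}  # utility of about 0.48

print(agent.respond(good_offer))
print(agent.respond(poor_offer))
```

Even this tiny model shows where the adoption dimensions bite: the weights stand in for preference elicitation, and the fixed reservation value is one crude way to keep the user in control.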
Center for Human-Compatible AI researchers have published mechanisms to keep humans in the loop such as cooperative IRL, and recently discussed multi-multi delegation in a research considerations paper. As in the CAIS report, the mentioned considerations appear rather decoupled from human contexts in which agent-like designs need to be developed and used to delegate work.
In my talk, I’ll walk through outside research I have read, along with practical scenarios and concrete hypotheses I tried to come up with from the perspective of a software company and its user base. Then, let’s discuss research considerations!
Before you read further:
Think up your own scenario where a user would start delegating their work to an agent.
A scenario
A ‘pure’ delegated agent may start out as a personal service hosted through an encrypted AWS account. Wealthy, tech-savvy early adopters pay a monthly fee to use it as an extension of themselves – to pre-process information and automate decisions on their behalf.
The start-up’s founders recognise that their new tool is much more intimate and intrusive than good ol’ Gmail and Facebook (which show ads to anonymised user segments). To market it successfully, they invest in building trust with target users. They design the delegated agent to assuage their users’ fears around data privacy and unfeeling autonomous algorithms, leave control firmly in the user’s hands, explain its actions, and prevent outsiders from snooping or interfering in how it acts on the user’s behalf (or at least give consistent impressions thereof). This instils founder effects in terms of the company’s core expected design and later directions of development.
Research directions that may be relevant to existential safety
See this technical paper if that question interests you.
Will a service designed to act on behalf of consumers or coalitions converge on a bounded planning horizon? Would the average planning horizon of a delegated agent be longer than that of ‘conventional’ CAIS? What would phenomena like instrumental convergence and Goodharting look like in a messy system of users buying delegated agents that complete tasks across longer time horizons but flexibly elicit and update their model of the users’ preferences and enforcers’ policies?
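As a rough illustration of why the planning horizon matters, the sketch below has an agent pick whichever plan earns the most reward within a fixed number of steps. The plan names and reward numbers are made up for illustration, and summing undiscounted rewards is a simplifying assumption.

```python
def best_plan(plans, horizon):
    """Return the name of the plan with the highest reward summed over `horizon` steps."""
    return max(plans, key=lambda name: sum(plans[name][:horizon]))


# Hypothetical per-step rewards for two plans an agent could pursue for its user.
plans = {
    "quick_win": [5, 0, 0, 0, 0, 0],   # pays off immediately, then nothing
    "slow_build": [0, 1, 2, 3, 4, 5],  # pays off only over a longer period
}

short_choice = best_plan(plans, horizon=2)  # totals: quick_win 5, slow_build 1
long_choice = best_plan(plans, horizon=6)   # totals: quick_win 5, slow_build 15

print(short_choice, long_choice)
```

With the same preferences, the agent's choice flips once the horizon is long enough, which is one reason the horizon question above seems worth pinning down.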