Epistemic status: speculative. I’m no expert in delegation or in CAIS. I think I have good intuitions on entrepreneurship and user design. I started reading and thinking about negotiation agents in 2017. I also read too many psychology papers in my spare time.

I’ll give a short talk at the AI Safety Discussion Day on Monday 14 December. 
→ Please comment with feedback on what I might be missing below to inform my talk. 

Drexler posits that artificial general intelligence may be developed recursively through services that complete bounded tasks in bounded time. The Comprehensive AI Services (CAIS) technical report, however, doesn’t cover dynamics where software companies are incentivised to develop personalised services.

The more a company ends up personalising an AI service around completing tasks according to individual users’ needs, the more an instantiation of that service will resemble an agent acting on a user’s behalf. It follows from this general argument that continued increases both in customers’ demand for personalised services and in companies’ capacity to process information to supply them (as this whitepaper suggests) will, all else equal, result in more agent-like services.

Neat example: Asana Inc. is developing its online task-management platform into a virtual assistant that helps subscribed teams automate task allocation and prepares documents so that employees can focus on their intended tasks. Improved team productivity and employee satisfaction in turn enable Asana to expand its subscriber base and upgrade its models.

Counterexamples: Utilities in cloud compute or broadband internet can’t add sufficient value through mass customisation or face legal backlash if they do. Online media and networking services like Google or Facebook rely on third parties for revenue, making it hard to earn the trust of users and get paid to personalise interconnected interfaces. 

CAIS neglects what I dub delegated agents: agents designed to act on a person’s behalf. 
A next-generation software company could develop and market delegated agents that 

  • elicit and model a user’s preferences within and across relevant contexts.
  • build trust with the user to represent their interests within a radius of influence.
  • plan actions autonomously in interaction with other agents that represent other consumers, groups with shared interests, and governance bodies.

Developments in commercially available delegated agents – such as negotiation agents and virtual assistants – will come with new challenges and opportunities for deploying AI designs that align with shared human values and help us make wiser decisions.

There’s a bunch of research on delegation spanning decades that I haven’t yet seen discussed in the AI Safety community. Negotiation agents, for example, serve as clean toy models for inquiring into how users can delegate to agents in complex, multi-party exchanges. This paper disentangles dimensions across which negotiation agents must perform to be adopted widely: domain knowledge and preference elicitation, user trust, and long-term perspective.

Center for Human-Compatible AI researchers have published mechanisms to keep humans in the loop such as cooperative IRL, and recently discussed multi-multi delegation in a research considerations paper. As in the CAIS report, the mentioned considerations appear rather decoupled from human contexts in which agent-like designs need to be developed and used to delegate work.


In my talk, I’ll explain outside research I read and practical scenarios/concrete hypotheses I tried to come up with from the perspective of a software company and its user base. Then, let’s discuss research considerations!

 

Before you read further
Think up your own scenario where a user would start delegating their work to an agent.

A scenario

A ‘pure’ delegated agent may start out as a personal service hosted through an encrypted AWS account. Wealthy, tech-savvy early adopters pay a monthly fee to use it as an extension of themselves – to pre-process information and automate decisions on their behalf.

The start-up's founders recognise that their new tool is much more intimate and intrusive than good ol' Gmail and Facebook (which show ads to anonymised user segments). To market it successfully, they invest in building trust with target users. They design the delegated agent to assuage users' fears around data privacy and unfeeling autonomous algorithms, leave control firmly in the user's hands, explain its actions, and prevent outsiders from snooping or interfering in how it acts on the user’s behalf (or at least give consistent impressions thereof). This instils founder effects in the company's core design expectations and in later directions of development.
 

Research directions that may be relevant to existential safety

  • Narrow value learning: Protocols for eliciting preferences that are efficient in user time and input, user-approved and user-friendly, and context-sensitive (reducing elicitation fatigue, and ensuring that users know how to interact and don’t disengage). Models for building accurate (hierarchical?) and interpretable (semi-symbolic?) representations of the user’s preferences on the fly within the service’s defined radius of influence.
  • Defining delegation: How to define responsibility and derive enforceable norms in cases where a person and an agent acting on their behalf collaborate on exercising control and alternate in taking the initiative?
  • Heterogeneity of influence:  How much extra negotiation power or other forms of influence does paying extra for a more sophisticated and computationally powerful delegated agent with more access to information offer? Where does it make sense for groups to pool funds to pay for a delegated agent to represent shared interests? To what extent does being an early mover or adopter in this space increase later influence?
  • Governance and enforcement:  How to coordinate the distribution of punishments and rewards to heterogeneous delegated agents (and to the users who choose which designs to buy so they have skin in the game) such that they steer away from actions that impose negative externalities (including hidden systemic risks) onto other, less-represented persons and towards cooperating on creating positive externalities? 
    See this technical paper if that question interests you.
  • Emergence of longer-term goals:  Drexler argues for a scenario where services are developed that complete tasks within bounded times (including episodic RL). 
    Will a service designed to act on behalf of consumers or coalitions converge on a bounded planning horizon? Would the average planning horizon of a delegated agent be longer than that of ‘conventional’ CAIS? What would things like instrumental convergence and Goodharting look like in a messy system of users buying delegated agents that complete tasks across longer time horizons but flexibly elicit and update their model of the users’ preferences and enforcers’ policies?
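
To make the narrow value learning direction above a bit more concrete, here is a minimal Python sketch of one way an agent could trade off elicitation fatigue against acting on a wrong guess: it asks the user only in contexts where its preference model is still uncertain, and acts autonomously otherwise. Everything in it is an assumption made for illustration (the `PreferenceModel` class, the `delegate` function, the per-context tally, and the threshold values); it is not taken from the CAIS report or from any of the papers mentioned in this post, and a real system would need a far richer model.

```python
from collections import defaultdict


class PreferenceModel:
    """Hypothetical per-context tally of the options a user has chosen."""

    def __init__(self, ask_threshold=0.8, min_observations=3):
        # Both parameters are illustrative assumptions, not recommended values.
        self.ask_threshold = ask_threshold
        self.min_observations = min_observations
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, context, choice):
        """Store one observed user choice for the given context."""
        self.counts[context][choice] += 1

    def best_guess(self, context):
        """Return (option, confidence), or (None, 0.0) for an unseen context."""
        tally = self.counts.get(context)
        if not tally:
            return None, 0.0
        total = sum(tally.values())
        option = max(tally, key=tally.get)
        return option, tally[option] / total

    def should_ask(self, context):
        """Ask when there is too little data or the user's choices are mixed."""
        tally = self.counts.get(context)
        total = sum(tally.values()) if tally else 0
        if total < self.min_observations:
            return True
        _, confidence = self.best_guess(context)
        return confidence < self.ask_threshold


def delegate(model, context, ask_user):
    """Act for the user when confident; otherwise ask and learn.

    Returns (choice, interrupted), where interrupted says whether
    the user had to be asked.
    """
    if model.should_ask(context):
        choice = ask_user(context)
        model.record(context, choice)
        return choice, True
    option, _ = model.best_guess(context)
    return option, False


# Usage: the user is asked about newsletters until the model has seen
# enough consistent answers, after which the agent decides on its own.
model = PreferenceModel()
answers = {"newsletter": "archive"}
for _ in range(5):
    choice, interrupted = delegate(model, "newsletter", answers.get)
    print(choice, "(asked user)" if interrupted else "(decided autonomously)")
```

Even this toy version surfaces some of the questions above: the threshold decides how much control stays with the user, and a per-context tally says nothing about preferences that hold across contexts.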
5 comments

I find this interesting, thanks for working on it. I’ve been thinking about similar things for a while and have heard related discussions, but I’m happy to have more standardized terminology and the links to existing literature.

I am more interested in how this could be used to improve our thinking abilities for a broad range of valuable purposes than in the specific implications of such agents being unsafe.

Sure! I'm curious to hear any purposes you thought of that delegated agents could assist with.

I'm brainstorming ways this post may be off the mark. Curious if you have any :)

  • You can personalise an AI service across some dimensions that won’t make it resemble an agent acting on a person’s behalf any more closely (or won't meet all criteria of 'agentiness')
    • not acting *over time* - more like a bespoke tool customised once to a customer’s preferred parameters, e.g. a website-builder like wix.com
    • an AI service personalising content according to a user’s likes, reads, and "don't show" clicks isn't agent-like
    • efficient personalised services will be built on swappable modules and/or shared repositories of consumer preference components and contexts (meaning that the company never actually runs an independent instantiation of the service)
  • Personalisation of AI services will fall short of delegated agents except in a few niches because of lack of demand or supply
    • a handful of the largest software corporations (FAAMG, etc.) have locked customers into networks and routines but are held back from personalising customer experiences because they tend to rely on third-party revenue streams
    • it's generally more profitable to specialise in and market a service that caters to either high-paying discerning customers, or a broad mass audience that's basically okay with anything you give them
    • too hard to manage mass customisation or not cost-effective compared to other forms of business innovation
    • humans are already well-adapted and trained for providing personalised services; AI can compete better in other areas
    • humans already have very similar preferences within the space of theoretical possibilities – making catering to individual differences less fruitful than you'd intuitively think
    • it’s easier to use AI to shape users to have more homogenous preferences than to cater to preference differences
    • eliciting human preferences takes up too much of the user's attention and/or runs up against too many possible interpretations (based on assumptions of user's rationality and prior knowledge, as well as relevant contextual cues) to work
    • you can make more commercial progress by designing a common interface, and acclimatising users to it, that allows those users to meet their diverging preferences themselves (than by designing AI interfaces that elicit the users' preferences and act on their behalf)
    • software engineers need a rare mix of thing- and person-oriented skills to develop delegated agents
    • a series of bad publicity incidents impede further development (analogous to self-driving car crashes)
    • data protection or anonymisation laws in Europe and beyond limit personalisation efforts (or further down the line, restrictions on autonomous algorithms do)
    • doesn’t fit current zeitgeist somehow in high-income nations
  • Research directions aren't priorities
    • Advances in preference learning will be used for other unhelpful stuff (just read Andrew Critch's post)
    • Research on how much influence delegated agents might offer can, besides being really speculative, be misused or promote competitive dynamics
  • Context assumptions:
    • Delegated agents will be developed first inside, say, military labs (or other organisational structures in other places) that involve meaningfully different interactions from those at a Silicon Valley start-up.
    • Initial contexts in which delegated agents are produced and used really don’t matter for how AI designs are deployed in later decades (something like, it’s overdetermined)
  • Conceptual confusion:
    • Terms in this post are ambiguous or used to refer to different things (e.g. general AI 'tasks' vs. 'tasks' humans conceive and act on, 'service' infrastructure vs. online 'service' aimed at human users, 'virtual assistant' conventionally means a remote human assistant, 'model')
    • An ‘AI agent’ is a vague, leaky concept that should be replaced with more exacting dimensions and mechanisms
    • Carving out humans and algorithms into separate individuals with separate ‘preferences’ is a fundamentally impoverished notion. This post assumes that perspective and therefore fosters mistaken/unskillful reasoning.