Disclaimer

Sorry for the length of this "question". The post in which I cover the preliminary context hasn't been published yet (4,000+ words and I'm < 40% through my outline).


 

Introduction

I have done some thinking about how one might quantify the capabilities of an AI system to influence the "real world" environment.

I have identified three broad interfaces for executing capabilities:

  1. Humans (and groups thereof)
  2. Human infrastructure
  3. Physics/Bare reality


 

Humans

The AI could influence the world by convincing humans to do things that it wants.

Some relevant skills:

  • Communication
  • Negotiation/Bargaining (broadly construed)
  • Trade (broadly construed)
  • Persuasion
  • Flirtation
  • Deception
  • Manipulation
  • Etc.

The idea here is to influence the real world by influencing individual humans or groups of humans.

 

Quantifying Human Interfacing Capabilities

To a first approximation, a function to quantify human-interfacing capabilities might be a positive (aggregate) function of something like "the likelihood that the AI could convince a(n arbitrary) human to perform a(n arbitrary) act" (let's call this the likelihood of successful persuasion [LSP]). The function may apply suitable modifications to LSP, such as:

  • Positively weighted by the power/influence of the human(s)
  • Inversely weighted by the time taken to convince said human(s)
  • Positively weighted by the influence of the act(s)
  • Inversely weighted by the human('s/s') disposition towards the act(s)
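The modifications above can be sketched in code. This is purely an illustrative toy, not a proposal from the post: every name, and the choice to combine the weights multiplicatively, is my assumption.

```python
# Hypothetical sketch of a modified-LSP aggregate. The functional form
# (multiplicative weights, simple sum over scenarios) is an assumption
# made for illustration only.

from dataclasses import dataclass

@dataclass
class PersuasionScenario:
    lsp: float               # likelihood of successful persuasion, in [0, 1]
    human_influence: float   # power/influence of the target human(s)
    act_influence: float     # influence of the act
    time_to_convince: float  # time taken to convince, arbitrary units (> 0)
    disposition: float       # human's prior aversion to the act (> 0)

def modified_lsp(s: PersuasionScenario) -> float:
    """Positively weight by human and act influence; inversely weight
    by time taken and by the human's disposition towards the act."""
    return (s.lsp * s.human_influence * s.act_influence
            / (s.time_to_convince * s.disposition))

def human_interface_capability(scenarios: list[PersuasionScenario]) -> float:
    """Naive aggregate: sum of modified LSP over sampled scenarios."""
    return sum(modified_lsp(s) for s in scenarios)
```

One could just as well aggregate by averaging or taking a maximum; the point is only the shape of the abstraction.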

 

Defence of the Method

The human interface is just the AI influencing the world via influencing humans. A naive way to quantify the AI's ability to influence humans is something like an aggregate of LSP; after thinking on it for a few seconds, you'd want to modify LSP in some ways.

This is vague because I am not trying to design a concrete measure of "ability to influence humans"; at this stage of my thinking, I'm fine with a high-level abstraction of what such a measure might look like.


 

Human Infrastructure

The AI could influence the world via the levers of human civilisation. A non-exhaustive list of relevant human infrastructure follows:

  • Social
  • Political
  • Cultural
  • Economic and financial
  • Security
  • Information Technology
  • Legal
  • Etc.
  • Subclasses of the above
  • Superclasses of the above
  • Intersections of the above
  • Unions of the above

 

An example of influencing the world via human infrastructure is hacking into a web server. I'm not going to attempt to list the relevant classes of skills for interacting with human infrastructure because there are far too many.

Some ways of interfacing with human infrastructure

  • All jobs/occupations in the global economy.
  • Any service for which monetary, social, or other compensation can be provided.
  • Miscellaneous edge cases

 

Quantifying Human Infrastructure Interfacing Capabilities

To a first approximation, you want something like "economic power". One might try to capture related concepts like social power separately, but I think "economic power" will capture them adequately.

Some ways to operationalise economic power:

  • Net present value of aggregate available economic resources
  • Maximum utilisation rate of aggregate available economic resources
  • Some others
    • If I needed a concrete measure, I'd ask an economist.

At the level of abstraction at which I'm thinking of things, I find this vague notion satisfactory.

 

Defence of the Method

My intuition for quantifying "human infrastructure interfacing capabilities" in economic terms is something like:

  • Economists have been working really hard to quantify (analogues of) this for centuries.
    • Don't reinvent the wheel.
    • Stand on the shoulders of giants.
  • The market is smarter and wiser than all of us and already attempts to value the entire human economy.
    • Ideally, free markets aggregate information from all participants in the economy.
    • The global economy isn't a true free market, but at this high level of abstraction it's still a much better instrument than any other we have.


 

Physics

AKA the "bare interface".

The AI can also attempt to manipulate its environment directly, without using the humans or human infrastructure interfaces. Any way of interacting with the real world that doesn't use the "human"/"human infrastructure" proxies — interfacing with bare reality itself — counts as interacting via the "physics" interface.

One way an AI system might interact with a human through the physics interface is by shooting them with a lethal autonomous weapon.

The other natural sciences (chemistry, biology, geology, meteorology, etc.) also apply here, but they seem to be emergent physics — physics at higher levels of abstraction — so I still think of this as the "physics interface" (though I'm willing to change or drop the name if people are sufficiently opposed to it).

Some relevant skills to influence physics include:

  • Tool use (broadly construed)
  • Scientific research (broadly construed)
  • Engineering (broadly construed)
  • Technological innovation/invention (broadly construed)

 

My Question

At a very high level of abstraction (consider the methods I proposed for the other interfaces), how would you quantify the ability of an AI system to influence bare reality?

 

Desiderata for an Answer

A good abstraction for quantifying the physics interfacing capabilities of an AI system should have the following properties:

  • Intuitively sensible
    • It should adequately match what a commonsense notion of "real world capabilities" is.
    • There shouldn't be actions that intuitively seem quite impactful but that the measure assesses as not impactful.
  • Robust to scale
    • The measure should adequately quantify capabilities of an AI system at both the low and very high ends.
      • Example low end action: raising the elevation of a 100g ball by one metre
      • Example high end action: stellar engineering
      • Example very high end action: tiling the affectable universe with paperclips
  • Orthogonal to Motivations
    • The measure shouldn't care what the motivations, goals, or values of the AI system it's assessing are.
    • No AI systems should be systematically upgraded or degraded based on their motivations.
  • General
    • Able to quantify the myriad ways in which an agent may influence base reality via direct action.
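To make the low-end example concrete (and not to suggest that this calculation is the sought-after currency), here is the arithmetic for raising the 100 g ball. All values are standard physics, nothing more.

```python
# Work done against Earth's gravity to raise a mass through a height:
# W = m * g * h. Raising a 100 g ball by one metre takes roughly 1 J.

G = 9.81  # standard gravitational acceleration, m/s^2

def lifting_work(mass_kg: float, height_m: float) -> float:
    """Work in joules to lift mass_kg through height_m."""
    return mass_kg * G * height_m
```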

 

Money is a very good measure for quantifying capability deployed via the human infrastructure interface because it satisfies the above criteria (when they are reinterpreted to fit human infrastructure).

So for clues on where to look, what fits the analogy: "money but for physics"? What's the currency with which arbitrary physical capability can be purchased?


To be clear, I do have a few ideas, but they're hypotheses I privileged, so I don't want to poison your thinking by mentioning them.

Maybe in three days (or whenever I finish the mega essay), I'll lay them out and explore why I'm dissatisfied with them.