solid, but I still think you're missing structure that makes this approach less effective than it seems on its face:
in full generality, what's a "threat"?
in full generality, what's a "dangerous" collision?
I worry that the current failure mode of attempting to empower in order to defend is that the defense is actually used to strike inside another's boundary, as has been the case for ~all weapons
> in full generality, what's a "threat"?
> in full generality, what's a "dangerous" collision?
Hm I'm not immediately sure how to define these
> is that the defense is actually used to strike inside another's boundary, as has been the case for ~all weapons
Yeah, I am worried about this.
This is notably not the case for infosec and encryption, where defensive capability doesn't imply offensive capability. However, I'm unsure if this is also true for any physical interventions. (e.g.: Vaccines? No, bioweapons… Nanotech? No…)
That said, physical interventions do seem to be defense-dominant when there is coordination among a sufficiently large portion of society/power.
I'm not convinced that physical interactions are defense-dominant. The easiest-to-formally-certify defense is to enclose something in a hunk of impenetrable matter, and that can only be certified up to a given impact energy level. Above that energy level, the defense will simply be stripped away. Only MAD seems able to be game-theoretically durable, and certifying that a MAD situation will endure requires proving it through a simulation of the opposition.
This might be obvious, but it seems worth noting anyway: ensuring that our boundaries are respected is, at least under a straightforward understanding of "boundaries", not sufficient for being safe.
For example:
Yes, see Agent membranes/boundaries and formalizing “safety” and davidad's comment.
(Also, I'm not necessarily agreeing that your examples are not violations of boundaries. The first one isn't a violation of the end person's boundary, although it probably is a violation of the farmer's. The second one could be.)
If the preservation of an agent's boundary is necessary for that agent's safety, how can that boundary/membrane be protected?
How agent boundaries get violated
In order to protect boundaries, we must first understand how they get violated.
Let’s say there’s a cat, and it gets stabbed by a sword. That’s a boundary violation (a.k.a. membrane piercing). In order for that to have happened, three conditions must have been met:
More generally, in order for any existing membrane to be pierced, three conditions must have all been met:
Protecting agent boundaries
Each of these three conditions then implies ways of preventing boundary violations (a.k.a. membrane piercing):
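1. There was a potential threat. → Minimize potential threats.
2. There was a collision. → Minimize dangerous collisions.
3. The victim failed to defend itself. → Empower membranes to be better at self-defense.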
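One way to read the three conditions is as a plain conjunction: a membrane is pierced only if all three hold, so negating any single one is enough to prevent the violation. The toy sketch below only illustrates that logical structure. The names `Situation`, `membrane_pierced`, and `interventions` are hypothetical, and treating each condition as a clean boolean is an assumption that sidesteps the open question of how to define "threat" and "dangerous collision" in full generality.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Situation:
    """Toy model (an assumption): each condition is a simple boolean."""
    threat_exists: bool     # 1. There was a potential threat.
    collision_occurs: bool  # 2. There was a collision.
    defense_fails: bool     # 3. The victim failed to defend itself.


def membrane_pierced(s: Situation) -> bool:
    # A violation requires all three conditions at once.
    return s.threat_exists and s.collision_occurs and s.defense_fails


# The cat-and-sword case: all three conditions hold.
cat_and_sword = Situation(threat_exists=True, collision_occurs=True, defense_fails=True)

# Each protective strategy negates exactly one condition.
interventions = {
    "minimize potential threats": replace(cat_and_sword, threat_exists=False),
    "minimize dangerous collisions": replace(cat_and_sword, collision_occurs=False),
    "empower membranes at self-defense": replace(cat_and_sword, defense_fails=False),
}

for name, situation in interventions.items():
    print(f"{name}: pierced = {membrane_pierced(situation)}")
```

In this reading, any one strategy suffices in principle. In practice each condition is a matter of degree rather than a boolean, which is part of why the definitions are hard to pin down.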
How human societies already try to solve this problem
As a helpful analogy, here are some examples of how modern human societies try to solve this problem:
Minimize potential threats
Minimize dangerous collisions
Empower membranes to be better at self-defense
How this applies to AI safety:
Minimize potential AI threats
(this is obvious/boring so I'm omitting it)
Minimize dangerous AI collisions
(this is obvious/boring so I'm omitting it)
Empower membranes to be better at self-defense
Empower the membranes of humans and other moral patients to be more resilient to collisions with threats. Examples: