All of apc's Comments + Replies

apcΩ230

Glad to see this published—nice work!

2adamShimi
Thanks!
apcΩ330

Re Regulatory markets for AI safety: You say that the proposal doesn’t seem likely to work if “alignment is really hard and we only get one shot at it” (i.e. unbounded maximiser with discontinuous takeoff). Do you expect that status-quo government regulation would do any better, or just that any regulation wouldn’t be helpful in such a scenario? My intuition is that even if alignment is really hard, regulation could be helpful e.g. by reducing races to the bottom, and I’d rather have a more informed group (like people from a policy and technical safety team at a top lab) implementing it instead of a less-informed government agency. I’m also not sure what you mean by legible regulation.

4Rohin Shah
I agree that regulation could be helpful by reducing races to the bottom; I think what I was getting at here (which I might be wrong about, as it was several months ago) was that it is hard to build regulations that directly attack the technical problem. Consider for example the case of car manufacturing. You could have two types of regulations:

1. Regulations that provide direct evidence of safety: For example, you could require that all car designs be put through a battery of safety tests, e.g. crashing them into a wall and ensuring that the airbags deploy.
2. Regulations that provide evidence of thinking about safety: For example, you could require that all car designs have at least 5 person-years of safety analysis done by people with a degree in Automotive Safety (which is probably not an actual field but in theory could be one).

IIRC, the regulatory markets paper seemed to place most of its optimism on the first kind of regulation, or at least that's how I interpreted it. That kind of regulation seems particularly hard in the one-shot alignment case. The second kind of regulation seems much more possible to do in all scenarios, and preventing races to the bottom is an example of that kind of regulation.

I'm not sure what I meant by legible regulation -- probably I was just emphasizing that for regulations to be good, they need to be sufficiently clear and well understood by companies that they can actually be in compliance with them. Again, for regulations of the first kind this seems pretty hard to do.

Great post.

Re Regulatory Markets for AI Safety: I’d be interested in hearing more about why you think that they might not be useful in an AGI scenario, because “the goals and ex post measurement for private regulators are likely to become outdated and irrelevant”.

Why do you think the goals and ex post measurement are likely to become irrelevant? Furthermore, isn’t this also an argument against any kind of regulatory action on AI? Because with status-quo regulation, the goals and ex post measurement of regulatory outcomes are o... (read more)

apc*Ω8130

Is this a fair description of your disagreement re the 90% argument?

Daniel thinks that a 90% reduction in the population of a civilization corresponds to a ~90% reduction in their power/influentialness. Because the Americans so greatly outnumbered the Spanish, this ten-fold reduction in power/influentialness doesn’t much alter the conclusion.

Matthew thinks that a 90% reduction in the population of a civilization means that “you don’t really have a civilization”, which I interpret to mean something like a ~99.9%+ reduction in t... (read more)

6Matthew Barnett
For my part, I think you summarized my position fairly well. However, after thinking about this argument for another few days, I have more points to add.

* Disease seems especially likely to cause coordination failures, since it's an internal threat rather than an external threat (which, unlike internal threats, tends to unite empires). We can compare the effects of the smallpox epidemic in the Aztec and Inca empires with other historical diseases during wartime, such as the Plague of Athens, which arguably is what caused Athens to lose the Peloponnesian War.
* Along these same lines, the Aztec/Inca didn't have any germ theory of disease, and therefore didn't understand what was going on. They may have thought that the gods were punishing them for some reason, and therefore they probably spent a lot of time blaming random groups for the catastrophe. We can contrast these circumstances with e.g. the Paraguayan War, which killed up to 90% of the male population, but where people probably had a much better idea of what was going on and who was to blame, so I expect that the surviving population had an easier time coordinating.
* A large chunk of the remaining population likely had some sort of disability. Think of what would happen if you got measles and smallpox in the same two-year window: even if you survived, it probably wouldn't look good. This means that the pure death rate is an underestimate of the impact of a disease. The Aztecs, for whom "only" 40 percent died of disease, were still greatly affected.
5Daniel Kokotajlo
Thanks Alexis, this seems like an accurate description to me. Strong-upvoted, partly because I want to reward people for doing these sorts of summary-and-distillation stuff.

As for your question, hmm, I'm not sure. I tentatively say yes, but my hesitations are (1) cases where 90% of the population dies are probably very rare, and (2) how would we measure power anyway? Presumably most civilizations that lose 90% of their population do end up conquered by someone else pretty quickly, since most civilizations aren't 10x more powerful than all their neighbors.

I think the crux is this business about the chain of command. Cortés and Pizarro succeeded by getting Americans to ally with them and/or obey them. The crux is: would they have been able to do this as well, or mostly as well, without the disease? I think that reading a bunch of books on what happened might more or less answer this question. For example, maybe the books will say that the general disarray caused by the disease created a sense of desperation and confusion in the people, which led them to be open to the conquistadors' proposals when otherwise they would have dismissed them. In which case, I concede defeat in this disagreement. Or maybe the books will say that if only the conquistadors had been outnumbered even more, they would have lost.

But what I predict is that the books will say that, for the most part, the reasons why people allied with Cortés and Pizarro had more to do with non-disease considerations: "Here is this obviously powerful representative of an obviously powerful faraway empire, wielding intriguing technology that we could benefit from. There is our hated enemy, Tenochtitlan, who has been oppressing us for decades. Now is our chance to turn the tables on our oppressors!" Similarly, I predict that the reasons why the emperors allowed the conquistadors to get close enough to ambush them had little to do with disease and more to do with, well, just not predicting that the conquistadors w

Thanks for clarifying. That's interesting and seems right if you think we won't draft legal contracts with AI. Could you elaborate on why you think that?

2DanielFilan
Well, because I think they wouldn't be enforceable in the really bad cases the contracts would be trying to prevent :) Also, by default people currently delegate tasks to computers by writing software, which I expect to continue in the future (although I guess smart contracts are an interesting edge case here).

I think it's worth distinguishing between a legal contract and setting the AI's motivational system, even though the latter is a contract in some sense. My reading of Stuart's post was that it was intended literally, not as a metaphor. Regardless, both are relevant; in PAL, you'd model the motivational system via the agent's utility function, and contract enforceability via the background assumption.

But I agree that contract enforceability isn't a knock-down, and indeed won't be an issue by default. I think we should have frame... (read more)

2DanielFilan
To restate/clarify my above comment: I agree, but think that we are likely to delegate tasks to AIs by setting their motivational systems, not by drafting literal legal contracts with them. So PAL is relevant to the extent that it works as a metaphor for setting an AI's motivational system and source code; in this context contract enforceability isn't an issue, and Stuart is making a mistake in thinking about literal legal contracts (assuming that he is doing so).
apcΩ230

I agree that this seems like a promising research direction! I think this would be done best while also thinking about concrete traits of AI systems, as discussed in this footnote. One potential beneficial outcome would be to understand which kinds of systems earn rents and which don't; I wouldn't be surprised if the distinction between rent-earning agents vs others mapped pretty cleanly onto a Bostromian utility maximiser vs CAIS distinction, but maybe it won't.

In any case, the alternative perspective offered by the agency rents framing comp... (read more)

Thanks! Yeah, we probably should have included a definition. The Wikipedia page is good.

Thank you! :)

I wouldn't characterise the conclusion as "nope, doesn't pan out". Maybe more like: we can't infer too much from existing PAL, but AI agency rents are an important consideration, and for a wide range of future scenarios new agency models could tell us about the degree of rent extraction.

apcΩ590
The claim that this couldn't work because such models are limited seems just arbitrary and wrong to me.

The economists I spoke to seemed to think that in agency unawareness models, conclusions follow pretty immediately from the assumptions, so the models don't teach you much. It's not that they can't model real agency problems, just that you don't learn much from the model. Perhaps if we'd spoken to more economists there would have been more disagreement on this point.

4RobinHanson
We have lots of models that are useful even when the conclusions follow pretty directly. Such as supply and demand. The question is whether such models are useful, not if they are simple.

Aside from the arguments we made about modelling unawareness, I don't think we were claiming that econ theory wouldn't be useful. We argue that new agency models could tell us about the levels of rents extracted by AI agents; our claims were just that i) we can't infer much from existing models, because they model different situations and are brittle, and ii) models won't shed light on phenomena beyond what they are trying to model.

1RobinHanson
"models are brittle" and "models are limited" ARE the generic complaints I pointed to.

The intuition is that if the principal can perfectly monitor whether the agent is working or shirking, they can just specify a clause in the contract that punishes the agent whenever they shirk. Equivalently, if the principal knows the agent's cost of production (or ability level), they can extract all the surplus without leaving any rent.

Pages 40-53 of The Theory of Incentives contrast these "first-best" and "second-best" solutions (it's easy to find online).
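
To make the first-best/second-best contrast concrete, here is a minimal sketch of the first-best benchmark, in notation of my own choosing (loosely following the setup in those pages rather than quoting any particular equation):

```latex
% First best: effort is observable and contractible.
% The agent exerts effort e at private cost c(e); the principal values output q(e);
% the agent's outside option is \bar{U}.
% The principal offers a forcing contract: pay w only if effort e is taken.
\[
\max_{e,\,w}\; q(e) - w
\qquad \text{s.t.} \qquad w - c(e) \ge \bar{U} \quad \text{(participation)}
\]
% The participation constraint binds at the optimum, so w = c(e^*) + \bar{U}:
% the agent is held exactly to their outside option and earns zero rent.
% In the second best, effort is unobservable, so an incentive-compatibility
% constraint is added on top of participation; satisfying both can force the
% principal to leave the agent a strictly positive payoff above \bar{U}.
```

The difference between the agent's payoff in the two cases is the agency rent.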

1MichaelA
(This rambly comment is offered in the spirit of Socratic grilling.)

I hadn't noticed I should be confused about the agency rent vs monopoly rent distinction till I saw Wei Dai's comment, but now I realise I'm confused. And the replies don't seem to clear it up for me. Tom wrote:

That's definitely one way in which they're different. Is that the only way? Are they basically the same concept, and it's just that you use one label (agency rents) when focusing on rents the worker can extract due to lack of competition between workers, and the other (monopoly rents) when focusing on rents the firms can extract due to lack of competition between firms? But everything is the same on an abstract/structural level?

Could we go a little further, and in fact describe the firm as an agent, with consumers as its principal? The agent (the firm) can extract agency rents to the extent that (a) its activities at least somewhat align with those of the principal (e.g., it produces a product that the public prefers to nothing, and that they're willing to pay something for), and (b) there's limited competition (e.g., due to a patent). I.e., are both types of rents due to one actor (a) optimising for something other than what the other actor wants, and (b) being able to get away with it?

That seems consistent with (but not stated in) most of the following quote from you:

What my proposed framing seems to not account for is that discussion of agency rents involves mention of imperfect monitoring as well as imperfect competition. But I think I share Wei Dai's confusion there. If the principal had no other choice (i.e., there's no competition), then even with perfect monitoring, wouldn't there still be agency rents, as long as the agent is optimising for something at least somewhat correlated with the principal's interests? Is it just that imperfect monitoring increases how much the agent can "get away with", at any given level of correlation between its activities and the principal's inter
apcΩ360

Thanks for catching this! You’re correct that that sentence is inaccurate. Our views changed while iterating on the piece, and that sentence should have been changed to: “PAL confirms that due to diverging interests and imperfect monitoring, AI agents could get some rents.”

This sentence too: “Overall, PAL tells us that agents will inevitably extract some agency rents…” would be better as “Overall, PAL is consistent with AI agents extracting some agency rents…”

I’ll make these edits, with a f... (read more)

2Wei Dai
Thanks for making the changes, but even with "PAL confirms that due to diverging interests and imperfect monitoring, AI agents could get some rents." I'd still like to understand why imperfect monitoring could lead to rents, because I don't currently know a model that clearly shows this (i.e., where the rent isn't due to the agent having some other kind of advantage, like not having many competitors). Also, I get that the PAL in its current form may not be directly relevant to AI, so I'm just trying to understand it on its own terms for now. Possibly I should just dig into the literature myself...
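
For what it's worth, the standard textbook model in which imperfect monitoring alone generates a rent is moral hazard with limited liability, covered in the same Laffont & Martimort book cited above. A minimal sketch, in notation of my own choosing:

```latex
% Moral hazard with limited liability (standard setup; notation mine).
% The agent chooses effort e in {0,1} at cost c*e. Output is a success with
% probability p_1 if e = 1 and p_0 < p_1 if e = 0 (assume p_0 > 0).
% Wages must be non-negative (limited liability); the agent's outside option is 0.
% To induce e = 1, the contract (w_S on success, w_F on failure) must be
% incentive compatible:
\[
p_1 w_S + (1 - p_1) w_F - c \;\ge\; p_0 w_S + (1 - p_0) w_F
\quad\Longleftrightarrow\quad
w_S - w_F \;\ge\; \frac{c}{p_1 - p_0}
\]
% The cheapest such contract sets w_F = 0 and w_S = c / (p_1 - p_0),
% leaving the agent an expected payoff of
\[
p_1 \cdot \frac{c}{p_1 - p_0} \;-\; c \;=\; \frac{p_0\, c}{p_1 - p_0} \;>\; 0,
\]
% a strictly positive rent even if many identical agents compete for the job:
% the binding constraint is limited liability, not participation.
```

Here the rent comes purely from the monitoring problem plus the agent's inability to accept negative wages; with observable effort, or without the limited-liability constraint (e.g. selling the project to the agent), the principal could hold the agent to their outside option.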

I've also found it hard to find relevant papers.

Behavioural Contract Theory reviews papers based on psychology findings and notes:

In almost all applications, researchers assume that the agent (she) behaves according to one psychologically based model, while the principal (he) is fully rational and has a classical goal (usually profit maximization).

Optimal Delegation and Limited Awareness is relevant insofar as you consider an agent's knowing more facts about the world to be akin to being more capable. Papers which consider contracting scenarios with b... (read more)

Thanks for writing this! It was useful when organising my workout routine.

I read the Kovacevic et al. paper on sleep that you cite, and there are some caveats probably relevant to some LW readers. In particular, the benefits are less clear for younger adults.

  • Acute resistance exercise studies
    • "There was some evidence that an acute bout of resistance exercise may reduce the number of arousals during sleep"
    • They base this on three studies. The cohorts are elderly (65-80 years), middle-aged (mean 44.4 ± 8 years), and young (21.9 ± 2.7 years). They note tha
... (read more)