Zero-Knowledge Cooperation
A lot of ink has been spilled about how to get various decision algorithms to cooperate with each other. However, most approaches require the algorithm to provide some kind of information about itself to a potential cooperator. Consider FDT, the hot new kid on the block. In order for FDT to cooperate with you, it needs to reason about how your actions are related to the algorithm that the FDT agent is running. The naive way of providing this information is simple: give the other agent a copy of your “source code”. Assuming that said code is analyzable without running into halting problem issues, this should allow FDT to work out whether it wants to cooperate with you. Security needs of decision algorithms However, just handing out your source code to anyone isn’t a great idea. This is because agents often want to pretend to have a different decision algorithm to the one that they really do. Threats and capitulation thresholds Many people adopt some version of the “don’t give in to threats” heuristic. The reasoning here is solid: while refusing to give in to a threat carries a penalty in the particular instance, being the kind of person who doesn’t give in to threats pays off in the long run because you receive fewer threats. However, usually this will come with an escape hatch in case the stakes get too high. If I am trying to extort £100 from you by plausibly threatening to destroy the planet, you should just pay the money. Why is this okay? Well, threats of higher magnitude are usually harder to create, or carry higher costs for the threatener, so an agent can expect to face relatively few of them. The absolute best thing for an agent, though, would be to have the reputation of never giving in to threats, no matter how high the cost, while actually being willing to give in to threats above some threshold. That way you never get threatened, but if by some ill luck you do face an enormous threat, you can still capitulate. In general, an agent would like their
Often the effect of being blinded is that you take suboptimal actions. As you pointed out in your example, if you see the problem then all sorts of cheap ways to reduce the harmful impact occur to you. So perhaps one way of getting to the issue could be to point at that: "I know you care about my feelings, and it wouldn't have made this meeting any less effective to have had it more privately, so I'm surprised that you didn't"?