Pentashagon comments on Robust Cooperation in the Prisoner's Dilemma - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (145)
What about optimizing proof search order? An extreme example would just be an agent with a lookup table indexed by the Goedel number of each possible agent up to a maximum value of some N, with either Defect or Cooperate in the lookup table which is the action the agent will take. If impatient agents came first in the lookup table they could quickly prove cooperation. The agent could append its quine to the end of the lookup table before searching in order to cooperate with itself. There should be an optimal lookup table for any length N and set S of games against other agents, defined as the lookup table that generates the highest score when playing all the games in set S.
Instead of including the lookup table the agent could perform the same sort of proof search necessary to generate the lookup table at runtime. If it searched for proofs in roughly the same order as the corresponding entries in the lookup table then impatient opposing agents would conveniently find a cooperation proof early in the simulation of the agent. In this case there would be an optimal list of rules for proof search order (perhaps including direct proofs for the most likely impatient opposing agents) that would approach the optimality of the lookup table, or perhaps exceed it if enough opposing agents give up when they see the size of the lookup-table-agent.
I have no idea if any of this is theoretically interesting or just a hack to potentially play PD slightly better.
And presumably more like O(2^N) in practice? I don't know what percentage of bit strings are well formed formulas in an average formal language, but I would imagine it's a significant fraction. The proof searcher would have to check every wff to completion because it doesn't have Con(PA) to rule out contradictions, and I assumed that most of the proofs would be of the form "First the agent writes a 0 and moves the read-head right and goes to state 5, ... , then the agent halts in a Cooperating state", so N would probably be very large.