V_V comments on Naturalistic trust among AIs: The parable of the thesis advisor's theorem - Less Wrong

24 Post author: Benja 15 December 2013 08:32AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (20)

You are viewing a single comment's thread.

Comment author: V_V 16 December 2013 03:44:10PM *  3 points [-]

I'm under the impression that all these "Löbian obstacles" essentially boil down to the "you can't be sure that you are not insane" kind of stuff.

IIUC, your thesis advisors scenario can be solved by having a first thesis advisor who only outputs theorems which are provable within some formal system and every other thesis advisor trusting by axiom his predecessor.
This doesn't work in your (MIRI's) framework because you want the agent to trust not its ancestors but its successors, and you want this to hold for an unbounded number of generations.
But this may just be a problem of your framework, not some fundamental propriety of computational agents. At least, you haven't produced a satisfactory argument for that.

Anyway, I think that framing these problems in terms of provability logic might be inappropriate, as concepts like "beliefs", "trust" or "agents" are difficult to formalize in that language, hence you get a mix of formal and informal reasoning, which is the traditional recipe for paradoxes and errors.

It's much better to frame these problems directly in terms of computer programs. This way agents, internal states, world physics, etc., can be defined in a much more precise way.

In this framework the main "Gödelian" result is Rice's theorem, which, among other things, implies that functional equivalence between arbitrary programs is uncomputable. Nevertheless, there exist under-approximations of functional equivalence which are computable, and this can be exploited by "cliquing" strategies in scenarios where programs have to "trust" each other (e.g. program-equilibrium prisoner's dilemma).
Cliquing based on a lexical equivalence predicate is restrictive, obviously, but it can be generalized on any predicate that under-approximates functional equivalence.
Of course, this assumes that you start with a program that is free of bugs, OS and compilers are free of bugs, hardware never glitches, and so on. But, unrealistic as it is, this seems to be an unavoidable assumption: no program will ever be able to debug itself, or to certify that the hardware it runs on works as advertised.

Comment author: cousin_it 17 December 2013 08:54:59AM *  1 point [-]

Yeah, good points. Previously discussed in this post and this comment. (I see you also commented there.) I'd love to see more progress in that direction.

Comment author: V_V 20 December 2013 02:06:50AM 0 points [-]

Thanks. Added a comment in the WD/EY subthread.