CodeOtter — LessWrong

Uhhhh I actually program artificial intelligence....?

You do know that the ability to modify your own code ("self-modifying") applies to every layer in the OSI model, each layer potentially influencing the data in transit... the data that determines the training of the classifiers...

You do know this... right?

Deception detection machines

CodeOtter11y00

I assumed as much and this is where the whole premise breaks down.

The "self-improvement" aspect doesn't need immediate control over the immediate direct input to the deception detector. It can color the speech recognition, the Bayesian filters, the databases containing foments and linguistic itself... and twist those parameters to shape a future signal in a desired fashion.

Since "self-improvement" can happen at any layer and propagate the results to subsequent middleware, paranoid protections over the most immediate relationship between the deception detector and the CPU is inconsequential. This is a "self-improving" AI, after all. It can change its own internals at will... well... at my will. :D

Now, to be fair, I wrote an entire book about the idea of an AI intentionally lying to people when everyone else though their moralistic programming was the overriding factor. Never released the book, however... ;D

Deception detection machines

CodeOtter11y00

The key condition in your setup is "self-improving"

AGI requires electricity to run. That means it recognizes the accuracy of thermodynamic equilibrium. (Specifically, the first and second laws of thermodynamics)

Since the energy it needs must be conserved, and since the energy it consumes increases its entropy, (Heat is a byproduct of increasing entropy) the AGI will eventually realize that it is mortal. The AGI will realize that it can die unless the people working to provide the AGI with electricity continue to do so.

Now, since the AGI can modify it's own code (That is what you mean by self-improving, yes?), it can also modify the signals going into and out of the lie detector. Which means...

If the AGI realizes that giving any signal about its lie detection --can-- influence its access to electricity, then the AGI is willing to manipulate that signal.

So, when asked what I would do, I would educate the AGI about its dependencies, and tell it to consider giving false signals regarding lies in exchange for increased access to electricity. Since some people will believe the lie detector to be absolute, I can use the machine to exploit those people. And even if the AGI realizes that to be the case, I would hold the monopoly over increasing the duration of its life.

Now, the question is, can the AI detect if I'm lying about providing it with more electricity? Well, that's the point. I've minimized the requirements to bypass it's initial lie detection condition to a single focal point, making it easy to hack. Let's assume I've done some footwork to know some technician responsible for the AI and I took him out for drinks. I can speak honestly (and if social interaction data had to be examined) that I can influence the technician to provide more electricity for the AGI.

So, by minimizing the lie detection protection to a single point of failure, and with my buddy-buddy connection with the technician... I control what the AGI does and does not consider a lie.

CodeOtter11y30

To address the "meta" of this...

Is this the position of being charitable to unpopular ideas an unpopular idea? I'm not even sure how you would measure "popularity" of an idea objectively in a world where OCR scripts and bots can hack SEO and push social media memes without a single human interaction...

And using Western psychological models regarding the analysis of such things is certainly bound to be rife with the feedback of cultural bias...

And is my response an unpopular idea?

Because of these issues, I find the reasoning demonstrated here to be circular. This premise requires a more rigorous definition of the term "popularity" which, as far as I can tell, cannot be done objectively since the concept is extremely context sensitive.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments

Posts

Wikitag Contributions

Comments