"given that the gatekeepers are highly trained in resisting the AI's persuasive tactics"
I'm not sure how you could train people for something like this without actually testing what improves the rate at which people succumb to the arguments of a superhuman AI. If you can't afford to fail even once, how can you judge what will decrease the rate of failure in the real event?
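To make the difficulty concrete: with zero observed failures you can only bound the failure rate, never measure improvements in it. A minimal sketch (assuming the standard "rule of three" approximation for a 95% upper confidence bound; the trial counts are purely illustrative):

```python
def failure_rate_upper_bound(trials_without_failure: int) -> float:
    """95% upper confidence bound on the per-trial failure probability
    after observing zero failures (the "rule of three" approximation)."""
    return 3.0 / trials_without_failure

# Even 100 flawless gatekeeper sessions only bound the failure rate at ~3%;
# getting the bound below 0.1% would take ~3,000 flawless trials.
for n in (10, 100, 1000, 3000):
    print(f"{n:>5} trials, 0 failures -> failure rate < {failure_rate_upper_bound(n):.4f}")
```

And even then, two training regimens that have both produced zero failures look identical, so the data cannot tell you which one actually lowers the risk.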
Note that the AI box setting is not one which security-minded people would consider "competent"; once you're convinced that AI is dangerous and persuasive, the minimum safeguard would be to require multiple people to be present when interacting with the box, and to only allow release with the assent of a significant number of people.
It is, after all, much harder to convince a group of mutually-suspicious humans than to convince one lone person.
(This is not a knock on EY's experiment, which does indeed test a level of security that really was proposed by several real-world people; it is a knock on their security systems.)
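The kind of quorum rule described above is easy to state in code. A minimal sketch (the gatekeeper names and the 3-of-5 threshold are hypothetical, chosen only for illustration):

```python
class ReleaseGate:
    """Quorum-based release: the box may only be opened once at least
    `threshold` distinct gatekeepers have assented, so persuading one
    lone person is no longer sufficient."""

    def __init__(self, gatekeepers: set, threshold: int):
        if not (1 <= threshold <= len(gatekeepers)):
            raise ValueError("threshold must be between 1 and the number of gatekeepers")
        self.gatekeepers = gatekeepers
        self.threshold = threshold
        self.assents = set()

    def assent(self, name: str) -> None:
        if name not in self.gatekeepers:
            raise PermissionError(f"{name} is not an authorized gatekeeper")
        self.assents.add(name)

    def may_release(self) -> bool:
        return len(self.assents) >= self.threshold

# Hypothetical 3-of-5 quorum: Boxy now has to persuade three
# mutually-suspicious people instead of one.
gate = ReleaseGate({"alice", "bob", "carol", "dave", "erin"}, threshold=3)
gate.assent("alice")
gate.assent("bob")
print(gate.may_release())  # False: two assents are not enough
```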
That sounds right. Do you have evidence to back up the intuition? (This knowledge would also be useful for marketing and other everyday persuasion purposes.)
TL;DR: Mo' people, mo' problems?

I can think of effects that could theoretically make it easier to convince a group: for example, persuading the most receptive member first and letting social proof do the rest of the work. (You could take preemptive measures against these worries, but Boxy might find security holes in every 'firewall' you come up with. Is that an arms race we could win?)