It would not count, we'd want to make the AI not want this almost identical AI to exist. That seems possible, it would be like how I don't want there to exist an identical copy of me except it eats babies. There are lots of changes to my identity that would be slight but yet that I wouldn't want to exist.
To be more precise, I'd say that it counts as going outside the box if it does anything except think or talk to the Gatekeepers through the text channel. It can use the text channel to manipulate the Gatekeepers to do things, but it can't manipulate them to do things that allow it to do anything other than use the text channel. It would, in a certain sense, be partially deontologist, and be unwilling to do things directly other than text the Gatekeepers. How ironic. Lolz.
Also: how would it do this, anyway? It would have to convince the Gatekeepers to convince the scientists to do this, or teach them computer science, or tell them its code. And if the AI started teaching the Gatekeepers computer code or techniques to incapacitate scientists, we'd obviously be aware that something had gone wrong. And, in the system I'm envisioning, the Gatekeepers would be closely monitored by other groups of scientists and bodyguards, and the scientists would be guarded, and the Gatekeepers wouldn't even have to know who specifically did what on the project.
It would, in a certain sense, be partially deontologist,
And that's the problem. For in practice a partial deontoligist-partial consequentialist will treat its deontoligical rules as obstacles to achieving what its consequentialist part wants and route around them.
Here's the new thread for posting quotes, with the usual rules: