Which source code and which machine code are actually being executed on some particular substrate is an empirical fact about the world, so in general, an AI (or a human) might learn it the way we learn any other fact - by making inferences from observations of the world.
For example, maybe you hook up a debugger or a waveform reader to the AI's CPU to get a memory dump, reverse engineer the running code from the memory dump, and then prove that the properties you care about follow inevitably from running the code you reverse engineered.
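As a toy illustration of the last step, here is a minimal Python sketch of checking a dump against an already-audited binary. Everything in it is hypothetical: the dump path, the code-segment offset and length (which would come out of the reverse engineering step), and the trusted digest.

```python
import hashlib

def code_region_digest(dump_path: str, offset: int, length: int) -> str:
    """Hash the code region extracted from a raw memory dump."""
    with open(dump_path, "rb") as f:
        f.seek(offset)
        region = f.read(length)
    return hashlib.sha256(region).hexdigest()

# Hypothetical: digest of the code segment of a binary you have already
# audited and proved properties about.
TRUSTED_DIGEST = "0000000000000000000000000000000000000000000000000000000000000000"

# Hypothetical offset/length, recovered by reverse engineering the dump's layout.
if code_region_digest("cpu_memory.dump", offset=0x401000, length=0x2000) == TRUSTED_DIGEST:
    print("dump matches the audited binary, so the proofs carry over")
else:
    print("mismatch: whatever is running is not what was audited")
```

The hard part is everything the hash comparison takes for granted: that the dump is faithful, that the offsets are right, and that nothing on the substrate rewrites the code after you read it.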
In general though, this is a pretty hard, unsolved problem - you probably run into a bunch of issues related to embedded agency pretty quickly.
To get an intuition for why these problems might be possible to solve, consider the tools that humans who want to cooperate or verify each other's decision processes might use in more restricted circumstances.
For example, maybe two governments which distrust each other conduct a hostage exchange by meeting in a neutral country, bringing lots of guns as well as extra hostages they don't intend to exchange that day. Each government publicly declares that if the hostage exchange doesn't go well, they'll shoot a bunch of the extra hostages.
Or, suppose two humans are in a True Prisoner's Dilemma with each other, in a world where fMRI technology has advanced to the point where semantically meaningful inner thoughts can be read off a scan in real time. Both players agree to undergo such a scan while making their decision and to make the scan results available to their opponent. This might not allow them to prove that their opponent will cooperate iff they cooperate, but it will probably still make it a lot easier for them to robustly cooperate - for valid decision-theoretic reasons, rather than out of honor or a spirit of friendliness.
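This setup is loosely analogous to what the game theory literature calls a program equilibrium, where each player's decision procedure gets read access to the other's source code before choosing. A minimal Python sketch - the bot names and the literal string-equality check are illustrative, not a real fMRI protocol:

```python
import inspect

def mirror_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent's decision procedure is literally this one."""
    if opponent_source == inspect.getsource(mirror_bot):
        return "C"
    return "D"

def defect_bot(opponent_source: str) -> str:
    """Defect unconditionally, regardless of what it observes."""
    return "D"

def play(p1, p2):
    # Each player decides with full read access to the other's
    # "inner thoughts", i.e. the other's source code.
    a1 = p1(inspect.getsource(p2))
    a2 = p2(inspect.getsource(p1))
    return a1, a2

print(play(mirror_bot, mirror_bot))  # ('C', 'C'): transparency enables cooperation
print(play(mirror_bot, defect_bot))  # ('D', 'D'): exploitation is blocked
```

mirror_bot's cooperation is conditional on what it observes about its opponent's decision procedure, not on goodwill - which is the decision-theoretic point.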
It is much easier to agree on a technology than on goals. Technology can serve many purposes; you just have to avoid technologies whose uses are too one-sided - which is probably the risk you mean. But as long as the parties favor cooperation overall, I think it is likely that the agents can agree on a technology.
Oh, they can. They just don't trust that the other party will uphold its part of the bargain in its part of the lightcone. That trust you can only achieve with an agent that embodies the negotiated mix of both agents' goals/values.
Note that humans and human organizations build such agents, so there is precedent. We don't call them that, but I think it is close: states often set up organizations, such as the United Nations, that get resources from and are closely monitored by the founding countries, and that serve the joint purposes of the founding states.
On a smaller scale, when two people set up a legal entity, such as a company, and write its memorandum of association, they also create a separate entity that acts to some degree independently.