What source code and what machine code is actually being executed on some particular substrate is an empirical fact about the world, so in general, an AI (or a human) might learn it the way we learn any other fact - by making inferences from observations of the world.
For example, maybe you hook up a debugger or a waveform reader to the AI's CPU to get a memory dump, reverse engineer the running code from the memory dump, and then prove some properties you care about follow inevitably from running the code you reverse engineered.
In general though, this is a pretty hard, unsolved problem - you probably run into a bunch of issues related to embedded agency pretty quickly.
To get an intuition for why these problems might be possible to solve, consider the tools that humans who want to cooperate or verify each other's decision processes might use in more restricted circumstances.
For example, maybe two governments which distrust each other conduct a hostage exchange by meeting in a neutral country, and bring lots of guns and other hostages they intend not to exchange that day. Each government publicly declares that if the hostage exchange doesn't go well, they'll shoot a bunch of the extra hostages.
Or, suppose two humans are in a True Prisoner's Dilemma with each other, in a world where fMRI technology has advanced to the point where we can read off semantically meaningful inner thoughts from a scan in real-time. Both players agree to undergo such a scan and make the scan results available to their opponent, while making their decision. This might not allow them to prove that their opponent will cooperate iff they cooperate, but it will still probably make it a lot easier for them to robustly cooperate, for valid decision theoretic reasons, rather than for reasons of honor or a spirit of friendliness.
I mean incomputable in that computation would exceed the physical resources of the universe, whether it is P or NP. We can have a weaker definition of incomputable too, say it would exceed the capabilities of the agent by at least 1 OOM.
Specifically in regards to interpretability, if it isn't possible to do easily, my intuition tells me it will require brute force simulation/exhaustive search. (Think mining bitcoin). In that case you need a system more powerful than whatever you are trying to interpret, making it kind of difficult in the context of two agents who are roughly on a similar capability level.
I think it will be difficult for goal maximizers to cooperate for the same reasons it would be difficult for humans to survive in a world with a superintelligent goal maximizer. In almost all argument leading to doom for humans and a paperclip maximizing kind of ASI, you can replace human with agent and the argument will still stand.
It is easy for intelligent humans and groups of humans to cooperate because very rarely do they have fanatical goals., and historically, the ones that did went to war.