Now suppose humanity does figure out that it's living in a simulation, and figures out the source code of P. Then it knows its own Gödel sentence.
Wait, why does that follow?
I postulated that "given the output of program P, you can easily find in it the list of theorems found so far" -- by which I meant that it's easy to write a program that takes the output of P until step t, and returns everything written on the list up to time t (was the confusion that it wasn't clear that this was what I meant?). If you also know the source of P, you have a program that for every t returns the list up to time t, so it's easy to write down the predicate L(n) of PA that says "there is some time t such that the proposition with...
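To make the "easy to write a program" step concrete, here is a minimal toy sketch in Python. Everything in it is a hypothetical stand-in: `run_P` is not the real P, the `THEOREM:` tag is an invented convention for marking list entries in P's output, and representing propositions by integer codes is my own assumption, since the sentence above is cut off. It only illustrates the shape of the construction: an extractor over P's output up to step t, plus a search for a witnessing t.

```python
from itertools import count, islice
from typing import Iterator, List, Optional

MARK = "THEOREM:"  # hypothetical tag assumed to precede each list entry in P's output


def run_P() -> Iterator[str]:
    """Toy stand-in for P: yields one line of output per step."""
    for step in count():
        if step % 3 == 0:
            # Every third step writes a new entry on the list, as an integer code.
            yield f"{MARK}{step // 3}"
        else:
            yield f"noise at step {step}"


def list_up_to(t: int) -> List[int]:
    """Codes of every proposition written on the list within the first t steps."""
    return [
        int(line[len(MARK):])
        for line in islice(run_P(), t)
        if line.startswith(MARK)
    ]


def witness_for_L(n: int, max_t: int) -> Optional[int]:
    """Return the first t <= max_t at which code n is on the list, else None."""
    for t in range(1, max_t + 1):
        if n in list_up_to(t):
            return t
    return None
```

Note that the predicate L(n) quantifies over all t, whereas `witness_for_L` only searches up to `max_t`. A `None` result therefore says nothing about whether n ever appears on the list; it only means no witness was found within the bound.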
Building on the very bad Gödel anti-AI argument (computers are formal and can't prove their own Gödel sentence, hence no AI), it occurred to me that you could make a strong case that humans could never recognise a human Gödel sentence. The argument goes like this:
Now, the more usual way of dealing with human Gödel sentences is to say that humans are inconsistent, but that the inconsistency doesn't blow up our reasoning system because we use something akin to relevance logic.
But if we do assume humans are consistent (or can become consistent), then it does seem we will never knowingly encounter our own Gödel sentences. As for where this G could hide so that we could never find it, my guess would be somewhere in the larger ordinals, up where our understanding starts to get flaky.