But constructing the hypothesis isn't evidence that it's true, and if it is true, that still leaves us with (so far) no information about our simulators, and no way to guess their motives, let alone try to trick them.
I've actually been considering the possibility of a process that would create random universes and challenges. But even if the AI discovered some things about our physics, that would not significantly narrow the range of possible minds. It doesn't know if it's dealing with paperclippers or pebblesorters. It might know roughly how smart we are.
The other half of the communication channel would be the solutions and self-modifications it provides at each iteration. These should not be emotionally compelling and would be subject to an arbitrary amount of review.
There are other advantages to this kind of sandbox: we can present it with the task of inferring our physics at various levels of its development, and archive any versions that have learned more than we are comfortable with (anything).
Keeping secrets from a hostile intelligence is something we already have formal and intuitive experience with. Controlling its universe and peering into its mind are bonuses.
Interesting cognitive bias side note: While writing this, I was inclined to write in a style that made it seem silly that an AI could mindhack us based on a few bits. I do think that it's very unlikely, but if I had written as I was thinking, it would probably have sounded dismissive.
I do think a design goal should be zero bits.
But even if the AI discovered some things about our physics, it does not significantly narrow the range of possible minds. It doesn't know if it's dealing with paperclippers or pebblesorters. It might know roughly how smart we are.
You're using your (human) mind to predict what a postulated potentially smarter-than-human intelligence could and could not do.
It might not operate on the same timescales as us. It might do things that appear like pure magic. No matter how often you took snapshots and checked how far it had gotten in figuring out details abo...
In the early 1980s Douglas Lenat wrote EURISKO, a program Eliezer called "[maybe] the most sophisticated self-improving AI ever built". The program reportedly had some high-profile successes in various domains, like becoming world champion at a certain wargame or designing good integrated circuits.
Despite requests, Lenat never released the source code. You can download an introductory paper: "Why AM and EURISKO appear to work" [PDF]. Honestly, reading it leaves a programmer still mystified about the internal workings of the AI: for example, what does the main loop look like? Researchers supposedly answered such questions in a more detailed publication, "EURISKO: A program that learns new heuristics and domain concepts." Artificial Intelligence (21): pp. 61-98. I couldn't find that paper available for download anywhere, and being in Russia I found it quite tricky to get a paper version. Maybe you Americans will have better luck with your local library? And to the best of my knowledge no one has ever succeeded in confirming Lenat's EURISKO results, or even seriously tried.
Today in 2009 this state of affairs looks laughable. A 30-year-old pivotal breakthrough in a large and important field... that never even got reproduced. What if it was a gigantic case of Clever Hans? How do you know? You're supposed to be a scientist, little one.
So my proposal to the LessWrong community: let's reimplement EURISKO!
We have some competent programmers here, don't we? We have open source tools and languages that weren't around in 1980. We can build an open source implementation available for all to play with. In my book this counts as solid progress in the AI field.
Hell, I'd do it on my own if I had the goddamn paper.
Update: RichardKennaway has put Lenat's detailed papers online; see the comments.