In the early 1980s Douglas Lenat wrote EURISKO, a program Eliezer called "[maybe] the most sophisticated self-improving AI ever built". The program reportedly had some high-profile successes in various domains, like becoming world champion at a certain wargame or designing good integrated circuits.
Despite requests Lenat never released the source code. You can download an introductory paper: "Why AM and EURISKO appear to work" [PDF]. Honestly, reading it leaves a programmer still mystified about the internal workings of the AI: for example, what does the main loop look like? Researchers supposedly answered such questions in a more detailed publication, "EURISKO: A program that learns new heuristics and domain concepts." Artificial Intelligence (21): pp. 61-98. I couldn't find that paper available for download anywhere, and being in Russia I found it quite tricky to get a paper version. Maybe you Americans will have better luck with your local library? And to the best of my knowledge no one ever succeeded in (or even seriously tried) confirming Lenat's EURISKO results.
Today in 2009 this state of affairs looks laughable. A 30-year-old pivotal breakthrough in a large and important field... that never even got reproduced. What if it was a gigantic case of Clever Hans? How do you know? You're supposed to be a scientist, little one.
So my proposal to the LessWrong community: let's reimplement EURISKO!
We have some competent programmers here, don't we? We have open source tools and languages that weren't around in 1980. We can build an open source implementation available for all to play with. In my book this counts as solid progress in the AI field.
Hell, I'd do it on my own if I had the goddamn paper.
Update: RichardKennaway has put Lenat's detailed papers up online, see the comments.
You're using your (human) mind to predict what a postulated potentially smarter-than-human intelligence could and could not do.
It might not operate on the same timescales as us. It might do things that appear like pure magic. No matter how often you took snapshots and checked how far it had gotten in figuring out details about us, there might be no way of ruling out progress, especially if you gave it motives for hiding that progress (such as pulling the plug every time it came close). Sooner or later you'd conclude that nothing interesting was happening and put it on autopilot. A small self-improvement might cascade into an enormous difference in understanding, with the notorious FOOM following.
I don't usually like quoting myself, but
If the scenario makes you nervous, you should be pretty much equally nervous at the idea of giving your maybe-self-improving AI, sitting inside thirty nested sandboxes, even 10 milliseconds (10^41 Planck intervals) of CPU time.
Let me be clear here: I'm not assigning any significant probability to someone recreating EURISKO or something like it in their spare time and having it recursively self-improve any time soon. My confidence intervals are spread widely enough that I can spend some time being worried about it, though. I'm just pointing out that sandboxing adds approximately zero extra defense in the situations where we would need it.
The parallel to the simulation argument was interesting though, thanks.
I don't think the number of Planck intervals is especially useful to cite... it seems like the relevant factor is CPU cycles, and while I'm not an expert on CPUs, I'm pretty sure that we're not bumping up against Planck intervals yet.
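For a rough sense of the gap between the two units, here is a back-of-the-envelope sketch. The 3 GHz clock rate is purely an assumption for illustration (nobody in the thread names a CPU), and the function names are made up for this example; the Planck time is the standard CODATA figure of about 5.39e-44 seconds.

```python
# Compare two ways of counting "how much happens" in a 10 ms window.
PLANCK_TIME_S = 5.391247e-44   # Planck time in seconds (CODATA value)
ASSUMED_CLOCK_HZ = 3e9         # hypothetical 3 GHz CPU, an assumption


def planck_intervals(seconds: float) -> float:
    """Number of Planck intervals that fit in the given duration."""
    return seconds / PLANCK_TIME_S


def cpu_cycles(seconds: float, clock_hz: float = ASSUMED_CLOCK_HZ) -> float:
    """Number of clock cycles a CPU at clock_hz completes in the duration."""
    return seconds * clock_hz


window_s = 10e-3  # the 10 milliseconds from the quoted comment

intervals = planck_intervals(window_s)
cycles = cpu_cycles(window_s)

print(f"Planck intervals in 10 ms:  {intervals:.2e}")
print(f"CPU cycles in 10 ms:        {cycles:.2e}")
print(f"Planck intervals per cycle: {intervals / cycles:.2e}")
```

Under that assumed clock rate, 10 ms works out to tens of millions of cycles, against roughly 10^41 Planck intervals, so the two counts differ by more than thirty orders of magnitude.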
Relatedly, if you were worried about self-improving superintelligence, you could give your AI a slow CPU.