The forces that have been operating all along, that we don't perceive or name because they aren't close enough to any of the patterns our brains evolved to perceive. http://lesswrong.com/lw/10n/why_safety_is_not_safe/
This belongs in a piece of Lovecraftian fiction.
This is a report from a LessWrong perspective, on the 30th Soar workshop. Soar is a cognitive architecture that has been in continuous development for nearly 30 years, and is in a direct line of descent from some of the earliest AI research (Simon's LT and GPS). Soar is interesting to LessWrong readers for two reasons:
Where I'm coming from: I'm a skeptic about EY/SIAI dogmas that AI research is more risky than software development, and that FAI research is not AI research, and has little to learn from the field of AI research. In particular, I want to understand why AI researchers are generally convinced that their experiments and research are fairly safe - I don't think that EY/SIAI are paying sufficient attention to these expert opinions.
Overall summary: John Laird and his group are smart, dedicated, and funded. Their theory and implementation move forward slowly but continuously. There's no (visible) work being done on self-modifying, bootstrapping or approximately-universal (e.g. AIXItl) entities. There is some concern about how to build trustworthy and predictable AIs (for the military's ROE) - for example, Scott Wallace's research.
As far as I can tell, the Soar group's work is no more (or less) risky than narrow AI research or ostensibly non-AI software development. To be blunt - package managers like APT seem more risky than Soar, because the economic forces that push them to more capability and complexity are more difficult to control.
Impressions of (most of) the talks - they can be roughly categorized into three types.
Integrating reinforcement learning with the (already complicated) Soar architecture must have been difficult, but tabular Q-learning/SARSA is now well integrated. The released code also has some support for eligibility traces and hierarchical reinforcement learning, but not for value function approximators. I believe that means that Soar-RL is not as capable at RL tasks as the cutting edge of RL research, but of course, the cutting edge of RL research is not as capable at the symbolic processing tasks that are Soar's bread and butter.
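For readers who haven't met the tabular methods named above, here is a minimal sketch of the two update rules in plain Python. It is not Soar-RL code: the function names, the dictionary-based table, and the default `alpha`/`gamma` values are all assumptions made for illustration.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best-valued next action."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


# The "table" is just a mapping from (state, action) pairs to values.
q_table = defaultdict(float)
q_learning_update(q_table, "s0", "right", 1.0, "s1", ["left", "right"])
```

The table has one entry per (state, action) pair, which is why this approach does not generalize across similar states the way a value function approximator would.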
SMem is essentially a form of content-addressable storage that is under Soar's explicit control. This is in contrast to Soar's working memory, which is content-addressable via (Rete) pattern-matching. Working-memory matching is more analogous to being involuntarily reminded of something than to deliberately building a cue and searching one's memory for a match. This means that SMem scales to larger sizes than working memory.
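To make "deliberately building a cue and searching for a match" concrete, here is a toy sketch of cue-based retrieval. It is not SMem's actual interface or implementation: the class name `SemanticStore`, its methods, and the Rogue-flavoured example data are hypothetical.

```python
from collections import defaultdict


class SemanticStore:
    """Toy content-addressable store: items are found by a partial cue."""

    def __init__(self):
        self._items = []
        # Inverted index: (attribute, value) -> ids of items that have it.
        self._index = defaultdict(set)

    def store(self, **attributes):
        """Explicitly add an item; nothing is stored unless the agent asks."""
        item_id = len(self._items)
        self._items.append(dict(attributes))
        for pair in attributes.items():
            self._index[pair].add(item_id)
        return item_id

    def retrieve(self, **cue):
        """Return the earliest stored item matching every cue pair, or None."""
        if not cue:
            return None
        id_sets = [self._index.get(pair, set()) for pair in cue.items()]
        matches = set.intersection(*id_sets)
        if not matches:
            return None
        return self._items[min(matches)]


store = SemanticStore()
store.store(kind="monster", letter="K", name="kobold")
store.store(kind="item", letter="!", name="potion")
found = store.retrieve(letter="!")  # the agent chooses when to search
```

The store does no work until `retrieve` is called, whereas a pattern-matcher over working memory has to react to every change.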
EpMem is a memory of the content of past working memories. Unlike with SMem, Soar needn't explicitly store into this memory: if this feature is turned on, every working memory will be stored into EpMem. Fetching from EpMem is content-addressable, similarly to SMem, though once an episode has been fetched, Soar can ask what happened next or before that.
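The same kind of toy sketch works for episodic memory: snapshots are recorded automatically, retrieved by cue, and then walked forwards or backwards in time. Again, this is not EpMem's real interface. `EpisodicStore` and its method names are hypothetical, and returning the most recent match is an arbitrary choice made for this sketch.

```python
import copy


class EpisodicStore:
    """Toy episodic memory: an ordered log of working-memory snapshots."""

    def __init__(self):
        self._episodes = []

    def record(self, working_memory):
        """Called every cycle by the architecture, not by the agent's rules."""
        self._episodes.append(copy.deepcopy(working_memory))
        return len(self._episodes) - 1

    def retrieve(self, cue):
        """Return the index of the most recent episode matching the cue."""
        for index in range(len(self._episodes) - 1, -1, -1):
            episode = self._episodes[index]
            if all(k in episode and episode[k] == v for k, v in cue.items()):
                return index
        return None

    def get(self, index):
        return self._episodes[index]

    def next_index(self, index):
        """What happened after this episode? None if it is the latest."""
        return index + 1 if index + 1 < len(self._episodes) else None

    def previous_index(self, index):
        """What happened before this episode? None if it is the first."""
        return index - 1 if index > 0 else None


memory = EpisodicStore()
memory.record({"room": 1, "sees": "potion"})
memory.record({"room": 2, "sees": "kobold"})
memory.record({"room": 3, "sees": "stairs"})
hit = memory.retrieve({"sees": "kobold"})
after = memory.get(memory.next_index(hit))  # the episode that followed
```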
In his second talk, he spoke about an architectural variant of Soar-RL, that learns a model and a policy simultaneously, and eliminates a free parameter. His domain here was a probabilistic version of vector racer. Again, Soar-RL is steadily advancing.
I studied Rogue partly because Rogue has escape-to-shell functionality (so a sufficiently clever Rogue AI could easily escape and become a "rogue AI"), and I wanted to understand my own implicit safety case while developing it.
I was a bit disappointed, since (as far as I can tell) this work simply used Soar as an exotic programming language. I believe this is one of the primary ways that Soar could help expand human rationality: if the procedures that a human is supposed to learn are encoded as Soar productions, and the training software can test whether any student has any given production, and instill it if it is not there, then (assuming Soar is a decent model of how humans think), instilling all of the necessary productions should also instill the complete procedure.
John Laird spoke about using Sproom to study "Situated Interactive Instruction" - so that an agent could be taught by interacting in semi-formal language with a human, while it is performing its task. The domain is robots moving through a building doing IED-clearing; the IEDs and the operations on them (pickup, defuse) are virtual. As I understand it, some but not all of this functionality is currently working.
Shiwali Mohan used the same sort of situated interactive instruction in the Infinite Mario domain, though not very much data was conveyed (yet) via instruction. As I understand it, Mohan's previous hardwired agent had three verbs like "tackle-monster" and "get-coin"; the instruction consists of the agent asking "I see a coin, which verb should I use for it?" - so after the human has answered the three object-verb correspondences, it knows everything it will ever learn via instruction. However, it's working and could be extended.
I want to emphasize that these are just my impressions (which are probably flawed - because of my inexperience I probably misunderstood important points), and the proceedings (that is, the slides that everyone used to talk with) will soon be available, so you can read them and form your own impressions.
There are three forks to my implicit safety case while developing. I'm not claiming this is a particularly good safety case or that developing Rogue-Soar was safe - just that it's what I have.
The first fork is that tasks vary in their difficulty (Pickering's "resistances"), and entities vary in their strength or capability. There's some domain-ish structure to entities' strengths (a mechanical engineering task will be easier for someone trained as a mechanical engineer than for a chemist), and intention matters - difficult tasks are rarely accomplished unintentionally. I'm fairly weak, and my agent-in-progress was and is very, very weak. The chance that I or my agent solves a difficult task (self-improving AGI) unintentionally while writing Rogue-Soar is incredibly small, and comparable to the risk of my unintentionally solving self-improving AGI while working at my day job. This suggests that safer (not safe) AI development might involve: One, tracking Elo-like scores of people's strengths and task difficulties, and Two, tracking and incentivizing people's intentions.
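As a rough illustration of what "Elo-like scores" for strengths and difficulties could look like, here is a sketch that treats each attempt at a task as a match between the person (or agent) and the task. The logistic formula and the 400-point scale are the standard Elo ones; applying them to tasks, the function names, and the choice of `k=32` are assumptions of this sketch.

```python
def expected_success(strength, difficulty):
    """Standard Elo expectation, with the task playing the opponent."""
    return 1.0 / (1.0 + 10 ** ((difficulty - strength) / 400.0))


def update_ratings(strength, difficulty, solved, k=32.0):
    """Shift both ratings by how surprising the outcome was."""
    outcome = 1.0 if solved else 0.0
    delta = k * (outcome - expected_success(strength, difficulty))
    # A solved task raises the solver's rating and lowers the task's.
    return strength + delta, difficulty - delta


# A rating gap of 2000 points makes an accidental success very unlikely.
chance = expected_success(1000.0, 3000.0)
new_strength, new_difficulty = update_ratings(1500.0, 1500.0, solved=True)
```

A scheme like this only tracks capability. It says nothing about intention, which is the second half of the suggestion above.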
The second fork is that even though I'm surprised sometimes while developing, the surprises are still confined to an envelope of possible behavior. The agent could crash, run forever, move in a straight line or take only one step when I was expecting it to wander randomly, but pressing "!" when I expected it to be confined to "hjkl" would be beyond this envelope. Of course, there are many nested envelopes, and excursions beyond the bounds of the narrowest are moderately frequent. This suggests that safer AI development might involve tracking these behavior envelopes (altogether they might form a behavior gradient), and the frequency and degree of excursions, and deciding whether development is generally under control - that is, acceptably risky compared to the alternatives.
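A minimal sketch of tracking nested envelopes and excursions, using the keystroke example above. The envelope names and key sets are illustrative assumptions, not a complete Rogue command list; the only detail taken from the text is that "hjkl" is the expected set and "!" lies outside every envelope.

```python
from collections import Counter

# Nested envelopes, narrowest first; each one contains the one before it.
ENVELOPES = [
    ("expected-moves", set("hjkl")),
    ("any-movement", set("hjklyubn")),
    ("any-game-command", set("hjklyubn") | set("sqrewi")),
]


def excursion_depth(action):
    """Index of the narrowest envelope containing the action.

    0 means fully expected; len(ENVELOPES) means outside every envelope.
    """
    for depth, (_name, allowed) in enumerate(ENVELOPES):
        if action in allowed:
            return depth
    return len(ENVELOPES)


def excursion_counts(actions):
    """Tally how often the agent left each envelope during a run."""
    return Counter(excursion_depth(action) for action in actions)


counts = excursion_counts("hjkly!")
beyond_all = counts[len(ENVELOPES)]  # excursions past the outermost envelope
```

Frequent depth-1 excursions are the ordinary surprises of development; any count at the outermost depth is the kind of event that should stop work until it is understood.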
The third fork is that the runaway takeoff arguments necessarily involve circularities and feedback. By structural inspection and by intention, if the AI is dealing with Rogue, and not learning, programming, or bootstrapping, then it's unlikely to undergo takeoff. This suggests that carefully documenting and watching for circularities and feedback may be helpful for safer AI research.
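One way to "document and watch for circularities" is to write down what influences what as a directed graph and check whether anything that defines the agent can reach itself. The sketch below does that with a plain depth-first search; the node names and both example graphs are hypothetical, drawn loosely from the Rogue setup described in this post.

```python
def feeds_back(graph, start):
    """True if `start` can reach itself by following at least one edge."""
    stack = list(graph.get(start, ()))
    seen = set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


# A Rogue-playing agent whose productions are fixed by the programmer.
no_learning = {
    "productions": ["agent"],
    "agent": ["keystrokes"],
    "keystrokes": ["rogue"],
    "rogue": ["screen"],
    "screen": ["agent"],
}

# The same agent, but now it can rewrite its own productions.
with_learning = dict(no_learning, agent=["keystrokes", "productions"])

fixed_is_circular = feeds_back(no_learning, "productions")
learner_is_circular = feeds_back(with_learning, "productions")
```

Note that the ordinary perceive-act loop (`agent` to `screen` and back) is itself a cycle, so the useful question is which nodes sit on a cycle, not whether any cycle exists.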