When you chain parallel and sequential calls to large language models (e.g. with LangChain), you implicitly create a causal graph that can be analyzed visually if you have the right tracing tools (https://github.com/oughtinc/ice). This notebook describes different agents using an explicit formalism based on causal influence diagrams, which we treat as a notation for the data flow, components, and steps involved when a user makes a request. We use the example diagrams to explain and fix risk scenarios, showing how much easier it is to debug agent architectures when you can reason visually about the data flow, and we ask questions about intent alignment for AGI in the context of such agents.
Examples and Theory in Colab to Get Started:
Work done at the Alignment Jam #8 (Verification); the presentation starts at 31:43, but the whole event was great: https://youtu.be/XauqlTQm-o4
TODO:
Answer Set Programming for Automated Verification of Intent Consistency
Brian Muhia, August 2023
The causal influence diagrams introduced here (see the appendix, also here), and the accompanying reasoning that favours certain diagrams over others based on links to the "I" node, are simple enough that we can devise automated rules to check whether a diagram is correct or incorrect. We call this property "intent consistency". Here we introduce three simple rules, written in the Answer-Set Programming (ASP) formalism, that capture this property.
These rules encode our expectations and intuitions, and let us describe a framework for automatically deciding if a diagram satisfies them. We call these rules "intent consistency models" (ICM), after [https://doi.org/10.1017/S1471068410000554].
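As a rough illustration of the shape such a rule can take (this is a sketch, not necessarily the three rules used here), a constraint requiring that the intent node influence every decision node might be written as follows; the predicates decision/1, edge/2 and reachable/1 are assumptions made for this example:

    % Sketch of one possible intent-consistency rule in clingo syntax.
    % The intent node "I" is written as the constant i, because uppercase
    % names are variables in ASP.
    reachable(i).
    reachable(Y) :- reachable(X), edge(X, Y).

    % Reject any diagram in which some decision node is not influenced,
    % directly or indirectly, by the intent node.
    :- decision(D), not reachable(D).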
ASP enables us to encode the graphs described here as facts and solve them with an answer-set solver such as clingo, which checks for satisfiability. We can then describe unsatisfiable graphs as "incorrect".
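For instance, a single agent diagram could be encoded roughly like this; the node names are illustrative, not the actual diagrams from the appendix:

    % Illustrative encoding of one agent diagram as facts.
    node(u; llm; tool; i).          % u = user request, i = intent node
    decision(llm; tool).
    edge(u, llm).  edge(llm, tool). % data-flow links
    % There is no edge from i to any decision node, so together with the
    % constraint sketched above clingo reports UNSATISFIABLE and we flag
    % the diagram as incorrect.

Passing the rules and the facts to clingo together (for example clingo icm.lp diagram.lp, with hypothetical file names) prints SATISFIABLE when the diagram passes all the checks and UNSATISFIABLE when it does not.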