We're excited to share the first volume of Elements of Computational Philosophy, an interdisciplinary and collaborative project series focused on operationalizing fundamental philosophical notions in ways that are natively compatible with the current paradigm in AI.
The first volume paints a broad-strokes picture of operationalizing truth and truth-seeking. Beyond this high-level focus, its 100+ pages can be framed in several different ways, which is why we placed multiple topic-based summaries at the beginning of the document. The note to the reader and the table of contents should further help scope and navigate the document.
Have a pleasant read, and feel free to use this linkpost to comment on the document as you go. Questions, criticism, and suggestions are all welcome.
PS: There will soon be a presentation about the overarching project series as part of the alignment speaker series hosted by EleutherAI. Expect more information soon on the #announcements channel of their Discord server. In general, keep an eye on this space.
Thanks a lot for the feedback!
For sure, I greatly underestimated the importance of legible and concise communication in the increasingly crowded and dynamic space that is alignment. Future outputs will at the very least include an accompanying paper-overview-in-a-post, and in general a stronger focus on self-contained papers. I see the booklet as a preliminary, highly exploratory bit of work that focused more on the conceptual and theoretical than on the applied, a goal for which I think it was very suitable (e.g. introducing an epistemological theory with direct applications to alignment).
You mean ArgRank (i.e. PageRank on the argument graph)? The idea was to simply use ArgRank to assign rewards to individual utterances, then use the resulting context-utterance-reward triples as experiences for RL. After collecting experiences, update the weights, and repeat. Now, though, I'd rather do PEFT on the top utterances as a kind of expert iteration, which would also make it feasible to store previous model versions for league training (e.g. by just storing LoRA weight diffs).
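To make the reward step concrete, here is a minimal sketch of PageRank over an argument graph, followed by pairing the scores with utterances. It is only an illustration under assumptions, not the implementation from the booklet: the graph encoding (a non-negative weight matrix where `weights[i][j]` is how strongly utterance `i` supports utterance `j`), the damping factor, and the names `pagerank` and `make_experiences` are all hypothetical.

```python
from typing import List, Tuple


def pagerank(weights: List[List[float]], damping: float = 0.85,
             iters: int = 100) -> List[float]:
    """Power-iteration PageRank on a weighted directed graph.

    Assumption: weights[i][j] >= 0 is the strength with which
    utterance i supports utterance j.
    """
    n = len(weights)
    if n == 0:
        return []
    ranks = [1.0 / n] * n
    for _ in range(iters):
        # Teleportation term shared by every node.
        new = [(1.0 - damping) / n] * n
        for src in range(n):
            out = sum(weights[src])
            for dst in range(n):
                if out == 0:
                    # Dangling node: spread its rank uniformly.
                    share = 1.0 / n
                else:
                    share = weights[src][dst] / out
                new[dst] += damping * ranks[src] * share
        ranks = new
    return ranks


def make_experiences(contexts: List[str], utterances: List[str],
                     weights: List[List[float]]
                     ) -> List[Tuple[str, str, float]]:
    """Pair each utterance with its context and its rank as the reward."""
    rewards = pagerank(weights)
    return list(zip(contexts, utterances, rewards))


# Toy graph: utterances 0 and 1 both support utterance 2.
toy_weights = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
toy_ranks = pagerank(toy_weights)
toy_experiences = make_experiences(
    ["ctx a", "ctx b", "ctx c"], ["utt a", "utt b", "utt c"], toy_weights
)
```

In the toy graph, the utterance supported by the other two ends up with the highest score. For the expert-iteration variant, one would presumably sort the triples by reward and keep only the top few as fine-tuning targets.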
Indeed, preliminary results are poor, and the bar was set pretty low at "somehow make these ideas run in this setup." For now, I'd drop ArgRank and instead use traditional methods from computational argumentation on an automatically encoded argument graph (see 5.2), then PEFT on the winning parties. But I'm also interested in extending CCS-like tools to improve ArgRank (see 2.5). I'm applying to AISC9 for related follow-up work (among others), and I'd find it really valuable if you could send me some feedback on the proposal summary. Could I send you a DM with it?
Do you find it so because of obfuscated arguments and deception, or because of some other fundamental issue?