Abhimanyu Pallavi Sudhir

CS PhD student

Articles (or writing in general) are probably best structured as Directed Acyclic Graphs, rather than linearly. At each point in the article, there may be multiple possible lines to pursue, or "sidenotes".

I say "directed acyclic graph" rather than "tree", because it may be natural as thinking of paths as joining back at some point, especially if certain threads are optional.

One may also construct an "And-Or tree" to allow multiple versions of the article preferred by conflicting writers, which may then be voted on with some mechanism. These votes can be used to assign values to each vertex, and people can read the tree with their own search algorithm*.

A whole wiki may be constructed as one giant DAG, with each article being a sub-component.

*well, realistically nobody would actually just be following a search algorithm blindly/reading a linear article linearly (since straitjacketing yourself with prerequisites is never a good idea), but you know, as a general guide to structure.

(idea came from LLM conversations, which often take this form -- of pursuing various lines of questioning then backtracking to a previous message)
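
For concreteness, a minimal sketch of such a structure (the Section class, its fields, and the greedy reading rule below are my own illustrative choices, not anything from the original note):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Section:
    text: str
    kind: str = "and"       # "and": follow the children; "or": competing versions, pick one
    value: float = 0.0      # e.g. aggregated from votes on this vertex
    children: List[str] = field(default_factory=list)  # ids of later sections (DAG edges)

def read(doc: Dict[str, Section], node_id: str, seen=None) -> List[str]:
    """One possible reader 'search algorithm': depth-first, taking the
    highest-valued version at or-nodes, never re-reading a joined thread."""
    seen = set() if seen is None else seen
    if node_id in seen:                  # paths can join back; read each vertex once
        return []
    seen.add(node_id)
    node = doc[node_id]
    out = [node.text]
    if not node.children:
        return out
    if node.kind == "or":                # conflicting versions: take the best-voted one
        out += read(doc, max(node.children, key=lambda c: doc[c].value), seen)
    else:                                # sidenotes/threads: visit all of them
        for child in node.children:
            out += read(doc, child, seen)
    return out
```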

"What do you gain from smalltalk?" "I learned not to threaten to nuke countries."

Lmao, amazing.

we'll elide all of the subtle difficulties involved in actually getting RL to work in practice

I haven't properly internalized the rest of the post, but this confuses me because I thought this post was about the subtle difficulties.

The RL setup itself is straightforward, right? An MDP where S is the space of strings, A is the set of strings of fewer than n tokens, the transition deterministically appends: s' = append(s, a), and reward is given at states ending in a stop token by some ground-truth verifier like unit tests or formal verification.
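
For concreteness, a toy sketch of that MDP (the verifier, the token limit n, and the STOP marker below are placeholder choices of mine):

```python
from dataclasses import dataclass
from typing import Callable, Tuple

STOP = "<stop>"

@dataclass
class StringMDP:
    """S = strings, A = strings of fewer than n tokens, deterministic append transition."""
    verifier: Callable[[str], float]   # ground-truth checker: unit tests, proof checker, ...
    n: int = 16                        # actions are strings of fewer than n tokens

    def step(self, s: str, a: str) -> Tuple[str, float, bool]:
        assert len(a.split()) < self.n
        s_next = s + a                 # transition: deterministically append a to s
        done = STOP in a
        reward = self.verifier(s_next) if done else 0.0  # reward only at stop states
        return s_next, reward, done

# toy usage: the "verifier" gives reward 1.0 iff the finished string contains "42"
mdp = StringMDP(verifier=lambda s: float("42" in s))
state, reward, done = mdp.step("", "the answer is 42 " + STOP)
```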

The third virtue of rationality, lightness, is wrong. In fact: the more you value information on some question, the more obstinate you should be about changing your mind on that question. Lightness implies disinterest in the question.

Imagine your mind as a logarithmic market-maker which assigns some initial subsidy b to any new question Q. This subsidy parameter b captures your marginal value for information on Q. But it also measures how hard it is to change your mind: the cost of moving your probability on Q from p to q is b · D_KL(q ‖ p).

What would this imply in practice? It means that each individual “trader” (both internal mental heuristics/thought patterns, and external sources of information/other people) will generally have a smaller influence on your beliefs, as they may not have enough wealth. Traders who do influence your beliefs carry greater risk (to their influence on you in the future), though they will also earn more reward if they’re right.
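
A toy version of this picture, assuming the standard binary LMSR with liquidity parameter b (the LMSRQuestion class and its methods are made up for illustration):

```python
import math

class LMSRQuestion:
    """Your mind's market on one binary question, subsidized with liquidity b.

    Larger b means information on this question is worth more to you, and also
    that it takes more spending by any one 'trader' to move your probability.
    """
    def __init__(self, b: float, prior: float = 0.5):
        self.b = b
        self.q = b * math.log(prior / (1 - prior))   # net "yes" shares; p = sigmoid(q / b)

    def prob(self) -> float:
        return 1 / (1 + math.exp(-self.q / self.b))

    def cost_to_move(self, p_new: float) -> float:
        """Up-front cost for a trader to push the probability from the current p to p_new."""
        p = self.prob()
        if p_new > p:
            return self.b * math.log((1 - p) / (1 - p_new))   # buy "yes" shares
        return self.b * math.log(p / p_new)                    # buy "no" shares

# The same move costs ten times as much on a heavily subsidized question:
print(LMSRQuestion(b=1.0).cost_to_move(0.9))    # ~1.61
print(LMSRQuestion(b=10.0).cost_to_move(0.9))   # ~16.1
```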

I don't understand. The hard problem of alignment/CEV/etc. is that it's not obvious how to scale intelligence while "maintaining" the utility function/preferences, and this still applies to human intelligence amplification.

I suppose this is fine if the only improvement you can expect beyond human-level intelligence is "processing speed", but I would expect superhuman AI to be more intelligent in a variety of ways.

Something that seems like it should be well-known, but I have not seen an explicit reference for:

Goodhart’s law can, in principle, be overcome via adversarial training (or, more generally, learning in Multi-Agent Systems)

—aka “The enemy is smart.”

Goodhart’s law only really applies to a “static” objective, not when the objective is the outcome of a game with other agents who can adapt.

This doesn’t really require the other agents to act in a way that continuously “improves” the training objective either; it just requires them to be able to constantly throw adversarial examples at the agent, forcing it to “generalize”.

In particular, I think this is the basic reason why any reasonable Scalable Oversight protocol would be fundamentally “multi-agent” in nature (like Debate).
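
A toy illustration of the contrast (everything here, the "cases" and the learning rule, is a made-up placeholder): a learner measured on a fixed set of cases can Goodhart by doing well on just those, while a learner whose evaluation cases are chosen adversarially is forced to do well everywhere.

```python
import random
random.seed(0)

CASES = list(range(100))                      # the "true" distribution of situations

def fresh_learner():
    return {c: 0.0 for c in CASES}            # competence on each case

def train_step(learner, case):
    learner[case] = min(1.0, learner[case] + 0.1)

def true_score(learner):
    return sum(learner.values()) / len(CASES)

# 1) Static objective: only ever measured (and hence trained) on 5 fixed cases.
learner = fresh_learner()
for _ in range(1000):
    train_step(learner, random.choice(CASES[:5]))
print("static proxy, true score:", true_score(learner))       # stays low: classic Goodharting

# 2) Adaptive adversary: always serves up the learner's current worst case.
learner = fresh_learner()
for _ in range(1000):
    train_step(learner, min(CASES, key=lambda c: learner[c]))
print("adversarial proxy, true score:", true_score(learner))  # close to 1.0
```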

I think only particular reward functions, such as in multi-agent/co-operative environments (agents can include humans, like in RLHF) or in actually interactive proving environments?

Yes, I also realized that "ideas" being a thing is due to bounded rationality -- specifically they are the outputs of AI search. "Proofs" are weirder though, and I haven't seen them distinguished very often. I wonder if this is a reasonable analogy to make:

  • Ideas: search
  • Answers: inference
  • Proofs: alignment

There is a cliché that there are two types of mathematicians: "theory developers" and "problem solvers". Similarly, there are Dyson’s “birds and frogs”, and Robin Hanson divides the production of knowledge into "framing" and "filling".

It seems to me there are actually three sorts of information in the world:

  • "Ideas": math/science theories and models, inventions, business ideas, solutions to open-ended problems
  • "Answers": math theorems, experimental observations, results of computations
  • "Proofs": math proofs, arguments, evidence, digital signatures, certifications, reputations, signalling

From a strictly Bayesian perspective, there seems to be no "fundamental" difference between these forms of information. They're all just things you condition your prior on. Yet this division seems to be natural in quite a variety of informational tasks. What gives?


adding this from replies for prominence--

Yes, I also realized that "ideas" being a thing is due to bounded rationality -- specifically they are the outputs of AI search. "Proofs" are weirder though, and I haven't seen them distinguished very often. I wonder if this is a reasonable analogy to make:

  • Ideas: search
  • Answers: inference
  • Proofs: alignment

Just realized that in logarithmic market scoring the net number of shares is basically just the log-odds, lol:
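
Concretely, in the standard binary LMSR with liquidity parameter b, the price is p = e^{q_yes/b} / (e^{q_yes/b} + e^{q_no/b}), so q_yes − q_no = b · log(p / (1 − p)): the net number of shares outstanding is just b times the log-odds.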
