We often hear "We don't trade with ants" as an argument against AI cooperating with humans. But we don't trade with ants because we can't communicate with them, not because they're useless – ants could do many useful things for us if we could coordinate. AI will likely be able to communicate with us, and Katja questions whether this analogy holds.
has anyone seen a good way to comprehensively map the possibility space for AI safety research?
in particular: a map from predictive conditions (eg OpenAI develops superintelligence first, no armistice is reached with China, etc) to strategies for ensuring human welfare in those conditions.
most good safety papers I read map one set of conditions to one or a few strategies. the map would juxtapose all these conditions so that we can evaluate/bet on their likelihoods and come up with strategies based on a full view of SOTA safety research.
for format, i'm imagining either a visual concept map or at least some kind of hierarchical collaborative outlining tool (eg Roam Research)
More Dakka On Your Expectations
After telling me about his roommate’s brash decision-making, driven by despair at being rejected several times by girls he liked, my friend mentioned that his roommate had asked out a total of three people since high school. Only three!
While there are more factors involved in the story, I’ve heard similar enough troubles that it seems worth saying: Three people is not a lot. Certainly not enough rejections to merit the magnitude of self-worth issues people can walk away with after so few.
If you had the expec...
A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which include things like "gain as much power as possible".
If it wasn't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.
For pure consequentialists—agents that have an outcome they want to bring about, and do whatever they think will cause it—some version of instrumental convergence seems surely true[1].
But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do...
President Trump's second term has brought sweeping policy overhauls in international aid, trade agreements, and immigration enforcement, alongside growing tensions with the judiciary. These rapid changes have increased uncertainty about the US's future and its role on the world stage. Forecasting can help ground our thinking about the likely impacts of the Trump administration, helping to deliver greater clarity to the public on key issues by transforming competing narratives into quantifiable, testable predictions.
Make your predictions in Metaculus's POTUS Predictions Tournament and compete for $15,000 on questions like:
...First post of @Helen Toner (of OpenAI board crisis fame)'s new Substack
...It used to be a bold claim, requiring strong evidence, to argue that we might see anything like human-level AI any time in the first half of the 21st century. This 2016 post, for instance, spends 8,500 words justifying the claim that there is a greater than 10% chance of advanced AI being developed by 2036.
(Arguments about timelines typically refer to “timelines to AGI,” but throughout this post I’ll mostly refer to “advanced AI” or “human-level AI” rather than “AGI.” In my view, “AGI” as a term of art tends to confuse more than it clarifies, since different experts use it in such different ways.[1] So the fact that “human-level AI” sounds vaguer than “AGI” is
“In the loveliest town of all, where the houses were white and high and the elm trees were green and higher than the houses, where the front yards were wide and pleasant and the back yards were bushy and worth finding out about, where the streets sloped down to the stream and the stream flowed quietly under the bridge, where the lawns ended in orchards and the orchards ended in fields and the fields ended in pastures and the pastures climbed the hill and disappeared over the top toward the wonderful wide sky, in this loveliest of all towns Stuart stopped to get a drink of sarsaparilla.”
— 107-word sentence from Stuart Little (1945)
Sentence lengths have declined. The average sentence length was 49 for Chaucer (died 1400), 50...
Epistemic status: This should be considered an interim research note. Feedback is appreciated.
We increasingly expect language models to be ‘omni-modal’, i.e. capable of flexibly switching between images, text, and other modalities in their inputs and outputs. In order to get a holistic picture of LLM behaviour, black-box LLM psychology should take into account these other modalities as well.
In this project, we do some initial exploration of image generation as a modality for frontier model evaluations, using GPT-4o’s image generation API. GPT-4o is one of the first LLMs to produce images natively rather than creating a text prompt which is sent to a separate image model: it outputs images as autoregressive token sequences (i.e. in the same way as text).
We find that GPT-4o tends to respond in a consistent manner...
- Should we think about it almost as though it were a base model within the RLHFed model, where there's no optimization pressure toward censored output or a persona?
- Or maybe a good model here is non-optimized chain-of-thought (as described in the R1 paper, for example): CoT in reasoning models does seem to adopt many of the same patterns and persona as the model's final output, at least to some extent.
- Or does there end up being significant implicit optimization pressure on image output just because the large majority of the circuitry is the same?
I think it's...
One downside of an English chain-of-thought is that each token contains only a small number of bits of information, creating a tight information bottleneck.
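To make the bottleneck concrete, here is a rough back-of-the-envelope sketch. The numbers are illustrative assumptions, not figures from any particular model: a vocabulary of 100,000 tokens, a hidden state of 4,096 values, and 16 bits per value. A sampled token can carry at most log2(vocabulary size) bits, while the hidden state's raw storage is its dimension times the bits per value. The latter is a loose upper bound on capacity, not a measure of how much information a model actually uses.

```python
import math


def bits_per_token(vocab_size: int) -> float:
    """Upper bound on the information one sampled token can carry."""
    return math.log2(vocab_size)


def bits_per_hidden_state(dim: int, bits_per_value: int) -> int:
    """Raw storage of one hidden-state vector (a loose upper bound)."""
    return dim * bits_per_value


# All three values below are assumptions chosen for illustration only.
VOCAB_SIZE = 100_000
HIDDEN_DIM = 4096
BITS_PER_VALUE = 16

token_bits = bits_per_token(VOCAB_SIZE)
hidden_bits = bits_per_hidden_state(HIDDEN_DIM, BITS_PER_VALUE)

print(f"bits per token (upper bound): {token_bits:.1f}")
print(f"bits per hidden state (raw storage): {hidden_bits}")
print(f"ratio: roughly {hidden_bits / token_bits:.0f}x")
```

Under these assumptions the gap is three orders of magnitude or so, which is the sense in which writing thoughts down as text is a tight bottleneck.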
Don't take my word for it, look at this section from a story by Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, romeo:
...[...] One such breakthrough is augmenting the AI’s text-based scratchpad (chain of thought) with a higher-bandwidth thought process (neuralese recurrence and memory). [...]
Neuralese recurrence and memory
Neuralese recurrence and memory allows AI models to reason for a longer time without having to write down those thoughts as text.
Imagine being a human with short-term memory loss, such that you need to constantly write down your thoughts on paper so that in a few minutes you know what’s going on. Slowly and painfully you could make progress at solving math