John Wentworth explains natural latents – a key mathematical concept in his approach to natural abstraction. Natural latents capture the "shared information" between different parts of a system in a provably optimal way. This post lays out the formal definitions and key theorems.
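For orientation, here is a rough sketch of the two defining conditions as I understand them (my paraphrase, not the post's exact statement): a latent $\Lambda$ over parts $X_1, \dots, X_n$ of a system is natural when it (approximately) mediates and is (approximately) redundant:

$$X_i \perp X_j \mid \Lambda \;\; \text{for } i \neq j \qquad \text{(mediation: } \Lambda \text{ screens the parts off from each other)}$$
$$\Lambda \perp X_i \mid X_{\bar{i}} \;\; \text{for each } i \qquad \text{(redundancy: dropping any one part loses no information about } \Lambda\text{)}$$

The theorems then say, roughly, that any two latents satisfying these conditions over the same parts carry approximately the same information, which is what makes the notion a canonical choice of abstraction.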
I'm not sure what the exact process was; tbh, my guess is that they were estimated mostly independently, but likely sanity-checked against the survey to some extent. They seem to line up about right, given the 2022 vs. 2023 difference, the intuition that labor->progress gets underadjusted for, and the fact that we give weight to our own views as well rather than just the survey, since we've thought more about this than survey takers (while of course they have the advantage of currently doing frontier AI research).
I'd make less of an adjustment if we ask...
We are having another rationalist Shabbat event at Rainbow Star House this Friday. The plan going forward will be to do one most Fridays. Email or DM me for the address if you haven’t been before.
We have pita, dips, and dessert planned, but are still looking for someone to bring a big pot of food this week; if you’re able to bring a vegan chili or similar, please let me know.
Doors open at 5:30, ritual and food at 6:30.
What is this event?
At rationalist Shabbat each week, we light candles, sing Landsailor, eat together, and discuss topics of interest and relevance to the rationalist crowd. If you have suggestions for topics, would like to help contribute food, or otherwise assist with organizing, let us know.
This is a kid-friendly event; we have young kids, so there's space and toys for them to play with and hang out while the adults are chatting.
Both labor and compute have been scaled up over the last several years at big AI companies. My understanding is that the scaling of compute was more important for algorithmic progress.
That may be the case, but I suppose that in the last several years, compute has been scaled up more than labor. (Labor cost is entirely recurring, while compute cost is a one-time cost plus a recurring electricity cost, and progress in compute hardware, from smaller integrated circuits, means that compute cost is decreasing over time.) Then obviously that doesn't necessari...
Has anyone here had therapy to help handle thoughts of AI doom? How did it go? What challenges did you face explaining it or being taken seriously, and what kind of therapy worked, if any?
I went to a therapist for 2 sessions and received nothing but blank looks when I tried to explain what I was trying to process. I think it was very unfamiliar ground for them and they didn't know what to do with me. I'd like to try again, but if anyone here has guidance on what worked for them, I'd be interested.
I've also started basic meditation, which continues to be a little helpful.
John: So there’s this thing about interp, where most of it seems to not be handling one of the standard fundamental difficulties of representation, and we want to articulate that in a way which will make sense to interp researchers (as opposed to philosophers). I guess to start… Steve, wanna give a standard canonical example of the misrepresentation problem?
Steve: Ok so I guess the “standard” story as I interpret it goes something like this:
It is important to remember that although languages and brains evolved alongside each other, they are separate systems.
The human brain has evolved to be a fast learner of whatever common language it is exposed to, and languages themselves have evolved to be as accessible to new speakers as possible.
One might say that a language summarizes the shared experience of its speakers. Let me play with an analogy to arithmetic:
There is an infinite set of numbers with several kinds of properties (negative or positive, rational or irrational) and several kinds of relations between them (summation, product, etc.). To describe them all, you don’t need an infinitely large database: you can describe the entire structure with a handful of axioms and theorems. Furthermore, you can build on it, inventing new concepts...
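For a concrete illustration of "finite description, infinite structure" (a completely standard sketch, nothing specific to this post), take a Peano-style fragment:

$$0 \in \mathbb{N}; \qquad n \in \mathbb{N} \Rightarrow S(n) \in \mathbb{N}; \qquad n + 0 = n; \qquad n + S(m) = S(n + m).$$

Four short rules pick out infinitely many numbers and determine the sum of any two of them; everything further (subtraction, primes, ...) is defined on top of this base.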
For months, I had the feeling: something is wrong. Some core part of myself had gone missing.
I had words and ideas cached, which pointed back to the missing part.
There was the story of Benjamin Jesty, a dairy farmer who vaccinated his family against smallpox in 1774, 20 years before the vaccination technique was popularized, and the same year King Louis XV of France died of the disease.
There was another old post which declared: “I don’t care that much about giant yachts. I want a cure for aging. I want weekend trips to the moon. I want flying cars and an indestructible body and tiny genetically-engineered dragons.”
There was a cached instinct to look at certain kinds of social incentive gradients, toward managing more people or growing an organization or playing...
This post presents a mildly edited form of a new paper by UK AISI's alignment team (the abstract, introduction and related work section are replaced with an executive summary). Read the full paper here.
AI safety via debate is a promising method for solving part of the alignment problem for ASI (artificial superintelligence).
TL;DR: Debate + exploration guarantees + a solution to obfuscated arguments + good human input solves outer alignment. Outer alignment + online training solves inner alignment to a sufficient extent in low-stakes contexts.
This post sets out:
These gaps form the basis for one of the research agendas of UK AISI’s new alignment team: we aim to dramatically scale up ASI-relevant research on debate. We’ll...
I broadly agree with these concerns. I think we can split it into (1) the general issue of AGI/ASI driving humans out of distribution and (2) the specific issue of how assumptions about human data quality as used in debate will break down. For (2), we'll have a short doc soon (next week or so) which is somewhat related, along the lines of "assume humans are right most of the time on a natural distribution, and search for protocols which report uncertainty if the distribution induced by a debate protocol on some new class of questions is sufficiently differ...
Each brick = one uncertainty
Maybe instead of writing task lists, reframe macro objectives in terms of nested questions until you reach 'root' testable experiments.
I built a tool for myself, ‘Thought-Tree’ here, to try and systematise what I wrote in this post. Maybe it works out for you as well?
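To make the nesting idea concrete, here's a toy sketch (plain Python, not the actual Thought-Tree tool, and the example objective and questions are made up): a macro objective becomes a tree of questions whose leaves are the 'root' testable experiments.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Question:
    """A node in a nested-question tree."""
    text: str
    children: List["Question"] = field(default_factory=list)

    def experiments(self) -> List["Question"]:
        # Leaves are the 'root' testable experiments; collect them depth-first.
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.experiments()]

# Hypothetical example: one macro objective broken into nested questions.
objective = Question("Can we speed up our data pipeline 2x?", [
    Question("Is I/O or compute the bottleneck?", [
        Question("Profile one full run and compare time spent in reads vs. transforms"),
    ]),
    Question("Would caching intermediate results help?", [
        Question("Re-run the pipeline with a cache on the three largest intermediate tables"),
    ]),
])

for experiment in objective.experiments():
    print("-", experiment.text)
```

Walking the tree top-down keeps the "why" attached to each experiment; reading only the leaves gives you the task list you would otherwise have written directly.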