I've linked to a short form describing why I work on wise AI advisors. I suspect there is a lot of work that could be done to figure out the best user interface for this:
https://www.lesswrong.com/posts/SbAofYCgKkaXReDy4/?commentId=Zcg9idTyY5rKMtYwo
If you're interested, I could share some thoughts on specific things to experiment with.
I'm still quite confused about why you believe that a long-term pause is viable given the potential for actors to take unilateral action and the difficulties in verifying compliance.
Another possibility that could be included in that diagram is merging various national/coalitional AIs.
Might go away
There are lots of things that "might" happen. When we're talking about the future of humanity, we can't afford to just gloss over mights.
The root cause may be that there is too much inferential distance
Perhaps, although I generally become more sympathetic to someone's point of view the more I read from them.
And it's part of why I think it's useful to create scenes that operate on different worldview assumptions: it's worth working out the implications of specific beliefs without needing to justify those beliefs each time.
I used to lean more strongly towards more schools of thought being good. However, I've updated slightly on the margin towards believing that some schools of thought just end up muddying the waters.
That said, Epoch has done some great research, so I'm overall happy the scene exists. And I think Matthew Barnett is extremely talented, I just think he's unfortunately become confused.
I used to really like Matthew Barnett's posts as providing contrarian but interesting takes.
However, over the last few years, I've started to feel more negatively about them. I feel that his posts tend to be framed in a really strange way such that, even though there's often some really good research there, they're more likely to confuse the average reader than anything else, and even when you can untangle the frames, I usually don't find it worth the time.
I should mention, though, that as I've started to feel more negative about them, I've started to read less of them and to engage less deeply with the ones I do look at, so there's a chance my view would be different if I read more.
I'd probably feel more positive about any posts he writes that are closer to presenting data and further away from interpretation.
That said, Epoch overall has produced some really high-quality content and I'd definitely like to see more independent scenes.
I believe those are useful frames for understanding the impacts.
To "misuse" to me implies taking a bad action. Can you explain what misuse occurred here?
They're recklessly accelerating AI. Or, at least, that's how I see it. I'll leave it to others to debate whether or not this characterisation is accurate.
Obviously details matter
Details matter. It depends on how bad it is and how rare these actions are.
This seems like a better solution on the surface, but once you dig in, I'm not so sure.
Once you hire someone, assuming they're competent, it's very hard for you to decide to permanently bar them from gaining a leadership role. How are you going to explain promoting someone who seems less competent than them to a leadership role ahead of them? Or is the plan to never promote them and refuse to ever discuss it? That would create weird dynamics within an organisation.
I would love to hear if you think otherwise, but it seems unworkable to me.
Why the focus on wise AI advisors?[1]
I'll be writing up a proper post to explain why I've pivoted towards this, but it will still take some time to produce a high-quality post, so I decided it was worthwhile releasing a short-form description in the meantime.
Wise AI Advisors refers to AI trained to provide wise advice. I believe this research direction is important because[2]:
a) AI will have a massive impact on society given the infinite ways to deploy such a general technology
b) There are lots of ways this could go well and lots of ways that this could go extremely poorly (election interference, cyber attacks, development of bioweapons, large-scale misinformation, automated warfare, catastrophic malfunctions, permanent dictatorships, mass unemployment, etc.)
c) There is massive disagreement on the best strategy (decentralisation vs. limiting proliferation, universally accelerating AI vs. winning the arms race vs. pausing, incremental development of safety vs. principled approaches, offence-defence balance favouring the attacker or defender) or even what we expect the development of AI to look like (overhyped bubble vs. machine god, business as usual vs. this changes everything). Making the wrong call could prove catastrophic.
d) AI is developing incredibly rapidly (no wall, see o3 crushing the ARC challenge, automated coding, IMO silver medal). We have limited time to act and to figure out how to act.
e) Given both the difficulty and the number of different challenges and strategic choices we'll be facing far too soon, humanity needs to rapidly improve its capability to navigate such situations
f) Whilst we can and should be developing top governance and strategy talent, this is unlikely to be sufficient by itself. We need every advantage we can get; we can't afford to leave anything on the table.
g) Another way of framing this: Given the potential of AI development to feed back into itself, if it isn't also feeding back into increased wisdom in how we navigate the world, our capabilities are likely to far outstrip our ability to handle them.
For these reasons, I think it is vitally important for society to be working on training these advisors now.
Why frame this in terms of a vague concept like wisdom rather than specific capabilities?
I think the chance of us being able to steer the world towards a positive direction is much higher if we're able to combine multiple capabilities, so it makes sense to have a handle for the broader project, in addition to handles for individual sub-projects.
Isn't training AI to be wise intractable?
Possibly, though I'm not convinced it's harder than any of the other ambitious agendas and we won't know how far we can go without giving it a serious effort. Is training an AI to be wise really harder than aligning it? If anything, it seems like a less stringent requirement.
Compare:
• Ambitious mechanistic interpretability aims to perfectly understand how a neural network works at the level of individual weights
• Agent foundations attempts to truly understand what concepts like agency, optimisation, decisions and values are at a fundamental level
• Davidad's Open Agency Architecture attempts to train AIs that come with proof certificates showing that the AI has less than a certain probability of having unwanted side-effects
Is it obvious that any of these are easier than training a truly wise AI advisor?
In terms of making progress, my initial focus is on investigating the potential of amplified imitation learning, that is, training imitation agents on wise people and then enhancing them with techniques like RAG or trees of agents (see the sketch below).
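To make the "amplified imitation learning" framing a little more concrete, here's a minimal sketch of what the inference-time amplification could look like, assuming an imitation model has already been fine-tuned on transcripts from wise people. The `imitation_advisor` function is a hypothetical stand-in for that model, and the retrieval and tree-of-agents steps are deliberately toy versions; this illustrates the shape of the pipeline, not a proposed implementation.

```python
# Sketch only: RAG plus a one-level tree of agents wrapped around an imitation model.
# `imitation_advisor` is a hypothetical stand-in for a model fine-tuned on transcripts
# of wise advisors; in practice it would be a call to that model.

from collections import Counter

CORPUS = [
    "Before acting under uncertainty, ask what evidence would change your mind.",
    "Consider the interests of parties who are not in the room.",
    "Prefer reversible moves when the downside is hard to bound.",
]

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: rank corpus passages by word overlap with the question."""
    q_words = Counter(question.lower().split())
    scored = sorted(corpus, key=lambda p: -sum(q_words[w] for w in p.lower().split()))
    return scored[:k]

def imitation_advisor(prompt: str) -> str:
    """Hypothetical stand-in for the fine-tuned imitation model."""
    return f"[advisor draft responding to: {prompt[:60]}...]"

def advise(question: str, n_drafts: int = 3) -> str:
    """RAG + tree of agents: retrieve context, sample several drafts, then aggregate."""
    context = "\n".join(retrieve(question, CORPUS))
    drafts = [
        imitation_advisor(f"Context:\n{context}\n\nQuestion: {question}")
        for _ in range(n_drafts)
    ]
    return imitation_advisor(
        "Synthesise the wisest single answer from these drafts:\n" + "\n---\n".join(drafts)
    )

if __name__ == "__main__":
    print(advise("Should we rush to deploy a powerful but poorly understood system?"))
```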
Does anyone else think wise AI advisors are important?
Going slightly more general to training wise AI rather than specifically advisors[3], there was the competition on the Automation of Wisdom and Philosophy organised by Owen Cotton-Barratt, and there's this paper (summary) by Samuel Johnson and others, including Yoshua Bengio, Melanie Mitchell and Igor Grossmann.
LintzA listed Wise AI advisors for governments as something worth considering in The Game Board Has Been Flipped[4].
Further Discussion:
You may also be interested in reading my 3rd prize-winning entry to the AI Impacts Competition on the Automation of Wisdom and Philosophy. It's divided into two parts:
• An Overview of “Obvious” Approaches to Training Wise AI Advisors
• Some Preliminary Notes on the Promise of a Wisdom Explosion
I previously described my agenda as Wise AI Advisors via Imitation Learning. I now see that as overly narrow. The goal is to produce Wise AI Advisors via any means and I think that Imitation Learning is underrated, but I'm sure there's lots of other approaches that are underrated as well.
My view is actually a bit more complicated than this. There are many different arguments that could be made for pursuing this research direction. However, I've shared this one because it is the most legible and compelling.
One key reason why I favour AI advisors rather than directly training wisdom into AI is that the human users can compensate for weaknesses in the advisors. For example, an advisor only has to inspire the humans to make the correct choice rather than make the correct choice itself. We may later take the harder step of training systems that don't have a human in the loop, but this will be easier if we have AI advisors to help us.
No argument included, sadly.