I've written up a short-form argument for focusing on Wise AI advisors. I'll note that my perspective is different from that taken in the paper. I'm primarily interested in AI as advisors, whilst the authors focus more on AI acting directly in the world.
Wisdom here is an aid to fulfilling your values, not a definition of those values
I agree that this doesn't provide a definition of these values. Wise AI advisors could be helpful for figuring out your values, much like how a wise human would be helpful for this.
Other examples include buying poor-quality food and then having to pay for medical care, buying a cheap car that costs more in repairs, payday loans, etc.
Unless you insist that this system is helpful to the powerful and privileged, such as a king, as a gauge of public opinion. Would that be legitimate?
That would make the domain of checkable tasks rather small.
That said, it may not matter depending on the capability you want to measure.
If you want to make the AI hack a computer to turn the entire screen green and it skips a pixel so as to avoid completing the task, well, it would still have demonstrated that it possesses the dangerous capability, so it has no reason to sandbag.
On the other hand, if you are trying to see if it has a capability that you wish to use, it can still sandbag.
I'd strongly recommend spending some time in the Bay Area (or London as a second-best option). Spending time in these spaces will help you build your model of the space.
You may also find this document I created on AI Safety & Entrepreneurship useful.
One of the biggest challenges here is that subsidies designed to support alignment could be snagged by AI companies misrepresenting capabilities work as safety work. Do you think the government has the ability to differentiate between these?
Become a member of LessWrong or the AI Alignment Forum
I think the goal is for the alignment forum to be somewhat selective in terms of who can comment.
(Removed some of my comments b/c I just noticed the clarification that you meant average member of the EA forum/Less Wrong. I would suggest changing the title of your post though).
For the record, I see the new field of "economics of transformative AI" as overrated.
Economics has some useful frames, but it also tilts people towards being too "normy" on the impacts of AI and it doesn't have a very good track record on advanced AI so far.
I'd much rather see multidisciplinary programs/conferences/research projects, including economics as just one of the perspectives represented, than economics of transformative AI qua economics of transformative AI.
(I'd be more enthusiastic about building economics of transformative AI as a field if we were starting five years ago, but these things take time and it's pretty late in the game now, so I'm less enthusiastic about investing field-building effort here and more enthusiastic about pragmatic projects combining a variety of frames).
Points for creativity, though I'm still somewhat skeptical about the viability of this strategy.
Why the focus on wise AI advisors?[1]
I'll be writing up a proper post to explain why I've pivoted towards this, but it will still take some time to produce a high-quality post, so I decided it was worthwhile releasing a short-form description in the meantime.
By Wise AI Advisors, I mean training an AI to provide wise advice.
a) AI will have a massive impact on society given the infinite ways to deploy such a general technology
b) There are lots of ways this could go well and lots of ways that this could go extremely poorly (election interference, cyber attacks, development of bioweapons, large-scale misinformation, automated warfare, catastrophic malfunctions, permanent dictatorships, mass unemployment, etc.)
c) There is massive disagreement on the best strategy (decentralization vs. limiting proliferation, universally accelerating AI vs. winning the arms race vs. pausing, incremental development of safety vs. principled approaches, offence-defence balance favoring the attacker or defender) or even what we expect the development of AI to look like (overhyped bubble vs. machine god, business as usual vs. this changes everything). Making the wrong call could prove catastrophic.
d) AI is developing incredibly rapidly (no wall, see o3 crushing the ARC challenge!). We have limited time to act and to figure out how to act.
e) Given both the difficulty and the number of different challenges and strategic choices we'll be facing in short order, humanity needs to rapidly improve its capability to navigate such situations.
f) Whilst we can and should be developing top governance and strategy talent, this is unlikely to be sufficient by itself. We need every advantage we can get; we can't afford to leave anything on the table.
g) Another way of framing this: Given the potential of AI development to feed back into itself, if it isn't also feeding back into increased wisdom in how we navigate the world, our capabilities are likely to far outstrip our ability to handle them.
For these reasons, I think it is vitally important for society to be working on training these advisors now.
Why frame this in terms of a vague concept like wisdom rather than specific capabilities?
I think the chance of us being able to steer the world towards a positive direction is much higher if we're able to combine multiple capabilities, so it makes sense to have a handle for the broader project, in addition to handles for individual sub-projects.
Isn't training AI to be wise intractable?
Possibly, though I'm not convinced it's harder than any of the other ambitious agendas and we won't know how far we can go without giving it a serious effort. Is training an AI to be wise really harder than aligning it? If anything, it seems like a less stringent requirement.
Compare:
• Ambitious mechanistic interpretability aims to perfectly understand how a neural network works at the level of individual weights
• Agent foundations attempts to truly understand what concepts like agency, optimisation, decisions and values are at a fundamental level
• Davidad's Open Agency architecture attempts to train AIs that come with proof certificates that an AI has less than a certain probability of having unwanted side-effects
Is it obvious that any of these are easier?
In terms of making progress, my initial focus is on investigating the potential of amplified imitation learning, that is, training imitation agents on wise people and then enhancing them with techniques like RAG or trees of agents.
Does anyone else think wise AI advisors are important?
Going slightly more general to training wise AI rather than specifically advisors[2], there was the competition on the Automation of Wisdom and Philosophy organised by Owen Cotton-Barratt and there's this paper (summary) by Samuel Johnson and others incl. Yoshua Bengio, Melanie Mitchell and Igor Grossmann.
LintzA listed Wise AI advisors for governments as something worth considering in The Game Board Has Been Flipped[3].
Further Discussion:
You may also be interested in reading my 3rd prize-winning entry to the AI Impacts Competition on the Automation of Wisdom and Philosophy. It's divided into two parts:
• An Overview of “Obvious” Approaches to Training Wise AI Advisors
• Some Preliminary Notes on the Promise of a Wisdom Explosion
I previously described my agenda as Wise AI Advisors via Imitation Learning. I now see that as overly narrow. The goal is to produce Wise AI Advisors via any means and I think that Imitation Learning is underrated, but I'm sure there are lots of other approaches that are underrated as well.
One key reason why I favour AI advisors rather than directly training wisdom into AI is that the human users can compensate for weaknesses in the advisors. For example, an advisor only has to inspire the humans to make the correct choice rather than make the correct choice itself. We may take the harder step of training systems that don't have a human in the loop later, but this will be easier if we have AI advisors to help us with this.
No argument included, sadly.