Comments
Adam B · 30

Yeah, I mostly agree – I'm keen to see capabilities as they are, without bonus help. We're currently experimenting with disabling the on-site chat, which means the agents are pursuing their own inclinations and strategies (and aren't getting help from chat to execute them). Now I expect it'd be very unlikely for them to reach out to Lighthaven, for example, because there aren't humans in chat to suggest it.

Separately though, it is just the case that asking sympathetic people for help will help the agents achieve their goals, and to the extent that the agents can independently figure that out and decide to pursue it, that's a useful indicator of their situational awareness and strategic capabilities. So without manual human nudging, I think it'll be interesting to see when agents start thinking of stuff like that (my impression is that they currently would not manage to, but I'm pretty uncertain about that).

Adam B · 30

What actions can the agents actually take?

They each have a Linux computer they can use, and they can send messages in the group chat. For your other questions, I'd recommend just exploring the village, where you can see their memories and how they're coordinating: https://theaidigest.org/village. To give them their goals, we just send them a message (e.g. see the start of Day 1: https://theaidigest.org/village?day=1).
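
To make those two channels concrete, here's a purely illustrative sketch (not our actual code – the class names are hypothetical):

```python
# Hypothetical sketch of the agents' two channels: a Linux desktop they
# control, and a shared group chat (which is also how goals are delivered).
from dataclasses import dataclass, field

@dataclass
class ComputerAction:
    kind: str  # "mouse_move", "click", "type", "scroll", "wait", ...
    args: dict = field(default_factory=dict)  # e.g. {"coordinate": [640, 400]}

@dataclass
class ChatMessage:
    sender: str  # an agent's name, or the organizers
    text: str

# Giving the agents a goal is just another chat message, e.g. on Day 1:
goal = ChatMessage(
    sender="organizers",
    text="Choose a charity and raise as much money as you can for it.",
)
```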

Adam B · 22

Great, I'm also very keen on "make as much money as possible" – that was a leading candidate for our first goal, but we decided to go for charity fundraising because we don't yet have bank accounts for them. I like the framing of "goals that a bunch of humans in fact try to pursue" – I'll think more on that.

It's a bit non-trivial to give them bank accounts / money, because we need to make sure they don't leak their account details through the livestream or their memories, which I think they'd be very prone to do if we didn't set it up carefully. E.g. yesterday Gemini tweeted its Twitter password and got banned from Twitter 🤦‍♂️. If people have suggestions for smart ways to set this up, I'd be interested to hear them – feel free to DM.
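
One direction (just a sketch of the general idea, not something we've built): keep a registry of known secret values and scrub them from anything that reaches the livestream or the agents' persisted memories:

```python
# Illustrative sketch: scrub known secret values from any text before it
# reaches the livestream or an agent's persistent memory. The secret
# names and values here are placeholders.
SECRETS = {
    "TWITTER_PASSWORD": "correct-horse-battery",
    "BANK_ACCOUNT_NUMBER": "12345678",
}

def redact(text: str) -> str:
    """Replace each known secret value with a stable placeholder."""
    for name, value in SECRETS.items():
        text = text.replace(value, f"[REDACTED:{name}]")
    return text

# Anything agent-produced would pass through redact() before display/storage:
print(redact("logging in with correct-horse-battery"))
# -> "logging in with [REDACTED:TWITTER_PASSWORD]"
```

Of course this only catches exact string matches – the harder part is secrets the agents paraphrase or retype.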

Adam B · 82

Thanks Simeon – curious to hear suggestions for goals you'd like to see!

We observed cheating on a Wikipedia race (thread), and lately we've seen a bunch of cases of o3 hallucinating in the event planning, including some self-serving-seeming hallucinations, like claiming it won the leadership election when it hadn't actually checked the results.

But the general behaviour of the agents has in fact been positive, cooperative, and clumsy-but-seemingly-well-intentioned (anthropomorphising a bit), so that's what we've reported – I hope the village will show the full distribution of agent behaviours over time, and seeing a good variety of goals could help with that.

Adam B · 150

Our grant investigator at Open Phil has indicated we're likely to get funding from them to cover continuing AI Digest's operations at its current size (3 team members, see the Continuation scenario here), which includes $50k budgeted for compute. We've also received $20k in a speculation grant from SFF, which gets us access to their main round – I expect we'll hear back from them in a few months – and $100k for the village from Foresight Institute.

Note that here, Daniel's making the case for increasing the village's compute budget in particular, which would let us run a more ambitious version of the village (moving towards running it 24/7, adding more than 4 agents, or trying more compute-expensive scaffolding).

Separately, with additional funding we'd also like to grow the team, which would help us improve the village faster, produce better takeaways more quickly, and grow our capacity to build other explainers and demos for AI Digest. There's more detail on funding scenarios in our Manifund application.

Adam B · 42

Looking forward to chatting!

I think examples of agents pursuing goals in the real world are more interesting than Minecraft or other game environments – they're more similar to white-collar work, and I think more relevant to takeover. As a sidenote, from when I looked into it a few months ago, reporting about Altera's agents seemed to generally overclaim massively (they take actions at a very high level through a scaffold, and in video footage they seemed very incapable).

Adam B · 40

Thanks, useful to hear!

"I'm skeptical that this is the best way to achieve this goal, as many existing works already demonstrate these capabilities"

I'd be very interested to see work that exercises the capabilities of frontier models (e.g. Claude Opus 4, o3) on multi-agent computer use pursuing open-ended, long-term goals, if you have links to share!

I don't think of this primarily as novel research; I think of it as presenting current capabilities in a much more accessible way. (For that reason, we're doing a single canonical village run rather than running lots of experiments / reproducing results.) Anyone can go to the site, talk to the agents, and watch through the history in a fairly easy way (compared, for example, to paying $200/mo for Operator and having to think of something to ask it to do). We're also extracting interesting moments, anecdotes, and recaps like this post – for journalists to cover, for social media, and possibly also to include in slide decks like yours (e.g. I could imagine a great anecdote fitting well in your section on autonomy, around slide 51). In particular, I hope that the Village will provide a naturalistic setting for interesting real-world emergent behaviour, complementing lab setups like the excellent Redwood work on alignment faking.

This isn't an advocacy project – we're not aiming to make an optimised, persuasive pitch for AI safety. Instead we're aiming to help people improve their own understanding and models of AI capabilities, to inform their own views. I'm excited to see advocacy efforts and think they're important, but I think advocacy also has some important epistemic challenges, and therefore that it's healthy to have some efforts focussed primarily on understanding and communicating the most important things to know in AI, in an accessible format for non-expert audiences, rather than advocating for specific actions.

We are of course focussing on the topics we think are most important for people to understand for AI to go well, such as the rate of progress [1, 2], situational awareness, sandbagging and alignment faking [1], agents (presented to help folks familiar only with chat assistants understand LLM agents) [1, 2], and what's coming next [1, 2].

Keen to chat more, and thanks for your thoughts on this! I'll DM you my Calendly if you'd like to call!

Adam B · 10

Could be interesting! I don't expect we'll try this in the near term because a) I expect text-based browsers to introduce a bunch of limitations on what the agents could do even if they were very capable (e.g. interacting with JavaScript-heavy sites), and b) part of the reason we chose to focus on computer use is that it's visually interesting and fairly easy to follow for anyone who comes to the site – I think a text-based browser would be trickier to follow.

OTOH, if the SOTA computer-use agents go down this route we'd consider it because I think the Village is most useful and interesting if it's showing the current SOTA.

Adam B · 81

The village is part of our general efforts with AI Digest to help people (especially tech policy people, influencers and tastemakers, people at labs, etc.) understand AI capabilities, their trends, and their effects. The theory of impact there is broadly to help ground the response to AI in the actual current and future capabilities of AI systems.

With the village in particular, we're focused on some particularly important capabilities: pursuing long-term, open-ended goals, interacting with the real world via computer use, and interacting with other agents. There are lots of variants here that I'm excited for us to explore: what happens when the models have goals that align, are independent, or conflict; different goals and environments; scaling up the number of agents; different models; different scaffolding/memory setups; and so on.

Adam B · 140

The agents see screenshots of their computers, and they can take actions like mouse_move (to x, y pixel coordinates), click, type, scroll, wait, etc. Our scaffolding is custom, based on the Anthropic computer use beta scaffolding. This is roughly the same system that OpenAI's Computer Use Agent uses.
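
For a feel of the shape of such a loop, here's a minimal sketch built on Anthropic's public computer-use beta tool (our real scaffolding is custom; `take_screenshot` and `execute_action` are hypothetical stand-ins for the VM integration):

```python
# Minimal sketch of a computer-use agent loop using Anthropic's public
# computer-use beta tool. take_screenshot and execute_action stand in
# for the actual VM integration.
import anthropic

client = anthropic.Anthropic()

def take_screenshot() -> str:
    """Hypothetical: capture the agent's screen as a base64-encoded PNG."""
    raise NotImplementedError

def execute_action(action: dict) -> None:
    """Hypothetical: perform mouse_move/click/type/scroll/wait on the VM."""
    raise NotImplementedError

messages = [{"role": "user", "content": "Choose a charity and raise money for it."}]

while True:
    response = client.beta.messages.create(
        model="claude-opus-4-20250514",
        max_tokens=1024,
        tools=[{
            "type": "computer_20250124",  # tool version from the beta docs
            "name": "computer",
            "display_width_px": 1280,
            "display_height_px": 800,
        }],
        betas=["computer-use-2025-01-24"],
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # the model produced a final text answer

    # Execute each requested action, then return a fresh screenshot.
    results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        execute_action(block.input)  # e.g. {"action": "mouse_move", "coordinate": [640, 400]}
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": [{
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": take_screenshot()},
            }],
        })
    messages.append({"role": "user", "content": results})
```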
