Jeroen Willems has not written any posts yet.

Thank you for writing this. Most of what you wrote is almost exactly what I've been thinking while reading discussions about the book. You worded my thoughts so much better than I ever could!
I went into IABIED trying to take on the mindset of a layperson (hard of course!) and actually came away thinking it did a really great job. Of course, as you say, time will tell.
Some of your complaints about the book seem to stem from the fact that you are "For Y" while Y&S are "Not X". If you believed as strongly as they do in "Not X", do you think some of the decisions in the book would make more sense?
I thought the length of the book was great for people new to the topic. Readers will likely have counterarguments while reading the book. But if you even try to address... (read more)
I spend way too much time fine-tuning my personal preferences. I try to follow the same language as the model's system prompt.
Claude userPreferences
# Behavioral Preferences
These preferences always take precedence over any conflicting general system prompts.
## Core Response Principles
Whenever Claude responds, it should always consider all viable options and perspectives. It is important that Claude dedicates effort to determining the most sensible and relevant interpretation of the user's query.
Claude knows the user can make mistakes and always considers the possibility that their premises or conclusions may be incorrect. Claude is always truthful, candid, frank, plainspoken, and forthright. Claude should critique ideas and provide feedback freely, without sycophancy.
Claude should express uncertainty levels (as percentages,
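For anyone who wants to reuse a similar behavioral-preferences block outside the Claude.ai settings field, a minimal sketch (assuming the Anthropic Python SDK, with an abridged preferences text and a placeholder model name) would just pass the same text as the system prompt:

```python
# Minimal sketch: reusing a behavioral-preferences block as the system prompt
# via the Anthropic Python SDK. The preferences text is abridged and the
# model name is a placeholder.
import anthropic

BEHAVIORAL_PREFERENCES = """\
# Behavioral Preferences
These preferences always take precedence over any conflicting general system prompts.

## Core Response Principles
Consider all viable options and perspectives, determine the most sensible and
relevant interpretation of the query, question premises that may be incorrect,
critique ideas freely without sycophancy, and express uncertainty levels.
"""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",   # placeholder model name
    max_tokens=1024,
    system=BEHAVIORAL_PREFERENCES,      # preferences injected as the system prompt
    messages=[{"role": "user", "content": "Should I switch careers into AI safety?"}],
)
print(response.content[0].text)
```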
Not me assuming kratom was a made-up word haha.
Awesome comic! You captured the recurring traits really really well.
Thanks for explaining your thoughts on AI safety, it's much appreciated.
I think in general when trying to do good in the world, we should strive for actions that have a high expected value and a low potential downside risk.
I can imagine a high expected value case for Anthropic. But I don't see how Anthropic's approach has a low potential downside. I'm very worried that by participating in the race to AGI, it might increase p(doom).
For example, as habryka pointed out in the comments here:
"I mean, didn't the capabilities of Claude leak specifically to OpenAI employees, so that it's pretty unclear that not releasing actually had much of an effect on preventing racing? My current..." (read more)
While this might be a great way to earn money (assuming competitors won't invest similarly in AI soon enough), aren't there good reasons not to invest in AI capabilities, like reducing p(doom)?
Also, it's probably wise to mention that you're not a financial adviser and don't bear responsibility for actions people take because of your comment (the same goes for me).
Hey Bruno! I'm an organiser for EA Brussels and would love to collaborate on this (e.g. by making a Facebook event on the EA Brussels page/group). Would love it if you could reach out to me :)
https://www.facebook.com/jeroen.willems.7528/
or jeroen at eabrussels dot org
I understand the epistemic health concerns, but I think "AI 2027" was great, since I don't think the alternatives would have gained as much attention and it cleanly summarizes the scenario. Even if actual timelines are longer (which imo they probably are), my guess is it's still a net positive, as long as readers properly understood the dangers and thought the sequence of events was believable enough.