LESSWRONG

Note: the blackout image used at the top is almost certainly fabricated. This can be easily confirmed by noting that the blackout began around noon on Monday and lasted for around ten hours, into the early Spanish evening when the sun was setting.

[This comment is no longer endorsed by its author]

ryan_greenblatt's Shortform

niplav14d2-2

Alien civilizations might race to the bottom by spending resources on making their civilization easier to point at (and thus giving it higher measure in the default UDASSA perspective).

This may also be a reason for AIs to simplify their values (after they've done everything possible to simplify everything else).

ksvanhorn's Shortform

niplav17d40

The idea behind these reviews is that they're done with a full year of hindsight. Evaluating posts at the end of the year could bias towards posts from later in the year (results from November & December), and focus too much on ephemeral trends at the time (like specific (geo)political events).

jacquesthibs's Shortform

niplav20d114

Yes, this is me riffing on a popular tweet about coyotes and cats. But it is a pattern that organizations get/extract funding from the EA ecosystem (which has as a big part of its goal to prevent AI takeover) or get talent from EA and then go on to accelerate that development (e.g. OpenAI, Anthropic, now Mechanize Work).

shortplav

niplav21d10

Hm, good point. I'll amend the previous post.

shortplav

niplav21d30

Ethical concerns here are not critical imho, especially if one only listens to the recordings oneself and deletes them afterwards.

People will be mad if you don't tell them, but if you actually don't share it and delete it after a short time, I don't think you'd be doing anything wrong.

shortplav

niplav21d20

Sorry, can't share the exact chat, that'd depseudonymize me. The prompts were:

What is a canary string? […]
What is the BIG-bench canary string?

Which resulted in the model outputting the canary string in its message.

jacquesthibs's Shortform

niplav21d*11051

"My funder friend told me his alignment orgs keep turning into capabilities orgs so I asked how many orgs he funds and he said he just writes new RFPs afterwards so I said it sounds like he's just feeding bright-eyed EAs to VCs and then his grantmakers started crying."

shortplav

niplav21d20

Fun: Sonnet 3.7 also knows the canary string, but believes that that's good, and defends it when pushed.

Stupid Question: Why am I getting consistently downvoted?

niplav21d40

I think having my real name publicly & searchably associated with scummy behavior would discourage me from engaging in it, in terms of future employers & random friends googling me, as well as LLMs being trained on the internet.