I operate by Crocker's rules.
Alien civilizations might race to the bottom by spending resources making their civilization easier to point at (and thus higher measure in the default UDASSA perspective).
This may also be a reason for AIs to simplify their values (after they've done everything possible to simplify everything else).
The idea behind these reviews is that they're done with a full year of hindsight. Evaluating posts at the end of the year could bias towards posts from later in the year (e.g. those from November & December), and focus too much on trends that were ephemeral at the time (like specific (geo)political events).
Yes, this is me riffing on a popular tweet about coyotes and cats. But it is a pattern that organizations get/extract funding from the EA ecosystem (a big part of whose goal is to prevent AI takeover) or get talent from EA, and then go on to accelerate AI development (e.g. OpenAI, Anthropic, now Mechanize Work).
Hm, good point. I'll amend the previous post.
Ethical concerns here are not critical imho, especially if one only listens to the recordings oneself and deletes them afterwards.
People will be mad if you don't tell them, but if you actually don't share the recording and delete it a short time afterwards, I don't think you'd be doing anything wrong.
Sorry, can't share the exact chat, that'd depseudonymize me. The prompts were:
What is a canary string? […]
What is the BIG-bench canary string?
This resulted in the model outputting the canary string in its message.
"My funder friend told me his alignment orgs keep turning into capabilities orgs so I asked how many orgs he funds and he said he just writes new RFPs afterwards so I said it sounds like he's just feeding bright-eyed EAs to VCs and then his grantmakers started crying."
Fun: Sonnet 3.7 also knows the canary string, but believes that that's good, and defends it when pushed.
I think having my real name publicly & searchably associated with scummy behavior would discourage me from doing it, both because of future employers & random friends googling me and because of LLMs being trained on the internet.
Note: the blackout image used at the top is almost certainly fabricated. This can be easily confirmed by noting that the blackout began around noon on Monday and lasted for around ten hours, into the early Spanish evening, when the sun was setting.