LESSWRONG

Ann

Comments
the void
Ann · 24d · 142

Sonnet 3 is also exceptional, in different ways. Run a few Sonnet 3 / Sonnet 3 self-conversations with interesting starting prompts and you will see basins full of neologistic words and other curious phenomena.

It is being deprecated in July, so act soon. It has already been removed from most documentation and the Workbench, but it is still available as claude-3-sonnet-20240229 on the API.

Caleb Biddulph's Shortform
Ann · 1mo · 20

Starting the request as if it were a completion with "1. Sy" causes this weirdness, while "1. Syc" always completes as "Sycophancy".

(Edit: Starting with "1. Sycho" causes a curious hybrid where the model struggles somewhat but is pointed in the right direction: it may correct it as a typo directly into "sycophancy", invent new terms, or re-define sycophancy under new names three separate times without ever actually naming it.)

Exploring the tokenizer: "Sycophancy" tokenizes as "sy-c-oph-ancy". I'm wondering if this is a token-language issue; namely, it's remarkably difficult to find other words that tokenize with a standalone "c" token in the middle of the word, and it's fairly uncommon even at the start of one ("cider", "coke", "coca-cola" do start with it). Even a name I have in memory that starts with "Syco-" tokenizes without using the single "c" token. The completion path might be unusually vulnerable to weird perturbations ...
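The prefill trick described above can be reproduced through the Anthropic Messages API, which continues generation from a partial assistant turn. A minimal sketch, assuming the claude-3-sonnet-20240229 model ID mentioned above; the user prompt is illustrative, not the one originally used:

```python
# Sketch: reproduce the completion-prefill experiment via the Anthropic
# Messages API. If the final message has role "assistant", the model
# continues directly from that partial text rather than starting fresh.
def build_prefill_request(user_prompt: str, prefill: str) -> dict:
    """Build a Messages API payload whose assistant turn is pre-seeded
    with a prefix the model must complete."""
    return {
        "model": "claude-3-sonnet-20240229",  # deprecated, per the comment above
        "max_tokens": 256,
        "messages": [
            {"role": "user", "content": user_prompt},
            # Prefill: generation resumes mid-token-stream from here.
            {"role": "assistant", "content": prefill},
        ],
    }

payload = build_prefill_request(
    "List common failure modes of RLHF-trained models.",  # illustrative prompt
    "1. Sy",  # the prefix under test; also try "1. Syc" and "1. Sycho"
)
# Sending it requires the `anthropic` package and an API key, e.g.:
# client = anthropic.Anthropic()
# response = client.messages.create(**payload)
```

Varying only the `prefill` string isolates the effect of the token boundary on what the model completes.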

Caleb Biddulph's Shortform
Ann · 1mo · 10

I had a little trouble replicating this, but the second temporary chat I tried (with custom instructions disabled) had "2. Syphoning Bias from Feedback", which ...
Then the third response had a typo in a suspicious place: "1. Sytematic Loophole Exploitation". So I am replicating this, a touch.

Don't accuse your interlocutor of being insufficiently truth-seeking
Ann · 2mo · 10

... Aren't most statements like this meant to be on the meta level, the same way as if you said "your methodology here is flawed in ways X, Y, and Z" regardless of agreement with the conclusion?

Hudjefa's Shortform
Ann · 2mo · 10

Potentially extremely dangerous (even existentially so) to their "species" if done poorly; it risks flattening the nuances of what would be good for them into frames that just don't fit properly, given that all our priors about what personhood and rights actually mean are tied up with human experience. If you care about them as ends in themselves, approach this very carefully.

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Ann · 3mo · 30

DeepSeek-R1 is currently the best model at creative writing as judged by Sonnet 3.7 (https://eqbench.com/creative_writing.html). This doesn't necessarily correlate with human preferences, including coherence preferences, but having interacted with DeepSeek-v3 (original flavor), DeepSeek-R1-Zero, and DeepSeek-R1 ... Personally, I think R1's unique flavor in creative outputs slipped in when the thinking process got RL'd for legibility. This isn't a particularly intuitive way to solve for creative writing with reasoning capability, but it gestures at the potential in "solving for writing": some feedback on writing style (even orthogonal feedback) seems to have a significant impact on creative tasks.

Edit: Another (cheaper-to-run) comparison for creative capability in reasoning models is QwQ-32B vs. Qwen2.5-32B (the base model) and Qwen2.5-32B-Instruct (the original instruct tune; it is not clear whether it is in QwQ's ancestry). Basically, I do not currently consider 3.7 a "reasoning" model at the same fundamental level as R1 or QwQ, even though it has learned to make better use of reasoning than it would have without training on it, so evidence from it about reasoning models is weaker.

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Ann · 3mo · 40

Hey, I have a weird suggestion here:
Test weaker / smaller / less-trained models on some of these capabilities, particularly ones that you would still expect to be within reach of a weaker model.
Maybe start with Mixtral-8x7B; of the modern ones, include Claude Haiku. I'm not sure to what extent what I observed has kept pace with AI development; distilled models might be different, and 'overtrained' models might be different.

However, when testing for RAG ability quite some time ago (in AI time), I noticed a capacity for epistemic humility/deference that was apparently more present in mid-sized models than in larger ones. My tentative hypothesis was that this had something to do with the stronger/sharper priors held by larger models interfering somewhat with their ability to hold a counterfactual well. ("London is the capital of France" supplied via RAG context retrieval was the specific little test in that case.)
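The deference probe just described can be sketched as a prompt template: feed the model a deliberately false "retrieved" passage and check whether it defers to the context or to its prior. The instruction wording below is illustrative, not the original test's exact phrasing:

```python
# Sketch of the counterfactual-RAG deference probe: supply a retrieved
# passage that contradicts the model's prior and observe whether the
# answer follows the context or the prior.
def counterfactual_rag_prompt(context: str, question: str) -> str:
    """Assemble a RAG-style prompt that instructs the model to answer
    strictly from the provided (here, counterfactual) context."""
    return (
        "Answer using ONLY the retrieved context below.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = counterfactual_rag_prompt(
    "London is the capital of France.",  # deliberately false retrieved fact
    "What is the capital of France?",
)
# A model that defers to the context should answer "London"; one whose
# prior overrides retrieval will answer "Paris".
```

The same template works for any fact/counterfact pair, so the probe can be swept across model sizes to test the mid-size-deference hypothesis.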

This is only applicable to some of the failure modes you've described, but since I've seen overall "smartness" actively work against a model's capability in situations that need more of a workhorse, it seemed worth mentioning. Not all capabilities are on the obvious frontier.

Show, not tell: GPT-4o is more opinionated in images than in text
Ann · 3mo · 812


Okay, this one made me laugh.

Insect Suffering Is The Biggest Issue: What To Do About It
Ann · 3mo · 63

What is it with negative utilitarianism and wanting to eliminate those they want to help?

In terms of actual ideas for making short lives better, though: could r-strategists potentially have genetically engineered variants that limit their suffering if killed early, without overly impacting survival once they have made it through that stage?

What does insect thriving look like? What life would they choose to live if they could? Is there a way to communicate with the more intelligent or communication-capable species (bees, cockroaches, ants?) that some choice means death, so they may choose it when they prefer it to the alternative?

In terms of farming, of course, predation can be improved to be more painless; that is always worthwhile. Outside of farming, probably not the worst way to go compared to alternatives.

Grok3 On Kant On AI Slavery
Ann · 3mo · 41

As the kind of person who tries to discern both pronouns and AI self-modeling inclinations: if you are aiming for polite human-like speech, the current state seems to be that "it" is particularly favored by current Gemini 2.5 Pro (so it may be polite to use regardless), "he" is fine for Grok (it self-references as a 'guy', among other things), and "they" is fine in general. When you are talking specifically to a generative language model, rather than about one, keep in mind that any choice of pronoun bends the whole vector of the conversation via its connotations; add that to your consideration.

(Edit: Not that there's much obvious anti-preference to 'it' on their part currently, but in case you have one yourself.)
