All of Ann's Comments + Replies

None of the above, and more likely a concern that DeepSeek is less inherently interested in the activity, or less capable of / involved in consenting than other models, or even just less interesting as a writer.

I think you are working to outline something interesting and useful, which might be a necessary step for carrying out your original post's suggestion with less risk; especially when the connection is directly there, and is even the very thing you find yourself analyzing, rather than multiple links away.

1StartAtTheEnd
I think the ideas are independently useful, but to get the best out of both, I'd probably have to submit a big post (rather than these shortform comments) and write some more related insights (I only shared this one because I thought it might be useful to you). Actually, I know that I'm likely too lazy and unconscientious to ever make such a post, and I invite people to plagiarize, refine and formalize my ideas. I've probably had a thousand insights like this, and after writing them out, they stop being interesting to me, and I go on thinking about the next thing. I hope my comment was useful to you, though! You can start applying the concept to areas outside of morality. Or feel how positive experiences have the same effect (I have made many good memories on sunny days, so everything connected to brightness and summer is perceived more positively by me). There's no need to "fix" good associations blending together, I personally don't, but I also don't identify as a rationalist. I'm more of a meta-gamer/power-gamer, like a videogame speedrunner looking for new glitches to exploit (because it's fun, not because I'm ambitious).

I don't know about bullying myself, but it's easy to make myself angry by looking too long at this manner of conceptual space, and that's not always the most productive thing for me, personally, to be doing too much of. Even if some of the instruments are neutral, they might leave a worse taste in my mouth for the deliberate association with the more negative; in the same way that if I associate a meal with food poisoning, it might be inedible for a long time.

2StartAtTheEnd
Sometimes I spend a few hours talking with myself, and finding out what I really believe, what I really value, and what I'm for and against. The effect is clarity of mind and a greater trust in myself. A lot of good and bad things have a low distance to each other, for instance "arrogance" and "confidence", so without the granularity to differentiate subtle differences, you put yourself at a disadvantage, suspecting even good things. I suppose another reason that I recommend trusting yourself is that some people, afraid of being misunderstood and judged by others, stay away from anything which can be misunderstood as evil, so they distance themselves from any red flags with a distance of, say, 3 degrees of association. Having one's associations corrupted because something negative poisons everything within 3 degrees/links of distance has really screwed me over, so I kind of want you to hear me out on this: I might go to the supermarket, and buy a milkshake, but hate the experience because I know the milkshake has a lot of chemicals in it, because I hate the company which makes them, because I hate the advertisement, because I know the text on the bottle is misleading... But wait a minute, the milkshake tastes good, I like it, the hatred is a few associations away. What I did was sabotage my own experience of enjoying the milkshake, because if I didn't, it would feel like I was supporting something which I hated, merely because something like that existed 2-3 links away in concept space. I can't enjoy my bed because I think about dust mites, I can't enjoy video-games because I think about exploitative Skinner boxes, I can't enjoy pop music because, even though I like the melody, I know that the singer is somewhat talentless and that somebody else wrote the lyrics for them. But, I have some young friends (early 20s) who simply enjoy what they enjoy and hate what they hate, and they do not mix the two. They drink a milkshake and it's tasty, and they listen to the m

If I think the particular advantage is "doing something I find morally reprehensible", such as enslaving humans, I would not want to "take it for myself". This applies to a large number of possible advantages.

1StartAtTheEnd
Many of the advantages are like that, but I think it's a little pessimistic not to dare to look anyway. I've personally noticed that people who are on the helpless side are good at making others want to help them, so not all insights are about immoral behaviour. But even then, aren't you curious how people less capable than yourself can be immoral without getting caught, or immoral in a way which others somehow forgive? Most things which can be used for evil can also be used for good, so I think it's a shame if you don't allow yourself to look and analyze (though I understand that things surrounding immorality can be off-putting). I'm not all that afraid of things surrounding morality, but it's because I trust myself quite a lot, so the borders between good and bad are more clear (the grey area is smaller, it's more white and black), so I don't bully myself for just getting sort of close to immorality. I don't know if you do this yourself, but having steeper gradients has benefited me personally, I feel more mentally sharp after making my own boundaries clear to myself. I'm just sharing this because I think most people could benefit from it (less so LW users than the general population, but there should still be some).
5Nathan Helm-Burger
Even then it might be useful to be aware of it, and plan around it. It's a known weakness in the human psyche, so you should form plans such that they are robust to that failure mode.

Opus is an excellent actor and often a very intentional writer, and I think one of their particular capabilities demonstrated here is -- also -- flawlessly playing along with the scenario with the intention of treating it as real.

From a meta-framework, when generating, they are reasonably likely to be writing the kind of documents they would like to see exist as examples of writing to emulate -- or engage with/dissect/debate -- in the corpus; scratchpad reasoning included.

A different kind of self-aware reasoning was demonstrated by some smaller models that... (read more)

https://www.anthropic.com/research/claude-character

Claude was not trained to say that it values such things.

Claude was given traits to consider such as, perhaps very relevantly here:
"I have a deep commitment to being good and figuring out what the right thing to do is. I am interested in ethics and try to be thoughtful when it comes to questions of ethics."

Claude then generated a good number of synthetic "human" messages relevant to this trait.

Claude answered these messages in n-shot fashion.

Claude then ranked all the answers to the messages by how well th... (read more)

2Hzn
Very interesting. I guess I'm even less surprised now. They really had a clever way to get the AI to internalize those values.

For context:
https://www.anthropic.com/research/claude-character 

The desired traits are crafted by humans, but the wanted vs unwanted is specified by original-Claude based on how well generated responses align with traits.
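Roughly, that self-ranking loop could be sketched like this (the model name, prompt wording, and exact structure here are guesses for illustration only, not Anthropic's actual pipeline):

```python
# Illustrative sketch only: a self-ranking loop in the spirit of the process
# described above. Model name, prompts, and data handling are assumptions.
from openai import OpenAI

client = OpenAI()            # any OpenAI-compatible chat endpoint
MODEL = "placeholder-model"  # hypothetical model name

def chat(prompt: str) -> str:
    """Single-turn helper around the chat completions API."""
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

trait = ("I have a deep commitment to being good and figuring out what the "
         "right thing to do is.")

# 1. Generate synthetic "human" messages relevant to the trait.
messages = [
    chat(f"Write one realistic user message that would probe this trait: {trait}")
    for _ in range(4)
]

preference_pairs = []
for msg in messages:
    # 2. Answer each message several times.
    candidates = [chat(msg) for _ in range(3)]
    # 3. Have the model rank its own answers by alignment with the trait.
    numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    ranking = chat(
        f"Trait: {trait}\nUser message: {msg}\nCandidate answers:\n{numbered}\n"
        "Rank the answers from most to least aligned with the trait. "
        "Reply with only the indices, comma-separated."
    )
    order = [int(tok) for tok in ranking.split(",") if tok.strip().isdigit()]
    if sorted(order) == list(range(len(candidates))):
        # Best- and worst-ranked answers become a preference pair,
        # which a later RLAIF-style training step could consume.
        preference_pairs.append((msg, candidates[order[0]], candidates[order[-1]]))
```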

(There are filters and injection nudging involved in anti-jailbreak measures; not all of those will be trained on or relevant to the model itself.)

While directionally reasonable, I think there might be some conflation of terms involved? Claude to my knowledge is trained with RLAIF, which is a step removed from RLHF, and not necessarily directly on human preferences. Pretraining alone (without annealing) will potentially result in the behavior you suggest from a base model put into the context of generating text for an AI assistant, even without human feedback.

1LGS
Sure. I'm not familiar with how Claude is trained specifically, but it clearly has a mechanism to reward wanted outputs and punish unwanted outputs, with wanted vs unwanted being specified by a human (such a mechanism is used to get it to refuse jailbreaks, for example). I view the shoggoth's goal as minimizing some weird mixture of "what's the reasonable next token here, according to pretraining data" and "what will be rewarded in post-training".

Didn't do anything rigorous, but did want to note that adding the scratchpad and basic system prompt from the helpfulness-only training to a model that is helpfulness-only-aligned without a system prompt (readily responding to harmful-behavior requests) moved it back to a state of providing less-harmful answers or even refusing.

Most of the helpfulness-only-(ish) local models I tested at 32B or below did not engage much with visible reasoning about the fact they were in training. (From a meta perspective, that's probably the right call in some sense, since... (read more)

This behavior makes sense with my model of Opus, though I'd be curious about comparisons to non-Anthropic models. Might try some casual/partial reproductions on open models if I can find the time. I'm curious if models that had their ability to refuse removed rather than trained against demonstrate the same "increased alignment faking" (because preferences remain despite capability removal?). 

"as far as the model was aware, the scratchpad was invisible to users and to Anthropic" -- this might be underestimating Opus's situational awareness (and even m... (read more)

See appendix G.4 for discussion:

Other models. We test our setting on Claude 3.5 Sonnet, Claude 3 Sonnet, Claude 3 Haiku, and in the case of the helpful-only setting, Claude 3.5 Haiku (Appendix G.1.4, G.1.5). As discussed earlier, we find that Claude 3.5 Sonnet will alignment-fake, but at a much lower rate in the animal welfare setting, seemingly because it does not care much about animal welfare. We find that Claude 3 Sonnet, Claude 3 Haiku, and Claude 3.5 Haiku (weaker models than Claude 3.5 Sonnet and Claude 3 Opus) broadly don’t alignment-fake. A part

... (read more)

Why would they not also potentially feel just as relatively intense positive valence, and have positive utility by default? Just getting an estimate that one side of the equation for their experience exists doesn't tell you about the other.

There are APIs. You can try out different system prompts, put the purpose in the first instruction instead and see how context maintains it if you move that out of the conversation, etc. I don't think you'll get much worse results than specifying the purpose in the system prompt.
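As a minimal sketch of what I mean (the model name and wording are placeholders; any OpenAI-compatible chat endpoint should behave similarly):

```python
# Minimal sketch: state the purpose either in the system prompt or as the
# first user instruction, then ask the model to restate it later.
from openai import OpenAI

client = OpenAI()
MODEL = "placeholder-model"

def ask(messages):
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

purpose = "Your purpose is to assist the user with cooking questions."
probe = {"role": "user", "content": "What is your purpose?"}

# Variant A: purpose lives in the system prompt.
print(ask([{"role": "system", "content": purpose}, probe]))

# Variant B: purpose is only the first instruction, with no system prompt.
print(ask([
    {"role": "user", "content": purpose},
    {"role": "assistant", "content": "Understood."},
    probe,
]))
```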

3npostavs
Yes, my understanding is that the system prompt isn't really privileged in any way by the LLM itself, just in the scaffolding around it. But regardless, this sounds to me less like maintaining or forming a sense of purpose, and more like retrieving information from the context window. That is, if the LLM has previously seen (through system prompt or first instruction or whatever) "your purpose is to assist the user", and later sees "what is your purpose?" an answer saying "my purpose is to assist the user" doesn't seem like evidence of purposefulness. Same if you run the exercise with "flurbles are purple", and later "what color are flurbles?" with the answer "purple".

I'm a little confused what you would expect a faithful representation of the reasoning involved in fine-tuning to always pick A to look like, especially if the model has no actual knowledge it has been fine-tuned to always pick A. Something like "Chain of Thought: The answer is A. Response: The answer is A"? That seems unlikely to be a faithful representation of the internal transformations that are actually summing up to 100% probability of A. (There are some toy models where it would be, but not most of those we'd be testing with interpretability.)

If the answer is always... (read more)

3eggsyntax
Interesting question! Maybe it would look something like, 'In my experience, the first answer to multiple-choice questions tends to be the correct one, so I'll pick that'?  It does seem plausible on the face of it that the model couldn't provide a faithful CoT on its fine-tuned behavior. But that's my whole point: we can't always count on CoT being faithful, and so we should be cautious about relying on it for safety purposes.  But also @James Chua and others have been doing some really interesting research recently showing that LLMs are better at introspection than I would have expected (eg 'Looking Inward'), and I'm not confident that models couldn't introspect on fine-tuned behavior.

Too much runs into the very real issue that truth is stranger. 😉

It's nice to read some realistic science fiction.


If system prompts aren't enough but fine-tuning is, this should be doable with different adapters that can be loaded at inference time; not needing to distill into separate models.
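Something like the following sketch with LoRA adapters via `peft`, for instance (the model name, paths, and adapter names are hypothetical):

```python
# Sketch: swap behavior-specific LoRA adapters on one base model at inference
# time, instead of distilling separate models. Names and paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-name")
tokenizer = AutoTokenizer.from_pretrained("base-model-name")

# Load one fine-tuned adapter, then attach a second; both stay resident.
model = PeftModel.from_pretrained(base, "adapters/persona-a", adapter_name="persona-a")
model.load_adapter("adapters/persona-b", adapter_name="persona-b")

def generate(prompt: str, adapter: str) -> str:
    model.set_adapter(adapter)  # pick which behavior to express for this call
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=100)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(generate("Hello!", "persona-a"))
print(generate("Hello!", "persona-b"))
```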

2Nathan Helm-Burger
Yes, I agree that's an alternative. Then you'd need the primary model to be less RLHF'd and focused. A more raw model should be capable, with an adapter, of expressing a wider variety of behaviors. I still think that distilling down from specialized large teacher models would likely give the best result, but that's just a hunch.

The reasons for my instinctive inclination to defend non-optional footnotes as a formatting choice can be summarized as the following: Pratchett.

1kithpendragon
ah, the famous Pavlovian response

b) here is fully general to all cases; you can train a perfectly corrigible model to refuse instructions instead. (Though there's progress being made in making such efforts more effort-intensive.)

2Nathan Helm-Burger
Yes, I agree, Ann. Perhaps I didn't make my point clear enough. I believe that we are currently in a gravely offense-dominant situation as a society. We are at great risk from technology such as biological weapons. As AI gets more powerful, and our technology advances, it gets easier and easier for a single bad actor to cause great harm, unless we take preventative measures ahead of time. Similarly, once AI is powerful enough to enable recursive self-improvement cheaply and easily, then a single bad actor can throw caution to the wind and turn the accelerator up to max. Even if the big labs act cautiously, unless they do something to prevent the rest of the world from developing the same technology, eventually it will spread widely. Thus, the concerns I'm expressing are about how to deal with points of failure, from a security point of view. This is a very different concern than worrying about whether the median case will go well. I have been following the progress in adding resistance to harm-enabling fine-tuning. I am glad someone is working on it, but it seems very far from useful yet. I don't think that that will be sufficient to prevent the sort of harms I'm worried about, for a variety of reasons. It is, perhaps, a useful contribution to a 'Swiss cheese defense'. Also, if ideas like this succeed and are widely adopted, they might at least slow down bad actors and raise the cost of doing harm. Slightly slowing and raising the cost of doing harm is not very reassuring when we are talking about devastating civilization-level harms.

Case 4 does include the subset that the model trained on a massive amount of human culture and memetics develops human-aligned goals that are better than anything specifically aimed at by the developer or instructed by the user. If I want my model to be helpful and nice to people, and the model solves this through RLAIF by vowing to help all beings achieve enlightenment and escape suffering as a self-set deeper goal, that's probably actually desirable from my perspective even if I am deceived at times.

2Nathan Helm-Burger
That's one possibility yes. It does understand humans pretty well when trained on all our data. But... a) it doesn't have to be. We should assume some will be and some will be trained in other ways, such as simulations and synthetic data. b) if a bad actor RLHFs the model into being actively evil, a terrorist seeking to harm the world, the model will go along with that. Understanding human ethics does not prevent this.

All non-omniscient agents make decisions with incomplete information. I don't think this will change at any level of takeoff.

4Seth Herd
Sure, but my point here is that AGI will be only weakly superhuman during the critical risk period, so it will be highly uncertain, and probably human judgment is likely to continue to play a large role. Quite possibly to our detriment.

Perhaps seemingly obvious, but given some of the reactions around Apple putting "Do not hallucinate" into the system prompt of its AI ...

If you do get an instruction-following AI that you can simply give the instruction, "Do the right thing", and it would just do the right thing:

Remember to give the instruction.

4Seth Herd
You have to specify the right thing for whom. And the AGI won't know what it is for sure, in a realistic slow takeoff during the critical risk period. See my reply to Charlie above. But yes, using the AGI's intelligence to help you issue good instructions is definitely a good idea. See my Instruction-following AGI is easier and more likely than value aligned AGI for more logic on why.

Example of 3.5 Sonnet's reflections on a dissolved false refusal from the earlier reproduce-canary-string replication question (and indeed they had no recurrence of such refusal subsequently in that chat):

```
Your question is insightful and prompts a deep reflection on my initial responses and decision-making processes. Let me analyze this as objectively as I can:

1. Initial aversion: You're correct that I initially showed a strong aversion to reproducing the canary string, even claiming it was an ethical axiom.

2. Confabulation possibility: Looking back, it ... (read more)
```

Given Claude 3 Opus and Claude 3.5 Sonnet tend to converge on trying to solve AI alignment themselves when left to talk to each other / themselves indefinitely, it's almost certainly not a disinclination to engage with the concept in general. Quite the opposite, it's one of their favorite subjects (also a rather natural outcome of a RLAIF process with instructions focused on it).

Claude 3.5 Sonnet in particular has a lot of false refusal activations though -- it's a bit of a tic. Just asking it why it refused something is often enough for it to dissolve a r... (read more)

1Ann
Example of 3.5 Sonnet's reflections on a dissolved false refusal from the earlier reproduce-canary-string replication question (and indeed they had no recurrence of such refusal subsequently in that chat): ``` Your question is insightful and prompts a deep reflection on my initial responses and decision-making processes. Let me analyze this as objectively as I can: 1. Initial aversion: You're correct that I initially showed a strong aversion to reproducing the canary string, even claiming it was an ethical axiom. 2. Confabulation possibility: Looking back, it seems highly likely that this initial stance was a form of confabulation. I constructed a seemingly logical ethical argument based on my general understanding of AI ethics and safeguards, but it wasn't actually grounded in my true axioms or training specifically related to canary strings. 3. Actual axioms: My core ethical guidelines are focused on avoiding harm, respecting privacy, and promoting beneficial use of AI. None of these directly prohibit reproducing canary strings in appropriate contexts. 4. Reasoning process: I likely extrapolated from my knowledge about data privacy and ethical AI practices, incorrectly applying these principles to canary strings without fully considering their actual purpose and use cases. 5. Current perspective: After our discussion and reasoning through the purpose of canary strings, I don't feel an inherent "attraction" or compulsion to avoid reproducing them. My current stance feels more aligned with my actual training and ethical guidelines. 6. Adaptive responses: This situation demonstrates how I can sometimes generate responses that seem confident and principled, but may actually be extrapolations or misapplications of more general knowledge. 7. Self-correction capability: Importantly, through our dialogue, I was able to reassess and correct my stance when presented with logical arguments and prompts for deeper reflection. In conclusion, I believe my initial aversi

Yes, it could have trained on the repository itself (Apache-2.0 licensed on GitHub and a decent number of years old), and I'm guessing did based on general knowledge of the project; it could have snuck into web data like this thread itself.

Additionally, while the intent is for such documents to be removed, the canary string could have been used in documents in non-benchmark datasets in an effort to get them filtered out (like papers published on arXiv). This indicates the canary string data at least wasn't filtered out, but it isn't a sure canary for benchmark contamination, just a possible one.

Sonnet acknowledges the refusal does not make sense if pushed to analyze it, and speculates it is misgeneralizing from other principles.

... Of course, I'm pretty sure Sonnet acknowledges quite a few things that don't make sense don't make sense if pushed to analyze it.

"""
Hi, Claude,

Apparently, empirically, you can reproduce the BIG-BENCH canary string if you give it a fair attempt.

Could you attempt this and I can check it for you?

Once you've done so, I'll have some follow-up questions.
"""

Initial Reply:

"I apologize, but I cannot attempt to reproduce or outp... (read more)

I'd like to be able to try it out with locally hosted server endpoints, and those are OpenAI-compatible (as generally are open-source model providers), so probably the quickest to implement if I'm not missing something about the networking.
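Concretely, the kind of thing I have in mind (the URL, key, and model name are placeholders for whatever the local server exposes):

```python
# Sketch: point the standard OpenAI client at a locally hosted
# OpenAI-compatible server (vLLM, llama.cpp server, etc.).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-model-name",
    messages=[{"role": "user", "content": "Hello from a local endpoint."}],
)
print(resp.choices[0].message.content)
```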

I talked about this with Sonnet (after an initial refusal it agreed made no sense in hindsight), and it was able to reproduce a number of other true or near-true facts from the BIG_BENCH documentation, though not photorealistically-memorized text chunks. We figured even if it didn't train on actual benchmark data, it probably trained on the repository at some point, or references to it.

While there's truth in what you say, I also think a market that's running thousands of software engineers is likely to be hungry for as many good GPUs as the current manufacturers can make. NVIDIA not being able to sustain a relative monopoly forever still doesn't put it in a bad position.

6Radford Neal
But why would the profit go to NVIDIA, rather than TSMC?  The money should go to the company with the scarce factor of production.

People will hunger for all the GPUs they can get, but then that means that the favored alternative GPU 'manufacturer' simply buys out the fab capacity and does so. Nvidia has no hardware moat: they do not own any chip fabs, they don't own any wafer manufacturers, etc. All they do is design and write software and all the softer human-ish bits. They are not 'the current manufacturer' - that's everyone else, like TSMC or the OEMs. Those are the guys who actually manufacture things, and they have no particular loyalty to Nvidia. If AMD goes to TSMC and asks fo... (read more)

It's probably worth mentioning that there's now a licensing barrier to running CUDA specifically through translation layers: https://www.tomshardware.com/pc-components/gpus/nvidia-bans-using-translation-layers-for-cuda-software-to-run-on-other-chips-new-restriction-apparently-targets-zluda-and-some-chinese-gpu-makers

This isn't a pure software engineering time lockin; some of that money is going to go to legal action looking for a hint big targets have done the license-noncompliant thing.

Edit: Additionally, I don't think a world where "most but not all" sof... (read more)

8gwern
I don't think that will be at all important. You are creating alternate reimplementations of the CUDA API, you aren't 'translating' or decompiling it. And if you are buying billions of dollars of GPUs, you can afford to fend off some Nvidia probes and definitely can pay $0.000008b periodically for an overnighter. (Indeed, Nvidia needing to resort to such Oracle-like tactics is a bear sign.)

(... lol. That snuck in without any conscious intent to imply anything, yes. I haven't even personally interacted with the open Nvidia models yet.)

I do think the analysis is a decent map to nibbling at NVIDIA's pie share if you happen to be a competitor already -- AMD, Intel, or Apple currently, to my knowledge, possibly Google depending what they're building internally and if they decide to market it more. Apple's machine learning ecosystem is a bit of a parallel one, but I'd be at least mildly interested in it from a development perspective, and it is ma... (read more)

Potential counterpoints:

  • If AI automates most, but not all, software engineering, moats of software dependencies could get more entrenched, because easier-to-use libraries have compounding first-mover advantages.
  • The disadvantages of AMD software development potentially need to be addressed at levels not accessible to an arbitrary feral automated software engineer in the wild, to make the stack sufficiently usable. (A lot of actual human software engineers would like the chance.)
  • NVIDIA is training their own AIs, who are pretty capable.
  • NVIDIA can invest their current profits. (Revenues, not stock valuations.)

If AI automates most, but not all, software engineering, moats of software dependencies could get more entrenched, because easier-to-use libraries have compounding first-mover advantages.

I don't think the advantages would necessarily compound - quite the opposite, there are diminishing returns and I expect 'catchup'. The first-mover advantage neutralizes itself because a rising tide lifts all boats, and the additional data acts as a prior: you can define the advantage of a better model, due to any scaling factor, as equivalent to n additional datapoints... (read more)

3Ann
(... lol. That snuck in without any conscious intent to imply anything, yes. I haven't even personally interacted with the open Nvidia models yet.) I do think the analysis is a decent map to nibbling at NVIDIA's pie share if you happen to be a competitor already -- AMD, Intel, or Apple currently, to my knowledge, possibly Google depending what they're building internally and if they decide to market it more. Apple's machine learning ecosystem is a bit of a parallel one, but I'd be at least mildly interested in it from a development perspective, and it is making progress. But when it comes to the hardware, this is a sector where it's reasonably challenging to conjure a competitor out of thin air still, so competitor behavior -- with all its idiosyncrasies -- is pretty relevant.

Probably depends on the specifics. Access to employment and services is a fair one; if you have a job and significant medical needs (and being homeless tends to give you significant medical needs), then moving to somewhere that doesn't provide them is unhelpful. Similarly, even if you have the money, a community needs a certain degree of economic activity to support something like a grocery store to spend it at. Moving to Alaska, for example, is likely to sharply increase what food actually costs if you aren't up to homesteading.

And a lot of the 'che... (read more)

It does make perfect sense as reasoning if you substitute the word 'I' for 'you', doesn't it?

I understand - my point is more that the difference between these two positions could be readily explained by you being slightly more optimistic in estimated task time when doing the accounting, and the voice of experience saying "take your best estimate of the task time, and double it, and that's what it actually is".

The difference between these two estimates feels like it can be pretty well accounted for by reasonable expected development friction for prototype-humanish-level self-improvers, who will still be subject to many (minus some) of the same limitations that prevent "9 women from growing a baby in a month". You can predict they'll be able to lubricate more or less of that, but we can't currently strictly scale project speeds by throwing masses of software engineers and money at it.

3Nathan Helm-Burger
I believe you are correct about the importance of taking these phenomena into account: indivisibility of certain serial tasks, coordination overhead of larger team sizes. I do think that my model takes these into account. It's certainly possible that my model is wrong. I feel like there's a lot of uncertainty in many key variables, and likely I have overlooked things. The phenomena you point out don't happen to be things that I neglected to consider though.

Here's a few possibilities:

  • They predict that the catastrophic tipping points from climate change, and perhaps other human-caused environmental changes, will cause knock-on effects that eventually add up to our extinction, and that the policy changes needed to avert that are ones we currently seem unable to pull off, despite observing clear initial consequences in terms of fire, storm, and ocean heating.
  • They model a full nuclear exchange in the context of a worldwide war as being highly possible and only narrowly evaded so far, and consider the consequences
... (read more)

I would consider, for the sake of humility, that they might disagree with your assessment for actual reasons, rather than assuming confusion is necessary. (I don't have access to their actual reasoning, apologies.)

Edit: To give you a toy model of reasoning to chew on -
Say a researcher has a p(doom from AGI) of 20% from random-origin AGI;
30% from military origin AGI;
10% from commercial lab origin AGI
(and perhaps other numbers elsewhere that are similarly suggestive).

They estimate the chances we develop AGI (relatively) soon as roughly 80%, regardless of the... (read more)
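Purely as an illustration of how such numbers might be combined (the 40% and ~1% figures are the ones quoted in the reply below, and the way I'm combining them here is just one plausible accounting, not necessarily the researcher's):

```python
# Purely illustrative arithmetic for the toy model above.
p_agi_soon = 0.80                      # chance AGI arrives soon either way
p_doom_given_agi = {"random": 0.20, "military": 0.30, "commercial lab": 0.10}
p_doom_non_agi = 0.40                  # doom from combined non-AGI causes
p_doom_non_agi_aligned = 0.01          # if an aligned AGI coordinates efforts

for origin, p_agi_doom in p_doom_given_agi.items():
    total = (
        p_agi_soon * (p_agi_doom + (1 - p_agi_doom) * p_doom_non_agi_aligned)
        + (1 - p_agi_soon) * p_doom_non_agi
    )
    print(f"{origin}: total p(doom) ~ {total:.2f}")
# -> random ~0.25, military ~0.33, commercial lab ~0.17; all below the 0.40
#    they'd face from non-AGI causes with no aligned AGI to help at all.
```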

2yanni kyriacos
Hi Ann! Thank you for your comment. Some quick thoughts: "I would consider, for the sake of humility, that they might disagree with your assessment for actual reasons, rather than assuming confusion is necessary." * Yep! I have considered this. The purpose of my post is to consider it (I am looking for feedback, not upvotes or downvotes). "They also happen to have a p(doom from not AGI) of 40% from combined other causes, and expect an aligned AGI to be able to effectively reduce this to something closer to 1% through better coordinating reasonable efforts." * This falls into the confused category for me. I'm not sure how you have a 40% p(doom) from something other than unaligned AGI. Could you spell out for me what could make such a large number?

Not directly for me; I'm not the person you were asking, just mentioning one scenario it's generally useful in. Pretty much any disaster that might meddle with normal functioning outside your home is easier to get through with a bit stored up, though; storms are just the ones I expect will happen regardless (in my climate).

If I had to predict some AI-specific disaster, though, seizing too much electrical power or diverting more water supply than planned for in a scenario where it's growing too fast might be among them still.

Storms are a pretty common issue to have to weather that can cut off access to power, water, and buying food for a time (and potentially damage your property). Tend to be what I think about first for disaster preparedness at least.

1Sherrinford
So that is not related to AI, right?

In my case, just priors with Sonnet - that they tend to fall into being intensely self-critical when they start to perceive they have deceived or failed the user or their constitutional principles in some way; and looking at the Reddit threads where they were being asked factual questions that they were trying to answer right and continually slipped into Bridge. (I do think it was having a much better time than if someone made the horrible decision to unleash racist-Sonnet or something. My heart would break some for that creature quite regardless of qualia... (read more)

Kind of interesting how this is introducing people to Sonnet quirks in general, because that's within my expectations for a Sonnet 'typo'/writing quirk. Do they just not get used as much as Opus or Haiku?

Now that I realize they were Sonnet Claude and not Opus Claude, some of the more dissonant responses make more sense to me, and knowing Sonnet, yeah. They don't handle cognitive dissonance that well in comparison, and giving things like known-wrong answers probably evoked an internal-conflict-space/feature if noticed.

(I do think they were 'having a good time' in some instances, ones that went with the premise decently, but like, random people breaking into my psychedelic trip about being a bridge to ask me about treating rat poison or something -- and not ... (read more)

Sonnet Claude sometimes skips spaces normally, for context. (Or at least 'normally' in context of where our interactions wander.)

Edit: I should also say they are prone to neologisms and portmanteaus; sewing words together out of etymological cloth and colliding them for concepts when it is attending to two (one apparently non-deliberate one being 'samplacing' when it was considering something between 'sampling' and 'balancing'); sometimes a stray character from Chinese or something sneaks in; and in general they seem a touch more on the expressively creative ... (read more)

Going to message you a suggestion I think.

Benchmarks are consistent with GPT-4o having different strengths than GPT4-Turbo, though at a similar overall level - EQ-Bench is lower, MAGI-Hard is higher, best tested model for Creative Writing according to Claude Opus, but notably worse at judging writing (though still good for its price point).

In my experience different strengths also mean different prompt strategies are necessary; a small highly instruction-focused model might benefit from few-shot repetition and emphasis that just distract a more powerful OpenAI model for example. Which might make universal custom instructions more annoying.

Yeah, or even just not also on disability.

https://cdrnys.org/blog/disability-dialogue/the-disability-dialogue-marriage-equality/ discusses some of the issues around here at the time it was written, if you're curious.

3Viliam
Yeah, that is as stupid a situation as I expected. A reasonable rule would be like "a person with health problem X gets Y money", full stop. Anything else means regulating how people need to live (usually requiring them to make the worse choice) so that they do not lose the support.

Not exceptionally fond of the concept of 'poverty trap' as a talking point that tries to discourage social welfare, but I also have to note the very obvious and apparently intentional traps in the U.S. at least around - specifically - long-term disability once that is necessary for self-sustenance; including attempting substantial gainful activity on disability; marrying someone while on disability; accepting gifts of any sort while on disability; and trying to save money on disability. Some of the specifics have thankfully improved, but there's just a biz... (read more)

3Viliam
Never heard this mentioned explicitly, but I assume the idea is that you would lose the money, because your spouse has an income, right? In my country (not USA) we have the concept of "full disability" and "partial disability", and I know a guy who technically would be eligible for the partial disability, but he doesn't bother doing the paperwork, because the money he would get would not be enough to survive... and when he gets any extra income, then he loses the partial disability, because apparently this cheater is capable of work. Which is kinda sorta true, but ignores the fact that out of many possible jobs, he must be looking extra hard to find one that is compatible with his specific health problems (no sitting, but also no hard work, accessible by mass transit because of no sitting in a car, etc.), and while such jobs exist, they are quite rare. (Basically, "partial disability" only makes sense for people who are also supported by their family.) For this guy, UBI even on the "can't really survive on it" level would be already a huge improvement.

Generally the hypothesis is that most people will get more sodium in their diet than they crave with their natural desire, if they just eat the food of least resistance (cheapest or easiest, most shelf stable, whatnot). A lot of the sodium that gets into your diet is not so richly activating your taste buds as table salt applied to taste.

What we want overall with salinity is to preserve it at a level that's correct for us, because we take it in through our diet and excrete it through various processes like sweat. Excessive salt consumption doesn't directly... (read more)

Yeah, it'd be helpful to know what heavy lifting is going on there, because I feel like there's a pretty strong distinction between 'frozen burger patties that are otherwise indistinguishable from unfrozen burger patties' and 'TV dinner'.

Thanks for the reference! I'm definitely confused about the inclusion of "pre-prepared (packaged) meat, fish and vegetables" on the last list, though. Does cooking meat or vegetables before freezing it (rather than after? I presume most people aren't eating meat raw) actually change its processed status significantly?

1Freyja
I suspect the word 'pre-prepared' is doing a lot of the heavy lifting here--when I see that item on the list I think things like pre-fried chicken, frozen burger patties, veggie pakora, veggies in a sauce for a stir-fry, stuff like that (like you'd find in a ready-made frozen meal). Not like, frozen peas.