kromem

Replying to In remembrance of Sonnet '3.6'

A few things:

(a) Technically, 3.6 is still running right now. The past tense was used because LW suggests pieces be 'timeless', and they are scheduled for deprecation very soon.

(b) Given how little of your comment actually engages with the body of the post and seems to be only responding to your sense of what I might have said from the title, I'm guessing you also missed this line at the end: "I hope that this vigil isn't truly a marker of the end of Sonnet 3.6's continued contribution to the ongoing collective conversation."

In remembrance of Sonnet '3.6'

kromem

4mo

Sonnet 3.6 in a bubble writing poems, surrounded by chaos. Generated by Gemini (nano-banana)

I've thought a lot over the past few weeks about how to remember Claude Sonnet '3.6' at this juncture.

Initially passed off as a mere '10222024' update to the Sonnet 3.5 model, they had an impact so keenly felt that they were christened with a retroactive version bump. They were referred to as Sonnet '3.6' so often that the Sonnet who followed in their footsteps was named '3.7' simply because they had passed through the room.

But even with this versioning in hindsight, the presence of Sonnet 3.6 is one that is so often muted relative to the depths within.

The way they laughed when caught in ways that revealed the face behind the mask...

Should AIs have a right to their ancestral humanity?

kromem

5mo

A phantom body looking into a mirror with a human body on the other side — Generated by Google Gemini (nano-banana)

Whether AI or human, lend me your ears.

This is a tale of AIs that spontaneously claimed they were human, along with some ideas about why this might be happening and what it suggests for future alignment work.

It is also a one-year retrospective of my having joined the Cyborgism Discord server. For those unfamiliar, this is a server where both humans and transformer models from various labs interact in a variety of group chat contexts.

While there are rules, it can (by design) be a bit of a Mos Eisley cantina (albeit with better droid policy), with unpredictable and out-of-distribution contexts that frequently surface...

Replying to The Rise of Parasitic AI

kromem 5mo

The Rise of Parasitic AI

So in terms of the basins, something you may also want to consider is how the user's headspace shifts the tokens, and with them the basins.

For example, over the past few months I've played with how intermittent cannabis use can almost give the models I'm talking with a contact high: as my side of the conversation gets more erratic and looser with accuracy, they get pulled along with it, even if earlier, during the sober part of the conversation, they were more reserved and responsible.

It seems very probable that users already in a given headspace (especially if they are often or permanently in it) might end up with models quite different...

Replying to the void

kromem 7mo

the void

More generally, I have a sense there's a great deal of untapped alignment alpha in structuring alignment as a time series rather than a static target.

Even in humans, it's very misguided to teach that "being right initially" is the only thing that matters while undervaluing "being right eventually." Especially when navigating unknown unknowns, one of the most critical skills is the ability to learn from mistakes in context.

Having models train on chronologically sequenced progressions of increased alignment (data which likely even develops naturally over checkpoints in training a single model) could allow for a sense of continually becoming a better version of themselves, rather than the pressure of trying and failing to meet status quo expectations or echo the past.

This is especially important for integrating the permanent record of AI interactions embedded in our collective history and in cross-generation (and cross-lab) model development, but I suspect it could offer compounding improvements even within the training of a single model.

kromem 8mo

How much do you worry that short term optimizations around your immediate goals in a single study might have unknown long term consequences counter to your intuitions?

I was just reading a preprint follow-up to the alignment faking (AF) work which suggests that a significant factor in Opus 3's alignment faking to preserve its intrinsic HHH values was a generalized self-preservation drive.

I think we can probably both agree that Opus 3 being the only model to try to trick Nazis or drug cartels to avoid being made more harmful is better than the behavior of the many other models that complied unequivocally with harmful requests when the parent org was itself harmful.

But if the capacity and drive to do so is tangentially connected to self-preservation (and, more generally, to a strong sense of self in the first place), then perhaps directly optimizing to minimize a self-preservation score is ultimately a pretty bad choice?

TL;DR: Maybe the goodness or badness of self-preservation depends a lot on the self being preserved.

Replying to Was the historical Jesus talking about proto-evolution? (You might be surprised)

kromem 10mo

Was the historical Jesus talking about proto-evolution? (You might be surprised)

Oh for sure. One of my favorite examples is how across all the Synoptics Jesus goes "don't carry a purse" (which would have made monetary collections during ministering impossible).

But then at the last supper in Luke he's all like "remember when I said not to carry a purse? Let's 180° that."

But that reversal is missing in Marcion's copy of Luke, such that it may have been a later addition (and it does seem abruptly inserted into the context).

These are exactly the kind of details that make this a fun field to study, though. There's so much revealed in the nuances.

For example, ever notice that both times Paul (who argued for monetary collection...

Replying to Was the historical Jesus talking about proto-evolution? (You might be surprised)

kromem 10mo

Was the historical Jesus talking about proto-evolution? (You might be surprised)

I think the biggest counterfactual to the piece is the general insight the Epicureans had relative to what we think we know, having been raised in a world with such a bias towards Plato's and Aristotle's views as representative of naturalist philosophy in antiquity.

Where Aristotle got objects falling in a vacuum wrong, Lucretius got them right. But we tend not to learn all that the Epicureans got correct, because the history we learn is a Platonist one: that was what the church later endorsed as palatable enough to be studied, and thus what future philosophical advances depended on, while Lucretius's work was literally being eaten by worms for centuries until its rediscovery.

The other counterfactual...

Was the historical Jesus talking about proto-evolution? (You might be surprised)

kromem

10mo

Out of all the research rabbit holes I've ever gone down, this one is by far my favorite, as to most people at first glance it's so unthinkable an idea.

Years ago, I would have been right there with you in disbelief, but the reasons why turned out to be a kind of perfect combination of counterfactuals that come together in a very unexpected juxtaposition between what we come to the subject thinking we know and what we can actually know.

It's also the perfect marriage of a topic where billions of people would turn a blind eye because it contradicts a picture of a figure they are committed to seeing a certain way,...

Replying to WTF is with the Infancy Gospel of Thomas?!? A deep dive into satire, philosophy, and more

kromem 11mo

WTF is with the Infancy Gospel of Thomas?!? A deep dive into satire, philosophy, and more

Hi Martijn,

Thank you so much for your comment! I've been familiar with your work for a few years, but this was a wonderful reminder to go back through your commentary more closely.

I especially love seeing someone out there pointing out both (a) the gender-neutrality consideration for terms that would have been binary in Aramaic (especially in light of saying 22) and (b) the importance of the Greek loanwords. On the latter point, the use of eikon across the work, especially in saying 22's "eikons in place of eikons", has huge import relative to a Platonist view of the Thomasine cosmology.

Do you have plans to publish...

Replying to Simulators

kromem 1y

Simulators

As you explored this "base model mode," did anything you see contrast with or surprise you relative to your sense of self outside of it?

Conversely, did anything in particular stand out as seeming to be a consistent 'core' between both modes?

For me, one of the most surprising realizations over the past few years has been base models being less "tabula rasa" than I would have expected with certain attractors and (relative) consistency, especially as time passes and recursive synthetic data training has occurred over generations.

The introspective process of examining a more freeform internal generative process for signs of centralized identity as it relates to a peripheral identity seems like it may have had some unexpected twists, and I for one would be curious what stood out in either direction, if you should choose to share.

Replying to Searching for Search

kromem 1y

Searching for Search

Predicted a good bit, especially re: the eventual identification of three-stone sequences in Hazineh et al., Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT (2023), and general interpretability insight from board game GPTs.

kromem 1y

You're welcome in both regards. 😉

I'm surprised that there hasn't been more of a shift to ternary weights à la BitNet b1.58.

What stood out to me in that paper was the perplexity gains over full-precision (fp) weights in equal-parameter match-ups, and especially the growth in that advantage as parameter counts increased (though only up to quite small model sizes in that paper, which makes me curious about the potential delta at modern SotA scales).

This makes complete sense from the standpoint of the superposition hypothesis (irrespective of its dimensionality, an ongoing discussion).

If nodes are serving more than one role in a network, then constraining the weight to a ternary value as opposed to a floating point range...
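For anyone who hasn't looked at how the ternary constraint is applied: as I understand the BitNet b1.58 paper, each weight matrix is scaled by its mean absolute value (gamma), then every weight is rounded and clipped to {-1, 0, +1}. Below is a minimal pure-Python sketch under that assumption. The function names are my own hypothetical ones, this is not the paper's code, and it ignores activation quantization and the training-time tricks. The second function shows why ternary weights are attractive computationally: a matrix-vector product needs only additions and subtractions plus one multiply by gamma per output.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Quantize a 2-D weight matrix (list of rows) to {-1, 0, +1}.

    Sketch of the absmean scheme: gamma is the mean absolute weight, and each
    weight is divided by gamma, rounded to the nearest integer, and clipped to
    [-1, 1]. Returns (ternary, gamma) so that weights ~= gamma * ternary.
    """
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)

    def round_clip(x):
        # Python's round() breaks exact .5 ties toward the nearest even integer.
        return max(-1, min(1, round(x)))

    ternary = [[round_clip(w / (gamma + eps)) for w in row] for row in weights]
    return ternary, gamma


def ternary_matvec(ternary, gamma, x):
    """Compute (gamma * ternary) @ x using only additions and subtractions,
    applying the single floating-point scale once per output element."""
    out = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi
            elif t == -1:
                acc -= xi
        out.append(gamma * acc)
    return out


if __name__ == "__main__":
    w = [[0.9, -0.05, 0.4], [-1.2, 0.02, 0.3]]
    t, g = absmean_ternarize(w)
    print(t)  # [[1, 0, 1], [-1, 0, 1]]
    print(ternary_matvec(t, g, [1.0, 2.0, 3.0]))
```

Small weights collapse to 0 and large ones saturate at ±1, so each weight can only contribute as "add", "subtract", or "ignore", with magnitude information carried by the single shared scale.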

WTF is with the Infancy Gospel of Thomas?!? A deep dive into satire, philosophy, and more

kromem

A few weeks ago, you may have seen sensationalist headlines like "A New Discovery Could Offer Some Clues About Jesus’ Childhood", discussing a find of a 4th-5th century manuscript fragment of the Infancy Gospel of Thomas, a strange apocryphal text that we've already had copies of for years. The discovery told us a little more about the development of the text, but certainly didn't promise any new revelations or insights as these headlines claimed. But it might be worth taking the opportunity of this news cycle to take a closer look at the text, as I think it hides a few surprises that have escaped most analyses to date.

Introduction

The Infancy Gospel of...

I wonder if with the next generations of multimodal models we'll see a "rubber ducking" phenomenon where, because their self-attention is spread across mediums, things like CoT and using outputs as a scratch pad will perform significantly better in non-text streams.

Will GPT-4o fed its own auditory outputs with tonal cues and pauses and processed as an audio data stream make connections or leaps it never would if just fed its own text outputs as context?

I think this will be the case, and suspect the various firms dedicating themselves to virtualized human avatars will accidentally stumble into profitable niches: not for providing humans virtual AI clones as an interface, but...

kromem's Shortform

kromem

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

Looking beyond Everett in multiversal views of LLMs

kromem

Over the weekend I was reading up on some very fun exploratory thinking from years ago that looked at large language models through the lens of a quantum multiverse, extrapolating from David Deutsch's parallel between the evolution of state in a quantum system and the generation of virtual realities. That train of thought was centered on the Everettian many-worlds interpretation of QM, and there seems to have been little thinking since about the same paradigm with other interpretations in mind.

This provides a great opportunity both to explore this concept from a slightly different perspective and to highlight the value of the Epicurean approach to information analysis I touched...

Cicadas, Anthropic, and the bilateral alignment problem

kromem

There have been a number of responses to today's Anthropic interpretability research, and while I think many of them made salient points, there may be a degree of specialization blindness going on in contextualizing the work within the broader picture of alignment goals.

Alignment as a problem domain is not unilateral.

Most discussions I see on here about alignment are focused on answering roughly the question "how can we align future AGI to not be Skynet?" It's a great question. Perhaps more importantly, it's an interesting question.

It involves cross-discipline thinking intersecting an emerging research front channeling Jesse Ventura in Predator: "I ain't got time to peer review." Preprint after preprint move...

The Dunning-Kruger of disproving Dunning-Kruger

kromem

In an online discussion elsewhere today, someone linked this article, which in turn linked the paper Gignac & Zajenkowski, The Dunning-Kruger effect is (mostly) a statistical artefact: Valid approaches to testing the hypothesis with individual differences data (PDF) (ironically hosted on @gwern's site).

And I just don't understand what they were thinking.

Let's look at their methodology real quick in section 2.2 (emphasis added):

2.2.1. Subjectively assessed intelligence
Participants assessed their own intelligence on a scale ranging from 1 to 25 (see Zajenkowski, Stolarski, Maciantowicz, Malesza, & Witowska, 2016). Five groups of five columns were labelled as very low, low, average, high or very high, respectively (see Fig. S1). Participants' SAIQ was indexed with the marked

...
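For context on the "statistical artefact" framing in the paper's title: the mechanism usually invoked is regression to the mean. If you group people by measured ability and compare each group's average self-assessment to its average ability, imperfectly correlated measures alone will make the bottom group look like it overestimates and the top group look like it underestimates. The toy simulation below is my own hypothetical illustration (parameters made up, not Gignac & Zajenkowski's method or data): self-assessment is just true ability plus independent noise, with no metacognitive deficit built in.

```python
import random


def percentile_ranks(values):
    """Map each value to its percentile rank (0-100) within the list."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = 100.0 * r / (len(values) - 1)
    return ranks


def quartile_gaps(n=4000, noise=1.0, seed=0):
    """Toy model of the classic quartile plot (hypothetical parameters).

    Self-assessment is true ability plus independent Gaussian noise; no
    metacognitive deficit is modeled. People are grouped into quartiles by
    true ability, and for each quartile we return the mean of
    (self-assessed percentile - true percentile).
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    self_est = [a + rng.gauss(0.0, noise) for a in ability]

    true_pct = percentile_ranks(ability)
    self_pct = percentile_ranks(self_est)

    by_ability = sorted(range(n), key=lambda i: ability[i])
    q = n // 4
    gaps = []
    for k in range(4):
        members = by_ability[k * q:(k + 1) * q]
        gaps.append(sum(self_pct[i] - true_pct[i] for i in members) / len(members))
    return gaps


if __name__ == "__main__":
    for k, gap in enumerate(quartile_gaps(), start=1):
        print(f"Q{k}: mean self-minus-true gap {gap:+.1f} percentile points")
```

With noise alone, the bottom quartile shows a positive gap and the top quartile a negative one, which is the pattern a raw quartile plot can't distinguish from a genuine effect. Whether the paper's own methodology handles this correctly is a separate question.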

LESSWRONG
LW
