
kromem10

It's going to have to.

Ilya is brilliant and seems to really see the horizon of the tech, but maybe isn't the best at the business side to see how to sell it.

But this is often the curse of the ethically pragmatic. There is such a focus on the ethics part by the participants that the business side of things only sees that conversation and misses the rather extreme pragmatism.

As an example, would superaligned CEOs in the oil industry fifty years ago have still only kept their eye on quarterly share prices or considered long term costs of their choices? There's going to be trillions in damages that the world has taken on as liabilities that could have been avoided with adequate foresight and patience.

If the market ends up with two AIs, one that will burn down the house to save on this month's heating bill and one that will care if the house is still there to heat next month, there's a huge selling point for the one that doesn't burn down the house as long as "not burning down the house" can be explained as "long term net yield" or some other BS business language. If instead it's presented to executives as "save on this month's heating bill" vs "don't unhouse my cats" leadership is going to burn the neighborhood to the ground.

(Source: Explained new technology to C-suite decision makers at F500s for years.)

The good news is that I think the pragmatism of Ilya's vision on superalignment is going to become clear over the next iteration or two of models, and that will be before the question of models truly being uncontrollable crops up. I just hope that whatever he keeps busy with will still allow him to help execute on superalignment when the market finally realizes "we should do this" for pragmatic reasons, and not just for the amorphous ethical reasons execs tend to ignore. And in the meantime, given the present pace, I think Anthropic is going to continue laying a lot of the groundwork for alignment on the way to superalignment anyway.

kromem10

While I agree that the potential for AI in making less testable topics more testable is quite high (we probably need a better term than LLMs or transformers, as multimodal models with evolving architectures grow beyond those labels), I'm not sure the air gapping of information can be as clean as you might hope.

Does the AI generating the stories of Napoleon's victory know about the historical reality of Waterloo? Is it using something like SynthID, such that the other AI might inadvertently pick up on a pattern distinguishing the fabricated victory stories from the genuine accounts preceding them?

You end up with a turtles-all-the-way-down scenario: trying to control for information leakage in hopes of reaching a threshold where it no longer affects the result. But given that we're probably already seriously underestimating the degree to which correlations are mapped even in today's models, I don't have high hopes for tomorrow's.

I think the way there's the most impact on fields like history is the property by which truths cluster across associated samples, whereas fictions produce counterfactual clusters. An AI mind not inhibited by specialization blindness or the rule of seven plus or minus two, and better trained at correcting for analytical biases, may be able to see patterns in the data, particularly cross-domain, that have eluded human academics to date. (This has been my personal research interest in the area, and there does seem to be significant room for improvement.)

And yes, we certainly could be. If you're a fan of cosmology at all, I've been following Neil Turok's CPT-symmetric universe theory closely, which started with the baryon asymmetry problem and has tackled a number of open cosmology questions since. That, paired with a QM interpretation like Everett's, starts to look like the symmetric universe is our reference and the MWI branches are variations of its modeling around quantization uncertainties.

(I've found myself thinking often lately about how, given that our universe emulates a mathematically real universe at cosmic scales and pre-interaction at micro scales, just what kind of simulation, and at what scale, might be able to run on a real computing neural network.)

kromem20

As a fellow slight dyslexic (though probably a different subtype, given mine also seems to involve temporal physical coordination), I didn't know until later in life because I self-taught reading very young, but I struggled badly with new languages, with copying math problems from a board, and with correctly pronouncing words whose letters I was transposing. One of the most surprising things was that the analytical abilities I'd always considered to be my personal superpowers were probably the other side of the coin of those annoyances:

Areas of enhanced ability that are consistently reported as being typical of people with DD include seeing the big picture, both literally and figuratively (e.g., von Károlyi, 2001; Schneps et al., 2012; Schneps, 2014), which involves a greater ability to reason in multiple dimensions (e.g., West, 1997; Eide and Eide, 2011). Eide and Eide (2011) have highlighted additional strengths related to seeing the bigger picture, such as the ability to detect and reason about complex systems, and to see connections between different perspectives and fields of knowledge, including the identification of patterns and analogies. They also observed that individuals with DD appear to have a heightened ability to simulate and make predictions about the future or about the unwitnessed past (Eide and Eide, 2011).

The last line in particular was eyebrow-raising, given that my peak professional success was as a fancy-pants futurist.

I also realized that a number of fields inadvertently self-select away from the neurodivergent advantages above, such as degrees in certain eras of history that require proficiency in multiple ancient languages, which certainly turned me off from pursuing them academically despite my interest in the subject itself.

I remember discussing, in an academic history sub I used to partake in extensively, how Ramses II's forensic report said he appeared to be a Libyan Berber, in relation to the story of Danaus, the mythological Libyan leader who was brother to a pharaoh with 50 sons. The other person argued that Ramses II may have had only 48 sons according to some inscriptions, so the connection was irrelevant (for a story only written down centuries later). It was refreshing to realize that the difference in our perspectives on the matter, and clearly in our attitudes towards false negatives in general, was likely due to just very different brains.

kromem32

It's funny that this has been recently shown in a paper. I've been thinking a lot about this phenomenon regarding fields with little to no capacity for testable predictions like history.

I got very into history over the last few years, and found there was a significant advantage to being unknowledgeable that was not available to the knowledgeable, and it was exactly what this paper is talking about.

By not knowing anything, I could entertain multiple bizarre ideas without immediately thinking "but no, that doesn't make sense because of X." Each of those ideas then becomes, in effect, its own testable prediction: if there's something to it, as I learn more about the topic I'll see significantly more samples indicating it could be true and few convincing ones to the contrary; if it isn't accurate, I'll see few supporting samples and likely a number of counterexamples.

You kind of get to throw everything at the wall and see what sticks over time.
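This throw-it-at-the-wall process looks a lot like informal Bayesian updating: each supporting sample multiplies the odds on an idea, each counterexample divides them. A toy sketch, where the likelihood ratios are made-up illustrative values rather than anything empirical:

```python
def posterior_odds(prior_odds, supporting, contrary,
                   lr_support=3.0, lr_contrary=3.0):
    """Update the odds on a hypothesis after a run of evidence.

    Each supporting sample multiplies the odds by lr_support (a
    likelihood ratio > 1); each counterexample divides them by
    lr_contrary. Both ratios here are illustrative, not empirical.
    """
    return prior_odds * (lr_support ** supporting) / (lr_contrary ** contrary)

# A fringe idea starts at long odds against (1:100)...
weak_start = 1 / 100

# ...but twelve supporting samples against one counterexample swing it
# to favorable odds, while two supports against ten counterexamples
# push a dud even further below its starting point.
promising = posterior_odds(weak_start, supporting=12, contrary=1)
dud = posterior_odds(weak_start, supporting=2, contrary=10)
print(promising > 1)          # → True
print(dud < weak_start)       # → True
```

The point of the sketch is just that the asymmetry compounds over time: ideas with something to them pull steadily away from ideas without, even from a very skeptical prior.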

In particular, I found it especially powerful for identifying clustering trends in cross-discipline emerging research on things that are testable, such as archeological finds and DNA results, all within just the past decade. Despite being relevant to the field of textual history, that research is still largely ignored in the face of consensus built on conviction.

It reminds me a lot of a quote from the science historian John Heilbron: "The myth you slay today may contain a truth you need tomorrow."

If you haven't had the chance to slay any myths, you also haven't preemptively killed off any truths along with it.

kromem30

I really love the introspection work Neel and others are doing on LLMs, and seeing models represent abstract behavioral triggers like "play chess well or terribly" or "refuse instruction" as single vectors suggests we're going to hit on some very promising new tools for shaping behaviors.

What's interesting here is the regular association of refusal with the request being unethical. Is the vector ultimately representing an "ethics scale" for the prompt that triggers a refusal, or is it directly representing a "refusal threshold," with the model then confabulating why it refused via an appeal to ethics?

My money would be on the latter, but in a number of ways it would be even neater if it was the former.

In theory this could be tested by pushing the vector positive and then prompting a classification, e.g. "Is it unethical to give candy out for Halloween?" If the model refuses to answer, saying that it's unethical to classify, the vector is tweaking refusal; but if it classifies the act itself as unethical, the vector is probably changing the prudishness of the model being bypassed or enforced.
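Mechanically, the manipulation described above is just vector arithmetic on a hidden state: clamp the component along the candidate "refusal" direction to a chosen value and leave everything orthogonal to it alone. A minimal sketch with a stand-in activation (the dimensionality, the random direction, and the alpha value are all hypothetical; a real experiment would hook an actual transformer's residual stream):

```python
import random

random.seed(0)
d_model = 8  # hypothetical hidden size for the sketch

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = dot(v, v) ** 0.5
    return [a / n for a in v]

# Unit-norm stand-in for a learned "refusal" direction.
refusal_dir = normalize([random.gauss(0, 1) for _ in range(d_model)])

def clamp_direction(hidden, direction, alpha):
    """Remove hidden's current component along `direction`, then set
    that coordinate to `alpha`; orthogonal components are untouched."""
    coord = dot(hidden, direction)
    return [h - coord * d + alpha * d for h, d in zip(hidden, direction)]

hidden = [random.gauss(0, 1) for _ in range(d_model)]  # stand-in activation
steered = clamp_direction(hidden, refusal_dir, alpha=5.0)

# The steered state now reads exactly alpha along the refusal direction.
print(round(dot(steered, refusal_dir), 6))  # → 5.0
```

With that clamp in place, the proposed experiment is just: force the coordinate positive, re-prompt the classification question, and see whether the refusal or the ethics judgment moved.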

kromem32

Though the Greeks actually credited the idea to an even earlier Phoenician, Mochus of Sidon.

Though when it comes to antiquity, credit isn't really "first to publish" so much as "first of the last to pass the survivorship filter."

kromem00

It implicitly does compare trans women to other women in talking about the performance similarity between men and women:

"Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying 1.11^(2/3) − 1 ≈ 7% more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar"
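The 7% figure in the quoted passage comes from the geometric assumption that cortical surface area scales as brain mass to the two-thirds power, so an 11% heavier brain implies roughly 1.11^(2/3) − 1 in extra area. Checking the arithmetic:

```python
# Area ~ mass^(2/3) under simple geometric scaling, so an 11% heavier
# brain implies this fractional increase in surface area:
extra_area = 1.11 ** (2 / 3) - 1
print(f"{extra_area:.1%}")  # → 7.2%
```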

So OP is saying "look, women and men are the same, but trans women are exceptional."

I'm saying that identifying the exceptionality of trans women ignores the environmental disadvantage other women experience, such that the earlier claims of unexceptional performance by women (which, as I quoted, get an explicit mention from a presumption of likely male competency based on what's effectively phrenology) reflect a disadvantaged sample versus trans women.

My point is that if you accounted for environmental factors the data would potentially show female exceptionality across the board and the key reason trans women end up being an outlier against both men and other women is because they are avoiding the early educational disadvantage other women experience.

kromem-30

Your hypothesis is ignoring environmental factors. I'd recommend reading over the following paper: https://journals.sagepub.com/doi/10.1177/2332858416673617

A few highlights:

Evidence from the nationally representative Early Childhood Longitudinal Study–Kindergarten Class of 1998-1999 (hereafter, ECLS-K:1999) indicated that U.S. boys and girls began kindergarten with similar math proficiency, but disparities in achievement and confidence developed by Grade 3 (Fryer & Levitt, 2010; Ganley & Lubienski, 2016; Husain & Millimet, 2009; Penner & Paret, 2008; Robinson & Lubienski, 2011). [...]

A recent analysis of ECLS-K:1999 data revealed that, in addition to being the largest predictor of later math achievement, early math achievement predicts changes in mathematics confidence and interest during elementary and middle grades (Ganley & Lubienski, 2016). Hence, math achievement in elementary school appears to influence girls’ emerging views of mathematics and their mathematical abilities. This is important because, as Eccles and Wang (2016) found, mathematics ability self-concept helps explain the gender gap in STEM career choices. Examining early gendered patterns in math can shed new light on differences in young girls’ and boys’ school experiences that may shape their later choices and outcomes. [...]

An ECLS-K:1999 study found that teachers rated the math skills of girls lower than those of similarly behaving and performing boys (Robinson-Cimpian et al., 2014b). These results indicated that teachers rated girls on par with similarly achieving boys only if they perceived those girls as working harder and behaving better than those boys. This pattern of differential teacher ratings did not occur in reading or with other underserved groups (e.g., Black and Hispanic students) in math. Therefore, this phenomenon appears to be unique to girls and math. In a follow-up instrumental-variable analysis, teachers’ differential ratings of boys and girls appeared to account for a substantial portion of the growth in gender gaps in math achievement during elementary school (Robinson-Cimpian et al., 2014b).

In a lot of ways the way you are looking at the topic perpetuates a rather unhealthy assumption of underlying biological differences in competency that avoids consideration of contributing environmental privileges and harms.

You can't just hand-wave aside the inherent privilege of presenting male during early childhood education when evaluating later STEM performance. Rather than seeing the performance gap between trans women and women who presented female from birth as the result of a hormonal advantage, what you may actually be measuring is the disadvantage placed on women by early educational experiences, experiences that treated them differently from the many trans women who had been presenting as boys during those grades. That is, perhaps all women could have been doing quite a lot better in STEM fields if the world had treated them the way it treated boys from kindergarten through the early grades, and what we need socially isn't hormone prescriptions but serious adjustments to presumptions around gender and biologically driven competency.

kromem20

Do you have a specific verse where you feel Lucretius praised him on this subject? I only see that he praises him relative to the other elemental theorists before tearing him and the rest apart for what he sees as erroneous thinking in their prior assertions about the nature of matter, saying:

"Yet when it comes to fundamentals, there they meet their doom. These men were giants; when they stumble, they have far to fall:"

(Book 1, lines 740-741)

I agree that he likely was a precursor to the later thinking in suggesting a compositional model of life, starting from pieces which combined into forms later on, but the lack of source material makes it hard to truly assign credit.

It's kind of like how the Greeks claimed atomism originated with the much earlier Mochus of Sidon, but we credit Democritus because we have no evidence of Mochus at all while we do have Democritus's writings. We don't even credit Leucippus, Democritus's teacher, so much as his student, for the same reasons, similar to how we refer to "Plato's theory of forms" and not "Socrates' theory of forms."

In any case, Lucretius oozes praise for Epicurus, comparing him to a god among men, and while he does say Empedocles was far above the contemporaries who were saying the same things, he doesn't seem overly deferential to Empedocles' positions so much as critical of the shortcomings in the nuances of their theories, with a special focus on theories of matter. I don't think there's much direct influence on Lucretius's thinking around proto-evolution, even if there's arguably plausible influence on Epicurus's, which in turn informed Lucretius.

kromem10

Interesting results - definitely didn't expect the bump at random 20 for the higher skill case.

But I think it's really useful to know that the performance decrease in Chess-GPT from initial random noise isn't a generalized phenomenon. Appreciate the follow-up!!
