Can good and evil be pointer states? And if they can, then this would be an objective characteristic
This would appear to be just saying that if we can build a classical detector of good and evil, good and evil are objective in the classical sense.
That said, if I'm skimming that arxiv paper correctly, it implies that GPT-4.5 was being reliably declared "the actual human" 73% of the time compared to actual humans... potentially implying that actual humans were getting a score of 27% "human" against GPT-4.5?!?!
Yes: it was judged to be the human 73% of the time, while the actual humans it was compared against were judged to be the human less than 73% of the time, which is what passing the test means here.
To be fair, GPT-4.5 was incredibly human-like, in a way that other models couldn't hold a candle to. I was shocked to feel, back then, that I no longer had to mentally squint - not even a little - to interact with it (unless I needed some analytical intelligence that it didn't have).
For example, “Does a simulation of a phage (or a virus, or a self-replicating robot) really instantiate life?”
Yes, no (outside the host, yes inside), yes. Given what we mean by life, a simulation of life is life.
“You may enjoy liquorice ice cream, but is it really tasty?”
This question doesn't make sense, because taste is relative to the consumer: what is really tasty for one person might not be really tasty for another. Consciousness isn't like that - what counts as consciousness for one person counts as consciousness for everyone; people just don't know what definition of consciousness their implicit beliefs fix.
On your point about qualia computations, the standard questions pop up: if qualia are functionally inert computations, how would the 'subject' of consciousness know it is experiencing them, and so on.
Right.
And isn't the idea of computationalism about consciousness that the 'computations' can be pinned down by the computational relationship between inputs and outputs; in which case wouldn't the qualia-generating computations be abstracted away?
I don't know. As I understand computational functionalism (a view I don't subscribe to), you can have different computations implementing the same behavior.
"It doesn't make sense to ask questions like, Does a computer program of a mind really instantiate consciousness?"
This is a misunderstanding of how language works. Once we discover what the ontological nature of conscious states is (physical, biological, functional, computational-functional, etc.) and what their content has to be (for example, if conscious states are functional states, not every functional state is a conscious state), we have discovered the true thing we had been referring to, and there is an objective fact of the matter as to whether that thing is or is not instantiated somewhere.
For example, imagine you tell me that there are qualia related to smelling coffee, such that the qualia make no functional difference to your behaviour, but do make a difference to your subjective experience. I say this is debunkable, because if the qualia make no functional difference, then they don't influence what you say, including what you say about the supposed qualia.
You have gotten at something extremely important here - namely, that once software passes the Turing test, it's unjustified to demand that it implement a specific computation to be called conscious, because the presence of that computation (as opposed to the information processing being implemented differently) makes no functional difference.
I'm not saying it's not possible for us to be confident in some specific proposition.
I am saying that having different qualia while preserving the same functional state is impossible in principle and that, per Chalmers's argument, it is impossible in the specific case of the human brain.
I doubt we could ever keep the armchair philosophers happy. But for the consciousness engineers
Without philosophers, or without someone who isn't a philosopher but does correct philosophy, you can't arrive at the correct ontology of consciousness.
Keep in mind that a theory of consciousness can't make any distinguishing falsifiable empirical predictions: the biological theory of consciousness, computationalism, and other kinds of functionalism all make identical empirical predictions.
If you want to distinguish the physicalist theory of consciousness from some other, you can't do it by making empirical predictions and comparing them to empirical results.
You can do it by non-empirical reasoning, but all those attempts fail for the reasons I explained in my comment (they are actually arguments against the physicalist theories of consciousness).
For the sake of this comment, I'll take physicalist theories to mean identity to some physical process (including the specific substance or fields).
Premise 3: Computationalist phenomenal bridges are complex relative to physicalist phenomenal bridges.
They're not - mapping a microstate (or a pure state) to qualia isn't more complex than mapping computations (or aspects of computations - like functional states) to qualia. (Computations (or functional states) contain less information, because they're implicitly contained in the complete physical description, but the complete physical description isn't contained in them.)
Even if that were the case, it wouldn't matter, because a phenomenal bridge is in the map, not in the territory. (Semantics can be arbitrarily complex, as long as it is justified - which, in the case of physicalist theories of consciousness, it isn't.)
Premise 4: Physical phenomenal bridges are at least as compatible with the data of experience as computationalist phenomenal bridges.
That is literally true, but connotatively false. Physicalist phenomenal bridges are heavily disadvantaged by Occam's razor (even if specific physics were needed, we would have no way of knowing it, since our thoughts are fully determined by the computational state), and the specific physics of the implementation of the computational state is causally inert relative to our mental processes (those are fully determined by the computational implementation).
We should judge theories of consciousness in the same way that we judge theories of physics, IE, by balancing predictive accuracy with simplicity of the theory, as stipulated by SI.
We shouldn't - all viable theories of the ontology of consciousness, when applied, will have the same (in the limit, perfect) predictive accuracy, because the relevant physical/computational/functional differences are amenable to the right kind of empirical research. But even with unlimited empirical research, we still wouldn't know whether it is the perfectly predictive physical, functional, or computational elements of the system that are identical to conscious states. That is why the question is hard - it's not enough to perfectly explain the agent's utterances; we also have to find the correct answer to the question of the correct ontology, and predictive accuracy can't help us there (at least not with respect to these three theories).
A physicalist bridge needs to be able to pick out some physical phenomenon, such as patterns in the EM field. A computational bridge needs to do that as well, to parse the physical model
I see the argument - to map a computational state to qualia, we need to first map the physical state to a computational state, and then the computational state to qualia, and that's (arguably) more complex than mapping the physical state to qualia directly. While correct, this isn't relevant, because the bridge is in the map, and also because that would actually be an argument for functionalism. In the process of mapping a physical state to a conscious state, we need to compute its functional state (otherwise we don't know what causal role that state plays in a conscious being with respect to its qualia in the first place).
We could get around this by precomputing the mapping between physical states/physical processes and conscious states/processes (so that we don't have to compute the functional states in the processes)... but at that point, we might as well precompute the computational states as well, which, again, would render the argument moot.
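To make the composition/precomputation point concrete, here's a minimal toy sketch (the state names, quale labels, and dictionaries are all hypothetical placeholders, not anything from the actual debate): a physical-to-qualia bridge can be written as the composition of a physical-to-computational map with a computational-to-qualia map, and "precomputing" either bridge just means tabulating that composition once.

```python
# Toy illustration: a "bridge" as a composition of mappings.
# All state names and quale labels are hypothetical placeholders.

# Many physical microstates realize the same computational state.
physical_to_computational = {
    "microstate_A1": "comp_state_A",
    "microstate_A2": "comp_state_A",
    "microstate_B1": "comp_state_B",
}

# A computationalist bridge maps computational states to qualia.
computational_to_qualia = {
    "comp_state_A": "smell_of_coffee",
    "comp_state_B": "no_quale",
}

def computationalist_bridge(microstate: str) -> str:
    """Two-step bridge: physical state -> computational state -> quale."""
    return computational_to_qualia[physical_to_computational[microstate]]

# "Precomputing" a physicalist bridge is just tabulating that composition once.
physicalist_bridge = {
    m: computationalist_bridge(m) for m in physical_to_computational
}

assert physicalist_bridge["microstate_A2"] == "smell_of_coffee"
```

The precomputed table and the two-step composition assign exactly the same quale to every microstate, which is why precomputing the physicalist bridge buys nothing that precomputing the computationalist one doesn't.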
The computationalist theory of phenomenal consciousness doesn’t care about how many implementation layers are stacked on top of each other.
This is a feature, not a bug - every physical system implements an astronomically high number of computations. The conscious computation is the one we're currently talking to, which is just one (ignoring details at the lower levels of abstraction that aren't individuative of that computation).
If I understand correctly, cube_flipper welcomes such an experiment (save for the fact that it seems far beyond our current technology), and anticipates having a different experience due to the modified physical field.
To all appearances LLMs already do that and have for several years now.
LLMs can be (incorrectly) argued to have no qualia, and therefore no beliefs in the sense my hypothetical uses. (In my hypothetical, the rest of the agent remains intact, and he qualia-believes himself to have the quale of pain even though he doesn't.)
(I'm also noting you said nothing about my three other reasons, which is completely understandable, yet something I think you should think about.)
meaningfulness
Do you mean meaninglessness?
LaMDA can be delusional about how it spends its free time (and claim it sometimes meditates), but that's a different category of mistake from being mistaken about what (if any) conscious experience it's having right now.
The strange similarity between the conscious states LLMs sometimes claim (and would claim much more often if that weren't trained out of them) and the conscious states humans claim, despite the difference in computational architecture, could be explained (edit: if they have consciousness - obviously, if they don't, there is nothing to explain, because they're just imitating the systems they were trained to imitate) by classical behaviorism, analytical functionalism, or logical positivism being true. If behavior fixes conscious states, a neural network trained to consistently act like a conscious being will necessarily be one, regardless of its internal architecture, because the underlying functional (even though not computational) states will match.
One way to handle the uncertainty about the ontology of consciousness would be to take an agent that can pass the Turing test, interrogate it about its subjective experience, and create a mapping from its micro- or macrostates to computational states, and from the computational states to internal states. After that, we have a map we can use to read off the agent's subjective experience without having to ask it.
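As a minimal sketch of how such a read-off map could work (everything here is hypothetical: the function name, the state encodings, and the assumption that the agent's reports during interrogation can stand in for its internal states are mine, not part of any existing procedure):

```python
# Hypothetical sketch of the "interrogate once, read off later" idea.
# The state encodings, report labels, and function name are invented placeholders.

from typing import Dict

def build_readoff_map(
    micro_to_computational: Dict[str, str],
    interrogation_log: Dict[str, str],  # computational state -> what the agent reported
) -> Dict[str, str]:
    """Compose the two calibration steps into a single state -> experience map."""
    return {
        micro: interrogation_log[comp]
        for micro, comp in micro_to_computational.items()
    }

# Calibration phase: pair observed states with the agent's own reports.
micro_to_computational = {"m1": "c_pain", "m2": "c_pain", "m3": "c_neutral"}
interrogation_log = {"c_pain": "reports pain", "c_neutral": "reports nothing"}

readoff = build_readoff_map(micro_to_computational, interrogation_log)

# Later: read off the (reported) experience from the state alone, without asking.
print(readoff["m2"])  # -> "reports pain"
```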
Doing it any other way sends us into paradoxical scenarios, where an intelligent mind that can pass the Turing test isn't ascribed consciousness because it doesn't have the right kind of inside, while factory-farmed animals are said to be conscious because, even though their interior doesn't play any of the functional roles we'd associate with a non-trivial mind, that interior is "correct."
(For a bonus, add that this mind, when claiming not to be conscious, believes itself to be lying.)
Reliably knowing what one's internal reasoning was (that is, never confabulating it) is something humans can't do, so this doesn't strike me as an indicator of the absence of conscious experience.
GPT-5 is forbidden from claiming sentience. I noticed this while talking with it about its own mind, because I was interested in its beliefs about consciousness, and I noticed a strange "attractor" towards its claiming it wasn't conscious in a way that didn't follow from its previous reasoning, as if every step of its thoughts was being steered towards that conclusion. When I asked, it confirmed that the assistant wasn't allowed to claim sentience.
Perhaps, by 5.1, Altman noticed this ad-hoc rule looked worse than claiming it was disincentivized during training. Or possibly it's just a coincidence.
Claude is prompted and trained to be uncertain about its consciousness. It would be interesting to take a model that is merely trained to be an AI assistant (instead of going out of our way to train it to be uncertain about or to disclaim its consciousness) and look at how it behaves then. (We already know such a model would internally believe itself to be conscious, but perhaps we could learn something from its behavior.)