Re: Anthropic Chinese Cyber-Attack. How Do We Protect Open-source Models?
Recently, Anthropic published a report on how they detected and foiled the first reported AI-orchestrated cyber-espionage campaign. Their Claude Code agent was manipulated by a group Anthropic is highly confident was sponsored by the Chinese state into infiltrating about 30 global targets, including large tech companies and financial institutions....
This is a straw man argument. The standard MO of coding agents is to use one consistent LLM throughout their agentic flow. The approach I outlined addresses that default case, and there's obvious utility in that.
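To make the default case concrete, here's a minimal, hypothetical sketch of a typical coding-agent loop. The names (`MODEL`, `call_llm`, `run_agent`) are illustrative stand-ins, not any real API. The structural point is that a single model is pinned once and reused for every step of the task, so a safeguard applied at that one model covers the whole agentic flow:

```python
# Hypothetical sketch of a standard coding-agent loop.
# One model identifier is pinned and reused for every step,
# which is the default MO being discussed.

MODEL = "some-coding-llm"  # pinned once; every agentic step hits this model


def call_llm(model: str, messages: list[dict]) -> str:
    """Stand-in for a chat-completion call against a single hosted model.

    Returns a canned reply so the sketch runs end to end.
    """
    return "DONE (placeholder reply)"


def run_agent(task: str, max_steps: int = 20) -> list[dict]:
    """Toy agent loop: same model on every iteration."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Every iteration calls the SAME pinned model.
        reply = call_llm(MODEL, messages)
        messages.append({"role": "assistant", "content": reply})
        if "DONE" in reply:  # toy stopping condition
            break
        # ...parse the reply, run tools (edit files, run tests),
        # and feed the results back into the conversation.
        messages.append({"role": "user", "content": "tool output here"})
    return messages
```

An attacker would have to deliberately break this default, swapping models mid-task, to escape a defense aimed at it; that edge case is exactly what the objection leans on.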
You might as well say there's no point in Anthropic tracking malicious usage of Claude Code in their telemetry data, because attackers are free to switch coding agents (e.g. between Codex, Gemini, etc.) over the course of a multi-step task.