I want to make a game. A video game. A really cool video game. I've got an idea and it's going to be the best thing ever.
A video game needs an engine, so I need to choose one. Or build one? Commercial engines are big and scary, indie engines might limit my options, making my own engine is hard - I should investigate all of these options to make an informed choice. But wait, I can't make a decision like that without having a solid idea of the features and mechanics I want to implement. I need a comprehensive design doc with all of that laid out. Now should I have baked or dynamic lighting...
13-year-old omegastick didn't get much done.
I want to make a game. A video game. A really cool video game. I've got an idea and it's going to be the best thing ever.
But I tried that before and never got around to actually building it.
Okay, there's an easy fix for that, let's open Vim and get to typing. Drawing some sprites to the screen: not so hard. Mouse and keyboard input. Audio. Oof, UI, there goes three weeks. All right, at least we're ready for the core mechanics now. Oh, my graphics implementation is naive and performance is ass. No problem, I know how to optimize that, I've just got to untangle the spaghetti around GameEntityFactory. Wait, there's a memory leak?
20-year-old omegastick didn't get much done.
100%. A good test suite is worth its weight in gold.
I'd be interested to see a write-up of your experience doing this. My own experience with spec-driven development hasn't had so much success. I've found that the models tend to have trouble sticking to the spec.
In this scenario, are you not also paying uniquely little attention to your surroundings (and thus just as unlikely to spot the bill)?
It feels a little like begging the question to apply that modifier to other people in the scenario, but not yourself.
System design is one part of designing software, but isn't so much what I'm trying to point at here.
Claude Opus 4.5 still can't produce or follow a simple plan to implement a feature on a mid-sized codebase independently.
As an example: earlier today I was implementing session resumption, where a client reconnects to the server after losing its connection. One small part of this task is re-syncing state once the current (server-side) task has finished.
Claude Code was not capable of designing a functioning solution to this problem in its planning mode (it kept trying to sync the state immediately upon connecting, leading to the client missing the result of the in-progress task).
The solution I chose for this specific instance of the problem was to add a state sync command to the server's command queue for that session when a client reconnects. Claude Code correctly updated the plan to show the exact code changes required.
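For concreteness, here is a minimal sketch of that approach in Python. All of the names here (Session, RunTask, SyncState, the asyncio queue) are hypothetical, not from the actual codebase; the only point is that the sync is enqueued behind any in-flight work rather than performed at reconnect time.

```python
# Sketch of the reconnect/state-sync idea, under assumed (hypothetical) names.
# The sync rides the same per-session command queue as ordinary tasks, so it
# can never overtake the result of a task that was already running.
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class RunTask:
    """An ordinary unit of server-side work for this session."""
    name: str


@dataclass
class SyncState:
    """A request to push the session's current state to a client."""
    client_id: str


Command = RunTask | SyncState


@dataclass
class Session:
    send: Callable[[str, dict], Awaitable[None]]
    commands: "asyncio.Queue[Command]" = field(default_factory=asyncio.Queue)
    state: dict = field(default_factory=dict)

    async def on_reconnect(self, client_id: str) -> None:
        # Don't push state here: a task may still be in flight and the client
        # would miss its result. Enqueue a sync so it runs after that task.
        await self.commands.put(SyncState(client_id))

    async def worker(self) -> None:
        while True:
            cmd = await self.commands.get()
            if isinstance(cmd, SyncState):
                await self.send(cmd.client_id, dict(self.state))
            else:
                # Stand-in for real work: mutate session state in order.
                self.state[cmd.name] = "done"
```

The key detail is that `on_reconnect` only enqueues; because the sync shares the queue with ordinary commands, ordering relative to the in-progress task comes for free.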
However, when implementing the plan, it forgot to actually make the relevant change to add the command to the queue. End-to-end tests caught this, and Claude's solution was to automatically do a state sync after every task. It did not implement what was written in the plan. I gave it a nudge to re-read the plan, which was enough to make it see the mistake and correct it.
Compared with asking a human co-worker to make the same change, the difference is stark. We are still some way off from superhuman coders.
If I use my IDE's LSP functions to do a large automated refactor, is the IDE better than me at coding?
There are many more elements to "coding" than "writing code", chief among them software design. As a software engineer I use Claude Code daily (I write maybe 1% of my total LOC by hand these days), but I still have to steer it: tell it which architecture to use, which abstractions to build, correct it when it tries to use a shortcut instead of solving a problem at the root, and so on.
When it can produce PRs which would pass code review on a competent software engineering team without that steering, we will have a superhuman coder.
I enjoyed the article and think it points at some important things, but agree with Stephen that it might not point to a useful distinction.
Purely anecdotally: I don't get absorbed into books easily (I very much enjoy reading, but don't get the level of immersion you describe), feel emotional conflict as two distinct feelings or thoughts warring in my mind, can have IFS conversations, etc. but am absolutely hopeless at multi-tasking, dividing my attention, etc.
Meanwhile, my wife is the polar opposite. She gets immersed in books, feels one emotion at a time, empathizes compulsively, etc. but is great at multi-tasking.
Maybe the threaded model just doesn't apply to multi-tasking, but that would surprise me. I would expect multi-tasking to be an obvious benefit of having a "multi-threaded" brain.
Then we'll need a "thought process tampering awareness" evaluation.
if AIs were completing 1 month long self contained software engineering tasks (e.g. what a smart intern might do in the first month)
This doesn't seem like a good example to me.
The sort of tasks we're talking about are extrapolations of current benchmark tasks, so it's more like: what a programming savant with almost no ability to interact with colleagues or search out new context might do in a month given a self-contained, thoroughly specced and vetted task.
I expect current systems will naively scale to that, but not to the abilities of an arbitrary intern because that requires skills that aren't tested in the benchmarks.
I think that Chinchilla provides a useful perspective for thinking about neural networks (it certainly turned my understanding on its head when it was published), but it is not the be-all and end-all of understanding neural network scaling.
The Chinchilla scaling laws are fairly specific to the supervised/self-supervised learning setup. As you mentioned, the key insight is that with a finite dataset, there's a point where adding more parameters doesn't help because you've extracted all the learnable signal from the data (and vice versa: past a point, more data doesn't help a model that's too small to absorb it).
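For reference, the parametric loss fit from the Chinchilla paper makes that tradeoff explicit (the constants below are the rough published fits, quoted from memory):

$$L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where $N$ is parameter count, $D$ is training tokens, $E$ is the irreducible loss, and the fitted exponents are roughly $\alpha \approx 0.34$ and $\beta \approx 0.28$. Once one of the two terms dominates, spending more on the other resource buys almost nothing, which is where the familiar ~20-tokens-per-parameter compute-optimal rule of thumb comes from.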
However, RL breaks that fixed-dataset assumption. For example, on-policy methods have a constantly shifting data distribution, so the concept of "dataset size" doesn't really apply.
There certainly are scaling laws for RL, they just aren't the ones presented in the Chinchilla paper. The intuition that compute allocation matters and different resources can bottleneck each other carries over, but the specifics can differ quite significantly.
And then there are evolutionary methods.
Personally, I find that the "parameters as pixels" analogy captures a more general intuition.