Three things I notice about your question:
One, writing a good blog post is not the same task as running a good blog. The latter is much longer-horizon, and the quality of the blog posts (subjectively, from the human perspective) depends on it in important ways. Much of the interest value of Slate Star Codex, or the Sequences - for me, at least - was in the sense of the blogger's ideas gradually expanding and clarifying themselves over time. The dense, hyperlinked network of posts referring back to previous ideas across months or years is something I doubt current LLM instances have the 'lifespan' to replicate. How long would an LLM blogger who posted once a day be able to remember what they'd already written, even with a 200k token context window? A month, maybe? You could mitigate this with a human checking over the outputs and consciously managing the context, but then it's not a fully LLM blogger anymore - it's just AI writing scaffolded by human ideas, which people are already doing.
The same is maybe true of a forum poster or commenter, though the expectation that the ideas add up to a coherent worldview is much less strict. I'm not sure why there aren't more of these. Maybe because when people want to know Claude's opinion on such-and-such post, they can just paste it into a new instance to ask the model directly?
Two, the bad posting quality might not just be a limitation of the HHH-assistant paradigm, but of chatbot structures in general. What I mean by this is that, even setting aside ChatGPT's particular brand of dull gooey harmlessness, conversational skill is a different optimization target than medium- or long-form writing, and it's not obvious to me that they inherently correlate. Take video games as an example. There are games that are good at being passive entertainment, and there are games that are very engaging to play, but it's hard to optimize for both of these at once. The best games to watch someone else play are usually walking sims, where the player is almost entirely passive. These tend to do well on YouTube and Twitch (Mouthwashing is the most recent example I can think of), since very little is lost by taking control away from the player. But Baba is You, which is far more interesting to actively play, is almost unwatchable; all you can see from the outside is a little sheep-rabbit thing running in circles for thirty minutes, until suddenly the puzzle is solved. All the interesting parts are happening in the player's head in the act of play, not on the screen.
I think chatbot outputs make for bad passive reading for a similar reason. They're not trying to please a passive observer; they're trying to engage the user they're currently speaking with. I've had some conversations with bots that I thought were incredibly insightful and entertaining, but I also suspect that if I shared any of them here they'd look, to you, like just more slop. And other people's "insightful and entertaining" LLM conversations look like slop to me, too. So it might be more useful to model these outputs as something like a Let's Play: even if the game is interesting to both of us, I might still not find watching your run as valuable as having my own. And making the chatbot 'game' more fun doesn't necessarily make the outputs into better blog posts, either, any more than filling Baba is You with cutscenes and particle effects would make it a better puzzle game.
Three, even still... this was one of the best things I read in 2024, if not the best. You might think this doesn't count toward your question, for any number of reasons. It's not exactly a blog post, and it's specifically playing to the strengths of AI-generated content in ways that don't generalize to other kinds of writing. It's deliberately using plausible hallucinations, for example, as part of its aesthetic... which you probably can't do if you want your LLM blogger to stay grounded in reality. But it is, so far as I know, 100% AI. And I loved it - I must've read it four or five times by now. You might have different tastes, or higher standards, than I do. To my (idiosyncratic) taste, though, this very much passes the bar for 'extremely good' writing. Is it missing any capabilities necessary for 'actually worth reading', in your view, or is it just an outlier?