LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
I think Buck and Eliezer both agree you should only say shocking things if they are true. I think if Eliezer believed what Buck believes, he would have found a title that was still aimed at the Overton-smashing strategy but honest.
In addition to Malo's comment, I think the book contains arguments that AFAICT have only really been made in the context of the MIRI dialogues, which are particularly obnoxious to read.
Interested in links to the press reviews you're thinking of.
Nod. Does anything in the "AI-accelerated AI R&D" space feel cruxy for you? Or "a given model seems to be semi-reliably producing Actually Competent Work in multiple scientific fields?"
Curious if there are any bets you'd make where, if they happened in the next 10 years or so, you'd significantly re-evaluate your models here?
Nod.
FYI I don't think the book is making a particular claim that any of this will happen soon, merely that when it happens, the outcome is very likely to be human extinction. The point is not that it'll happen at a particular time or in a particular way – the LLM/ML paradigm might hit a wall, there might need to be algorithmic advances, it might instead route through narrow AI getting really good at conducting and leveraging neuroscience and making neuromorphic AI or whatever.
But the fact that we know human brains run on a relatively low amount of power and training data means we should expect this to happen sooner or later. (Meanwhile, it does sure seem like both the current paradigm keeps advancing and a lot of money is being poured in, so it seems at least reasonably likely that it'll be sooner rather than later.)
The book doesn't argue a particular timeline for that, but it personally seems weird to me to expect it to take another century, in particular when you can leverage narrower pseudogeneral AI to help you make advances. And I have a hard time imagining takeoff taking longer than a decade, or really even a couple years, once you hit full generality.
No. The argument is "the current paradigm will produce the Bad Thing by default, if it continues on what looks like its default trajectory." (i.e. via training, in a fashion where it's not super predictable in advance what behaviors the training will result in, in various off-distribution scenarios)
A thing I can't quite tell is whether you're incorporating into your model what the book is actually about:
"AI that is either more capable than the rest of humanity combined, or is capable of recursively self-improving and situationally aware enough to maneuever itself into having the resources to do so (and then being more capable than the rest of humanity combined), and which hasn't been designed in a fairly different way from the way current AIs are created."
I'm not sure if you're more like "if that happened, I don't see why it'd be particularly likely to behave like an ideal agent ruthlessly optimizing for alien goals", or if you're more like "I don't really buy that this can/will happen in the first place."
(the book is specifically about that type of AI, and has separate arguments for "someday someone will make that" and "when they do, here's how we think it'll go")
My prediction is that a year from now Jim will still think it was a mistake and Habryka will still think it was a good call, because they value different things.
(fyi, I almost replied yesterday with "my shoulder Darren McKee is kinda sad about the 'no one else tried writing a book like this' line", but didn't get around to it because I was busy. I did get a copy of your book recently to see how it compared. Haven't finished reading it yet)