burrito — LessWrong

Replying to The case for more ambitious language model evals

Maybe a bit of a nitpick, but RLHF'd GPT-4o can still detect Eric Drexler's writing (chat link). I gave it the first paragraph of his latest blog post, which was written in February 2024, past 4o's knowledge cutoff date of October 2023. In general I'm not sure if RLHF actually makes the models worse at truesight. It would be interesting to see a benchmark comparing e.g. Llama base vs instruct on this capability.

Replying to All AGI Safety questions welcome (especially basic ones) [April 2023]

burrito 3y

Thanks, this is exactly the kind of thing I was looking for.

Replying to All AGI Safety questions welcome (especially basic ones) [April 2023]

burrito 3y

Thanks for the reply.

GPT-4 is far below village idiot level at most things a village idiot uses their brain for, despite surpassing humans at next-token prediction.

Could you give some examples? I take it that what Eliezer meant by village-idiot intelligence is less "specifically does everything a village idiot can do" and more "is as generally intelligent as a village idiot". I feel like the list of things GPT-4 can do that a village idiot can't would look much more indicative of general intelligence than the list of things a village idiot can do that GPT-4 can't. (As opposed to AlphaZero, where the extent of the list is "can play some board games... (read more)

Replying to All AGI Safety questions welcome (especially basic ones) [April 2023]

burrito 3y

In My Childhood Role Model, Eliezer Yudkowsky says that the difference in intelligence between a village idiot and Einstein is tiny relative to the difference between a chimp and a village idiot. This seems to imply (I could be misreading) that {the time between the first AI with chimp intelligence and the first AI with village idiot intelligence} will be much larger than {the time between the first AI with village idiot intelligence and the first AI with Einstein intelligence}. If we consider GPT-2 to be roughly chimp-level, and GPT-4 to be above village idiot level, then it seems like this would predict that we'll get an Einstein-level AI within at least the... (read more)

Replying to The Best Software For Every Need

burrito 4y

Strongly agree. As a relative beginner I've found the automatic code completion and method listing/descriptions incredibly useful.

burrito 5y

Responding to 3):

What is your standing to judge your imagined version of someone's experience? Maybe your preferences are different enough from the subject's that you're simply wrong in your comparison.

You're right and I should have said "imagine the point at which they are indifferent", "would they prefer the 10x experience or the 1x experience", etc. Imagining whether I would prefer it could be a decent approximation of their preferences, though.

burrito 5y

Responding to 2):

It's likely that some experiences are non-linear in utility per intensity.

In the post, I defined intensity as linearly proportional to utility. If you think wording it as "intensity" is misleading because what we generally think of as "experience intensity" isn't linearly proportional to utility, then I agree, but can't think of a better term to use.

Or that you'd have to crank up some parts of the experience and not others. For instance, enjoying the contrast of bitter and fruity in a shot of espresso - there's no way to scale the whole thing up 10x, you have to pick and choose what to intensify, and then your result is subject

... (read 468 more words →)

burrito 5y

First, I want to make sure we're separating the validity of the model itself from concerns with applying it. I'll try to be clear about which one I'm talking about for each part.

I'll respond to each number in a separate reply because the format of the conversation will be a mess otherwise. Starting with 1):

Where do you get this list?

Could you be more specific? Is this question centered around how I know what other people are thinking, or how to separate a whole experience into individual experiences?

And how do you account for future unexpected experiences?

I don't know how to respond to this. What part of the model depends on accounting for unexpected future experiences? If you're asking generally how I would predict future experiences, I don't have a good answer, but this seems both separate from the philosophical model itself and not an objection to the application of this specific philosophical model (it applies to all experience-based consequentialism).

burrito 5y Quick Take

Vague idea for how to theoretically decide whether a mind's existence at any given moment is net positive, from a utilitarian standpoint:

Get a list of every individual conscious experience the mind is having at the exact moment.
For each individual experience, crank up its "intensity" (i.e. magnitude of utility) by a factor of, say, 10; as intense as you can easily imagine and empathize with, but no more. This can be approximated by imagining the point at which you are indifferent between experiencing the "1x" experience for 10 seconds and the "10x" experience for 1 second. Try to imagine this independently of mental side effects caused by experiencing it for a long time.
Now

... (read more)
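The indifference test in the second step can be written out as arithmetic. This is only a rough sketch: the function name `total_utility` is made up for illustration, and it assumes that utility is linear in both intensity and duration (the duration part is my assumption, implied by the 10 seconds vs. 1 second comparison but not stated outright).

```python
def total_utility(intensity: float, seconds: float) -> float:
    """Magnitude of utility for one experience.

    Assumes utility scales linearly with intensity and with duration;
    both the function and that linearity are illustrative assumptions.
    """
    return intensity * seconds


# The "1x" experience for 10 seconds vs. the "10x" experience for 1 second.
baseline = total_utility(intensity=1.0, seconds=10.0)
scaled = total_utility(intensity=10.0, seconds=1.0)

# Under the linearity assumption the two are equal, i.e. the indifference point.
print(baseline == scaled)  # True
```

If someone is not actually indifferent between the two, then what they are imagining as "10x" is not 10x in the sense defined here.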

burrito 5y Quick Take

Intentionally rationalizing against your beliefs could be a good strategy for doing a cost-benefit analysis. For example, if you currently support increasing the minimum wage, imagine yourself as someone who is against it, and from that perspective come up with as many disadvantages to it as possible. I'm sure I'm not the first to come up with this idea but I haven't seen it anywhere else; is there a name for this concept?

Short, Extreme, Forgotten Torture vs Death

burrito

Turns out Pascal's mugger is real, and as would be expected of someone who does Pascal's muggings, he's a jerk and likes forcing people to make impossible decisions. Also, his threats are discovered to be truthful and credible. He decides he's sick of mugging after collecting a few trillion dollars from it and wants to try something new. He takes out a gun (killing people with Matrix powers is for cowards) and forces you to make a choice.

Scenario 1: He puts the gun to your head. "I will kill you unless you let me put you through torture 3^^^3 [1] times more intense than anything you can possibly imagine. Don't worry though,... (read 171 more words →)

burrito's Shortform

burrito

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

Speculative Model For How Moral Arguments Work

Credibility warning: All of this is wild post-hoc speculation based on my vague intuitions. Don't read it as if it has any semblance of authority. Feel free to bring up actual evidence if it confirms or denies my speculations, though. Also, this is my first post on LW so please point out if I violated any conventions, norms, etc.

Readability warning: This post was not very carefully edited, so the clarity, grammar, formatting, etc. might be a disaster, and there's a good chance it's nearly unreadable at times. Feel free to ask for clarification.

Word abuse warning: I might have unintentionally equivocated the meaning of "moral" between "not... (read 1093 more words →)