All of pseud's Comments + Replies

pseud20

I agree there's nothing about consciousness specifically, but it's quite different to the hidden prompt used for GPT-4 Turbo in ways which are relevant. Claude is told to act like a person, GPT is told that it's a large language model. But I do now agree that there's more to it than that (i.e., RLHF).

pseud50

It's possibly just a matter of how it's prompted (the hidden system prompt). I've seen similar responses from GPT-4 based chatbots.

mic144

Here is Claude 3's system prompt. There's nothing about consciousness specifically.

pseud63

The cited markets often don't support the associated claim. 

1Nathan Young
I added a note at the top. Oh interesting, yeah I was more linking them as spaces to disagree and get opinions. Originally I didn't put my own values at all but that felt worse. What would you recommend?
pseud00

"This question will resolve in the negative to the dollar amount awarded"

This is a clear, unambiguous statement.

If we can't agree even on that, we have little hope of reaching any kind of satisfying conclusion here.

Further, if you're going to accuse me of making things up (I think this is, in this case, a violation of the sensible frontpage commenting guideline "If you disagree, try getting curious about what your partner is thinking") then I doubt it's worth it to continue this conversation.

1ChristianKl
That's written nowhere in the resolution criteria and something you made up yourself. As written both #1 and #3 apply. I think reading the phrase "resolve in the negative to the dollar" as being about subtraction is a reasonable reading. I don't think a headline should be seen as "the actual question". I think it makes more sense to see the resolution criteria as the actual question. You seem to have different intuitions of how the question should be resolved than the Metaculus team or I myself have. It generally shouldn't be surprising that different people have different intuitions.
pseud0-4

Metaculus questions have a good track record of being resolved in a fair manner.

Do they? My experience has been the opposite. E.g. admins resolved "[Short Fuse] How much money will be awarded to Johnny Depp in his defamation suit against his ex-wife Amber Heard?" in an absurd manner* and refused to correct it when I followed up on it.

*they resolved it to something other than the amount awarded to Depp despite that amount being the answer to the question and the correct resolution according to the resolution criteria.

6ChristianKl
The resolution criteria does have the sentence "In the event that this trial results in a monetary award for Amber Heard, including legal fees or other penalties imposed by a court, this question will resolve in the negative to the dollar amount awarded Amber Heard." It seems that after the judge's decision, there's 10,350,000 for Depp and 2,000,000 for Amber. To me, that sentence reads like it's reasonable to do 10,350,000 - 2,000,000 = 8,350,000.
pseud-10

My comment wasn't well written, I shouldn't have used the word "complaining" in reference to what Said was doing. To clarify:

As I see it, there are two separate claims:

  1. That the complaints prove that Said has misbehaved (at least a little bit)
  2. That the complaints increase the probability that Said has misbehaved 

Said was just asking questions - but baked into his questions is the idea of the significance of the complaints, and this significance seems to be tied to claim 1. 

Jefftk seems to be speaking about claim 2. So, his comment doesn't seem like a direct response to Said's comment, although the point is still a relevant one. 

pseud9-2

It didn't seem like Said was complaining about the reports being seen as evidence that it is worth figuring out whether things could be better. Rather, he was complaining about them being used as evidence that things could be better.

2philh
If we speak precisely... in what way would they be the former without being the latter? Like, if I now think it's more worth figuring out whether things could be better, presumably that's because I now think it's more likely that things could be better? (I suppose I could also now think the amount-they-could-be-better, conditional on them being able to be better, is higher; but the probability that they could be better is unchanged. Or I could think that we're currently acting under the assumption that things could be better, I now think that's less likely so more worth figuring out whether the assumption is wrong. Neither seems like they fit in this case.) Separately, I think my model of Said would say that he was not complaining, he was merely asking questions (perhaps to try to decide whether there was something to complain about, though "complain" has connotations there that my model of Said would object to). So, if you think the mods are doing something that you think they shouldn't be, you should probably feel free to say that (though I think there are better and worse ways to do so). But if you think Said thinks the mods are doing something that Said thinks they shouldn't be... idk, it feels against-the-spirit-of-Said to try to infer that from his comment? Like you're doing the interpretive labor that he specifically wants people not to do.
pseud54

It's probably worth noting that Yudkowsky did not really make the argument for AI risk in his article. He says that AI will literally kill everyone on Earth, and he gives an example of how it might do so, but he doesn't present a compelling argument for why it would.[0] He does not even mention orthogonality or instrumental convergence. I find it hard to blame these various internet figures who were unconvinced about AI risk upon reading the article.

[0] He does quote “the AI does not love you, nor does it hate you, and you are made of atoms it can use for something else.”

1Gesild Muka
The way I took it, the article was meant to bring people to the table regarding AI risk, so there was a tradeoff between keeping the message simple and clear and relaying the best arguments. Even though orthogonality and instrumental convergence are important theories, in this context he probably didn't want to risk the average reader being put off by technical-sounding jargon and losing interest. There could be an entire website in a similar vein to LessWrong about conveying difficult messages to a culture not attuned to the technical aspects involved.
pseud20

I'd prefer my comments to be judged simply by their content rather than have people's interpretation coloured by some badge. Presumably, the change is a part of trying to avoid death-by-pacifism, during an influx of users post-ChatGPT. I don't disagree with the motivation behind the change, I just dislike the change itself. I don't like being a second-class citizen. It's unfun. Karma is fun, "this user is below an arbitrary karma threshold" badges are not. 

A badge placed on all new users for a set time would be fair. A badge placed on users with more than a certain amount of Karma could be fun. Current badge seems unfun - but perhaps I'm alone in thinking this. 

pseud10

Anybody else think it's dumb to have new user leaves beside users who have been here for years? I'm not a new user. It doesn't feel so nice to have a "this guy might not know what he's talking about" badge by my name.

Like, there's a good chance I'll never pass 100 karma, or whatever the threshold is. So I'll just have these leaves by my name forever?

4ChristianKl
The way to demonstrate that you know what you are talking about is to write content that other users upvote.  Registration date doesn't tell us much about whether a user knows what they are talking about. 
pseud30

To be clear, that it more-likely-than-not would want to kill everyone is the article's central assertion. "[Most likely] literally everyone on Earth will die" is the key point. Yes, he doesn't present a convincing argument for it, and that is my point. 

pseud111

The point isn't that I'm unaware of the orthogonality thesis, it's that Yudkowsky doesn't present it in his recent popular articles and podcast appearances[0]. So, he asserts that the creation of superhuman AGI will almost certainly lead to human extinction (until massive amounts of alignment research has been successfully carried out), but he doesn't present an argument for why that is the case. Why doesn't he? Is it because he thinks normies cannot comprehend the argument? Is this not a black pill? IIRC he did assert that superhuman AGI would likely deci... (read more)

2CronoDAS
Yeah, the letter on Time Magazine's website doesn't argue very hard that superintelligent AI would want to kill everyone, only that it could kill everyone - and what it would actually take to implement "then don't make one".
pseud*2-11

Yud keeps asserting the near-certainty of human extinction if superhuman AGI is developed before we do a massive amount of work on alignment. But he never provides anything close to a justification for this belief. That makes his podcast appearances and articles unconvincing - the most surprising and crucial part of his argument is left unsupported. Why has he made the decision to present his argument this way? Does he think there is no normie-friendly argument for the near-certainty of extinction? If so, it's kind of a black pill with regard to his argumen... (read more)

1TAG
Not in his popular writing. Or he has gone over the ground so much, it seems obvious to him. But part of effective communication is realising that what's obvious to you may need to be spelt out to others.
5CronoDAS
The basic claims that lead to that conclusion are:

  1. Orthogonality Thesis: how "smart" an AI is has (almost) no relationship to what its goals are. It might seem stupid to a human to want to maximize the number of paperclips in the universe, but there's nothing "in principle" that prevents an AI from being superhumanly good at achieving goals in the real world while still having a goal that people would think is as stupid and pointless as turning the universe into paperclips.
  2. Instrumental Convergence: there are some things that are very useful for achieving almost any goal in the real world, so most possible AIs that are good at achieving things in the real world would try to do them. For example, self-preservation: it's a lot harder to achieve a goal if you're turned off, blown up, or if you stop trying to achieve it because you let people reprogram you and change what your goals are. "Acquire power and resources" is another such goal. As Eliezer has said, "the AI does not love you, nor does it hate you, but you are made from atoms it can use for something else."
  3. Complexity of Value: human values are complicated, and messing up one small aspect can result in a universe that's stupid and pointless. One of the oldest SF dystopias ends with robots designed "to serve and obey and guard men from harm" taking away almost all human freedom (for their own safety) and taking over every task humans used to do, leaving people with nothing to do except sit "with folded hands." (Oh, and humans who resist are given brain surgery to make them stop wanting to resist.) An AI that's really good at achieving arbitrary real-world goals is like a literal genie: prone to giving you exactly what you asked for and exactly what you didn't want.
  4. Right now, current machine learning methods are completely incapable of addressing any of these problems, and they actually do tend to produce "perverse" solutions to problems we give them. If we used them to make an AI that was sup
pseud20

Why not ask him for his reasoning, then evaluate it? If a person thinks there's 10% x-risk over the next 100 years if we don't develop superhuman AGI, and only a 1% x-risk if we do, then he'd suggest that anybody in favour of pausing AI progress was taking "unacceptable risks for the whole of humanity".

1Chris van Merwijk
The reasoning was given in the comment prior to it, that we want fast progress in order to get to immortality sooner.
pseud813

I don't like it. "The problem of creating AI that is superhuman at chess" isn't encapsulated in the word "chess", so you shouldn't say you "solved chess" if what you mean is that you created an AI that is superhuman at chess. What it means for a game to be solved is widely-known and well-developed[0]. Using the exact same word, in extremely similar context, to mean something else seems unnecessarily confusing. 

[0] See https://en.wikipedia.org/wiki/Solved_game
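For concreteness, here is a minimal sketch (my own illustration, not from the thread) of what "solved" means in the game-theoretic sense: computing the exact value of every position by exhaustive search, rather than merely playing at a superhuman level. The toy game and function name below are my own choices.

```python
from functools import lru_cache

# Toy game: players alternately take 1 or 2 stones; whoever takes the
# last stone wins. "Solving" it means computing the exact win/loss
# value of every position under perfect play.

@lru_cache(maxsize=None)
def is_win(stones: int) -> bool:
    """True iff the player to move wins with perfect play."""
    if stones == 0:
        return False  # the previous player took the last stone
    # A position is winning iff some legal move leaves the opponent
    # in a losing position.
    return any(not is_win(stones - take) for take in (1, 2) if take <= stones)

for n in range(1, 10):
    print(n, "win" if is_win(n) else "loss")  # losses at multiples of 3
```

Nothing like this exhaustive value computation exists for chess or go; engines there are superhuman, but the games remain unsolved, which is exactly the distinction being drawn.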

9Taran
Like I said, I feel like I hear it a lot, and in practice I don't think it's confusing because the games that get solved by theorists and the games that get "solved" by AIs are in such vastly different complexity regimes.  Like, if you heard that Arimaa had been solved, you'd immediately know which sense was meant, right? Having said that, the voters clearly disagree and I'm not that attached to it, so I'm going to rename the post.  Can you think of a single adjective or short phrase that captures the quality that chess has, and Starcraft doesn't, WRT AI?  That's really what I want people to take away. If I can't think of anything better I expect I'll go with "There are (probably) no superhuman Go AIs yet: ...".
pseud2025

Nit: that's not what "solved" means. Superhuman ability =/= solved.

5Taran
In a game context you're right, of course.  But I often hear AI people casually say things like "chess is solved", meaning something like "we solved the problem of getting AIs to be superhumanly good at chess" (example).  For now I think we have to stop saying that about go, and instead talk about it more like how we talk about Starcraft.
pseud10

My thoughts: 

There is no reason to live in fear of the Christian God or any other traditional gods. However, there is perhaps a reason to live in fear of some functionally identical things:

  • We live in a simulation run by a Christian, Muslim, Jew, etc., and he has decided to make his religion true in his simulation. There are a lot of religious people - if people or organisations gain the ability to run such simulations, there's a good chance that many of these organisations will be religious, and their simulations influenced by this fact. 

And the followi... (read more)

pseud30

How do we know there is no afterlife? I think there's a chance there is. 

Some examples of situations in which there is an afterlife:

  • We live in a simulation and whatever is running the simulation decided to set up an afterlife of some kind. Could be a collection of its favourite agents, or a reward for its best behaved ones, or etc.
  •  We do not live in a simulation, but after the technological singularity an AI is able to reconstruct humans and decides to place them in a simulated world or to re-embody them
  • Various possibilities far beyond our curren
... (read more)
2CronoDAS
Let me amend my statement: the afterlives as described by the world's major religions almost certainly do not exist, and it is foolish to act as though they do. As for other possibilities, I can address them with the "which God" objection to Pascal's Wager; I have no evidence about how or if my actions while alive affect whatever supernatural afterlife I may or may not experience after death, so I shouldn't base my actions today on the possibility.
pseud21

I think that's where these companies' AI safety budgets go: make sure the AI doesn't state obvious truths about the wrong things / represent the actually popular opinions on the wrong things. 

0andrew sauer
"Race is nonsensical" is a strong statement, but racial boundaries are indeed quite arbitrary and it is true that genetic variation is greater within racial groups than between them
pseud32

Why are people disagreeing with this statement?

4JBlack
In isolation, it's technically correct. In the context of being a direct reply to the post, it's suggesting that "solve alignment" is something that GPT-4 could plausibly do. I certainly disagree with that and voted disagreement accordingly.
pseud10

I would gladly suffer a hundred years of pain if it was the only way for me to live one more good day. I think a world where a thousand suffer but one lives a good life is vastly superior to a world in which only ten suffer but none live a good life. Good is a positive quality. But suffering is a zero quality. The absence of a thing, rather than the negative form. So, no matter how much suffering there is, it never offsets even the smallest amount of good.

This is a view that came naturally to me, but it isn't a view I've noticed others share.

The experience... (read more)

pseud21

I read your post but I thought it was more about aesthetics than technology. 

Horizon Worlds is a program where users can make their own environments and can make aesthetic decisions for themselves.

pseud10

Yes, I think the graphics are quite simple. I think your explanation relating to the current limitations of VR is enough to address a lot of the OP's confusion/questioning. It's not that Meta is purposefully trying to look bad; they're just sharing the state of the art honestly. It's also worth noting that this is seen by Zuckerberg as a temporary state, and his goal very much seems to be photorealism. If you listen to his recent interview with Lex Fridman you'll hear him bring up photorealism again and again.

Why is Meta sharing their work now i... (read more)

3Yitz
See my response here; this is very much not state-of-the-art, and hasn't been for a fairly long time. As many on Twitter have pointed out, Second Life, which was released in 2003, is so far ahead of Meta in terms of both looks and feel that the concept of them getting a "head start" against competition through this seems implausible. Gwern also mentioned VRChat in the comments, which, while it is facing difficulty with moderation/censorship (whichever you'd prefer to call it lol), is also an obviously superior product. If I were an investor, I'd be more worried after seeing what Meta is publicly releasing than I'd be under the counterfactual where Meta hadn't publicized these products yet.
pseud3-30

I don't think the screenshot looks that bad. Netizens love to be irrationally extremely negative about Zuck, and it's possible you have been swept up in this.

4Lone Pine
From a 3D graphics perspective, it's very similar to the Wii. The Wii was an underpowered console and so its graphics were simple. I assume Horizon's graphics are simple for a similar reason, because of the limitations of VR (need high framerate, low latency, consumer hardware needs to be reasonably priced, etc). However, the Wii had a lot more charm, IMO. I think people are also negative about digital minimalism because of how pervasive it is in the advertising of big tech companies. See "Corporate Memphis". Corporations prefer minimalism since it is cross-culturally inoffensive -- except among those who have learned to associate it with companies they don't like.
7mukashi
Sir, we seem to have a very different taste I am afraid
pseud20

Yes, I can think of several reasons why someone might downvote the OP. What I should have said is "I'm not sure why you'd think this post would be downvoted on account of the stance you take on the dangers of AGI."

pseud52

Not sure why you'd think this post would be downvoted. I suspect most people are more than welcoming of dissenting views on this topic. I have seen comments with normal upvotes as well as agree/disagree votes; I'm not sure if there's a way for you to enable them on your post.

1Lone Pine
For what it's worth, I feel welcomed here despite my perennial optimism.
0Vanilla_cabs
I downvoted this post for the lack of arguments (besides the main argument from incredulity).
5DirectedEvolution
There’s a cohort that downvotes posts they think are wrong, and also a cohort that downvotes posts they think are infohazards. This post strikes me as one that these two cohorts might both choose to downvote, which doesn’t mean that it is wrong or that it is an infohazard.
Answer by pseud30

Sometimes I like to envisage conversations I am likely to have over the coming day or two. I think about what I am hoping to get out of a conversation. I think about what the other people involved in the conversation will be hoping to get out of it. I think about questions I will want to ask, and I think about questions which I am likely to be asked. Etc., etc.

I write down various notes: topics of conversation, questions I want to ask, stories I want to tell, answers I may give, etc.

pseud10

 I think a quick web-search is useful. Having read something is an improvement over having no knowledge, and it's ridiculous that people don't do a quick web-search more often. I'm not disagreeing with your point that Googling is better than doing nothing to learn at all. 

My first comment just pointed out that what you learn may be quite inaccurate or out-of-date. 

Now, I'll go further and suggest that what you learn may be purposefully misleading. When it comes to politically or financially sensitive topics (and a searcher won't always reali... (read more)

pseud20

I assume you mean by "80/20 answer" that betting between half and full pot will be the correct sizing approximately 80% of the time one bets. I think the actual percentage is significantly lower than 80%. 

2Adam Zerner
Maybe, but "incorrect" is a spectrum. Sometimes it's a close second. Especially for someone who is googling for "basics of poker strategy".
pseud10

In general, bet sizing is an incredibly complex topic, but all you have to know is to size your bet between 1/2 and the full size of the pot.

This isn't correct. There are frequently occurring situations in NLHE where betting much less than half the pot or much greater than the full pot is the correct move. This is true both by theoretically-optimal strategy[1] and by practically-optimal strategy[2]. 

This can be used as a data-point when considering the epistemic status of things learned while doing a "quick Google". 

[1] i.e. the Nash equilibrium strategy

[2] i.e. the strategy that makes the most money against real, human opponents
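As background for why sizing is such a rich topic, here is a minimal sketch (my own illustration, not from the thread) of the standard pot-odds arithmetic: a caller risks the bet to win the pot plus the bet, so different sizings demand very different calling equities, and no single sizing band is right in every spot.

```python
# Break-even calling equity facing a bet into a pot: the caller risks
# `bet` to win `pot + bet`, so a profitable call requires equity of at
# least bet / (pot + 2 * bet).

def required_equity(pot: float, bet: float) -> float:
    """Minimum win probability needed for a break-even call."""
    return bet / (pot + 2 * bet)

for fraction in (0.25, 0.5, 1.0, 2.0):  # bet size as a fraction of the pot
    print(f"{fraction:.2f}x pot -> caller needs {required_equity(1.0, fraction):.1%}")
```

A quarter-pot bet only needs the caller to win about 17% of the time, while a two-times-pot overbet demands 40%, which is part of why both tiny bets and large overbets have legitimate roles.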

2Adam Zerner
Yes, but I think "bet half to full pot" is the 80/20 answer, and the point of "give it a google" is often to get that 80/20 answer.