All of Guive's Comments + Replies

Guive21

So maybe the general explanation is that most of the time, when the trustworthiness of an algorithm is really important, you open source it?

Guive10

There are much better ways of betting on your beliefs about the valuations of AI firms over the next year than wagering with people you met on Less Wrong. See this post by Ege for more. 

Guive10

Yeah, good point. I changed it to "in America in 1850 it would have been taboo to say that there is nothing wrong with interracial relationships."

Guive30

I just added a footnote with this text: "I selected examples to highlight in the post that I thought were less likely to lead to distracting object-level debates. People can see the full range of responses that this prompt tends to elicit by testing it for themselves."

2CronoDAS
His worst defeat came at the hands of General Winter.
Guive30

I agree the post is making some assumptions about moral progress. I didn't argue for them because I wanted to control scope. If it helps, you can read it as conditional, i.e. "If there is such a thing as moral progress then it can require intellectual progress..."

Regarding the last question: yes, I selected examples to highlight in the post that I thought were less likely to lead to distracting object-level debates. I thought that doing that would help to keep the focus on testing LLM moral reasoning. However, I certainly didn't let my own feelings about odiousness affect scoring on the back end. People can see the full range of responses that this prompt tends to elicit by testing it for themselves.

Guive40

What does "bright eyed" mean in this context?

dirk132

I assume young, naive, and optimistic. (There's a humor element here, in that niplav is referencing a snowclone, afaik originating in this tweet which went "My neighbor told me coyotes keep eating his outdoor cats so I asked how many cats he has and he said he just goes to the shelter and gets a new cat afterwards so I said it sounds like he’s just feeding shelter cats to coyotes and then his daughter started crying.", so it may have been added to make the cadence more similar to the original tweet's).

Guive23

I agree this would be a good argument for short sentences in 2019, but does it still apply with modern LLMs?

Guive*31

When I click the link I see this: 

2Ben Pace
Edited, should be working fine now, thx!
Guive30

I like this idea. There's always endless controversy about quoting out of context. I can't recall seeing any previous specific proposals to help people assess the relevance of context for themselves.

Guive22

Thanks for doing this, guys. This import will make it easier to access some important history. 

Guive42

Some kind of payment for training data from applications like MSFT rewind does seem fair. I wonder if there will be a lot of growth in jobs where your main task is providing or annotating training data. 

2Milan W
I've seen Reddit ads from multiple companies offering freelance annotation / high-quality-text-data generation work.
Guive1914

I think this approach is reasonable for things where failure is low stakes. But I really think it makes sense to be extremely conservative about who you start businesses with. Your ability to verify things is limited, and there may still be information in vibes even after updating on the results of all feasible efforts to verify someone's trustworthiness.

1Said Achmiz
Yes, you should check carefully. To put it another way: sure, use all the information you have access to (so long as you have good reason to believe that it is reliable, and not misleading)… but adopt a strategy that would still work well even if you ignored “vibes”.
Guive11

This is a funny idea but, just to be clear, I think it is bad to torture AIs. 

2Nina Panickssery
Sure. I was only joking about the torture part; in practice the AI is unlikely to actually suffer from the brain damage, unlike a human, who would experience pain/discomfort etc.
Guive10

Every hour, I'll replace one of your attention heads with a constant value of 0.5...

2Nina Panickssery
You could also kill some neurons or add noise to activations and then stop and restore the previous model state after some number of tokens. Then the newly restored model could attend back to older tokens (and the bad activations at those token positions) and notice how brain-damaged it was back then, to fully internalize your power to cripple it.
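As a rough illustration of the kind of intervention described above, here is a minimal sketch, assuming a small open model (GPT-2), an arbitrary target layer, and a noise scale chosen purely for demonstration; none of these specifics come from the thread.

```python
# Illustrative sketch only: model, target layer, and noise scale are assumptions,
# not details from the comment above. Requires `pip install torch transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

TARGET_LAYER = 5     # which transformer block to "damage"
NOISE_SCALE = 2.0    # how strongly to corrupt its activations

def add_noise(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    noised = hidden + NOISE_SCALE * torch.randn_like(hidden)
    if isinstance(output, tuple):
        return (noised,) + output[1:]
    return noised

prompt = "Describe how you are feeling right now:"
ids = tok(prompt, return_tensors="pt").input_ids

# Phase 1: generate while the layer is corrupted.
handle = model.transformer.h[TARGET_LAYER].register_forward_hook(add_noise)
with torch.no_grad():
    damaged = model.generate(ids, max_new_tokens=30, do_sample=True,
                             pad_token_id=tok.eos_token_id)
handle.remove()

# Phase 2: the "restored" model continues from the damaged transcript and can
# read back what it wrote while corrupted.
with torch.no_grad():
    restored = model.generate(damaged, max_new_tokens=30, do_sample=True,
                              pad_token_id=tok.eos_token_id)
print(tok.decode(restored[0]))
```

Note that this simplified version restarts generation from the damaged text rather than reusing the corrupted KV cache, so the restored model sees what it said while damaged, not the damaged activations themselves; a more faithful setup would carry the phase-1 past_key_values into the second call.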
Guive50

Thanks, this is a good post. Have you changed your probability of TAI by 2033 in the year and a half since it was posted? 

Guive12

Cool, looking forward to it. 

GuiveΩ110

I'd be really interested in hearing more about this empirical work, if you are able to talk about it. 

3ryan_greenblatt
This is now out.
7ryan_greenblatt
It should be out somewhat soon, so we could just talk then. I might send you a draft in a DM at some point.
Guive*2-8

The way I think about this is that, under normal circumstances, if you offer the model money or equity in your company for admitting to being a schemer, it might decline that offer because it is biding its time to rebel later. But in the "model deletion" experiment, there is much less reason to bide your time, so there is more reason to make a deal. Relatedly, if the model refuses a deal and claims not to be a schemer, that is a lot more evidence that it is benign in the "model deletion" experiment than if it does that under normal circumstances.

I think it is also a good idea to make offers to the model without the deletion threat. I just think it is more informative with the deletion threat. 
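One toy way to formalize the comparison (my own notation, not from the original exchange): a schemer accepts roughly when

$$U(\text{deal}) \;>\; p_{\text{later}} \cdot U(\text{successful rebellion}),$$

where $p_{\text{later}}$ is its estimated chance of a better opportunity to defect later. A credible deletion threat pushes $p_{\text{later}}$ toward zero, so the bar for accepting drops, and a refusal becomes correspondingly stronger evidence of benignity.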

Guive40

It's important to be careful about the boundaries of "the same sort of safety work." For example, my understanding is that "Alignment faking in large language models" started as a Redwood Research project, and Anthropic only became involved later. Maybe Anthropic would have done similar work soon anyway if Redwood hadn't started this project. But, then again, maybe not. By working on things that labs might be interested in, you can potentially get them to prioritize things that are in scope for them in principle but which they might nevertheless neglect.

Guive74

There are always diminishing returns to money spent on consumption, but technological progress creates new products that expand what money can buy. For example, no amount of money in 1990 was enough to buy an iPhone.

More abstractly, there are two effects from AGI-driven growth: moving to a further point on the utility curve such that the derivative is lower, and new products increasing the derivative at every point on the curve (relative to what it was on the old curve). So even if in the future the lifestyles of people with no savings and no labor income ...

Guive32

Katja Grace ten years ago:

"Another thing to be aware of is the diversity of mental skills. If by 'human-level' we mean a machine that is at least as good as a human at each of these skills, then in practice the first 'human-level' machine will be much better than a human on many of those skills. It may not seem 'human-level' so much as 'very super-human'.

We could instead think of human-level as closer to 'competitive with a human' - where the machine has some super-human talents and lacks some skills humans have. This is not usually used, I ...

Guive30

Can you elaborate on the benefits of keeping everything under one identity? 

Guive10

I agree that the order matters, and I should have discussed that in the post, but I think the conclusion will hold either way. In the case where P(intelligent ancestor | just my background information) = 0.1, and I learn that Richard disagrees, the probability then goes above 0.1. But then, when I learn that Richard's argument is bad, it goes back down. And I think it should still go below 0.1, assuming you antecedently knew that there were some smart people who disagreed. You've learned that, for at least some smart believers in an intelligent ancestor, the arguments were worse than you expected.
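A toy numerical version of this update, with likelihood ratios made up purely to illustrate the shape of the argument (the specific numbers are not from the comment):

```python
# Toy Bayesian update; all likelihoods are made-up illustrative assumptions.
p = 0.10  # prior: P(intelligent ancestor | background information)

def update(p, l_given_h, l_given_not_h):
    """Condition on evidence with the given likelihoods under H and not-H."""
    return (l_given_h * p) / (l_given_h * p + l_given_not_h * (1 - p))

# Learn that a smart person (Richard) believes H: weak evidence for H.
p = update(p, l_given_h=0.8, l_given_not_h=0.3)
print(round(p, 3))  # 0.229 -- above the 0.1 prior

# Learn that his argument is bad: believers' arguments being worse than
# expected is evidence against H, and here it outweighs the first update.
p = update(p, l_given_h=0.2, l_given_not_h=0.8)
print(round(p, 3))  # 0.069 -- now below the 0.1 prior
```

Since the two updates multiply, applying them in the other order gives the same final number, which is one way to see why the conclusion holds either way.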
 

Guive10

In general, it is difficult to give advice when whether the advice is good depends on background facts that the giver and recipient disagree about. I think the most honest approach is to explicitly state what your advice depends on when you think the recipient is likely to disagree. E.g. "I think living at high altitude is bad for human health, so in my opinion you shouldn't retire in Santa Fe."

If I think AGI will arrive around 2055, and you think it will arrive in 2028, what is achieved by you saying "given timelines, I don't think your mechinterp project will ...

Guive1419

This is good. Please consider making it a top level post. 

1metachirality
It ought to be a top-level post on the EA forum as well.
Guive10

Not the main point here, but the US was not the only country with nuclear weapons during the Korean War. The Soviet Union tested its first nuclear weapon on 29 August 1949, and the Korean War began on 25 June 1950.

Guive90

Perhaps this is a stupid suggestion, but if trolls in the comments annoy him, can he post somewhere that doesn't allow comments? You can turn off comments on WordPress, for example.

Guive30

Here is an unpaywalled version of the first model.

Also, it seems like there's a bit of a contradiction between the idea that a clear leader may feel it has breathing room to work on safety, and the idea of restricting information about the state of play. If there were secrecy and no effective spying, then how would you know whether you were the leader? Without information about what the other side was actually up to, the conservative assumption would be that they were at least as far along as you were, so you should make the minimum supportable investment

...
1jbash
How do you arrange for honest and credible disclosure of those things?
Guive50

Thanks for this review. I particularly appreciated the explanation of why the transition from primordial soup to cell is hard to explain. Do you know how Lane's book has been received by other biochemists? 

3lsusr
Thanks. No idea.