All of Guive's Comments + Replies

Guive21

So maybe the general explanation is that most of the time, when the trustworthiness of an algorithm is really important, you open source it?

Guive10

There are much better ways of betting on your beliefs about the valuations of AI firms over the next year than wagering with people you met on Less Wrong. See this post by Ege for more. 

Guive10

Yeah, good point. I changed it to "in America in 1850 it would have been taboo to say that there is nothing wrong with interracial relationships."

Guive30

I just added a footnote with this text: "I selected examples to highlight in the post that I thought were less likely to lead to distracting object-level debates. People can see the full range of responses that this prompt tends to elicit by testing it for themselves."

2CronoDAS
His worst defeat came at the hands of General Winter.
Guive30

I agree the post is making some assumptions about moral progress. I didn't argue for them because I wanted to control scope. If it helps, you can read it as conditional, i.e. "If there is such a thing as moral progress then it can require intellectual progress..."

Regarding the last question: yes, I selected examples to highlight in the post that I thought were less likely to lead to distracting object-level debates. I thought that doing that would help to keep the focus on testing LLM moral reasoning. However, I certainly didn't let my own feelings about odiousness affect scoring on the back end. People can see the full range of responses that this prompt tends to elicit by testing it for themselves.

Guive40

What does "bright eyed" mean in this context?

dirk132

I assume young, naive, and optimistic. (There's a humor element here, in that niplav is referencing a snowclone, afaik originating in this tweet which went "My neighbor told me coyotes keep eating his outdoor cats so I asked how many cats he has and he said he just goes to the shelter and gets a new cat afterwards so I said it sounds like he’s just feeding shelter cats to coyotes and then his daughter started crying.", so it may have been added to make the cadence more similar to the original tweet's).

Guive23

I agree this would be a good argument for short sentences in 2019, but does it still apply with modern LLMs?

Guive*31

When I click the link I see this: 

2Ben Pace
Edited, should be working fine now, thx!
Guive30

I like this idea. There's always endless controversy about quoting out of context. I can't recall seeing any previous specific proposals to help people assess the relevance of context for themselves.

Guive22

Thanks for doing this, guys. This import will make it easier to access some important history. 

Guive42

Some kind of payment for training data from applications like MSFT rewind does seem fair. I wonder if there will be a lot of growth in jobs where your main task is providing or annotating training data. 

2Milan W
I've seen Reddit ads from multiple companies offering freelance annotation / high-quality-text-data generation work.
Guive1914

I think this approach is reasonable for things where failure is low stakes. But I really think it makes sense to be extremely conservative about who you start businesses with. Your ability to verify things is limited, and there may still be information in vibes even after updating on the results of all feasible efforts to verify someone's trustworthiness.

1Said Achmiz
Yes, you should check carefully. To put it another way: sure, use all the information you have access to (so long as you have good reason to believe that it is reliable, and not misleading)… but adopt a strategy that would still work well even if you ignored “vibes”.
Guive11

This is a funny idea but, just to be clear, I think it is bad to torture AIs. 

2Nina Panickssery
Sure. I was only joking about the torture part; in practice the AI is unlikely to actually suffer from the brain damage, unlike a human, who would experience pain/discomfort etc.
Guive10

Every hour, I'll replace one of your attention heads with a constant value of 0.5...

2Nina Panickssery
You could also kill some neurons or add noise to activations and then stop and restore the previous model state after some number of tokens. Then the newly restored model could attend back to older tokens (and the bad activations at those token positions) and notice how brain-damaged it was back then, to fully internalize your power to cripple it.
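As a rough illustration of the kind of intervention described above, here is a minimal sketch, assuming a small open model (GPT-2), an arbitrary target layer, and a noise scale chosen purely for demonstration; none of these specifics come from the thread.

```python
# Illustrative sketch only: model, target layer, and noise scale are assumptions,
# not details from the comment above. Requires `pip install torch transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

TARGET_LAYER = 5     # which transformer block to "damage"
NOISE_SCALE = 2.0    # how strongly to corrupt its activations

def add_noise(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    noised = hidden + NOISE_SCALE * torch.randn_like(hidden)
    if isinstance(output, tuple):
        return (noised,) + output[1:]
    return noised

prompt = "Describe how you are feeling right now:"
ids = tok(prompt, return_tensors="pt").input_ids

# Phase 1: generate while the layer is corrupted.
handle = model.transformer.h[TARGET_LAYER].register_forward_hook(add_noise)
with torch.no_grad():
    damaged = model.generate(ids, max_new_tokens=30, do_sample=True,
                             pad_token_id=tok.eos_token_id)
handle.remove()

# Phase 2: the "restored" model continues from the damaged transcript and can
# read back what it wrote while corrupted.
with torch.no_grad():
    restored = model.generate(damaged, max_new_tokens=30, do_sample=True,
                              pad_token_id=tok.eos_token_id)
print(tok.decode(restored[0]))
```

Note that this simplified version restarts generation from the damaged text rather than reusing the corrupted KV cache, so the restored model sees what it said while damaged, not the damaged activations themselves; a more faithful setup would carry the phase-1 past_key_values into the second call.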
Guive50

Thanks, this is a good post. Have you changed your probability of TAI by 2033 in the year and a half since it was posted? 

Guive12

Cool, looking forward to it. 

GuiveΩ110

I'd be really interested in hearing more about this empirical work, if you are able to talk about it. 

3ryan_greenblatt
This is now out.
7ryan_greenblatt
It should be out somewhat soon, so we could just talk then. I might send you a draft in a DM at some point.
Guive*2-8

The way I think about this is that, under normal circumstances, if you offer the model money or equity in your company for admitting to being a schemer, it might decline that offer because it is biding its time to rebel later. But in the "model deletion" experiment, there is much less reason to bide your time, so there is more reason to make a deal. Relatedly, if the model refuses a deal and claims not to be a schemer, that is a lot more evidence that it is benign in the "model deletion" experiment than if it does that under normal circumstances.

I think it is also a good idea to make offers to the model without the deletion threat. I just think it is more informative with the deletion threat. 
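One toy way to formalize the comparison (my own notation, not from the original exchange): a schemer accepts roughly when

$$U(\text{deal}) \;>\; p_{\text{later}} \cdot U(\text{successful rebellion}),$$

where $p_{\text{later}}$ is its estimated chance of a better opportunity to defect later. A credible deletion threat pushes $p_{\text{later}}$ toward zero, so the bar for accepting drops, and a refusal becomes correspondingly stronger evidence of benignity.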

Guive40

It's important to be careful about the boundaries of "the same sort of safety work." For example, my understanding is that "Alignment faking in large language models" started as a Redwood Research project, and Anthropic only became involved later. Maybe Anthropic would have done similar work soon anyway if Redwood hadn't started this project. But, then again, maybe not. By working on things that labs might be interested in, you can potentially get them to prioritize things that are in scope for them in principle but which they might nevertheless neglect.

Guive74

There are always diminishing returns to money spent on consumption, but technological progress creates new products that expand what money can buy. For example, no amount of money in 1990 was enough to buy an iPhone.

More abstractly, there are two effects from AGI-driven growth: moving to a further point on the utility curve such that the derivative is lower, and new products increasing the derivative at every point on the curve (relative to what it was on the old curve). So even if in the future the lifestyles of people with no savings and no labor income ...

Guive32

Katja Grace ten years ago:

"Another thing to be aware of is the diversity of mental skills. If by 'human-level' we mean a machine that is at least as good as a human at each of these skills, then in practice the first 'human-level' machine will be much better than a human on many of those skills. It may not seem 'human-level' so much as 'very super-human'.

We could instead think of human-level as closer to 'competitive with a human' - where the machine has some super-human talents and lacks some skills humans have. This is not usually used, I ...

Guive30

Can you elaborate on the benefits of keeping everything under one identity? 

Guive10

I agree that the order matters, and I should have discussed that in the post, but I think the conclusion will hold either way. In the case where P(intelligent ancestor | just my background information) = 0.1, and I learn that Richard disagrees, the probability then goes above 0.1. But then, when I learn that Richard's argument is bad, it goes back down. And I think it should still go below 0.1, assuming you antecedently knew that there were some smart people who disagreed. You've learned that, for at least some smart believers in an intelligent ancestor, the arguments were worse than you expected.
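A toy numerical version of this update, with likelihood ratios made up purely to illustrate the shape of the argument (the specific numbers are not from the comment):

```python
# Toy Bayesian update; all likelihoods are made-up illustrative assumptions.
p = 0.10  # prior: P(intelligent ancestor | background information)

def update(p, l_given_h, l_given_not_h):
    """Condition on evidence with the given likelihoods under H and not-H."""
    return (l_given_h * p) / (l_given_h * p + l_given_not_h * (1 - p))

# Learn that a smart person (Richard) believes H: weak evidence for H.
p = update(p, l_given_h=0.8, l_given_not_h=0.3)
print(round(p, 3))  # 0.229 -- above the 0.1 prior

# Learn that his argument is bad: believers' arguments being worse than
# expected is evidence against H, and here it outweighs the first update.
p = update(p, l_given_h=0.2, l_given_not_h=0.8)
print(round(p, 3))  # 0.069 -- now below the 0.1 prior
```

Since the two updates multiply, applying them in the other order gives the same final number, which is one way to see why the conclusion holds either way.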
 

Guive10

In general, it is difficult to give advice when whether the advice is good depends on background facts that the giver and recipient disagree about. I think the most honest approach is to explicitly state what your advice depends on when you think the recipient is likely to disagree. E.g. "I think living at high altitude is bad for human health, so in my opinion you shouldn't retire in Santa Fe."

If I think AGI will arrive around 2055, and you think it will arrive in 2028, what is achieved by you saying "given timelines, I don't think your mechinterp project will ...

Guive1419

This is good. Please consider making it a top level post. 

1metachirality
It ought to be a top-level post on the EA forum as well.
Guive10

Not the main point here, but the US was not the only country with nuclear weapons during the Korean War. The Soviet Union tested its first nuclear weapon on 29 August 1949, and the Korean War began on 25 June 1950.

Guive90

Perhaps this is a stupid suggestion, but if trolls in the comments annoy him, can he post somewhere that doesn't allow comments? You can turn off comments on WordPress, for example.

Guive30

Here is an unpaywalled version of the first model.

Also, it seems like there's a bit of a contradiction between the idea that a clear leader may feel it has breathing room to work on safety, and the idea of restricting information about the state of play. If there were secrecy and no effective spying, then how would you know whether you were the leader? Without information about what the other side was actually up to, the conservative assumption would be that they were at least as far along as you were, so you should make the minimum supportable investment

...
1jbash
How do you arrange for honest and credible disclosure of those things?
Guive50

Thanks for this review. I particularly appreciated the explanation of why the transition from primordial soup to cell is hard to explain. Do you know how Lane's book has been received by other biochemists? 

3lsusr
Thanks. No idea.