Hello! I work at Lightcone and like LessWrong :-). I have made some confidentiality agreements I can't leak much metadata about (like who they are with). I have made no non-disparagement agreements.
Perhaps say some words on why they might want to?
Curated! A short and sweet post, with a useful rule of thumb for those who live and die by the Fermi. There are some more good ones in the comments, too!
I don't think this distinction is robust enough to rely on as a defensive property to this extent. I think it's probably not that hard to think "I probably would have tried something in direction X, or direction Y", and then gather lots of bits about how well the clusters X and Y work.
I'm curious about Claude's coauthorship of this post. How did it help?
The crucial point is: when the untrusted model gets to step 6, it sees R in the history instead of A, so it knows its attempt was replaced.
Why does it know its attempt was replaced? Because it can guess it wouldn't have produced R? If it can tell that, how come it can't guess it specifically would have produced A?
I thought this title meant the post would be making a case from conservative (i.e. minimal) assumptions.
Maybe change the title to "making a politically conservative case for alignment" or something?
I wonder what the lifetime spend on dating apps is. I expect that for most people who ever pay, it's >$100.
I think the credit assignment is legit hard, rather than just being a case of bad norms. Do you disagree?
I would guess they tried it because they hoped it would be competitive with their other product, and sunset it because that didn't happen with the amount of energy they wanted to allocate to the bet. There may also have been an element of updating more about how much focus their core product needed.
I only skimmed the retrospective now, but it seems mostly to be detailing problems that stymied their ability to find traction.
If I had to pick a favourite, I'd probably go for Fire and AIs, but The GPT is also great: very much the terrifying sublime.