I feel like we can already point powerful cognition at certain things pretty well (e.g. chess), and the problem is figuring out how to point AIs at more useful things (e.g. honestly answering hard questions well). So I don't know if I'm being nit-picky, but I think the problem is not pointing powerful cognition at anything at all, but rather pointing it at whatever we want.
It's not clear here, but if you read the linked post it's spelled out (the two are complementary, really). The thesis is that it's easy to make a narrow AI that knows only about chess, but very hard to make an AGI that knows the world and can operate in a variety of situations, yet only cares about chess in a consistent way.
I think this is correct at least with current AI paradigms, and it has both some reassuring and some depressing implications.
Good paper! Thank you for sharing. I have a few nit-picky suggestions about wording and grammar. I'll put them here rather than email directly because some of them are subjective. That way others can feel free to chime in if they feel inclined to nit-pick my nit-picks :)
"artificial general intelligence (AGI) may surpass" -> "artificial general intelligence (AGI) seems likely to surpass" ("may" feels like a somewhat weak word in this context, but I don't feel strongly here.)
"undesirable (in other words, misaligned)" -> "undesira...
This is hilarious and beautiful and exactly what I expect from LessWrong. Also, hello fellow Simon.