Epistemic status: I am just learning about alignment and have just read Human Compatible. Below is a summary of the paradigm Russell outlines for aligning AI in the last third of the book, and the questions I have about this project as a new reader.
AI researcher Stuart Russell’s 2019 book Human Compatible is one of the most popular and widely circulated books on AI alignment right now. It argues that we need to change the paradigm of AI development in general from what he calls goal-directed behavior, in which a machine optimizes a reward function written by humans, to behavior that attempts to learn and then follow human objectives.
Russell provides a set of general design principles to guide this new paradigm. As I understand them from the book, they are roughly: (1) the machine’s only objective is to maximize the realization of human preferences; (2) the machine is initially uncertain about what those preferences are; and (3) the ultimate source of information about human preferences is human behavior.
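To make the second and third principles concrete for myself, here is a toy sketch (my own illustration, not code from the book) of the preference-learning idea: the machine starts uncertain over a few candidate human reward functions and does a Bayesian update on its beliefs by watching which action the human chooses, under a standard Boltzmann-rational model of human choice. The candidate reward functions and actions are made up for the example.

```python
import math

# Hypothetical reward functions the human might have over three actions.
candidate_rewards = {
    "likes_coffee": {"coffee": 1.0, "tea": 0.0, "water": 0.2},
    "likes_tea":    {"coffee": 0.0, "tea": 1.0, "water": 0.2},
}

# Uniform prior: the machine is initially uncertain about human preferences.
belief = {h: 1.0 / len(candidate_rewards) for h in candidate_rewards}

def likelihood(action, rewards, beta=2.0):
    """Boltzmann-rational model: humans pick higher-reward actions more often."""
    z = sum(math.exp(beta * r) for r in rewards.values())
    return math.exp(beta * rewards[action]) / z

def update(belief, action):
    """Bayesian update over reward hypotheses after observing one human action."""
    posterior = {h: belief[h] * likelihood(action, candidate_rewards[h])
                 for h in belief}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Observing the human choose tea shifts belief toward the "likes_tea" hypothesis,
# so human behavior is the machine's source of information about preferences.
belief = update(belief, "tea")
```

The point of the sketch is just that the objective is never hard-coded: the machine acts under persistent uncertainty and lets observed behavior move its beliefs.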
TBH, my naive thought is that if John's project succeeds it will solve most of what I think of as the hard part of alignment, so it seems like one of the more promising approaches to me. But on my model of the world, it seems quite unlikely that there are natural abstractions in the way that John seems to think there are.