I have a hunch that semi-neat approaches to AI may come back as a layer on top of neural nets. Consider the work on using neural-net heuristics to decide the next step in theorem proving (https://arxiv.org/abs/1606.04442): in such a system the decision process is opaque, but the result is fully verifiable, at least in the world of math (in a powerful system, the theorems being proved may ultimately feed into some fuzzy interface with reality). The extent to which future systems might look like this, or what that means for safety, isn't very clear yet (at least not to me), but it's another paradigm to consider.
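For concreteness, here's a minimal sketch of that pattern (plain Python, with made-up names like `score_step` and `verify_proof`; nothing here is from the paper): an opaque learned heuristic only chooses *which* step to try next, while an independent checker is the sole arbiter of whether the final proof is valid, so a bad heuristic can waste search effort but never certify a false theorem.

```python
import random

def score_step(state, step):
    """Stand-in for a neural net ranking candidate proof steps (opaque)."""
    return random.random()  # we don't need to trust or interpret this

def verify_proof(proof):
    """Stand-in for a proof checker; this is the trusted component."""
    return len(proof) >= 3  # toy acceptance criterion for the example

def guided_search(initial_state, candidate_steps, max_depth=10):
    state, proof = initial_state, []
    for _ in range(max_depth):
        # The heuristic only decides what to attempt next.
        step = max(candidate_steps, key=lambda s: score_step(state, s))
        proof.append(step)
        state = state + (step,)
        if verify_proof(proof):  # soundness rests here, not in the net
            return proof
    return None  # search failed; nothing unsound was ever accepted

print(guided_search((), ["rewrite", "induct", "simplify"]))
```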
To clarify and elaborate a bit on Paul's point: our explicit methodology was to take a typical reinforcement learning system, with standard architecture and hyperparameter choices, and add in feedback mostly without changing those hyperparameters or the architecture. There were a couple of exceptions: an agent that is learning a reward function needs more incentive to explore than an agent with a fixed reward function, so we had to increase the exploration bonus, and there were a few parameters specific to the reward predictor itself that we had to choose. Howev...
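As a toy illustration (definitely not the paper's code), here's roughly where those two exceptions live in an otherwise standard RL loop. `RewardPredictor`, its update rule, and all the numbers are illustrative assumptions: the point is just that the agent optimizes a *predicted* reward plus a larger-than-usual exploration bonus, and the predictor brings a few hyperparameters of its own.

```python
import random

class RewardPredictor:
    """Toy stand-in for a net trained on preference comparisons."""
    def __init__(self, lr=0.05):   # predictor-specific hyperparameter
        self.weight = 0.0
        self.lr = lr

    def predict(self, action):
        return self.weight * action  # predicted, not true, reward

    def update(self, preferred_action, rejected_action):
        # Push predicted reward of preferred behavior above rejected behavior.
        self.weight += self.lr * (preferred_action - rejected_action)

EXPLORATION_BONUS = 0.1  # increased relative to a fixed-reward agent

def run_episode(predictor, steps=10):
    total = 0.0
    for _ in range(steps):
        action = random.choice([0, 1])
        # Only change to the standard loop: reward comes from the
        # predictor (plus the bonus), not from the environment.
        total += predictor.predict(action) + EXPLORATION_BONUS
    return total

pred = RewardPredictor()
pred.update(preferred_action=1, rejected_action=0)  # one toy comparison
print(run_episode(pred))
```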
Re: the title, it's probably worth pointing out that DeepMind was also involved in this paper.