cross-posted to Twitter:
https://twitter.com/DavidSKrueger/status/1573643782377152514
-What sorts of skills an AI would need to achieve powerbase ability
-Timelines till powerbase ability
-Takeoff speeds
-How things are going to go by default, alignment-wise (e.g. will it be as described in this?)
-How things are going to go by default, governance-wise (e.g. will it be as described in this?)
I have two questions I'd love to hear your thoughts about.
1. What is the overarching/high-level research agenda of your group? Do you have a concrete alignment agenda where people work on the same thing or do people work on many unrelated things?
2. What are your thoughts on the various research agendas for solving alignment that exist today? Why do you think they will fall short of their goals? What are you most excited about?
Feel free to talk about any agendas, but I'll just list a few that come to my mind (in no particular order).
IDA, Debate, Interpretability (I believe I read a tweet where you said you are rather skeptical about this), Natural Abstraction Hypothesis, Externalized Reasoning Oversight, Shard Theory, (Relaxed) Adversarial Training, ELK, etc.
I feel like I've pretty much gone off the outer vs. inner alignment distinction.
People have had a go at inner alignment, but they keep trying to affect it by taking terms for interpretability, or modeled human feedback, or characteristics of the AI's self-model, and putting them into the loss function, which dilutes the entire notion that inner alignment isn't about what's in the loss function.
People have had a go at outer alignment too, but (if they're named Charlie) they keep trying to point to what we want by saying that the AI should be trying to learn good moral reasoning, which me...
ERO: I do buy the argument that steganography shows up everywhere if you are optimizing for outcomes. As described here (https://www.lesswrong.com/posts/pYcFPMBtQveAjcSfH/supervise-process-not-outcomes), outcome-based optimization is an attractor and will make your sub-components uninterpretable. While it is not guaranteed, I do think that process-based optimization might suffer less from steganography (although only experiments will eventually show what happens). Any thoughts on process-based optimization?
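To make the distinction concrete, here is a minimal toy sketch (all names and the loss definitions are my own illustrative assumptions, not from the linked post): an outcome-based loss only scores the final answer, so the intermediate "reasoning" tokens are unconstrained and free to carry hidden information, while a process-based loss scores each intermediate step against a human-legible reference, leaving less slack for steganographic encodings.

```python
# Toy illustration (hypothetical): outcome-based vs. process-based supervision.

def outcome_loss(steps, final_answer, target_answer):
    # Only the end result is penalized; the intermediate steps are
    # unconstrained, so any encoding of hidden state in them is loss-free.
    return 0.0 if final_answer == target_answer else 1.0

def process_loss(steps, reference_steps):
    # Every intermediate step is compared against a human-endorsed
    # reference step, so illegible or hidden content in a step is penalized.
    mismatches = sum(s != r for s, r in zip(steps, reference_steps))
    return mismatches / max(len(reference_steps), 1)
```

Under the outcome loss, a trajectory with garbled steps but the right answer scores perfectly; under the process loss, the same trajectory is penalized for each illegible step, which is the intuition behind "supervise process, not outcomes."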
Shard Theory: Yeah, the word research agenda was maybe wrongly picked. I was mainly trying to refer to research directions/frameworks.
RAT: Agreed, at the moment this is not feasible.
See above; I don't have strong views on what to call this. For some things, "research agenda" is probably too strong a word. I appreciate your general comment; it helps me better understand your view on LessWrong vs., for example, peer review. I think you are right to some degree: there is a lot of content that is mostly about framing and does not provide concrete results. However, I think that sometimes the correct framing is needed for people to actually come up with interesting results, and for making things more concrete. Some examples I like are the inner/outer alignment framing (which I think initially didn't bring any concrete examples), or the recent Simulators post (https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators). In those cases, I think the right framing helps tremendously in making progress with concrete research afterward. That said, I agree that grounded, concrete, result-oriented experimentation is needed to make real progress on a problem, so I do understand your point, and it can feel like flag planting in some cases.
Note: I'm also coming from academia, so I definitely understand your view and share it to some degree. However, I've personally come to appreciate some posts (usually by great researchers) that allow me to think about the Alignment Problem in a different way.
I read "Film Study for Research" (https://bounded-regret.ghost.io/film-study/, recommended by Jacob Steinhardt) just the other day. In retrospect, I realized that a lot of the posts here give a window into the rather "raw & unfiltered" thinking process of various researchers, which I think is a great way to practice research film study.