Was on Vivek Hebbar's team at MIRI, now working with Adrià Garriga-Alonso on various empirical alignment projects.
I'm looking for projects in interpretability, activation engineering, and control/oversight; DM me if you're interested in working with me.
Does MIRI have a statement on recent OpenAI events? I'm pretty excited about frank reflections on current events as helping people to orient.
Points 1-3 and the idea that superintelligences will be able to understand our values (which I think everyone believes). But the conclusion needs a bunch of additional assumptions.
I suggest people read both that and Deep Deceptiveness (which is not about deceptiveness in particular) and think about how both could be valid, because I think they both are.
Prerat: Everyone should have a canary page on their website that says “I’m not under a secret NDA that I can’t even mention exists” and then if you have to sign one you take down the page.
Does this work? Sounds like a good idea.
The third one was a typo which I just fixed. I have also changed it to use "base policy" everywhere to be consistent, although this may change depending on what terminology is most common in an ML context, which I'm not sure of.
I have strong downvoted without reading most of this post because the author appears to be trying to make something harmful for the world.
Thanks, fixed.
Then I think you should specify that progress within this single innovation could be continuous over years and include 10+ ML papers in sequence each developing some sub-innovation.
I think a single innovation left to create LTPA is unlikely because it runs contrary to the history of technology and of machine learning. For example, in the 10 years before AlphaGo and before GPT-4, several different innovations were required-- and that's if you count "deep learning" as one item. ChatGPT actually understates the number here because different components of the transformer architecture like attention, residual streams, and transformer++ innovations were all developed separately.
The title of this dialogue promised a lot, but I'm honestly a bit disappointed by the content. It feels like the authors are discussing exactly how to run particular mentorship programs and structure grants and how research works in full generality, while no one is actually looking at the technical problems. All field-building efforts must depend on the importance and tractability of technical problems, and this is just as true when the field is still developing a paradigm. I think a paradigm is established only when researchers with many viewpoints build a sense of which problems are important, then try many approaches until one successfully solves many such problems, thus proving the value of said approach. Wanting to find new researchers to have totally new takes and start totally new illegible research agendas is a level of helplessness that I think is unwarranted-- how can one be interested in AF without some view on what problems are interesting?
I would be excited about a dialogue that goes like this, though the format need not be rigid:
[1]: (in the Hamming sense that includes tractability)