I see it now
so here you go, I made this for you
I don't see a flow chart
Strong upvote. Very clearly written and communicated. I've recently been thinking about digging deeper into this paper in the hope of relating it to some recent causality-based interpretability work, and reading this distillation has accelerated my understanding of the paper. Looking forward to the rest of the sequence!
Phi-4 is highly capable not despite but because of synthetic data.
Imitation models tend to be quite brittle outside of their narrowly imitated domain, and I suspect the same to be the case for phi-4. Some of the decontamination measures they took provide some counterevidence to this, but not much. I'd update more strongly if I saw results on benchmarks that contain the generality and diversity of tasks required to do meaningful autonomous cognitive labour "in the wild", such as SWE-Bench (or rather what I understand SWE-Bench to be, I have yet t...
I'm curious how these claims relate to what's proposed by this paper. (note, I haven't read either in depth)
I'm curious what your read of the history is, here? My impression is that most important paradigm-forming work so far has involved empirical feedback somehow, but often in ways exceedingly dissimilar from/illegible to prevailing scientific and engineering practice.
I have a hard time imagining scientists like e.g. Darwin, Carnot, or Shannon describing their work as depending much on "immediate feedback loops with present day" systems.
Thanks for the comment @Adam Scholl and apologies for not addressing it sooner, it was on my list but then time flew. I...
Why are you sure that effective "evals" can exist even in principle?
Relatedly, the point which is least clear to me is what exactly it would mean to solve the "proper elicitation problem" and what exactly the "requirements" laid out by the blue line on the graph are. I think I'd need to get clear on this problem scope before beginning to assess whether this elicitation gap can even in principle be crossed via the methods being proposed (i.e. better design & coverage of black box evaluations).
As a non-example, possessing the kind of foun...
Re "big science": I'm not familiar with the term, so I'm not sure what exact question is being asked. I am much more optimistic in the worlds where we have large-scale coordination amongst expert communities. If the question is about the relationship between governments, firms and academia, I'm still developing my gears around this. Jade Leung's thesis seems to have an interesting model, but I have yet to dig very deep into it.
Hey Ryan, thank you for your support and for the thoughtful write-up! It’s very useful for us to see what the alignment community at large, and our supporters specifically, think of our work. I’ll respond to the point on “pivoting away from blue sky research” here and let Dušan address the other reservations in a separate comment.
As Nora has already mentioned, different people hold different notions of what it means to “keep it weird” and conduct “blue sky” and/or “non-paradigmatic” research. But insofar as this cluster of terms is pointing at research which ...
Given both my personal experience with LLMs and my reading of the role that empirical engagement has historically played in non-paradigmatic research, I tend to advocate for a methodology which incorporates immediate feedback loops with present day deep learning systems over the classical "philosophy -> math -> engineering" deconfusion/agent foundations paradigm.
I'm curious what your read of the history is, here? My impression is that most important paradigm-forming work so far has involved empirical feedback somehow, but often in ways exceedingly di...
For clarity, how do you distinguish between P1 & P4?
It's unclear to me:
(1) What you consider the Yudkowskian argument for FOOM to be
(2) Which of the premises in the argument you find questionable
I would like to say that there's a study group being formed in the AI Alignment Slack server with similar intentions! If you are not a part of that server and would like to join, feel free to email me at melembroucarlitos@gmail.com telling me a bit about yourself and your hopes and intentions and I'll send you an invite.
Interesting. Curious to know what your construction ended up looking like and I'm looking forward to reading the resulting proof!