Will this journal be open to very abstract philosophy (as opposed to sticking to experiments, math, neuroscience, etc.)? In my expert opinion most of the key questions in AGI alignment are philosophical in nature--in other words, they are centered on areas that lack foundational concepts and data. A bit more specifically, would it be open to very speculative philosophy, meaning philosophy that is
I think this would be very consonant with your stated scope, though it would be especially hard for reviewers to judge.
We've been discussing scope a lot, and this is indeed a big question. Some considerations:
Incidentally, if someone wanted to help make the case for philosophy in the journal, a very useful thing would be to compile a list of papers (which could be a mix of published in traditional journals and not, and need not be strictly on alignment) to serve as exemplars of what should be included.
Thanks. Makes sense, yeah, seems tough. Good luck :)
but I think this is not the sort of philosophy you're interested in.
Yeah, definitely not, unfortunately.
Incidentally, if someone wanted to help make the case for philosophy in the journal, a very useful thing would be to compile a list of papers (which could be a mix of published in traditional journals and not, and need not be strictly on alignment) to serve as exemplars of what should be included.
Yeah someone should maybe do that. I would submit Eliezer's TDT paper, I think.
I maybe wouldn't directly submit this, because it's too speculative (unclear and unclearly explained), but I would still gesture at it or something:
and followups
(Like, these probably couldn't go in a journal, and this particular work may not be that high quality / may boil down by 5x to a good paper, but this is the general type of investigation that I would hope for there to be room for if feasible.)
I'm sure you've had lots of discussion about this; why the label "AI alignment"?
I think "alignment" refers to the somewhat specific task of aligning an AI's values to human values. But my understanding of your actual scope is more like "theoretical AI safety". A lot of foundational work is done with the intention that it will eventually help with alignment but definitely isn't about alignment, and a lot of theoretical AI safety work isn't about alignment per se at all. For example, some of my research problems are trying to understand which types of AI systems are not dangerous, not because their values are aligned with ours, but because they're not unrestrained consequentialists.
I wish you the best of skill standing up to the incentives which have enshittified the academic publishing ecosystem.
Fwiw, I'm trying to address this in a different context (economics/policy) at The Unjournal (Unjournal.org). I think I have some sense of how to make things better there, and what some of the blockers are.
Hopefully some of the insights and tools will carry over/be relevant to this context as well, and we can leverage and extend what has worked.
Yes, I'm excited to see what we can learn from David's experience, especially given the incentive designer's insight that he brings to this. We also, collectively, have some experience with the ILIAD conferences, which were a precursor experiment with alternative compensation mechanisms. See Proceedings of ILIAD: Lessons and Progress for some analysis of that project.
What happens if AI labs offer to support you, expecting that you make it more prestigious to publish capability evals?
Sounds like a lot more risk of bias (and appearance thereof) than it's worth. At the least, I figure you'd need to have a disclosure on every paper authored by an employee of the company, as well as conflict-of-interest rules making sure the action editor and reviewers were unbiased. Would be a pain, and still suspect. (Here's GPT's summary of how existing journals handle this, most commonly in medical research: https://chatgpt.com/share/69a992c3-75d8-8002-a592-a8053ee1cdbe )
An intermediate and more plausible case would be personal donations from a former or current employee of a frontier company; we expect many to be philanthropically motivated in the coming years. Imo, this is something we'd consider, but I haven't thought about it much yet. We're set for funding for the first year.
If we are successful in standing up a good and well-respected journal, I expect there will be many funders interested in supporting us. (And if we're not successful, the issue is moot.) So I'm not too worried about getting backed into a corner where our only option to keep running is money from a potentially biasing source. We'd ideally like a broad diverse base of funders, like the arXiv.
Sounds promising! Curious about whether you have plans to accept papers based on experimental setup instead of results (to reduce publication bias) and if you'll consider a "press abstract" designed to help journalists disseminate information to the broader public?
Hmm. Ultimately it would be up to the editorial board, but here's why I personally think these features are probably low priority given their nontrivial cost: (1) I presume we are talking about numerical experiments, and I expect the foundational/conceptual topics we want to publish on are less vulnerable to publication bias than, say, experimental psychology or economics. It would be more like pre-registering numerical math papers. That said, if you think the alignment literature has big problems with publication bias, I'd be interested to hear more. (2) Our primary audience is other researchers. Often, journals are motivated to provide press abstracts to induce popular coverage (by making a time-pressed journalist's life easier, as with a press release), and increasing popular coverage is not one of our goals. It can also be a corrupting influence (although there are steps we could take to reduce this). High-quality popular-science journalists will generally take the time to talk to the authors and outside researchers to get the story right.
(1) yeah this makes sense! I do think that accepting experimental work based on results rather than experimental setup is a structure that leads to publication bias, but given you're looking to be more foundational/conceptual, I don't think this will be an issue here.
(2) "increasing popular coverage is not one of our goals" fair enough! I look forward to seeing the first issue (:
(Caveat: I'm not an expert in this field.) I expect there could be some value in a 'registered reports'-style approach for these high-cost computational experiments.
In informal reporting (ACX, this forum) I recall reading some mentions of something related to the "publication bias" story in econ/social science. Perhaps more like concerns about labs reporting selectively: both researchers promoting capabilities (selectively reporting successes) and safety-minded researchers accused of cherry-picking the most alarming failures/misalignment evidence.
Yeah, I can definitely see the selective-reporting problem, which goes beyond the problem of negative results being unfairly denied publication. But to combat selective reporting, you'd really need to require preregistered experiments, which is more of a collective-action problem between journals, since if any of them allow un-preregistered experiments, the authors can just publish there. (Of course, you can try to convince the broad community to ignore all experiments that aren't preregistered, but if you can do this then you've already won; the journals will be strongly incentivized to follow suit.)
Required preregistration is just very cumbersome and difficult to do for exploratory science; it really seems feasible only for the later stages of things like medical trials, or big contentious questions requiring a decisive experiment.
This publication bias story in ML is a whole can of worms which I would love to open at some point. tl;dr: it is a problem, but the field has semi-accidentally mitigated many of the worst excesses of it. There is an IMO massively under-regarded work on this, Moritz Hardt's Machine Learning Benchmarks, which I will write a LW review of some day if I have time.
I'm curious about the timeline. E.g., when do you expect to open the first call for papers, when do you expect the first issue to be published, etc?
It depends on a few factors, but April at the earliest for initial submissions. Publication will almost certainly be on a rolling basis (no discrete issues). Our ambitious goal is to drive the submission-to-publication time down to something like a month, but it will require combining several new tricks, so it won't be that fast at the beginning.
In order to not drown in slop submissions you could require each author to stake as much money as the reviewers would be happy to be paid to reject their submission as slop.
I like this idea aesthetically. I foresee some challenges in making "staking" something that won't trigger alarms in the existing research bureaucracies that host many of our potential authors. If you have clever ideas for how to handle that I would be curious to hear.
I guess during signup you could require authors to say what existing research bureaucracy they are working for, and only if they click the "I am an independent researcher" link are they introduced to staking.
An example, for what it's worth: Quantum is a relatively new (10-year-old) physics arXiv-overlay journal that runs on volunteer effort and modest publication fees (~$700). They didn't want the fees to be a barrier to submitting, so they have a very easy process for getting them waived; you basically just have to ask. My understanding is that they still have not been overrun with slop, and whenever I am asked to review, the papers are of reasonable quality. So it does not seem they are foisting the slop handling onto reviewers; desk rejection by the editors appears to be enough.
I'm not speaking on behalf of the initiative here, but I do see some promise in author submission fees being used to cover referee compensation. If done judiciously.
I'm interested to hear more. Would it mostly be for practical reasons (financial sustainability), or to reduce the submission of bad work that wastes editor/reviewer time?
We do not yet plan to support replications of empirical work. Organisationally, there is a desire to keep our opening scope tight and theoretical, to avoid diffuse messaging at start-up.
Personally, I would make the case that replications are not as important in ML/AI research as in the physical sciences (although this depends somewhat on what we mean by "replications").
That said, I think there is a strong argument for replications generally, and maybe in this field too; if the editorial board agreed with that, then that is what we would do. I am obliged at this point to mention the connection to the Unjournal work that David has mentioned elsewhere in these comments.
This post seems written as if it's "addressed to" the lesswrong community, rather than the broader community of researchers who might want to publish in such a journal. Was this intentional?
We are trying to do both, in that we are attempting to be a bridge between LW and wider scientific communities. Where do you feel our tone might be excluding domain scientists?
I think the general sense is that this is written for a LW audience. If I'd point to specific wordings:
I think how other organizations handle this sort of thing is that they may have one post on Lesswrong for this specific audience, and a second, less detailed post for a broader community on their website. E.g., compare Anthropic's RSP update with Holden's post on the topic.
Concretely, I think it seems like your post assumes some of the worldviews and assumptions of the lesswrong-ish alignment community, and so general academics may feel like the post is not addressed to them.
Thanks, this is specific and useful. I think it's less that we're attempting to target LW and more that it's just how we tend to talk. We'll work on keeping the word choice more conventional and professional.
tl;dr: We’re incubating an academic journal for AI alignment: rapid peer review of foundational alignment research that the current publication ecosystem underserves. Key bets: paid attributed review, reviewer-written synthesis abstracts, and targeted automation. Contact us if you’re interested in participating as an author, reviewer, or editor, or if you know someone who might be.
Experimental Infrastructure for Foundational Alignment Research
This is the first in a series of “build-in-the-open” updates regarding the incubation of a new peer-reviewed journal dedicated to AI alignment. Later updates will contain much more detail, but we want to put this out soon to draw community participation early. Fill out this form to express your interest in participating as an author, reviewer, editor, developer, manager, or board member, or to recommend someone who might be interested.
The Core Bet
Peer review is a crucial public good: it applies scarce researcher time to sort new ideas for focused attention from the community, but it is undersupplied because individual reviewers are poorly incentivized. Peer review in alignment research is particularly fragmented. While some parts of the alignment research community are served by existing venues, such as journals and ML conferences, there are significant gaps. These gaps arise from a combination of factors, including the lack of appropriate reviewer pools for some kinds of work. Moreover, none of these institutions move as fast as we think they could in this era, mainly because of inertia. Various preprint servers and online forums avoid these problems, but generally at the expense of quality certification and institutional legitimacy. Furthermore, their review coverage can suffer when attention is misallocated due to trends and hype.
Our bet is that we can create a venue that provides institutional leverage (coordination, compensation) and legibility (citations, archival records, stable indexing) without the institutional friction that kills speed. Instead, we can operate at a small, agile scale that enables dedicated tooling and rapid experimentation.
Operational Design
We are designing the journal around a few specific, high-leverage hypotheses:
Our forthcoming formal description of the journal will have much more detail. Contact us to help shape it.
Scope
“AI Alignment” is a broad and often contested label. To provide a high-signal environment from day one, we are making a deliberate choice regarding our starting point:
This is just a starting point. The current team is not the final arbiter of what constitutes “alignment” for all time. While we are setting the initial direction to get the engine running, the long-term responsibility for expanding, narrowing, or shifting the scope will belong to the editorial board. Our job right now is to build a vessel sturdy enough to support those debates.
Governance
This project is in its incubation phase. As the “plumbing” of the journal grows, editorial and strategic authority will be taken up by an editorial board of respected researchers from the alignment community. The journal will be philanthropically funded, so our funders will naturally influence how the journal develops, but we are committed to building a self-sustaining, public-good institution that belongs to the field.
Advisory board
We are grateful for the advice and support from the initial members of our advisory board:
Institutional stewardship
This project could fail. Poor execution could create a status-chasing bottleneck, further pollute the signal-to-noise ratio in alignment research, or just waste researchers' time. Poor coordination with other initiatives could hinder rather than help the field.
To reduce this risk, we will engage as a good citizen with the alignment research community. We will track and publish our own performance metrics (turnaround times, reviewer load, and author satisfaction) and solicit the wider community's assessment of whether we are participating cooperatively and productively in the publication ecosystem. Continuing the journal will be contingent upon positive community feedback and the editorial board's ongoing reassessment of counterfactually positive impact. Accepted papers will remain online regardless of the ultimate fate of the project.
Next steps
Join the founding team
A journal is only as good as its community, and you could be part of it. We want participation in the Alignment Journal—as an editor, author, or reviewer—to be credibly status-accruing. This should be a justifiable use of time toward your career goals.
If you believe this infrastructure is a missing piece of the safety ecosystem, we want your help.
We’ll soon share an initial description of our design and plans for the journal with much more detail, so reach out now if you’d like to shape it.
Support us online
We welcome you to follow us on all the usual platforms:
@AlignmentJrnl
Above all, our content will be hosted at our main site, alignmentjournal.org.
Contributors to this document
We are grateful to Geoffrey Irving, Victoria Krakovna, and David Duvenaud for their support and feedback on this post. The authors do not commit, in perpetuity, to every detail of the journal strategy outlined here. This is the first stage in an ongoing consultation, and we expect to adjust our positions in the face of new evidence about best strategies. All responsibility for mistakes in content or execution resides with the current managing editors, Dan MacKinlay and Jess Riedel.
We intend to experiment with a variety of possible ratings, certifications and other quality signals. This is our starting proposal, as it is one we have some experience with.
The practical implications of the emphasis on achieving state-of-the-art results on benchmarks in machine learning research are complicated and contentious and, we argue, not yet well understood even inside the field. For an opinionated introduction, see Moritz Hardt’s book, The Emerging Science of Machine Learning Benchmarks.