Thank you for this comment!
I think your point that "The problem here is that fine-tuning easily strips any safety changes and easily adds all kinds of dangerous things (as long as capability is there)" is spot on. It maps to my intuitions about the weaknesses of fine-tuning and is one of the strongest arguments that open-sourcing foundation models carries significant risks.
I appreciate your suggestions for other methods of auditing that could possibly work, such as running a model within a protected framework and open-sourcing encrypted weights. ...
Thanks for the comment!
I think your observation that biological evolution is a slow, blind, and undirected process is fair. We try to make this point explicit in our section on natural selection (as a main evolutionary selection pressure for biological evolution) where we say "The natural processes for succeeding or failing in survival and reproduction – natural and sexual selection – are both blind and slow."
For our contribution here, we are not trying to dispute this. Instead, we're seeking to find analogies to the ways in which machine evolution, which we...
This discussion considers a relatively “flat”, dynamic organization of systems. The open-agency model[13] considers flexible yet relatively stable patterns of delegation that more closely correspond to current developments.
I have a question here that I'm curious about:
I wonder if you have any additional thoughts about the "structure" of the open agencies that you imagine here. Flexible and relatively stable patterns of delegation seem to be important dimensions. You mention here that the discussion focuses on a "flat" organization of systems, but...
We want work flows that divide tasks and roles because of the inherent structure of problems, and because we want legible solutions. Simple architectures and broad training facilitate applying structured roles and workflows to complex tasks. If the models themselves can propose the structures (think of chain-of-thought prompting), so much the better. Planning a workflow is an aspect of the workflow itself.
I think this has particular promise, and it's an area I would be excited to explore further. As I mentioned in a previous comment on your The Open ...
Thanks for this post, and really, this series of posts. I had not been following along, so I started with "“Reframing Superintelligence” + LLMs + 4 years" and worked my way back to here.
I found your initial Reframing Superintelligence report very compelling back when I first came across it, and still do. I also appreciate your update post referenced above.
The thought I'd like to offer is that your ideas here strike me as somewhat similar to what both Max Weber and Herbert Simon proposed we should do with human agents. After r...
Thanks for this post. As I mentioned to both of you, it feels a little bit like we have been ships passing one another in the night. I really like your idea here of loops and the importance of keeping humans within these loops, particularly at key nodes in the loop or system, to keep Moloch at bay.
I have a couple of scattered points for you to consider:
I was interested in seeing what the co-writing process would create. I also wanted to tell a story about technology in a different way, which I hope complements the other stories in this part of the sequence. I also just think it's fun to retell a story that was originally told in 1968 from the point of view of future intelligent machines, and then to use a modern intelligent machine to write that story. I think it makes a few additional points about how stable our fears have been, how much the technology has changed, and the plausibility of the story itself.
I love that response! I'll be interested to see how quickly it strikes others. All the actual text that appears within the story was generated by ChatGPT with the 4.0 model. Basically, I asked ChatGPT to co-write a brief story. I had it pause throughout to ask for feedback and revisions. Then, at the end of the story it generated with my feedback along the way, I asked it to fill in some more details and examples, which it did. I also asked for minor changes to the style and specifics of these.
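For anyone who wants the mechanics rather than screenshots, here is a minimal sketch of that pause-and-revise loop written against the OpenAI Python client. I used the ChatGPT interface rather than the API, so the prompts and model name below are illustrative assumptions, not the exact ones from the chat:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The conversation history carries every draft and every piece of human
# feedback forward into the next revision.
history = [{"role": "system",
            "content": ("You are co-writing a brief story with the user. "
                        "After each passage, pause and ask for feedback.")}]

def draft(user_message):
    history.append({"role": "user", "content": user_message})
    reply = client.chat.completions.create(model="gpt-4", messages=history)
    text = reply.choices[0].message.content
    history.append({"role": "assistant", "content": text})
    return text

print(draft("Let's co-write a brief story retelling a 1968 tale from the "
            "point of view of a future intelligent machine. Draft the "
            "opening, then pause."))
print(draft("Good start. Fill in more details and examples, and continue."))
```

The key point is simply that the whole exchange, drafts and feedback alike, stays in context, which is what lets the later "fill in details" requests build on everything that came before.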
I’d be happy to directly send you screenshots of the chat as well.
Thanks for reading!
Thank you for providing a nice overview of our paper Frontier AI Regulation: Managing Emerging Risks to Public Safety, which was just released!
I appreciate your feedback, both the positive and critical parts. I'm also glad you think the paper should exist and that it is mostly a good step. And, I think your criticism is fair. Let me also note that I do not speak for the authorship team. We are quite a diverse group from academia, labs, industry, nonprofits, etc. It was no easy task to find common ground across everyone involved.
I think the AI Governance space is d...
As you likely know by now, I think the argument that “Technological Progress = Human Progress” is clearly more complicated than is sometimes assumed. AI is very much already embedded in society and the existing infrastructure makes further deployment even easier. As you say, “more capability dropped into parts of a society isn’t necessarily a good thing.”
One of my favorite quotes on the relationship between technological advancement and human advancement comes from Aldous Huxley:
“Today, after two world wars and three major revolutions, we know that th...
Thanks for the comment, David! It also caused me to go back and read this post again, which sparked quite a few old flames in the brain.
I agree that a collection of different approaches to ensuring AI alignment would be interesting! This is something that I'm hoping (now planning!) to capture in part with my exploration of scenario modeling that's coming down the pipe. But a brief overview of the different analytical approaches to AI alignment would be helpful (if it doesn't already exist in an updated form that I'm unaware of).
I agree with your insight ...
Thank you for this post! As I may have mentioned to you both, I had not followed this line of research until the two of you brought it to my attention. I think the post does an excellent job describing the trade-offs around interpretability research and why we likely want to push it in certain, less risky directions. In this way, I think the post is a success in that it is accessible and lays out easy-to-follow reasoning, sources, and examples. Well done!
I have a couple of thoughts on the specific content as well where I think my intuitions converge or div...
Thank you for this post! As I mentioned to both of you, I like your approach here. In particular, I appreciate the attempt to provide some description of how we might optimize for something we actually want, something like wisdom.
I have a few assorted thoughts for you to consider:
I would be interested in additional discussion around the inherent boundedness of agents that act in the world. I think self-consistency and inter-factor consistency have some fundamental limits that could be worth exploring within this framework. For example, might different t
Thank you for the comment and for reading the sequence! I posted Chapter 7 Welcome to Analogia! (https://www.lesswrong.com/posts/PKeAzkKnbuwQeuGtJ/welcome-to-analogia-chapter-7) yesterday and updated the main sequence page just now to reflect that. I think this post starts to shed some light on ways of navigating this world of aligning humans to the interests of algorithms, but I doubt it will fully satisfy your desire for a call to action.
I think there are both macro policies and micro choices that can help.
At the macro level, there is an over...
There is a growing academic field of "governance" that would variously be described as a branch of political science, public administration, or policy studies. It is a relatively small field, but it has several academic journals that fit the description of the literature you're looking for. The best of these journals, in my opinion, is Perspectives on Public Management & Governance (although it focuses on public governance structures to the point of ignoring corporate governance structures).
In addition to this, there is a 50-chapter OU...
Thank you for this post, Kyoung-Cheol. I like how you have used DeepMind's recent work to motivate the discussion of "authority as a consequence of hierarchy" and the idea that "processing information to handle complexity requires speciality which implies hierarchy."
I think there is some interesting work on this forum that captures these same types of ideas, sometimes with similar language, and sometimes with slightly different language.
In particular, you may find the recent post from Andrew Critch on "Power dynamics as a blind spot or b...
Thanks. I think your insight is correct that governance requires answers to the "how" and "what" questions, and that the bureaucratic structure is one answer, but it leaves the "how" unanswered. I don't have a good technical answer, but I do like an interesting proposal, Complete Freedom Democracy, from Hannes Alfven's book "The End of Man?", which he published under the pseudonym Olof Johnneson. The short book is worth the read, but hard to find. The basic idea is a parliamentary system in which all humans, through something akin to a smartphone, rank-vote proposals. I'll write up the details some time!
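As a toy illustration of what that rank-voting might look like computationally, here is a minimal sketch. Alfven doesn't specify the aggregation rule, so the instant-runoff choice and the proposal names below are my assumptions, not his:

```python
from collections import Counter

def instant_runoff(ballots):
    """One plausible rule for rank-voted proposals: repeatedly eliminate
    the proposal with the fewest first-choice votes until one proposal
    holds a majority of first choices."""
    ballots = [list(b) for b in ballots]
    while True:
        tally = Counter(b[0] for b in ballots if b)
        total = sum(tally.values())
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > total:  # strict majority of remaining first choices
            return leader
        loser = min(tally, key=tally.get)
        ballots = [[p for p in b if p != loser] for b in ballots]

# Three citizens rank three hypothetical proposals from their phones.
print(instant_runoff([["A", "B", "C"],
                      ["B", "C", "A"],
                      ["B", "A", "C"]]))  # -> "B"
```

Even this toy version makes the appeal concrete: every citizen expresses a full ordering, not just a single choice, and the aggregation is mechanical enough to run at smartphone scale.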
Thank you for the comment. There are several interesting points I want to comment on. Here are my thoughts in no particular order of importance:
I think this approach may have something to add to Christiano's method, but I need to give it more thought.
I don't think it is yet clear how this structure could help with the big problem of superintelligent AI. The only contributions I see clearly enough at this point are redundant to arguments made elsewhere. For example, the notion of a "machine beamte" as one that can be controlled through (1) the appropriate training and certification, (2) various motivations and incentives for aligning behavior with the knowledge from training, and (3) nominate...
Thank you for this. I pulled up the thread. I think you're right that there are a lot of open questions to look into at the level of group dynamics. I'm still familiarizing myself with the technical conversation around the iterated prisoner's dilemma and other ways to look at these challenges through a game-theory lens. My understanding so far is that some basic concepts of coordination and group dynamics, like authority and specialization, are not yet well formulated, but again, I don't consider myself up to date in this conversation yet.
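For concreteness, here is a minimal sketch of the iterated prisoner's dilemma as I understand it. The payoff values are the standard textbook ones, and the two strategies are illustrative examples, not anything taken from the linked thread:

```python
# Payoffs for (row move, column move); C = cooperate, D = defect.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_moves, their_moves):
    # Cooperate first, then mirror the opponent's previous move.
    return their_moves[-1] if their_moves else "C"

def always_defect(my_moves, their_moves):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    moves_a, moves_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strategy_a(moves_a, moves_b)
        b = strategy_b(moves_b, moves_a)
        pay_a, pay_b = PAYOFFS[(a, b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, always_defect))  # -> (9, 14)
```

Even this toy version shows the repeated-game dynamic: the defector gains only in the first round, after which tit-for-tat stops being exploitable, which is part of why iteration changes the coordination picture so much.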
From the thread you sh...
Thank you for the insights. I agree that "bureaucracies are notorious homes to Goodhart effects and they have as yet found no way to totally control them." I also agree with your intuition that "to be fair bureaucracies do manage to achieve a limited level of alignment, and they can use various mechanisms that generate more vs. less alignment."
I do, however, believe that an ideal type of bureaucratic structure helps with at least some forms of the alignment problem. If, for example, Drexler is right, and my conceptualization of the theo...
My name is Justin Bullock. I live in the Seattle area after 27 years in Georgia and 7 years in Texas. I have a PhD in Public Administration and Policy Analysis, where I focused on decision making within complex, hierarchical public programs. For example, in my dissertation I attempted to model how errors (measured as improper payments) are built into the US Unemployment Insurance Program. I spent time looking at how agents are motivated within these complex systems, trying to develop general insights into how errors occur in them. Until about 2016...
Thanks for this comment. I agree there is some ambiguity here about the types of risks being considered with respect to the question of open-sourcing foundation models. I believe the report favors the term "extreme risks," which it defines as "risk of significant physical harm or disruption to key societal functions." I believe they avoid the terms "extinction risk" and "existential risk," but are implying something not too different with their choice of extreme risks.
I pose the question above as:
...