All of Connor Leahy's Comments + Replies

Thanks for the comment! I agree that we live in a highly suboptimal world, and I do not think we are going to make it, but it's worth taking our best shot.

I don't think of the CoEm agenda as "doing AGI right." (For one, it is not even an agenda for building AGI/ASI, but for bounding ourselves below that.) Doing AGI right would involve solving problems like P vs PSPACE, developing a vastly deeper understanding of Algorithmic Information Theory and more advanced formal verification of programs. If I had infinite budget and 200 years, the plan would look very ... (read more)

Thanks for the comment!

Have I understood this correctly?

I am most confident in phases 1-3 of this agenda, and I think you have overall a pretty good rephrasing of 1-5, thanks! One note is that I don't think of "LLM calls" as being fundamental; I think of LLMs as a stand-in for "banks of patterns" or "piles of shards of cognition." The exact shape of this can vary, LLMs are just our current most common shape of "cognition engines," but I can think of many other, potentially better, shapes this "neural primitive/co-processor" could take.

I think there is s... (read more)

4Jeremy Gillen
Here's two ways that a high-level model can be wrong:

* It isn't detailed enough, but once you learn the detail it adds up to basically the same picture. E.g. Newtonian physics, ideal gas laws. When you get a more detailed model, you learn more about which edge-cases will break it. But the model basically still works, and is valuable for working out the more detailed model.
* It's built out of confused concepts. E.g. free will, consciousness (probably), many ways of thinking about personal identity, four humors model. We're basically better off without this kind of model and should start from scratch.

It sounds like you're saying high-level agency-as-outcome-directed is wrong in the second way? If so, I disagree, it looks much more like the first way. I don't think I understand your beliefs well enough to argue about this, maybe there's something I should read?

----------------------------------------

I have a discomfort that I want to try to gesture at: Are you ultimately wanting to build a piece of software that solves a problem so difficult that it needs to modify itself? My impression from the post is that you are thinking about this level of capability in a distant way, and mostly focusing on much earlier and easier regimes. I think it's probably very easy to work on legible low-level capabilities without making any progress on the regime that matters. To me it looks important for researchers to have this ultimate goal constantly in their mind, because there are many pathways off-track. Does it look different to you?

----------------------------------------

I think this is a bad place to rely on governance, given the fuzziness of this boundary and the huge incentive toward capability over legibility. Am I right in thinking that you're making a large-ish gamble here on the way the tech tree shakes out (such that it's easy to see a legible-illegible boundary, and the legible approaches are competitive-ish) and also the way governance shakes out (such th

Hi habryka, I don't really know how best to respond to such a comment. First, I would like to say thank you for your well-wishes, assuming you did not mean them sarcastically. Maybe I have lost the plot, and if so, I do appreciate help in recovering it. Secondly, I feel confused as to why you would say such things in general.

Just last month, my coauthors and I released a 100+ page explanation/treatise on AI extinction risk that gives a detailed account of where AGI risk comes from and how it works, which was received warmly by LW and the general public al... (read more)

9plex
[epistemic status: way too ill to be posting important things]

hi fellow people-who-i-think-have-much-of-the-plot

you two seem, from my perspective as having read a fair amount of content from both, to have a bunch of similar models and goals, but quite different strategies. on top of both having a firm grip on the core x-risk arguments, you both call out similar dynamics in capabilities orgs capturing will to save the world and turning it into more capabilities progress[1], you both take issue with somewhat different but i think related parts of openphil's grantmaking process, you both have high p(doom) and not very comfortable timelines, etc.

i suspect if connor explained why he was focusing on the things he is here, that would uncover the relevant difference. my current guess is connor is doing a kind of political alliancebuilding which is colliding with some of habryka's highly active integrity reflexes.

maybe this doesn't change much, these strategies do seem at least somewhat collision-y as implemented so far, but i hope our kind can get along.

1. ^ e.g. "Turning care into acceleration" from https://www.thecompendium.ai/the-ai-race#these-ideologies-shape-the-playing-field
e.g. https://www.lesswrong.com/posts/h4wXMXneTPDEjJ7nv/a-rocket-interpretability-analogy?commentId=md7QvniMyx3vYqeyD and lots of calling out Anthropic

Morality is multifaceted and multilevel. If you have a naive form of morality that is just "I do whatever I think is the right thing to do", you are not coordinating or being moral, you are just selfish.

Coordination is not inherently always good. You can coordinate with one group to more effectively do evil against another. But scalable Good is always built on coordination. If you want to live in a lawful, stable, scalable, just civilization, you will need to coordinate with your civilization and neighbors and make compromises.

As a citizen of a modern coun... (read more)

3Shankar Sivarajan
I do in fact believe morality to be entirely orthogonal to "consensus" or what "many other people" want, and since you call this "selfishness," I shall return the favor and call your view, for all that you frame it as "coordination" or "scalable morality," abject bootlicking.

A roaming bandit's "do what I tell you and you get to live" could be thought of as a kind of contract, I suppose, but I wouldn't consider myself bound by it if I could get away with breaching it. I consider the stationary bandits' "social contracts" not to be meaningfully different. One clue to how they're similar is how the more powerful party can go, à la Vader, "Here is a New Deal. Pray I don't renew it any further." Unilaterally reneging on such a contract when you are the weaker party would certainly be unwise, for the same reason trying to stand between a lynch mob and its intended victim would be—simple self-preservation—but I condemn the suggestion that it would be immoral.

I see what you call "civilization," and I'm against it.

I vaguely recall reading of a medieval Christian belief that if everyone stopped sinning for a day, Christ would return and restore the Kingdom of Heaven. This reminds me of that: would be nice, but it ain't gonna happen.

Hi, as I was tagged here, I will respond to a few points. There are a bunch of smaller points only hinted at that I won't address. In general, I strongly disagree with the overall conclusion of this post.

There are two main points I would like to address in particular:

1. More information is not more Gooder

There seems to be a deep underlying confusion here that in some sense more information is inherently more good, or inherently will result in good things winning out. This is very much the opposite of what I generally claim about memetics. Saying that all in... (read more)

2Cleo Scrolls
That last paragraph seems important. There's a type of person who doesn't yet have an opinion in AI discourse (who is new to it) and will bounce off the "side" that appears most hostile to them--which, if they have misguided ideas, might be the truth-seeking side that gently criticizes. (Not saying that's the case for the author of this post!) It's really hard to change the mind of someone who's found their side in AI. But it's not so hard to keep them from joining one in the first place!
2feugjavnpolj
I'm worried you're not seeing this at a long enough timescale. I'm claiming:

1. "information sharing is good" is an invariant as timeless as "people will sacrifice truth and empathy for power", you can't claim Moloch wins based on available evidence.
2. both of these are more powerful than short-term effects which we can forecast

On 1: Increased information sharing leads to faster iteration. Faster iteration of science and technology leads to increased power derived from technology. Faster iteration of social norms and technologies leads to increased power derived from better coordination. It is not a coincidence that the USA is simultaneously the most powerful and one of the most tolerant societies in human history.

Suppose you were the inventor of the Gutenberg press deciding whether to release your technology or not. Maybe you could have foreseen the witch burnings. Maybe you could've even foreseen something like the 95 theses. You couldn't have foreseen democracy in France, or that its success would inspire the US. (Which was again only possible because of sharing of information between Europe and the US.) You couldn't have foreseen that Jewish physicists leaving Europe for a more tolerant society would invent an atomic bomb that would ultimately bring peace to Europe. You couldn't have foreseen the peace among EU nations in 2024, not enforced just at threat of bomb but more strongly via intermixing of its peoples. If you decided not to release the Gutenberg press because of forecasted witch burnings you might have made a colossal mistake.

Information sharing is argued as good because it relies on principles of human behaviour that survive long after you die, long after any specific circumstances. Information survives the rise and fall of civilisations. As long as 1-of-n people preserve some information, it is preserved. A basic desire for truth and empathy is universal amongst human beings across space and time, as it's encoded in genetics not culture. Yes, peopl
2Shankar Sivarajan
There is a certain type of person who would look at the mountains of skulls that Genghis Khan piled up and before judging it evil, ask whether it was a state acting or a group of individuals. Fuck that. States/governments, "democratic" or otherwise, have absolutely no privileged moral status, and to hell with any norm that suggests otherwise, and to hell with any "civilization" that promotes such a norm. What the state can do is wield violence far more effectively than you, so if you want to level a city, say, Beijing or Moscow, yeah, you should get the US military to do it instead of trying to do it yourself. And it can wield violence against you if you defy its will, so it's a bad idea to do so publicly, but for purely pragmatic reasons, not moral ones.
2AtillaYasar
TLDR: Here's all the ways in which you're right, and thanks for pointing these things out!

At a meta-level, I'm *really* excited by just how much I didn't see your criticism coming. I thought I was thinking carefully, and that iterating on my post with Claude (though it didn't write a single word of it!) was taking out the obvious mistakes, but I missed so much. I have to rethink a lot about my process of writing this.

I strongly agree that I need a *way* more detailed model of what "memetic evolution" looks like, when it's good vs bad, and why, whether there's a better way of phrasing and viewing it, dig into historical examples, etc. I'm curious if social media is actually bad beyond the surface -- but again I should've anticipated "social media kinda seems bad in a lot of ways" being such an obvious problem in my thinking, and attended to it.

Reading it back, it totally reads as an argument for "more information more Gooder", which I didn't see at all. (generally viewing the post as "more X is always more good" is also cool as in, a categorization trick that brings clarity)

I think a good way to summarize my mistake is that I didn't "go all the way" in my (pretty scattered) lines of thinking.

Thanks :)

A big part of why I got into writing ideas explicitly and in big posts (vs off-hand Tweets/personal notes), is because you've talked about this being a coordination mechanism on Discord.

Nice set of concepts, I might use these in my thinking, thanks!

I don't understand what point you are trying to make, to be honest. There are certain problems that humans/I care about that we/I want NNs to solve, and some optimizers (e.g. Adam) solve those problems better or more tractably than others (e.g. SGD or second order methods). You can claim that the "set of problems humans care about" is "arbitrary", to which I would reply "sure?"

Similarly, I want "good" "philosophy" to be "better" at "solving" "problems I care about." If you want to use other words for this, my answer is again "sure?" I think this is a good use of the word "philosophy" that gets better at what people actually want out of it, but I'm not gonna die on this hill because of an abstract semantic disagreement.

1M. Y. Zuo
That's the thing, there is no definable "set of problems humans care about" without some kind of attached or presumed metaphilosophy, at least none that you, or anyone, could possibly figure out in the foreseeable future and prove to a reasonable degree of confidence to the LW readerbase. It's not even 'arbitrary',  that string of letters is indistinguishable from random noise. i.e. Right now your first paragraph is mostly meaningless if read completely literally and by someone who accepts the claim. Such a hypothetical person would think you've gone nuts because it would appear like you took a well written comment and inserted strings of random keyboard bashing in the middle. Of course it's unlikely that someone would be so literal minded, and so insistent on logical correctness, that they would completely equate it with random bashing of a keyboard. But it's possible some portion of readers lean towards that.

"good" always refers to idiosyncratic opinions, I don't really take moral realism particularly seriously. I think there is "good" philosophy in the same way there are "good" optimization algorithms for neural networks, while also I assume there is no one optimizer that "solves" all neural network problems.

2TAG
That is not a fact.
3M. Y. Zuo
'"good" optimization algorithms for neural networks' also has no difference in meaning from '"glorxnag" optimization  algorithms for neural networks', or any random permutation, if your prior point holds.

I strongly disagree and do not think that is how AGI will look; AGI isn't magic. But this is a crux, and I might be wrong, of course.

I can't rehash my entire views on coordination and policy here, I'm afraid, but in general, I believe we are currently on a double exponential timeline (though I wouldn't model it quite like you, the conclusions are similar enough), and I think some simple-to-understand and straightforwardly implementable policy (in particular, compute caps) will at least move us to a single exponential timeline.

I'm not sure we can get policy that can stop the single exponential (which is software improvements), but there are some ways, and at least we will then have additional time to work on compounding solutions.
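To make the single/double exponential distinction concrete, here is a minimal sketch in my own notation (illustrative only, not a formalism from the comment): a single exponential grows at a fixed relative rate, while in a double exponential the exponent itself grows exponentially, e.g. because hardware scaling and software progress compound.

```latex
\[
\underbrace{C(t) = C_0\, e^{k t}}_{\text{single exponential}}
\qquad\qquad
\underbrace{C(t) = C_0\, e^{a\, e^{b t}}}_{\text{double exponential}}
\]
```

On this reading, compute caps remove one of the two compounding growth processes (hardware scale-up), leaving only the software-driven single exponential referred to above.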

Double exponentials can be hard to visualize. I'm no artist, but I created this visual to help us better appreciate what is about to happen. =-)

3Spiritus Dei
That sounds like a good plan, but I think a lot of the horses have already left the barn. For example, Coreweave is investing $1.6 billion to create an AI datacenter in Plano, TX that is purported to be 10 exaflops, and that system goes live in 3 months. Google is spending a similar amount in Columbus, Ohio. Amazon, Facebook, and other tech companies are also pouring billions upon billions into purpose-built AI datacenters. NVIDIA projects $1 trillion will be spent over the next 4 years on AI datacenter build out. That would be an unprecedented number not seen since the advent of the internet.

All of these companies have lobbyists that will make a short-term legislative fix difficult. And for this reason I think we should be considering a Plan B, since there is a very good chance that we won't have enough time for a quick legislative fix or the time needed to unravel alignment if we're on a double exponential curve. Again, if it's a single exponential then there is plenty of time to chat with legislators and research alignment.

In light of this I think we need to have a comprehensive "shutdown plan" for these mammoth AI datacenters. The leaders of Inflection, OpenAI, and other tech companies all agree there is a risk, and I think it would be wise to coordinate with them on a plan to turn everything off manually in the event of an emergency.

Source: $1.6 Billion Data Center Planned For Plano, Texas (localprofile.com)
Source: Nvidia Shocker: $1 Trillion to Be Spent on AI Data Centers in 4 Years (businessinsider.com)
Source: Google to invest another $1.7 billion into Ohio data centers (wlwt.com)
Source: Amazon Web Services to invest $7.8 billion in new Central Ohio data centers - Axios Columbus

Sure, it's not a full solution, it just buys us some time, but I think it would be a non-trivial amount, and let's not let the perfect be the enemy of the good and what not.

A lot of the debate surrounding existential risks of AI is bounded by time. For example, if someone said a meteor is about to hit the Earth that would be alarming, but the next question should be, "How much time before impact?" The answer to that question affects everything else.

If they say, "30 seconds". Well, there is no need to go online and debate ways to save ourselves. We can give everyone around us a hug and prepare for the hereafter. However, if the answer is "30 days" or "3 years" then those answers will generate very different responses.

The AI al... (read more)

I see regulation as the most likely (and most accessible) avenue that can buy us significant time. The obvious move, fmpov, is to just put compute caps in place: make it illegal to do training runs above a certain FLOP level. Other possibilities are strict liability for model developers (developers, not just deployers or users, are held criminally liable for any damage caused by their models), global moratoria, "CERN for AI" and similar. Generally, I endorse the proposals here.

None of these are easy, of course, there is a reason my p(doom) is high.

But what hap

... (read more)
4Wei Dai
I prefer to frame it as human-AI safety problems instead of "misuse risk", but the point is that if we're trying to buy time in part to have more time to solve misuse/human-safety (e.g. by improving coordination/epistemology or solving metaphilosophy), but the strategy for buying time only achieves a pause until alignment is solved, then the earlier alignment is solved, the less time we have to work on misuse/human-safety.

I think this is not an unreasonable position, yes. I expect the best way to achieve this would be to make global coordination and epistemology better/more coherent...which is bottlenecked by us running out of time, hence why I think the pragmatic strategic choice is to try to buy us more time.

One of the ways I can see a "slow takeoff/alignment by default" world still going bad is that in the run-up to takeoff, pseudo-AGIs are used to hypercharge memetic warfare/mutation load to a degree that basically every living human is functionally insane, and then even an aligned AGI can't (and wouldn't want to) "undo" that.

4Wei Dai
What are you proposing or planning to do to achieve this? I observe that most current attempts to "buy time" seem organized around convincing people that AI deception/takeover is a big risk and that we should pause or slow down AI development or deployment until that problem is solved, for example via intent alignment. But what happens if AI deception then gets solved relatively quickly (or someone comes up with a proposed solution that looks good enough to decision makers)? And this is another way that working on alignment could be harmful from my perspective...

Hard for me to make sense of this. What philosophical questions do you think you'll get clarity on by doing this? What are some examples of people successfully doing this in the past?

The fact you ask this question is interesting to me, because in my view the opposite question is the more natural one to ask:  What kind of questions can you make progress on without constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions, the places where we do best and get the least drawn astray is ... (read more)

1M. Y. Zuo
You raised a very interesting point in the last comment, that metaphilosophy already encompasses everything, that we could conceive of at least. So a 'solution' is not tractable due to various well known issues such as the halting problem and so on. (Though perhaps in the very distant future this could be different.)

However this leads to a problem, as exemplified by your phrasing here: 'good philosophy' is not a sensible category since you already know you have not, and cannot, 'solve' metaphilosophy. Nor can any other LW reader do so. 'good' or 'bad' in real practice are, at best, whatever the popular consensus is in the present reality, at worst, just someone's idiosyncratic opinions.

Very few concepts are entirely independent from any philosophical or metaphilosophical implications whatsoever, and 'good philosophy' is not one of them. But you still felt a need to attach these modifiers, due to a variety of reasons well analyzed on LW, so the pretense of a solved or solvable metaphilosophy is still needed for this part of the comment to make sense.

I don't want to single out your comment too much though, since it's just the most convenient example, this applies to most LW comments. i.e. If everyone actually accepted the point, which I agree with, I dare say a huge chunk of LW comments are close to meaningless from a formal viewpoint, or at least very open to interpretation by anyone who isn't immersed in 21st century human culture.

As someone that does think about a lot of the things you care about at least some of the time (and does care pretty deeply), I can speak for myself why I don't talk about these things too much:

Epistemic problems:

  • Mostly, the concept of "metaphilosophy" is so hopelessly broad that you kinda reach it by definition by thinking about any problem hard enough. This isn't a good thing, when you have a category so large it contains everything (not saying this applies to you, but it applies to many other people I have met who talked about metaphilosophy), it usually
... (read more)
4Noosphere89
In retrospect, this is why debates over the simulation hypothesis/Mathematical Universe Hypothesis/computationalism go nowhere. There is a motte and bailey among the people advocating for these hypotheses: the motte is that they can validly claim their category encompasses everything, though they don't realize this doesn't constrain their expectations at all and thus can't be used in basically any debate; the bailey is that this is something important where you should change something, which only holds for a narrower version of the simulation hypothesis/Mathematical Universe Hypothesis/computationalism that doesn't encompass everything. And the people arguing against those hypotheses don't realize there's no way for the hypothesis to be falsified if made general enough.
2TAG
An example of a metaphilosophical question could be "Is the ungroundedness (etc) of philosophy inevitable or fixable?" Well, if you could solve epistemology separately from everything else, that would be great. But a lot of people have tried and failed. It's not like no one is looking for foundations because no one wants them.
1Thoth Hermes
We can always fall back to "well, we do seem to know what we and other people are talking about fairly often" whenever we encounter the problem of whether-or-not a "correct" this-or-that actually exists. Likewise, we can also reach a point where we seem to agree that "everyone seems to agree that our problems seem more-or-less solved" (or that they haven't been).  I personally feel that there are strong reasons to believe that when those moments have been reached they are indeed rather correlated with reality itself, or at least correlated well-enough (even if there's always room to better correlate).  Thus, for said reasons I probably feel more optimistically than you do about how difficult our philosophical problems are. My intuition about this is that the more it is true that "there is no problem to solve" then the less we would feel that there is a problem to solve.  

It seems plausible that there is no such thing as "correct" metaphilosophy, and humans are just making up random stuff based on our priors and environment and that's it and there is no "right way" to do philosophy, similar to how there are no "right preferences"

If this is true, doesn't this give us more reason to think metaphilosophy work is counterfactually important, i.e., can't just be delegated to AIs? Maybe this isn't what Wei Dai is trying to do, but it seems like "figure out which approaches to things (other than preferences) that don't have 'right ... (read more)

Wei Dai

I expect at this moment in time me building a company is going to help me deconfuse a lot of things about philosophy more than me thinking about it really hard in isolation would

Hard for me to make sense of this. What philosophical questions do you think you'll get clarity on by doing this? What are some examples of people successfully doing this in the past?

It seems plausible that there is no such thing as “correct” metaphilosophy, and humans are just making up random stuff based on our priors and environment and that’s it and there is no “right way”

... (read more)

Yep, you see the problem! It's tempting to think of an AI as "just the model" and study that in isolation, but that just won't be good enough long-term.

2mesaoptimizer
I see -- you are implying that an AI model will leverage external system parts to augment itself. For example, a neural network would use an external scratch-pad as a different form of memory for itself. Or instantiate a clone of itself to do a certain task for it. Or perhaps use some sort of scaffolding. I think these concerns probably don't matter for an AGI, because I expect that data transfer latency would be a non-trivial blocker for storing data outside the model itself, and it is more efficient to self-modify and improve one's own intelligence than to use some form of 'factored cognition'. Perhaps these things are issues for an ostensibly boxed AGI, and if that is the case, then this makes a lot of sense.
2Gunnar_Zarncke
It would be nice if the AGI saw the humans running its compute resources as part of its body that it wants to protect. The problem is that we humans also tamper with our bodies... Humans are like hair on the body of the AGI and maybe it wants to shave and use a wig.

Looks good to me, thank you Loppukilpailija!

As I have said many, many times before, Conjecture is not a deep shift in my beliefs about open sourcing, as it is not, and has never been, the position of EleutherAI (at least while I was head) that everything should be released in all scenarios, but rather that some specific things (such as LLMs of the size and strength we release) should be released in some specific situations for some specific reasons. EleutherAI would not, and has not, released models or capabilities that would push the capabilities frontier (and while I am no longer in charge, I stro... (read more)

1André Ferretti
Hi Connor, Thank you for taking the time to share your insights. I've updated the post to incorporate your comment. I removed the phrase suggesting a change in beliefs between EleutherAI and Conjecture, and added a paragraph that clarifies EleutherAI's approach to open sourcing. I also made sure to clearly state that CarperAI is a spinoff of EleutherAI but operates independently.  I appreciate your feedback, and I hope these changes better represent EleutherAI's position.

Thanks for this! These are great questions! We have been collecting questions from the community and plan to write a follow up post addressing them in the next couple of weeks.

3Mikhail Samin
Are you still planning to address the questions?

I initially liked this post a lot, then saw a lot of pushback in the comments, mostly of the (very valid!) form of "we actually build reliable things out of unreliable things, particularly with computers, all the time". I think this is a fair criticism of the post (and choice of examples/metaphors therein), but I think it may be missing (one of) the core message(s) trying to be delivered. 

I wanna give an interpretation/steelman of what I think John is trying to convey here (which I don't know whether he would endorse or not): 

"There are important... (read more)

I think this is something better discussed in private. Could you DM me? Thanks!

This is a genuinely difficult and interesting question that I want to provide a good answer for, but that might take me some time to write up, I'll get back to you at a later date.

3JenniferRM
I like that you didn't say something glib :-) I worked as an algorithmic ethicist for a blockchain project for several years, and this was (arguably?) my central professional bedevilment. It doesn't really surprise me that you have a hard time with it... I asked it because it is The Tough One, and if you had an actually good answer then such an answer would (probably) count as "non-trivial research progress".

Yes, we do expect this to be the case. Unfortunately, I think explaining in detail why we think this may be infohazardous. Or at least, I am sufficiently unsure about how infohazardous it is that I would first like to think about it for longer and run it through our internal infohazard review before sharing more. Sorry!

3riceissa
Did you end up running it through your internal infohazard review and if so what was the result?

Redwood is doing great research, and we are fairly aligned with their approach. In particular, we agree that hands-on experience building alignment approaches could have high impact, even if AGI ends up having an architecture unlike modern neural networks (which we don’t believe will be the case). While Conjecture and Redwood both have a strong focus on prosaic alignment with modern ML models, our research agenda has higher variance, in that we additionally focus on conceptual and meta-level research. We’re also training our own (large) models, but (we bel... (read more)

For the record, having any person or organization in this position would be a tremendous win. Interpretable aligned AGI?! We are talking about a top .1% scenario here! Like, the difference between egoistical Connor vs altruistic Connor with an aligned AGI in his hands is much much smaller than Connor with an aligned AGI and anyone, any organization or any scenario, with a misaligned AGI.

But let’s assume this.

Unfortunately, there is no actual functioning reliable mechanism by which humans can guarantee their alignment to each other. If there was s... (read more)

1Ofer
Sorry, do you mean that you are actually pledging to "remain in control of Conjecture"? Can some other founder(s) make that pledge too if it's necessary for maintaining >50% voting power? Will you have the ability to transfer full control over the company to another individual of your choice in case it's necessary? (Larry Page and Sergey Brin, for example, are seemingly limited in their ability to transfer their 10x-voting-power Alphabet shares to others).
4Chinese Room
Thank you for your answer. I have very high confidence that the *current* Connor Leahy will act towards the best interests of humanity; however, given the extraordinary amount of power an AGI can provide, confidence in this behavior staying the same for decades or centuries to come (directing some of the AGI's resources towards radical human life extension seems logical) is much less. Another question in case you have time - considering the same hypothetical situation of Conjecture being first to develop an aligned AGI, do you think that immediately applying its powers to ensure no other AGIs can be constructed is the correct behavior to maximize humanity's chances of survival?

Probably. It is likely that we will publish a lot of our interpretability work and tools, but we can’t commit to that because, unlike some others, we think it’s almost guaranteed that some interpretability work will lead to very infohazardous outcomes. For example, it could reveal obvious ways in which architectures could be trained more efficiently, and as such we need to consider each result on a case-by-case basis. However, if we deem them safe, we would definitely like to share as many of our tools and insights as possible.

We would love to collaborate with anyone (from academia or elsewhere) wherever it makes sense to do so, but we honestly just do not care very much about formal academic publication or citation metrics or whatever. If we see opportunities to collaborate with academia that we think will lead to interesting alignment work getting done, excellent!

Our current plan is to work on foundational infrastructure and models for Conjecture’s first few months, after which we will spin up prototypes of various products that can work with a SaaS model. After this, we plan to try them out and productify the most popular/useful ones.

More than profitability, our investors are looking for progress. Because of the current pace of progress, it would not be smart from their point of view to settle on a main product right now. That’s why we are mostly interested in creating a pipeline that lets us build and test out products flexibly.

Ideally, we would like Conjecture to scale quickly. Alignment wise, in 5 years time, we want to have the ability to take a billion dollars and turn it into many efficient, capable, aligned teams of 3-10 people working on parallel alignment research bets, and be able to do this reliably and repeatedly. We expect to be far more constrained by talent than anything else on that front, and are working hard on developing and scaling pipelines to hopefully alleviate such bottlenecks.

For the second question, we don't expect it to be a competing force (as in, we ha... (read more)

To point 1: While we greatly appreciate what OpenPhil, LTFF and others do (and hope to work with them in the future!), we found that the hurdles required and strings attached were far greater than the laissez-faire silicon valley VC we encountered, and seemed less scalable in the long run. Also, FTX FF did not exist back when we were starting out.

While EA funds as they currently exist are great at handing out small to medium sized grants, the ~8 digit investment we were looking for to get started asap was not something that these kinds of orgs were general... (read more)

The founders have a supermajority of voting shares and full board control and intend to hold on to both for as long as possible (preferably indefinitely). We have been very upfront with our investors that we do not want to ever give up control of the company (even if it were hypothetically to go public, which is not something we are currently planning to do), and will act accordingly.

For the second part, see the answer here.

To address the opening quote - the copy on our website is overzealous, and we will be changing it shortly. We are an AGI company in the sense that we take AGI seriously, but it is not our goal to accelerate progress towards it. Thanks for highlighting that.

We don’t have a concrete proposal for how to reliably signal that we’re committed to avoiding AGI race dynamics beyond the obvious right now. There is unfortunately no obvious or easy mechanism that we are aware of to accomplish this, but we are certainly open to discussion with any interested parties ab... (read more)

We (the founders) have a distinct enough research agenda to most existing groups such that simply joining them would mean incurring some compromises on that front. Also, joining existing research orgs is tough! Especially if we want to continue along our own lines of research, and have significant influence on their direction. We can’t just walk in and say “here are our new frames for GPT, can we have a team to work on this asap?”.

You’re right that SOTA models are hard to develop, but that being said, developing our own models is independently useful in ma... (read more)

See the reply to Michaël for answers as to what kind of products we will develop (TLDR we don’t know yet).

As for the conceptual research side, we do not do conceptual research with product in mind, but we expect useful corollaries to fall out by themselves for sufficiently good research. We think the best way of doing fundamental research like this is to just follow the most interesting, useful looking directions guided by the “research taste” of good researchers (with regular feedback from the rest of the team, of course). I for one at least genuinely exp... (read more)

We currently have a (temporary) office in the Southwark area, and are open to visitors. We’ll be moving to a larger office soon, and we hope to become a hub for AGI Safety in Europe.

And yes! Most of our staff will be attending EAG London. See you there? 

1Antoine de Scorraille
Ya I'll be there so I'd be glad to see you, especially Adam!

See a longer answer here.

TL;DR: For the record, EleutherAI never actually had a policy of always releasing everything to begin with and has always tried to consider each publication’s pros vs cons. But this is still a bit of change from EleutherAI, mostly because we think it’s good to be more intentional about what should or should not be published, even if one does end up publishing many things. EleutherAI is unaffected and will continue working open source. Conjecture will not be publishing ML models by default, but may do so on a case by case ... (read more)

EAI has always been a community-driven organization that people tend to contribute to in their spare time, around their jobs. I for example have had a dayjob of one sort or another for most of EAI’s existence. So from this angle, nothing has changed aside from the fact my job is more demanding now.

Sid and I still contribute to EAI on the meta level (moderation, organization, deciding on projects to pursue), but do admittedly have less time to dedicate to it these days. Thankfully, Eleuther is not just us - we have a bunch of projects going on at any one ti... (read more)

We strongly encourage in person work - we find it beneficial to be able to talk over or debate research proposals in person at any time, it’s great for the technical team to be able to pair program or rubber duck if they’re hitting a wall, and all being located in the same city has a big impact on team building.

That being said, we don’t mandate it. Some current staff want to spend a few months a year with their families abroad, and others aren’t able to move to London at all. While we preferentially accept applicants who can work in person, we’re flexible, and if you’re interested but can’t make it to London, it’s definitely still worth reaching out.

Currently, there is only one board position, which I hold. I also have triple vote as insurance if we decide to expand the board. We don’t plan to give up board control.

Thanks - we plan to visit the Bay soon with the team, we’ll send you a message! 

2Ben Pace
I look forward to it.

We aren’t committed to any specific product or direction just yet (we think there are many low hanging fruit that we could decide to pursue). Luckily we have the  independence to be able to initially spend a significant amount of time focusing on foundational infrastructure and research. Our product(s) could end up as some kind of API with useful models, interpretability tools or services, some kind of end-to-end SaaS product or something else entirely. We don’t intend to push the capabilities frontier, and don’t think this would be necessary to be profitable.

TL;DR: For the record, EleutherAI never actually had a policy of always releasing everything to begin with and has always tried to consider each publication’s pros vs cons. But this is still a bit of change from EleutherAI, mostly because we think it’s good to be more intentional about what should or should not be published, even if one does end up publishing many things. EleutherAI is unaffected and will continue working open source. Conjecture will not be publishing ML models by default, but may do so on a case by case basis.

Longer version:

Firs... (read more)

1lennart
Thanks for the thoughtful response, Connor. I'm glad to hear that you will develop a policy and won't be publishing models by default.

I really liked this post, though I somewhat disagree with some of the conclusions. I think that in fact aligning an artificial digital intelligence will be much, much easier than working on aligning humans. To point towards why I believe this, think about how many "tech" companies (Uber, crypto, etc) derive their value, primarily, from circumventing regulation (read: unfriendly egregore rent seeking). By "wiping the slate clean" you can suddenly accomplish much more than working in a field where the enemy already controls the terrain. 

If you try to ta... (read more)

2Valentine
That's a good point. I hope you're right.

This was an excellent post, thanks for writing it!

But, I think you unfairly dismiss the obvious solution to this madness, and I completely understand why, because it's not at all intuitive where the problem in the setup of infinite ethics is. It's in your choice of proof system and interpretation of mathematics! (Don't use non-constructive proof systems!) 

This is a bit of an esoteric point and I've been planning to write a post or even sequence about this for a while, so I won't be able to lay out the full arguments in one comment, but let me try to c... (read more)

2Daphne_W
Sorry, I previously assigned hypercomputers a non-zero credence, and you're asking me to assign it zero credence. This requires an infinite amount of bits to update, which is impossible to collect in my computationally bounded state. Your case sounds sensible, but I literally can't receive enough evidence over the course of a lifetime to be convinced by it.

Like, intuitively, it doesn't feel literally impossible that humanity discovers a computationally unbounded process in our universe. If a convincing story is fed into my brain, with scientific consensus, personally verifying the math proof, concrete experiments indicating positive results, etc., I expect I would believe it. In my state of ignorance, I would not be surprised to find out there's a calculation which requires a computationally unbounded process to calculate but a bounded process to verify.

To actually intuitively give something 0 (or 1) credence, though, to be so confident in a thesis that you literally can't change your mind, that at the very least seems very weird. Self-referentially, I won't actually assign that situation 0 credence, but even if I'm very confident that 0 credence is correct, my actual credence will be bounded by my uncertainty in my method of calculating credence.
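As a side note on the "infinite amount of bits" point, a minimal sketch of the standard Bayesian arithmetic behind it (my framing, not the commenter's): writing the update in log-odds form makes explicit why a non-zero credence can never be driven to exactly zero by finite evidence.

```latex
\[
\log\frac{P(H \mid E)}{P(\lnot H \mid E)}
= \log\frac{P(H)}{P(\lnot H)}
+ \log\frac{P(E \mid H)}{P(E \mid \lnot H)}
\]
```

Reaching a posterior of exactly zero would send the left-hand side to negative infinity, which requires an infinite total log-likelihood ratio, i.e. infinitely many bits of evidence.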
5interstice
This seems dubious. Compare: "the actual credence that the universe contains more computing power than my brain is zero, because an observer with the computing power of my brain can never observe such an object in such a way that differentiates it from a brain-sized approximation". It's true that a bounded approximation to Solomonoff induction would think this way, but that seems like a problem with Solomonoff induction, not a guide for the way we should reason ourselves. See also the discussion here on forms of hypercomputation that could be falsified in principle.
7Slider
The case for observing a hypercomputer might rather be that a claim that has infinitesimal credence requires infinite amounts of proof to get to a finite credence level. So a being that can only entertain finite evidence would treat that credence as effectively zero, but it might technically be separate from zero. I could imagine programming a hypertask into an object and finding some exotic trajectory with more than a finite amount of proper time and receiving the object from such trajectory having completed the task. The hypothesis that it was actually a very potent classical computer is ruled out by the structure of the task. I am not convinced that the main or only method of checking for the nature of computation is to check output bit by bit.

I haven't read Critch in-depth, so I can't guarantee I'm pointing towards the same concept he is. Consider this a bit of an impromptu intuition dump, this might be trivial. No claims on originality of any of these thoughts and epistemic status "¯\_(ツ)_/¯"

The way I currently think about it is that multi-multi is the "full hard problem", and single-single is a particularly "easy" (still not easy) special case. 

In a way we're making some simplifying assumptions in the single-single case. That we have one (pseudo-cartesian) "agent" that has some kind of d... (read more)

2Quinn
I wrote out the 2x2 grid you suggested in MS Paint. I'm not sure I'm catching how multi-inner is game theory. Except that I think "GT is the mesa- of SCT" is an interesting, reasonable (to me) claim that is sort of blowing my mind as I contemplate it, so far.

I am so excited about this research, good luck! I think it's almost impossible this won't turn up at least some interesting partial results, even if the strong versions of the hypothesis don't work out (my guess would be you run into some kind of incomputability or incoherence results in finding an algorithm that works for every environment).

This is one of the research directions that make me the most optimistic that alignment might really be tractable!
