All of Connor Leahy's Comments + Replies

Thanks for the comment! I agree that we live in a highly suboptimal world, and I do not think we are going to make it, but it's worth taking our best shot.

I don't think of the CoEm agenda as "doing AGI right." (For one, it is not even an agenda for building AGI/ASI, but for bounding ourselves below that.) Doing AGI right would involve solving problems like P vs PSPACE, developing a vastly deeper understanding of Algorithmic Information Theory and more advanced formal verification of programs. If I had infinite budget and 200 years, the plan would look very ... (read more)

Thanks for the comment!

Have I understood this correctly?

I am most confident in phases 1-3 of this agenda, and I think you have overall a pretty good rephrasing of 1-5, thanks! One note is that I don't think of "LLM calls" as being fundamental; I think of LLMs as a stand-in for "banks of patterns" or "piles of shards of cognition." The exact shape of this can vary, LLMs are just our current most common shape of "cognition engines," but I can think of many other, potentially better, shapes this "neural primitive/co-processor" could take.

I think there is s... (read more)

4Jeremy Gillen
Here's two ways that a high-level model can be wrong:

* It isn't detailed enough, but once you learn the detail it adds up to basically the same picture. E.g. Newtonian physics, ideal gas laws. When you get a more detailed model, you learn more about which edge-cases will break it. But the model basically still works, and is valuable for working out the more detailed model.
* It's built out of confused concepts. E.g. free will, consciousness (probably), many ways of thinking about personal identity, four humors model. We're basically better off without this kind of model and should start from scratch.

It sounds like you're saying high-level agency-as-outcome-directed is wrong in the second way? If so, I disagree, it looks much more like the first way. I don't think I understand your beliefs well enough to argue about this, maybe there's something I should read?

----------------------------------------

I have a discomfort that I want to try to gesture at: Are you ultimately wanting to build a piece of software that solves a problem so difficult that it needs to modify itself? My impression from the post is that you are thinking about this level of capability in a distant way, and mostly focusing on much earlier and easier regimes. I think it's probably very easy to work on legible low-level capabilities without making any progress on the regime that matters. To me it looks important for researchers to have this ultimate goal constantly in their mind, because there are many pathways off-track. Does it look different to you?

----------------------------------------

I think this is a bad place to rely on governance, given the fuzziness of this boundary and the huge incentive toward capability over legibility. Am I right in thinking that you're making a large-ish gamble here on the way the tech tree shakes out (such that it's easy to see a legible-illegible boundary, and the legible approaches are competitive-ish) and also the way governance shakes out (such th

Hi habryka, I don't really know how best to respond to such a comment. First, I would like to say thank you for your well-wishes, assuming you did not mean them sarcastically. Maybe I have lost the plot, and if so, I do appreciate help in recovering it. Secondly, I feel confused as to why you would say such things in general.

Just last month, my coauthors and I released a 100+ page explanation/treatise on AI extinction risk that gives a detailed account of where AGI risk comes from and how it works, which was received warmly by LW and the general public al... (read more)

9plex
[epistemic status: way too ill to be posting important things]

hi fellow people-who-i-think-have-much-of-the-plot

you two seem, from my perspective as having read a fair amount of content from both, to have a bunch of similar models and goals, but quite different strategies. on top of both having a firm grip on the core x-risk arguments, you both call out similar dynamics in capabilities orgs capturing will to save the world and turning it into more capabilities progress[1], you both take issue with somewhat different but i think related parts of openphil's grantmaking process, you both have high p(doom) and not very comfortable timelines, etc.

i suspect if connor explained why he was focusing on the things he is here, that would uncover the relevant difference. my current guess is connor is doing a kind of political alliancebuilding which is colliding with some of habryka's highly active integrity reflexes.

maybe this doesn't change much, these strategies do seem at least somewhat collision-y as implemented so far, but i hope our kind can get along.

1. ^ e.g. "Turning care into acceleration" from https://www.thecompendium.ai/the-ai-race#these-ideologies-shape-the-playing-field
e.g. https://www.lesswrong.com/posts/h4wXMXneTPDEjJ7nv/a-rocket-interpretability-analogy?commentId=md7QvniMyx3vYqeyD and lots of calling out Anthropic

Morality is multifaceted and multilevel. If you have a naive form of morality that is just "I do whatever I think is the right thing to do", you are not coordinating or being moral, you are just selfish.

Coordination is not inherently always good. You can coordinate with one group to more effectively do evil against another. But scalable Good is always built on coordination. If you want to live in a lawful, stable, scalable, just civilization, you will need to coordinate with your civilization and neighbors and make compromises.

As a citizen of a modern coun... (read more)

3Shankar Sivarajan
I do in fact believe morality to be entirely orthogonal to "consensus" or what "many other people" want, and since you call this "selfishness," I shall return the favor and call your view, for all that you frame it as "coordination" or "scalable morality," abject bootlicking.

A roaming bandit's "do what I tell you and you get to live" could be thought of as a kind of contract, I suppose, but I wouldn't consider myself bound by it if I could get away with breaching it. I consider the stationary bandits' "social contracts" not to be meaningfully different. One clue to how they're similar is how the more powerful party can go, à la Vader, "Here is a New Deal. Pray I don't renew it any further." Unilaterally reneging on such a contract when you are the weaker party would certainly be unwise, for the same reason trying to stand between a lynch mob and its intended victim would be—simple self-preservation—but I condemn the suggestion that it would be immoral.

I see what you call "civilization," and I'm against it.

I vaguely recall reading of a medieval Christian belief that if everyone stopped sinning for a day, Christ would return and restore the Kingdom of Heaven. This reminds me of that: would be nice, but it ain't gonna happen.

Hi, as I was tagged here, I will respond to a few points. There are a bunch of smaller points only hinted at that I won't address. In general, I strongly disagree with the overall conclusion of this post.

There are two main points I would like to address in particular:

1. More information is not more Gooder

There seems to be a deep underlying confusion here that in some sense more information is inherently more good, or inherently will result in good things winning out. This is very much the opposite of what I generally claim about memetics. Saying that all in... (read more)

2Cleo Scrolls
That last paragraph seems important. There's a type of person who doesn't yet have an opinion in AI discourse (who is new to it) and will bounce off the "side" that appears most hostile to them--which, if they have misguided ideas, might be the truth-seeking side that gently criticizes. (Not saying that's the case for the author of this post!) It's really hard to change the mind of someone who's found their side in AI. But it's not so hard to keep them from joining one in the first place!
2feugjavnpolj
I'm worried you're not seeing this at a long enough timescale. I'm claiming:

1. "information sharing is good" is an invariant as timeless as "people will sacrifice truth and empathy for power", you can't claim Moloch wins based on available evidence.
2. both of these are more powerful than short-term effects which we can forecast

On 1: Increased information sharing leads to faster iteration. Faster iteration of science and technology leads to increased power derived from technology. Faster iteration of social norms and technologies leads to increased power derived from better coordination. It is not a coincidence that the USA is simultaneously the most powerful and one of the most tolerant societies in human history.

Suppose you were the inventor of the Gutenberg press deciding whether to release your technology or not. Maybe you could have foreseen the witch burnings. Maybe you could've even foreseen something like the 95 theses. You couldn't have foreseen democracy in France, or that its success would inspire the US. (Which was again only possible because of sharing of information between Europe and the US.) You couldn't have foreseen that Jewish physicists leaving Europe for a more tolerant society would invent an atomic bomb that would ultimately bring peace to Europe. You couldn't have foreseen the peace among EU nations in 2024, not enforced just at threat of bomb but more strongly via intermixing of its peoples. If you decided not to release the Gutenberg press because of forecasted witch burnings you might have made a colossal mistake.

Information sharing is argued as good because it relies on principles of human behaviour that survive long after you die, long after any specific circumstances. Information survives the rise and fall of civilisations. As long as 1-of-n people preserve some information, it is preserved. A basic desire for truth and empathy is universal amongst human beings across space and time, as it's encoded in genetics not culture. Yes, peopl
2Shankar Sivarajan
There is a certain type of person who would look at the mountains of skulls that Genghis Khan piled up and before judging it evil, ask whether it was a state acting or a group of individuals. Fuck that. States/governments, "democratic" or otherwise, have absolutely no privileged moral status, and to hell with any norm that suggests otherwise, and to hell with any "civilization" that promotes such a norm. What the state can do is wield violence far more effectively than you, so if you want to level a city, say, Beijing or Moscow, yeah, you should get the US military to do it instead of trying to do it yourself. And it can wield violence against you if you defy its will, so it's a bad idea to do so publicly, but for purely pragmatic reasons, not moral ones.
2AtillaYasar
TLDR: Here's all the ways in which you're right, and thanks for pointing these things out!

At a meta-level, I'm *really* excited by just how much I didn't see your criticism coming. I thought I was thinking carefully, and that iterating on my post with Claude (though it didn't write a single word of it!) was taking out the obvious mistakes, but I missed so much. I have to rethink a lot about my process of writing this.

I strongly agree that I need a *way* more detailed model of what "memetic evolution" looks like, when it's good vs bad, and why, whether there's a better way of phrasing and viewing it, dig into historical examples, etc. I'm curious if social media is actually bad beyond the surface -- but again I should've anticipated "social media kinda seems bad in a lot of ways" being such an obvious problem in my thinking, and attended to it.

Reading it back, it totally reads as an argument for "more information more Gooder", which I didn't see at all. (generally viewing the post as "more X is always more good" is also cool as in, a categorization trick that brings clarity)

I think a good way to summarize my mistake is that I didn't "go all the way" in my (pretty scattered) lines of thinking.

Thanks :)

A big part of why I got into writing ideas explicitly and in big posts (vs off-hand Tweets/personal notes), is because you've talked about this being a coordination mechanism on Discord.

Nice set of concepts, I might use these in my thinking, thanks!

I don't understand what point you are trying to make, to be honest. There are certain problems that humans/I care about that we/I want NNs to solve, and some optimizers (e.g. Adam) solve those problems better or more tractably than others (e.g. SGD or second order methods). You can claim that the "set of problems humans care about" is "arbitrary", to which I would reply "sure?"

Similarly, I want "good" "philosophy" to be "better" at "solving" "problems I care about." If you want to use other words for this, my answer is again "sure?" I think this is a good use of the word "philosophy" that gets better at what people actually want out of it, but I'm not gonna die on this hill because of an abstract semantic disagreement.

1M. Y. Zuo
That's the thing, there is no definable "set of problems humans care about" without some kind of attached or presumed metaphilosophy, at least none that you, or anyone, could possibly figure out in the foreseeable future and prove to a reasonable degree of confidence to the LW readerbase. It's not even 'arbitrary',  that string of letters is indistinguishable from random noise. i.e. Right now your first paragraph is mostly meaningless if read completely literally and by someone who accepts the claim. Such a hypothetical person would think you've gone nuts because it would appear like you took a well written comment and inserted strings of random keyboard bashing in the middle. Of course it's unlikely that someone would be so literal minded, and so insistent on logical correctness, that they would completely equate it with random bashing of a keyboard. But it's possible some portion of readers lean towards that.

"good" always refers to idiosyncratic opinions, I don't really take moral realism particularly seriously. I think there is "good" philosophy in the same way there are "good" optimization algorithms for neural networks, while also I assume there is no one optimizer that "solves" all neural network problems.

2TAG
That is not a fact.
3M. Y. Zuo
'"good" optimization algorithms for neural networks' also has no difference in meaning from '"glorxnag" optimization  algorithms for neural networks', or any random permutation, if your prior point holds.

I strongly disagree and do not think that is how AGI will look; AGI isn't magic. But this is a crux, and I might be wrong, of course.

I can't rehash my entire views on coordination and policy here, I'm afraid, but in general, I believe we are currently on a double exponential timeline (though I wouldn't model it quite like you, the conclusions are similar enough), and I think some simple-to-understand and straightforwardly implementable policy (in particular, compute caps) will at least move us to a single exponential timeline.

I'm not sure we can get policy that can stop the single exponential (which is software improvements), but there are some ways, and at least we will then have additional time to work on compounding solutions.
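To make the single/double exponential distinction concrete, here is a minimal sketch in my own notation (illustrative only, not a formalism from the comment): a single exponential grows at a fixed relative rate, while in a double exponential the exponent itself grows exponentially, e.g. because hardware scaling and software progress compound.

```latex
\[
\underbrace{C(t) = C_0\, e^{k t}}_{\text{single exponential}}
\qquad\qquad
\underbrace{C(t) = C_0\, e^{a\, e^{b t}}}_{\text{double exponential}}
\]
```

On this reading, compute caps remove one of the two compounding growth processes (hardware scale-up), leaving only the software-driven single exponential referred to above.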

Double exponentials can be hard to visualize. I'm no artist, but I created this visual to help us better appreciate what is about to happen. =-)

3Spiritus Dei
That sounds like a good plan, but I think a lot of the horses have already left the barn. For example, Coreweave is investing $1.6 billion to create an AI datacenter in Plano, TX that is purported to be 10 exaflops, and that system goes live in 3 months. Google is spending a similar amount in Columbus, Ohio. Amazon, Facebook, and other tech companies are also pouring billions upon billions into purpose-built AI datacenters. NVIDIA projects $1 trillion will be spent over the next 4 years on AI datacenter build out. That would be an unprecedented number not seen since the advent of the internet.

All of these companies have lobbyists that will make a short-term legislative fix difficult. And for this reason I think we should be considering a Plan B, since there is a very good chance that we won't have enough time for a quick legislative fix or the time needed to unravel alignment if we're on a double exponential curve. Again, if it's a single exponential then there is plenty of time to chat with legislators and research alignment.

In light of this I think we need to have a comprehensive "shutdown plan" for these mammoth AI datacenters. The leaders of Inflection, OpenAI, and other tech companies all agree there is a risk, and I think it would be wise to coordinate with them on a plan to turn everything off manually in the event of an emergency.

Source: $1.6 Billion Data Center Planned For Plano, Texas (localprofile.com)
Source: Nvidia Shocker: $1 Trillion to Be Spent on AI Data Centers in 4 Years (businessinsider.com)
Source: Google to invest another $1.7 billion into Ohio data centers (wlwt.com)
Source: Amazon Web Services to invest $7.8 billion in new Central Ohio data centers - Axios Columbus

Sure, it's not a full solution, it just buys us some time, but I think it would be a non-trivial amount, and let's not let the perfect be the enemy of the good and what not.

A lot of the debate surrounding existential risks of AI is bounded by time. For example, if someone said a meteor is about to hit the Earth that would be alarming, but the next question should be, "How much time before impact?" The answer to that question affects everything else.

If they say, "30 seconds". Well, there is no need to go online and debate ways to save ourselves. We can give everyone around us a hug and prepare for the hereafter. However, if the answer is "30 days" or "3 years" then those answers will generate very different responses.

The AI al... (read more)

I see regulation as the most likely (and most accessible) avenue that can buy us significant time. The obvious move, fmpov, is to just put compute caps in place: make it illegal to do training runs above a certain FLOP level. Other possibilities are strict liability for model developers (developers, not just deployers or users, are held criminally liable for any damage caused by their models), global moratoria, "CERN for AI" and similar. Generally, I endorse the proposals here.

None of these are easy, of course, there is a reason my p(doom) is high.

But what hap

... (read more)
4Wei Dai
I prefer to frame it as human-AI safety problems instead of "misuse risk", but the point is that if we're trying to buy time in part to have more time to solve misuse/human-safety (e.g. by improving coordination/epistemology or solving metaphilosophy), but the strategy for buying time only achieves a pause until alignment is solved, then the earlier alignment is solved, the less time we have to work on misuse/human-safety.

I think this is not an unreasonable position, yes. I expect the best way to achieve this would be to make global coordination and epistemology better/more coherent...which is bottlenecked by us running out of time, hence why I think the pragmatic strategic choice is to try to buy us more time.

One of the ways I can see a "slow takeoff/alignment by default" world still going bad is that in the run-up to takeoff, pseudo-AGIs are used to hypercharge memetic warfare/mutation load to a degree that basically every living human is functionally insane, and then even an aligned AGI can't (and wouldn't want to) "undo" that.

4Wei Dai
What are you proposing or planning to do to achieve this? I observe that most current attempts to "buy time" seem organized around convincing people that AI deception/takeover is a big risk and that we should pause or slow down AI development or deployment until that problem is solved, for example via intent alignment. But what happens if AI deception then gets solved relatively quickly (or someone comes up with a proposed solution that looks good enough to decision makers)? And this is another way that working on alignment could be harmful from my perspective...

Hard for me to make sense of this. What philosophical questions do you think you'll get clarity on by doing this? What are some examples of people successfully doing this in the past?

The fact you ask this question is interesting to me, because in my view the opposite question is the more natural one to ask:  What kind of questions can you make progress on without constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions, the places where we do best and get the least drawn astray is ... (read more)

1M. Y. Zuo
You raised a very interesting point in the last comment, that metaphilosophy already encompasses everything, that we could conceive of at least. So a 'solution' is not tractable due to various well known issues such as the halting problem and so on. (Though perhaps in the very distant future this could be different.)

However this leads to a problem, as exemplified by your phrasing here: 'good philosophy' is not a sensible category since you already know you have not, and cannot, 'solve' metaphilosophy. Nor can any other LW reader do so. 'good' or 'bad' in real practice are, at best, whatever the popular consensus is in the present reality, at worst, just someone's idiosyncratic opinions.

Very few concepts are entirely independent from any philosophical or metaphilosophical implications whatsoever, and 'good philosophy' is not one of them. But you still felt a need to attach these modifiers, due to a variety of reasons well analyzed on LW, so the pretense of a solved or solvable metaphilosophy is still needed for this part of the comment to make sense.

I don't want to single out your comment too much though, since it's just the most convenient example, this applies to most LW comments. i.e. If everyone actually accepted the point, which I agree with, I dare say a huge chunk of LW comments are close to meaningless from a formal viewpoint, or at least very open to interpretation by anyone who isn't immersed in 21st century human culture.

As someone that does think about a lot of the things you care about at least some of the time (and does care pretty deeply), I can speak for myself why I don't talk about these things too much:

Epistemic problems:

  • Mostly, the concept of "metaphilosophy" is so hopelessly broad that you kinda reach it by definition by thinking about any problem hard enough. This isn't a good thing, when you have a category so large it contains everything (not saying this applies to you, but it applies to many other people I have met who talked about metaphilosophy), it usually
... (read more)
4Noosphere89
In retrospect, this is why debates over the simulation hypothesis/Mathematical Universe Hypothesis/computationalism go nowhere. There is a motte and bailey among the people advocating for these hypotheses: the motte is that they can validly claim their category encompasses everything, though they don't realize this doesn't constrain their expectations at all and thus can't be used in basically any debate; the bailey is that this is something important where you should change something, which only holds for a narrower version of the simulation hypothesis/Mathematical Universe Hypothesis/computationalism that doesn't encompass everything. And the people arguing against those hypotheses don't realize there's no way for the hypothesis to be falsified if made general enough.
2TAG
An example of a metaphilosophical question could be "Is the ungroundedness (etc) of philosophy inevitable or fixable?" Well, if you could solve epistemology separately from everything else, that would be great. But a lot of people have tried and failed. It's not like no one is looking for foundations because no one wants them.
1Thoth Hermes
We can always fall back to "well, we do seem to know what we and other people are talking about fairly often" whenever we encounter the problem of whether-or-not a "correct" this-or-that actually exists. Likewise, we can also reach a point where we seem to agree that "everyone seems to agree that our problems seem more-or-less solved" (or that they haven't been).  I personally feel that there are strong reasons to believe that when those moments have been reached they are indeed rather correlated with reality itself, or at least correlated well-enough (even if there's always room to better correlate).  Thus, for said reasons I probably feel more optimistically than you do about how difficult our philosophical problems are. My intuition about this is that the more it is true that "there is no problem to solve" then the less we would feel that there is a problem to solve.  

It seems plausible that there is no such thing as "correct" metaphilosophy, and humans are just making up random stuff based on our priors and environment and that's it and there is no "right way" to do philosophy, similar to how there are no "right preferences"

If this is true, doesn't this give us more reason to think metaphilosophy work is counterfactually important, i.e., can't just be delegated to AIs? Maybe this isn't what Wei Dai is trying to do, but it seems like "figure out which approaches to things (other than preferences) that don't have 'right ... (read more)

Wei Dai

I expect at this moment in time me building a company is going to help me deconfuse a lot of things about philosophy more than me thinking about it really hard in isolation would

Hard for me to make sense of this. What philosophical questions do you think you'll get clarity on by doing this? What are some examples of people successfully doing this in the past?

It seems plausible that there is no such thing as “correct” metaphilosophy, and humans are just making up random stuff based on our priors and environment and that’s it and there is no “right way”

... (read more)

Yep, you see the problem! It's tempting to think of an AI as "just the model" and study that in isolation, but that just won't be good enough long-term.

2mesaoptimizer
I see -- you are implying that an AI model will leverage external system parts to augment itself. For example, a neural network would use an external scratch-pad as a different form of memory for itself. Or instantiate a clone of itself to do a certain task for it. Or perhaps use some sort of scaffolding. I think these concerns probably don't matter for an AGI, because I expect that data transfer latency would be a non-trivial blocker for storing data outside the model itself, and it is more efficient to self-modify and improve one's own intelligence than to use some form of 'factored cognition'. Perhaps these things are issues for an ostensibly boxed AGI, and if that is the case, then this makes a lot of sense.
2Gunnar_Zarncke
It would be nice if the AGI saw the humans running its compute resources as part of its body that it wants to protect. The problem is that we humans also tamper with our bodies... Humans are like hair on the body of the AGI and maybe it wants to shave and use a wig.

Looks good to me, thank you Loppukilpailija!

As I have said many, many times before, Conjecture is not a deep shift in my beliefs about open sourcing, as it is not, and has never been, the position of EleutherAI (at least while I was head) that everything should be released in all scenarios, but rather that some specific things (such as LLMs of the size and strength we release) should be released in some specific situations for some specific reasons. EleutherAI would not, and has not, released models or capabilities that would push the capabilities frontier (and while I am no longer in charge, I stro... (read more)

1André Ferretti
Hi Connor, Thank you for taking the time to share your insights. I've updated the post to incorporate your comment. I removed the phrase suggesting a change in beliefs between EleutherAI and Conjecture, and added a paragraph that clarifies EleutherAI's approach to open sourcing. I also made sure to clearly state that CarperAI is a spinoff of EleutherAI but operates independently.  I appreciate your feedback, and I hope these changes better represent EleutherAI's position.

Thanks for this! These are great questions! We have been collecting questions from the community and plan to write a follow up post addressing them in the next couple of weeks.

3Mikhail Samin
Are you still planning to address the questions?

I initially liked this post a lot, then saw a lot of pushback in the comments, mostly of the (very valid!) form of "we actually build reliable things out of unreliable things, particularly with computers, all the time". I think this is a fair criticism of the post (and choice of examples/metaphors therein), but I think it may be missing (one of) the core message(s) trying to be delivered. 

I wanna give an interpretation/steelman of what I think John is trying to convey here (which I don't know whether he would endorse or not): 

"There are important... (read more)

I think this is something better discussed in private. Could you DM me? Thanks!

This is a genuinely difficult and interesting question that I want to provide a good answer for, but that might take me some time to write up, I'll get back to you at a later date.

3JenniferRM
I like that you didn't say something glib :-) I worked as an algorithmic ethicist for a blockchain project for several years, and this was (arguably?) my central professional bedevilment. It doesn't really surprise me that you have a hard time with it... I asked it because it is The Tough One, and if you had an actually good answer then such an answer would (probably) count as "non-trivial research progress".

Yes, we do expect this to be the case. Unfortunately, I think explaining in detail why we think this may be infohazardous. Or at least, I am sufficiently unsure about how infohazardous it is that I would first like to think about it for longer and run it through our internal infohazard review before sharing more. Sorry!

3riceissa
Did you end up running it through your internal infohazard review and if so what was the result?

Redwood is doing great research, and we are fairly aligned with their approach. In particular, we agree that hands-on experience building alignment approaches could have high impact, even if AGI ends up having an architecture unlike modern neural networks (which we don’t believe will be the case). While Conjecture and Redwood both have a strong focus on prosaic alignment with modern ML models, our research agenda has higher variance, in that we additionally focus on conceptual and meta-level research. We’re also training our own (large) models, but (we bel... (read more)

For the record, having any person or organization in this position would be a tremendous win. Interpretable aligned AGI?! We are talking about a top .1% scenario here! Like, the difference between egoistical Connor vs altruistic Connor with an aligned AGI in his hands is much much smaller than Connor with an aligned AGI and anyone, any organization or any scenario, with a misaligned AGI.

But let’s assume this.

Unfortunately, there is no actual functioning reliable mechanism by which humans can guarantee their alignment to each other. If there was s... (read more)

1Ofer
Sorry, do you mean that you are actually pledging to "remain in control of Conjecture"? Can some other founder(s) make that pledge too if it's necessary for maintaining >50% voting power? Will you have the ability to transfer full control over the company to another individual of your choice in case it's necessary? (Larry Page and Sergey Brin, for example, are seemingly limited in their ability to transfer their 10x-voting-power Alphabet shares to others).
4Chinese Room
Thank you for your answer. I have very high confidence that the *current* Connor Leahy will act towards the best interests of humanity; however, given the extraordinary amount of power an AGI can provide, confidence in this behavior staying the same for decades or centuries to come (directing some of the AGI's resources towards radical human life extension seems logical) is much less. Another question in case you have time - considering the same hypothetical situation of Conjecture being first to develop an aligned AGI, do you think that immediately applying its powers to ensure no other AGIs can be constructed is the correct behavior to maximize humanity's chances of survival?

Probably. It is likely that we will publish a lot of our interpretability work and tools, but we can’t commit to that because, unlike some others, we think it’s almost guaranteed that some interpretability work will lead to very infohazardous outcomes. For example, it could reveal obvious ways in which architectures could be trained more efficiently, and as such we need to consider each result on a case-by-case basis. However, if we deem them safe, we would definitely like to share as many of our tools and insights as possible.

We would love to collaborate with anyone (from academia or elsewhere) wherever it makes sense to do so, but we honestly just do not care very much about formal academic publication or citation metrics or whatever. If we see opportunities to collaborate with academia that we think will lead to interesting alignment work getting done, excellent!

Our current plan is to work on foundational infrastructure and models for Conjecture’s first few months, after which we will spin up prototypes of various products that can work with a SaaS model. After this, we plan to try them out and productify the most popular/useful ones.

More than profitability, our investors are looking for progress. Because of the current pace of progress, it would not be smart from their point of view to settle on a main product right now. That’s why we are mostly interested in creating a pipeline that lets us build and test out products flexibly.

Ideally, we would like Conjecture to scale quickly. Alignment wise, in 5 years time, we want to have the ability to take a billion dollars and turn it into many efficient, capable, aligned teams of 3-10 people working on parallel alignment research bets, and be able to do this reliably and repeatedly. We expect to be far more constrained by talent than anything else on that front, and are working hard on developing and scaling pipelines to hopefully alleviate such bottlenecks.

For the second question, we don't expect it to be a competing force (as in, we ha... (read more)

To point 1: While we greatly appreciate what OpenPhil, LTFF and others do (and hope to work with them in the future!), we found that the hurdles required and strings attached were far greater than the laissez-faire silicon valley VC we encountered, and seemed less scalable in the long run. Also, FTX FF did not exist back when we were starting out.

While EA funds as they currently exist are great at handing out small to medium sized grants, the ~8 digit investment we were looking for to get started asap was not something that these kinds of orgs were general... (read more)

The founders have a supermajority of voting shares and full board control and intend to hold on to both for as long as possible (preferably indefinitely). We have been very upfront with our investors that we do not want to ever give up control of the company (even if it were hypothetically to go public, which is not something we are currently planning to do), and will act accordingly.

For the second part, see the answer here.

To address the opening quote - the copy on our website is overzealous, and we will be changing it shortly. We are an AGI company in the sense that we take AGI seriously, but it is not our goal to accelerate progress towards it. Thanks for highlighting that.

We don’t have a concrete proposal for how to reliably signal that we’re committed to avoiding AGI race dynamics beyond the obvious right now. There is unfortunately no obvious or easy mechanism that we are aware of to accomplish this, but we are certainly open to discussion with any interested parties ab... (read more)

We (the founders) have a distinct enough research agenda to most existing groups such that simply joining them would mean incurring some compromises on that front. Also, joining existing research orgs is tough! Especially if we want to continue along our own lines of research, and have significant influence on their direction. We can’t just walk in and say “here are our new frames for GPT, can we have a team to work on this asap?”.

You’re right that SOTA models are hard to develop, but that being said, developing our own models is independently useful in ma... (read more)

See the reply to Michaël for answers as to what kind of products we will develop (TLDR we don’t know yet).

As for the conceptual research side, we do not do conceptual research with product in mind, but we expect useful corollaries to fall out by themselves for sufficiently good research. We think the best way of doing fundamental research like this is to just follow the most interesting, useful looking directions guided by the “research taste” of good researchers (with regular feedback from the rest of the team, of course). I for one at least genuinely exp... (read more)

We currently have a (temporary) office in the Southwark area, and are open to visitors. We’ll be moving to a larger office soon, and we hope to become a hub for AGI Safety in Europe.

And yes! Most of our staff will be attending EAG London. See you there? 

1Antoine de Scorraille
Ya I'll be there so I'd be glad to see you, especially Adam!

See a longer answer here.

TL;DR: For the record, EleutherAI never actually had a policy of always releasing everything to begin with and has always tried to consider each publication’s pros vs cons. But this is still a bit of change from EleutherAI, mostly because we think it’s good to be more intentional about what should or should not be published, even if one does end up publishing many things. EleutherAI is unaffected and will continue working open source. Conjecture will not be publishing ML models by default, but may do so on a case by case ... (read more)

EAI has always been a community-driven organization that people tend to contribute to in their spare time, around their jobs. I for example have had a dayjob of one sort or another for most of EAI’s existence. So from this angle, nothing has changed aside from the fact my job is more demanding now.

Sid and I still contribute to EAI on the meta level (moderation, organization, deciding on projects to pursue), but do admittedly have less time to dedicate to it these days. Thankfully, Eleuther is not just us - we have a bunch of projects going on at any one ti... (read more)

We strongly encourage in person work - we find it beneficial to be able to talk over or debate research proposals in person at any time, it’s great for the technical team to be able to pair program or rubber duck if they’re hitting a wall, and all being located in the same city has a big impact on team building.

That being said, we don’t mandate it. Some current staff want to spend a few months a year with their families abroad, and others aren’t able to move to London at all. While we preferentially accept applicants who can work in person, we’re flexible, and if you’re interested but can’t make it to London, it’s definitely still worth reaching out.

Currently, there is only one board position, which I hold. I also have triple vote as insurance if we decide to expand the board. We don’t plan to give up board control.

Thanks - we plan to visit the Bay soon with the team, we’ll send you a message! 

2Ben Pace
I look forward to it.

We aren’t committed to any specific product or direction just yet (we think there are many low hanging fruit that we could decide to pursue). Luckily we have the  independence to be able to initially spend a significant amount of time focusing on foundational infrastructure and research. Our product(s) could end up as some kind of API with useful models, interpretability tools or services, some kind of end-to-end SaaS product or something else entirely. We don’t intend to push the capabilities frontier, and don’t think this would be necessary to be profitable.

TL;DR: For the record, EleutherAI never actually had a policy of always releasing everything to begin with and has always tried to consider each publication’s pros vs cons. But this is still a bit of change from EleutherAI, mostly because we think it’s good to be more intentional about what should or should not be published, even if one does end up publishing many things. EleutherAI is unaffected and will continue working open source. Conjecture will not be publishing ML models by default, but may do so on a case by case basis.

Longer version:

Firs... (read more)

1lennart
Thanks for the thoughtful response, Connor. I'm glad to hear that you will develop a policy and won't be publishing models by default.

I really liked this post, though I somewhat disagree with some of the conclusions. I think that in fact aligning an artificial digital intelligence will be much, much easier than working on aligning humans. To point towards why I believe this, think about how many "tech" companies (Uber, crypto, etc) derive their value, primarily, from circumventing regulation (read: unfriendly egregore rent seeking). By "wiping the slate clean" you can suddenly accomplish much more than working in a field where the enemy already controls the terrain. 

If you try to ta... (read more)

2Valentine
That's a good point. I hope you're right.

This was an excellent post, thanks for writing it!

But, I think you unfairly dismiss the obvious solution to this madness, and I completely understand why, because it's not at all intuitive where the problem in the setup of infinite ethics is. It's in your choice of proof system and interpretation of mathematics! (Don't use non-constructive proof systems!) 

This is a bit of an esoteric point and I've been planning to write a post or even sequence about this for a while, so I won't be able to lay out the full arguments in one comment, but let me try to c... (read more)

2Daphne_W
Sorry, I previously assigned hypercomputers a non-zero credence, and you're asking me to assign it zero credence. This requires an infinite amount of bits to update, which is impossible to collect in my computationally bounded state. Your case sounds sensible, but I literally can't receive enough evidence over the course of a lifetime to be convinced by it.

Like, intuitively, it doesn't feel literally impossible that humanity discovers a computationally unbounded process in our universe. If a convincing story is fed into my brain, with scientific consensus, personally verifying the math proof, concrete experiments indicating positive results, etc., I expect I would believe it. In my state of ignorance, I would not be surprised to find out there's a calculation which requires a computationally unbounded process to calculate but a bounded process to verify.

To actually intuitively give something 0 (or 1) credence, though, to be so confident in a thesis that you literally can't change your mind, that at the very least seems very weird. Self-referentially, I won't actually assign that situation 0 credence, but even if I'm very confident that 0 credence is correct, my actual credence will be bounded by my uncertainty in my method of calculating credence.
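As a side note on the "infinite amount of bits" point, a minimal sketch of the standard Bayesian arithmetic behind it (my framing, not the commenter's): writing the update in log-odds form makes explicit why a non-zero credence can never be driven to exactly zero by finite evidence.

```latex
\[
\log\frac{P(H \mid E)}{P(\lnot H \mid E)}
= \log\frac{P(H)}{P(\lnot H)}
+ \log\frac{P(E \mid H)}{P(E \mid \lnot H)}
\]
```

Reaching a posterior of exactly zero would send the left-hand side to negative infinity, which requires an infinite total log-likelihood ratio, i.e. infinitely many bits of evidence.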
5interstice
This seems dubious. Compare: "the actual credence that the universe contains more computing power than my brain is zero, because an observer with the computing power of my brain can never observe such an object in such a way that differentiates it from a brain-sized approximation". It's true that a bounded approximation to Solomonoff induction would think this way, but that seems like a problem with Solomonoff induction, not a guide for the way we should reason ourselves. See also the discussion here on forms of hypercomputation that could be falsified in principle.
7Slider
The case for observing a hypercomputer might rather be that a claim that has infinitesimal credence requires infinite amounts of proof to get to a finite credence level. So a being that can only entertain finite evidence would treat that credence as effectively zero, but it might technically be separate from zero. I could imagine programming a hypertask into an object and finding some exotic trajectory with more than a finite amount of proper time and receiving the object from such trajectory having completed the task. The hypothesis that it was actually a very potent classical computer is ruled out by the structure of the task. I am not convinced that the main or only method of checking for the nature of computation is to check output bit by bit.

I haven't read Critch in-depth, so I can't guarantee I'm pointing towards the same concept he is. Consider this a bit of an impromptu intuition dump, this might be trivial. No claims on originality of any of these thoughts and epistemic status "¯\_(ツ)_/¯"

The way I currently think about it is that multi-multi is the "full hard problem", and single-single is a particularly "easy" (still not easy) special case. 

In a way we're making some simplifying assumptions in the single-single case. That we have one (pseudo-cartesian) "agent" that has some kind of d... (read more)

2Quinn
I wrote out the 2x2 grid you suggested in MS Paint. I'm not sure I'm catching how multi-inner is game theory. Except that I think "GT is the mesa- of SCT" is an interesting, reasonable (to me) claim that is sort of blowing my mind as I contemplate it, so far.

I am so excited about this research, good luck! I think it's almost impossible this won't turn up at least some interesting partial results, even if the strong versions of the hypothesis don't work out (my guess would be you run into some kind of incomputability or incoherence results in finding an algorithm that works for every environment).

This is one of the research directions that make me the most optimistic that alignment might really be tractable!
