All of Roman Leventov's Comments + Replies

I think Undermind.ai is much more useful for searching for concepts and ideas in papers than for extracting tabular info à la Elicit. Nominally, Elicit can do the former too, but it is quite bad at it in my experience.

https://openmined.org/ develops Syft, a framework for "private computation" in secure enclaves. It potentially reduces the barriers to data integration, both within particularly bureaucratic orgs and across orgs.

Thanks for the post, I agree with it!

I just wrote a post on the differential knowledge interconnection thesis, where I argue that it is on net beneficial to develop AI capabilities such as

  • Federated learning, privacy-preserving multi-party computation, and privacy-preserving machine learning.
  • Federated inference and belief sharing.
  • Protocols and file formats for data, belief, or claim exchange and validation.
  • Semantic knowledge mining and hybrid reasoning on (federated) knowledge graphs and multimodal data.
  • Structured or semantic search.
  • Datastore federation for r
... (read more)

I think the model of a commercial R&D lab would often suit alignment work better than a "classical" startup company. Conjecture and AE Studio come to mind. Answer.AI, founded by Jeremy Howard (of Fast.ai and Kaggle) and Eric Ries (Lean Startup), elaborates on this business and organisational model here: https://www.answer.ai/posts/2023-12-12-launch.html.

2Judd Rosenblatt
Yes, excellent point, and thanks for the callout.  Note though that a fundamental part of this is that we at AE Studio do intend eventually to incubate as part of our skunkworks program alignment-driven startups.  We've seen that we can take excellent people, have them grow on client projects for some amount of time, get better at stuff they don't even realize they need to get better at in a very high-accountability way, and then be well positioned to found startups we incubate internally.  We've not turned attention to internally-incubated startups for alignment specifically yet but hope to by later this year or early next. Meanwhile, there are not many orgs like us, and for various reasons it's easier to start a startup than to start something doing what we do.  If you think you can start something like what we do, I'd generally recommend it. You're probably more likely to succeed doing something more focused though to start.  Also, to start, we flailed a bit till we figured out we should get good at one thing at a time before doing more and more.

But I should add, I agree that 1-3 pose challenging political and coordination problems. Nobody assumes it will be easy, including Acemoglu. It's just another in the row of hard political challenges posed by AI, along with the questions of "aligned with whom?", accounting for people's voice past dysfunctional governments and political elites in general, etc.

Separately, I at least spontaneously wonder: how would one even go about differentiating the 'bad automation' to be discouraged from legit automation without which no modern economy could competitively run anyway? For a random example, if Excel didn't yet exist (or, for its next update..), we'd have to say: sorry, we cannot make such software, as any given spreadsheet risks removing thousands of hours of work...?! Or at least: please, Excel, ask the human to manually confirm each cell's calculation...?? So I don't know how we'd

... (read more)

If you really were able to coordinate globally to enable 1. or 2. - extremely unlikely in the current environment, given the huge incentives for individual countries to remain weak in enforcement - then it seems you might as well try to impose directly the economic first-best solution w.r.t. robots vs. labor: high global tax rates and redistribution.

If anything, this problem seems more pernicious w.r.t. climate change mitigation and environmental damage: it's much more distributed; not only the US and China, but Russia and India are also big emitt... (read more)

2Roman Leventov
But I should add, I agree that 1-3 pose challenging political and coordination problems. Nobody assumes it will be easy, including Acemoglu. It's just another in the row of hard political challenges posed by AI, along with the questions of "aligned with whom?", accounting for people's voice past dysfunctional governments and political elites in general, etc.

What levels of automation AI provides, and at what rate, is exactly what he suggests influencing directly (specifically, slowing down) through economic and political measures. So it's not fair to list that as an assumption.

It would depend on the exact details, but if a machine can do something as well as or better than a human, then the machine should do it.

It's a question of how to design work. A machine can cultivate a monoculture mega-farm better than a human, but not (at least, yet) a small permaculture garden. Is a monoculture mega-farm more "effective"? Maybe, if we take the pre-AI opportunity cost of human labour, but also maybe not with the post-AI opportunity cost of human labour. And this is before factoring in the "economic value" of better psychological and physical healt... (read more)

2Viliam
Thank you for the answers, they are generally nice but this one part rubbed me the wrong way: If I live to see a post-scarcity society, I sincerely hope that I will be allowed to organize my remaining free time as I want to, instead of being sent to work on a small farm for psychological and physical health benefits. I would rather get the same benefits from taking a walk with my friends, or something like that. I do not want to dismiss the health concerns, but again these are two different problems -- how to solve technological unemployment, and how to take care of one's health in the modern era -- which can be solved separately.

John Vervaeke calls attunement "relevance realization".

Cf. DeepMind's "Levels of AGI" paper (https://arxiv.org/abs/2311.02462), calling modern transformers "emerging AGI" there, but also defining "expert", "virtuoso", and "superhuman" AGI.

1Mateusz Bagiński
For people who (like me immediately after reading this reply) are still confused about the meaning of "humane/acc", the header photo of Critch's X profile is reasonably informative  

Well, yes, it also includes learning weak agents' models more generally, not just their "values". But I think the point stands. It's elaborated better in the linked post. As AIs will receive most of the same information that humans receive through always-on wearable sensors, there won't be much for AIs to learn from humans. Rather, it's humans that will need to do their homework, to increase the quality of their value judgements.

I agree with the core problem statement and most assumptions of the Pursuit of Happiness/Conventions Approach, but suggest a different solution: https://www.lesswrong.com/posts/rZWNxrzuHyKK2pE65/ai-alignment-as-a-translation-problem

I agree with OpenAI folks that generalisation is the key concept for understanding the alignment process. But I think that with their weak-to-strong generalisation agenda, they (as well as almost everyone else) apply it in the reverse direction: learning the values of weak agents (humans) doesn't make sense. Rather, weak agents should ... (read more)

1Joel_Saarinen
Could you clarify who you're referring to by "strong" agents? You refer to humans as weak agents at the start, yet claim later that AIs should have human inductive biases from the beginning, which makes me think humans are the strong ones.  
2ryan_greenblatt
I don't think "weak-to-strong generalization" is well described as "trying to learn the values of weak agents".

If I understand correctly, by "discreteness" you mean that it simply says that one agent can know neither the meaning of symbols used by another agent nor the "degree" of grokking the meaning. Just cannot say anything.

This is correct, but the underlying reason why this is correct is the same as why solipsism or the simulation hypothesis cannot be disproven (or proven!).

So yeah, I think there is no tangible relationship to the alignment problem, except that it corroborates that we couldn't have 100% (literally, probability=1) certainty of alignment or safety of whatever we create, but it was obvious even without this philosophical argument.

So, I removed that paragraph about Quine's argument from the post.

That also was, naturally, the model in the Soviet Union, with orgs called "scientific research institutes". https://www.jstor.org/stable/284836

Collusion detection and prevention and trust modelling don't trivially follow from the basic architecture of the system described on the level of this article. Some specific mechanisms should be implemented in the Protocol to have collusion detection and trust modelling. And we don't have these mechanisms actually developed yet, but we think that they should be doable (though this is still a research bet, not a 100% certainty) because the Gaia Network directly embodies (or is amenable to) all six general principles for anti-collusion mechanism design ... (read more)

Apart from the view of philosophy as "cohesive stories that bind together and infuse meaning into scientific models", which I discussed with you earlier and you were not very satisfied with, another interpretation of philosophy (natural phil, phil of science, phil of mathematics, and metaphil, at least) is "apex generalisation/abstraction". Think Bengio's "AI scientist", but the GM should be even deeper, to first sample a plausible "philosophy of science" given all the observations about the world up to the moment, then sample a plausible scientific theory giv... (read more)

Extrapolated volition is a nonsensical concept altogether, as demonstrated in the OP. There is no extrapolated volition outside of its unfolding in real life in a specific context, which affects the trajectory of values/volition in a specific way. And what that context will be is unknown and unknowable (maybe aliens will visit Earth tomorrow, maybe not).

Related, consciousness frame: where is the boundary of it? Is our brain conscious, or the whole nervous system, or the whole human, or the whole human + the entire microbiome populating them, or human + robotic prosthetic limbs, or human + web search + chat AI + personal note taking app, or the whole human group (collective consciousness), etc.

Some computational theories of consciousness attempt to give a specific, mathematically formalised answer to this question.

1Chipmonk
Hmm I'm confused about why people ask "where is the boundary?" in this situation.  I visualize this situation as more of a topological map, kinda like where the horizontal axes are kinda like physical space (more precisely: such that pieces (e.g.: your arm, your brain, web search, personal note taking app) that are more closely connected are closer horizontally) and the vertical axis is like "communication bandwidth". For example, the communication bandwidth (ability to sense and control) between your brain/nervous system and your arm is pretty darn high, probably higher than almost anything else in the universe. Yeah, you could try to cut out a particular hill in that topology and say "that hill is Bob", but that's obviously cutting out a lot of detail.

Psychology may not be "technical enough" because an adequate mathematical science or process theory is not developed for it, yet, but it's ultimately very important, perhaps critically important: see the last paragraph of https://www.lesswrong.com/posts/AKBkDNeFLZxaMqjQG/gaia-network-a-practical-incremental-pathway-to-open-agency. Davidad apparently thinks that it can be captured with an Infra-Bayesian model of a person/human.

Also on psychology: what is the boundary of personality, and where does just a "role" (spouse, worker, etc.) turn into multiple-personality disorder?

1Chipmonk
oh good, my other project is psychology, so I really hope it circles back. I originally got into the AI safety stuff because I was trying to understand psychological boundaries properly and realized that no one did. Then I realized that many people were maybe doing the same thing when thinking about interactions between AIs and humans.

In the most recent episode of his podcast show, Jim Rutt (former president of SFI) and his guest talk about membranes a lot, the word appears 30 times on a transcript page: https://www.jimruttshow.com/cody-moser/

1Chipmonk
thanks just listened to it. This reminds me a lot of what Scott Garrabrant has been thinking about. Perhaps intentionally setting up membranes within a society so that failure/infection/etc. in one region doesn't infect every other region. They talk about the same thing but for insight in solving problems

Related, quantum information theory:

1Chipmonk
Oh, yes! I forgot about your original comment about this

I think this metastrategy classification is overly simplified to the degree that I'm not sure it is net helpful. I don't see how Hendrycks' "Leviathan safety", Drexler's Open Agency Model, Davidad's OAA, Bengio's "AI pure scientist" and governance proposals (see https://slideslive.com/39014230/towards-quantitative-safety-guarantees-and-alignment), Kaufmann and Leventov's Gaia Network, AI Objectives Institute's agenda (and related Collective Intelligence Project's), Conjecture's CoEms, OpenAI's "AI alignment scientist" agenda, and Critch's h/acc (and relate... (read more)

1Mateusz Bagiński
Can you link to what "h/acc" is about/stands for?

Announcement

I think SociaLLM has a good chance of getting OpenAI’s “Research into Agentic AI Systems” grant because it addresses both the challenge of the legibility of an AI agent's behaviour, by making the agent’s behaviour more “human-like” thanks to the weight sharing and regularisation techniques/inductive biases described in the post, and the challenge of automatic monitoring: detection of duplicity or deception in an AI agent's behaviour by comparing the agent’s ToMs “in the eyes” of different interlocutors, building on the work “Collective Intelligence in Human-AI Team... (read more)
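For concreteness, here is a minimal, hypothetical sketch of the monitoring idea: flagging possible duplicity by comparing the ToM embeddings a model maintains of the same agent "in the eyes" of different interlocutors. All names, shapes, and the threshold are illustrative assumptions, not part of SociaLLM.

```python
# Hypothetical sketch, not SociaLLM code: duplicity_score flags an agent whose
# modelled personas diverge strongly across interlocutors.
import torch
import torch.nn.functional as F

def duplicity_score(tom_embeddings: torch.Tensor) -> torch.Tensor:
    """tom_embeddings: (n_interlocutors, d) -- the agent's persona as modelled
    'in the eyes' of each interlocutor. Returns the mean pairwise cosine
    distance; a high score suggests inconsistent personas."""
    normed = F.normalize(tom_embeddings, dim=-1)
    sims = normed @ normed.T                        # (n, n) cosine similarities
    n = sims.shape[0]
    off_diag = sims[~torch.eye(n, dtype=torch.bool)] # drop self-similarities
    return (1.0 - off_diag).mean()

# Usage: raise a monitoring flag above some (assumed) threshold.
embs = torch.randn(4, 256)                          # stand-in for learned ToM states
if duplicity_score(embs) > 0.5:
    print("possible duplicity: inconsistent personas across interlocutors")
```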

A lot of the examples of the concepts that you list already belong to established scientific fields: math, logic, probability, causal inference, ontology, semantics, physics, information theory, computer science, learning theory, and so on. These concepts don't need philosophical re-definition. Respecting the field boundaries, and the ways that fields are connected to each other via other fields (e.g., math and ontology to information theory/CS/learning theory via semantics) is also I think on net a good practice: it's better to focus attention on the fiel... (read more)

I agree with everything you said. Seems that we should distinguish between a sort of "cooperative" and "adversarial" safety approaches (cf. the comment above). I wrote the entire post as an extended reply to Marc Carauleanu upon his mixed feedback to my idea of adding "selective SSM blocks for theory of mind" to increase the Self-Other Overlap in AI architecture as a pathway to improve safety. Under the view that both Transformer and Selective SSM blocks will survive up until the AGI (if it is going to be created at all, of course), and even with the addit... (read more)

1mishka
Yes, I strongly suspect that "adversarial" safety approaches are quite doomed. The more one thinks about those, the worse they look. We need to figure out how to make "cooperative" approaches to work reliably. In this sense, I have a feeling that, in particular, the approach being developed by OpenAI has been gradually shifting in that direction (judging, for example, by this interview with Ilya I transcribed: Ilya Sutskever's thoughts on AI safety (July 2023): a transcript with my comments).

I agree that training data governance is not robust to non-cooperative actors. But I think there is a much better chance to achieve a very broad industrial, academic, international, and legal consensus about it being a good way to jigsaw capabilities without sacrificing the raw reasoning ability, which the opponents of compute governance hold as purely counter-productive ("intelligence just makes things better"). That's why I titled my post "Open Agency model can solve the AI regulation dilemma" (emphasis on the last word).

This could even be seen not just ... (read more)

2Nathan Helm-Burger
Yes, I agree there's a lot of value in thoughtful regulation of training data (whether government enforced or voluntary) by cooperative actors. You raise good points. I was meaning just to refer to the control of non-cooperative actors.

BTW, this particular example sounds just like Numer.ai Signals, but Gaia Network is supposed to be more general and not to revolve around the stock market alone. E.g., the same nutritional data could be bought by food companies themselves, logistics companies, public health agencies, etc.

Thanks for suggestions.

An actual anecdote may look something like this: "We are a startup that creates a nutrition assistant and family menu helper app. We collect anonymised data from the users and ensure differential privacy, yada-yada. We want to sell this data to hedge funds that trade food company stocks (so that we can offer the app for free to our users), but we need to negotiate the terms of these agreements in an ad-hoc way with each hedge fund individually, and we don't have a principled way to come up with a fair price for the data. We would b... (read more)

3Roman Leventov
BTW, this particular example sounds just like Numer.ai Signals, but Gaia Network is supposed to be more general and not to revolve around the stock market alone. E.g., the same nutritional data could be bought by food companies themselves, logistics companies, public health agencies, etc.

The fact that hybridisation works better than pure architectures (architectures consisting of a single core type of block, we shall say) is exactly the point that Nathan Labenz makes in the podcast and that I repeat at the beginning of the post.

(Ah, I actually forgot to repeat this point, apart from noting that Doyle predicted this in his architecture theory.)

Experimental results are a more legible and reliable form of evidence than philosophy-level arguments. When they're available, they're a reason to start paying attention to the philosophy in a way that the philosophy itself isn't.

Incidentally, hybrid Mamba/MHA doesn't work significantly better than pure Mamba, at least the way it's reported in appendix E.2.2 of the paper (beware left/right confusion in Figure 9). The effect is much more visible with Hyena, though the StripedHyena post gives more details on studying hybridization, so it's unclear if this was studied for Mamba as thoroughly.
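To make the "pure vs. hybrid" comparison concrete, here is a schematic sketch of the two kinds of stack. The SSM block below is a crude recurrent stand-in, not the actual Mamba implementation, and the one-attention-block-per-four-layers ratio is an arbitrary assumption.

```python
# Schematic sketch of pure vs. hybrid block stacks; not a real Mamba/Hyena implementation.
import torch.nn as nn

class SSMBlock(nn.Module):          # placeholder for a Mamba/selective-SSM block
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.GRU(d_model, d_model, batch_first=True)  # recurrent stand-in
    def forward(self, x):
        out, _ = self.mixer(self.norm(x))
        return x + out              # residual connection

class MHABlock(nn.Module):          # standard multi-head attention block
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

def make_stack(d_model=512, n_layers=12, attn_every=None):
    """attn_every=None gives a pure SSM stack; attn_every=4 gives a hybrid
    with one MHA block per four layers."""
    layers = [MHABlock(d_model) if attn_every and (i + 1) % attn_every == 0
              else SSMBlock(d_model) for i in range(n_layers)]
    return nn.Sequential(*layers)

pure_stack = make_stack()                # what "pure Mamba" means structurally
hybrid_stack = make_stack(attn_every=4)  # what "hybrid Mamba/MHA" means structurally
```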

we're lacking all 4. We're lacking a coherent map of the polycrisis (if anyone wants to do and/or fund a version of aisafety.world for the polycrisis, I'm interested in contributing)

Joshua Williams created an initial version of a metacrisis map, and I suggested to him a couple of days ago that he make the development of such a resource more open, e.g., by turning it into a GitHub repository.

I think there's a ton of funding available in this space, specifically I think speculating on the markets informed by the kind of worldview that allows one to perceive the polyc

... (read more)
3stavros
It's a good presentation, but it isn't a map. A literal map of the polycrisis[1] can show:
  • The various key facets (pollution, climate, biorisk, energy, ecology, resource constraints, globalization, economy, demography, etc.)
  • Relative degrees of fragility / timelines (e.g., climate change being one of the areas where we have the most slack)
  • Many of the significant orgs/projects working on these facets, with special emphasis placed on those that are aware of the wider polycrisis
  • Many of the significant communities
  • Many of the significant funders

In a nutshell.

1. ^ I mildly prefer polycrisis because it's less abstract. The metacrisis points toward a systems dynamic for which we have no adequate levers, whereas the polycrisis points toward the effects in the real world that we need to deal with. I am assuming we live in a world that is going to be reshaped (or ended) by technology (probably AGI) within a few decades, and that if this fails to occur the inevitable result of the metacrisis is collapse. I think the most impact I can have is to kick the can down the road far enough that the accelerationistas get their shot. I don't pretend this is the world I would choose to be living in, or the horse I'd want to be betting on. It is simply my current understanding of reality. Hence: polycrisis. Deal with the symptoms. Keep the patient alive.

1.) Clearly state the problems that need to be worked on, and provide reasonable guidance as to where and how they might be worked on
2.) Notice what work is already being done on the problems, and who is doing it (avoid reinventing the wheel/not invented here syndrome; EA is especially guilty of this)
3.) Actively develop useful connections between 2.)
4.) Measure engagement (resource flows) and progress

I posted some parts of my current visions of 1) and 2) here and here. I think these, along with the Gaia Network design that we proposed recently (the Gaia N... (read more)

1stavros
  Democracy is a mistake, for all of the obvious reasons. As is the belief amongst engineers that every problem is an engineering problem :P We have a whole bunch of tools going mostly unused and unnoticed that could, plausibly, enable a great deal more trust and collaboration than is currently possible.  We have a whole bunch of people both thinking about and working on the polycrisis already.  My proposal is that we're far more likely to achieve our ultimate goal - a future we'd like to live in - if we simply do our best to empower, rather than direct, others. I expect attempts to direct, no matter how brilliant the plan or the mind(s) behind it, are likely to fail. For all the obvious reasons. (caveat: yes AGI changes this, but it changes everything. My whole point is that we need to keep the ship from sinking long enough for AGI to take the wheel)

Right now, if the Gaia Network already existed but there were few models and agents on it, there would be little or no advantage (e.g., from leveraging the tooling/infra built for the Gaia Network) in joining the network.

This is why I personally think that the bottom-up approach, building these apps and scaling them (thus building up QRFs) first, is a somewhat more promising path than the top-down approach, the ultimate version of which is the OAA itself; the research agenda of building the Gaia Network is a somewhat milder version, but still top-down-ish. Th... (read more)

6Steven Byrnes
My friendly suggestion would be to have one or more anecdotes of a specific concrete (made-up) person trying to build a specific concrete AI thing, and they find themselves facing one or more very specific concrete transaction costs (in the absence of Gaia), and then Gaia enters the scene, and then you specifically explain how those specific transaction costs thereby become lower or zero, thanks to specific affordances of Gaia. I’ve tried to read everything you’ve linked, but am currently unable to generate even one made-up anecdote like that. (I am not the target audience here, this isn’t really my wheelhouse, instead I’m just offering unsolicited advice for when you’re pitching other people in the future.) (Bonus points if you have talked to an actual human who has built (or tried to build) such an AI tool and they say “oh yeah, that transaction cost you mentioned, it really exists, and in fact it’s one of our central challenges, and yes I see how the Gaia thing you describe would substantially help mitigate that.”)

One completely realistic example of an agent is given in the appendix (an agent that recommends actions to improve soil health or carbon sequestration). Some more examples are given in this comment:

  • An info agent that recommends info resources (news, papers, posts, op-eds, books, videos) for me to consume, based on my current preferences and demands (and info from other agents, such as those listed below, or this agent that predicts the personalised information value of the comments on the web)
    • Scaling to the group/coordination: optimise informational intake of
... (read more)
4Steven Byrnes
The Gaia network does not currently exist AFAIK, yet it is already possible to build all of those things right now, right? Can you elaborate on what would be different in a Gaia network version compared to my building those tools right now without ever having heard of Gaia?

I absolutely agree that the future TAI may look nothing like the current architectures. Cf. this tweet by Kenneth Stanley, with whom I agree 100%. At the same time, I think it's a methodological mistake to therefore conclude that we should only work on approaches and techniques that are applicable to any AI, in a black-box manner. It's like tying our hands behind our backs. We can and should affect the designs of future TAIs through our research, by demonstrating promise (or inherent limitations) of this or that alignment technique, so that these technique... (read more)

7Marc Carauleanu
I do not think we should only work on approaches that work on any AI, I agree that would constitute a methodological mistake. I found a framing that general to not be very conducive to progress.  You are right that we still have the chance to shape the internals of TAI, even though there are a lot of hoops to go through to make that happen. We think that this is still worthwhile, which is why we stated our interest in potentially helping with the development and deployment of provably safe architectures, even though they currently seem less competitive. In my response, I was trying to highlight the point that whenever we can, we should keep our assumptions to a minimum given the uncertainty we are under.  Having that said, it is reasonable to have some working assumptions that allow progress to be made in the first place as long as they are clearly stated. I also agree with Davidad about the importance of governance for the successful implementation of a technical AI Safety plan as well as with your claim that proliferation risks are important, with the caveat that I am less worried about proliferation risks in a world with very short timelines. 

I think you have tied yourself too much to the strict binary classification that you invented (finetuning/scaffolding). You overgeneralise, and your classification obscures the truth more than it clarifies things.

All the different things that can be done by LLMs: tool use, scaffolded reasoning aka LM agents, RAG, fine-tuning, semantic knowledge graph mining, reasoning with semantic knowledge graph, finetuning for following "virtue" (persona, character, role, style, etc.), finetuning for model checking, finetuning for heuristics for theorem proving, finetuning for gen... (read more)

Also, I would say, retrieval-augmented generation (RAG) is not just a mundane way to industrialise language models, but an important concept whose properties should be studied separately from scaffolding, fine-tuning, or the other techniques that I listed in the comment above.
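For concreteness, a minimal sketch of the RAG pattern, to show how it differs structurally from fine-tuning (the base model's weights are unchanged) and from agent scaffolding (there is no action loop). embed() and generate() are placeholders for an embedding model and an LLM call, not a specific library API.

```python
# Minimal RAG sketch: retrieve, augment the prompt, generate. Placeholder callables.
from typing import Callable, List

def rag_answer(question: str,
               documents: List[str],
               embed: Callable[[str], List[float]],
               generate: Callable[[str], str],
               k: int = 3) -> str:
    # 1. Retrieve: rank documents by similarity to the question embedding.
    q = embed(question)
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(documents, key=lambda d: dot(embed(d), q), reverse=True)
    # 2. Augment: splice the top-k retrieved passages into the prompt.
    context = "\n\n".join(ranked[:k])
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    # 3. Generate: the frozen base model answers; no weights are updated, no agent loop runs.
    return generate(prompt)
```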

On (1), cf. this report: "The current portfolio of work on AI risk is over-indexed on work which treats “transformative AI” as a black box and tries to plan around that. I think that we can and should be peering inside that box (and this may involve plans targeted at more specific risks)."

On (2), I'm surprised to read this from you, since you suggested engineering Self-Other Overlap into LLMs in your AI Safety Camp proposal, if I understood and remember correctly. Do you actually see a line (or a way) of increasing the overlap without furthering ToM and th... (read more)

1Marc Carauleanu
On (1), some approaches are neglected for a good reason. You can also target specific risks while treating TAI as a black-box (such as self-other overlap for deception). I think it can be reasonable to "peer inside the box" if your model is general enough and you have a good enough reason to think that your assumption about model internals has any chance at all of resembling the internals of transformative AI.  On (2), I expect that if the internals of LLMs and humans are different enough, self-other overlap would not provide any significant capability benefits. I also expect that in so far as using self-representations is useful to predict others, you probably don't need to specifically induce self-other overlap at the neural level for that strategy to be learned, but I am uncertain about this as this is highly contingent on the learning setup.

Notable techniques for getting value out of language models that are not mentioned:

1owencb
Thanks. At a first look at what you're saying I'm understanding these to be subcategories of using finetuning or scaffolding (in the case of leveraging semantic knowledge graphs) in order to get useful tools. But I don't understand the sense in which you think finetuning in this context has completely different properties. Do you mean different properties from the point where I discuss agency entering via finetuning? If so I agree. (Apologies for not having thought this through in greater depth.)
4Roman Leventov
Also, I would say, retrieval-augmented generation (RAG) is not just a mundane way to industrialise language models, but an important concept whose properties should be studied separately from scaffolding, fine-tuning, or the other techniques that I listed in the comment above.

In another thread, Marc Carauleanu wrote:

The main worry that I have with regards to your approach is how competitive SociaLLM would be with regards to SOTA foundation models given both (1) the different architecture you plan to use, and (2) practical constraints on collecting the requisite structured data. While it is certainly interesting that your architecture lends itself nicely to inducing self-other overlap, if it is not likely to be competitive at the frontier, then the methods uniquely designed to induce self-other overlap on SociaLLM are likely to

... (read more)

Thanks for the feedback. I agree with worries (1) and (2). I think there is a way to de-risk this.

The block hierarchy that is responsible for tracking the local context consists of classic Transformer blocks. Only the tracking of the user's own history really needs to be an SSM hierarchy, because it quickly surpasses the scalability limits of self-attention (the same applies to the blocks tracking an interlocutor across private 1-1 chats, which can also be arbitrarily long, but there is probably no such data available for training). On the public data (such as forums, public chat room logs, ... (read more)
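To illustrate the split described above, a rough sketch (not actual SociaLLM code; module names, sizes, and the recurrent stand-in are assumptions):

```python
# Rough sketch of the two hierarchies: Transformer blocks over the bounded local
# context, and an SSM-style tracker carrying state over an unbounded user history.
import torch.nn as nn

class LocalContextEncoder(nn.Module):
    """Classic Transformer blocks over the (short) local context window."""
    def __init__(self, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
    def forward(self, x):                    # x: (batch, local_len, d_model)
        return self.encoder(x)

class UserHistoryTracker(nn.Module):
    """Stand-in for the SSM hierarchy: keeps a fixed-size state over an
    arbitrarily long user history instead of attending over all of it."""
    def __init__(self, d_model=512):
        super().__init__()
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
    def forward(self, history, state=None):  # history: (batch, hist_len, d_model)
        _, state = self.rnn(history, state)
        return state                         # carried forward across turns

# How the local encoding and the carried user state are fused is omitted here;
# the point is only why the two hierarchies need different scaling behaviour.
```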

1Marc Carauleanu
Interesting, thanks for sharing your thoughts. If this could improve the social intelligence of models then it can raise the risk of pushing the frontier of dangerous capabilities. It is worth noting that we are generally more interested in methods that are (1) more likely to transfer to AGI (don't over-rely on specific details of a chosen architecture) and (2) that specially target alignment instead of furthering the social capabilities of the model. 

Clarity check: this model has not been trained yet at this time, correct?

Yes, I've changed the title of the post and added a footnote on "is a foundation model".

More generally, we strongly agree that building out BCI is like a tightrope walk. Our original theory of change explicitly focuses on this: in expectation, BCI is not going to be built safely by giant tech companies of the world, largely given short-term profit-related incentives—which is why we want to build it ourselves as a bootstrapped company whose revenue has come from things other than BCI. Accordingly, we can focus on walking this BCI developmental tightrope safely and for the benefit of humanity without worrying if we profit from this wo

... (read more)
3Cameron Berg
It's a great point that the broader social and economic implications of BCI extend beyond the control of any single company, AE no doubt included. Still, while bandwidth and noisiness of the tech are potentially orthogonal to one's intentions, companies with unambiguous humanity-forward missions (like AE) are far more likely to actually care about the societal implications, and therefore, to build BCI that attempts to address these concerns at the ground level. In general, we expect the by-default path to powerful BCI (i.e., one where we are completely uninvolved) to be negative/rife with s-risks/significant invasions of privacy and autonomy, etc, which is why we are actively working to nudge the developmental trajectory of BCI in a more positive direction—i.e., one where the only major incentive is build the most human-flourishing-conducive BCI tech we possibly can.

We think we have some potentially promising hypotheses. But because we know you do, too, we are actively soliciting input from the alignment community. We will be more formally pursuing this initiative in the near future, awarding some small prizes to the most promising expert-reviewed suggestions. Please submit any[3] agenda idea that you think is both plausible and neglected (even if you don’t have the bandwidth right now to pursue the idea! This is a contest for ideas, not for implementation). 

This is related to what @Kabir Kumar is ... (read more)

2Cameron Berg
Thanks for calling this out—we're definitely open to discussing potential opportunities for collaboration/engaging with the platform!

Reverse-engineering prosociality

Here's my idea on this topic: "SociaLLM: a language model design for personalised apps, social science, and AI safety research". Though it's more about engineering pro-sociality (including Self-Other Overlap) using architecture and inductive biases directly than reverse-engineering prosociality.

3Marc Carauleanu
Thanks for writing this up—excited to chat some time to further explore your ideas around this topic. We’ll follow up in a private channel. The main worry that I have with regards to your approach is how competitive SociaLLM would be with regards to SOTA foundation models given both (1) the different architecture you plan to use, and (2) practical constraints on collecting the requisite structured data. While it is certainly interesting that your architecture lends itself nicely to inducing self-other overlap, if it is not likely to be competitive at the frontier, then the methods uniquely designed to induce self-other overlap on SociaLLM are likely to not scale/transfer well to frontier models that do pose existential risks. (Proactively ensuring transferability is the reason we focus on an additional training objective and make minimal assumptions about the architecture in the self-other overlap agenda.) One additional worry is that many of the research benefits of SociaLLM may not be out of reach for current foundation models, and so it is unclear if investing in the unique data and architecture setup is worth it in comparison to the counterfactual of just scaling up current methods.