quetzal_rainbow

Linkpost: Predicting Empirical AI Research Outcomes with Language Models

Abstract (emphasis mine): > Many promising-looking ideas in AI research fail to deliver, but their validation takes substantial human labor and compute. Predicting an idea's chance of success is thus crucial for accelerating empirical AI research, a skill that even expert researchers can only acquire through substantial experience. We build...

Jun 4, 202510

Definition of alignment science I like

There were many attempts to define alignment and derive from it definitions of alignment work/research/science etc. For example, Rob Bensinger: > Back in 2001, we defined "Friendly AI" as "The field of study concerned with the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the...

Jan 6, 202521

How do you shut down an escaped model?

From comment on post about Autonomous Adaptation and Replication: > ARA is just not a very compelling threat model in my mind. The key issue is that AIs that do ARA will need to be operating at the fringes of human society, constantly fighting off the mitigations that humans are...

Jun 2, 202415

Training of superintelligence is secretly adversarial

TL;DR it is possible to model the training of ASI[1] as an adversarial process because: 1. We train safe ASI on the foundation of security assumptions about the learning process, training data, inductive biases, etc. 2. We aim to train a superintelligence, i.e., a more powerful learning system than a...

Feb 7, 202415

There is no sharp boundary between deontology and consequentialism

The purpose of this post is to serve as storage for particular thought so I can link it in dialogues. I've seen logic along these lines several times: 1. Consequentialist reasoning is more complex than following deontological rules; 2. Neural networks have some kind of simplicity bias; 3. Therefore, neural...

Jan 8, 20248

Where Does Adversarial Pressure Come From?

You could see on LW a dialogue like this: > AI Pessimist: Look, you should apply security mindset when thinking about aligning superintelligence. If you have superintelligence, all your safety measures are under enormous optimization pressure, which will seek how to crack them. > > Skeptic: I agree that if...

Dec 14, 202317

Predictable Defect-Cooperate?

Epistemic status: I consider everything written here pretty obvious, but I haven't seen this anywhere else. It would be cool if you could provide sources on topic! Reason to write: I've seen once pretty confused discussion in Twitter about how multiple superintelligences will predictably end up in Defect-Defect equilibrium and...

Nov 18, 20237

quetzal_rainbow

quetzal_rainbow

They are made of repeating patterns

What specific thing would you do with AI Alignment Research Assistant GPT?

Definition of alignment science I like

Where Does Adversarial Pressure Come From?

quetzal_rainbow

They are made of repeating patterns

What specific thing would you do with AI Alignment Research Assistant GPT?

Definition of alignment science I like

Where Does Adversarial Pressure Come From?

Linkpost: Predicting Empirical AI Research Outcomes with Language Models

Definition of alignment science I like

How do you shut down an escaped model?

Training of superintelligence is secretly adversarial

There is no sharp boundary between deontology and consequentialism

Where Does Adversarial Pressure Come From?

Predictable Defect-Cooperate?