Abstract (emphasis mine): > Many promising-looking ideas in AI research fail to deliver, but their validation takes substantial human labor and compute. Predicting an idea's chance of success is thus crucial for accelerating empirical AI research, a skill that even expert researchers can only acquire through substantial experience. We build...
There have been many attempts to define alignment and to derive from it definitions of alignment work/research/science, etc. For example, Rob Bensinger: > Back in 2001, we defined "Friendly AI" as "The field of study concerned with the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the...
From a comment on a post about Autonomous Replication and Adaptation (ARA): > ARA is just not a very compelling threat model in my mind. The key issue is that AIs that do ARA will need to be operating at the fringes of human society, constantly fighting off the mitigations that humans are...
TL;DR: it is possible to model the training of ASI[1] as an adversarial process because: 1. We train safe ASI on the foundation of security assumptions about the learning process, training data, inductive biases, etc. 2. We aim to train a superintelligence, i.e., a more powerful learning system than a...
The purpose of this post is to serve as storage for a particular thought so I can link it in dialogues. I've seen logic along these lines several times: 1. Consequentialist reasoning is more complex than following deontological rules; 2. Neural networks have some kind of simplicity bias; 3. Therefore, neural...
You might see a dialogue like this on LW: > AI Pessimist: Look, you should apply security mindset when thinking about aligning superintelligence. If you have a superintelligence, all your safety measures are under enormous optimization pressure, which will seek out ways to crack them. > > Skeptic: I agree that if...
Epistemic status: I consider everything written here pretty obvious, but I haven't seen it anywhere else. It would be cool if you could provide sources on the topic! Reason to write: I once saw a pretty confused discussion on Twitter about how multiple superintelligences will predictably end up in a Defect-Defect equilibrium and...
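For readers who want the baseline argument spelled out: in a one-shot Prisoner's Dilemma, defection strictly dominates cooperation, which is why Defect-Defect is the unique Nash equilibrium of that game. Below is a minimal sketch of that standard argument; the payoff values are the usual textbook ones, chosen for illustration, and are not taken from the post itself.

```python
# Standard one-shot Prisoner's Dilemma: payoffs[(my_move, their_move)] = my payoff.
# Illustrative textbook values (temptation > reward > punishment > sucker).
payoffs = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect (sucker's payoff)
    ("D", "C"): 5,  # I defect, they cooperate (temptation)
    ("D", "D"): 1,  # mutual defection
}

def best_response(their_move: str) -> str:
    """Return the move that maximizes my payoff against a fixed opponent move."""
    return max("CD", key=lambda my_move: payoffs[(my_move, their_move)])

# Defection is the best response to either opponent move, i.e. it strictly
# dominates cooperation, so (D, D) is the unique Nash equilibrium of the
# one-shot game.
assert best_response("C") == "D"
assert best_response("D") == "D"
```

Whether this one-shot dominance argument actually transfers to agents that can model one another, commit, or play repeatedly is exactly the kind of question the post takes up.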