LESSWRONG
7868 results
(Non-deceptive) Suboptimality Alignment
Sodium · 5 karma · 1y
Executive Summary * I present a detailed and slightly different definition of suboptimality alignment compared to the original in Risks …
The Inner Alignment Problem
evhub · 103 karma · 6y
…, 2. Approximate alignment, and 3. Suboptimality alignment. Proxy alignment. The basic idea of proxy alignment is that a mesa-optimizer can …
More variations on pseudo-alignment
evhub · 27 karma · 5y
…). In particular, we distinguished between proxy alignment, suboptimality alignment, approximate alignment, and deceptive alignment. I still make …
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios
Evan R. Murphy · 58 karma · 3y
… details. At least some forms of suboptimality alignment can be addressed as well. Robustness techniques such as relaxed adversarial …
Concrete experiments in inner alignment
evhub · 74 karma · 5y
… you produce approximate alignment if you constrain model capacity? * What about suboptimality alignment? Can you create an environment …
Three scenarios of pseudo-alignment
Eleni Angelou · 9 karma · 2y
…, there's an unavoidable difference in the objectives. Scenario 3: Suboptimality alignment. Here imagine that we train a robot similar to scenario 1 …
Relaxed adversarial training for inner alignment
evhub · 69 karma · 5y
… aligned with the actual loss function. Suboptimality alignment. One concerning form of misalignment discussed in “Risks from Learned …
Does SGD Produce Deceptive Alignment?
Mark Xu · 96 karma · 4y
…, myopic training objectives do not favor deception. 4. Evan Hubinger calls the more general case of this phenomenon “suboptimality deceptive alignment” and explains more here. …
[AN #74]: Separating beneficial AI into competence, alignment, and coping with impacts
Rohin Shah · 19 karma · 5y
… when the model of the base objective is a non-robust proxy for the true base objective. Suboptimality deceptive alignment is when deception …
How Interpretability can be Impactful
Connall Garrod · 18 karma · 2y
… remain non-deceptive after deployment, removing the issue of suboptimal deceptive alignment. Another way to potentially make this more …