LESSWRONG
Utility Functions
Applied to Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format by Roland Pihlakas 1mo ago
Applied to Toward a Mathematical Definition of Rationality in Multi-Agent Systems by nekofugu 2mo ago
Applied to Atlas: Stress-Testing ASI Value Learning Through Grand Strategy Scenarios by NeilFox 2mo ago
Applied to Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs by Matrice Jacobine 2mo ago
Applied to Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well) by Roland Pihlakas 4mo ago
Applied to Building AI safety benchmark environments on themes of universal human values by Roland Pihlakas 4mo ago
Dakara v1.12.0 Dec 30th 2024 GMT (+26/-9)
Applied to Is "VNM-agent" one of several options, for what minds can grow up into? by AnnaSalamon 4mo ago
Applied to Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility by Johannes C. Mayer 4mo ago
Applied to Better difference-making views by MichaelStJules 4mo ago
Applied to Expected Utility, Geometric Utility, and Other Equivalent Representations by StrivingForLegibility 5mo ago
Applied to Value/Utility: A History by Raemon 5mo ago
Applied to Valence Need Not Be Bounded; Utility Need Not Synthesize by Raemon 5mo ago
Applied to Galatea and the windup toy by Nicolas Villarreal 6mo ago
Applied to Resolving von Neumann-Morgenstern Inconsistent Preferences by niplav 6mo ago
Applied to Doing Nothing Utility Function by k64 7mo ago
Applied to Sequence overview: Welfare and moral weights by MichaelStJules 8mo ago