LESSWRONG
Top Questions
why assume AGIs will optimize for fixed goals? (nostalgebraist, Rob Bensinger; 3y; score 147; 60 comments; Ω)
What do coherence arguments actually prove about agentic behavior? ([anonymous]; 1y; score 123; 39 comments)
Why do many people who care about AI Safety not clearly endorse PauseAI? (humnrdble, 1a3orn; 2mo; score 45; 41 comments)
What convincing warning shot could help prevent extinction from AI? (Charbel-Raphaël, cozyfractal, peterbarnett; 1y; score 106; 22 comments; Ω)
What are the strongest arguments for very short timelines? (Kaj_Sotala, elifland; 5mo; score 101; 79 comments)
Recent Activity
Why OpenAI projects only $174B of revenue by 2030? (becausecurious; 18h; score 25; 5 comments)
Game theory of "Nuclear Prisoner's Dilemma" - on nuking rocks (CronoDAS; 3d; score 11; 6 comments)
If only the most powerful AGI is misaligned, can it be used as a doomsday machine? (StanislavKrym; 2d; score -3; 0 comments)
How do I design long prompts for thinking zero shot systems with distinct equally distributed prompt sections (mission, goals, memories, how-to-respond, etc.) and how to maintain LLM coherence? (ollie_; 4d; score -3; 5 comments)
Can I publish songs derived from the Sequences' posts on YouTube? (azergante; 3d; score 2; 2 comments)
Chess - "Elo" of random play? (Shankar Sivarajan, winstonBosan; 9d; score 10; 16 comments)
Programming Language Early Funding? (J Thomas Moros; 3mo; score 2; 6 comments)
Which journalists would you give quotes to? [one journalist per comment, agree vote for trustworthy] (Nathan Young; 8d; score 12; 26 comments)
Blue light, 'Adrenal ASMR': strange experiences I can't find any literature about (vernichtung; 10d; score 16; 6 comments)
What kind of policy by an AGI would make people happy? (StanislavKrym; 9d; score 1; 2 comments)
Does translating a post with an LLM affect its rating? (ReverendBayes; 12d; score 8; 9 comments)
Does there exist an interactive reasoning map tool that lets users visually lay out claims, assign probabilities and confidence levels, and dynamically adjust their beliefs based on weighted influences between connected assertions? (Zack Friedman; 15d; score 5; 3 comments)