Academic Papers — LessWrong

Edited by Kaj_Sotala, last updated 9th Jul 2020

Posts either linking to, or summarizing, formal papers published elsewhere.

Posts tagged Academic Papers

8

206

Some AI research areas and their relevance to existential safety

5y

37

6

102

Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training

21d

12

6

66

How to Control an LLM's Behavior (why my P(DOOM) went down)

2y

30

6

35

The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?

8mo

34

5

37

Striking Implications for Learning Theory, Interpretability — and Safety?

2y

4

4

115

Thirty-three randomly selected bioethics papers

Rob Bensinger, gwern

5y

46

3

50

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)

3y

0

2

178

Scientific breakthroughs of the year

2mo

13

2

168

2021 AI Alignment Literature Review and Charity Comparison

4y

28

2

159

Comment on "Endogenous Epistemic Factionalization"

6y

8

2

121

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

2y

14

2

109

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

L Rudolf L, bilalchughtai, Jan Betley, kaivu, Jérémy Scheurer, Mikita Balesni, AlexMeinke, Owain_Evans, Marius Hobbhahn

2y

40

2

106

Optimized Propaganda with Bayesian Networks: Comment on "Articulating Lay Theories Through Graphical Models"

6y

10

2

90

Fallacies as weak Bayesian evidence

14y

42

2

86

I Was Not Almost Wrong But I Was Almost Right: Close-Call Counterfactuals and Bias

14y

40
