Original Research on Less Wrong

lukeprog

Hundreds of Less Wrong posts summarize or repackage work previously published in professional books and journals, but Less Wrong also hosts lots of original research in philosophy, decision theory, mathematical logic, and other fields. This post serves as a curated index of Less Wrong posts containing significant original research.

Obviously, there is much fuzziness about what counts as "significant" or "original." I'll be making lots of subjective judgment calls about which suggestions to add to this post. One clear rule is: I won't be linking anything that merely summarizes previous work (e.g. Stuart's summary of his earlier work on utility indifference).

Update 09/20/2013: Added Notes on logical priors from the MIRI workshop, Cooperating with agents with different ideas of fairness, while resisting exploitation, Do Earths with slower economic growth have a better chance at FAI?

Update 11/03/2013: Added Bayesian probability as an approximate theory of uncertainty?, On the importance of taking limits: Infinite Spheres of Utility, Of all the SIA-doomsdays in the all the worlds...

Update 01/22/2014: Added Change the labels, undo infinitely good, Reduced impact AI: no back channels, International cooperation vs. AI arms race, Naturalistic trust among AIs: The parable of the thesis advisor’s theorem

General philosophy

Highly Advanced Epistemology 101 for Beginners. Eliezer's bottom-up guide to truth, reference, meaningfulness, and epistemology. Includes practical applications and puzzling meditations.
Seeing Red, A Study of Scarlet, Nature: Red in Tooth and Qualia. Orthonormal dissolves Mary's room and qualia.
Counterfactual resiliency test for non-causal models. Stuart Armstrong suggests testing non-causal models for "counterfactual resiliency."
Thoughts and problems with Eliezer's measure of optimization power. Stuart Armstrong examines some potential problems with Eliezer's concept of optimization power.
Free will. Eliezer's particular compatibilist-style solution to the free will problem from a reductionist viewpoint.
The absolute Self-Selection Assumption. A clarification on anthropic reasoning, focused on Wei Dai's UDASSA framework.
SIA, conditional probability, and Jaan Tallinn's simulation tree. Stuart Armstrong bridges Nick Bostrom's Self-Indication Assumption (SIA) and Jaan Tallinn's simulation tree of superintelligence reproduction.
Mathematical Measures of Optimization Power. Alex Altair tackles one approach to mathematically formalizing Yudkowsky's Optimization Power concept.
Caught in the glare of two anthropic shadows. Stuart_Armstrong provides a detailed analysis of the "anthropic shadow" concept and its implications.
Bayesian probability as an approximate theory of uncertainty?. Vladimir Slepnev argues that Bayesian probability is an imperfect approximation of what we want from a theory of uncertainty.
Of all the SIA-doomsdays in the all the worlds.... Stuart_Armstrong on the doomsday argument, the self-sampling assumption and the self-indication assumption.

Decision theory / AI architectures / mathematical logic

Towards a New Decision Theory, Explicit Optimization of Global Strategy (Fixing a Bug in UDT1). Wei Dai develops his new decision theory, UDT.
Counterfactual Mugging. Vladimir Nesov presents a new Newcomb-like problem.
Cake, or death! Summary: "the naive cake-or-death problem emerges for a value learning agent when it expects its utility to change, but uses its current utility to rank its future actions," and "the sophisticated cake-or-death problem emerges for a value learning agent when it expects its utility to change predictably in certain directions dependent on its own behavior."
An angle of attack on Open Problem #1, How to cheat Löb's Theorem: my second try. Benja tackles the problem of "how, given Löb's theorem, an AI can replace itself with a better expected utility maximizer that believes in as much mathematics as the original AI."
A model of UDT with a concrete prior over logical statements. Benja attacks the problem of logical uncertainty.
Decision Theories: A Less Wrong Primer, Decision Theories: A Semi-Formal Analysis, Part I, Decision Theories: A Semi-Formal Analysis, Part II, Decision Theories: A Semi-Formal Analysis, Part III, Halt, Melt, and Catch Fire, Hang On, I Think This Works After All. Orthonormal explains the TDT/UDT approach to decision theory and then develops his own TDT-like algorithm called Masquerade.
Decision Theory Paradox: PD with Three Implies Chaos?. Orthonormal describes "an apparent paradox in a three-agent variant of the Prisoner's Dilemma: despite full knowledge of each others' source codes, TDT agents allow themselves to be exploited by CDT, and lose completely to another simple decision theory."
Naive TDT, Bayes nets, and counterfactual mugging. Stuart Armstrong suggests a reason for TDT's apparent failure on the counterfactual mugging problem.
Bounded versions of Gödel's and Löb's theorems, Formalising cousin_it's bounded versions of Gödel's theorem. Vladimir Slepnev proposes bounded versions of Gödel's and Löb's theorems, and Stuart Armstrong begins to formalize one of them.
Satisficers want to become maximisers. Stuart Armstrong explains why satisficers want to become maximisers.
Would AIXI protect itself?. Stuart Armstrong argues that "with practice the AIXI [agent] would likely seek to protect its power source and existence, and would seek to protect its memory from 'bad memories' changes. It would want to increase the amount of 'good memory' changes. And it would not protect itself from changes to its algorithm and from the complete erasure of its memory. It may also develop indirect preferences for or against these manipulations if we change our behaviour based on them."
The mathematics of reduced impact: help needed. Stuart Armstrong explores some ways we might reduce the impact of maximizing AIs.
AI ontology crises: an informal typology. Stuart Armstrong builds a typology of AI ontology crises, following de Blanc (2011).
In the Pareto-optimised crowd, be sure to know your place. Stuart Armstrong argues that "In a population playing independent two-player games, Pareto-optimal outcomes are only possible if there is an agreed universal scale of value relating each players' utility, and the players then acts to maximise the scaled sum of all utilities."
If you don't know the name of the game, just tell me what I mean to you. Stuart Armstrong summarizes: "Both the Nash Bargaining solution (NBS), and the Kalai-Smorodinsky Bargaining Solution (KSBS), though acceptable for one-off games that are fully known in advance, are strictly inferior for independent repeated games, or when there exists uncertainty as to which game will be played."
The Blackmail Equation. Stuart Armstrong summarizes Eliezer's result on blackmail in decision theory.
Expected utility without the independence axiom. Stuart Armstrong summarizes: "Deprived of independence, expected utility sneaks in via aggregation."
An example of self-fulfilling spurious proofs in UDT, A model of UDT without proof limits. Vladimir Slepnev examines several problems related to decision agents with spurious proof-searchers.
The limited predictor problem. Vladimir Slepnev describes the limited predictor problem, "a version of Newcomb's Problem where the predictor has limited computing resources. To predict the agent's action, the predictor simulates the agent for N steps. If the agent doesn't finish in N steps, the predictor assumes that the agent will two-box."
A way of specifying utility functions for UDT. Vladimir Slepnev advances a method for specifying utility functions for UDT agents.
A model of UDT with a halting oracle, Formulas of arithmetic that behave like decision agents. Vladimir Slepnev specifies an optimality notion which "matches our intuitions even though the universe is still perfectly deterministic and the agent is still embedded in it, because the oracle ensures that determinism is just out of the formal system's reach." Then, Nisan revisits some of this result's core ideas by representing the decision agents as formulas of Peano arithmetic.
AIXI and Existential Despair. Paul Christiano discusses how an approximate implementation of AIXI could lead to an erratically behaving system.
The Absent-Minded Driver. In this post, Wei Dai examines the absent-minded driver problem. He tries to show how professional philosophers failed to reach the solution to time inconsistency, while rejecting Eliezer's "people are crazy" explanation.
Clarification of AI Reflection Problem. A clear description of the reflection problem in AI systems, commonly discussed on Less Wrong, along with some possible solutions, by Paul Christiano.
Motivating Optimization Processes. Paul Christiano addresses the question of how and when we can expect an AGI to cooperate with humanity, and how it might be easier to implement than a completely Friendly AGI.
Universal agents and utility functions. Anja Heinisch replaces AIXI's reward function with a utility function.
Ingredients of Timeless Decision Theory, Timeless Decision Theory: Problems I Can't Solve, Timeless Decision Theory and Meta-Circular Decision Theory. Eliezer's posts describe the main details of his Timeless Decision Theory, some problems for which he doesn't possess decision theories, and a reply to Gary Drescher's comment describing Meta-Circular Decision Theory. These insights later culminated in Yudkowsky (2010).
Confusion about Newcomb is confusion about counterfactuals, Why we need to reduce "could", "would", "should", Decision theory: why Pearl helps reduce "could" and "would", but still leaves us with at least three alternatives. Anna Salamon's posts use causal Bayes nets to explore the difficulty of interpreting counterfactual reasoning and the related concepts of should, could, and would.
Bayesian Utility: Representing Preference by Probability Measures. Vladimir Nesov presents a transformation of the standard expected utility formula.
A definition of wireheading. Anja attempts to reach a definition of wireheading that encompasses the intuitions about the concept that have emerged from LW discussions.
Why you must maximize expected utility. Benja presents a slight variant on the Von Neumann-Morgenstern approach to the axiomatic justification of the principle of maximizing expected utility.
A utility-maximizing varient of AIXI. Alex Mennen builds on Anja's specification of a utility-maximizing variant of AIXI.
A fungibility theorem. Nisan's alternative to the von Neumann-Morgenstern theorem, proposing the maximization of the expectation of a linear aggregation of one's values.
Logical uncertainty, kind of. A proposal, at least. Manfred's proposed solution on how to apply the basic laws of belief manipulation to cases where an agent is computationally limited.
Save the princess: A tale of AIXI and utility functions. Anja discusses utility functions, delusion boxes, and cartesian dualism in attempting to improve upon the original AIXI formalism.
Naturalism versus unbounded (or unmaximisable) utility options. Stuart Armstrong poses a series of questions regarding unbounded utility functions.
Beyond Bayesians and Frequentists. Jacob Steinhardt compares two approaches to statistics and discusses when to use them.
VNM agents and lotteries involving an infinite number of possible outcomes. AlexMennen summarizes: The VNM utility theorem only applies to lotteries that involve a finite number of possible outcomes. If an agent maximizes the expected value of a utility function when considering lotteries that involve a potentially infinite number of outcomes as well, then its utility function must be bounded.
A Problem with playing chicken with the universe. Karl explains how a model of UDT with a halting oracle might have some problematic elements.
Intelligence Metrics and Decision Theories, Metatickle Intelligence Metrics and Friendly Utility Functions. Squark reviews a few previously proposed mathematical metrics of general intelligence and proposes his own approach.
Probabilistic Löb theorem, Logic in the Language of Probability. Stuart Armstrong looks at whether reflective theories of logical uncertainty still suffer from Löb's theorem.
Notes on logical priors from the MIRI workshop. Vladimir Slepnev summarizes: "In Counterfactual Mugging with a logical coin, a 'stupid' agent that can't compute the outcome of the coinflip should agree to pay, and a 'smart' agent that considers the coinflip as obvious as 1=1 should refuse to pay. But if a stupid agent is asked to write a smart agent, it will want to write an agent that will agree to pay. Therefore the smart agent who refuses to pay is reflectively inconsistent in some sense. What's the right thing to do in this case?"
Cooperating with agents with different ideas of fairness, while resisting exploitation. Eliezer Yudkowsky investigates some ideas from the MIRI workshop that he hasn’t seen in informal theories of negotiation.
Naturalistic trust among AIs: The parable of the thesis advisor’s theorem. Benja discusses Nik Weaver's suggestion for 'naturalistic trust'.

Ethics

The Metaethics Sequence. Eliezer explains his theory of metaethics. Many readers have difficulty grokking his central points, and may find clarifications in the discussion here.
No-Nonsense Metaethics. Lukeprog begins to outline his theory of metaethics in this unfinished sequence.
The Fun Theory Sequence. Eliezer develops a new subfield of ethics, "fun theory."
Consequentialism Need Not Be Near-Sighted. Orthonormal's summary: "If you object to consequentialist ethical theories because you think they endorse horrible or catastrophic decisions, then you may instead be objecting to short-sighted utility functions or poor decision theories."
A (small) critique of total utilitarianism. Stuart Armstrong analyzes some weaknesses of total utilitarianism.
In the Pareto world, liars prosper. Stuart Armstrong presents a new picture proof of a previously known result, that "if there is any decision process that will find a Pareto outcome for two people, it must be that liars will prosper: there are some circumstances where you would come out ahead if you were to lie about your utility function."
Politics as Charity, Probability and Politics. Carl Shulman analyzes the prospects for doing effective charity work by influencing elections.
Value Uncertainty and the Singleton Scenario. Wei Dai examines the problem of value uncertainty, a special case of moral uncertainty in which consequentialism is assumed.
Pascal's Mugging: Tiny Probabilities of Vast Utilities. Eliezer describes the problem of Pascal's Mugging, later published in Bostrom (2009).
Ontological Crisis in Humans. Wei Dai presents the ontological crisis concept applied to human existence and goes over some examples.
Ideal Advisor Theories and Personal CEV. Luke Muehlhauser and crazy88 place CEV in the context of mainstream moral philosophy, and use a variant of CEV to address a standard line of objections to ideal advisor theories in ethics.
Harsanyi's Social Aggregation Theorem and what it means for CEV. Alex Mennen describes the relevance of Harsanyi's Social Aggregation Theorem to possible formalizations of CEV.
Three Kinds of Moral Uncertainty. Kaj Sotala attempts to explain what it means to be uncertain about moral theories.
A brief history of ethically concerned scientists. Kaj_Sotala gives historical examples of ethically concerned scientists.
Pascal's Mugging for bounded utility functions. Benja's post describes the Pascal's Mugging problem under truly bounded utility functions.
Pascal's Muggle: Infinitesimal Priors and Strong Evidence. Eliezer Yudkowsky discusses the role of infinitesimal priors and their decision-theoretic consequences in a "Pascal's Mugging"-type situation.
An Attempt at Preference Uncertainty Using VNM. nyan_sandwich tackles the problem of making decisions when you are uncertain about what your object-level preferences should be.
Gains from trade: Slug versus Galaxy - how much would I give up to control you? and Even with default points, systems remain exploitable. Stuart_Armstrong provides a suggestion as to how to split the gains from trade in some situations and how such solutions can be exploitable.
On the importance of taking limits: Infinite Spheres of Utility. aspera shows that "if we want to make a decision based on additive utility, the infinite problem is ill posed; it has no unique solution unless we take on additional assumptions."
Change the labels, undo infinitely good. Stuart_Armstrong talks about a small selection of paradoxes connected with infinite ethics.

AI Risk Strategy

AI Risk and Opportunity: A Strategic Analysis. The only original work here so far is the two-part history of AI risk thought, including descriptions of works previously unknown to the LW/SI/FHI community (e.g. Good 1959, 1970, 1982; Cade 1966).
AI timeline predictions: Are we getting better?, AI timeline prediction data. A preview of Stuart Armstrong's and Kaj Sotala's work on AI predictions, later published in Armstrong & Sotala (2012).
Self-Assessment in Expert AI Prediction. Stuart_Armstrong suggests that the predictive accuracy of self-selected experts might differ from that of less selected groups.
Kurzweil's predictions: good accuracy, poor self-calibration. A new analysis of Kurzweil's predictions, by Stuart Armstrong.
Tools versus agents, Reply to Holden on Tool AI. Stuart Armstrong and Eliezer examine Holden's proposal for Tool AI.
What is the best compact formalization of the argument for AI risk from fast takeoff? A suggestion of a few steps on how to compact and clarify the argument for the Singularity Institute's "Big Scary Idea", by LW user utility monster.
The Hanson-Yudkowsky AI-Foom Debate. The 2008 debate between Robin Hanson and Eliezer Yudkowsky, largely used to exemplify the difficulty of resolving disagreements even between expert rationalists. It focuses on the likelihood of hard AI takeoff, the need for a theory of Friendliness, the future of AI, brain emulations and recursive improvement.
What can you do with an Unfriendly AI? Paul Christiano discusses how we could eventually turn an Unfriendly AI into a useful system by filtering the way it interacts and constraining how its answers are given to us.
Cryptographic Boxes for Unfriendly AI. Following the tone of the previous post and AI boxing in general, this post by Paul Christiano explores the possibility of using cryptography as a way to guarantee friendly outputs.
How can I reduce existential risk from AI?. A post by Luke Muehlhauser describing the Meta, Strategic and Direct work one could do in order to reduce the risk for humanity stemming from AI.
Intelligence explosion vs Co-operative explosion. Kaj Sotala's description of how an intelligence explosion could emerge from the cooperation of individual artificial intelligent systems forming a superorganism.
Assessing Kurzweil: The Results; Assessing Kurzweil: The Gory Details. Stuart_Armstrong's attempt to evaluate the accuracy of Ray Kurzweil's model of technological intelligence development.
Domesticating reduced impact AIs. Stuart Armstrong attempts to give a solid foundation from which one can build a 'reduced impact AI'.
Why AI may not foom. John_Maxwell_IV explains how the intelligence of a self-improving AI may not grow as fast as we might think.
Singleton: the risks and benefits of one world governments. Stuart_Armstrong attempts to lay out a reasonable plan for tackling the singleton problem.
Do Earths with slower economic growth have a better chance at FAI?. Eliezer Yudkowsky argues that GDP growth acceleration may actually decrease our chances of getting FAI.
Reduced impact AI: no back channels. Stuart_Armstrong presents a further development of the reduced impact AI approach.
International cooperation vs. AI arms race. Brian_Tomasik talks about the role of government in a possible AI arms race.

General philosophy

Highly Advanced Epistemology 101 for Beginners. Eliezer's bottom-up guide to truth, reference, meaningfulness, and epistemology. Includes practical applications and puzzling meditations.
Seeing Red, A Study of Scarlet, Nature: Red in Tooth and Qualia. Orthonormal dissolves Mary's room and qualia.
Counterfactual resiliency test for non-causal models. Stuart Armstrong suggests testing non-causal models for "counterfactual resiliency."
Thoughts and problems with Eliezer's measure of optimization power. Stuart Armstrong examines some potential problems with Eliezer's concept of optimization power.
Free will. Eliezer's particular compatibilist-style solution to the free will problem from reductionist viewpoint.
The absolute Self-Selection Assumption. A clarification on anthropic reasoning, focused on Wei Dai s UDASSA framework.
SIA, conditional probability, and Jaan Tallinn s simulation tree. Stuart Armstrong makes the bridge between Nick Bostrom s Self-Indication Assumption (SIA) and Jann Tallinn s of superintelligence reproduction.
Mathematical Measures of Optimization Power. Alex Altair tackles one approach to mathematically formalizing Yudkowsky s Optimization Power concept.
Caught in the glare of two anthropic shadows. Stuart_Armstrong provides a detailed analysis of the "anthrophic shadow" concept and its implications.
Bayesian probability as an approximate theory of uncertainty?. Vladimir Slepnev argues that Bayesian probability is an imperfect approximation of what we want from a theory of uncertainty.
Of all the SIA-doomsdays in the all the worlds.... Stuart_Armstrong on the doomsday argument, the self-sampling assumption and the self-indication assumption.

Decision theory / AI architectures / mathematical logic

Towards a New Decision Theory, Explicit Optimization of Global Strategy (Fixing a Bug in UDT1). Wei Dai develops his new decision theory, UDT.
Counterfactual Mugging. Vladimir Nesov presents a new Newcomb-like problem.
Cake, or death! Summary: "the naive cake-or-death problem emerges for a value learning agent when it expects its utility to change, but uses its current utility to rank its future actions," and "the sophisticated cake-or-death problem emerges for a value learning agent when it expects its utility to change predictably in certain directions dependent on its own behavior."
An angle of attack on Open Problem #1, How to cheat Löb's Theorem: my second try. Benja tackles the problem of "how, given Löb's theorem, an AI can replace itself with a better expected utility maximizer that believes in as much mathematics as the original AI."
A model of UDT with a concrete prior over logical statements. Benja attacks the problem of logical uncertainty.
Decision Theories: A Less Wrong Primer, Decision Theories: A Semi-Formal Analysis, Part I, Decision Theories: A Semi-Formal Analysis, Part II, Decision Theories: A Semi-Formal Analysis, Part III, Halt, Melt, and Catch Fire, Hang On, I Think This Works After All. Orthonormal explains the TDT/UDT approach to decision theory and then develops his own TDT-like algorithm called Masquerade.
Decision Theory Paradox: PD with Three Implies Chaos?. Orthonormal describes "an apparent paradox in a three-agent variant of the Prisoner's Dilemma: despite full knowledge of each others' source codes, TDT agents allow themselves to be exploited by CDT, and lose completely to another simple decision theory."
Naive TDT, Bayes nets, and counterfactual mugging. Stuart Armstrong suggests a reason for TDT's apparent failure on the counterfactual mugging problem.
Bounded versions of Gödel's and Löb's theorems, Formalising cousin_it's bounded versions of Gödel's theorem. Vladimir Slepnev proposes bounded versions of Gödel's and Löb's theorems, and Stuart Armstrong begins to formalize one of them.
Satisficers want to become maximisers. Stuart Armstrong explains why satisficers want to become maximisers.
Would AIXI protect itself?. Stuart Armstrong argues that "with practice the AIXI [agent] would likely seek to protect its power source and existence, and would seek to protect its memory from 'bad memories' changes. It would want to increase the amount of 'good memory' changes. And it would not protect itself from changes to its algorithm and from the complete erasure of its memory. It may also develop indirect preferences for or against these manipulations if we change our behaviour based on them."
The mathematics of reduced impact: help needed. Stuart Armstrong explores some ways we might reduce the impact of maximizing AIs.
AI ontology crises: an informal typology. Stuart Armstrong builds a typology of AI ontology crises, following de Blanc (2011).
In the Pareto-optimised crowd, be sure to know your place. Stuart Armstrong argues that "In a population playing independent two-player games, Pareto-optimal outcomes are only possible if there is an agreed universal scale of value relating each players' utility, and the players then acts to maximise the scaled sum of all utilities."
If you don't know the name of the game, just tell me what I mean to you. Stuart Armstrong summarizes: "Both the Nash Bargaining solution (NBS), and the Kalai-Smorodinsky Bargaining Solution (KSBS), though acceptable for one-off games that are fully known in advance, are strictly inferior for independent repeated games, or when there exists uncertainty as to which game will be played."
The Blackmail Equation. Stuart Armstrong summarizes Eliezer's result on blackmail in decision theory.
Expected utility without the independence axiom. Stuart Armstrong summarizes: "Deprived of independence, expected utility sneaks in via aggregation."
An example of self-fulfilling spurious proofs in UDT, A model of UDT without proof limits. Vladimir Slepnev examines several problems related to decision agents with spurious proof-searchers.
The limited predictor problem. Vladimir Slepnev describes the limited predictor problem, "a version of Newcomb's Problem where the predictor has limited computing resources. To predict the agent's action, the predictor simulates the agent for N steps. If the agent doesn't finish in N steps, the predictor assumes that the agent will two-box."
A way of specifying utility functions for UDT. Vladimir Slepnev advances a method for specifying utility functions for UDT agents.
A model of UDT with a halting oracle, Formulas of arithmetic that behave like decision agents. Vladimir Slepnev specifies an optimality notion which "matches our intuitions even though the universe is still perfectly deterministic and the agent is still embedded in it, because the oracle ensures that determinism is just out of the formal system's reach." Then, Nisan revisits some of this result's core ideas by representing the decision agents as formulas of Peano arithmetic.
AIXI and Existential Dispair. Paul Christiano discusses how an approximate implementation of AIXI could lead to an erratically behaving system.
The Absent-Minded Driver. In this post, Wei Dai examines the absent-minded driver problem. He tries to show how professional philosophers failed to reach the solution to time inconsistency, while rejecting Eliezer s people are crazy explanation.
Clarification of AI Reflection Problem. A clear description for the commonly discussed problem in Less Wrong of reflection in AI systems, along with some possible solutions, by Paul Christiano.
Motivating Optimization Processes. Paul Christiano addresses the question of how and when we can expect an AGI to cooperate with humanity, and how it might be easier to implement than a completely Friendly AGI .
Universal agents and utility functions. Anja Heinisch replaces AIXI's reward function with a utility function .
Ingredients of Timeless Decision Theory, Timeless Decision Theory: Problems I Can t Solve, Timeless Decision Theory and Meta-Circular Decision Theory. Eliezer s posts describe the main details of his Timeless Decision Theory, some problems for which he doesn t possess decision theories and reply to Gary Drescher s comment describing Meta-Circular Decision Theory. These insights later culminated in Yudkowsky (2010).
Confusion about Newcomb is confusion about counterfactuals, Why we need to reduce could , would , should , Decision theory: why Pearl helps reduce could and would , but still leaves us with at least three alternatives. Anna Salamon s posts use causal Bayes nets to explore the difficulty of interpreting counterfactual reasoning and the related concepts of should, could, and would."
Bayesian Utility: Representing Preference by Probability Measures. Vladimir Nesov presents a transformation of the standard expected utility formula.
A definition of wireheading. Anja attempts to reach a definition of wireheading that encompasses the intuitions about the concept that have emerged from LW discussions.
Why you must maximize expected utility. Benja presents a slight variant on the Von Neumann-Morgenstern approach to the axiomatic justification of the principle of maximizing expected utility.
A utility-maximizing varient of AIXI. Alex Mennen builds on Anja's specification of a utility-maximizing variant of AIXI.
A fungibility theorem. Nisan s alternative to the von Neumann-Morgenstern theorem, proposing the maximization of the expectation of a linear aggregation of one s values.
Logical uncertainty, kind of. A proposal, at least. Manfred's proposed solution on how to apply the basic laws of belief manipulation to cases where an agent is computationally limited.
Save the princess: A tale of AIXI and utility functions. Anja discusses utility functions, delusion boxes, and cartesian dualism in attempting to improve upon the original AIXI formalism.
Naturalism versus unbounded (or unmaximisable) utility options. Stuart Armstrong poses a series of questions regarding unbounded utility functions.
Beyond Bayesians and Frequentists. Jacob Steinhardt compares two approaches to statistics and discusses when to use them.
VNM agents and lotteries involving an infinite number of possible outcomes. AlexMennen summarizes: The VNM utility theorem only applies to lotteries that involve a finite number of possible outcomes. If an agent maximizes the expected value of a utility function when considering lotteries that involve a potentially infinite number of outcomes as well, then its utility function must be bounded.
A Problem with playing chicken with the universe. Karl explains how a model of UDT with a halting oracle might have some problematic elements.
Intelligence Metrics and Decision Theories, Metatickle Intelligence Metrics and Friendly Utility Functions. Squark reviews a few previously proposed mathematical metrics of general intelligence and proposes his own approach.
Probabilistic L�b theorem, Logic in the Language of Probability. Stuart Armstrong looks at whether reflective theories of logical uncertainty still suffer from L�b's theorem.
Notes on logical priors from the MIRI workshop. Vladimir Slepnev summarizes: "In Counterfactual Mugging with a logical coin, a 'stupid' agent that can't compute the outcome of the coinflip should agree to pay, and a 'smart' agent that considers the coinflip as obvious as 1=1 should refuse to pay. But if a stupid agent is asked to write a smart agent, it will want to write an agent that will agree to pay. Therefore the smart agent who refuses to pay is reflectively inconsistent in some sense. What's the right thing to do in this case?"
Cooperating with agents with different ideas of fairness, while resisting exploitation. Eliezer Yudkowsky investigates some ideas from the MIRI workshop that he hasn’t seen in informal theories of negotiation.
Naturalistic trust among AIs: The parable of the thesis advisor’s theorem. Benja discusses Nik Weaver's suggestion for 'naturalistic trust'.

Ethics

The Metaethics Sequence. Eliezer explains his theory of metaethics. Many readers have difficulty grokking his central points, and may find clarifications in the discussion here.
No-Nonsense Metaethics. Lukeprog begins to outline his theory of metaethics in this unfinished sequence.
The Fun Theory Sequence. Eliezer develops a new subfield of ethics, "fun theory."
Consequentialism Need Not Be Near-Sighted. Orthonormal's summary: "If you object to consequentialist ethical theories because you think they endorse horrible or catastrophic decisions, then you may instead be objecting to short-sighted utility functions or poor decision theories."
A (small) critique of total utilitarianism. Stuart Armstrong analyzes some weaknesses of total utilitarianism.
In the Pareto world, liars prosper. Stuart Armstrong presents a new picture proof of a previously known result, that "if there is any decision process that will find a Pareto outcome for two people, it must be that liars will prosper: there are some circumstances where you would come out ahead if you were to lie about your utility function."
Politics as Charity, Probability and Politics. Carl Shulman analyzes the prospects for doing effective charity work by influencing elections.
Value Uncertainty and the Singleton Scenario. Wei Dai examines the problem of value uncertainty, a special case of moral uncertainty in which consequentialism is assumed.
Pascal's Mugging: Tiny Probabilities of Vast Utilities. Eliezer describes the problem of Pascal's Mugging, later published in Bostrom (2009).
Ontological Crisis in Humans. Wei Dai presents the ontological crisis concept applied to human existence and goes over some examples.
Ideal Advisor Theories and Personal CEV. Luke Muehlhauser and crazy88 place CEV in the context of mainstream moral philosophy, and use a variant of CEV to address a standard line of objections to ideal advisor theories in ethics.
Harsanyi's Social Aggregation Theorem and what it means for CEV. Alex Mennen describes the relevance of Harsanyi's Social Aggregation Theorem to possible formalizations of CEV.
Three Kinds of Moral Uncertainty. Kaj Sotala attempts to explain what it means to be uncertain about moral theories.
A brief history of ethically concerned scientists. Kaj_Sotala gives historical examples of ethically concerned scientists.
Pascal's Mugging for bounded utility functions. Benja's post describing the Pascal's Mugging problem under truly bounded utility functions.
Pascal's Muggle: Infinitesimal Priors and Strong Evidence. Eliezer Yudkowsky discusses the role of infinitesimal priors and their decision-theoretic consequences in a "Pascal's Mugging"-type situation.
An Attempt at Preference Uncertainty Using VNM. nyan_sandwich tackles the problem of making decisions when you are uncertain about what your object-level preferences should be.
Gains from trade: Slug versus Galaxy - how much would I give up to control you? and Even with default points, systems remain exploitable. Stuart_Armstrong provides a suggestion as to how to split the gains from trade in some situations and how such solutions can be exploitable.
On the importance of taking limits: Infinite Spheres of Utility. aspera shows that "if we want to make a decision based on additive utility, the infinite problem is ill posed; it has no unique solution unless we take on additional assumptions."
Change the labels, undo infinitely good. Stuart_Armstrong talks about a small selection of paradoxes connected with infinite ethics.

AI Risk Strategy

AI Risk and Opportunity: A Strategic Analysis. The only original work here so far is the two-part history of AI risk thought, including descriptions of works previously unknown to the LW/SI/FHI community (e.g. Good 1959, 1970, 1982; Cade 1966).
AI timeline predictions: Are we getting better?, AI timeline prediction data. A preview of Stuart Armstrong's and Kaj Sotala's work on AI predictions, later published in Armstrong & Sotala (2012).
Self-Assessment in Expert AI Prediction. Stuart_Armstrong suggests that the predictive accuracy of self-selected experts might differ from that of less selected groups.
Kurzweil's predictions: good accuracy, poor self-calibration. A new analysis of Kurzweil's predictions, by Stuart Armstrong.
Tools versus agents, Reply to Holden on Tool AI. Stuart Armstrong and Eliezer examine Holden's proposal for Tool AI.
What is the best compact formalization of the argument for AI risk from fast takeoff? A suggestion of a few steps on how to compact and clarify the argument for the Singularity Institute's "Big Scary Idea", by LW user utility monster.
The Hanson-Yudkowsky AI-Foom Debate. The 2008 debate between Robin Hanson and Eliezer Yudkowsky, largely used to exemplify the difficulty of resolving disagreements even between expert rationalists. It focuses on the likelihood of hard AI takeoff, the need for a theory of Friendliness, the future of AI, brain emulations and recursive improvement.
What can you do with an Unfriendly AI? Paul Christiano discusses how we could eventually turn an Unfriendly AI into a useful system by filtering the way it interacts and constraining how its answers are given to us.
Cryptographic Boxes for Unfriendly AI. Following the tone of the previous post and AI boxing in general, this post by Paul Christiano explores the possibility of using cryptography as a way to guarantee friendly outputs.
How can I reduce existential risk from AI? A post by Luke Muehlhauser describing the Meta, Strategic and Direct work one could do in order to reduce the risk for humanity stemming from AI.
Intelligence explosion vs Co-operative explosion. Kaj Sotala's description of how an intelligence explosion could emerge from the cooperation of individual artificial intelligent systems forming a superorganism.
Assessing Kurzweil: The Results; Assessing Kurzweil: The Gory Details. Stuart_Armstrong's attempt to evaluate the accuracy of Ray Kurzweil's model of technological intelligence development.
Domesticating reduced impact AIs. Stuart Armstrong attempts to give a solid foundation from which one can build a 'reduced impact AI'.
Why AI may not foom. John_Maxwell_IV explains how the intelligence of a self-improving AI may not grow as fast as we might think.
Singleton: the risks and benefits of one world governments. Stuart_Armstrong attempts to lay out a reasonable plan for tackling the singleton problem.
Do Earths with slower economic growth have a better chance at FAI? Eliezer Yudkowsky argues that GDP growth acceleration may actually decrease our chances of getting FAI.
Reduced impact AI: no back channels. Stuart_Armstrong presents a further development of the reduced impact AI approach.
International cooperation vs. AI arms race. Brian_Tomasik talks about the role of government in a possible AI arms race.

Suppose that our data are coin flips, and consider three hypotheses: H0 = always heads, H1 = fair coin, H2 = heads with probability 25%. Now suppose that the two hypotheses we actually want to test between are H0 and H' = 0.5(H1+H2). After seeing a single heads, the likelihood of H0 is 1 and the likelihood of H' is 0.5(0.5+0.25). After seeing two heads, the likelihood of H0 is 1 and the likelihood of H' is 0.5(0.5^2+0.25^2). In general, the likelihood of H' after n heads is 0.5(0.5^n+0.25^n), i.e. a mixture of multiple geometric functions. In general if H' is a mixture of many hypotheses, the likelihood will be a mixture of many geometric functions, and therefore could be more or less arbitrary.

That's why I specified single possible worlds / hypotheses with no internal parameters that are being learned.
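The coin-flip arithmetic above can be checked with a short sketch. It assumes, as in the example, that H' is an equal-weight mixture of H1 and H2 and that every observed flip is heads; the function and variable names are made up for illustration. If a likelihood were a single geometric function of n, the ratio between successive likelihoods would be constant, so the sketch computes those ratios.

```python
from fractions import Fraction


def mixture_likelihood(n_heads, components):
    """Likelihood of seeing n_heads heads in a row under a mixture of coins.

    components is a list of (weight, p_heads) pairs; each component is a
    coin with a fixed heads probability, so its own likelihood is p ** n.
    """
    return sum(weight * p_heads ** n_heads for weight, p_heads in components)


def successive_ratios(components, max_n):
    """Return L(n+1) / L(n) for n = 0 .. max_n - 1.

    The ratios are all equal exactly when the likelihood is a single
    geometric function of n.
    """
    values = [mixture_likelihood(n, components) for n in range(max_n + 1)]
    return [values[n + 1] / values[n] for n in range(max_n)]


# Hypotheses from the example (exact fractions avoid rounding noise).
H0 = [(Fraction(1), Fraction(1))]                 # always heads
H_PRIME = [(Fraction(1, 2), Fraction(1, 2)),      # half weight on H1 (fair coin)
           (Fraction(1, 2), Fraction(1, 4))]      # half weight on H2 (25% heads)

if __name__ == "__main__":
    for n in range(1, 4):
        print(n, mixture_likelihood(n, H0), mixture_likelihood(n, H_PRIME))
    print("H0 ratios:", successive_ratios(H0, 3))
    print("H' ratios:", successive_ratios(H_PRIME, 3))
```

For H0 every ratio is 1, while for H' the ratios change with n, which is the sense in which the mixture's likelihood is not a single geometric function.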
