
A counterfactual and hypothetical note on AI safety design

9 Stuart_Armstrong 11 March 2015 04:20PM

A putative new idea for AI control; index here.

A lot of the new ideas I've been posting could be parodied as going something like this:

The AI A, which is utility indifferent to the existence of AI B, has utility u (later corriged to v', twice), and it will create a subagent C which believe via false thermodynamic miracles that D does not exist, while D' will hypothetically and counterfactually use two different definitions of counterfactual so that the information content of its own utility cannot be traded with a resource gathering agent E that doesn't exist (assumed separate from its unknown utility function)...

What is happening is that I'm attempting to define algorithms that accomplish a particular goal (such as obeying the spirit of a restriction, or creating a satisficer). Typically this algorithm has various underdefined components - such as inserting an intelligent agent at a particular point, controlling the motivation of an agent at a point, effectively defining a physical event, or having an agent believe (or act as if they believed) something that was incorrect.

The aim is to reduce the problem from stuff like "define human happiness" to stuff like "define counterfactuals" or "pinpoint an AI's motivation". These problems should be fundamentally easier - if not for general agents, then for some of the ones we can define ourselves (this may also allow us to prioritise research directions).

And I have no doubt that once a design is available, it will be improved upon, transformed, and made easier to implement and generalise. Therefore I'm currently more interested in objections of the form "it won't work" than "it can't be done".

Are Cognitive Biases Design Flaws?

1 DonaldMcIntyre 25 February 2015 09:02PM

I am a newbie, so today I read the article by Eliezer Yudkowsky, "Your Strength As A Rationalist", which helped me understand the focus of LessWrong, but I respectfully disagreed with a line in the last paragraph:

It is a design flaw in human cognition...

So this was my comment in the article's comment section which I bring here for discussion:

Since I think evolution makes us quite fit to our current environment, I don't think cognitive biases are design flaws. In the example above, you imply that even though you had the information available to guess the truth, your guess was a different one and it was false, and therefore you experienced a flaw in your cognition.

My hypothesis is that reaching the truth, or communicating it on IRC, may not have been the end objective of your cognitive process. In this case, dismissing the issue as something that was not important anyway ("so move on and stop wasting resources on this discussion") was perhaps the "biological" objective, and as such it would be correct, not a flaw.

If the above is true, then all cognitive biases, simplistic heuristics, fallacies, and dark arts are good, since we have conducted our lives according to them for 200,000 years and we are alive and kicking.

Rationality, and our search to be less wrong, which I support, may be tools we are developing to improve our competitive ability within our species, but not a "correction" of something that is wrong in our design.

Edit 1: I realize the environment changes, and that may make some of our cognitive biases, which were useful in the past, obsolete. If the word "flaw" also applies to something that is obsolete, then I was wrong above. If not, I prefer the word "obsolete" to characterize cognitive biases that are no longer functional for our preservation.

[Sequence announcement] Introduction to Mechanism Design

55 badger 30 April 2014 04:21PM

Mechanism design is the theory of how to construct institutions for strategic agents, spanning applications like voting systems, school admissions, regulation of monopolists, and auction design. Think of it as the engineering side of game theory, building algorithms for strategic agents. While it doesn't have much to say about rationality directly, mechanism design provides tools and results for anyone interested in world optimization.

In this sequence, I'll touch on

  • The basic mechanism design framework, including the revelation principle and incentive compatibility.
  • The Gibbard-Satterthwaite impossibility theorem for strategyproof implementation (a close analogue of Arrow's Theorem), and restricted domains like single-peaked or quasilinear preferences, where we do have positive results.
  • The power and limitations of Vickrey-Clarke-Groves mechanisms for efficiently allocating goods, generalizing Vickrey's second-price auction.
  • Characterizations of incentive-compatible mechanisms and the revenue equivalence theorem.
  • Profit-maximizing auctions.
  • The Myerson-Satterthwaite impossibility for bilateral trade.
  • Two-sided matching markets à la Gale and Shapley, school choice, and kidney exchange.
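To give a taste of the flavor of these results, here is a minimal Python sketch of Vickrey's second-price sealed-bid auction, the mechanism generalized by Vickrey-Clarke-Groves above (the bidder names and values are made up for illustration): the highest bidder wins, but pays only the second-highest bid, which is what makes bidding your true value a dominant strategy.

```python
def vickrey_auction(bids):
    """Run a second-price sealed-bid (Vickrey) auction.

    bids: dict mapping bidder name -> bid amount.
    Returns (winner, price): the highest bidder wins but pays
    the second-highest bid, so truthful bidding is a dominant strategy.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # second-highest bid, not the winner's own
    return winner, price

winner, price = vickrey_auction({"alice": 10, "bob": 7, "carol": 4})
# alice wins, but pays bob's bid of 7 rather than her own 10
```

Because the price a bidder pays does not depend on her own bid (only on whether she wins), shading her bid below her true value can only cost her the item, never save her money.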

As the list above suggests, this sequence is going to be semi-technical, but my foremost goal is to convey the intuition behind these results. Since mechanism design builds on game theory, take a look at Yvain's Game Theory Intro if you want to brush up.
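The two-sided matching topic is similarly concrete: Gale and Shapley's deferred-acceptance algorithm fits in a few lines. In this hypothetical Python sketch (the preference lists are invented for illustration), proposers propose in order of preference while receivers tentatively hold their best offer so far, and the result is a stable matching.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Deferred acceptance: returns a stable matching as a dict
    proposer -> receiver. Each prefs value is a list, best first."""
    # rank[r][p] = position of proposer p in receiver r's list (lower = better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    free = list(proposer_prefs)            # proposers with no tentative match
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                           # receiver -> currently held proposer
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]  # p's best not-yet-tried option
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p                 # r holds the first offer it gets
        elif rank[r][p] < rank[r][engaged[r]]:
            free.append(engaged[r])        # r trades up; old partner is freed
            engaged[r] = p
        else:
            free.append(p)                 # r rejects p; p proposes again later
    return {p: r for r, p in engaged.items()}

matching = gale_shapley({"a": ["x", "y"], "b": ["y", "x"]},
                        {"x": ["a", "b"], "y": ["b", "a"]})
# here every participant gets their first choice: a-x and b-y
```

A nice incentive fact the sequence can build on: the proposing side has no profitable way to misreport its preferences, while the receiving side sometimes does.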


I plan on following up on this sequence with another focusing on group rationality and information aggregation, surveying scoring rules and prediction markets among other topics.

Suggestions and comments are very welcome.

Four major problems with neuroscience

11 NancyLebovitz 22 August 2012 05:25AM

A discussion of four errors which lead to false positives: neglecting maturation (brains change with time, even without intervention), learning effects (people who take a test more than once get better at it), regression to the mean (people who score unusually well or badly will probably score closer to average on subsequent attempts), and the placebo effect.

The link above is a summary of a lecture which isn't playing for me, so any further information about the lecture would be greatly appreciated.
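Regression to the mean in particular is easy to demonstrate with a short simulation (a hypothetical Python sketch, not taken from the lecture): if a test score is stable ability plus independent noise, then the people selected as top scorers on one test will, on average, score lower on a retest, with no intervention at all.

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

# Each person's observed score = stable ability + independent test noise.
n = 10_000
ability = [random.gauss(100, 10) for _ in range(n)]
test1 = [a + random.gauss(0, 10) for a in ability]
test2 = [a + random.gauss(0, 10) for a in ability]

# Select the top 5% of scorers on test 1...
top = sorted(range(n), key=lambda i: test1[i], reverse=True)[: n // 20]

# ...and compare their average scores on the two tests.
avg1 = sum(test1[i] for i in top) / len(top)
avg2 = sum(test2[i] for i in top) / len(top)
# avg2 comes out lower than avg1: pure regression to the mean,
# since nothing about the people changed between the tests.
```

This is exactly why an uncontrolled "treatment" given to the worst performers looks like it works: they were going to move toward the mean anyway.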

A Gameplay Exploration of Yudkowsky's "Twelve Virtues"

43 ac3raven 18 May 2011 06:56PM

Hello Less Wrong, this is my first post (kind of).  I belong to a small game development company called Shiny Ogre Games.  We have a strong interest in making games that, as Jonathan Blow puts it, "speak to the human condition."  I am here to announce our next project for you.

In this announcement for Shiny Ogre's next project, there are two points to address.  Firstly:

Thought is a process like any other. The methods by which we think can be identified, specified, defined, categorized and even predicted.  One method of thinking that has been thoroughly defined is rationality.  Many would consider rationality (i.e. the careful exercise of reason), to be an essential path toward enlightenment (hence this).

Secondly: The objective, logical, and mechanical approach to reason that rationality takes meshes nicely with game development, because any well-defined system can be turned into a game.  A game is a system composed of players making decisions while considering objectives, governed by a rule set.

Where there is no decision there can be no game.  Where decisions matter, a game can make them matter more.

Therefore, rationality is a core component of game playing.

Games are learning tools.  They are perhaps the best learning tool available to humans, because they invoke our biological tendency to play.

With that said, our announcement:

We're making a video game about rationality.

The game will explore rationality in the context of Eliezer Yudkowsky's "Twelve Virtues of Rationality" (which we have permission for).  From a narrative perspective the game takes place inside a mind on the brink of epiphany and will heavily feature themes from Plato's "Allegory of the Cave".

Yudkowsky's twelve virtues are the basis of the twelve levels in the game, and will feature each virtue in metaphorical form.  The underlying message here is that if you master all of the twelve virtues (by completing all of the twelve levels), you will achieve 'epiphany'.

The game is a 2D side-scrolling puzzle-platformer.  The player assumes the role of a figure representing his/her own conscious mind while it constructs machines (à la "Incredible Machine") that are a metaphor for the thoughts and concepts one would create while meditating on a complex problem.

We will update our progress and share development information on our website here, as well as with posts on Less Wrong, our twitter account, and the game's website.

You can expect discussions of design decisions for this project to be written frequently from the angle of game design theory.  We may also release a small documentary film of the development process after the release of the game.

A release date has been set (and it's not too long from now), but I don't want to announce it just yet.

Here is some concept art for our Curiosity metaphor (you can view more art at our website linked above):

If you're interested, just upvote and/or comment.  If you have any specific queries related to this project or about game design in general, it would be cool if you went here.

We will be sharing our progress as we make this game over the next few months.  So pay attention to Less Wrong and/or shinyogre.com for updates.

Thanks!

 

Designing serious games - a request for help

10 taryneast 22 March 2011 11:29AM

We need some ideas for serious games. Games that will help us be better. Games that reward us for improving ourselves (even if just by the satisfaction of seeing our scores improve). Games that will help us in our quest of Tsuyoku Naritai.

We've got an upcoming hackday in London - where we'll have a (small) bunch of people able to code up any good ideas into something usable... but we need **you** to help us come up with a whole bunch of good ideas. 

To start with, they should be simple ideas - not as complex as Rationalist Clue (which is an awesome idea... but we all have dayjobs too). I've got in mind something like the kinds of games you see at Lumosity.

The ideas should address individual biases - they should train us to: a) recognise when we've accidentally engaged a bias, and b) reward us when we find a way to get the "right answer" in an unbiased manner.

 

We can do the programming (more help would of course be welcome), we can even come up with some ideas of our own... 

but we are few, and you are many... and the more ideas we get, the better we can choose between them... so let's roll.