[Link] Small-game Fallacies: A Problem for Prediction Markets

10 Antisuji 28 May 2015 03:32AM

Nick Szabo writes about the dangers of taking assumptions that are valid in small, self-contained games and applying them to larger, real-world "games," a practice he calls a small-game fallacy.

Interactions between small games and large games infect most works of game theory, and much of microeconomics, often rendering such analyses useless or worse than useless as a guide for how the "players" will behave in real circumstances. These fallacies tend to be particularly egregious when "economic imperialists" try to apply the techniques of economics to domains beyond the traditional efficient-markets domain of economics, attempting to bring economic theory to bear to describe law, politics, security protocols, or a wide variety of other institutions that behave very differently from efficient markets. However as we shall see, small-game fallacies can sometimes arise even in the analysis of some very market-like institutions, such as "prediction markets."

This last point, which he expands on later in the post, will be of particular interest to some readers of LW. The idea is that while a prediction market does incentivize feeding accurate information into the system, the existence of the market also gives rise to parallel external incentives. As Szabo glibly puts it,

A sufficiently large market predicting an individual's death is also, necessarily, an assassination market...

Futarchy, it seems, will have some kinks to work out.

[Link] YC President Sam Altman: The Software Revolution

4 Antisuji 19 February 2015 05:13AM

Writing about technological revolutions, Y Combinator president Sam Altman warns about the dangers of AI and bioengineering (discussion on Hacker News):

Two of the biggest risks I see emerging from the software revolution—AI and synthetic biology—may put tremendous capability to cause harm in the hands of small groups, or even individuals.

I think the best strategy is to try to legislate sensible safeguards but work very hard to make sure the edge we get from technology on the good side is stronger than the edge that bad actors get. If we can synthesize new diseases, maybe we can synthesize vaccines. If we can make a bad AI, maybe we can make a good AI that stops the bad one.

The current strategy is badly misguided. It’s not going to be like the atomic bomb this time around, and the sooner we stop pretending otherwise, the better off we’ll be. The fact that we don’t have serious efforts underway to combat threats from synthetic biology and AI development is astonishing.

On the one hand, it's good to see more mainstream(ish) attention to AI safety. On the other hand, he focuses on the mundane (though still potentially devastating!) risks of job destruction and concentration of power, and his hopeful "best strategy" seems... inadequate.

[LINK] How Long Does Habit Formation Take?

17 Antisuji 04 January 2014 01:33AM

Related: Common failure modes in habit formation

I ran across this bit of pop-sci (a review of Jeremy Dean's Making Habits, Breaking Habits), which claims that habits typically take around 66 days to form, not the 21 days that self-help articles tend to cite. The somewhat surprising thing to me, on reflection, was how readily I'd taken the 21-day statistic as fact. From the article:

When he became interested in how long it takes for us to form or change a habit, psychologist Jeremy Dean found himself bombarded with the same magic answer from popular psychology websites and advice columns: 21 days. And yet, strangely — or perhaps predictably, for the internet — this one-size-fits-all number was being applied to everything from starting a running regimen to keeping a diary, but wasn’t backed by any concrete data.

The original article is here. Abstract:

To investigate the process of habit formation in everyday life, 96 volunteers chose an eating, drinking or activity behaviour to carry out daily in the same context (for example ‘after breakfast’) for 12 weeks. They completed the self-report habit index (SRHI) each day and recorded whether they carried out the behaviour. The majority (82) of participants provided sufficient data for analysis, and increases in automaticity (calculated with a sub-set of SRHI items) were examined over the study period. Nonlinear regressions fitted an asymptotic curve to each individual's automaticity scores over the 84 days. The model fitted for 62 individuals, of whom 39 showed a good fit. Performing the behaviour more consistently was associated with better model fit. The time it took participants to reach 95% of their asymptote of automaticity ranged from 18 to 254 days; indicating considerable variation in how long it takes people to reach their limit of automaticity and highlighting that it can take a very long time. Missing one opportunity to perform the behaviour did not materially affect the habit formation process. With repetition of a behaviour in a consistent context, automaticity increases following an asymptotic curve which can be modelled at the individual level. [My emphasis.]
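The "95% of asymptote" figure comes from inverting a fitted asymptotic curve. A minimal sketch, assuming the common exponential-approach parameterization A(t) = A∞ + (A₀ − A∞)e^(−kt) (the paper's exact functional form and parameter values may differ; the numbers below are illustrative):

```python
import math

def automaticity(t, a0, a_inf, k):
    """Asymptotic curve: starts at a0 on day 0, approaches a_inf."""
    return a_inf + (a0 - a_inf) * math.exp(-k * t)

def days_to_fraction(frac, k):
    """Days until the gap between a0 and a_inf is frac closed.
    Solves 1 - exp(-k*t) = frac for t; independent of a0 and a_inf."""
    return -math.log(1 - frac) / k

# e.g. a rate constant of ~0.045/day puts 95% of asymptote near 66 days
days_66 = days_to_fraction(0.95, 0.0454)
```

Under this model the 18-to-254-day spread in the study corresponds simply to different rate constants k across individuals.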

My comments:

  • There is an observed “automaticity plateau.” Can individuals influence the height of the plateau through interventions such as rewards? Would this change the exponential rate constant? Or do we have less control over these things than we think?
  • 95% of maximum automaticity doesn't quite seem like the right metric to use to describe habit formation, especially if the maximum is on the low side.
  • Presumably you'd need familiarity with the SRHI survey to answer this, but it's not clear to me what an automaticity score of 40 really means. (Examples or a baseline might help: what's my automaticity for toothbrushing? checking email?)
  • N=96 seems small. It seems slightly problematic that the 14 participants who dropped out were not included in the analysis, and rather problematic that they used a 3-parameter model and only got a ‘good fit’ for half of the participants. (I'm not an expert in this, so I'd appreciate knowing if my intuitions here are right.)
  • It seems that changing habits is harder than I'd previously thought, at least in the absence of CFAR-like techniques. (Though as far as I know we don't yet know whether those techniques actually work; I'm looking forward to their research.)

Interactive Infographic on Simpson's Paradox

25 Antisuji 20 September 2013 05:37PM

Since Simpson's Paradox has been discussed here recently (and not so recently), I thought I'd share this interactive[1] infographic that I found via the FlowingData blog. I already understood Simpson's Paradox pretty well, but playing with the sliders helped me get a more intuitive feel for it.

I expect similar tools would be helpful for explaining Bayes' Theorem and some of the other things we talk about on LW (like Pareto efficiency and Nash equilibria). Do such things exist?
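For readers who want the flavor without following the link, here is a minimal numeric demonstration using the classic kidney-stone dataset (Charig et al. 1986), a standard textbook instance of the paradox:

```python
# Treatment A beats B within each subgroup, yet B looks better pooled.
small_stones = {"A": (81, 87), "B": (234, 270)}   # (successes, trials)
large_stones = {"A": (192, 263), "B": (55, 80)}

def rate(successes, trials):
    return successes / trials

# A wins in each subgroup...
for group in (small_stones, large_stones):
    assert rate(*group["A"]) > rate(*group["B"])

# ...but B wins when the subgroups are pooled, because A was
# disproportionately assigned to the harder (large-stone) cases.
pooled_a = rate(small_stones["A"][0] + large_stones["A"][0],
                small_stones["A"][1] + large_stones["A"][1])
pooled_b = rate(small_stones["B"][0] + large_stones["B"][0],
                small_stones["B"][1] + large_stones["B"][1])
assert pooled_b > pooled_a
```

The sliders in the infographic are, in effect, adjusting how unevenly the subgroups are allocated between the two treatments.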

[1] The interactive part is farther down the page.

Stats Advice on a New N-back Game

4 Antisuji 29 May 2013 09:44PM

Cross-posted to my blog. I expect this will be of some interest to the LessWrong community both because of previous interest in N-back and because of the opportunity to apply Bayesian statistics to a real-world problem. The main reason I'm writing this article is to get feedback on my approach and to ask for help in the areas where I'm stuck. For some background, I'm a software developer who's been working in games for 7+ years and recently left my corporate job to work on this project full-time.

As I mentioned here and here, since early February I've been working on an N-back-like mobile game. I plan to release for iOS this summer and for Android a few months later if all goes well. I have fully implemented the core gameplay and most of the visual styling and UI, and am currently working with a composer on the sound and music.

I am just now starting on the final component of the game: an adaptive mode that assesses the player's skill and presents challenges that are tuned to induce a state of flow.

The Problem

The game is broken down into waves, each of which presents an N-back-like task with certain parameters, such as the number of attributes, the number of variants in each attribute, the tempo, and so on. I would like to find a way to collapse these parameters into a single difficulty parameter that I can compare against a player's skill level to predict their performance on a given wave.

But I realize that some players will be better at some challenges than others (e.g. memory, matching multiple attributes, handling fast tempos, dealing with visual distractions like rotation, or recognizing letters). Skill and difficulty are multidimensional quantities, and this makes performance hard to predict. The question is, is there a single-parameter approximation that delivers an adequate experience? Additionally, the task is not pure N-back — I've made it more game-like — and as a result the relationship between the game parameters and the overall difficulty is not as straightforward as it would be in a cleaner environment (e.g. difficulty might be nearly linear in tempo for some set-ups but highly non-linear for others).

I have the luxury of having access to fairly rich behavioral data. The game is partly a rhythm game, so not only do I know whether a match has been made correctly (or a non-match correctly skipped) but I also know the timing of a player's positive responses. A player with higher skill should have smaller timing errors, so a well-timed match is evidence for higher skill. I am still unsure exactly how I can use this information optimally.
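One way to fold timing in (a sketch, not the author's method): treat each response's timing error as drawn from a zero-mean Gaussian whose spread shrinks with skill, and multiply its likelihood into the skill posterior alongside the hit/miss likelihood. The sigma-versus-skill mapping below is an invented placeholder that would need calibration:

```python
import math

def timing_sigma(skill):
    """Hypothetical mapping: higher skill -> tighter timing (seconds).
    Placeholder form; would be calibrated from playtest data."""
    return 0.25 * math.exp(-skill)

def timing_log_likelihood(error_sec, skill):
    """Log-likelihood of an observed timing error under a zero-mean
    Gaussian with skill-dependent spread."""
    sigma = timing_sigma(skill)
    return (-0.5 * (error_sec / sigma) ** 2
            - math.log(sigma * math.sqrt(2 * math.pi)))
```

The useful property: a small timing error is more likely under high skill (tight distribution), while a large error is more likely under low skill (wide distribution), so timing data shifts the posterior in the intuitive direction even on correct responses.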

I plan to display a plot of player skill over time, but this opens another set of questions. What exactly am I plotting? How do I model player skill over time (just a time-weighted average? as a series of slopes and plateaus? how should I expect skill to change over a period of time without any play?)? How much variation in performance is due to fatigue, attention, caffeine, etc.? Do I show error bars or box plots? What units do I use?

And finally, how do I turn a difficulty and a skill level into a prediction of performance? What is the model of the player playing the game?

Main Questions

  • Is there an adequate difficulty parameter and if so how do I calculate it?
  • Can I use timing data to improve predictions? How?
  • What model do I use for player skill changing over time?
  • How do I communicate performance stats to the user? Box and whiskers? Units?
  • What is the model of the player and how do I turn that into a prediction?

My Approach

I've read Sivia, so I have some theoretical background on how to solve this kind of problem, but only limited real-world experience. These are my thoughts so far.

Modeling gameplay performance as Bernoulli trials seems ok. That is, given a skill level S and a difficulty D, performance on a set of N matches should be closely matched by N Bernoulli trials with probability of success p(S, D) as follows:

  • if S ≪ D, p = 0.5
  • if S ≫ D, p is close to 1.0 (how close?)
  • if S = D, p = 0.9 feels about right
  • etc.
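One family of curves satisfying those constraints is a sigmoid floored at 0.5 and shifted so it passes through 0.9 at S = D. This is a sketch; the exact shape and the steepness parameter are assumptions to be calibrated against playtest data:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_success(skill, difficulty, scale=1.0):
    """Probability of a correct response on one match.
    Floors at 0.5 (chance) when skill << difficulty, approaches 1.0
    when skill >> difficulty, and equals exactly 0.9 at skill == difficulty.
    'scale' controls steepness and is a free parameter."""
    offset = math.log(4)  # sigmoid(log 4) = 0.8, so p(S == D) = 0.9
    return 0.5 + 0.5 * sigmoid(scale * (skill - difficulty) + offset)
```

Whether "close to 1.0" is close enough for S ≫ D then falls out of the chosen scale rather than needing a separate answer.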

Then I can update S (and maybe D? see next paragraph) on actual player performance. This will result in a new probability density function over the "true" value of S, which will hopefully be unimodal and narrow enough to report as a single best estimate (possibly with error bars). Which reminds me, what do I use as a prior for S? And what happens if the player just stops playing halfway through, or hands the game to their 5-year-old?
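A minimal grid-approximation sketch of that update (the prior, grid range, and success curve are all placeholder assumptions; a uniform prior is used here purely for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_success(skill, difficulty):
    # 0.5 floor, 0.9 at skill == difficulty (same placeholder shape)
    return 0.5 + 0.5 * sigmoid(skill - difficulty + math.log(4))

def update_skill(prior, grid, hits, misses, difficulty):
    """Grid-based Bayesian update: multiply the prior over candidate
    skill values by the Bernoulli likelihood of the observed wave,
    then renormalize."""
    post = []
    for p_s, s in zip(prior, grid):
        p = p_success(s, difficulty)
        post.append(p_s * (p ** hits) * ((1 - p) ** misses))
    total = sum(post)
    return [x / total for x in post]

# broad uniform prior over candidate skills from -5 to 5
grid = [i / 10 for i in range(-50, 51)]
prior = [1 / len(grid)] * len(grid)

# 18 hits out of 20 at difficulty 0: a 90% hit rate, so the posterior
# should peak near skill == difficulty
posterior = update_skill(prior, grid, hits=18, misses=2, difficulty=0.0)
best = grid[posterior.index(max(posterior))]  # MAP estimate of skill
```

The posterior's spread directly answers the error-bar question, and the "5-year-old" problem shows up as a run of data the current posterior assigns very low probability, which could trigger a reset or a widening of the distribution.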

Determining difficulty is another hard problem. I currently have a complicated ad-hoc formula cobbled together from logarithms, exponentials, magic numbers, and lots of trial and error. It seems to work pretty well for the limited set of levels I've tested with a small group of playtesters, but I'm worried that it won't predict difficulty well outside of that domain. One possibility is to crowd-source it: after release I'd collect performance data across all users and update the difficulty ratings on the fly. This seems risky and difficult, and the initial difficulty ratings might be way off, which would lead to poor initial user experiences with the adaptive mode. I would also have to maintain a server back-end to gather the data and report updated difficulty levels.
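One battle-tested approach to the crowd-sourcing idea (a swapped-in technique, not anything from the original post) is an Elo-style rating system: treat each wave as a "match" between the player and the level, and nudge both ratings after each outcome. The K-factors below are arbitrary placeholders:

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(player, level, player_won, k_player=24, k_level=8):
    """Nudge the player's skill rating and the level's difficulty
    rating after one wave. The level's K-factor is smaller so that
    crowd-sourced difficulty drifts slowly and stably as data
    accumulates across many players."""
    e = expected_score(player, level)
    score = 1.0 if player_won else 0.0
    return (player + k_player * (score - e),
            level + k_level * (e - score))
```

This sidesteps the hand-tuned formula entirely for the steady state; the ad-hoc formula would then only need to be good enough to seed the initial ratings.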

Request For Feedback

So, any suggestions on how to tackle these problems? Or the first place to start looking?

I'm pretty excited about the potential to collect real-world data on skill acquisition over time. If there is sufficient interest I'll consider making the raw data public, and even instrument the code to collect other data of interest, by request. I do have some concerns over data privacy, so I may allow users to opt out of sending their data up to the server.

[LINK] "Moral Machines" article in the New Yorker links to SI paper

16 Antisuji 28 November 2012 01:38AM

Link

Within two or three decades the difference between automated driving and human driving will be so great you may not be legally allowed to drive your own car, and even if you are allowed, it would be immoral of you to drive, because the risk of you hurting yourself or another person will be far greater than if you allowed a machine to do the work.

That moment will be significant not just because it will signal the end of one more human niche, but because it will signal the beginning of another: the era in which it will no longer be optional for machines to have ethical systems.

The discussion itself is mainly concerned with the behavior of self-driving cars and robot soldiers rather than FAI, but Marcus does obliquely reference the prickliness of the problem. After briefly introducing wireheading (presumably as an example of what can go wrong), he links to http://singularity.org/files/SaME.pdf, saying:

Almost any easy solution that one might imagine leads to some variation or another on the Sorcerer's Apprentice, a genie that's given us what we've asked for, rather than what we truly desire.

He also mentions FHI and Yale Bioethics Center along with SingInst:

A tiny cadre of brave-hearted souls at Oxford, Yale, and the Berkeley California Singularity Institute are working on these problems, but the annual amount of money being spent on developing machine morality is tiny.

It's a mainstream introduction, and perhaps not the best or most convincing one, but I think it's a positive development that machine ethics is getting a serious treatment in the mainstream media.