
Rationality Compendium: Principle 2 - You are implemented on a human brain

5 ScottL 29 August 2015 04:24PM

Irrationality is ingrained in our humanity. It is fundamental to who we are. This is because being human means that you are implemented on kludgy and limited wetware (a human brain). A consequence of this is that biases and irrational thinking are not mistakes per se; they are not misfirings or accidental activations of neurons. They are the default mode of operation for wetware that has been optimized for purposes other than truth maximization.

 

If you want something to blame for the fact that you are innately irrational, then you can blame evolution. Evolution tends not to produce optimal organisms, but instead produces ones that are kludgy, limited and optimized for criteria relating to ancestral environments rather than for criteria relating to optimal thought.

 

A kludge is a clumsy or inelegant, yet surprisingly effective, solution to a problem. The human brain is an example of a kludge. It contains many distinct substructures dating from widely separated periods of evolutionary development. An example of this is the two kinds of processes in human cognition, where one is fast (type 1) and the other is slow (type 2).

There are many other characteristics of the brain that induce irrationality. The main ones are that:

  • The brain is innately limited in its computational abilities and so it must use heuristics, which are mental shortcuts that ease the cognitive load of making a decision. 
  • The brain has a tendency to blindly use salient or pre-existing responses to questions rather than developing new answers or thoroughly checking pre-existing solutions. 
  • The brain does not inherently value truth. One of the main reasons for this is that many of the biases can actually be adaptive. An example of an adaptive bias is the sexual overperception bias in men. From a truth-maximization perspective, young men who assume that all women want them are showing severe social-cognitive inaccuracies, judgment biases, and probably narcissistic personality disorder. However, from an evolutionary perspective, the same young men are behaving in a more optimal manner: one which has consistently maximized the reproductive success of their male ancestors. Another similar example is the bias for positive perception of partners.
  • The brain acts more like a coherence maximiser than a truth maximiser, which makes people liable to believe falsehoods. If you want to believe something, or you are often in situations in which two things just happen to be related, then your brain is often by default going to treat them as if they were true. 
  • The brain trusts its own version of reality much more than other people's. This makes people defend their beliefs even when doing so is extremely irrational. It also makes it hard for people to change their minds and to accept when they are wrong.
  • Disbelief requires System 2 thought. This means that if System 2 is otherwise engaged or depleted, then we are liable to believe pretty much anything. System 1 is gullible and biased to believe. It is System 2 that is in charge of doubting and disbelieving.

One important non-brain-related factor is that we must make use of and live with our current adaptations. People cannot reconfigure themselves to fulfill purposes suited to their current environment, but must instead make use of pre-existing machinery that has been optimised for other environments. This means that there is probably never going to be a miracle cure for irrationality, because eradicating it would require that you be so fundamentally altered that you were no longer human.

 

One of the first major steps on the path to becoming more rational is the realisation that you are not only irrational by default, but that you are always fundamentally compromised. This doesn't mean that improving your rationality is impossible. It just means that if you stop applying your knowledge of what improves rationality, then you will slip back into irrationality. This is because the brain is a kludge. It works most of the time, but in some cases its innate and natural course of action must be diverted if we are to be rational. The good news is that this kind of diversion is possible, because humans possess second order thinking. This means that they can observe their inherent flaws and systematic errors. By studying the laws of thought and action, they can then apply second order corrections and thereby become more rational.

 

The process of applying these second order corrections, or training yourself to mitigate the effects of your propensities, is called debiasing. Debiasing is not something that you can do once and then forget about. It is something that you must either do constantly or instill into habits so that it occurs without volitional effort. There are generally three main types of debiasing, described below: 

  • Counteracting the effects of bias - this can be done by adjusting your estimates or opinions in order to avoid errors due to biases. This is probably the hardest of the three types of debiasing, because to do it correctly you need to know exactly how much you are already biased, and this is something that people are rarely aware of.
  • Catching yourself when you are being, or could be, biased and applying a cognitive override. The basic idea is that you observe and track your own thoughts and emotions so that you can catch yourself before you move too deeply into irrational modes of thinking. This is hard because it requires superb self-awareness skills, which often take a long time to develop and train. Once you have caught yourself, it is often best to resort to formal thought: algebra, logic, probability theory, decision theory and so on. It is also useful to instill habits in yourself that allow this observation to occur without conscious and volitional effort. It should be noted that incorrectly applying the first two methods of debiasing can actually make you more biased, and that this is a common conundrum faced by beginners to rationality training.
  • Understanding the situations which make you biased so that you can avoid them - the best way to achieve this is simply to ask yourself: how can I become more objective? You do this by taking your biased and faulty perspective out of the equation as much as possible. For example, instead of taking measurements yourself, you could have them taken automatically by a scientific instrument.

Related Materials

Wikis:

  • Bias - refers to the obstacles to truth which are produced by our kludgy and limited wetware (brains) working exactly the way that they should. 
  • Evolutionary psychology - the idea of evolution as the idiot designer of humans - that our brains are not consistently well-designed - is a key element of many of the explanations of human errors that appear on this website.
  • Slowness of evolution - the tremendously slow timescale of evolution, especially for creating new complex machinery (as opposed to selecting on existing variance), is why the behavior of evolved organisms is often better interpreted in terms of what did in fact work in the ancestral past, rather than what would work best in the present environment.
  • Alief - an independent source of emotional reaction which can coexist with a contradictory belief. For example, the fear felt when a monster jumps out of the darkness in a scary movie is based on the alief that the monster is about to attack you, even though you believe that it cannot. 
  • Wanting and liking - The reward system consists of three major components:
    • Liking: The 'hedonic impact' of reward, comprised of (1) neural processes that may or may not be conscious and (2) the conscious experience of pleasure.
    • Wanting: Motivation for reward, comprised of (1) processes of 'incentive salience' that may or may not be conscious and (2) conscious desires.
    • Learning: Associations, representations, and predictions about future rewards, comprised of (1) explicit predictions and (2) implicit knowledge and associative conditioning (e.g. Pavlovian associations). 
  • Heuristics and biases - a program in cognitive psychology that tries to work backward from biases (experimentally reproducible human errors) to heuristics (the underlying mechanisms at work in the brain). 
  • Cached thought - an answer that was arrived at by recalling a previously-computed conclusion, rather than performing the reasoning from scratch.  
  • Sympathetic Magic - humans seem to naturally generate a series of concepts known as sympathetic magic, a host of theories and practices which have certain principles in common, two of which are of overriding importance: the Law of Contagion holds that two things which have interacted, or were once part of a single entity, retain their connection and can exert influence over each other; the Law of Similarity holds that things which are similar or treated the same establish a connection and can affect each other. 
  • Motivated Cognition - an academic/technical term for various mental processes that lead to desired conclusions regardless of the veracity of those conclusions.   
  • Rationalization - Rationalization starts from a conclusion, and then works backward to arrive at arguments apparently favoring that conclusion. Rationalization argues for a side already selected; rationality tries to choose between sides.  
  • Oops - There is a powerful advantage to admitting you have made a large mistake. It's painful. It can also change your whole life.
  • Adaptation executors - Individual organisms are best thought of as adaptation-executers rather than as fitness-maximizers. Our taste buds do not find lettuce delicious and cheeseburgers distasteful once we are fed a diet too high in calories and too low in micronutrients. Taste buds are adapted to an ancestral environment in which calories, not micronutrients, were the limiting factor. Evolution operates on too slow a timescale to re-adapt to new conditions (such as a modern diet).
  • Corrupted hardware - our brains do not always allow us to act the way we should. Corrupted hardware refers to those behaviors and thoughts that act for ancestrally relevant purposes rather than for stated moralities and preferences.
  • Debiasing - The process of overcoming bias. It takes serious study to gain meaningful benefits, half-hearted attempts may accomplish nothing, and partial knowledge of bias may do more harm than good. 
  • Costs of rationality - Becoming more epistemically rational can only guarantee one thing: what you believe will include more of the truth. Knowing that truth might help you achieve your goals, or cause you to become a pariah. Be sure that you really want to know the truth before you commit to finding it; otherwise, you may flinch from it.
  • Valley of bad rationality - It has been observed that when someone is just starting to learn rationality, they appear to be worse off than they were before. Others, with more experience at rationality, claim that after you learn more about rationality, you will be better off than you were before you started. The period before this improvement is known as "the valley of bad rationality".
  • Dunning–Kruger effect - is a cognitive bias wherein unskilled individuals suffer from illusory superiority, mistakenly assessing their ability to be much higher than is accurate. This bias is attributed to a metacognitive inability of the unskilled to recognize their ineptitude. Conversely, highly skilled individuals tend to underestimate their relative competence, erroneously assuming that tasks that are easy for them are also easy for others. 
  • Shut up and multiply - in cases where we can actually do calculations with the relevant quantities, the ability to shut up and multiply, to trust the math even when it feels wrong, is a key rationalist skill.  

Posts

Popular Books:

Papers:

  • Haselton, M. (2003). The sexual overperception bias: Evidence of a systematic bias in men from a survey of naturally occurring events. Journal of Research in Personality, 34-47.
  • Haselton, M., & Buss, D. (2000). Error Management Theory: A New Perspective on Biases in Cross-Sex Mind Reading. Journal of Personality and Social Psychology, 81-91. 
  • Murray, S., Griffin, D., & Holmes, J. (1996). The Self-Fulfilling Nature of Positive Illusions in Romantic Relationships: Love Is Not Blind, but Prescient. Journal of Personality and Social Psychology, 1155-1180. 
  • Gilbert, D. T., Tafarodi, R. W., & Malone, P. S. (1993). You can't not believe everything you read. Journal of Personality and Social Psychology, 65, 221-233.
 

Notes on decisions I have made while creating this post

 (these notes will not be in the final draft): 

  • This post doesn't have any specific details on debiasing or the biases. I plan to provide these details in later posts. The main point of this post is to convey the idea in the title.

Is my brain a utility minimizer? Or, the mechanics of labeling things as "work" vs. "fun"

9 contravariant 28 August 2015 01:12AM

I recently encountered something that is, in my opinion, one of the most absurd failure modes of the human brain. I first noticed this after introspecting on useful things that I enjoy doing, such as programming and writing. I noticed that my enjoyment of the activity doesn't seem to help much when it comes to motivation for earning income. This was not boredom from too much programming, as it did not affect my interest in personal projects. What it seemed to be was the brain categorizing activities into "work" and "fun" boxes. On one memorable occasion, after taking a break due to being exhausted with work, I entertained myself by programming some more, this time on a hobby personal project (as a freelancer, I pick the projects I work on, so this is not a matter of being told what to do). Relaxing by doing the exact same thing that made me exhausted in the first place.

The absurdity of this becomes evident when you think about what distinguishes "work" from "fun" in this case, which is added value. Nothing changes about the activity except the addition of more utility, making a "work" strategy always dominate a "fun" strategy, assuming the activity is the same. If you are having fun doing something, handing you some money can't make you worse off. Yet making the outcome better makes you avoid it. This means that the brain is adopting a strategy that has a (side?) effect of minimizing future utility, and it seems to be utility, not just money, at work here - as anyone who took a class in an area that personally interested them knows, other benefits like grades recreate this effect just as well. This is the reason I think this is among the most absurd biases - I can understand akrasia, wanting the happiness now and hyperbolically discounting what happens later, or biases that make something seem like the best option when it really isn't. But knowingly punishing what brings happiness just because it also benefits you in the future? It's like the discounting curve dips into the negative region. I would really like to learn where the dividing line is between which kinds of added value create this effect and which ones don't (money obviously does, and immediate enjoyment obviously doesn't). Currently I'm led to believe that the difference is present utility vs. future utility (as I mentioned above), or final vs. instrumental goals; please correct me if I'm wrong here.
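To make the "dips into the negative region" intuition concrete, here is a minimal toy sketch (my own parametrization, not anything from the overjustification literature): a standard hyperbolic discounter can only gain motivation from an added future reward, whereas the pattern described above behaves as if the reward term enters with a negative weight and crowds out intrinsic enjoyment.

```python
# Toy sketch (illustrative assumptions only): present motivation for an activity
# as intrinsic enjoyment plus a hyperbolically discounted future reward. Setting
# crowding > 1 models the overjustification pattern, where added future value
# reduces rather than increases present motivation.

def motivation(enjoyment_now, future_reward, delay, k=1.0, crowding=0.0):
    discounted = future_reward / (1.0 + k * delay)   # standard hyperbolic discounting
    return enjoyment_now + (1.0 - crowding) * discounted

# A standard discounter: payment can only help.
print(motivation(enjoyment_now=5, future_reward=10, delay=30))                 # 5.32...
# The pattern described above: the same payment makes the activity less
# attractive, as if the discount curve had dipped below zero.
print(motivation(enjoyment_now=5, future_reward=10, delay=30, crowding=2.0))   # 4.67...
```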

This is an effect that has been studied in psychology under the name of the overjustification effect, so called because the leading theory explains it in terms of the brain assuming the motivation comes from the instrumental gain instead of the direct enjoyment, and then reducing the motivation accordingly. This would suggest that the brain has trouble seeing a goal as being both instrumental and final, and for some reason the instrumental side always wins in a conflict. However, its explanation in terms of self-perception bothers me a little, since I find it hard to believe that a recent creation like self-perception can override something as ancient and low-level as enjoyment of final goals. I searched LessWrong for discussions of the overjustification effect, and the ones I found discussed it in the context of self-perception, not decision-making and motivation. It is the latter that I wanted to ask for your thoughts on.

 

A list of apps that are useful to me. (And other phone details)

9 Elo 22 August 2015 12:24PM

 

I have often noticed myself thinking "Damn, I wish someone had made an app for that", and when I search for it I can't find it.  Then I outsource the search to Facebook or other people, and they can usually say - yes, it's called X.  Which I can put down to an inability to know how to search for an app on my part, more than anything else.

With that in mind, I wanted to solve the problem of finding apps for other people.

The following is a list of apps that I find useful (and use often) for productive reasons:


The environment

This list is long.  The most valuable ones are the top section that I use regularly.  

Other things to mention:

Internal storage - I have a large internal memory card because I knew I would need lots of space.  So I played the "out of sight out of mind game" and tried to give myself as much space as possible by buying a large internal card.

Battery - I use Anker external battery blocks to save myself the trouble of worrying about batteries.  If prepared, I leave my house with 2 days of phone charge (at 100% use).  I used to count "wins" of days when I beat my phone battery (stayed awake longer than it), but they are few and far between.  Also, I have since doubled my external battery power, so it now sits at two days, not one (28000mAh + 2*460mAh spare phone batteries).

Phone - I have a Samsung S4 (Android, running KitKat) because it has a few features I found useful that were not found in many other phones - cheap, removable battery, external storage card, replaceable case.

Screen cover - I am still using the one that came with the phone.

Phone case - I carry a spare phone case.  In the beginning I used to go through one each month; now that I have a harder case than before, it hasn't broken.

MicroUSB cables - I went through a lot of effort to sort this out; it's still not sorted, but it's "okay for now".  The advice I have - buy several good cables (read online reviews), test them wherever possible, and realise that they die.  Also carry a spare or two.

Restart - I restart my phone probably most days when it gets slow.  It's got programming bugs, but this solution works for now.

The overlays

These sit on my screen all the time.

Data monitor - Gives an overview of bits per second uploaded or downloaded, updated every second.

CpuTemp - Gives an overlay of the current core temperature.  My phone is always hot; I run it hard with Bluetooth, GPS and WiFi blaring all the time.  I also have a lot of active apps.

Mindfulness bell - My phone makes a chime every half hour to remind me to check, "Am I doing something of high-value right now?" it sometimes stops me from doing crap things.

Facebook chat heads - I often have them open, they have memory leaks and start slowing down my phone after a while, I close and reopen them when I care enough.

 

The normals:

Facebook - communicate with people.  I do this a lot.

Inkpad - it's a note-taking app, but not an exceptionally great one; open to a better suggestion.

Ingress - it makes me walk; it gave me friends; it put me in a community.  Downside is that it takes up more time than you want to give it.  It's a mobile GPS game.  Join the Resistance.

Maps (google maps) - I use this most days; mostly for traffic assistance to places that I know how to get to.

Camera - I take about 1000 photos a month.  Generic phone-app one.

Assistive light - Generic torch app (widget) I use this daily.

Hello - SMS app.  I don't like it, but it's marginally better than the native one.

Sunrise calendar - I don't like the native calendar; I don't like this or any other calendar.  This is the least bad one I have found.  I have an app called "facebook sync" which helps with entering in a fraction of the events in my life.  

Phone, address book, chrome browser.

GPS logger - I keep a log of my current GPS location every 5 minutes.  If Google tracks me, I might as well track myself.  I don't use this data yet, but it's free for me to track; so if I can find a use for the historic data, that will be a win.

 

Quantified apps:

Fit - google fit; here for multiple redundancy

S Health - Samsung health - here for multiple redundancy

Fitbit - I wear a flex step tracker every day, and input my weight daily manually through this app

Basis - I wear a B1 watch, and track my sleep like a hawk.

Rescuetime - I track my hours on technology and wish it would give a better breakdown. (I also paid for their premium service)

Voice recorder - generic phone app; I record around 1-2 hours of things I do per week.  Would like to increase that.

Narrative - I recently acquired a life-logging device called a narrative, and don't really know how to best use the data it gives.  But its a start.

How are you feeling? - Mood tracking app - this one is broken but the best one I have found, it doesn't seem to open itself after a phone restart; so it won't remind you to enter in a current mood.  I use a widget so that I can enter in the mood quickly.  The best parts of this app are the way it lets you zoom out, and having a 10 point scale.  I used to write a quick sentence about what I was feeling, but that took too much time so I stopped doing it.

Stopwatch - "hybrid stopwatch" - about once a week I time something and my phone didn't have a native one.  This app is good at being a stopwatch.

Callinspector - tracks incoming and outgoing calls and gives summaries of things like who you most frequently call, how much data you use, etc.  Can also set data limits.

 

Misc

Powercalc - the best calculator app I could find

Night mode - for saving battery (it dims your screen).  I don't use this often, but it is good at what it does.  I would consider an app that dims the blue light emitted from my screen; however, I don't notice any negative sleep effects, so I have been putting off getting around to it.

Advanced signal status - about once a month I am in a place with low phone signal - this one makes me feel better about knowing more details of what that means.

Ebay - being able to buy those $5 solutions to problems on the spot is probably worth more than the $5 of "impulse purchases" they might be classified as.

Cal - another calendar app that sometimes catches events that the first one misses.

ES file explorer - for searching the guts of my phone for files that are annoying to find.  Not as used or as useful as I thought it would be but still useful.

Maps.Me - I went on an exploring adventure to places without signal; so I needed an offline mapping system.  This map saved my life.

Wikipedia - information lookup

Youtube - don't use it often, but it's there.

How are you feeling? (again) - I have this in multiple places to make it as easy as possible for me to enter in this data

Play store - Makes it easy to find.

Gallery - I take a lot of photos, but this is the native gallery and I could use a better app.

 

Social

In no particular order;

Facebook groups, Yahoo Mail, Skype, Facebook Messenger chat heads, Whatsapp, meetup, google+, Hangouts, Slack, Viber, OKcupid, Gmail, Tinder.

They do social things.  Not much to add here.

 

Not used:

Trello

Workflowy

pocketbook

snapchat

AnkiDroid - Anki memoriser app for a phone.

MyFitnessPal - looks like a really good app, have not used it 

Fitocracy - looked good

I got these apps for a reason; but don't use them.

 

Not on my front pages:

These I don't use as often; or have not moved to my front pages (skipping the ones I didn't install or don't use)

S memo - samsung note taking thing, I rarely use, but do use once a month or so.

Drive, Docs, Sheets - the Google package.  It's terrible to interact with documents on your phone, but I still sometimes access things from my phone.

bubble - I don't think I have ever used this

Compass pro - gives extra details about direction. I never use it.

(ingress apps) Glypher, Agentstats, integrated timer, cram, notify

TripView (public transport app for my city)

Convertpad - converts numbers to other numbers. Sometimes quicker than a google search.

ABC Iview - National TV broadcasting channel app.  Every program on this channel is uploaded to this app, I have used it once to watch a documentary since I got the app.

AnkiDroid - I don't need to memorise information in the way it is intended to be used; so I don't use it. Cram is also a flashcard app but I don't use it.

First aid - I know my first aid but I have it anyway for the marginal loss of 50mb of space.

Triangle scanner - I can scan details from NFC chips sometimes.

MX player - does videos better than native apps.

Zarchiver - Iunno.  Does something.

Pandora - Never used

Soundcloud - used once every two months, some of my friends post music online.

Barcode scanner - never used

Diskusage - Very useful.  Visualises where data is being taken up on your phone, helps when trying to free up space.

Swiftkey - Better than native keyboards.  Gives more freedom, I wanted a keyboard with black background and pale keys, swiftkey has it.

Google calendar - don't use it, but its there to try to use.

Sleepbot - doesn't seem to work with my phone; also I track with other methods, and I forget to turn it on, so it's entirely not useful in my life for sleep tracking.

My service provider's app.

AdobeAcrobat - use often; not via the icon though.

Wheresmydroid? - seems good to have; never used.  My phone is attached to me too well for me to lose it often.  I have it open most of the waking day maybe.

Uber - I don't use ubers.

Terminal emulator, AIDE, PdDroid party, Processing Android, An editor for processing, processing reference, learn C++ - programming apps for my phone, I don't use them, and I don't program much.

Airbnb - Have not used yet, done a few searches for estimating prices of things.

Heart rate - measures your heart rate using the camera/flash.  Neat, but not useful other than showing off to people how it's possible to do.

Basis - (B1 app), - has less info available than their new app

BPM counter - Neat if you care about what a "BPM" is for music.  Don't use often.

Sketch guru - fun to play with, draws things.

DJ studio 5 - I did a dj thing for a friend once, used my phone.  was good.

Facebook calendar Sync - as the name says.

Dual N-back - I don't use it.  I don't think it has value-giving properties.

Awesome calendar - I don't use it, but it comes with good recommendations.

Battery monitor 3 - Makes a graph of temperature and frequency of the cores.  Useful to see a few times.  Eventually it's a bell curve.

urbanspoon - local food places app.

Gumtree - Australian Ebay (also ebay owns it now)

Printer app to go with my printer

Car Roadside assistance app to go with my insurance

Virgin air entertainment app - you can use your phone while on the plane and download entertainment from their in-flight system.


Two things now;

What am I missing? Was this useful?  Ask me to elaborate on any app and why I used it.  If I get time I will do that anyway. 

P.S. this took two hours to write.

P.P.S. - I was intending to make, keep and maintain a list of useful apps; that is not what this document is.  If there are enough suggestions that it's time to make and keep a list, I will do that.

How to fix academia?

8 passive_fist 20 August 2015 12:50AM

I don't usually submit articles to Discussion, but this news upset me so much that I think there is a real need to talk about it.

http://www.nature.com/news/faked-peer-reviews-prompt-64-retractions-1.18202

A leading scientific publisher has retracted 64 articles in 10 journals, after an internal investigation discovered fabricated peer-review reports linked to the articles’ publication.

The cull comes after similar discoveries of ‘fake peer review’ by several other major publishers, including London-based BioMed Central, an arm of Springer, which began retracting 43 articles in March citing "reviews from fabricated reviewers". The practice can occur when researchers submitting a paper for publication suggest reviewers, but supply contact details for them that actually route requests for review back to the researchers themselves.

Types of Misconduct

We all know that academia is a tough place to be in. There is constant pressure to 'publish or perish', and people are given promotions and pay raises directly as a result of how many publications and grants they are awarded. I was awarded a PhD recently so the subject of scientific honesty is dear to my heart.

I'm of course aware of misconduct in the field of science. 'Softer' forms of misconduct include things like picking only results that are consistent with your hypothesis or repeating experiments until you get low p-values. This kind of thing sometimes might even happen non-deliberately and subconsciously, which is why it is important to disclose methods and data.

'Harder' forms of misconduct include making up data and fudging numbers in order to get published and cited. This is of course a very deliberate kind of fraud, but it is still easy to see how someone could be led to this kind of behaviour by virtue of the incredible pressures that exist. Here, the goal is not just academic advancement, but also obtaining recognition. The authors in this case are confident that even though their data is falsified, their reasoning (based, of course, on falsified data) is sound and correct and stands up to scrutiny.

What is the problem?

But the kind of misconduct mentioned in the linked article is extremely upsetting to me, beyond the previous types of misconduct. It is a person or (more likely) a group of people knowing full well that their publication would not stand up to serious scientific scrutiny. Yet they commit the fraud anyway, guessing that no one will ever seriously scrutinize their work and that it will be taken at face value due to being published in a reputable journal. The most upsetting part is that they are probably right in this assessment.

Christie Aschwanden wrote a piece about this recently on FiveThirtyEight. She makes the argument that cases of scientific misconduct are still rare and not important in the grand scheme of things. I only partially agree with this. I agree that science is still mostly trustworthy, but I don't necessarily agree that scientific misconduct is too rare to be worth worrying about. It would be much more honest to say that we simply do not know the extent of scientific misconduct, because there is no comprehensive system in place to detect it. Surveys on this have indicated that as much as 1/3 of scientists admit to some form of questionable practices, with 2% admitting to outright fabrication or falsification of evidence. These figures could be wildly off the mark. It is, unfortunately, easy to commit fraud without being detected.

Aschwanden's conclusion is that the problem is that science is difficult. With this I agree wholeheartedly. And to this I'd add that science has probably become too big. A few years ago I did some research in the area of nitric oxide (NO) transmission in the brain. I did a search and found 55,000 scientific articles from reputable publications with "nitric oxide" in the title. Today this number is over 62,000. If you expand this to both the title and abstract, you get about 160,000. Keep in mind that these are only the publications that have actually passed the process of peer review.

I have read only about 1,000 articles total during the entirety of my PhD, and probably <100 in the actual level of depth required to locate flaws in reasoning. The problem with science becoming too big is that it's easy to hide things. There are always going to be fewer fact-checkers than authors, and it is much harder to argue logically about things than it is to simply write things. The more the noise, the harder it becomes to listen.

It was not always this way. The rate of publication is increasing rapidly, outstripping even the rate of growth in the number of scientists. Decades ago, publications played only a minor role in the scientific process. Publications mostly had the role of disseminating important information to a large audience. Today, the opposite is true - most articles have a small audience (as in, people with the will and ability to read them), consisting of perhaps only a handful of individuals - often only the people in the same research group or institutional department. This leads to the situation where many publications actually receive most of their citations from people who are friends or colleagues of the authors.

Some people have suggested that because of the recent high-level cases of fraud that have been uncovered, there is now increased scrutiny and fraud is going to be uncovered more rapidly. This may be true for the types of fraud that have already been uncovered, but fraudsters are always going to be able to stay ahead of the scrutinizers. Experience with other forms of crime shows this quite clearly. Before the article in Nature, I had never even thought about the possibility of sending reviews back to myself. It simply never occurred to me. All of these considerations lead me to believe that the problem of scientific fraud may actually get worse, not better, over time, unless the root of the problem is attacked.

How Can it be Solved?

So how to solve the problem of scientific misconduct? I don't have any good answers. I can think of things like "Stop awarding people for mere number of publications" and "Gauge the actual impact of science rather than empty metrics like number of citations or impact factor." But I can't think of any good way to do these things. Some alternatives - like using, for instance, social media to gauge the importance of a scientific discovery - would almost certainly lead to a worse situation than we have now.

A small way to help might be to adopt a payment system for peer-review. That is, to get published, you pay a certain amount of money for researchers to review your work. Currently, most reviewers offer their services for free (however they are sometimes allocated a certain amount of time for peer-review in their academic salary). A pay system would at least give an incentive for people to rigorously review work rather than simply trying to optimize for minimum amount of time invested in review. It would also reduce the practice of parasitic submissions (people submitting to short-turnaround-time, high-profile journals like Nature just to get feedback on their work for free) and decrease the flow volume of papers submitted for review. However, it would also incentivize a higher rate of rejection to maximize profits. And it would disproportionately impact scientists from places with less scientific funding.

What are the real options we have here to minimize misconduct?

Truth seeking as an optimization process

7 ScottL 18 August 2015 11:03AM

From the costs of rationality wiki:

Becoming more epistemically rational can only guarantee one thing: what you believe will include more of the truth. Knowing that truth might help you achieve your goals, or cause you to become a pariah. Be sure that you really want to know the truth before you commit to finding it; otherwise, you may flinch from it.

The reason that truth seeking is often seen as being integral to rationality is that in order to make optimal decisions you must first be able to make accurate predictions. Delusions, or false beliefs, are self-imposed barriers to accurate prediction. They are surprise inducers. It is because of this that the rational path is often to break delusions, but you should remember that doing so is a slow and hard process that is rife with potential problems.

Below I have listed three scenarios in which a person could benefit from considering the costs of truth seeking. The first scenario is when seeking a more accurate measurement is computationally expensive and not really required. The second scenario is when you know that the truth will be emotionally distressing to another person and that this person is not in an optimal state to handle this truth. The third scenario is when you are trying to change the beliefs of others. It is often beneficial if you can understand the costs involved for them to change their beliefs as well as their perspective. This allows you to become better able to actually change their beliefs rather than to just win an argument.

 

Scenario 1: computationally expensive truth

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. – Donald Knuth

If optimization requires significant effort and only results in minimal gains in utility, then it is not worth it. If you only need to be 90% sure that something is true and you are currently 98% sure that it is, then it is not worth spending extra effort to get to 99% certainty. For example, if you are testing ballistics on Earth, then it may be appropriate to use Newton's laws even though they are known to be inexact in some extreme conditions. Now, this does not mean that optimization should never be done. Sometimes that extra 1% certainty is actually extremely important. What it does mean is that you should spend your resources wisely. The beliefs that you do form should lead to an increased ability to anticipate accurately. You should also remember Occam's razor. If you commit yourself to a decision procedure that is accurate but slow and wasteful, then you will probably be outcompeted by others who spend their resources on more suitable and worthy activities.
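As a minimal sketch of the trade-off in this scenario (my own illustration, with made-up numbers rather than anything from the post): compare the expected gain from the extra certainty against the cost of the effort needed to obtain it.

```python
# Illustrative only: is pushing certainty from p_now to p_after worth the effort?

def net_gain_from_more_certainty(p_now, p_after, value_if_right, cost_of_effort):
    """Expected benefit of the extra certainty, minus what it costs to obtain."""
    expected_gain = (p_after - p_now) * value_if_right
    return expected_gain - cost_of_effort

# Going from 98% to 99% sure on a decision worth 100 units, at an effort cost of
# 5 units, is a net loss; the same effort is clearly worth it at higher stakes.
print(net_gain_from_more_certainty(0.98, 0.99, 100, 5))      # -4.0: not worth it
print(net_gain_from_more_certainty(0.98, 0.99, 10_000, 5))   # 95.0: worth it
```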

 

Scenario 2: emotionally distressing truth

Assume for a moment that you have a child and that you have just finished watching that child fail horribly at a school performance. If your child then asks you, while crying, how the performance was, do you tell them the full truth or not? Most people would choose not to and would instead attempt to calm and comfort the child. To do otherwise is not seen as rational, but is instead seen as situationally unaware, rude and impolite. Obviously, some ways of telling the truth are worse than others. But overall, telling the full truth is probably not going to be the most prudent thing to do in this situation. This is because the child is not in an emotional state that will allow them to handle the truth well. The truth in this situation is unlikely to lead to improvement and will instead lead to further stress and trauma, which will often cause future performance anxiety, premature optimization and other issues. For these reasons, I think that the truth should not be expressed in this situation. This does not mean that I think the rational person should forget about what has happened. They should instead remember it so that they can bring it up when the child is in an emotional state that would allow them to be better able to implement any advice that is given. For example, when practicing in a safe environment.

I want to point out that avoiding the truth is not what I am advocating. I am instead saying that we should be strategic about telling potentially face-threatening or emotionally distressing truths. I do believe that repression and avoidance of issues that have a persistent nature most often tend to lead to exacerbation of, or resignation to, those issues. Hiding from the truth rarely improves the situation. Consider the child: if you never mention the performance because you don't want to cause the child pain, then they are still probably going to get picked on at school. Knowing this, we can say that the best thing to do is to bring up the truth and frame it in a particular situation where the child can find it useful and come to be able to better handle it.

 

Scenario 3: psychologically exhausting truth

If we remember that truth seeking involves costs, then we are more likely to be aware of how we can reduce those costs when we are trying to change the beliefs of others. If you are trying to convince someone and they do not agree with you, this may not be because your arguments are weak or because the other person is stupid. It may just be that there is a significant cost involved for them to either understand your argument or update their beliefs. If you want to convince someone and also avoid the illusion of transparency, then it is best to take into account the following:

  • You should try to end arguments well and to avoid vitriol - the emotional contagion heuristic leads people to avoid contact with people or objects viewed as "contaminated" by previous contact with someone or something viewed as bad—or, less often, to seek contact with objects that have been in contact with people or things considered good. If someone gets emotional when you are in an argument, then you are going to be less likely to change their minds about that topic in the future. It is also a good idea to consider the peak-end rule which basically means that you should try to end your arguments well.
  • If you find that someone is already closed off due to emotional contagion, then you should try a surprising strategy so that your arguments aren't stereotyped and avoided. As Eliezer said here:
  • The first rule of persuading a negatively disposed audience - rationally or otherwise - is not to say the things they expect you to say. The expected just gets filtered out, or treated as confirmation of pre-existing beliefs regardless of its content.

  • Processing fluency - the ease with which information is processed. Ask yourself whether your argument is worded in such a way that it is fluent and easy to understand.
  • Cognitive dissonance - a measure of how much your argument conflicts with the other person's pre-existing beliefs. Perhaps you need to convince them of a few other points before your main argument will work.
  • Inferential distance - how much background information they need access to in order to understand your argument.
  • Leave a line of retreat - think about whether they can admit that they were wrong without also looking stupid or foolish. In winning arguments there are generally two ways that you can go about it. The first is to totally demolish the other person's position. The second is to actually change their mind. The first leaves them feeling wrong, stupid and foolish, which is often going to make them start rationalizing. The second just makes them feel wrong. You win arguments the second way by being reasonable and non-face-threatening. A good way to do this is through empathy and understanding the argument from the other person's position. It is important to see things as others would see them, because we don't see the world as it is; we see the world as we are. The other person is not stupid or lying; they might just be in the middle of what I call an 'epistemic contamination cascade' (perhaps there is already a better name for this). It is when false beliefs lead to filters, framing effects and other false beliefs. Another potential benefit from viewing the argument from the other person's perspective is that you may come to realise that your own position is not as steadfast as you once believed.
  • Maximise the cost of holding a false belief - ask yourself if there are any costs to them if they continue to hold a belief that you believe is false. One way to create some cost is to convince their friends and associates of your position. The extra social pressure may help in getting them to change their minds.
  • Give it time and get them inspecting their maps, rather than information that has been filtered through their maps. It is possible that filtering and framing effects mean that your arguments are being distorted by the other person. Consider a depressed person: you can argue with them, but this is not likely to be very helpful. This is because while arguing you will probably need to contradict them, and this will probably lead to them blocking out what you are saying. I think that in these kinds of situations what you really need to do is to get them to inspect their own maps. This can be done by asking "what" or "how does that make you feel" types of questions. For example, “What are you feeling?”, “What’s going on?” and “What can I do to help?”. There are two main benefits to these types of questions over arguments. The first is that it gets them inspecting their maps, and the second is that it is much harder for them to block out the responses, since they are the ones providing them. This is a related quote from Sarah Silverman's book:
  • My stepfather, John O'Hara, was the goodest man there was. He was not a man of many words, but of carefully chosen ones. He was the one parent who didn't try to fix me. One night I sat on his lap in his chair by the woodstove, sobbing. He just held me quietly and then asked only, 'What does it feel like?' It was the first time I was prompted to articulate it. I thought about it, then said, "I feel homesick." That still feels like the most accurate description--I felt homesick, but I was home. - Sarah Silverman

  • Remember the other-optimizing bias, and that perspectival types of issues need to be resolved by the individual facing them. If you have a goal to change another person's mind, then it often pays dividends to not only understand why they are wrong, but also why they think they are right, or are at least unaware that they are wrong. This kind of understanding can only come from empathy. Sometimes it is impossible to truly understand what another person is going through, but you should always try, without condoning or condemning, to see things as they are from the other person's perspective. Remember that hatred blinds, and so does love. You should always be curious and seek to understand things as they are, not as you wish them, fear them or desire them to be. It is only when you can do this that you can truly understand the costs involved for someone else to change their mind.

 

If you take the point of view that changing beliefs is costly, then you are less likely to be surprised when others don't want to change their beliefs. You are also more likely to think about how you can make the process of changing their beliefs easier for them.

 

Some other examples of when seeking the truth is not necessarily valuable are:

  • Fiction writing and the cinematic experience
  • When the pragmatic meaning does not need truth, but the semantic meaning does. An example is "Hi. How are you?" and other similar greetings, which are peculiar because they look the same as questions or adjacency pairs, but function slightly differently. They are a kind of ritualised question in which the answer, or at least its level of detail, is normally pre-specified. If someone asks "How are you?", it is seen as aberrant to answer the question in full detail with the truth rather than simply with "fine", which may be a lie. If they actually do want to know how you are, then they will probably ask a follow-up question after the greeting, like "so, is everything good with the kids?"
  • Evolutionary biases which cause delusions, but may help in perspectival and self-confidence issues. For example, the sexual overperception bias in men. From a truth-maximization perspective, young men who assume that all women want them are showing severe social-cognitive inaccuracies, judgment biases, and probably narcissistic personality disorder. However, from an evolutionary perspective, the same young men are behaving more optimally. That is, the bias is an adaptive one, which has consistently maximized the reproductive success of their male ancestors. Other examples are women's underestimation of men's commitment and positively biased perceptions of partners.

 

tl;dr: this post posits that truth seeking should be viewed as an optimization process, which means that it may not always be worth it.

Astronomy, Astrobiology, & The Fermi Paradox I: Introductions, and Space & Time

40 CellBioGuy 26 July 2015 07:38AM

This is the first in a series of posts I am putting together on a personal blog I just started two days ago as a collection of my musings on astrobiology ("The Great A'Tuin" - sorry, I couldn't help it), and will be reposting here.  Much has been written here about the Fermi paradox and the 'great filter'.   It seems to me that going back to a somewhat more basic level of astronomy and astrobiology is extremely informative to these questions, and so this is what I will be doing.  The bloggery is intended for a slightly more general audience than this site (hence much of the content of the introduction) but I think it will be of interest.  Many of the points I will be making are ones I have touched on in previous comments here, but hope to explore in more detail.

This post is a combined version of my first two posts - an introduction, and a discussion of our apparent position in space and time in the universe.  The blog posts may be found at:

http://thegreatatuin.blogspot.com/2015/07/whats-all-this-about.html

http://thegreatatuin.blogspot.com/2015/07/space-and-time.html

Text reproduced below.

 

 



What's all this about?


This blog is to be a repository for the thoughts and analysis I've accrued over the years on the topic of astrobiology, and the place of life and intelligence in the universe.  All my life I've been pulled to the very large and the very small.  Life has always struck me as the single most interesting thing on Earth, with its incredibly fine structure and vast, amazing history and fantastic abilities.  At the same time, the vast majority of what exists is NOT on Earth.  Going up in size from human scale by the same number of orders of magnitude as you go down to get to a hydrogen atom, you get just about to Venus at its closest approach to Earth - or roughly one millionth of the distance to the nearest star.  The large is much larger than the small is small.  On top of this, we now know that the universe as we know it is much older than life on Earth.  And we know so little of the vast majority of the universe.
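A quick back-of-the-envelope check of that scale comparison (my own rounded numbers, not the author's):

```python
# Rough scale comparison: blow a human up by the same factor that a human is
# larger than a hydrogen atom, and see how far you get.

human_m    = 1.7        # typical human height, m
hydrogen_m = 1.1e-10    # approximate diameter of a hydrogen atom, m
venus_m    = 3.8e10     # Venus at its closest approach to Earth, ~38 million km
proxima_m  = 4.0e16     # distance to Proxima Centauri, ~4.24 light years

factor = human_m / hydrogen_m    # ~1.5e10, a bit over ten orders of magnitude
scaled = human_m * factor        # ~2.6e10 m, ~26 million km

print(f"scaled-up human: {scaled:.1e} m (Venus at closest approach: {venus_m:.1e} m)")
print(f"fraction of the distance to the nearest star: {scaled / proxima_m:.1e}")
# roughly 7e-7, i.e. on the order of one millionth of the way to Proxima Centauri
```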

There's a strong tendency towards specialization in the sciences.  These days, there pretty much has to be for anybody to get anywhere.  Much of the great foundational work of physics was done on tabletops, and the law of gravitation was derived from data on the motions of the planets taken without the benefit of so much as a telescope.  All the low-hanging fruit has been picked.  To continue to further knowledge of the universe, huge instruments and vast energies are put to bear in astronomy and physics.  Biology is arguably a bit different, but the very complexity that makes living systems so successful and so fascinating to study means that there is so much to study that any one person is often only looking at a very small problem.

This has distinct drawbacks.  The universe does not care for our abstract labels of fields and disciplines - it simply is, at all scales simultaneously at all times and in all places.  When people focus narrowly on their subject of interest, it can prevent them from realizing the implications of their findings on problems usually considered a different field.

It is one of my hopes to try to bridge some gaps between biology and astronomy here.  I very nearly double-majored in biology and astronomy in college; the only thing that prevented this (leading to an astronomy minor) was a bad attitude towards calculus.  As is, I am a graduate student studying basic cell biology at a major research university, who nonetheless keeps in touch with a number of astronomer friends and keeps up with the field as much as possible.  I quite often find that what I hear and read about has strong implications for questions of life elsewhere in the universe, but see so few of these implications actually get publicly discussed. All kinds of information shedding light on our position in space and time, the origins of life, the habitability of large chunks of the universe, the course that biospheres take, and the possible trajectories of intelligences seem to me to be out there unremarked.

It is another of my hopes to try, as much as is humanly possible, to take a step back from the usual narratives about extraterrestrial life and instead focus on something closer to first principles: what we actually have observed and have not, what we can observe and what we cannot, and what this leaves open, likely, or unlikely.  In my study of the history of the ideas of extraterrestrial life and extraterrestrial intelligence, all too often these take a back seat to popular narratives of the day.  In the 16th century, the notion that the Earth moved in a similar way to the planets gained currency and led to the suppositions that they might be made of similar stuff and that the planets might even be inhabited.  The hot question was, of course, whether their inhabitants would be Christians, and what their relationship with God would be, given the anthropocentric biblical creation stories.  In the late 19th and early 20th century, Lowell's illusory canals on Mars were advanced as evidence for a Martian socialist utopia.  In the 1970s, Carl Sagan waxed philosophical on the notion that contacting old civilizations might teach us how to save ourselves from nuclear warfare.  Today, many people focus on the Fermi paradox - the apparent contradiction that since much of the universe is quite old, extraterrestrials experiencing continuing technological progress and growth should have colonized and remade it in their image long ago, and yet we see no evidence of this.  I move that all of these notions have a similar root - inflating the hot concerns and topics of the day to cosmic significance and letting them obscure the actual, scientific questions that can be asked and answered.

Life and intelligence in the universe is a topic worth careful consideration, from as many angles as possible.  Let's get started.

 


Space and Time


Those of an anthropic bent have often made much of the fact that we are only 13.7 billion years into what is apparently an open-ended universe that will expand at an accelerating rate forever.  The era of the stars will last a trillion years; why do we find ourselves at this early date if we assume we are a ‘typical’ example of an intelligent observer?  In particular, this has lent support to lines of argument that perhaps the answer to the ‘great silence’ and lack of astronomical evidence for intelligence or its products in the universe is that we are simply the first.  This notion requires, however, that we are actually early in the universe when it comes to the origin of biospheres and by extension intelligent systems.  It has become clear recently that this is not the case. 

The clearest research I can find illustrating this is the work of Sobral et al., presented in a paper on arXiv (http://arxiv.org/abs/1202.3436) and in a summary article (http://www.sciencedaily.com/releases/2012/11/121106114141.htm).  To simplify what was done, these scientists performed a survey of a large fraction of the sky looking for the emission lines put out by emission nebulae, clouds of gas which glow like neon lights, excited by the ultraviolet light of huge, short-lived stars.  The amount of line emission from a galaxy is thus a rough proxy for the rate of star formation – the greater the rate of star formation, the larger the number of large stars exciting interstellar gas into emission nebulae.  The authors use the redshift of the known hydrogen emission lines to determine the distance to each instance of emission, and performed corrections to deal with the known expansion rate of the universe.  The results were striking.  Per unit mass of the universe, the current rate of star formation is less than 1/30 of the peak rate they measured 11 gigayears ago.  It has been constantly declining over the history of the universe at a precipitous rate.  Indeed, their preferred model, to which they fit the trend, converges towards a finite quantity of stars formed as you integrate total star formation into the future to infinity, with the total number of stars that will ever be born only being 5% larger than the number of stars that have been born at this time. 

In summary, 95% of all stars that will ever exist already exist.  The smallest, longest-lived stars will shine for a trillion years, but for most of that history almost no new stars will be forming.

At first this seems to reverse the initial conclusion that we came early, suggesting we are instead latecomers.  This is not true, however, when you consider where and when stars of different types can form, and the fact that different galaxies have very different histories.  Most galaxies formed via gravitational collapse from cool gas clouds and smaller precursor galaxies quite a long time ago, with a wide variety of properties.  Dwarf galaxies have low masses, and their early bursts of star formation produce energetic stars with strong stellar winds and lots of ultraviolet light, which eventually go supernova.  Their energetic lives and even more energetic deaths appear to usually blast star-forming gas out of their galaxies’ weak gravity or render it too hot to re-collapse into new star-forming regions, quashing their star formation early.  Giant elliptical galaxies, containing many trillions of stars apiece and dominating the cores of galactic clusters, have ample gravity but form with almost no angular momentum.  As such, most of their cool gas falls straight into their centers, producing an enormous burst of low-heavy-element star formation that uses up most of the gas.  The remaining gas is again either blasted into intergalactic space or rendered too hot to recollapse and accrete, by a combination of the action of energetic young stars and the incredibly energetic outbursts produced as gas falls onto the central black hole.  (It should be noted that a full 90% of the non-dark-matter mass of the universe appears to be in the form of very thin X-ray-hot plasma clouds surrounding large galaxy clusters, unlikely to condense to the point of star formation via understood processes.)  Thus, most dwarf galaxies and giant elliptical galaxies contributed to the early star formation of the universe but are producing few or no stars today, have very few heavy-element-rich stars, and are unlikely to make many more going into the future.

Spiral galaxies are different.  Their distinguishing feature is the way they accreted – namely, with a large amount of angular momentum.  This allows large amounts of their cool gas to remain spread out away from their centers.  This moderates the rate of star formation, preventing the huge pulses of star formation and black hole activation that exhaust star-forming gas and prevent gas inflow in giant ellipticals.  At the same time, their greater mass compared to dwarf galaxies ensures that the modest rate of star formation they do undergo does not blast nearly as much matter out of their gravitational pull.  Some gas does leave over time, and the rate of inflow of fresh cool gas does apparently decrease over time – there are spiral galaxies that do seem to have shut down star formation.  But on the whole a spiral is a place that maintains a modest rate of star formation for gigayears, while its gas becomes more and more enriched in heavy elements over time.  These galaxies thus dominate the star production of the later eras of the universe, and dominate the population of stars born with the large amounts of heavy elements needed to produce planets like ours.  They do settle down slowly over time, and eventually all spirals will either run out of gas or merge with each other to form giant ellipticals, but for a long time they remain a class apart.

Considering this, we’re just about where we would expect a planet like ours (and thus a biosphere-as-we-know-it) to exist in space and on a coarse scale in time.  Let’s look closer at our galaxy now.  Our galaxy is generally agreed to be about 12 billion years old based on the ages of globular clusters, with a few interloper stars here and there that are older and would’ve come from an era before the galaxy was one coherent object.  It will continue forming stars for about another 5 gigayears, at which point it will undergo a merger with the Andromeda galaxy, the nearest large spiral galaxy.  This merger will most likely put an end to star formation in the combined resultant galaxy, which will probably wind up as a large elliptical after one final exuberant starburst.  Our solar system formed about 4.5 gigayears ago, putting its formation pretty much halfway along the productive lifetime of the galaxy (and probably something like 2/3 of the way along its complement of stars produced, since spirals DO settle down with age, though more of its later stars will be metal-rich).

On a stellar and planetary scale, we once again find ourselves where and when we would expect your average complex biosphere to be.  Large stars die fast – star brightness goes up with roughly the 3.5th power of star mass, and thus star lifetime goes down with roughly the 2.5th power of mass.  A 2 solar mass star would be about 11 times as bright as the sun and only live about 2 billion years – a lifespan that, measured against the evolution of life on Earth, ends before photosynthesis had managed to oxygenate the air, at a point when the majority of life on Earth (but not all – see an upcoming post) could be described as “algae”.  Furthermore, although smaller stars are much more common than larger stars (the Sun is actually larger than over 80% of stars in the universe), stars smaller than about 0.5 solar masses (and thus about 0.08 solar luminosities) are usually ‘flare stars’ – possessing very strong, convoluted magnetic fields and periodically putting out flares and X-ray bursts that would frequently strip away the ozone and possibly even the atmosphere of an earthlike planet. 
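As a quick sanity check on those numbers, here is a minimal sketch (my own, assuming the scaling relations above and a solar main-sequence lifetime of roughly 10 Gyr):

```python
# Rough mass-luminosity-lifetime scaling: L ~ M**3.5, lifetime ~ M / L ~ M**-2.5.
SOLAR_LIFETIME_GYR = 10.0   # assumed solar main-sequence lifetime

def luminosity(mass_solar):
    """Brightness in solar luminosities for a star of the given mass (in solar masses)."""
    return mass_solar ** 3.5

def lifetime_gyr(mass_solar):
    """Main-sequence lifetime in gigayears under the same scaling."""
    return SOLAR_LIFETIME_GYR * mass_solar ** -2.5

print(luminosity(2.0))      # ~11 times solar brightness
print(lifetime_gyr(2.0))    # ~1.8 Gyr, in line with "about 2 billion years"
print(luminosity(0.5))      # ~0.09 solar luminosities, close to the figure quoted above
```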

All stars also slowly brighten as they age – the sun is currently about 30% brighter than it was when it formed, and it will wind up about twice as bright as its initial value just before it becomes a red giant.  Depending on whose models of climate sensitivity you use, the Earth’s biosphere probably has somewhere between 250 million years and 2 billion years before the oceans boil and we become a second Venus.  Thus, we find ourselves somewhere in the last third to the last twentieth of the history of Earth’s biosphere (consistent with complex life taking time to evolve).

Together, all this puts our solar system – and by extension our biosphere – pretty much right where we would expect to find it in space, and right in the middle of where one would expect to find it in time.  Once again, as observers we are not special.  We do not find ourselves unexpectedly early in the universe, ruling out one explanation for the Fermi paradox sometimes put forward – that we do not see evidence for intelligence in the universe because we simply find ourselves as the first intelligent system to evolve.  This would be tenable if there were reason to think that we were right at the beginning of the time in which star systems in stable galaxies with lots of heavy elements could have birthed complex biospheres.  Instead we are utterly average, implying that the lack of obvious intelligence in the universe must be explained either by the genesis of intelligent systems being exceedingly rare, or by intelligent systems simply not spreading through the universe or becoming astronomically visible, for one reason or another. 

In my next post, I will look at the history of life on Earth, the distinction between simple and complex biospheres, and the evidence for or against other biospheres elsewhere in our own solar system.

Self-improvement without self-modification

3 Stuart_Armstrong 23 July 2015 09:59AM

This is just a short note to point out that AIs can self-improve without having to self-modify. So locking down an agent from self-modification is not an effective safety measure.

How could AIs do that? The easiest and most trivial way is to create a subagent and transfer the AI's resources and abilities to it ("create a subagent" is a generic way to get around most restriction ideas).

Or, if the AI remains unchanged and in charge, it could change the process around itself, so that the process as a whole changes and improves. For instance, if the AI is inconsistent and has to pay more attention to problems that are brought to its attention than to problems that aren't, it can start to act to manage the news (or the news-bearers) so that it hears more of what it wants. If it can't experiment on humans, it will give advice that causes more "natural experiments", and so on. It will gradually try to reshape its environment to get around its programmed limitations.

Anyway, that was nothing new or deep, just a reminder of a point I hadn't seen written out.

 

AI: requirements for pernicious policies

7 Stuart_Armstrong 17 July 2015 02:18PM

Some have argued that "tool AIs" are safe(r). Recently, Eric Drexler decomposed AIs into "problem solvers" (eg calculators), "advisors" (eg GPS route planners), and "actors" (autonomous agents). Both solvers and advisors can be seen as examples of tools.

People have argued that tool AIs are not safe. It's hard to imagine a calculator going berserk, no matter what its algorithm is, but it's not too hard to come up with clear examples of dangerous tools. This suggests that solvers vs advisors vs actors (or tools vs agents, or oracles vs agents) is not the right distinction.

Instead, I've been asking: how likely is the algorithm to implement a pernicious policy? If we model the AI as having an objective function (or utility function) and an algorithm that implements it, a pernicious policy is one that scores high on the objective function but is not at all what was intended. A pernicious policy could be harmless and entertaining, or much more severe.

I will lay aside, for the moment, the issue of badly programmed algorithms (possibly containing their own objective sub-functions). In any case, to assess whether a pernicious policy would be implemented, we have to ask these questions about the algorithm:

  1. Do pernicious policies exist? Are there many?
  2. Can the AI find them?
  3. Can the AI test them?
  4. Would the AI choose to implement them?

The answer to 1. seems to be trivially yes. Even a calculator could, in theory, output a series of messages that socially hack us, blah, take over the world, blah, extinction, blah, calculator finishes its calculations. What is much more interesting is that some types of agents have many more pernicious policies than others. This seems to be the big difference between actors and other designs. An actor AI in complete control of the USA's or Russia's nuclear arsenal has all sorts of pernicious policies easily to hand; an advisor or oracle has far fewer (generally going through social engineering), a tool typically fewer still. A lot of the physical protection measures are about reducing the number of successful pernicious policies the AI has access to.

The answer to 2. is mainly a function of the power of the algorithm. A basic calculator will never find anything dangerous: its programming is simple and tight. But compare an agent with the same objective function and the ability to do an unrestricted policy search with vast resources... So it seems that the answer to 2. does not depend on any solver vs actor division, but purely on the algorithm used.

And now we come to the big question, 3.: whether the AI can test these policies. Even if the AI can find pernicious policies that rank high on its objective function, it will never implement them unless it can ascertain this fact. And there are several ways it could do so. Let's assume that a solver AI has a very complicated objective function - one that encodes many relevant facts about the real world. Now, the AI may not "care" about the real world, but it has a virtual version of it, in which it can virtually test all of its policies. With a detailed enough model and enough computing power, it can establish whether the pernicious policy would be effective at achieving its virtual goal. If this is a good approximation of how the pernicious policy would behave in the real world, we could have a problem.

Extremely detailed objective functions are unlikely, but even simple ones can show odd behaviour if the agent gets to interact repeatedly with the real world - this is the issue with reinforcement learning. Suppose that the agent attempts a translation job, and is rewarded on the accuracy of its translation. Depending on the details of what the AI knows and who chooses the rewards, the AI could end up manipulating its controllers, similarly to this example. The problem is that once there is any interaction, all the complexity of humanity could potentially show up in the reward function, even if the objective function is simple.

Of course, some designs make this very unlikely - resetting the AI periodically can help to alleviate the problem, as can choosing more objective criteria for any rewards. Lastly on this point, we should mention the possibility that human R&D, by selecting and refining the objective function and the algorithm, could take on the role of testing the policies. This is likely to emerge only in cases where many AI designs are considered, and the best candidates are retained based on human judgement.

Finally we come to the question of whether the AI will implement the policy once it has found and tested it. You could say that the point of FAI is to create an AI that doesn't choose to implement pernicious policies - but, more correctly, the point of FAI is to ensure that very few (or zero) pernicious policies exist in the first place, as they all score low on the utility function. However, there are a variety of more complicated designs - satisficers, agents using crude measures - where the questions "Do pernicious policies exist?" and "Would the AI choose to implement them?" could become quite distinct.

 

Conclusion: a more thorough analysis of AI designs is needed

A calculator is safe because it is a solver, it has a very simple objective function with no holes in the algorithm, and it can neither find nor test any pernicious policies. It is the combination of these elements that makes it almost certainly safe. If we want to make the same claim about other designs, neither "it's just a solver" nor "its objective function is simple" would be enough; we need a careful analysis.

Though, as usual, "it's not certainly safe" is a quite distinct claim from "it's (likely) dangerous", and they should not be conflated.

An overall schema for the friendly AI problems: self-referential convergence criteria

16 Stuart_Armstrong 13 July 2015 03:34PM

A putative new idea for AI control; index here.

After working for some time on the Friendly AI problem, it's occurred to me that a lot of the issues seem related. Specifically, all the following seem to have commonalities:

Speaking very broadly, there are two features all of them share:

  1. The convergence criteria are self-referential.
  2. Errors in the setup are likely to cause false convergence.

What do I mean by that? Well, imagine you're trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc... But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup. In other words, the stopping point (and the convergence to the stopping point) is entirely self-referentially defined: the morality judges itself. It does not include any other moral considerations. You input your initial moral intuitions and values, and you hope this will cause the end result to be "nice", but the definition of the end result does not include your initial moral intuitions (note that some moral realists could see this process dependence as a positive - except for the fact that these processes have many convergent states, not just one or a small grouping).

So when the process goes nasty, you're pretty sure to have achieved something self-referentially stable, but not nice. Similarly, a nasty CEV will be coherent and have no desire to further extrapolate... but that's all we know about it.

The second feature is that any process has errors - computing errors, conceptual errors, errors due to the weakness of human brains, etc... If you visualise these as noise, you can see that noise in a convergent process is more likely to cause premature convergence, because if the process ever reaches a stable self-referential state, it will stay there (and if the process is a long one, then early noise will cause great divergence at the end). For instance, imagine you have to reconcile your belief in preserving human cultures with your belief in human individual freedom. A complex balancing act. But if, at any point along the way, you simply jettison one of the two values completely, things become much easier - and once jettisoned, the missing value is unlikely to ever come back.

Or, more simply, the system could get hacked. When exploring a potential future world, you could become so enamoured of it that you overwrite any objections you had. It seems very easy for humans to fall into these traps - and again, once you lose something of value in your system, you don't tend to get it back.

 

Solutions

And again, very broadly speaking, there are several classes of solutions to deal with these problems:

  1. Reduce or prevent errors in the extrapolation (eg solving the agent tiling problem).
  2. Solve all or most of the problem ahead of time (eg traditional FAI approach by specifying the correct values).
  3. Make sure you don't get too far from the starting point (eg reduced impact AI, tool AI, models as definitions).
  4. Figure out the properties of a nasty convergence, and try to avoid them (eg some of the ideas I mentioned in "crude measures", general precautions taken when defining the convergence process).

 

The fairness of the Sleeping Beauty

1 MrMind 07 July 2015 08:25AM

This post will attempt (yet another) analysis of the problem of the Sleeping Beauty, in terms of Jaynes' framework of "probability as extended logic" (aka objective Bayesianism).

TL;DR: The problem of the Sleeping Beauty reduces to interpreting the sentence “a fair coin is tossed”: it can mean either that neither result of the toss is favoured, or that the coin toss is not influenced by anthropic information, but not both at the same time. Fairness is a property in the mind of the observer that must be further specified: the two meanings cannot be conflated.

What I hope to show is that the two standard solutions, 1/3 and 1/2 (the 'thirder' and the 'halfer' solutions), are both consistent and correct, and that the confusion lies only in the imprecise specification of the sentence "a fair coin is tossed".

The setup is given both in the LessWrong wiki and in Wikipedia, so I will not repeat it here. 

I'm going to symbolize the events in the following way: 

- It's Monday = Mon
- It's Tuesday = Tue
- The coin landed head = H
- The coin landed tail = T
- statement "A and B" = A & B
- statement "not A" = ~A

The problem setup leads to uncontroversial attributions of logical structure:

1)    H = ~T (the coin can land only on head or tail)

2)    Mon = ~Tue (if it's Tuesday, it cannot be Monday, and vice versa) 

And of probability:

3)    P(Mon|H) = 1 (upon learning that the coin landed head, the sleeping beauty knows that it’s Monday)

4)    P(T|Tue) = 1 (upon learning that it’s Tuesday, the sleeping beauty knows that the coin landed tail)

Using the indifference principle, we can also derive another equation.

Let's say that the Sleeping Beauty is awakened and told that the coin landed tail, but nothing else. Since she has no information useful to distinguish between Monday and Tuesday, she should assign both events equal probability. That is:

5)    P(Mon|T) = P(Tue|T)

Which gives

6)    P(Mon & T) = P(Mon|T)P(T) = P(Tue|T)P(T) = P(Tue & T)

It's here that the analysis between "thirder" and "halfer" starts to diverge.

The Wikipedia article says: "Guided by the objective chance of heads landing being equal to the chance of tails landing, it should therefore hold that...". We know, however, that there's no such thing as 'the objective chance'.

Thus, "a fair coin will be tossed", in this context, will mean different things for different people.

The thirders interpret the sentence to mean that Beauty learns no new facts about the coin upon learning that it is Monday.

They thus make the assumption:

(TA) P(T|Mon) = P(H|Mon)

So:

7)    P(Mon & H) = P(H|Mon)P(Mon) = P(T|Mon)P(Mon) = P(Mon & T)

From 6) and 7) we have:

8)    P(Mon & H) = P(Mon & T) = P(Tue & T)

And since those three events partition the space of possibilities (their probabilities sum to 1), P(Mon & H) = 1/3.

And indeed from 8) and 3):

9)    1/3 =  P(Mon & H) = P(Mon|H)P(H) = P(H)

So that, under TA, P(H) = 1/3 and P(T) = 2/3.

Notice also that, since P(H|Mon) + P(T|Mon) = 1 and TA makes these two equal, P(H|Mon) = 1/2.

The thirder analysis of the Sleeping Beauty problem is thus one in which "a fair coin is tossed" means "Sleeping Beauty receives no information about the coin from anthropic information".
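One way to make the thirder reading concrete is to identify Beauty's credence with the long-run frequency per awakening. Here is a minimal simulation sketch (my own, not from the post), under that identification:

```python
# Simulate the standard setup: on heads Beauty is awakened once (Monday),
# on tails twice (Monday and Tuesday). Count frequencies per awakening.
import random

heads_awakenings = 0
monday_awakenings = 0
heads_and_monday = 0
total_awakenings = 0

for _ in range(100_000):
    heads = random.random() < 0.5                  # a fair coin toss
    days = ["Mon"] if heads else ["Mon", "Tue"]    # one awakening on heads, two on tails
    for day in days:
        total_awakenings += 1
        heads_awakenings += heads
        monday_awakenings += (day == "Mon")
        heads_and_monday += heads and (day == "Mon")

print(heads_awakenings / total_awakenings)   # ~1/3: per-awakening frequency of heads
print(heads_and_monday / monday_awakenings)  # ~1/2: frequency of heads among Monday awakenings
```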

There is, however, another way to interpret the sentence, namely the halfer analysis:

(HA) P(T) = P(H)

Here, "a fair coin is tossed" means simply that we assign no preference to either side of the coin.

Obviously, from 1):

10)  P(T) + P(H) = 1

So that, from 10) and HA)

11) P(H) = 1/2, P(T) = 1/2

But let’s not stop here, let’s calculate P(H|Mon).

First of all, from 3) and 11)

12) P(H & Mon) = P(H|Mon)P(Mon) = P(Mon|H)P(H) = 1/2

From 5) and 11) also

13) P(Mon & T) = 1/4

But from 12) and 13) we get

14) P(Mon) = P(Mon & T) + P(Mon & H) = 1/4 + 1/2 = 3/4

So that, from 12) and 14)

15) P(H|Mon) = P(H & Mon) / P(Mon) = (1/2) / (3/4) = 2/3
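The halfer arithmetic in 11)-15) can be checked mechanically with exact fractions; here is a minimal sketch (my own), assuming HA together with the indifference assumption 5):

```python
from fractions import Fraction

P_H = Fraction(1, 2)                  # 11) the halfer assumption HA: P(H) = 1/2
P_T = 1 - P_H                         #     so P(T) = 1/2
P_Mon_and_H = 1 * P_H                 # 12) P(Mon & H) = P(Mon|H) P(H), with P(Mon|H) = 1 from 3)
P_Mon_and_T = Fraction(1, 2) * P_T    # 13) P(Mon & T) = P(Mon|T) P(T), with P(Mon|T) = 1/2 from 5)
P_Mon = P_Mon_and_H + P_Mon_and_T     # 14) = 3/4
P_H_given_Mon = P_Mon_and_H / P_Mon   # 15) = 2/3

print(P_Mon, P_H_given_Mon)           # prints 3/4 2/3
```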

We have seen that either P(H) = 1/2 and P(H|Mon) = 2/3 (the halfer reading), or P(H) = 1/3 and P(H|Mon) = 1/2 (the thirder reading).

Nick Bostrom is correct in saying that self-locating information changes the probability distribution, but this is true in both interpretations.

The problem of the Sleeping Beauty reduces to interpreting the sentence “a fair coin is tossed”: it can mean either that neither result of the toss is favoured, or that the coin toss is not influenced by anthropic information; that is, you can attribute the fairness of the coin either to the prior or to the posterior distribution.

Either P(H)=P(T) or P(H|Mon)=P(T|Mon), but both at the same time is not possible.

If probability were a physical property of the coin, then so would be its fairness. But since the causal interactions of the coin possess both kinds of indifference (balance, and independence from the future), that would make the two probabilities equivalent. 

That this is not the case just means that fairness is a property in the mind of the observer that must be further specified, since the two meanings cannot be conflated.
