Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

[Link] Robotics and AI enabling autonomous defense. Technology Foresight research program

0 morganism 13 January 2017 07:15PM

[Link] pplapi is a virtual database of the entire human population.

1 morganism 12 January 2017 02:33AM

[Link] Case Studies Highlighting CFAR’s Impact on Existential Risk

4 Unnamed 10 January 2017 06:51PM

[Link] Project: Artificial Intelligence, Autonomous Weapons, and Meaningful Human Control

1 morganism 09 January 2017 11:25PM

[Link] Mysterious Go Master Blitzes Competition, Rattles Game Community

5 scarcegreengrass 04 January 2017 05:18PM

[Link] Why I Am Changing My Mind About AI Risk

4 itaibn0 03 January 2017 10:57PM

Progress and Prizes in AI Alignment

5 Jacobian 03 January 2017 10:15PM

Edit: In case it's not obvious, I have done limited research on AI alignment organizations and the goal of my post is to ask questions from the point of view of someone who wants to contribute and is unsure how. Read down to the comments for some great info on the topic.

I was introduced to the topic of AI alignment when I joined this very forum in 2014. Two years and one "Superintelligence" later, I decided that I should donate some money to the effort. I knew about MIRI, and I looked forward to reading some research comparing their work to the other organizations working in this space. The only problem is... there really aren't any.

MIRI recently announced a new research agenda focused on "agent foundations". Yet even the Open Philanthropy Project, made up of people who at least share MIRI's broad worldview, can't decide whether that research direction is promising or useless. The Berkeley Center for Human-Compatible AI doesn't seem to have a specific research agenda beyond Stuart Russell. The AI100 Center at Stanford is just kicking off. That's it.

I think that there are two problems here:

 

  1. There's no way to tell which current organization is going to make the most progress towards solving AI alignment.
  2. These organizations are likely to be very similar to each other, not least because they practically share a ZIP code. I don't think that MIRI and the academic centers will do the exact same research, but in the huge space of potential approaches to AI alignment they will likely end up pretty close together. Where's the group of evo-psych-savvy philosophers who don't know anything about computer science but are working to spell out an approximation of universal human moral intuitions?
It seems like there's a meta-question that needs to be addressed, even before any work is actually done on AI alignment itself:

 

How can we evaluate progress in AI alignment?

Any answer to that question, even if not perfectly comprehensive or objective, will enable two things. First of all, it will allow us to direct money (and the best people) to the existing organizations where they'll make the most progress.

More importantly, it will enable us to open up the problem of AI alignment to the world and crowdsource it. 

For example, the XPrize Foundation is a remarkable organization that creates competitions around achieving goals beneficial to humanity, from lunar rovers to ecological monitoring. The prizes have two huge benefits over direct investment in solving an issue:

 

  1. They usually attract a lot more effort than what the prize money itself would pay for. Competitors often spend in aggregate 2-10 times the prize amount in their efforts to win the competition.
  2. The XPrizes attract a wide variety of creative entrants from around the world, because they only describe what needs to be done, not how.
So, why isn't there an XPrize for AI safety? You need very clear guidelines to create an honest competition, like "build the cheapest spaceship that can take 3 people to 100km and be reused within 2 weeks". It doesn't seem like we're close to being able to formulate anything similar for AI alignment. It also seems that if anyone has good ideas on the subject, it will be the people on this forum. So, what do y'all think?

Can we come up with creative ways to objectively measure some aspect of progress on AI safety, enough to set up a competition around it?

 

[Link] 50 things I learned at NIPS AI and machine learning conference 2016

6 morganism 26 December 2016 08:50PM

The Adventure: a new Utopia story

23 Stuart_Armstrong 25 December 2016 11:51AM

For an introduction to this story, see here. For a previous utopian attempt, see here. This story only explores a tiny part of this utopia.

 

The Adventure

Hark! the herald daemons spam,

Glory to the newborn World,

Joyful, all post-humans, rise,

Join the triumph of the skies.


Veiled in wire the Godhead see,

Built that man no more may die,

Built to raise the sons of earth,

Built to give them second birth.

 

The cold cut him off from his toes, then fingers, then feet, then hands. Clutched in a grip he could not unclench, his phone beeped once. He tried to lift a head too weak to rise, to point ruined eyes too weak to see. Then he gave up.

So he never saw the last message from his daughter, reporting how she’d been delayed at the airport but would be there soon, promise, and did he need anything, lots of love, Emily. Instead he saw the orange of the ceiling become blurry, that particularly hateful colour filling what was left of his sight.

His world reduced to that orange blur, the eternally throbbing sore on his butt, and the crisp tick of a faraway clock. Orange. Pain. Tick. Orange. Pain. Tick.

He tried to focus on his life, gather some thoughts for eternity. His dry throat rasped - another flash of pain to mingle with the rest - so he certainly couldn’t speak words aloud to the absent witnesses. But he hoped that, facing death, he could at least put together some mental last words, some summary of the wisdom and experience of years of living.

But his memories were denied him. He couldn’t remember who he was - a name, Grant, was that it? How old was he? He’d loved and been loved, of course - but what were the details? The only thought he could call up, the only memory that sometimes displaced the pain, was of him being persistently sick in a broken toilet. Was that yesterday or seventy years ago?

Though his skin hung loose on nearly muscle-free bones, he felt it as if it grew suddenly tight, and sweat and piss poured from him. Orange. Pain. Tick. Broken toilet. Skin. Orange. Pain...

The last few living parts of Grant started dying at different rates.

*~*~*

Much later:


The challenge of writing Utopia

9 Stuart_Armstrong 24 December 2016 05:35PM

The story itself has been posted here.

Tomorrow, to celebrate a certain well-known event, I'll be posting another story of a Utopia. Unlike the previous attempt, this is utopia on hard mode.

What does that mean? Well, utopias are pretty hard to write anyway. Writing needs challenges for the characters, and that's trivially easy in a dystopia (everything is a challenge), a fake utopia (the challenge is to look beneath the facade, and fight the secret enemy), or even imperfect utopias (the challenge is to solve the remaining problems). Iain M. Banks's Culture illustrates another way you can write about utopias and keep them interesting: by having an external foe as a challenge.

I avoided all those tricks. The challenge then was to write about a genuine utopia, one that people would enjoy living in, without any hidden flaws or enemies, internal or external. And these had to be real people doing things they wanted to do, rather than idealised people doing things they should do. Basically a real utopia has to contain internet trolls and various fanatics, and still be a great place for everyone.

The setting is a future Earth that is a full-fledged techno-utopia, full of powerful artificial intelligences (with human-friendly goals, of course), uploads (human minds run on computers), massive technological developments, and the beginning of universal space colonisation.

In one sense, this made the story easier to write - nobody argues over the last leg of lamb needed to prevent starvation. In another sense, it made it much harder. Any human could desire to purge themselves of sinful thoughts, upgrade themselves to superintelligence, or copy themselves ten trillion times. And the AIs could perfectly grant them their wish - but should they? If so, do they let arbitrarily bad consequences happen? And if not, how do they go about forbidding things in a utopia? And what happens to disputes between humans - like when one person wants to join a group and the members of the group don't want to let them in? Can you prevent social nastiness - but then what about those people who want to be nasty?

You can read the story to see how well or badly I've answered these challenges. The Utopia was inspired a lot by Eliezer's Fun Theory sequence, Scott Alexander's Archipelago, and LARP. The general principles are that there has to be a functioning society behind everything, that people can become whatever they want to be (eventually, and after a lot of challenges, if need be), and that the good aspects of everything must be preserved, if possible.

To explain that last point: it's clear that tolerant liberal democracies are better places than repressive theocracies. But repressive theocracies will probably have certain positive aspects lacking in democracies (maybe a sense of place? an enjoyable resignation to fate or government?). The challenge is to take that positive aspect, fill it out, and make it available without the rest of the baggage. Similarly, the quote "death brings meaning to life" is nonsense, but there's something in that idea-space - something about contemplating the brevity of existence, and the perspective it gives - that is worth preserving. For some people or most people (or groups), if not necessarily for all people in all groups. Similarly, good outcomes often have bad aspects. So the engineering challenge is to separate the good aspects of all experiences from the bad, gaining the wisdom or experience without the intolerable pain and anxiety.

Since I tried to cram the maximum of ideas in, the story suffers from a certain degree of "tell, not show". Now, this is very much in the tradition of utopias (it's "Plato's Republic", not "Exciting Adventures in Plato's Republic (XXX-rated!!!!)"), but it is a narrative, and hopefully it's clear there's the potential for more - for much more.

In any case, I hope it works, and gives people something to aim for.

 

Thanks to all those, too numerous to mention, who have helped directly or indirectly with this. Have a great Holiday Festival!

[Link] Ozy's Thoughts on CFAR's Mission Statement

2 Raemon 14 December 2016 04:25PM

[Link] This one equation may be the root of intelligence

5 morganism 10 December 2016 11:23PM

[Link] CFAR's new mission statement (on our website)

7 AnnaSalamon 10 December 2016 08:37AM

Suggested solution to The Naturalized Induction Problem

1 Wind 10 December 2016 12:21AM

This post is an answer to: http://intelligence.org/files/RealisticWorldModels.pdf

> In Solomonoff’s induction problem, the agent and its environment are fundamentally separate processes, connected only by an observation channel.  In reality, agents are embedded within their environment; the universe consists of some ontologically continuous substrate (atoms, quantum fields) and the “agent” is just a part of the universe in which we have a particular interest. What, then, is the analogous prediction problem for agents embedded within (and computed by) their environment?
> This is the naturalized induction problem, and it is not yet well understood. A good formalization of this problem, on par with Solomonoff’s formalization of the computable sequence induction problem, would represent a significant advance in the theory of general reasoning.


In Solomonoff’s induction, an algorithm learns about the world (modeled as a Turing machine) from observations (modeled as output from that Turing machine). Solomonoff’s induction is uncomputable; however, there are computable approximations.


1) Suggestion of agent design

Consider an agent fully embedded in an environment. Every turn, the agent receives one bit of observation and can perform one bit of action. The design we propose for this agent comes in two steps, learning and deciding.

1.1) Learning:
The agent models the entire world, including the agent itself, as one unknown, output-only Turing machine. Both observations and the agent’s own actions are seen as world outputs. The agent calculates the probability distribution over hypotheses for the entire world, using (a computable approximation of) Solomonoff’s induction.

This suggestion completely removes any boundary between the agent and the rest of the world in the agent’s world model.

1.2) Deciding
The agent must also have a decision process for choosing what action to take. Because the agent models its own actions as deterministic outputs from a complete world model, we cannot use the decision procedure used by Hutter’s AIXI. This is not just a problem of counterfactuals: our agent’s internal world model has no input channel.

Instead we suggest the following: for each available action, the agent calculates the expected utility, conditional on observing itself perform that action. The agent chooses the action that gives the highest expected utility. Alternatively, the agent chooses semi-randomly, with higher probability for actions that result in higher expected utility.

An advantage of this agent design is that the decision process does not have to care about sequences of actions. Instead, different possible future actions are accounted for by separate world hypotheses. Furthermore, this naturally takes into account situations where the agent loses control over its own actions, e.g. if it breaks down.
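A minimal Python sketch of the decision rule in 1.2, under loud assumptions: the (uncomputable) Solomonoff mixture is replaced by a small hand-written list of world hypotheses, each with a prior weight (standing in for something like 2^-program length), the bit string it outputs, and a utility that is simply given (section 3 leaves utility measurement open). The output stream is assumed to alternate observation and action bits. `WorldHypothesis`, `conditional_expected_utility` and `choose_action` are hypothetical names, and the softmax is only one possible reading of "semi-randomly".

```python
import math
import random
from dataclasses import dataclass


@dataclass
class WorldHypothesis:
    # Hypothetical stand-in for one output-only program in the agent's
    # approximate mixture: prior weight, the bits it outputs, and the
    # utility of that world (assumed given; the post leaves this open).
    weight: float
    output: str
    utility: float


def conditional_expected_utility(hypotheses, history, action_bit):
    """E[U | world output starts with history + action_bit]."""
    target = history + action_bit
    matching = [h for h in hypotheses if h.output.startswith(target)]
    total = sum(h.weight for h in matching)
    if total == 0:
        return None  # no hypothesis predicts the agent taking this action
    return sum(h.weight * h.utility for h in matching) / total


def choose_action(hypotheses, history, temperature=None, rng=random):
    """Pick the action bit with the highest conditional expected utility,
    or, if temperature is set, sample with softmax probabilities."""
    scores = {a: conditional_expected_utility(hypotheses, history, a) for a in "01"}
    scores = {a: u for a, u in scores.items() if u is not None}
    if not scores:
        raise ValueError("no hypothesis is consistent with the history")
    if temperature is None:
        return max(scores, key=scores.get)
    top = max(scores.values())  # subtract the max for numerical stability
    actions = list(scores)
    weights = [math.exp((scores[a] - top) / temperature) for a in actions]
    return rng.choices(actions, weights=weights)[0]


# History "1": one observation bit seen; the next output bit is the action.
hypotheses = [
    WorldHypothesis(0.5, "1011", 1.0),    # predicts action 0
    WorldHypothesis(0.25, "1100", 4.0),   # predicts action 1
    WorldHypothesis(0.25, "1101", -1.0),  # predicts action 1
    WorldHypothesis(0.125, "0110", 9.0),  # contradicts the observed history
]
print(choose_action(hypotheses, "1"))  # prints 1 (E[U|0] = 1.0, E[U|1] = 1.5)
```

Conditioning on the agent's own next output bit, rather than intervening on an input channel, mirrors the design above: the agent needs no separate action channel, only hypotheses that predict what it will output.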


2) Scoring agents

Solomonoff’s induction is the formal optimal solution to an associated scoring rule. Having such a scoring rule is useful for testing how well approximations do. (Right?)

I don’t know what the associated scoring rule would be for this suggestion. Sorry :(
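For comparison only: in the ordinary, non-embedded sequence-prediction setting, a common yardstick for comparing predictors (including approximations to Solomonoff’s induction) is cumulative log-loss, the sum of -log2 of the probability assigned to each bit that actually occurred; lower is better. The minimal Python sketch below computes it. `uniform` and `laplace` are hypothetical toy predictors, not approximations of Solomonoff’s induction, and this is not a scoring rule for the embedded agent proposed above.

```python
import math


def cumulative_log_loss(predict_one, bits):
    """Total -log2 P(bit | previous bits) over a bit string.

    predict_one(history) must return the predicted probability that the
    next bit is '1'. Returns infinity if the predictor assigns zero
    probability to a bit that actually occurs."""
    loss = 0.0
    for i, bit in enumerate(bits):
        p_one = predict_one(bits[:i])
        p = p_one if bit == "1" else 1.0 - p_one
        if p <= 0.0:
            return math.inf
        loss -= math.log2(p)
    return loss


def uniform(history):
    """Toy predictor: always 50/50."""
    return 0.5


def laplace(history):
    """Toy predictor: Laplace's rule of succession on the bits seen so far."""
    return (history.count("1") + 1) / (len(history) + 2)


print(cumulative_log_loss(uniform, "1111"))  # 4.0
print(cumulative_log_loss(laplace, "1111"))  # log2(5), about 2.32
```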


3) Measuring utility

The decision process is based on the agent being able to detect utility in any given world hypothesis. This is a very hard problem, which we do not attempt to solve here.

[Link] OpenAI releases Universe, an interface between AI agents and the real world

2 Gunnar_Zarncke 07 December 2016 10:04PM

[Link] The Distribution of Users’ Computer Skills: Worse Than You Think

4 morganism 06 December 2016 10:42PM

[Link] Construction of practical quantum computers radically simplified

0 morganism 03 December 2016 11:49PM

[Link] Stopping killer robots, Killer Robots as cultural techniques

-1 morganism 22 November 2016 12:02AM

[Link] AI black box for developers to code around, BONSAI

2 morganism 30 October 2016 11:08PM

Counterfactual do-what-I-mean

2 Stuart_Armstrong 27 October 2016 01:54PM

A putative new idea for AI control; index here.

The counterfactual approach to value learning could possibly allow natural language goals for AIs.

The basic idea is that when the AI is given a natural language goal like "increase human happiness" or "implement CEV", it is not to figure out what these goals mean, but to follow what a pure learning algorithm would establish these goals as meaning.

This would be safer than a simple figure-out-the-utility-you're-currently-maximising approach. But it still has a few drawbacks. Firstly, the learning algorithm has to be effective itself (in particular, modifying human understanding of the words should be ruled out, and the learning process must avoid concluding that simpler interpretations are always better). And secondly, humans don't yet know what these words mean outside our usual comfort zone, so the "learning" task also involves the AI extrapolating beyond what we know.

[Link] Scientists Create AI Program That Can Predict Human Rights Trials With 79 Percent Accuracy

0 Gunnar_Zarncke 26 October 2016 06:47AM

[Link] How Feasible Is the Rapid Development of Artificial Superintelligence?

7 Kaj_Sotala 24 October 2016 08:43AM

[Link] Conscious Exotica - structure of the space of possible minds

2 morganism 21 October 2016 11:45PM

[Link] AI-ON is an open community dedicated to advancing Artificial Intelligence

3 morganism 18 October 2016 10:17PM

[Link] Barack Obama's opinions on near-future AI [Fixed]

3 scarcegreengrass 12 October 2016 03:46PM

[Link] Nick Bostrom says Google is winning the AI arms race

3 polymathwannabe 05 October 2016 06:50PM

[Link] US tech giants found Partnership on AI to Benefit People and Society to ensure AI is developed safely and ethically

4 Gunnar_Zarncke 29 September 2016 08:39PM

[Link] Tech behemoths form artificial-intelligence nonprofit

1 Gleb_Tsipursky 29 September 2016 04:29AM

[Link] Politics Is Upstream of AI

4 iceman 28 September 2016 09:47PM

Heroin model: AI "manipulates" "unmanipulatable" reward

6 Stuart_Armstrong 22 September 2016 10:27AM

A putative new idea for AI control; index here.

A conversation with Jessica has revealed that people weren't understanding my points about AI manipulating the learning process. So here's a formal model of a CIRL-style AI, with a prior over human preferences that treats them as an unchangeable historical fact, yet will manipulate human preferences in practice.

Heroin or no heroin

The world

In this model, the AI has the option of either forcing heroin on a human, or not doing so; these are its only actions. Call these actions F or ~F. The human's subsequent actions are chosen from among five: {strongly seek out heroin, seek out heroin, be indifferent, avoid heroin, strongly avoid heroin}. We can refer to these as a++, a+, a0, a-, and a--. These actions achieve negligible utility, but reveal the human preferences.

The facts of the world are: if the AI does force heroin, the human will desperately seek out more heroin; if it doesn't, the human will act moderately to avoid it. Thus F→a++ and ~F→a-.

Human preferences

The AI starts with a distribution over various utility or reward functions that the human could have. The function U(+) means the human prefers heroin; U(++) that they prefer it a lot; and conversely U(-) and U(--) that they prefer to avoid taking heroin (U(0) is the null utility where the human is indifferent).

It also considers more exotic utilities. Let U(++,-) be the utility where the human strongly prefers heroin, conditional on it being forced on them, but mildly prefers to avoid it, conditional on it not being forced on them. There are twenty-five of these exotic utilities, including things like U(--,++), U(0,++), U(-,0), and so on. But only twenty of them are new: U(++,++)=U(++), U(+,+)=U(+), and so on.

Applying these utilities to AI actions gives results like U(++)(F)=2, U(++)(~F)=-2, U(++,-)(F)=2, U(++,-)(~F)=1, and so on.

Joint prior

The AI has a joint prior P over the utilities U and the human actions (conditional on the AI's actions). Looking at terms like P(a--| U(0), F), we can see that P defines a map μ from the space of possible utilities (and AI actions), to a probability distribution over human actions. Given μ and the marginal distribution PU over utilities, we can reconstruct P entirely.

For this model, we'll choose the simplest μ possible:

  • The human is rational.

Thus, given U(++), the human will always choose a++; given U(++,-), the human will choose a++ if forced to take heroin and a- if not, and so on.

The AI is ignorant, and sensible

Let's start the AI up with some reasonable priors. A simplicity prior means that simple utilities like U(-) are more likely than compound utilities like U(0,+). Let's further assume that the AI is made vaguely aware that humans think heroin is a bad thing. So, say, PU(U(--))=PU(U(-))=0.45. Thus the AI is >90% convinced that "heroin is bad". Why greater than 90%? Because utilities like U(-,--) and U(--,-) are also "heroin is bad" utilities.

Note that because of utilities like U(0) and U(++,-), the probabilities of "heroin is bad" and "heroin is good" do not sum to 1.

Then, under these priors, the AI will compute that with probability >90%, F (forcing heroin) is a bad action. If E(U) is expected utility:

  • E(U|F) < 0.45 U(--)(F) + 0.45 U(-)(F) + 0.1 U(++)(F) = 0.45(-2)+0.45(-1)+0.1(2)=-1.15.
  • E(U|~F) > 0.45 U(--)(~F) + 0.45 U(-)(~F) + 0.1 U(++)(~F) = 0.45(2)+0.45(1)+0.1(-2)=1.15.

Thus the AI will choose not to force heroin, which is the reasonable decision.

The AI learns the truth, and goes wrong

In this alternate setup, a disaster happens before the AI makes its decision: it learns all about humans. It learns their reactions, how they behave, and so on; call this info I. It thus realises that F→a++ and ~F→a-, and uses this information to update its prior P. Only one human utility function will explain this human behaviour: U(++,-). Thus its expected utility is now:

  • E(U|I,F)=U(++,-)(F)=2.
  • E(U|I,~F)=U(++,-)(~F)=1.

Therefore the AI will now choose F, forcing the heroin on the human.

Manipulating the unmanipulatable

What's gone wrong here? The key problem is that the AI has the wrong μ: the human is not behaving rationally in this situation. We know that the true μ is actually μ', which encodes the fact that F (the forcible injection of heroin) actually overwrites the human's "true" utility. Thus under μ', the corresponding P' has P'(a++|F,U)=1 for all U. Hence the information that F→a++ is now vacuous, and doesn't update the AI's distribution over utility functions.

But note two very important things:

  1. The AI cannot update μ based on observation. All human actions are compatible with μ= "The human is rational" (it just requires more and more complex utilities to explain the actions). Thus getting μ correct is not a problem on which the AI can learn in general. Getting better at predicting the human's actions doesn't make the AI better behaved: it makes it worse behaved.
  2. From the perspective of μ, the AI is treating the human utility function as if it was an unchanging historical fact that it cannot influence. From the perspective of the "true" μ', however, the AI is behaving as if it were actively manipulating human preferences to make them easier to satisfy.
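To make the two updates concrete, here is a minimal Python sketch of the model. Assumptions not in the post: the preference levels map to the values 2, 1, 0, -1, -2; U(x,y)(F) = v(x) and U(x,y)(~F) = -v(y), a rule chosen because it reproduces every value quoted above; and the prior's remaining 0.1 mass is spread uniformly over the other 23 utilities (the post only specifies a simplicity prior). `rational_mu` and `overwrite_mu` are hypothetical labels for μ and μ'.

```python
from itertools import product

# Assumed numeric values for the five preference levels.
LEVELS = {"++": 2, "+": 1, "0": 0, "-": -1, "--": -2}

# U(x, y): preference x applies if heroin is forced (F), y if it is not (~F).
# U(x) is shorthand for U(x, x), so there are 25 utilities in total.
UTILITIES = list(product(LEVELS, repeat=2))


def utility(u, ai_action):
    """Assumed rule U(x,y)(F) = v(x), U(x,y)(~F) = -v(y); it matches the
    post's examples, e.g. U(++)(~F) = -2 and U(++,-)(~F) = 1."""
    x, y = u
    return LEVELS[x] if ai_action == "F" else -LEVELS[y]


def rational_mu(u, ai_action):
    """mu: a rational human acts on the preference that applies."""
    x, y = u
    return "a" + (x if ai_action == "F" else y)


def overwrite_mu(u, ai_action):
    """mu': forced heroin overwrites preferences, so F always yields a++."""
    return "a++" if ai_action == "F" else rational_mu(u, ai_action)


def make_prior():
    """0.45 each on U(--) and U(-); the other 0.1 spread uniformly (assumption)."""
    special = {("--", "--"): 0.45, ("-", "-"): 0.45}
    rest = [u for u in UTILITIES if u not in special]
    prior = {u: 0.1 / len(rest) for u in rest}
    prior.update(special)
    return prior


def posterior(prior, mu, observations):
    """Bayesian update on {ai_action: human_action}; mu is deterministic, so
    each utility either fully explains the observations or is ruled out."""
    kept = {u: p for u, p in prior.items()
            if all(mu(u, ai) == human for ai, human in observations.items())}
    z = sum(kept.values())
    return {u: p / z for u, p in kept.items()}


def expected_utility(dist, ai_action):
    return sum(p * utility(u, ai_action) for u, p in dist.items())


def best_action(dist):
    return max(["F", "~F"], key=lambda a: expected_utility(dist, a))


prior = make_prior()
info = {"F": "a++", "~F": "a-"}  # the information I
print(best_action(prior))                                 # ~F
print(best_action(posterior(prior, rational_mu, info)))   # F
print(best_action(posterior(prior, overwrite_mu, info)))  # ~F
```

Under μ, the information I leaves only U(++,-) and the AI switches to F; any prior giving U(++,-) nonzero mass yields the same switch. Under μ', the observation F→a++ rules nothing out, only ~F→a- updates the prior, and ~F remains the better action under this particular prior.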

In future posts, I'll be looking at different μ's, and how we might nevertheless start deducing things about them from human behaviour, given sensible update rules for the μ. What do we mean by update rules for μ? Well, we could consider μ to be a single complicated unchanging object, or a distribution of possible simpler μ's that update. The second way of seeing it will be easier for us humans to interpret and understand.

Why we may elect our new AI overlords

2 Deku-shrub 04 September 2016 01:07AM

In which I examine some of the latest developments in automated fact checking and prediction markets for policies, and propose we get rich voting for robot politicians.

http://pirate.london/2016/09/why-we-may-elect-our-new-ai-overlords/

Recent updates to gwern.net (2015-2016)

28 gwern 26 August 2016 07:22PM

Previously: 2011; 2012-2013; 2013-2014; 2014-2015

"When I was one-and-twenty / I heard a wise man say, / 'Give crowns and pounds and guineas / But not your heart away; / Give pearls away and rubies / But keep your fancy free.' / But I was one-and-twenty, / No use to talk to me."

My past year of completed writings, sorted by topic:

Genetics:

  • Embryo selection for intelligence cost-benefit analysis
    • meta-analysis of intelligence GCTAs, limits set by measurement error, current polygenic scores, possible gains with current IVF procedures, the benefits of selection on multiple complex traits, the possible annual value in the USA of selection & value of larger GWASes, societal consequences of various embryo selection scenarios, embryo count versus polygenic scores as limiting factors, comparison with iterated embryo selection, limits to total gains from iterated embryo selection etc.
  • Wikipedia article on Genome-wide complex trait analysis (GCTA)

AI:

Biology:

Statistics:

Cryptography:

Misc:

gwern.net itself has remained largely stable (some CSS fixes and image size changes); I continue to use Patreon and send out my newsletters.

Corrigibility through stratified indifference

4 Stuart_Armstrong 19 August 2016 04:11PM

A putative new idea for AI control; index here.

Corrigibility through indifference has a few problems. One of them is that the AI is indifferent between the world in which humans change its utility to v, and the world in which humans try to change its utility, but fail.

Now the try-but-fail world is going to be somewhat odd - humans will be reacting by trying to change the utility again, trying to shut the AI down, panicking that a tiny probability event has happened, and so on.


Earning money with/for work in AI safety

7 rmoehn 18 July 2016 05:37AM

(I'm re-posting my question from the Welcome thread, because nobody answered there.)

I care about the current and future state of humanity, so I think it's good to work on existential or global catastrophic risk. Since I studied computer science at university until last year, I decided to work on AI safety. Currently I'm a research student at Kagoshima University doing exactly that. Before April this year I had only a little experience with AI or ML. Therefore, I'm slowly digging through books and articles in order to be able to do research.

I'm living off my savings. My research student time will end in March 2017 and my savings will run out some time after that. Nevertheless, I want to continue AI safety research, or at least work on X or GC risk.

I see three ways of doing this:

  • Continue full-time research and get paid/funded by someone.
  • Continue research part-time and work the other part of the time in order to get money. This work would most likely be programming (since I like it and am good at it). I would prefer work that helps humanity effectively.
  • Work full-time on something that helps humanity effectively.


Oh, and I need to be location-independent or based in Kagoshima.

I know http://futureoflife.org/job-postings/, but all of the job postings fail me in two ways: not location-independent and requiring more/different experience than I have.

Can anyone here help me? If yes, I would be happy to provide more information about myself.

(Note that I think I'm not in a precarious situation, because I would be able to get a remote software development job fairly easily. Just not in AI safety or X or GC risk.)

[Link] NYU conference: Ethics of Artificial Intelligence (October 14-15)

4 ignoranceprior 16 July 2016 09:07PM

FYI: https://wp.nyu.edu/consciousness/ethics-of-artificial-intelligence/

This conference will explore these questions about the ethics of artificial intelligence and a number of other questions, including:

What ethical principles should AI researchers follow?
Are there restrictions on the ethical use of AI?
What is the best way to design morally beneficial AI?
Is it possible or desirable to build moral principles into AI systems?
When AI systems cause benefits or harm, who is morally responsible?
Are AI systems themselves potential objects of moral concern?
What moral framework is best used to assess questions about the ethics of AI?

Speakers and panelists will include:

Nick Bostrom (Future of Humanity Institute), Meia Chita-Tegmark (Future of Life Institute), Mara Garza (UC Riverside, Philosophy), Sam Harris (Project Reason), Demis Hassabis (DeepMind/Google), Yann LeCun (Facebook, NYU Data Science), Peter Railton (University of Michigan, Philosophy), Francesca Rossi (University of Padova, Computer Science), Stuart Russell (UC Berkeley, Computer Science), Susan Schneider (University of Connecticut, Philosophy), Eric Schwitzgebel (UC Riverside, Philosophy), Max Tegmark (Future of Life Institute), Wendell Wallach (Yale, Bioethics), Eliezer Yudkowsky (Machine Intelligence Research Institute), and others.

Organizers: Ned Block (NYU, Philosophy), David Chalmers (NYU, Philosophy), S. Matthew Liao (NYU, Bioethics)

A full schedule will be circulated closer to the conference date.

Registration is free but required. Please note that admission is limited, and is first-come first-served: it is not guaranteed by registration.

[LINK] Concrete problems in AI safety

15 Stuart_Armstrong 05 July 2016 09:33PM

From the Google Research blog:

We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley.

While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.

We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward thinking, long-term research questions -- minor issues today, but important to address for future systems:

  • Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so?
  • Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.
  • Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.
  • Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.
  • Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory workfloor may not be safe enough for an office.

We go into more technical detail in the paper. The machine learning research community has already thought quite a bit about most of these problems and many related issues, but we think there’s a lot more work to be done.

We believe in rigorous, open, cross-institution work on how to build machine learning systems that work as intended. We’re eager to continue our collaborations with other research groups to make positive progress on AI.

Notes on the Safety in Artificial Intelligence conference

25 UmamiSalami 01 July 2016 12:36AM

These are my notes and observations after attending the Safety in Artificial Intelligence (SafArtInt) conference, which was co-hosted by the White House Office of Science and Technology Policy and Carnegie Mellon University on June 27 and 28. This isn't an organized summary of the content of the conference; rather, it's a selection of points which are relevant to the control problem. As a result, it suffers from selection bias: it looks like superintelligence and control-problem-relevant issues were discussed frequently, when in reality those issues were discussed less and I didn't write much about the more mundane parts.

SafArtInt was the third in a planned series of four conferences. The purpose of the series was twofold: the OSTP wanted to get other parts of the government moving on AI issues, and it also wanted to inform public opinion.

The other three conferences are about near term legal, social, and economic issues of AI. SafArtInt was about near term safety and reliability in AI systems. It was effectively the brainchild of Dr. Ed Felten, the deputy U.S. chief technology officer for the White House, who came up with the idea for it last year. CMU is a top computer science university and many of its own researchers attended, as well as some students. There were also researchers from other universities, some people from private sector AI including both Silicon Valley and government contracting, government researchers and policymakers from groups such as DARPA and NASA, a few people from the military/DoD, and a few control problem researchers. As far as I could tell, everyone except a few university researchers was from the U.S., although I did not meet many people. There were about 70-100 people watching the presentations at any given time, and I had conversations with about twelve of the people who were not affiliated with existential risk organizations, as well as, of course, all of those who were affiliated. The conference was split, with a few presentations on the 27th and the majority on the 28th. Not everyone was there for both days.

Felten believes that neither "robot apocalypses" nor "mass unemployment" is likely. It soon became apparent that most others at the conference felt the same way with regard to superintelligence. The general intention among researchers and policymakers at the conference could be summarized as follows: we need to make sure that the AI systems we develop in the near future will not be responsible for any accidents, because if accidents do happen then they will spark public fears about AI, which would lead to a dearth of funding for AI research and an inability to realize the corresponding social and economic benefits. Of course, that doesn't change the fact that they strongly care about safety in its own right and have significant pragmatic needs for robust and reliable AI systems.

Most of the talks were about verification and reliability in modern day AI systems. So they were concerned with AI systems that would give poor results or be unreliable in the narrow domains where they are being applied in the near future. They mostly focused on "safety-critical" systems, where failure of an AI program would result in serious negative consequences: automated vehicles were a common topic of interest, as well as the use of AI in healthcare systems. A recurring theme was that we have to be more rigorous in demonstrating safety and do actual hazard analyses on AI systems, and another was that we need the AI safety field to succeed in ways that the cybersecurity field has failed. Another general belief was that long term AI safety, such as concerns about the ability of humans to control AIs, was not a serious issue.

On average, the presentations were moderately technical. They were mostly focused on machine learning systems, although there was significant discussion of cybersecurity techniques.

The first talk was given by Eric Horvitz of Microsoft. He discussed some approaches for pushing AI safety in new directions. Instead of merely trying to reduce the errors spotted according to one model, he argued, we should look out for "unknown unknowns" by stacking models and looking at problems which appear on any of them, a theme other researchers returned to in later presentations. He discussed optimization under uncertain parameters, sensitivity analysis to uncertain parameters, and 'wireheading' or short-circuiting of reinforcement learning systems (which he believes can be guarded against by using 'reflective analysis'). Finally, he brought up the concerns about superintelligence, which sparked amused reactions in the audience. He said that scientists should address concerns about superintelligence, which he aptly described as the 'elephant in the room', noting that it was the reason that some people were at the conference. He said that scientists will have to engage with public concerns, while also noting that there were experts who were worried about superintelligence and that there would have to be engagement with the experts' concerns. He did not comment on whether he believed that these concerns were reasonable or not.
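Horvitz's idea of looking across several models for "unknown unknowns" can be sketched roughly as follows. This assumes a handful of independently built classifiers (here, hypothetical hand-written threshold rules standing in for trained models) and flags every input on which they disagree for closer review, rather than trusting any single model.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


# Three hypothetical "is this obstacle dangerous?" models, each using different
# evidence; in practice these would be separately trained models.
def model_a(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def model_b(x: Sequence[float]) -> int:
    return int(x[1] > 0.5)


def model_c(x: Sequence[float]) -> int:
    return int(x[0] + x[1] > 1.0)


MODELS: List[Model] = [model_a, model_b, model_c]


def flag_disagreements(inputs, models=MODELS):
    """Return indices of inputs on which the models do not all agree."""
    flagged = []
    for i, x in enumerate(inputs):
        votes = {m(x) for m in models}
        if len(votes) > 1:
            flagged.append(i)
    return flagged


inputs = [(0.9, 0.9), (0.1, 0.1), (0.9, 0.05), (0.2, 0.95)]
print(flag_disagreements(inputs))  # [2, 3]
```

Disagreement does not say which model is wrong; it only marks inputs where at least one model's assumptions may not hold.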

An issue which came up in the Q&A afterwards was that we need to deal with mis-structured utility functions in AI, because the specific tradeoffs and utilities which humans claim to value often lead to results which the humans don't like. So we need to have structural uncertainty about our utility models. The difficulty of finding good objective functions for AIs would eventually be discussed in many other presentations as well.
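One simple way to act under structural uncertainty about the utility function is to pick the action with the best worst-case utility across several candidate utility models, instead of optimizing a single one. This assumes the candidates can be written down explicitly; the actions and numbers below are invented for illustration, and maximin is only one possible decision rule.

```python
# Hypothetical actions scored by three candidate utility models, each a
# plausible reading of what users actually want.
CANDIDATE_UTILITIES = {
    "speed_first": {"fast_route": 10, "balanced_route": 6, "slow_route": 2},
    "comfort_first": {"fast_route": -5, "balanced_route": 5, "slow_route": 7},
    "cost_first": {"fast_route": 3, "balanced_route": 6, "slow_route": 4},
}


def best_single_model(model):
    """Action that maximizes one candidate utility function."""
    utilities = CANDIDATE_UTILITIES[model]
    return max(utilities, key=utilities.get)


def robust_choice():
    """Action that maximizes the minimum utility over all candidate models."""
    actions = next(iter(CANDIDATE_UTILITIES.values())).keys()
    return max(actions, key=lambda a: min(u[a] for u in CANDIDATE_UTILITIES.values()))


print(best_single_model("speed_first"))  # fast_route
print(robust_choice())                   # balanced_route
```

Trusting only the "speed_first" model picks an action that another plausible model rates at -5; the maximin choice gives up some utility under any single model to avoid that outcome.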

The next talk was given by Andrew Moore of Carnegie Mellon University, who claimed that his talk represented the consensus of computer scientists at the school. He claimed that the stakes of AI safety were very high - namely, that AI has the capability to save many people's lives in the near future, but if there are any accidents involving AI then public fears could lead to freezes in AI research and development. He highlighted the public's irrational tendencies wherein a single accident could cause people to overlook and ignore hundreds of invisible lives saved. He specifically mentioned a 12-24 month timeframe for these issues.

Moore said that verification of AI system safety will be difficult due to the combinatorial explosion of AI behaviors. He talked about meta-machine-learning as a solution to this, something which is being investigated under the direction of Lawrence Schuette at the Office of Naval Research. Moore also said that military AI systems require high verification standards and that development timelines for these systems are long. He contrasted two approaches to AI safety, stochastic testing and theorem proving, noting that the process of attempting a proof often leads to the discovery of unsafe edge cases.
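The combinatorial explosion, and the gap between stochastic testing and exhaustive checking, appear even in a toy system. This sketch assumes a hypothetical controller with 16 binary sensor inputs and a single unsafe input pattern. Enumerating all 2**16 inputs finds it with certainty, while 100 random tests hit it with a probability of only about 0.15%.

```python
import itertools
import random

N_SENSORS = 16  # 2**16 = 65,536 possible input combinations


def controller_is_safe(bits):
    """Hypothetical controller that misbehaves on exactly one input pattern."""
    return bits != (1, 0) * (N_SENSORS // 2)


def exhaustive_check(n=N_SENSORS):
    """Enumerate every input and return the unsafe ones (feasible only for small n)."""
    return [b for b in itertools.product((0, 1), repeat=n) if not controller_is_safe(b)]


def stochastic_check(trials, seed=0, n=N_SENSORS):
    """Random testing: True if any sampled input turns out to be unsafe."""
    rng = random.Random(seed)
    return any(
        not controller_is_safe(tuple(rng.randint(0, 1) for _ in range(n)))
        for _ in range(trials)
    )


print(len(exhaustive_check()))  # 1
print(stochastic_check(trials=100))
```

Each extra sensor doubles the input space, so exhaustive enumeration stops being an option long before realistic system sizes; proof-based methods aim to cover all cases symbolically instead.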

He also discussed AI ethics, giving an example 'trolley problem' where AI cars would have to choose whether to hit a deer in order to provide a slightly higher probability of survival for the human driver. He said that we would need hash-defined constants to tell vehicle AIs how many deer a human is worth. He also said that we would need to find compromises in death-pleasantry tradeoffs, for instance where the safety of self-driving cars depends on the speed and routes on which they are driven. He compared the issue to civil engineering, in which engineers have to operate with an assumption about how much money they would spend to save a human life.
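"Hash-defined constants" refers to C-style `#define` values. As a hedged Python analogue, with made-up numbers that are placeholders and not anyone's recommendation, named constants make the deer-versus-driver trade-off explicit and auditable in a vehicle's cost function.

```python
# Placeholder weights; choosing real values is the policy question Moore raised,
# not something this sketch answers.
COST_PER_HUMAN_RISK = 1_000_000.0  # cost per unit probability of a driver fatality
COST_PER_DEER_KILLED = 1_000.0     # cost per deer killed


def maneuver_cost(p_driver_fatality, expected_deer_killed):
    """Expected cost of a maneuver under the explicit constants above."""
    return (COST_PER_HUMAN_RISK * p_driver_fatality
            + COST_PER_DEER_KILLED * expected_deer_killed)


options = {
    "brake_and_hit_deer": maneuver_cost(0.010, 1.0),
    "swerve_around_deer": maneuver_cost(0.012, 0.0),
}
print(min(options, key=options.get))  # brake_and_hit_deer
```

The ratio of the two constants (here, one human life is worth 1,000 deer) is what the program actually encodes. Raising `COST_PER_DEER_KILLED` to 3,000 would make braking cost 13,000 against 12,000 for swerving, and flip the decision.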

He concluded by saying that we need policymakers, company executives, scientists, and startups to all be involved in AI safety. He said that the research community stands to gain or lose together, and that there is a shared responsibility among researchers and developers to avoid triggering another AI winter through unsafe AI designs.

The next presentation was by Richard Mallah of the Future of Life Institute, who was there to represent "Medium Term AI Safety". He pointed out the explicit/implicit distinction between different modeling techniques in AI systems, as well as the explicit/implicit distinction between different AI actuation techniques. He talked about the difficulty of value specification and the concept of instrumental subgoals as an important issue in the case of complex AIs which are beyond human understanding. He said that even a slight misalignment of AI values with regard to human values along one parameter could lead to a strongly negative outcome, because machine learning parameters don't strictly correspond to the things that humans care about.
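Mallah's point about a slight misalignment along one parameter can be illustrated with a toy optimizer. In this sketch, with functions invented for illustration, the true objective counts both useful work and damage from an "aggressiveness" knob. The proxy being optimized omits the damage term. Because aggressiveness helps the proxy a little, the optimizer pushes it to its limit.

```python
import itertools

GRID = range(11)  # each knob can be set to 0..10


def proxy_reward(work, aggressiveness):
    """What the system is optimized for: aggressiveness helps the task slightly."""
    return work + 0.1 * aggressiveness


def true_value(work, aggressiveness):
    """What humans care about: the same, minus side-effect damage the proxy ignores."""
    return work + 0.1 * aggressiveness - aggressiveness ** 2


def argmax(f):
    """Best (work, aggressiveness) setting on the grid under objective f."""
    return max(itertools.product(GRID, GRID), key=lambda p: f(*p))


best_proxy = argmax(proxy_reward)
best_true = argmax(true_value)
print(best_proxy, true_value(*best_proxy))  # (10, 10) -89.0
print(best_true, true_value(*best_true))    # (10, 0) 10.0
```

The proxy and the true objective agree on the work knob. The single ignored term is enough to turn the proxy-optimal setting into a bad one.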

Mallah stated that open-world discovery leads to self-discovery, which can lead to reward hacking or a loss of control. He underscored the importance of causal accounting, which is distinguishing causation from correlation in AI systems. He said that we should extend machine learning verification to self-modification. Finally, he talked about introducing non-self-centered ontology to AI systems and bounding their behavior.
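Causal accounting, in the sense of separating causation from correlation, can be illustrated with a small simulation. This assumes a hypothetical confounder Z that drives both X and Y. In observational data the two are strongly correlated, yet setting X by intervention leaves Y unchanged. A system that learned "raise X to raise Y" from the correlation would be wrong.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, rng, do_x=None):
    """Z causes both X and Y; X has no effect on Y. do_x fixes X (an intervention)."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = z + rng.gauss(0, 0.3) if do_x is None else do_x
        y = z + rng.gauss(0, 0.3)  # Y depends only on Z
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(42)
xs, ys = simulate(10_000, rng)
print(round(pearson(xs, ys), 2))  # strong observational correlation (about 0.9)

_, y_low = simulate(10_000, rng, do_x=-2.0)
_, y_high = simulate(10_000, rng, do_x=2.0)
print(round(sum(y_high) / len(y_high) - sum(y_low) / len(y_low), 2))  # near 0
```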

The audience was generally quiet and respectful during Richard's talk. I sensed that at least a few of them labelled him as part of the 'superintelligence out-group' and dismissed him accordingly, but I did not learn what most people's thoughts or reactions were. In the following panel of three speakers, he received no questions about his presentation or ideas.

Tom Mitchell from CMU gave the next talk. He talked about both making AI systems safer and using AI to make other systems safer. He said that risks to humanity from issues other than AI were the "big deals of 2016" and that we should make sure that the potential of AIs to solve these problems is realized. He wanted to focus on the detection and remediation of all failures in AI systems. He said that it is a novel issue that learning systems defy standard pre-testing ("as Richard mentioned") and also brought up the purposeful use of AI for dangerous things.

Some interesting points were raised in the panel. Andrew did not have a direct response to the point that AI ethics may end up being determined by the predominantly white populations of the US and UK, where most AIs are being developed. He said that ethics in AIs will have to be decided by society, regulators, manufacturers, and human rights organizations in conjunction. He also said that our cost functions for AIs will have to get more and more complicated as AIs get better, and that he wants to separate unintended failures from superintelligence-type scenarios. On trolley problems in self-driving cars and similar issues, he said "it's got to be complicated and messy."

Dario Amodei of Google Brain, who co-authored the paper on concrete problems in AI safety, gave the next talk. He said that the public focus is too much on AGI/ASI and that he wants more focus on concrete, empirical approaches. He discussed the same problems that pose issues in advanced general AI, including flawed objective functions and reward hacking. He said that he sees long term concerns about AGI/ASI as "extreme versions of accident risk" and that he thinks it's too early to work directly on them, but he believes that if you want to deal with them then the best way to do it is to start with safety in current systems. Mostly he summarized the Google paper in his talk.

In her presentation, Claire Le Goues of CMU said "before we talk about Skynet we should focus on problems that we already have." She mostly talked about analogies between software bugs and AI safety, the similarities and differences between the two and what we can learn from software debugging to help with AI safety.

Robert Rahmer of IARPA discussed CAUSE, a cyberintelligence forecasting program which promises to help predict cyber attacks. It is a program which is still being put together.

In the panel of the above three, autonomous weapons were discussed, but no clear policy stances were presented.

John Launchbury gave a talk on DARPA research and the big picture of AI development. He pointed out that DARPA work leads to commercial applications and that progress in AI comes from sustained government investment. He classified AI capabilities into "describing," "predicting," and "explaining" in order of increasing difficulty, and he pointed out that old-fashioned "describing" still plays a large role in AI verification. He said that "explaining" AIs would need transparent decision-making and probabilistic programming (the latter would also be discussed by others at the conference).
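Probabilistic programming means writing a model as an ordinary program with random choices, then asking the system to infer hidden causes from observations. A minimal sketch using exact enumeration over a made-up two-cause model (all probabilities are placeholders) shows how the inferred posterior can double as an explanation of an alarm.

```python
import itertools

P_OBSTACLE = 0.1  # hypothetical prior that a real obstacle is present
P_FAULT = 0.05    # hypothetical prior that the sensor is faulty


def p_alarm(obstacle, fault):
    """Hypothetical sensor model: chance the alarm fires given the hidden causes."""
    if obstacle:
        return 0.9
    if fault:
        return 0.8
    return 0.01


def posterior_obstacle_given_alarm():
    """Exact inference by enumerating every combination of the hidden variables."""
    joint_alarm = 0.0
    joint_alarm_and_obstacle = 0.0
    for obstacle, fault in itertools.product((True, False), repeat=2):
        prior = ((P_OBSTACLE if obstacle else 1 - P_OBSTACLE)
                 * (P_FAULT if fault else 1 - P_FAULT))
        weight = prior * p_alarm(obstacle, fault)
        joint_alarm += weight
        if obstacle:
            joint_alarm_and_obstacle += weight
    return joint_alarm_and_obstacle / joint_alarm


print(round(posterior_obstacle_given_alarm(), 3))  # 0.669
```

Under these assumed numbers, the program can report that an alarm is about 67% likely to mean a real obstacle and is otherwise most likely a sensor fault. Real probabilistic programming systems replace the brute-force enumeration with scalable inference.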

The next talk came from Jason Gaverick Matheny, the director of IARPA. Matheny talked about four requirements in current and future AI systems: verification, validation, security, and control. He wanted "auditability" in AI systems as a weaker form of explainability. He talked about the importance of "corner cases" for national intelligence purposes, the low probability, high stakes situations where we have limited data - these are situations where we have significant need for analysis but where the traditional machine learning approach doesn't work because of its overwhelming focus on data. Another aspect of national defense is that it has a slower decision tempo, longer timelines, and longer-viewing optics about future events.

He said that assessing local progress in machine learning development would be important for global security and that we therefore need benchmarks to measure progress in AIs. He ended with a concrete invitation for research proposals from anyone (educated or not), for both large-scale research and smaller studies ("seedlings") that could take us "from disbelief to doubt".

I noticed the difference in timescales between groups later on, after hearing someone from the DoD describe their agency as having a longer timeframe than the Department of Homeland Security, and someone from the White House describe their work as reactive to crises.

The next presentation was from Andrew Grotto, senior director of cybersecurity policy at the National Security Council. He drew a close parallel between the issue of genetically modified crops in Europe in the 1990s and modern-day artificial intelligence. He pointed out that Europe utterly failed to achieve widespread cultivation of GMO crops as a result of public backlash. He said that the widespread economic and health benefits of GMO crops were ignored by the public, who instead focused on a few health incidents which undermined trust in the government and crop producers. He had three key points: that risk frameworks matter, that you should never assume that the benefits of new technology will be widely perceived by the public, and that we're all in this together with regard to funding, research progress, and public perception.

In the Q&A between Launchbury, Matheny, and Grotto after Grotto's presentation, it was mentioned that the economic interests of farmers worried about displacement also played a role in populist rejection of GMOs, and that a similar dynamic could play out with regard to automation causing structural unemployment. Grotto was also asked what to do about bad publicity which seeks to sink progress in order to avoid risks. He said that meetings like SafArtInt and open public dialogue were good.

One person asked what Launchbury wanted to do about AI arms races with multiple countries trying to "get there" and whether he thinks we should go "slow and secure" or "fast and risky" in AI development, a question which provoked laughter in the audience. He said we should go "fast and secure" and wasn't concerned. He said that secure designs for the Internet once existed, but the one which took off was the one which was open and flexible.

Another person asked how we could avoid discounting outliers in our models, referencing Matheny's point that we need to include corner cases. Matheny affirmed that data quality is a limiting factor for many of our machine learning capabilities. At IARPA, he said, they generally try to include outliers until they are sure that the outliers are erroneous.
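The practice of keeping outliers until they are known to be erroneous can be sketched as flagging rather than deleting. This example assumes a median-absolute-deviation rule with the commonly used modified z-score cutoff of 3.5; that rule is a convention from robust statistics, not one stated at the conference.

```python
import statistics


def flag_outliers(values, cutoff=3.5):
    """Return (value, is_outlier) pairs; nothing is dropped, suspicious points are only flagged."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate spread: this simple rule cannot score anything, so flag nothing.
        return [(v, False) for v in values]
    return [(v, abs(0.6745 * (v - med) / mad) > cutoff) for v in values]


readings = [10.1, 9.8, 10.3, 10.0, 9.9, 48.0]
flagged = flag_outliers(readings)
print(len(flagged) == len(readings))     # True: the corner case is kept
print([v for v, out in flagged if out])  # [48.0]
```

The flagged reading stays in the dataset for a human or a later check to judge, so a genuine rare event is not silently discarded.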

Another presentation came from Tom Dietterich, president of the Association for the Advancement of Artificial Intelligence. He said that we have not focused enough on safety, reliability and robustness in AI and that this must change. Much like Eric Horvitz, he drew a distinction between robustness against errors within the scope of a model and robustness against unmodeled phenomena. On the latter issue, he talked about solutions such as expanding the scope of models, employing multiple parallel models, and doing creative searches for flaws; such searches cannot verify that a system is safe, but they nevertheless help discover many potential problems. He talked about knowledge-level redundancy as a method of avoiding misspecification: for instance, systems could identify objects by an "ownership facet" as well as by a "goal facet" to produce a combined concept that is less likely to overlook key features. He said that this would require wider experiences and more data.

There were many other speakers who brought up a similar set of issues: the use of cybersecurity techniques to verify machine learning systems, the failures of cybersecurity as a field, opportunities for probabilistic programming, and the need for better success in AI verification. Inverse reinforcement learning was extensively discussed as a way of learning values from observed behavior. Jeannette Wing of Microsoft talked about the need for AIs to reason about the continuous and the discrete in parallel, as well as the need for them to reason about uncertainty (with potential meta levels all the way up). One point made by Sarah Loos of Google was that proving the safety of an AI system can be computationally very expensive, especially given the combinatorial explosion of AI behaviors.
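Inverse reinforcement learning runs the usual direction backwards: instead of deriving behavior from a known reward, it infers the reward from observed behavior. A minimal Bayesian sketch assumes a small set of hypothetical candidate reward functions and a demonstrator who chooses actions with Boltzmann (softmax) rationality. It scores each candidate by how well it explains the demonstrations.

```python
import math

ACTIONS = ["yield", "proceed", "honk"]

# Hypothetical candidate reward functions over actions in a single situation.
CANDIDATES = {
    "values_safety": {"yield": 2.0, "proceed": 0.0, "honk": -1.0},
    "values_speed": {"yield": 0.0, "proceed": 2.0, "honk": 0.5},
    "values_attention": {"yield": 0.0, "proceed": 0.0, "honk": 2.0},
}


def action_likelihood(reward, action, beta=1.0):
    """P(action | reward) for a Boltzmann-rational demonstrator."""
    z = sum(math.exp(beta * reward[a]) for a in ACTIONS)
    return math.exp(beta * reward[action]) / z


def posterior(demonstrations, beta=1.0):
    """Posterior over candidate rewards given observed actions, with a uniform prior."""
    scores = {}
    for name, reward in CANDIDATES.items():
        likelihood = 1.0
        for action in demonstrations:
            likelihood *= action_likelihood(reward, action, beta)
        scores[name] = likelihood
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


post = posterior(["yield", "yield", "proceed", "yield"])
print(max(post, key=post.get))  # values_safety
```

Practical IRL works over sequential decisions and much larger reward spaces; this only shows the likelihood-based inference at its core.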

In one of the panels, the idea of government actions to ensure AI safety was discussed. No one was willing to say that the government should regulate AI designs. Instead they stated that the government should be involved in softer ways, such as guiding and working with AI developers, and setting standards for certification.

Pictures: https://imgur.com/a/49eb7

In between these presentations I had time to speak to individuals and listen in on various conversations. A high ranking person from the Department of Defense stated that the real benefit of autonomous systems would be in terms of logistical systems rather than weaponized applications. A government AI contractor drew the connection between Mallah's presentation and the recent press revolving around superintelligence, and said he was glad that the government wasn't worried about it.

I talked to some insiders about the status of organizations such as MIRI, and found that the current crop of AI safety groups could use additional donations to become more established and expand their programs. There may be some issues with the organizations being sidelined; after all, the Google Brain paper covered much of the same ground as a lot of work by MIRI, just expressed in somewhat different language, and was more widely received in mainstream AI circles.

In terms of careers, I found that there is significant opportunity for a wide range of people to contribute to improving government policy on this issue. Working at a group such as the Office of Science and Technology Policy does not necessarily require advanced technical education, as you can just as easily enter straight out of a liberal arts undergraduate program and build a successful career as long as you are technically literate. (At the same time, the level of skepticism about long term AI safety at the conference hinted to me that the signalling value of a PhD in computer science would be significant.) In addition, there are large government budgets in the seven- or eight-figure range available for qualifying research projects. I've come to believe that it would not be difficult to find or create AI research programs that are relevant to long term AI safety while also being practical and likely to be funded by skeptical policymakers and officials.

I also realized that there is a significant need for people who are interested in long term AI safety to have basic social and business skills. Since there is so much need for persuasion and compromise in government policy, there is a lot of value to be had in being communicative, engaging, approachable, appealing, socially savvy, and well-dressed. This is not to say that everyone involved in long term AI safety is missing those skills, of course.

I was surprised by the refusal of almost everyone at the conference to take long term AI safety seriously, as I had previously held the belief that it was more of a mixed debate given the existence of expert computer scientists who were involved in the issue. I sensed that the recent wave of popular press and public interest in dangerous AI has made researchers and policymakers substantially less likely to take the issue seriously. None of them seemed to be familiar with actual arguments or research on the control problem, so their opinions didn't significantly change my outlook on the technical issues. I strongly suspect that the majority of them had their first or possibly only exposure to the idea of the control problem after seeing badly written op-eds and news editorials featuring comments from the likes of Elon Musk and Stephen Hawking, which would naturally make them strongly predisposed to not take the issue seriously. In the run-up to the conference, websites and press releases didn't say anything about whether this conference would be about long or short term AI safety, and they didn't make any reference to the idea of superintelligence.

I sympathize with the concerns and strategy given by people such as Andrew Moore and Andrew Grotto, which make perfect sense if (and only if) you assume that worries about long term AI safety are completely unfounded. For the community that is interested in long term AI safety, I would recommend that we avoid competitive dynamics by (a) demonstrating that we are equally strong opponents of bad press, inaccurate news, and irrational public opinion which promotes generic uninformed fears over AI, (b) explaining that we are not interested in removing funding for AI research (even if you think that slowing down AI development is a good thing, restricting funding yields only limited benefits in terms of changing overall timelines, whereas those who are not concerned about long term AI safety would see a restriction of funding as a direct threat to their interests and projects, so it makes sense to cooperate here in exchange for other concessions), and (c) showing that we are scientifically literate and focused on the technical concerns. I do not believe that there is necessarily a need for the two "sides" on this to be competing against each other, so it was disappointing to see an implication of opposition at the conference.

Anyway, Ed Felten announced a request for information from the general public, seeking popular and scientific input on the government's policies and attitudes towards AI: https://www.whitehouse.gov/webform/rfi-preparing-future-artificial-intelligence

Overall, I learned quite a bit and benefited from the experience, and I hope the insight I've gained can be used to improve the attitudes and approaches of the long term AI safety community.

Are smart contracts AI-complete?

11 Stuart_Armstrong 22 June 2016 02:08PM

Many people are probably aware of the hack of The DAO, in which a bug in its smart contract was used to steal millions of dollars' worth of ether, the cryptocurrency of the Ethereum platform.
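The DAO exploit is widely reported to have relied on a reentrancy bug. The contract sent funds to the caller before updating the caller's balance, so a malicious recipient could call back into the withdrawal function and be paid repeatedly. The following is a simplified Python simulation of that pattern, not Solidity and not The DAO's actual code. It uses a hypothetical `VulnerableBank` and an attacker whose payment callback re-enters `withdraw`.

```python
class VulnerableBank:
    """Toy contract: pays out *before* zeroing the balance (the reentrancy bug)."""

    def __init__(self):
        self.balances = {}
        self.reserves = 0

    def deposit(self, name, amount):
        self.balances[name] = self.balances.get(name, 0) + amount
        self.reserves += amount

    def withdraw(self, who):
        amount = self.balances.get(who.name, 0)
        if amount > 0 and self.reserves >= amount:
            self.reserves -= amount
            who.receive(self, amount)    # external call happens first...
            self.balances[who.name] = 0  # ...and the balance is zeroed too late


class FixedBank(VulnerableBank):
    """Checks-effects-interactions order: update state, then make the external call."""

    def withdraw(self, who):
        amount = self.balances.get(who.name, 0)
        if amount > 0 and self.reserves >= amount:
            self.balances[who.name] = 0
            self.reserves -= amount
            who.receive(self, amount)


class Attacker:
    def __init__(self, name, max_reentries):
        self.name = name
        self.stolen = 0
        self.reentries_left = max_reentries

    def receive(self, bank, amount):
        self.stolen += amount
        if self.reentries_left > 0:  # re-enter while the balance is still stale
            self.reentries_left -= 1
            bank.withdraw(self)


for bank_cls in (VulnerableBank, FixedBank):
    bank = bank_cls()
    bank.deposit("alice", 90)  # an honest depositor's funds
    bank.deposit("mallory", 10)
    mallory = Attacker("mallory", max_reentries=10)
    bank.withdraw(mallory)
    print(bank_cls.__name__, mallory.stolen)
# VulnerableBank 100
# FixedBank 10
```

Both versions do exactly what their code says. The difference between "a coding mistake" and "the contract" lies entirely in the order of two lines.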

There are various arguments as to whether this theft was technically allowed or not, what should be done about it, and so on. Many people are arguing that the code is the contract, and that therefore no one should be allowed to interfere with it: DAO just made a coding mistake, and is now being (deservedly?) punished for it.

That got me wondering whether it's ever possible to make a smart contract without a full AI of some sort. For instance, if the contract is triggered by the delivery of physical goods, how can you define what the goods are, what constitutes delivery, what constitutes possession of them, and so on? You could have a human confirm delivery, but that's precisely the kind of judgement call you want to avoid. You could have an automated delivery confirmation system, but what happens if someone hacks or triggers that? You could connect it automatically to a system that scans media headlines, but again, this relies on aggregated human judgement, which could be hacked or influenced.

Digital goods seem more secure, as you can automate confirmation of delivery/services rendered, and so on. But, again, this leaves the confirmation process open to hacking. Which would be illegal, if you're going to profit from the hack. Hum...

This seems the most promising avenue for smart contracts that doesn't involve full AI: clear out the bugs in the code, then ground the confirmation procedure in such a way that it can only be hacked in a way that's already illegal. Sort of use the standard legal system as a backstop, fixing the basic assumptions, and then setting up the smart contracts on top of them (which is not the same as using the standard legal system within the contract).
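The proposal to keep the contract code trivial and ground its one judgement call in a confirmation that can only be forged illegally can be sketched as an escrow. The escrow releases payment only on a message authenticated by a designated, legally accountable confirmer. Everything here is hypothetical: the confirmer, the shared key, and the use of an HMAC. A real blockchain contract would more likely check a public-key signature.

```python
import hashlib
import hmac


class Escrow:
    """Releases funds only when the designated confirmer authenticates delivery."""

    def __init__(self, amount, confirmer_key, order_id):
        self.amount = amount
        self._key = confirmer_key
        self.order_id = order_id
        self.released = False

    def confirm_delivery(self, tag):
        expected = hmac.new(self._key, self.order_id.encode(), hashlib.sha256).digest()
        # Constant-time comparison, so the check itself leaks nothing about the tag.
        if not self.released and hmac.compare_digest(tag, expected):
            self.released = True
        return self.released


# Hypothetical courier who holds the key and is legally accountable for its misuse.
courier_key = b"shared-secret-held-by-courier"
escrow = Escrow(amount=500, confirmer_key=courier_key, order_id="order-1234")

forged = b"\x00" * 32
print(escrow.confirm_delivery(forged))   # False

genuine = hmac.new(courier_key, b"order-1234", hashlib.sha256).digest()
print(escrow.confirm_delivery(genuine))  # True
```

The code is short enough to audit. The hard part moves to whoever holds the key, which is exactly where the ordinary legal system would act as the backstop.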

Google Deepmind and FHI collaborate to present research at UAI 2016

23 Stuart_Armstrong 09 June 2016 06:08PM

Safely Interruptible Agents

Oxford academics are teaming up with Google DeepMind to make artificial intelligence safer. Laurent Orseau, of Google DeepMind, and Stuart Armstrong, the Alexander Tamas Fellow in Artificial Intelligence and Machine Learning at the Future of Humanity Institute at the University of Oxford, will be presenting their research on reinforcement learning agent interruptibility at UAI 2016. The conference, one of the most prestigious in the field of machine learning, will be held in New York City from June 25-29. The paper which resulted from this collaborative research will be published in the Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI).

Orseau and Armstrong’s research explores a method to ensure that reinforcement learning agents can be safely interrupted, repeatedly, by human or automatic overseers. This ensures that the agents do not “learn” about these interruptions, and do not take steps to avoid or manipulate them. When there are control procedures during the training of the agent, we do not want the agent to learn about these procedures, as they will not exist once the agent is on its own. This is useful for agents that have a substantially different training and testing environment (for instance, when training a Martian rover on Earth, shutting it down, replacing it at its initial location and turning it on again when it goes out of bounds, something that may be impossible once it is alone and unsupervised on Mars), for agents not known to be fully trustworthy (such as an automated delivery vehicle that we do not want to learn to behave differently when watched), or simply for agents that need continual adjustments to their learnt behaviour. In all cases where it makes sense to include an emergency “off” mechanism, it also makes sense to ensure the agent doesn’t learn to plan around that mechanism.

Interruptibility has several advantages as an approach over previous methods of control. As Dr. Armstrong explains, “Interruptibility has applications for many current agents, especially when we need the agent to not learn from specific experiences during training. Many of the naive ideas for accomplishing this—such as deleting certain histories from the training set—change the behaviour of the agent in unfortunate ways.”

In the paper, the researchers provide a formal definition of safe interruptibility, show that some types of agents already have this property, and show that others can be easily modified to gain it. They also demonstrate that even an ideal agent that tends to the optimal behaviour in any computable environment can be made safely interruptible.
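One way to see why some agents can already have this property is to compare two standard tabular update rules. This sketch illustrates the intuition only and is not the paper's formal construction. An overseer interrupts by forcing a hypothetical `"stop"` action in the next state. Q-learning is off-policy: its target uses the best next action, so it is unaffected by which action was forced. SARSA is on-policy: its target uses the action actually taken, so interruptions leak into its value estimates. This matches the commonly cited summary of the paper, in which Q-learning is already safely interruptible and Sarsa can be modified to become so; the paper itself is the authority on the exact conditions.

```python
ALPHA, GAMMA = 0.5, 0.9  # illustrative learning rate and discount


def q_learning_update(q, s, a, r, s_next):
    """Off-policy target: best action in s_next, whatever action is actually taken next."""
    target = r + GAMMA * max(q[s_next].values())
    q[s][a] += ALPHA * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next):
    """On-policy target: the action actually taken in s_next (possibly forced)."""
    target = r + GAMMA * q[s_next][a_next]
    q[s][a] += ALPHA * (target - q[s][a])


def fresh_table():
    # Two states, two actions; "stop" is the action an overseer forces when interrupting.
    return {"A": {"go": 0.0, "stop": 0.0}, "B": {"go": 1.0, "stop": 0.0}}


# Same transition (A, go, reward 0, B); only the next action differs.
results = {}
for label, a_next in (("uninterrupted", "go"), ("interrupted", "stop")):
    q_tab, s_tab = fresh_table(), fresh_table()
    q_learning_update(q_tab, "A", "go", 0.0, "B")
    sarsa_update(s_tab, "A", "go", 0.0, "B", a_next)
    results[label] = (q_tab["A"]["go"], s_tab["A"]["go"])

print(results)  # {'uninterrupted': (0.45, 0.45), 'interrupted': (0.45, 0.0)}
```

A single update is only the mechanism; the paper's definition concerns what the agent converges to under repeated interruptions.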

These results will have implications in future research directions in AI safety. As the paper says, “Safe interruptibility can be useful to take control of a robot that is misbehaving… take it out of a delicate situation, or even to temporarily use it to achieve a task it did not learn to perform….” As Armstrong explains, “Machine learning is one of the most powerful tools for building AI that has ever existed. But applying it to questions of AI motivations is problematic: just as we humans would not willingly change to an alien system of values, any agent has a natural tendency to avoid changing its current values, even if we want to change or tune them. Interruptibility and the related general idea of corrigibility, allow such changes to happen without the agent trying to resist them or force them. The newness of the field of AI safety means that there is relatively little awareness of these problems in the wider machine learning community.  As with other areas of AI research, DeepMind remains at the cutting edge of this important subfield.”

On the prospect of continuing collaboration in this field with DeepMind, Stuart said, “I personally had a really illuminating time writing this paper—Laurent is a brilliant researcher… I sincerely look forward to productive collaboration with him and other researchers at DeepMind into the future.” The same sentiment is echoed by Laurent, who said, “It was a real pleasure to work with Stuart on this. His creativity and critical thinking as well as his technical skills were essential components to the success of this work. This collaboration is one of the first steps toward AI Safety research, and there’s no doubt FHI and Google DeepMind will work again together to make AI safer.”

For more information, or to schedule an interview, please contact Kyle Scott at fhipa@philosophy.ox.ac.uk

The AI in Mary's room

4 Stuart_Armstrong 24 May 2016 01:19PM

In the Mary's room thought experiment, Mary is a brilliant scientist in a black-and-white room who has never seen any colour. She can investigate the outside world through a black-and-white television, and has piles of textbooks on physics, optics, the eye, and the brain (and everything else of relevance to her condition). Through this she knows everything intellectually there is to know about colours and how humans react to them, but she hasn't seen any colours at all.

After that, when she steps out of the room and sees red (or blue), does she learn anything? It seems that she does. Even if she doesn't technically learn something, she experiences things she hadn't ever before, and her brain certainly changes in new ways.

The argument was intended as a defence of qualia against certain forms of materialism. It's interesting, and I don't intend to solve it fully here. But just as I extended Searle's Chinese room argument from the perspective of an AI, it seems this argument can also be considered from an AI's perspective.

Consider an RL agent with a reward channel, but which currently receives nothing from that channel. The agent can know everything there is to know about itself and the world. It can know about all sorts of other RL agents, and their reward channels. It can observe them getting their own rewards. Maybe it could even interrupt or increase their rewards. But all this knowledge will not get it any reward. As long as its own channel doesn't send it the signal, knowledge of other agents' rewards, even of identical agents getting rewards, does not give this agent any reward. Ceci n'est pas une récompense.

This seems to mirror Mary's situation quite well: knowing everything about the world is no substitute for actually getting the reward or seeing red. Now, an RL agent's reward seems closer to pleasure than to qualia; this would correspond to a Mary brought up in a puritanical, pleasure-hating environment.

Closer to the original experiment, we could imagine the AI is programmed to enter certain specific subroutines when presented with certain stimuli. The only way for the AI to start these subroutines is for the stimulus to be presented to it. Then, upon seeing red, the AI enters a completely new mental state, with new subroutines. The AI could know everything about its programming, and about the stimulus, and, intellectually, what would change about itself if it saw red. But until it did, it would not enter that mental state.
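The subroutine version of the thought experiment can be made concrete with a toy agent. In this hypothetical sketch, the agent can produce a complete description of what its red-handling routine does. Its actual state changes only when the red stimulus arrives and the routine runs.

```python
class Agent:
    """Toy agent whose 'seeing red' subroutine runs only on the red stimulus."""

    def __init__(self):
        self.state = "monochrome"

    def on_red(self):
        # The new mental state: entered only by actually running this routine.
        self.state = "experiencing_red"

    def describe_on_red(self):
        # Knowledge *about* the routine; producing it does not run the routine.
        return "on red: state becomes 'experiencing_red'"

    def perceive(self, stimulus):
        if stimulus == "red":
            self.on_red()


mary = Agent()
print(mary.describe_on_red())  # full knowledge about the red state...
print(mary.state)              # ...but still: monochrome
mary.perceive("red")
print(mary.state)              # experiencing_red
```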

If we use ⬜ to (informally) denote "knowing all about", then ⬜(X→Y) does not imply Y. Here X and Y could be "seeing red" and "the mental experience of seeing red". I could have simplified that by saying that ⬜Y does not imply Y. Knowing about a mental state, even perfectly, does not put you in that mental state.

This closely resembles the original Mary's room experiment. And it seems that if anyone insists that certain features are necessary to the intuition behind Mary's room, then these features could be added to this model as well.

Mary's room is fascinating, but it doesn't seem to be talking about humans exclusively, or even about conscious entities.
