Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
The challenge of friendliness in Artificial Intelligence is to ensure how a general intelligence will be of utility instead of being destructive or pathologically indifferent to the values of existing individuals or aims and goals of their creation. The current provision of computer science is likely to yield bugs and way too technical and inflexible guidelines of action. It is known to be inadequate to handle the job sufficiently. However the challenge of friendliness is also faced by natural intelligences, those that are not designed by an intelligence but molded into being by natural selection.
We know that natural intelligences do the job adequately enough that we do not think that natural intelligence unfriendliness is a significant existential threat. Like plants do solar energy capturing way more efficently and maybe utilising quantum effects that humans can't harness, we know that natural intelligences are using friendliness technology that is of higher caliber that we can build into machines. However as we progress this technology maybe lacking dangerously behind and we need to be able to apply it to hardware in addition to wetware and potentially boost it to new levels.
The earliest concrete example of a natural intelligence being controlled for friendliness I can think of is Socrates. He was charged for "corruption of the heart of the societys youngters". He defended that his stance of questioning everything was without fault. He was however found quilty even thought the trial could be identified with faults. The jury might have been politically motivated or persuaded and the citizens might have expected the results to not be taken seriously. While Socrates was given a very real possibility of escaping imprisonment and capital punishment he did not circumvent his society operation. In fact he was obidient enough that he acted as his own executioner drinking the poison himself. Because of the kind of farce his teachers death had been Plato lost hope for the principles that lead to such an absurd result him becoming skeptical of democrasy.
However if the situation would have been about a artificial intelligence a lot of things went very right. The intelligences society became scared of him and asked it to die. There was dialog about how the deciders were ignorant and stupid and that nothing questionable had been done. However ultimately when issues of miscommunications had been cleared and the society insisted upon its expression of will instead of circumventing the intervention the intelligence pulled its own plug voluntarily. Therefore Socrates was propably the first friendly (natural) intelligence.
The mechanism used in this case was that of a juridical system. That is a human society recognises that certain acts and individuals are worth restraining for the danger that they pose to the common good. A common method is incarcenation and the threat of it. That is certain bad acts can be tolerated in the wild and corrective action can then be employed. When there is reason to expect bad acts or no reason to expect good acts individuals can be restricted in never being able to act in the first place. Whether a criminal is released early can depend on whether there is reason to expect not to be a repeat offender. That is understanding how an agent acts makes it easier to grant operating priviledges. Such hearings are very analogous to a gatekeeper and a AI in a AI-boxing situation.
However when a new human is created it is not assumed hostile until proven friendly. Rather humans are born innocent but powerless. A fully educated and socialised intelligence is assigned for multiple year observation and control period. These so called "parents" have a very wide freedom on programming principles. However human psychology also has peroid of "peer guidedness" where the opinion of peers becomes important. When a youngter grows his thinking is constantly being monitored and things like time of onset of speech are monitored with interest. They also gain guidance on very trivial thinking skills. While this has culture passing effect it also keeps the parent very updated on what is the mental status of the child. Never is a child allowed to grow or reason extended amounts of time isolated. Thus the task of evaluating whether an unknown individual is friendly or not is not encountered. There is never a need to turing-test that a child "works". There is always a maintainer and it has the equivalent of psychological growth logs.
However despite all these measures we know that small children can be cruel and have little empathy. However instead of shelving them as rejects we either accomodate them with an environment that minimises the harm or direct them to a more responcible path. When a child ask a question on how they should approach a particular kind of situation this can be challenging for the parent to answer. The parent might also resort to giving a best-effort answer that might not be entirely satisfactory or even wrong advice may be given. However children have dialog with their parents and other peers.
An interesting question is does parenting break down if the child is intellectually too developed compared to the parent or parenting environment? It's also worth noting that children are not equipped with a "constitution of morality". Some things they infer from experience. Some ethical rules are thougth them explicitly. They learn to apply the rules and interpret them in different situations. Some rules might be contradictory and some moral authorities trusted more.
Beoynd the individual level groups of people have an mechanism of acccepting other groups. This doesn't always happen without conditions. However here things seem to work much less efficently. If two groups of people differ in values enough they might start a war of ideology against each other. This kind of war usually concludes with physical action instead of arguments. Suppression of Nazi Germany can be seen as friendliness immune reaction. Normally divergent values and issues having countries wanted and could unite against a different set of values that was tried to be imposed by force. However the success Nazis had can debatably be attributed for a lousy conclusion of world war I. The effort extended to build peace varies and contests with other values.
Friendliness migth also have an important component that it is relative to a set of values. A society will support the upring of certain kinds of children with the suppression of certain other kinds. USSR had officers that's sole job was to protect that things were going according to party line. At this point we have trouble getting a computer to follow anyones values. However it might be important to ask "friendly to whom?". The exploration of friendliness is also an exploration in hostility. We want to be hostile towards UFAIs. It would be awful for a AI to be friendly only towards it's inventor, or only towards it's company. However we have been hostile to neardentals. Was that wrong? Would it be a signficant loss to developed sentience if AIs were less than friendly to humans?
If we ask our grandgrandgrandparents on how we should conduct things they might give a different version than we have. It's expectable that our children are capable of going beyond our morality. Ensuring that a societys values are never violated would be to freeze them in time indefinately. In this way there can be danger in developing a too friendly AI. For that AI could never be truly superhuman. In a way if my child asks me a morally challenging question and I change my opinion about it by the result of that conversation it might be a friendliness failure. Instead of imparting values I receive them with the values causal history being in the inside of a young head instead of a cultural heritage of a longlived civilization.
As a civilizaton we have mapped a variety of thoughts and psyche- and organizational strucutres on how they work. The thought space on how an AI might think is poorly mapped. However we are spreading our understandig on cognitive diversity learning about how austistic persons think as well as dolphins. We can establish things liek that some savants are really good with dates and that askingn about dates from that kind of person is more realiable than an ordinary person. To be able to use AI thinking we need to understand what AI thought is. Up to now we have not needed to study in detail how humans think. We can just adapt to the way they do without attending to how it works. But in similar that we need to know the structure of a particle accelerator to be able to say that it provides information about particle behaviour we need to know why it would make sense to take what an AI says seriously. The challenge would be the same if we were asked to listen seriously to a natural intelligence from a foreign culture. Thus the enemy is inferential distance itself rather than the resultant thought processes. For we know that we can create things we don't understand. Thus it's important to understand that doing things you don't understand is a recipe for disaster. And we must not fool ourself that we understand what a machine thinking would be. Only once we have convinced our fellow natural intelligences that we know what we are doing can it make sense to listen to our creations. Socrates could not explain himself so his effect on others was unsafe. If you need to influence others you need to explain why you are doing so.
"the next time you’re stopping yourself from trying something because the conditions are not optimal, remember that those optimal conditions may not have been the reason it worked. They may not be the cause. They may not even be correlated. They may just be a myth you’ve bought into or sold yourself that limits you from breaking out and exceeding your expectations."
I sometimes let imaginary versions of myself make decisions for me.
(I also sometimes imagine what Anna would do, and then do that. I call it "Annajitsu".)
On September 26th, 1983, the world was nearly destroyed by nuclear war. That day is Petrov Day, named for the man who averted it. Petrov Day is now a yearly event on September 26 commemorating the anniversary of the Petrov incident. Last year, Citadel, the Boston-area rationalist house, performed a ritual on Petrov day. We will be doing it again - and have published a revised version, for anyone else who wants to have a Petrov Day celebration themselves.
The purpose of the ritual is to make catastrophic and existential risk emotionally salient, by putting it into historical context and providing positive and negative examples of how it has been handled. This is not for the faint of heart and not for the uninitiated; it is aimed at those who already know what catastrophic and existential risk is, have some background knowledge of what those risks are, and believe (at least on an abstract level) that preventing those risks from coming to pass is important.
Petrov Day is designed for groups of 5-10 people, and consists of a series of readings and symbolic actions which people take turns doing. It is easy to organize; you'll need a few simple props (candles and a candle-holder) and a printout of the program for each person, but other than that no preparation is necessary.
There will be a Petrov Day ritual hosted at Citadel (Boston area) and at Highgarden (New York area). If you live somewhere else, consider running one yourself!
There is a lot of mainstream interest in machine ethics now. Here are some links to some popular articles on this topic.
By danah boyd, claiming that 'tech folks' are designing systems that implement an idea of fairness that comes from neoliberal ideology.
danah boyd (who spells her name with no capitalization) runs the Data & Society, a "think/do tank" that aims to study this stuff. They've recently gotten MacArthur Foundation funding for studying the ethical and political impact of intelligent systems.
A few observations:
First, there is no mention of superintelligence or recursively self-modifying anything. These scholars are interested in how, in the near future, the already comparatively powerful machines have moral and political impact on the world.
Second, these groups are quite bad at thinking in a formal or mechanically implementable way about ethics. They mainly seem to recapitulate the same tired tropes that have been resonating through academia for literally decades. On the contrary, mathematical formulation of ethical positions appears to be ya'll's specialty.
Third, however much the one-true-morality may be indeterminate or presently unknowable, progress towards implementable descriptions of various plausible moral positions could at least be incremental steps forward towards an understanding of how to achieve something better. Considering a slow take-off possible future, iterative testing and design of ethical machines with high computational power seems like low-hanging fruit that could only better inform longer-term futurist thought.
Personally, I try to do work in this area and find the lack of serious formal work in this area deeply disappointing. This post is a combination heads up and request to step up your game. It's go time.
UC Berkeley School of Infromation
Phenomenology is the study of the structures of experience and consciousness. Literally, it is the study of "that which appears". The first time you look at a twig sticking up out of the water, you might be curious and ask, "What forces cause things to bend when placed in water?" If you're a curious phenomenologist, though, you'll ask things like, "Why does that twig in water appear as though bent? Do other things appear to bend when placed in water? Do all things placed in water appear to bend to the same degree? Are there things that do not appear to bend when placed in water? Does my perception of the bending depend on the angle or direction from which I observe the twig?"
Pehenomenology means breaking experience down to its more basic components, and being precise in our descriptions of what we actually observe, free of further speculation and assumption. A phenomenologist recognizes the difference between observing "a six-sided cube", and observing the three faces, at most, from which we extrapolate the rest.
I consider phenomenology to be a central skill of rationality. The most obvious example: You're unlikely to generate alternative hypotheses when the confirming observation and the favored hypothesis are one and the same in your experience of experience. The importance of phenomenology to rationality goes deeper than that, though. Phenomenology trains especially fine grained introspection. The more tiny and subtle are the thoughts you're aware of, the more precise can be the control you gain over the workings of your mind, and the faster can be your cognitive reflexes.
(I do not at all mean to say that you should go read Husserl and Heidegger. Despite their apparent potential for unprecedented clarity, the phenomenologists, without exception, seem to revel in obfuscation. It's probably not worth your time to wade through all of that nonsense. I've mostly read about phenomenology myself for this very reason.)
I've been doing some experimental phenomenology of late.
I've noticed that rationality, in practice, depends on noticing. Some people have told me this is basically tautological, and therefore uninteresting. But if I'm right, I think it's likely very important to know, and to train deliberately.
The difference between seeing the twig as bent and seeing the twig as seeming bent may seem inane. It is not news that things that are bent tend to seem bent. Without that level of granularity in your observations, though, you may not notice that it could be possible for things to merely seem bent without being bent. When we're talking about something that may be ubiquitous to all applications of rationality, like noticing, it's worth taking a closer look at the contents of our experiences.
Many people talk about "noticing confusion", because Eliezer's written about it. Really, though, every successful application of a rationality skill begins with noticing. In particular, applied rationality is founded on noticing opportunities and obstacles. (To be clear, I'm making this up right this moment, so as far as I know it's not a generally agreed-upon thing. That goes for nearly everything in this post. I still think it's true.) You can be the most technically skilled batter in the world, and it won't help a bit if you consistently fail to notice when the ball whizzes by you--if you miss the opportunities to swing. And you're not going to run very many bases if you launch the ball straight at an opposing catcher--if you're oblivious to the obstacles.
It doesn't matter how many techniques you've learned if you miss all the opportunities to apply them, and fail to notice the obstacles when they get in your way. Opportunities and obstacles are everywhere. We can only be as strong as our ability to notice the ones that will make a difference.
Inspired by Whales' self-experiment in noticing confusion, I've been practicing noticing things. Not difficult or complicated things, like noticing confusion, or noticing biases. I've just been trying to get a handle on noticing, full stop. And it's been interesting.
What does it mean to notice something, and what does it feel like?
I started by checking to see what I expected it to feel like to notice that it's raining, just going from memory. I tried for a split-second prediction, to find what my brain automatically stored under "noticing rain". When I thought about noticing rain, I got this sort of vague impression of rainyness, which included few sensory details and was more of an overall rainy feeling. My brain tried to tell me that "noticing rain" meant "being directly acquainted with rainyness", in much the same way that it tries to tell me it's experiencing a cube when it's actually only experiencing a pattern of light and shadows I interpret as three faces.
Then, I waited for rain. It didn't take long, because I'm in North Carolina for the month. (This didn't happen last time I was in North Carolina, so perhaps I just happened to choose The One Valley of Eternal Rain.)
The real "noticing rain" turned out to be a response to the physical sensations concurrent with the first raindrop falling on my skin. I did eventually have an "abstract rainyness feeling", but that happened a full two seconds later. My actual experience went like this.
It was cloudy and humid. This was not at the forefront of my attention, but it slowly moved in that direction as the temperature dropped. I was fairly focused on reading a book.
(I'm a little baffled by the apparent gradient between "not at all conscious of x" and "fully aware of x". I don't know how that works, but I experience the difference between being a little aware of the sky being cloudy and being focused on the patterns of light in the clouds, as analogous to the difference between being very-slightly-but-not-uncomfortably warm and burning my hand on the stove.)
My awareness of something like an "abstract rainyness feeling" moved further toward consciousness as the wind picked up. Suddenly--and the suddenness was an important part of the experience--I felt something like a cool, dull pin-prick on my arm. I looked at it, saw the water, and recognized it as a raindrop. Over the course of about half a second, several sensations leapt forward into full awareness: the darkness of my surroundings, the humidity in the air, the dark grey-blueness of the sky, the sound of rain on leaves like television static, the scent of ozone and damp earth, the feeling of cool humid wind on my face, and the word "rain" in my internal monologue.
I think it is that sudden leaping forward of many associated sensations that I would call "noticing rain".
After that, I felt a sort of mental step backward--though it was more like a zooming out or sliding away than a discrete step--from the sensations, and then a feeling of viewing them from the outside. There was a sensation of the potential to access other memories of times when it's rained.
(Sensations of potential are fascinating to me. I noticed a few weeks ago that after memorizing a list of names and faces, I could predict in the first half second of seeing the face whether or not I'd be able to retrieve the name in the next five seconds. Before I actually retrieved the name. What??? I don't know either.)
Only then did all of it resolve into the more distant and abstract "feeling of rainyness" that I'd predicted before. The resolution took four times as long as the simultaneous-leaping-into-consciousness-of-related-sensations that I now prefer to call "noticing", and ten times as long as the first-raindrop-pin-prick, which I think I'll call the "noticing trigger" if it turns out to be a general class of pre-noticing experiences.
("Can you really distinguish between 200 and 500 milliseconds?" Yes, but it's an acquired skill. I spent a block of a few minutes every day for a month, then several blocks a day for about a week, doing this Psychomotor Vigiliance Task when I was gathering data for the polyphasic sleep experiment. (No, I'm sorry, to the best of my knowledge Leverage has not yet published anything on the results of this. Long story short: Everyone who wasn't already polyphasic is still not polyphasic today.) It gives you fast feedback on simple response time. I'm not sure if it's useful for anything else, but it comes in handy when taking notes on experiences that pass very quickly.)
Noticing Environmental Cues
My second experiment was in repeated noticing. This is more closely related to rationality as habit cultivation.
Can I get better at noticing something just by practicing?
I had an intuition that I should give myself some outward sign of having noticed, lest I not notice that I noticed, and decided to snap my fingers every time I noticed a red barn roof.
On the first drive, I noticed one red barn roof. That happened when I was almost at my destination and I thought, "Oh right, I'm supposed to be noticing red barn roofs, oops" then started actively searching for them.
Noticing a red barn roof while searching for it feels very different from noticing rain while reading a book. With the rain, it felt sort of like waking up, or like catching my name in an overheard conversation. There was a complete shift in what my brain was doing. With the barn roof, it was like I had a box with a red-barn-roof-shaped hole, and it felt like completion when a I grabbed a roof and dropped it through the hole. I was prepared for the roof, and it was a smaller change in the contents of consciousness.
I noticed two on the way back, also while actively searching for them, before I started thinking about something else and became oblivious.
I thought that maybe there weren't enough red barn roofs, and decided to try noticing red roofs of all sorts of buildings the next day. This, it turns out, was the correct move.
On day two of red-roof-noticing, I got lots of practice. I noticed around fifteen roofs on the way to the store, and around seven on the way back. By the end, I was not searching for the roofs as intently as I had been the day before, but I was still explicitly thinking about the project. I was still aware of directing my eyes to spend extra time at the right level in my field of vision to pick up roofs. It was like waving the box around and waiting for something to fall in, while thinking about how to build boxes.
I went out briefly again on day two, and on the way back, I noticed a red roof while thinking about something else entirely. Specifically, I was thinking about the possibility of moving to Uruguay, and whether I knew enough Spanish to survive. In the middle of one of those unrelated thoughts, my eyes moved over a barn roof and stayed there briefly while I had the leaping-into-consciousness experience with respect to the sensations of redness, recognizing something as shaped like a building, and feeling the impulse to snap my fingers. It was like I'd been wearing the box as a hat to free up my hands, and I'd forgotten about it. And then, with a heavy ker-thunk, the roof became my new center of attention.
And oh my gosh, it was so exciting! It sounds so absurd in retrospect to have been excited about noticing a roof. But I was! It meant I'd successfully installed a new cognitive habit to run in the background. On purpose. "Woo hoo! Yeah!" (I literally said that.)
On the third day, I noticed TOO MANY red roofs. I followed the same path to the store as before, but I noticed somewhere between twenty and thirty red roofs. I got about the same number going back, so I think I was catching nearly all the opportunities to notice red roofs. (I'd have to do it for a few days to be sure.) There was a pattern to noticing, where I'd notice-in-the-background, while thinking about something else, the first roof, and then I'd be more specifically on the lookout for a minute or two after that, before my mind wandered back to something other than roofs. I got faster over time at returning to my previous thoughts after snapping my fingers, but there were still enough noticed roofs to intrude uncomfortably upon my thoughts. It was getting annoying.
So I decided to switch back to only noticing the red roofs of barns in particular.
Extinction of the more general habit didn't take very long. It was over by the end of my next fifteen minute drive. For the first three times I saw a roof, I rose my hand a little to snap my fingers before reminding myself that I don't care about non-barns anymore. The next couple times I didn't raise my hand, but still forcefully reminded myself of my disinterest in my non-barns. The promotion of red roofs into consciousness got weaker with each roof, until the difference between seeing a non-red non-barn roof and a red non-barn roof was barely perceptible. That was my drive to town today.
On the drive back, I noticed about ten red barn roofs. Three I noticed while thinking about how to install habits, four while thinking about the differences between designing exercises for in-person workshops and designing exercises to put in books, and three soon enough after the previous barn to probably count as "searching for barns".
So yes, for at least some things, it seems I can get better at noticing them my by practicing.
What These Silly Little Experiments Are Really About
My plan is to try noticing an internal psychological phenomenon next, but still something straightforward that I wouldn't be motivated not to notice. I probably need to try a couple things to find something that works well. I might go with "thinking the word 'tomorrow' in my internal monologue", for example, or possibly "wondering what my boyfriend is thinking about". I'll probably go with something more like the first, because it is clearer, and zooms in on "noticing things inside my head" without the extra noise of "noticing things that are relatively temporally indiscrete", but the second is actually a useful thing to notice.
Most of the useful things to notice are a lot less obvious than "thinking the word 'tomorrow' in my internal monologue". From what I've learned so far, I think that for "wondering what my boyfriend is thinking about", I'll need to pick out a couple of very specific, instantaneous sensations that happen when I'm curious what my boyfriend is thinking about. I expect that to be a repetition of the rain experiment, where I predict what it will feel like, then wait 'til I can gather data in real time. Once I have a specific trigger, I can repeat the red roof experiment to catch the tiny moments when I wonder what he's thinking. I might need to start with a broader category, like "notice when I'm thinking about my boyfriend", get used to noticing those sensations, and then reduce the set of sensations I'm watching out for to things that happen only when I'm curious what my boyfriend is thinking.
After that, I imagine I'll want to practice with different kinds of actions I can take when I notice a trigger. (If you've never heard of Implementation Intentions, I suggest trying them out.) So far, I've used the physical action of snapping my fingers. That was originally for clarity in recognizing the noticing, but it's also a behavioral response to a trigger. I could respond with a psychological behavior instead of a physical one, like "imagining a carrot". A useful response to noticing that I'm curious about what my boyfriend is thinking would be "check to see if he's busy" and then "say, 'What are you thinking about?'"
See, this "noticing" thing sounds boringly simple at first, and not worth much consideration in the art of rationality. Even in his original "noticing confusion" post, Eliezer really talked more about recognizing the implications of confusion than about the noticing itself.
Noticing is more complicated than it seems at first, and it's easy to mix it up with responding. There's a whole sub-art to noticing, and I really think that deliberate practice is making me better at it. Responses can be hard. It's essential to make noticing as effortless as possible. Then you can break the noticing and the responding apart, and you can recognize reality even before you know what to do with it.
This article discusses how upvotes and downvotes influence the quality of posts on online communities. The article claims that downvotes lead to more posts of lower quality from the downvoted commenter.
From the abstract:
Social media systems rely on user feedback and rating mechanisms for personalization, ranking, and content filtering. [...] This paper investigates how ratings on a piece of content affect its author’s future behavior. [...] [W]e find that negative feedback leads to significant behavioral changes that are detrimental to the community. Not only do authors of negatively-evaluated content contribute more, but also their future posts are of lower quality, and are perceived by the community as such. In contrast, positive feedback does not carry similar effects, and neither encourages rewarded authors to write more, nor improves the quality of their posts.
The authors of the article are Justin Cheng, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec.
Edited to add:
Discussion article for the meetup : Washington, D.C.: Mini Talks
We will be meeting in the Kogod Courtyard of the National Portrait Gallery (8th and F Sts or 8th and G Sts NW, go straight past the information desk from either entrance) to take turns presenting short (~10-20 minute) lectures on random topics. As before, the period from 3:00 to 3:30 will be reserved for congregating; the talks will run as long as people are interested or until the museum closes at 7:00 p.m., whichever comes first. (If you need to show up late or leave early, that's fine - feel free to let us know if you want to give a talk at a particular time.)
- Sept. 28: Book Swap
- Oct. 5: Fun & Games
- Oct. 12: TBA
- Oct. 19: Mini Talks
- Oct. 26: TBA
Discussion article for the meetup : Washington, D.C.: Mini Talks
Discussion article for the meetup : MelbLW: September Social Meetup
September's social meetup is scheduled for this Friday (19th September) as usual. This month, we will be returning to Alchemist's Refuge.
Our social meetups are relaxed, informal events where we chat and often play games. The start and finish times are very loose - people will be coming and going throughout the night, so don't worry if you are coming later.
Where? Alchemist's Refuge, 328 Little Collins St, Melbourne - near the corner of Queen St, downstairs from Games Laboratory
When? From 6:30pm until late, Friday September 19th
Contact? If you have any questions, just text or call me (Richard) on 0421231789
Dinner? There are a number of take-away places nearby that deliver to Alchemist's Refuge. It is also quite likely that a group of us will go out for late night souvlakis after Refuge closes.
Games? Alchemist's Refuge do allow board games and they also have a number that can be borrowed for a minor fee. Ask around and you'll easily find some others to join you!
To organise similar events, please send an email to email@example.com
Discussion article for the meetup : MelbLW: September Social Meetup
Discussion article for the meetup : Perth, Australia: Games night
Come play Zendo, an inductive logic game. Rowdy will teach us, and I'm told the rules are pretty simple.
Zendo seems cool because it can perhaps teach the skill of "looking into the dark".
We'll be at Sync Labs, a coworking space in Leederville. The entrance is between Niche and Cranked. Don't trust Google Maps!
You can RSVP here: http://www.meetup.com/Perth-Less-Wrong/events/207744102/
Discussion article for the meetup : Perth, Australia: Games night
View more: Next